A Scalable RISC-V Vector Processor Enabling Efficient Multi-Precision DNN Inference
Abstract
This work introduces SPEED, a scalable RISC-V vector (RVV) processor architecture designed to efficiently execute multi-precision deep neural network (DNN) inference, addressing limitations in precision support and dataflow of existing RVV designs. SPEED incorporates customized RVV instructions for fine-grained 4-to-16-bit precision control and utilizes a parameterized multi-precision systolic array unit for enhanced parallelism. Synthesized in 28nm technology, SPEED achieves 1335.79 GOPS/W at 4-bit precision and significantly improves area efficiency by up to 2.04x compared to the open-source Ara vector processor.
Report
Key Highlights
- Processor Name: SPEED (Scalable RISC-V Vector Processor).
- Target Application: Efficient multi-precision Deep Neural Network (DNN) inference.
- Performance Metric (Peak): Achieves 287.41 GOPS peak throughput and 1335.79 GOPS/W energy efficiency at 4-bit precision.
- Efficiency Improvement: Improves area efficiency by 2.04x (16-bit) and 1.63x (8-bit) over Ara, the pioneering open-source vector processor.
- Methodology: Combines customized RVV instructions, a specialized multi-precision hardware architecture, and optimized dataflow mapping.
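As a rough sanity check on the figures above, dividing peak throughput by energy efficiency gives an implied power draw. The sketch below assumes that both 4-bit numbers were measured at the same operating point, which this summary does not state; the function name is illustrative only.

```python
def implied_power_watts(throughput_gops: float, efficiency_gops_per_w: float) -> float:
    """Return the power implied by GOPS / (GOPS/W), in watts."""
    if efficiency_gops_per_w <= 0:
        raise ValueError("energy efficiency must be positive")
    return throughput_gops / efficiency_gops_per_w


# Figures quoted in the highlights (4-bit precision).
peak_gops = 287.41
gops_per_watt = 1335.79

# Assumption: both figures refer to the same operating point.
power_w = implied_power_watts(peak_gops, gops_per_watt)
print(f"implied power: {power_w * 1e3:.1f} mW")
```

Under that assumption the implied power is on the order of a few hundred milliwatts, which is consistent with the edge-inference target described in this report.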
Technical Details
- Architecture Base: RISC-V Vector (RVV) extensions.
- Precision Support: Dedicated customized RISC-V instructions enable fine-grained control over processing precision, ranging from 4 bits to 16 bits.
- Hardware Accelerator: Incorporates a parameterized multi-precision systolic array unit within the scalable module to maximize parallel processing and data reuse.
- Dataflow Strategy: A mixed multi-precision dataflow strategy is proposed, ensuring compatibility with different convolution kernels and data precision requirements, thus improving data utilization and computational efficiency.
- Implementation: The design was synthesized using TSMC 28nm technology.
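The interaction between precision control and the systolic array can be illustrated with a small software model. The sketch below is not SPEED's microarchitecture: it assumes an output-stationary array with skewed operand arrival and signed saturating operands, and every name in it (`saturate`, `systolic_matmul`, `matmul_at_precision`) is hypothetical.

```python
def saturate(value: int, bits: int) -> int:
    """Clamp an integer to the signed two's-complement range of `bits` bits."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


def systolic_matmul(a, b):
    """Model an output-stationary systolic array computing a @ b.

    PE (i, j) keeps its own accumulator. Operands are skewed so that
    a[i][k] and b[k][j] meet at PE (i, j) on cycle t = i + j + k.
    Returns the result matrix and the number of cycles simulated.
    """
    m, k_dim, n = len(a), len(b), len(b[0])
    acc = [[0] * n for _ in range(m)]
    cycles = m + n + k_dim - 2  # last operand pair arrives at t = m + n + k_dim - 3
    for t in range(cycles):
        for i in range(m):
            for j in range(n):
                k = t - i - j
                if 0 <= k < k_dim:
                    acc[i][j] += a[i][k] * b[k][j]
    return acc, cycles


def matmul_at_precision(a, b, bits: int):
    """Saturate operands to `bits` (4 to 16, the range named above), then multiply."""
    if not 4 <= bits <= 16:
        raise ValueError("bits must be in the range 4..16")
    qa = [[saturate(v, bits) for v in row] for row in a]
    qb = [[saturate(v, bits) for v in row] for row in b]
    return systolic_matmul(qa, qb)


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
full, cycles = matmul_at_precision(a, b, 16)
low, _ = matmul_at_precision(a, b, 4)  # 8 saturates to 7 in signed 4-bit
print(full, low, cycles)
```

In this model the accumulators stay at full width while only the operands are narrowed, so lowering the precision changes the numerical result (here through saturation) without changing the cycle count. In hardware, narrower operands would additionally allow more elements per datapath word, which is where the parallelism gain of a multi-precision array would come from.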
Implications
- Enabling Edge AI: By achieving high energy and area efficiency with flexible multi-precision support (4-16 bit), SPEED directly addresses the critical needs of resource-constrained edge computing devices performing DNN inference.
- Extending RISC-V ISA: The introduction of dedicated customized instructions based on the RVV extension demonstrates a viable and high-performing method for extending the RISC-V instruction set architecture for specialized AI acceleration.
- Benchmarking Advancement: A 2.04x area-efficiency advantage at 16-bit precision over Ara, an existing open-source RVV implementation, sets a stronger reference point for commercial and academic RISC-V hardware dedicated to vector and matrix operations in the AI domain.