SPEED: A Scalable RISC-V Vector Processor Enabling Efficient Multi-Precision DNN Inference
Abstract
SPEED is a scalable RISC-V Vector (RVV) processor designed for highly efficient multi-precision Deep Neural Network (MP-DNN) inference on resource-constrained edge platforms. It introduces dedicated custom RVV instructions and a parameterized multi-precision tensor unit that supports precisions from 4-bit to 16-bit with minimal hardware overhead. Experimental results show that SPEED achieves a peak throughput of 737.9 GOPS for 4-bit operations and superior area efficiency compared with prior RVV processors.
Report
Key Highlights
- Target: SPEED is a scalable RISC-V Vector (RVV) processor optimized specifically for efficient Multi-Precision DNN (MP-DNN) inference on edge platforms.
- Performance Metrics (4-bit): Achieves a peak throughput of 737.9 GOPS and a high energy efficiency of 1383.4 GOPS/W for 4-bit operators; a back-of-envelope power estimate follows this list.
- Area Efficiency: Delivers significantly higher area efficiency than previous RVV processors: 5.9x to 26.9x higher for 8-bit operations and 8.2x to 18.5x higher when each processor is compared at its best integer precision.
- Precision Range: Supports computation precisions from 4-bit to 16-bit with minimal additional hardware.
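As a rough consistency check, and assuming (which this summary does not state) that the peak-throughput and peak-efficiency figures were measured at the same operating point, the two headline numbers imply a power draw of roughly half a watt:

```latex
% Implied power, under the (unstated) assumption that both peaks
% occur at the same operating point
P \approx \frac{\text{peak throughput}}{\text{energy efficiency}}
  = \frac{737.9\ \text{GOPS}}{1383.4\ \text{GOPS/W}} \approx 0.53\ \text{W}
```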
Technical Details
- Instruction Set Innovation: Dedicated custom RISC-V instructions, built on the RVV extension, reduce instruction overhead and support multi-precision processing from 4-bit to 16-bit.
- Parallelism Enhancement: A parameterized multi-precision tensor unit is integrated into the scalable module, providing reconfigurable parallelism that matches the varying computation patterns of diverse MP-DNN operators; the packed-MAC sketch after this list illustrates the idea.
- Dataflow Optimization: A flexible mixed-dataflow method adapts the dataflow to the computing pattern of each DNN operator, improving both computational and energy efficiency; the loop-ordering sketch after this list contrasts two common schedules.
- Implementation: The processor was synthesized using TSMC 28nm technology.
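To make the multi-precision instruction and tensor-unit bullets above concrete, the sketch below models in plain C what a precision-reconfigurable MAC datapath does: one 64-bit word is treated as sixteen 4-bit, eight 8-bit, or four 16-bit signed lanes, so halving the operand width doubles the multiply-accumulates performed per word. This is an illustrative software model under our own assumptions (the `packed_dot` helper is hypothetical), not SPEED's actual tensor-unit microarchitecture or its custom instruction encoding.

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative software model of a precision-reconfigurable MAC datapath.
 * A 64-bit "register" is treated as N packed signed lanes; halving the
 * lane width doubles the number of multiply-accumulates per word.
 * Hypothetical helper, not a SPEED instruction. */
static int64_t packed_dot(uint64_t a, uint64_t b, int bits)
{
    int lanes = 64 / bits;                 /* 16 lanes @4b, 8 @8b, 4 @16b */
    uint64_t mask = (1ULL << bits) - 1;
    int64_t acc = 0;

    for (int i = 0; i < lanes; i++) {
        /* Extract lane i from each operand and sign-extend it. */
        int64_t x = (int64_t)((a >> (i * bits)) & mask);
        int64_t y = (int64_t)((b >> (i * bits)) & mask);
        if (x & (1LL << (bits - 1))) x -= (1LL << bits);
        if (y & (1LL << (bits - 1))) y -= (1LL << bits);
        acc += x * y;                      /* one MAC per lane */
    }
    return acc;
}

int main(void)
{
    /* The same two 64-bit words interpreted at three precisions:
     * the narrower the lanes, the more MACs per call. */
    uint64_t a = 0x1122334455667788ULL, b = 0x0F0E0D0C0B0A0908ULL;
    printf(" 4-bit: 16 MACs, acc = %lld\n", (long long)packed_dot(a, b, 4));
    printf(" 8-bit:  8 MACs, acc = %lld\n", (long long)packed_dot(a, b, 8));
    printf("16-bit:  4 MACs, acc = %lld\n", (long long)packed_dot(a, b, 16));
    return 0;
}
```

The same register and wiring width thus yields 4x more MACs at 4-bit than at 16-bit, which is consistent with the 4-bit configuration delivering the highest reported peak throughput.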
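The mixed-dataflow bullet can be illustrated in the same spirit with loop orderings. The sketch below shows two schedules for an identical small matrix multiply: an output-stationary order that keeps each accumulator live until it is complete, and a weight-stationary order that reuses each weight across all rows before reloading it. Which schedule is more efficient depends on the operator's shape and reuse pattern, which is the intuition behind picking the dataflow per operator; the specific dataflows SPEED mixes are not detailed in this summary, so this is only an assumed illustration.

```c
#include <stdio.h>

#define M 4
#define N 4
#define K 4

/* Output-stationary schedule: each C[i][j] accumulator stays live
 * across the whole K loop, minimizing partial-sum traffic. */
static void gemm_output_stationary(const int A[M][K], const int B[K][N], int C[M][N])
{
    for (int i = 0; i < M; i++)
        for (int j = 0; j < N; j++) {
            int acc = 0;                       /* accumulator held locally */
            for (int k = 0; k < K; k++)
                acc += A[i][k] * B[k][j];
            C[i][j] = acc;
        }
}

/* Weight-stationary schedule: each B[k][j] ("weight") is fetched once
 * and reused across all rows of A, minimizing weight traffic instead. */
static void gemm_weight_stationary(const int A[M][K], const int B[K][N], int C[M][N])
{
    for (int i = 0; i < M; i++)
        for (int j = 0; j < N; j++)
            C[i][j] = 0;
    for (int k = 0; k < K; k++)
        for (int j = 0; j < N; j++) {
            int w = B[k][j];                   /* weight kept stationary */
            for (int i = 0; i < M; i++)
                C[i][j] += A[i][k] * w;
        }
}

int main(void)
{
    int A[M][K] = {{1,2,3,4},{5,6,7,8},{9,10,11,12},{13,14,15,16}};
    int B[K][N] = {{1,0,0,0},{0,1,0,0},{0,0,1,0},{0,0,0,1}};
    int C1[M][N], C2[M][N];
    gemm_output_stationary(A, B, C1);
    gemm_weight_stationary(A, B, C2);
    /* Both schedules compute the same result; only the reuse pattern differs. */
    printf("C1[3][3]=%d  C2[3][3]=%d\n", C1[3][3], C2[3][3]);
    return 0;
}
```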
Implications
- RISC-V Acceleration: SPEED strengthens the RISC-V ecosystem's position in AI acceleration and validates the extensibility of the instruction set architecture (ISA) for domain-specific tasks.
- Edge AI Deployment: By supporting multiple quantization precisions efficiently in hardware, SPEED makes deploying aggressively quantized DNNs viable even on severely resource-constrained edge devices.
- Competitive Advantage: The demonstrated superior area and energy efficiency positions RISC-V-based processors, such as SPEED, as strong competitive alternatives to proprietary architectures in the high-growth market of energy-efficient AI inference.
Technical Deep Dive Available
This public summary covers the essentials. The Full Report contains exclusive architectural diagrams, performance audits, and deep-dive technical analysis reserved for our members.