Mixed-precision Neural Networks on RISC-V Cores: ISA extensions for Multi-Pumped Soft SIMD Operations
Abstract
This work introduces a novel hardware-software co-design framework featuring specialized Instruction Set Architecture (ISA) extensions and micro-architectural optimizations to efficiently execute mixed-precision neural networks on RISC-V cores. The design includes an expanded ALU supporting configurable arithmetic, multi-pumping for reduced latency, and three distinct ISA-level MAC instructions targeting mixed-precision operations. The implementation achieves significant energy savings, demonstrating an average 15x energy reduction with less than 1% accuracy loss over state-of-the-art RISC-V cores for DNN inference on standard datasets.
Report
Key Highlights
- Significant Energy Reduction: The proposed framework achieves an average 15x energy reduction for DNN inference compared with state-of-the-art RISC-V cores that lack such ISA-level support for mixed-precision arithmetic.
- Accuracy Preservation: This massive energy saving is accomplished while maintaining an accuracy loss below 1%.
- Hardware-Software Co-Design: The solution integrates hardware customizations (ALU, multi-pumping) with compiler-exposed ISA extensions, tailored specifically for mixed-precision quantization.
- ISA Extension Focus: Three new Multiply-Accumulate (MAC) instructions were encoded, extending the RISC-V ISA to eliminate performance bottlenecks caused by data packing/unpacking.
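To make the packing/unpacking bottleneck concrete, the sketch below models a mixed-precision MAC in plain Python: eight signed 4-bit weights are packed into one 32-bit word, and a dot product with 8-bit activations is computed over the packed word. The lane width, packing order, and function names are illustrative assumptions, not the paper's actual instruction semantics; the point is that without a fused MAC instruction, the unpack loop runs in software on every accumulation.

```python
# Hypothetical illustration of the packing/unpacking overhead that the
# proposed fused MAC instructions are meant to eliminate.
# Assumes signed 4-bit weights packed eight-to-a-32-bit-word; all names
# and the packing order are illustrative, not the paper's encoding.

def pack_w4(weights):
    """Pack eight signed 4-bit weights (-8..7) into one 32-bit word."""
    word = 0
    for i, w in enumerate(weights):
        assert -8 <= w <= 7
        word |= (w & 0xF) << (4 * i)   # low nibble = weight 0
    return word

def unpack_w4(word):
    """Unpack eight signed 4-bit weights, sign-extending each nibble."""
    out = []
    for i in range(8):
        nib = (word >> (4 * i)) & 0xF
        out.append(nib - 16 if nib >= 8 else nib)
    return out

def mac_w4_a8(word, activations, acc=0):
    """One 'fused' MAC: dot product of a packed weight word with
    eight 8-bit activations, added onto an accumulator."""
    for w, a in zip(unpack_w4(word), activations):
        acc += w * a
    return acc
```

In hardware, the unpack and multiply-accumulate collapse into a single instruction; in software-only execution, the shift/mask loop dominates, which is exactly the overhead the three MAC extensions remove.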
Technical Details
- Target Architecture: RISC-V CPU cores.
- Design Methodology: A hardware-software co-design framework implemented and evaluated via cycle-accurate emulation.
- Hardware Enhancements:
  - ALU Expansion: The Arithmetic Logic Unit (ALU) is expanded to support configurable fine-grained mixed-precision arithmetic operations.
  - Multi-Pumping: The arithmetic units are clocked faster than the core pipeline, completing multiple narrow operations per core cycle and reducing effective execution latency.
  - Soft SIMD: Several 2-bit operands are packed into one datapath word and processed in parallel by ordinary word-wide operations, applied specifically to make 2-bit arithmetic efficient.
- ISA Extensions: Three distinct MAC instructions are introduced, corresponding to different mixed-precision operational modes, and are exposed to the compiler.
- Evaluation: Tested using widely used Deep Neural Networks (DNNs) and datasets, including CIFAR-10 and ImageNet.
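The soft-SIMD idea in the list above can be sketched as a SWAR (SIMD-within-a-register) trick: if four 2-bit weights are spread into wide lanes of one word, a single scalar multiply yields four weight-activation products at once. The 16-bit lane width and unsigned packing below are assumptions for illustration only, not the paper's exact scheme.

```python
# Minimal software-SIMD (SWAR) sketch for the 2-bit case: four unsigned
# 2-bit weights sit in separate 16-bit lanes of one word, so one scalar
# multiply produces four partial products in parallel.
# Lane width and packing are illustrative assumptions.

LANES, LANE_BITS = 4, 16
LANE_MASK = (1 << LANE_BITS) - 1

def spread_w2(weights):
    """Place four 2-bit weights (0..3) into separate 16-bit lanes."""
    word = 0
    for i, w in enumerate(weights):
        assert 0 <= w <= 3
        word |= w << (LANE_BITS * i)
    return word

def simd_mul(word, activation):
    """One multiply computes weight*activation in every lane
    (valid while each product fits in its 16-bit lane)."""
    assert 0 <= activation <= 255
    prod = word * activation          # single word-wide multiply
    return [(prod >> (LANE_BITS * i)) & LANE_MASK for i in range(LANES)]
```

The lanes must be wide enough that no product overflows into its neighbor (here 3 x 255 = 765 fits easily in 16 bits); choosing that headroom is the central design trade-off of soft SIMD.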
Implications
- Enabling Edge AI: This work significantly advances the feasibility of deploying highly energy-efficient and accurate mixed-precision AI models directly on embedded RISC-V microprocessors, which are critical for edge computing devices.
- Addressing Bottlenecks: By offering native ISA support for mixed-precision operations and incorporating hardware optimizations (like multi-pumping and soft SIMD), the framework overcomes traditional performance bottlenecks associated with low-precision arithmetic on general-purpose CPUs.
- RISC-V Customization Potential: It strongly validates the utility of RISC-V's extensible nature (Custom ISA extensions) for domain-specific acceleration, offering specialized performance gains that are competitive against cores with more complex vector or accelerator units.
- Industry Standard Performance: The 15x energy efficiency improvement sets a new benchmark for low-power DNN inference acceleration within the RISC-V ecosystem.
Technical Deep Dive Available
This public summary covers the essentials. The Full Report contains exclusive architectural diagrams, performance audits, and deep-dive technical analysis reserved for our members.