Mixed-precision Neural Networks on RISC-V Cores: ISA extensions for Multi-Pumped Soft SIMD Operations

Abstract

This work introduces a hardware-software co-design framework that combines specialized Instruction Set Architecture (ISA) extensions with micro-architectural optimizations to execute mixed-precision neural networks efficiently on RISC-V cores. The design includes an expanded ALU supporting configurable arithmetic, multi-pumping to reduce latency, and three distinct ISA-level MAC instructions targeting mixed-precision operations. For DNN inference on standard datasets, the implementation achieves an average 15x energy reduction over state-of-the-art RISC-V cores with less than 1% accuracy loss.

Report

Key Highlights

  • Significant Energy Reduction: The proposed framework achieves an average 15x energy reduction for DNN inference compared to ISA-agnostic state-of-the-art RISC-V cores.
  • Accuracy Preservation: The energy savings come with an accuracy loss below 1%.
  • Hardware-Software Co-Design: The solution integrates hardware customizations (ALU, multi-pumping) with compiler-exposed ISA extensions, tailored specifically for mixed-precision quantization.
  • ISA Extension Focus: Three new Multiply-Accumulate (MAC) instructions extend the RISC-V ISA to remove the performance bottlenecks caused by data packing/unpacking.
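
To make the packing/unpacking bottleneck concrete, here is a minimal Python sketch of how a core without dedicated support might handle low-bit weights stored in a 32-bit word. The function names (`pack_weights`, `unpack_weights`, `dot_unpacked`), the lane order, and the two's-complement layout are assumptions made for illustration; they are not taken from the paper.

```python
def pack_weights(values, bits):
    """Pack signed integers of width `bits` into one 32-bit word, lowest lane first."""
    lanes = 32 // bits
    if len(values) > lanes:
        raise ValueError("too many values for one 32-bit word")
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    mask = (1 << bits) - 1
    word = 0
    for i, v in enumerate(values):
        if not lo <= v <= hi:
            raise ValueError(f"{v} does not fit in {bits} signed bits")
        word |= (v & mask) << (i * bits)
    return word


def unpack_weights(word, bits, count):
    """Recover `count` signed values: a shift, a mask, and a sign extension per value."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    out = []
    for i in range(count):
        field = (word >> (i * bits)) & mask
        out.append(field - (1 << bits) if field & sign else field)
    return out


def dot_unpacked(activations, word, bits):
    """Baseline dot product: unpack every weight first, then multiply-accumulate."""
    weights = unpack_weights(word, bits, len(activations))
    return sum(a * w for a, w in zip(activations, weights))


word = pack_weights([1, -2, 0, -1], bits=2)
print(dot_unpacked([10, 20, 30, 40], word, bits=2))
```

In this model every weight costs extra shift, mask, and sign-extension work before the multiply can start. That per-element overhead is the kind of cost a MAC instruction operating directly on packed operands is meant to avoid.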

Technical Details

  • Target Architecture: Leading RISC-V CPU architectures.
  • Design Methodology: A hardware-software co-design framework, implemented and evaluated through cycle-accurate emulation.
  • Hardware Enhancements:
    • ALU Expansion: The Arithmetic Logic Unit (ALU) is expanded to support configurable fine-grained mixed-precision arithmetic operations.
    • Multi-Pumping: Implemented to minimize execution latency of the arithmetic units.
    • Soft SIMD: An additional optimization applied specifically for executing 2-bit operations efficiently.
  • ISA Extensions: Three distinct MAC instructions are introduced, one for each mixed-precision operating mode, and all three are exposed to the compiler.
  • Evaluation: Tested using widely used Deep Neural Networks (DNNs) and datasets, including CIFAR10 and ImageNet.
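
The ideas above can be sketched as a small functional model in Python. It is an illustration under stated assumptions, not the paper's instruction semantics or datapath: `packed_mac`, `soft_simd_mul2`, the 32-bit register width, the lane order, the 16-bit lane spacing, and the operand ranges (unsigned 8-bit activations, signed 2-bit weights) are all hypothetical choices.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def packed_mac(acc, activations, weight_word, weight_bits):
    """Reference semantics for one hypothetical packed MAC instruction.

    `weight_word` models a 32-bit register holding 32 // weight_bits signed
    weights, lowest lane first. Weights are consumed in place, so the caller
    needs no separate unpack step. Unused upper lanes are ignored.
    """
    lanes = 32 // weight_bits
    if len(activations) > lanes:
        raise ValueError("more activations than weight lanes")
    for lane, a in enumerate(activations):
        w = sign_extend(weight_word >> (lane * weight_bits), weight_bits)
        acc += a * w
    return acc


LANE_SHIFT = 16  # assumed spacing between the two soft-SIMD lanes


def soft_simd_mul2(activation, w0, w1):
    """Two products from one wide multiplication (a soft-SIMD style trick).

    Assumes an unsigned 8-bit activation and signed 2-bit weights, so each
    product fits in a signed 16-bit lane and the lanes cannot overlap.
    """
    packed = w0 + (w1 << LANE_SHIFT)        # both weights in one operand
    product = activation * packed           # single wide multiply
    low = sign_extend(product, LANE_SHIFT)  # lane 0: activation * w0
    high = (product - low) >> LANE_SHIFT    # lane 1: activation * w1
    return low, high


print(packed_mac(5, [10, 20, 30, 40], 201, weight_bits=2))
print(soft_simd_mul2(200, -2, 1))
```

The `weight_bits` argument stands in for the choice among precision modes: the same accumulate loop covers different weight widths, which is why a single configurable ALU can serve several MAC instructions. The soft-SIMD function shows why narrow operands are attractive: with enough spacing between lanes, one ordinary multiplier yields several independent products.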

Implications

  • Enabling Edge AI: This work makes it more practical to deploy energy-efficient, accurate mixed-precision AI models directly on embedded RISC-V microprocessors of the kind used in edge computing devices.
  • Addressing Bottlenecks: By offering native ISA support for mixed-precision operations and incorporating hardware optimizations (like multi-pumping and soft SIMD), the framework overcomes traditional performance bottlenecks associated with low-precision arithmetic on general-purpose CPUs.
  • RISC-V Customization Potential: The results support the case for RISC-V's extensibility (custom ISA extensions) as a route to domain-specific acceleration, with specialized gains that can compete with cores using more complex vector or accelerator units.
  • Energy-Efficiency Benchmark: The reported average 15x energy reduction provides a strong reference point for low-power DNN inference within the RISC-V ecosystem.