SPEED: A Scalable RISC-V Vector Processor Enabling Efficient Multi-Precision DNN Inference

Abstract

SPEED is a scalable RISC-V Vector (RVV) processor designed for highly efficient multi-precision deep neural network (MP-DNN) inference on resource-constrained edge platforms. It introduces customized RVV instructions and a parameterized multi-precision tensor unit that supports precisions from 4-bit to 16-bit with minimal hardware overhead. Experimental results show that SPEED achieves a peak throughput of 737.9 GOPS for 4-bit operations and superior area efficiency compared with prior RVV processors.

Report

Key Highlights

  • Target: SPEED is a scalable RISC-V Vector (RVV) processor optimized for efficient multi-precision DNN (MP-DNN) inference on edge platforms.
  • Performance Metrics (4-bit): Achieves a peak throughput of 737.9 GOPS and an energy efficiency of 1383.4 GOPS/W for 4-bit operations.
  • Area Efficiency: Outperforms prior RVV processors by 5.9x to 26.9x for 8-bit operations and by 8.2x to 18.5x in best integer performance.
  • Precision Range: Supports computation precisions from 4-bit to 16-bit with minimal hardware overhead.
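As a quick sanity check on the reported figures, the implied power draw at peak 4-bit throughput follows from dividing peak throughput by energy efficiency (the two numbers come from the summary above; the arithmetic is our own):

```python
# Reported 4-bit figures for SPEED.
peak_gops = 737.9               # peak throughput in GOPS
efficiency_gops_per_w = 1383.4  # energy efficiency in GOPS/W

# Power (W) = throughput (GOPS) / efficiency (GOPS/W).
implied_power_w = peak_gops / efficiency_gops_per_w
print(f"Implied power at peak 4-bit throughput: {implied_power_w:.2f} W")
```

Roughly half a watt at peak, which is consistent with an edge-class power budget.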

Technical Details

  • Instruction Set Innovation: Customized instructions built on the RVV extension reduce instruction complexity and support multi-precision processing from 4-bit to 16-bit.
  • Parallelism Enhancement: A parameterized multi-precision tensor unit is developed and integrated into the scalable module. This unit provides reconfigurable parallelism to optimally match the varying computation patterns required by diverse MP-DNNs.
  • Dataflow Optimization: A flexible mixed dataflow method is adopted to dynamically improve both computational and energy efficiency based on the specific computing patterns of different DNN operators.
  • Implementation: The processor was synthesized in a TSMC 28 nm process.
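How a fixed-width datapath yields more parallel lanes at lower precision can be sketched in a few lines (a simplified illustration of the general idea; the 64-bit datapath width and the `lanes` helper are our assumptions, not SPEED's actual microarchitecture):

```python
def lanes(datapath_bits: int, elem_bits: int) -> int:
    """Number of elements a fixed-width datapath can process in parallel."""
    return datapath_bits // elem_bits

# A 64-bit slice of a vector datapath holds:
for prec in (16, 8, 4):
    print(f"{prec}-bit elements -> {lanes(64, prec)} lanes")
```

Halving the precision doubles the usable parallelism, which is the basic reason the 4-bit configuration reaches the highest peak throughput.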
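The mixed-dataflow idea of choosing a loop order, and therefore a data-reuse pattern, per operator can be illustrated with two schedules of the same matrix multiply (a conceptual sketch only; SPEED's actual dataflows are not specified in this summary):

```python
def matmul_output_stationary(A, B):
    """Accumulate each output element to completion before moving on:
    maximizes partial-sum reuse (suits compute-heavy operators)."""
    M, K, N = len(A), len(A[0]), len(B[0])
    C = [[0] * N for _ in range(M)]
    for i in range(M):
        for j in range(N):
            for k in range(K):
                C[i][j] += A[i][k] * B[k][j]
    return C

def matmul_weight_stationary(A, B):
    """Load each weight once and fully reuse it across outputs:
    minimizes weight re-fetches (suits bandwidth-bound operators)."""
    M, K, N = len(A), len(A[0]), len(B[0])
    C = [[0] * N for _ in range(M)]
    for k in range(K):
        for j in range(N):
            w = B[k][j]  # weight stays resident while the i-loop runs
            for i in range(M):
                C[i][j] += A[i][k] * w
    return C

# Both schedules compute the same result; only the memory-access
# pattern (and hence energy per access) differs.
A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
assert matmul_output_stationary(A, B) == matmul_weight_stationary(A, B)
```

Selecting the schedule per operator, rather than fixing one for the whole network, is what lets a mixed dataflow improve both computational and energy efficiency.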

Implications

  • RISC-V Acceleration: SPEED significantly enhances the RISC-V ecosystem's capability in the specialized field of AI acceleration, validating the extensibility of the instruction set architecture (ISA) for domain-specific tasks.
  • Edge AI Deployment: By efficiently tackling the complexity of multi-precision quantization, SPEED makes deploying advanced, highly-quantized DNNs viable even on severely resource-constrained edge devices.
  • Competitive Advantage: The demonstrated area and energy efficiency positions RISC-V-based processors such as SPEED as competitive alternatives to proprietary architectures in the growing market for energy-efficient AI inference.
