A Scalable RISC-V Vector Processor Enabling Efficient Multi-Precision DNN Inference

Abstract

This work introduces SPEED, a scalable RISC-V vector (RVV) processor architecture designed to efficiently execute multi-precision deep neural network (DNN) inference, addressing limitations in precision support and dataflow of existing RVV designs. SPEED incorporates customized RVV instructions for fine-grained 4-to-16-bit precision control and uses a parameterized multi-precision systolic array unit for enhanced parallelism. Synthesized in TSMC 28nm technology, SPEED achieves 1335.79 GOPS/W at 4-bit precision and improves area efficiency by up to 2.04x over the open-source Ara vector processor.

Report

Key Highlights

  • Processor Name: SPEED (Scalable RISC-V Vector Processor).
  • Target Application: Efficient multi-precision Deep Neural Network (DNN) inference.
  • Performance Metric (Peak): Achieves 287.41 GOPS peak throughput and 1335.79 GOPS/W energy efficiency at 4-bit precision.
  • Efficiency Improvement: Improves area efficiency by 2.04x (16-bit) and 1.63x (8-bit) compared to Ara, the pioneering open-source vector processor.
  • Methodology: Combines customized instructions, a specialized hardware architecture, and optimized dataflow mapping.
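
As a rough consistency check on the two peak figures above, dividing throughput by energy efficiency gives the implied power draw. This is a back-of-the-envelope sketch that assumes both figures describe the same 4-bit operating point, which this summary does not state explicitly; the function name is illustrative and does not come from the paper.

```python
def implied_power_watts(throughput_gops: float, efficiency_gops_per_watt: float) -> float:
    """Power implied by a throughput figure and an energy-efficiency figure.

    Assumes both numbers describe the same operating point (an assumption,
    not something the summary confirms).
    """
    if efficiency_gops_per_watt <= 0:
        raise ValueError("efficiency must be positive")
    return throughput_gops / efficiency_gops_per_watt


# Figures quoted in the summary for 4-bit precision.
power_w = implied_power_watts(287.41, 1335.79)
print(f"implied power: {power_w * 1000:.0f} mW")
```

Under that assumption the implied power is roughly 215 mW, which is in the range one would expect for an edge-class design.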

Technical Details

  • Architecture Base: RISC-V Vector (RVV) extensions.
  • Precision Support: Dedicated customized RISC-V instructions enable fine-grained control over processing precision, ranging from 4 bits to 16 bits.
  • Hardware Accelerator: Incorporates a parameterized multi-precision systolic array unit within the scalable module to maximize parallel processing and data reuse.
  • Dataflow Strategy: A mixed multi-precision dataflow strategy is proposed, ensuring compatibility with different convolution kernels and data precision requirements, thus improving data utilization and computational efficiency.
  • Implementation: The design was synthesized using TSMC 28nm technology.
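
To make the idea of a precision-configurable systolic array concrete, the following is a minimal software model, not SPEED's actual microarchitecture. It assumes an output-stationary dataflow with skewed operand arrival and models the precision setting as saturation of operands to a signed 4-, 8-, or 16-bit range; the summary specifies neither the dataflow nor the instruction encoding, so the function names, the saturation behavior, and the dataflow choice are all hypothetical.

```python
def quantize(matrix, bits):
    """Saturate each integer to the signed two's-complement range of `bits`."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return [[max(lo, min(hi, v)) for v in row] for row in matrix]


def systolic_matmul(a, b, bits):
    """Cycle-by-cycle model of an output-stationary systolic array computing a @ b.

    `bits` stands in for the precision selected by a (hypothetical) custom
    instruction; only 4, 8, and 16 are accepted here.
    """
    if bits not in (4, 8, 16):
        raise ValueError("precision must be 4, 8, or 16 bits")
    a, b = quantize(a, bits), quantize(b, bits)
    m, k, n = len(a), len(a[0]), len(b[0])
    if len(b) != k:
        raise ValueError("inner dimensions do not match")

    # One accumulator per processing element (PE); results stay in place.
    acc = [[0] * n for _ in range(m)]

    # Operands enter skewed: row i of `a` is delayed by i cycles and column j
    # of `b` by j cycles, so PE (i, j) sees operand pair s = t - i - j at cycle t.
    for t in range(m + n + k - 2):
        for i in range(m):
            for j in range(n):
                s = t - i - j
                if 0 <= s < k:
                    acc[i][j] += a[i][s] * b[s][j]
    return acc


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(systolic_matmul(a, b, bits=8))
print(systolic_matmul(a, b, bits=4))  # 8 saturates to 7 at 4-bit precision
```

In this model, lowering the precision changes only the operand range; a real multi-precision array would typically also change how many operations each processing element performs per cycle, which is where the throughput gain at low precision would come from.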

Implications

  • Enabling Edge AI: By achieving high energy and area efficiency with flexible multi-precision support (4-16 bit), SPEED directly addresses the critical needs of resource-constrained edge computing devices performing DNN inference.
  • Extending RISC-V ISA: The introduction of dedicated customized instructions based on the RVV extension demonstrates a viable and high-performing method for extending the RISC-V instruction set architecture for specialized AI acceleration.
  • Benchmarking Advancement: Establishing an area efficiency advantage of up to 2.04x over Ara, an existing open-source RVV implementation, raises the bar for commercial and academic RISC-V hardware dedicated to vector and matrix operations, accelerating ecosystem maturity in the AI domain.