Sparq: A Custom RISC-V Vector Processor for Efficient Sub-Byte Quantized Inference

Abstract

Sparq is a custom RISC-V vector processor designed specifically to accelerate highly efficient sub-byte Quantized Neural Network (QNN) inference. The architecture modifies the open-source Ara core by removing the Floating-Point Unit and adding a specialized multiply-shift-accumulate instruction tailored for low-precision operations. Implemented in GlobalFoundries 22FDX FD-SOI technology, Sparq demonstrates significant performance gains, achieving 3.2 times faster computation for 2-bit quantization compared to an optimized 16-bit 2D convolution.

Report

Key Highlights

  • Targeted Acceleration: Sparq is a custom sub-byte vector processor designed to efficiently handle ultra-low-precision QNN inference (1-bit to 4-bit).
  • RISC-V Base: The processor is a modified version of Ara, an open-source 64-bit RISC-V 'V' compliant processor.
  • ISA Extension: A crucial new instruction—a multiply-shift-accumulate instruction—was added to the Instruction Set Architecture (ISA) specifically to improve sub-byte computation efficiency.
  • Performance: Achieves substantial acceleration benchmarks for vectorized conv2d operations: 3.2 times faster for 2-bit quantization and 1.7 times faster for 4-bit quantization, compared to optimized 16-bit 2D convolution.
  • Area/Power Optimization: The standard floating-point unit (FPU) was removed to minimize chip area and power consumption, aligning with the needs of low-power inference.
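The fused multiply-shift-accumulate operation highlighted above can be modeled in scalar C. This is a hedged sketch of the instruction's likely semantics (multiply two low-precision operands, shift the product, then accumulate); the function name `msa`, operand widths, and shift placement are assumptions for illustration, not the paper's actual encoding.

```c
#include <stdint.h>

/* Illustrative scalar model of a multiply-shift-accumulate operation.
 * One fused step replaces the separate multiply, shift, and add
 * instructions a commodity core would otherwise issue. */
static inline int32_t msa(int32_t acc, int8_t a, int8_t b, unsigned shift) {
    /* multiply the low-precision operands, rescale by shifting,
     * then fold the result into the running accumulator */
    return acc + (((int32_t)a * (int32_t)b) >> shift);
}
```

In a vector implementation, the same semantics would apply lane-wise across the vector register file, so each element of the accumulator vector is updated in a single instruction.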

Technical Details

  • Processor Name: Sparq (Sub-byte vector Processor designed for the AcceleRation of QNN inference).
  • Architecture Modification: Based on the open-source Ara RISC-V vector processor, tailored for efficient sub-byte data path processing.
  • Core Feature: Introduction of a custom multiply-shift-accumulate instruction within the ISA extensions to optimize the core operation of sub-byte QNNs.
  • Precision Focus: Optimization targets the 1-bit to 4-bit precision range, where commodity hardware offers no native arithmetic support.
  • Fabrication Technology: Implemented using GlobalFoundries 22FDX FD-SOI technology.
  • Demonstrated Workload: Focuses on accelerating the vectorized conv2d operation, a fundamental component of CNNs.
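To see why sub-byte precision needs custom hardware, consider how 2-bit weights must be handled on a byte-addressable machine: four values share one byte, so software has to pack and unpack them around every multiply. The sketch below illustrates this packing and the kind of inner-loop dot product a conv2d kernel reduces to; all names (`pack4x2`, `unpack2`, `dot2bit`) are illustrative and not taken from the Sparq codebase.

```c
#include <stdint.h>

/* Pack four 2-bit weights (values 0..3) into one byte, LSB first. */
static inline uint8_t pack4x2(uint8_t w0, uint8_t w1, uint8_t w2, uint8_t w3) {
    return (uint8_t)(w0 | (w1 << 2) | (w2 << 4) | (w3 << 6));
}

/* Extract the 2-bit weight at position idx (0..3) from a packed byte. */
static inline uint8_t unpack2(uint8_t packed, int idx) {
    return (packed >> (2 * idx)) & 0x3;
}

/* Toy dot product over packed 2-bit weights and 8-bit activations:
 * the inner loop a 2D convolution kernel ultimately reduces to.
 * Each iteration pays an unpack cost that dedicated sub-byte
 * hardware support can eliminate. */
static int32_t dot2bit(const uint8_t *wpacked, const int8_t *x, int n) {
    int32_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += (int32_t)unpack2(wpacked[i / 4], i % 4) * x[i];
    return acc;
}
```

The unpack-multiply-accumulate sequence in this loop is exactly the pattern a fused vector instruction collapses, which is consistent with the speedups reported for the vectorized conv2d workload.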

Implications

  • Validating RISC-V Customization: Sparq serves as a strong case study illustrating the power of the RISC-V ecosystem, where specific application needs (like extreme low-precision AI) can be met through targeted ISA extensions and hardware modifications.
  • Enabling Edge AI: By achieving high efficiency and low power usage (due to FPU removal and sub-byte optimization), Sparq helps pave the way for deploying highly accurate, ultra-low-power QNNs on resource-constrained edge devices and IoT systems.
  • Driving QNN Adoption: The demonstrated performance uplift (up to 3.2x) removes a major bottleneck for sub-byte quantization, encouraging wider adoption of these memory- and computation-saving techniques over full-precision models.
  • Open-Source Collaboration: Utilizing the open-source Ara core base promotes community innovation and reduces the barrier to entry for developing highly specialized AI acceleration hardware.
