Sparq: A Custom RISC-V Vector Processor for Efficient Sub-Byte Quantized Inference
Abstract
Sparq is a custom RISC-V vector processor designed specifically to accelerate highly efficient sub-byte Quantized Neural Network (QNN) inference. This architecture modifies the open-source Ara core by removing the Floating-Point Unit and adding a specialized multiply-shift-accumulate instruction tailored for low-precision operations. Implemented in 22FDX technology, Sparq demonstrates significant performance gains, achieving 3.2 times faster computation for 2-bit quantization compared to optimized 16-bit convolution.
Report
Key Highlights
- Targeted Acceleration: Sparq is a custom sub-byte vector processor designed to efficiently handle ultra-low-precision QNN inference (1-bit to 4-bit).
- RISC-V Base: The processor is a modified version of Ara, an open-source 64-bit RISC-V 'V' compliant processor.
- ISA Extension: A crucial new instruction—a multiply-shift-accumulate instruction—was added to the Instruction Set Architecture (ISA) specifically to improve sub-byte computation efficiency.
- Performance: Achieves substantial acceleration for vectorized conv2d operations: 3.2 times faster for 2-bit quantization and 1.7 times faster for 4-bit quantization, compared to an optimized 16-bit 2D convolution.
- Area/Power Optimization: The standard floating-point unit (FPU) was removed to minimize chip area and power consumption, aligning with the needs of low-power inference.
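The multiply-shift-accumulate pattern mentioned above can be sketched as a scalar model: multiply two low-precision operands, shift the product to align it, and accumulate into a wider register. This is an illustrative sketch only; the exact semantics and encoding of Sparq's custom instruction are not given in this summary, and the function name here is hypothetical.

```c
#include <stdint.h>

/* Hypothetical scalar model of a multiply-shift-accumulate step.
   Illustrative only -- not the actual Sparq instruction definition. */
static inline int32_t mul_shift_acc(int32_t acc, int8_t a, int8_t b,
                                    unsigned shift)
{
    /* Multiply two narrow operands, scale the product by 2^shift to
       align it with the accumulator's fixed-point position, then add.
       Multiplying by (1 << shift) avoids left-shifting a possibly
       negative product, which is undefined behavior in C. */
    return acc + (int32_t)a * (int32_t)b * (int32_t)(1 << shift);
}
```

Folding the shift into the same instruction as the multiply-accumulate is what lets sub-byte partial products be positioned and summed without extra instructions in the inner loop.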
Technical Details
- Processor Name: Sparq (Sub-byte vector Processor designed for the AcceleRation of QNN inference).
- Architecture Modification: Based on the open-source Ara RISC-V vector processor, tailored for efficient sub-byte data path processing.
- Core Feature: Introduction of a custom multiply-shift-accumulate instruction within the ISA extensions to optimize the core operation of sub-byte QNNs.
- Precision Focus: Optimization targets the 1-bit to 4-bit precision range, where commodity hardware struggles.
- Fabrication Technology: Implemented using GlobalFoundries 22FDX FD-SOI technology.
- Demonstrated Workload: Focuses on accelerating the vectorized conv2d operation, a fundamental component of CNNs.
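To make the workload concrete, the following is a minimal scalar reference for the kind of sub-byte dot product at the heart of a quantized conv2d inner loop. It assumes unsigned 2-bit weights packed four per byte, LSB-first; both the packing layout and the function name are assumptions for illustration, not details taken from Sparq.

```c
#include <stdint.h>
#include <stddef.h>

/* Scalar reference for a sub-byte dot product: the core operation of a
   quantized conv2d inner loop.  Assumes unsigned 2-bit weights packed
   four per byte, LSB-first (an illustrative layout, not one mandated
   by Sparq). */
int32_t dot_w2(const int8_t *act, const uint8_t *packed_w, size_t n)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        /* Extract the i-th 2-bit weight from the packed stream. */
        uint8_t w = (packed_w[i / 4] >> (2 * (i % 4))) & 0x3;
        acc += (int32_t)act[i] * (int32_t)w;
    }
    return acc;
}
```

On commodity hardware the unpack-multiply-accumulate sequence costs several instructions per element; Sparq's vectorized data path and custom instruction are aimed at collapsing exactly this kind of loop.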
Implications
- Validating RISC-V Customization: Sparq serves as a strong case study illustrating the power of the RISC-V ecosystem, where specific application needs (like extreme low-precision AI) can be met through targeted ISA extensions and hardware modifications.
- Enabling Edge AI: By achieving high efficiency and low power usage (due to FPU removal and sub-byte optimization), Sparq helps pave the way for deploying highly accurate, ultra-low-power QNNs on resource-constrained edge devices and IoT systems.
- Driving QNN Adoption: The demonstrated performance uplift (up to 3.2x) removes a major bottleneck for sub-byte quantization, encouraging wider adoption of these memory- and computation-saving techniques over full-precision models.
- Open-Source Collaboration: Utilizing the open-source Ara core base promotes community innovation and reduces the barrier to entry for developing highly specialized AI acceleration hardware.