Sparq: A Custom RISC-V Vector Processor for Efficient Sub-Byte Quantized Inference

Abstract

Sparq is a custom RISC-V vector processor designed specifically to accelerate highly efficient sub-byte Quantized Neural Network (QNN) inference. The architecture modifies the open-source Ara core by removing the Floating-Point Unit and adding a specialized multiply-shift-accumulate instruction tailored for low-precision operations. Implemented in GlobalFoundries 22FDX FD-SOI technology, Sparq demonstrates significant performance gains, achieving 3.2 times faster computation for 2-bit quantization compared to an optimized 16-bit 2D convolution.

Report

Key Highlights

  • Targeted Acceleration: Sparq is a custom sub-byte vector processor designed to efficiently handle ultra-low-precision QNN inference (1-bit to 4-bit).
  • RISC-V Base: The processor is a modified version of Ara, an open-source 64-bit RISC-V 'V' compliant processor.
  • ISA Extension: A crucial new instruction—a multiply-shift-accumulate instruction—was added to the Instruction Set Architecture (ISA) specifically to improve sub-byte computation efficiency.
  • Performance: Achieves substantial acceleration benchmarks for vectorized conv2d operations: 3.2 times faster for 2-bit quantization and 1.7 times faster for 4-bit quantization, compared to optimized 16-bit 2D convolution.
  • Area/Power Optimization: The standard floating-point unit (FPU) was removed to minimize chip area and power consumption, aligning with the needs of low-power inference.
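The fused multiply-shift-accumulate operation highlighted above can be modeled in scalar C. This is a hedged sketch of the instruction's likely semantics (multiply two low-precision operands, shift the product, then accumulate); the function name `msa`, operand widths, and shift placement are assumptions for illustration, not the paper's actual encoding.

```c
#include <stdint.h>

/* Illustrative scalar model of a multiply-shift-accumulate operation.
 * One fused step replaces the separate multiply, shift, and add
 * instructions a commodity core would otherwise issue. */
static inline int32_t msa(int32_t acc, int8_t a, int8_t b, unsigned shift) {
    /* multiply the low-precision operands, rescale by shifting,
     * then fold the result into the running accumulator */
    return acc + (((int32_t)a * (int32_t)b) >> shift);
}
```

In a vector implementation, the same semantics would apply lane-wise across the vector register file, so each element of the accumulator vector is updated in a single instruction.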

Technical Details

  • Processor Name: Sparq (Sub-byte vector Processor designed for the AcceleRation of QNN inference).
  • Architecture Modification: Based on the open-source Ara RISC-V vector processor, tailored for efficient sub-byte data path processing.
  • Core Feature: Introduction of a custom multiply-shift-accumulate instruction within the ISA extensions to optimize the core operation of sub-byte QNNs.
  • Precision Focus: Optimization targets the 1-bit to 4-bit precision range, where commodity hardware offers no native arithmetic support.
  • Fabrication Technology: Implemented using GlobalFoundries 22FDX FD-SOI technology.
  • Demonstrated Workload: Focuses on accelerating the vectorized conv2d operation, a fundamental component of CNNs.
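To see why sub-byte precision needs custom hardware, consider how 2-bit weights must be handled on a byte-addressable machine: four values share one byte, so software has to pack and unpack them around every multiply. The sketch below illustrates this packing and the kind of inner-loop dot product a conv2d kernel reduces to; all names (`pack4x2`, `unpack2`, `dot2bit`) are illustrative and not taken from the Sparq codebase.

```c
#include <stdint.h>

/* Pack four 2-bit weights (values 0..3) into one byte, LSB first. */
static inline uint8_t pack4x2(uint8_t w0, uint8_t w1, uint8_t w2, uint8_t w3) {
    return (uint8_t)(w0 | (w1 << 2) | (w2 << 4) | (w3 << 6));
}

/* Extract the 2-bit weight at position idx (0..3) from a packed byte. */
static inline uint8_t unpack2(uint8_t packed, int idx) {
    return (packed >> (2 * idx)) & 0x3;
}

/* Toy dot product over packed 2-bit weights and 8-bit activations:
 * the inner loop a 2D convolution kernel ultimately reduces to.
 * Each iteration pays an unpack cost that dedicated sub-byte
 * hardware support can eliminate. */
static int32_t dot2bit(const uint8_t *wpacked, const int8_t *x, int n) {
    int32_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += (int32_t)unpack2(wpacked[i / 4], i % 4) * x[i];
    return acc;
}
```

The unpack-multiply-accumulate sequence in this loop is exactly the pattern a fused vector instruction collapses, which is consistent with the speedups reported for the vectorized conv2d workload.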

Implications

  • Validating RISC-V Customization: Sparq serves as a strong case study illustrating the power of the RISC-V ecosystem, where specific application needs (like extreme low-precision AI) can be met through targeted ISA extensions and hardware modifications.
  • Enabling Edge AI: By achieving high efficiency and low power usage (due to FPU removal and sub-byte optimization), Sparq helps pave the way for deploying highly accurate, ultra-low-power QNNs on resource-constrained edge devices and IoT systems.
  • Driving QNN Adoption: The demonstrated performance uplift (up to 3.2x) removes a major bottleneck for sub-byte quantization, encouraging wider adoption of these memory- and computation-saving techniques over full-precision models.
  • Open-Source Collaboration: Utilizing the open-source Ara core base promotes community innovation and reduces the barrier to entry for developing highly specialized AI acceleration hardware.
