Quark: An Integer RISC-V Vector Processor for Sub-Byte Quantized DNN Inference

Abstract

Quark is an integer RISC-V vector processor tailored for sub-byte quantized Deep Neural Network (DNN) inference. Built upon the open-source Ara processor, Quark saves area and power by removing the Floating-Point Unit (FPU) from the vector lanes and adding specialized vector instructions for sub-byte operations. The resulting vector lanes are 2 times smaller and 1.9 times more power efficient than Ara's, and the processor accelerates 1-bit and 2-bit quantized convolutional models.

Report

Key Highlights

  • Specialized Accelerator: Quark is a RISC-V vector processor optimized for sub-byte quantized DNN inference (e.g., 1-bit and 2-bit precision).
  • Efficiency Gains: The architectural modifications yield a 2x reduction in size and 1.9x improvement in power efficiency for the vector lanes compared to the baseline Ara processor.
  • Sub-Byte Focus: The design accelerates computation for highly quantized models, demonstrated on Conv2d at 1-bit and 2-bit precision.
  • Implementation: The processor is implemented in GlobalFoundries' 22FDX FD-SOI technology.
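
To make "sub-byte" concrete, the following minimal Python sketch packs 2-bit values four to a byte and unpacks them again. The function names and the lowest-bits-first layout are illustrative assumptions; this summary does not describe the memory format Quark actually uses.

```python
def pack_2bit(values):
    """Pack unsigned 2-bit integers (0..3) four per byte, lowest bits first.

    The layout is an assumption chosen for illustration only.
    """
    if any(v < 0 or v > 3 for v in values):
        raise ValueError("each value must fit in 2 bits (0..3)")
    # Pad with zeros so the length is a multiple of four.
    padded = list(values) + [0] * (-len(values) % 4)
    out = bytearray()
    for i in range(0, len(padded), 4):
        byte = 0
        for j, v in enumerate(padded[i:i + 4]):
            byte |= v << (2 * j)
        out.append(byte)
    return bytes(out)


def unpack_2bit(packed, count):
    """Recover the first `count` 2-bit values from packed bytes."""
    values = []
    for byte in packed:
        for j in range(4):
            values.append((byte >> (2 * j)) & 0b11)
    return values[:count]
```

Five 2-bit weights occupy two bytes here instead of five, which is the storage saving that sub-byte quantization targets.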

Technical Details

  • Base Architecture: Quark extends and modifies Ara, an open-source 64-bit RISC-V vector processor.
  • Customization: Specialized vector instructions were added to handle sub-byte quantized operations efficiently.
  • FPU Removal: The floating-point unit was removed from Quark's vector lanes, contributing significantly to the reduction in physical size and power consumption.
  • Scalar Core Role: Re-scaling operations, necessary for quantized neural network inference, are handled by the integrated CVA6 RISC-V scalar core.
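
As a rough illustration of the two roles above, the sketch below computes a sub-byte dot product using only integer AND, popcount, and shift operations, then applies a floating-point re-scaling step separately. This bit-serial formulation is one common software technique for low-precision arithmetic; it is an assumption for illustration, not a description of Quark's actual instructions, and all function names and scale parameters are hypothetical.

```python
def bit_planes(values, bits):
    """Split unsigned integers into bit planes.

    Plane k is an integer whose bit i is set when bit k of values[i] is set.
    """
    planes = []
    for k in range(bits):
        plane = 0
        for i, v in enumerate(values):
            plane |= ((v >> k) & 1) << i
        planes.append(plane)
    return planes


def bitserial_dot(acts, weights, bits):
    """Integer dot product of unsigned `bits`-bit vectors via AND + popcount."""
    a_planes = bit_planes(acts, bits)
    w_planes = bit_planes(weights, bits)
    total = 0
    for i, a in enumerate(a_planes):
        for j, w in enumerate(w_planes):
            # popcount(a & w) counts positions where both bits are 1;
            # the shift weights that count by 2**(i + j).
            total += bin(a & w).count("1") << (i + j)
    return total


def rescale(acc, act_scale, weight_scale):
    """Map the integer accumulator back to a real value.

    Stands in for the re-scaling step that the summary assigns to the
    scalar core; the two scale factors are hypothetical parameters.
    """
    return acc * act_scale * weight_scale
```

For example, `bitserial_dot([1, 2, 3, 0], [3, 1, 2, 2], bits=2)` matches the ordinary dot product of the two vectors, and `rescale` then converts that integer accumulator to a real-valued output. Keeping the inner loop integer-only is what allows the vector lanes to do without an FPU.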

Implications

  • Validation of RISC-V Specialization: Quark demonstrates that the RISC-V instruction set architecture (ISA) is extensible enough to support highly optimized, domain-specific accelerators in addition to general-purpose computing.
  • Advancing Edge AI: The gains in power efficiency (1.9x) and area (2x smaller vector lanes) matter for edge and embedded AI applications, where resources are severely constrained, and help complex DNNs run on low-power devices.
  • Supporting Aggressive Quantization: By providing dedicated hardware support for sub-byte operations (1-bit, 2-bit), Quark supports ultra-low-precision quantization as a viable deployment strategy that does not depend on complex software workarounds.
  • Open-Source Contribution: Building upon open-source components like Ara and CVA6 highlights a successful pathway for the rapid iteration and improvement of complex semiconductor designs within the RISC-V community.