Register Dispersion: Reducing the Footprint of the Vector Register File in Vector Engines of Low-Cost RISC-V CPUs
Abstract
The large area of the Vector Register File (VRF) in RISC-V vector engines hinders their deployment in low-cost CPUs designed for edge Machine Learning acceleration. This paper introduces "Register Dispersion," an ISA-compliant technique that employs a physically smaller, cache-like VRF to dynamically store only the actively used vector registers. By mapping architectural registers to a compact physical set and offloading inactive data to memory, this method yields substantial area and power savings with negligible performance impact, making it feasible to include a vector unit in such CPUs.
Report
Key Highlights
- Problem Solved: Addresses the VRF becoming a major area consumer in ultra-low-cost RISC-V processors, because the ISA requires 32 architectural vector registers.
- Key Innovation: Proposes "Register Dispersion," a mechanism that utilizes a physically smaller VRF that operates as a dynamic cache for vector registers.
- Efficiency Gain: The technique exploits the insight that many vectorized Machine Learning (ML) kernels use only a small subset of the total available vector registers simultaneously.
- Result: Demonstrates substantial area and power savings compared to using a full-size VRF, while maintaining minimal impact on overall performance.
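The insight behind the efficiency gain can be illustrated with a rough working-set estimate. The sketch below counts how many distinct vector registers are touched within any short run of instructions. The trace format, the window size, and the example kernel are all hypothetical and are not taken from the paper; the count is a simple proxy for the number of registers in use at once, not a full liveness analysis.

```python
def peak_registers_touched(trace, window):
    """Return the largest number of distinct vector registers touched
    in any `window` consecutive instructions of `trace`.

    `trace` is a list of tuples; each tuple holds the vector register
    numbers that one instruction reads or writes (hypothetical format).
    """
    peak = 0
    for start in range(max(1, len(trace) - window + 1)):
        regs = set()
        for instr in trace[start:start + window]:
            regs.update(instr)
        peak = max(peak, len(regs))
    return peak


# Hypothetical inner loop of a dot-product-style kernel, unrolled 4 times.
example_trace = [
    (1,),       # vector load into v1
    (2,),       # vector load into v2
    (3, 1, 2),  # multiply-accumulate: v3 += v1 * v2
] * 4

print(peak_registers_touched(example_trace, window=6))
```

For this made-up trace, only three of the 32 architectural registers are ever touched, which is the kind of behavior a smaller physical VRF is meant to exploit.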
Technical Details
- Target Application: Efficient ML processing and data-level parallelism on resource-constrained, low-cost RISC-V devices.
- Compliance: The proposed system remains fully compliant with the RISC-V Vector Instruction Set Architecture (ISA), which mandates 32 architectural vector registers.
- Architecture: Register Dispersion maps the 32 architectural vector registers to a significantly smaller physical register set.
- Mechanism: The compact VRF functions like a conventional hardware cache, holding the most recently accessed vector registers. Vector registers that are not currently active are spilled to the standard cache/memory subsystem.
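The mapping and spill behavior described above can be sketched as a small software model. This is a minimal illustration under stated assumptions, not the paper's hardware design: the class name `DispersedVRF`, the physical size of 4 or 8 registers, the least-recently-used replacement policy, and the dictionary standing in for the memory subsystem are all hypothetical choices made for the example.

```python
from collections import OrderedDict

NUM_ARCH_REGS = 32  # fixed by the RISC-V Vector ISA


class DispersedVRF:
    """Toy model: a small physical VRF caching the 32 architectural registers."""

    def __init__(self, num_physical=8):
        # num_physical is an assumed size, not a figure from the paper.
        self.num_physical = num_physical
        self.resident = OrderedDict()  # arch reg -> value, oldest first
        self.memory = {}               # stand-in for the cache/memory subsystem
        self.spills = 0
        self.fills = 0

    def _ensure_slot(self, reg):
        """Make `reg` resident; return True on a hit, False on a miss."""
        if not 0 <= reg < NUM_ARCH_REGS:
            raise ValueError(f"v{reg} is not an architectural register")
        if reg in self.resident:
            self.resident.move_to_end(reg)  # mark as most recently used
            return True
        if len(self.resident) >= self.num_physical:
            # Assumed LRU policy: spill the least recently used register.
            victim, value = self.resident.popitem(last=False)
            self.memory[victim] = value
            self.spills += 1
        self.resident[reg] = None  # slot allocated, contents set by caller
        return False

    def read(self, reg):
        if not self._ensure_slot(reg) and reg in self.memory:
            # Miss on a previously spilled register: fill it from memory.
            self.resident[reg] = self.memory.pop(reg)
            self.fills += 1
        return self.resident[reg]

    def write(self, reg, value):
        # Simplification: a write overwrites the whole register, so a miss
        # needs no fill; any spilled copy is stale and is discarded.
        self._ensure_slot(reg)
        self.memory.pop(reg, None)
        self.resident[reg] = value


# Usage: six architectural registers competing for four physical ones.
demo = DispersedVRF(num_physical=4)
for r in range(6):
    demo.write(r, [r] * 4)
print(demo.read(0), demo.spills, demo.fills)
```

In this model a kernel whose working set fits in the physical registers never spills, while a larger working set pays one spill per eviction and one fill per re-read. That is the trade-off the technique relies on being rare in practice.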
Implications
- Enabling Technology: This method makes it practical to integrate vector units, which are essential for high-throughput ML acceleration, into the highly area-constrained segment of low-cost edge CPUs.
- RISC-V Ecosystem: Reduces a primary hardware constraint (VRF footprint) associated with adopting the RISC-V Vector extension (RVV), potentially accelerating RVV adoption in the IoT and embedded computing space.
- Power Efficiency: The area and power reduction achieved by the smaller physical register file directly contributes to better energy efficiency for edge ML applications.