Spatz: Clustering Compact RISC-V-Based Vector Units to Maximize Computing Efficiency
Abstract
Spatz is a compact 64-bit floating-point-capable RISC-V vector processor based on the Zve64d vector extension, designed to maximize computing efficiency in clustered architectures. It reaches peak energy efficiency with a latch-based Vector Register File (VRF) of only 2 KiB, whereas typical vector processors use VRFs of hundreds of KiB. Implemented in GlobalFoundries' 12LPP technology, the Spatz-based cluster achieves an area/energy efficiency of 171 DP-GFLOPS/W/mm² and 30% higher energy efficiency than an equi-area cluster built from scalar cores specialized for stream-based floating-point computation.
Report
Key Highlights
- Compact Vector Unit: Spatz is a compact 64-bit Floating-Point (FP) capable vector processor built upon the RISC-V Vector Extension (Zve64d).
- Minimal VRF: Spatz reaches peak energy efficiency with a latch-based Vector Register File (VRF) of only 2 KiB, whereas typical vector processors use VRFs of hundreds of KiB.
- Clustered Architecture: The design uses Spatz as the main Processing Element (PE) within a modular, scalable dual-core cluster that shares a Scratchpad Memory (SCM).
- Superior Efficiency: The implemented cluster achieves an outstanding area/energy efficiency of 171 DP-GFLOPS/W/mm².
- Comparative Advantage: The Spatz-based cluster provides 30% higher energy efficiency compared to an equi-area cluster constructed using specialized scalar cores for stream-based FP computation.
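To give a feel for how small a 2 KiB VRF is, the following minimal sketch works out the register geometry it implies. It assumes the full 2 KiB is divided evenly among the 32 architectural vector registers that the RISC-V vector extension defines; the function and constant names are illustrative and are not taken from the Spatz design.

```python
# Back-of-the-envelope VRF geometry (illustrative sketch, not the Spatz RTL).
VRF_BYTES = 2 * 1024   # 2 KiB VRF, as stated in the summary
NUM_VREGS = 32         # RVV defines 32 architectural vector registers (v0..v31)
ELEM_BITS = 64         # double-precision element width supported by Zve64d


def vrf_geometry(vrf_bytes=VRF_BYTES, num_vregs=NUM_VREGS, elem_bits=ELEM_BITS):
    """Return (bits per vector register, elements per register).

    Assumption: the VRF capacity is split evenly across all registers.
    """
    vlen_bits = vrf_bytes * 8 // num_vregs
    elems_per_reg = vlen_bits // elem_bits
    return vlen_bits, elems_per_reg


vlen_bits, elems_per_reg = vrf_geometry()
print(vlen_bits, elems_per_reg)  # 512 8
```

Under that assumption, each vector register holds 512 bits, or eight double-precision elements.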
Technical Details
- Base Architecture: RISC-V Vector Extension Zve64d (64-bit floating-point capable).
- Processing Element (PE): Spatz serves as the PE of a dual-core cluster whose cores share an SCM.
- VRF Design: Latch-based design with only 2 KiB capacity, critical for power optimization.
- Fabrication: Implemented in GlobalFoundries' 12LPP process technology.
- Configuration: The evaluated cluster contains eight double-precision Floating Point Units (FPUs).
- Performance Metrics (1 GHz, Nominal Conditions):
  - Matrix Multiplication: Achieves 7.7 FMA/cycle (15.7 DP-GFLOPS) and 95.7 DP-GFLOPS/W. FPU utilization is 96.6% (only 3.4% below the theoretical ideal).
  - 2D Workloads (7×7 kernel): Achieves 95.0% FPU utilization (7.6 FMA/cycle) and 99.3 DP-GFLOPS/W.
- Power Distribution: Over 55% of the total power consumption is spent directly on the highly utilized FPUs.
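The relationship between the reported figures can be checked with some simple arithmetic. The sketch below assumes that each FPU retires at most one FMA per cycle and that one FMA counts as two floating-point operations; the helper names are hypothetical. Because the inputs are the rounded values quoted above, the results only approximately reproduce the reported numbers.

```python
# Rough consistency check of the reported metrics (illustrative sketch).
FPUS = 8          # double-precision FPUs in the evaluated cluster
FREQ_GHZ = 1.0    # clock frequency at nominal conditions


def fpu_utilization(fma_per_cycle, fpus=FPUS):
    """Fraction of FPU issue slots used, assuming one FMA per FPU per cycle."""
    return fma_per_cycle / fpus


def peak_gflops(fpus=FPUS, freq_ghz=FREQ_GHZ):
    """Peak DP-GFLOPS, counting one FMA as two floating-point operations."""
    return 2 * fpus * freq_ghz


def implied_power_w(gflops, gflops_per_w):
    """Power implied by a throughput and an energy-efficiency figure."""
    return gflops / gflops_per_w


print(round(fpu_utilization(7.6), 3))          # 0.95 (2D workload)
print(peak_gflops())                           # 16.0
print(round(implied_power_w(15.7, 95.7), 3))   # 0.164 (matrix multiplication)
```

Under these assumptions, the matrix multiplication figures imply a cluster power of roughly 0.16 W, so a share of over 55% would put more than about 0.09 W in the FPUs.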
Implications
- Mitigating Scaling Challenges: By focusing on ultra-compact register files and efficient clustering, Spatz offers a viable architectural solution to the challenges posed by slowing technology scaling and increasing application computational demands.
- RISC-V Vector Adoption: Spatz's results support the RISC-V Zve64d vector extension as a basis for area- and power-efficient accelerators, which may encourage broader adoption in the HPC and edge computing domains.
- Architectural Paradigm Shift: This work demonstrates that high computational throughput and efficiency do not necessarily require large, power-hungry Vector Register Files (VRFs). The 2 KiB VRF challenges conventional vector processor design philosophy.
- AI/GPGPU Relevance: The high FPU utilization on both matrix multiplication and 2D kernel workloads suggests Spatz is well suited to applications such as AI inference, signal processing, and specialized scientific accelerators, especially where power budgets are constrained.