Ara: A 1 GHz+ Scalable and Energy-Efficient RISC-V Vector Processor with Multi-Precision Floating Point Support in 22 nm FD-SOI
Abstract
Ara is a scalable, energy-efficient 64-bit vector processor based on the RISC-V vector extension (v0.5 draft) and implemented in 22 nm FD-SOI technology. Its lane-based microarchitecture runs at more than 1 GHz and delivers up to 33 DP-GFLOPS. Its energy efficiency reaches up to 41 DP-GFLOPS/W, slightly better than comparable vector processors reported in the literature.
Report
Ara Processor Analysis Report
Key Highlights
- High Performance and Frequency: The processor runs at more than 1 GHz in the typical corner (TT/0.80 V/25 °C) and delivers up to 33 DP-GFLOPS.
- Leading Energy Efficiency: Ara achieves up to 41 DP-GFLOPS/W, slightly better than comparable vector processors in the published literature.
- Scalable Architecture: The microarchitecture is highly scalable, built from a set of identical lanes, each containing parts of the vector register file and functional units.
- High Utilization: Ara reaches near-peak FPU utilization (up to 97%) when running a large double-precision matrix multiplication (256 × 256) on sixteen lanes.
- Technology Node: Implemented in GlobalFoundries 22FDX FD-SOI technology.
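The headline figures above can be cross-checked with some back-of-the-envelope arithmetic. The sketch below is illustrative only. It assumes that each lane completes one double-precision fused multiply-add (2 FLOP) per cycle, that the 33 DP-GFLOPS figure refers to the sixteen-lane configuration, and that the throughput and efficiency figures share one operating point. None of these assumptions is stated in this summary, and the function names are hypothetical.

```python
def peak_dp_gflops(lanes: int, clock_ghz: float, flops_per_lane_cycle: int = 2) -> float:
    """Peak double-precision throughput in GFLOPS.

    flops_per_lane_cycle=2 is an ASSUMPTION (one FMA per lane per cycle),
    not a figure taken from the summary.
    """
    return lanes * flops_per_lane_cycle * clock_ghz


def implied_power_watts(gflops: float, gflops_per_watt: float) -> float:
    """Power implied by a throughput figure and an efficiency figure,
    ASSUMING both were measured at the same operating point."""
    return gflops / gflops_per_watt


if __name__ == "__main__":
    # 1.0 GHz is the lower bound quoted in the text ("more than 1 GHz").
    peak = peak_dp_gflops(lanes=16, clock_ghz=1.0)
    print(f"peak at 1.0 GHz, 16 lanes: {peak:.1f} DP-GFLOPS")
    print(f"at 97% FPU utilization:    {0.97 * peak:.1f} DP-GFLOPS")
    print(f"implied power (33 / 41):   {implied_power_watts(33.0, 41.0):.2f} W")
```

Under these assumptions, sixteen lanes at exactly 1 GHz give a peak of 32 DP-GFLOPS, so the quoted 33 DP-GFLOPS is consistent with a clock somewhat above 1 GHz. The headline numbers are rounded, so the results are only rough.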
Technical Details
- ISA Implementation: Based on the 64-bit RISC-V vector extension (version 0.5 draft).
- Performance Metrics: Delivers up to 33 DP-GFLOPS (double-precision giga floating-point operations per second).
- Microarchitecture: Utilizes a banked, lane-based approach where scalability is achieved by replicating identical processing lanes.
- Workload Analysis: Performance and bottlenecks were analyzed using several vectorizable linear algebra kernels, including a study of the performance limits at small matrix sizes.
- Voltage/Temperature Corner: The operating metrics (more than 1 GHz, up to 33 DP-GFLOPS) are quoted for the typical corner (TT/0.80 V/25 °C).
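To make the lane-based organization concrete, here is a minimal software model of how one vector operation might be spread over identical lanes. It assumes that elements are interleaved across lanes (element i goes to lane i mod n_lanes) and that each lane processes one element per cycle. Both are simplifying assumptions for illustration, and the function names are hypothetical rather than taken from the Ara design.

```python
from math import ceil


def split_across_lanes(vector, n_lanes):
    """Interleave elements over lanes: element i goes to lane i % n_lanes.

    This mapping is an ASSUMPTION made for illustration.
    """
    return [vector[lane::n_lanes] for lane in range(n_lanes)]


def vector_fma(a, x, y, n_lanes):
    """Compute a * x[i] + y[i] lane by lane.

    Returns (result, cycles), where cycles assumes every lane retires
    one element per cycle and all lanes run in parallel.
    """
    vl = len(x)
    result = [0.0] * vl
    for lane in range(n_lanes):
        # Each lane only touches the elements mapped to it.
        for i in range(lane, vl, n_lanes):
            result[i] = a * x[i] + y[i]
    cycles = ceil(vl / n_lanes)
    return result, cycles


if __name__ == "__main__":
    print(split_across_lanes(list(range(8)), 4))
    for lanes in (2, 4, 8, 16):
        _, cycles = vector_fma(2.0, [1.0] * 256, [1.0] * 256, lanes)
        print(f"{lanes:2d} lanes -> {cycles} cycles for 256 elements")
```

In this model, doubling the lane count halves the cycle count for a long vector, which is the sense in which replicating identical lanes scales throughput.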
Implications
- Validation of RISC-V Vector Standard: Ara demonstrates a successful, high-performance physical implementation of the nascent RISC-V vector extension (v0.5), bolstering confidence in the ISA's suitability for high-throughput computing.
- HPC and AI Acceleration: With up to 41 DP-GFLOPS/W and up to 33 DP-GFLOPS, Ara suggests that RISC-V can be a competitive, energy-efficient alternative to established architectures for HPC, data-center, and other parallel workloads.
- FD-SOI Process Utilization: The successful implementation in 22 nm FD-SOI technology highlights the viability of this process node for developing advanced, high-speed, and low-power parallel processors.
- Future Architectural Guidance: The paper analyzes performance bottlenecks, in particular the reduced utilization at small matrix sizes. These findings can guide later RISC-V vector processor designs toward better data flow and higher utilization across diverse workloads.
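One way to see why small matrices can underuse a wide vector unit is a toy issue-rate model. The sketch below is not the paper's analysis and does not reproduce its measurements. It assumes that vector instructions can only be issued every `issue_interval_cycles` cycles (a hypothetical parameter, set to 4 purely for illustration) and it ignores memory bandwidth and every other effect.

```python
from math import ceil


def fpu_utilization(vector_length: int, n_lanes: int, issue_interval_cycles: int) -> float:
    """Toy estimate of FPU utilization for back-to-back vector instructions.

    ASSUMPTIONS (illustrative only): each lane handles one element per
    cycle, and a new vector instruction can start no sooner than
    issue_interval_cycles after the previous one.
    """
    busy = ceil(vector_length / n_lanes)       # cycles the lanes are occupied
    period = max(busy, issue_interval_cycles)  # cycles between instruction starts
    return vector_length / (n_lanes * period)  # useful slots / available slots


if __name__ == "__main__":
    # issue_interval_cycles=4 is a made-up value, not a figure from the paper.
    for n in (16, 32, 64, 128, 256):
        u = fpu_utilization(vector_length=n, n_lanes=16, issue_interval_cycles=4)
        print(f"vector length {n:3d}: utilization {u:.2f}")
```

In this model, short vectors finish before the next instruction can be issued, so the lanes sit idle. Long vectors hide the issue interval entirely. Real designs have further limits, so the model shows only the qualitative trend.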