Ara: A 1 GHz+ Scalable and Energy-Efficient RISC-V Vector Processor with Multi-Precision Floating Point Support in 22 nm FD-SOI

Abstract

Ara is a scalable, energy-efficient 64-bit vector processor based on the RISC-V vector extension (v0.5 draft) and implemented in 22 nm FD-SOI technology. Its lane-based microarchitecture runs at more than 1 GHz and delivers up to 33 DP-GFLOPS. Ara reaches an energy efficiency of up to 41 DP-GFLOPS/W, slightly ahead of comparable vector processors reported in the literature.

Report

Ara Processor Analysis Report

Key Highlights

  • High Performance and Frequency: The processor operates at more than 1 GHz in the typical corner (TT/0.80V/25 °C) and delivers up to 33 DP-GFLOPS.
  • Leading Energy Efficiency: Ara achieves up to 41 DP-GFLOPS/W, slightly higher than comparable vector processors in the published literature.
  • Scalable Architecture: The microarchitecture is highly scalable, built from a set of identical lanes, each containing parts of the vector register file and functional units.
  • High Utilization: Ara reaches near-peak FPU utilization (up to 97%) on large (256 × 256) double-precision matrix multiplication kernels when configured with sixteen lanes.
  • Technology Node: Implemented in GlobalFoundries 22FDX FD-SOI technology.
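
The lane-based organization described in the highlights above can be illustrated with a small sketch. It assumes that consecutive vector elements are interleaved across lanes (element i is held by lane i mod L); this summary does not state the mapping, so treat it as an assumption. The function names `elements_per_lane` and `lane_occupancy` are hypothetical and exist only for this illustration.

```python
def elements_per_lane(vector_length: int, num_lanes: int) -> list[list[int]]:
    """Assign element indices to lanes round-robin (assumed mapping)."""
    if num_lanes <= 0:
        raise ValueError("num_lanes must be positive")
    lanes: list[list[int]] = [[] for _ in range(num_lanes)]
    for i in range(vector_length):
        lanes[i % num_lanes].append(i)
    return lanes


def lane_occupancy(vector_length: int, num_lanes: int) -> float:
    """Fraction of lane slots doing useful work for one vector operation.

    Assumes each lane processes one element per cycle, so the operation
    takes ceil(vector_length / num_lanes) cycles.
    """
    if num_lanes <= 0:
        raise ValueError("num_lanes must be positive")
    if vector_length == 0:
        return 0.0
    cycles = -(-vector_length // num_lanes)  # ceiling division
    return vector_length / (cycles * num_lanes)


if __name__ == "__main__":
    print(elements_per_lane(8, 4))   # [[0, 4], [1, 5], [2, 6], [3, 7]]
    print(lane_occupancy(256, 16))   # 1.0
    print(lane_occupancy(8, 16))     # 0.5
```

Under this simplified model, a 256-element vector keeps all sixteen lanes busy, while an 8-element vector leaves half of them idle. This is only one possible contributor to lower utilization on small problems; the model ignores memory bandwidth, instruction issue rate, and pipeline start-up effects.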

Technical Details

  • ISA Implementation: Based on the 64-bit RISC-V vector extension (version 0.5 draft).
  • Performance Metrics: Achieves peak performance of 33 DP-GFLOPS (Double Precision Giga Floating-point Operations Per Second).
  • Microarchitecture: Utilizes a banked, lane-based approach where scalability is achieved by replicating identical processing lanes.
  • Workload Analysis: Performance and bottleneck analysis were conducted using various vectorizable linear algebra computation kernels, including studies on performance limitations for small matrix sizes.
  • Voltage/Temperature Corner: The quoted operating metrics (more than 1 GHz, up to 33 DP-GFLOPS) refer to the typical corner (TT/0.80V/25 °C).
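
The figures quoted above can be cross-checked with some back-of-the-envelope arithmetic. The sketch below assumes that each lane completes one double-precision fused multiply-add per cycle, counted as 2 FLOP; this summary does not state the per-lane throughput, so that value is an assumption. It also assumes, only for the power estimate, that the 33 DP-GFLOPS and 41 DP-GFLOPS/W figures describe the same operating point, which the summary does not confirm. All function names are hypothetical.

```python
FLOP_PER_FMA = 2  # one multiply plus one add


def peak_dp_gflops(num_lanes: int, clock_ghz: float,
                   fmas_per_lane_per_cycle: int = 1) -> float:
    """Peak DP-GFLOPS under the assumed per-lane FMA throughput."""
    return num_lanes * fmas_per_lane_per_cycle * FLOP_PER_FMA * clock_ghz


def required_clock_ghz(target_gflops: float, num_lanes: int,
                       fmas_per_lane_per_cycle: int = 1) -> float:
    """Clock frequency needed to reach a given peak under the same assumption."""
    return target_gflops / (num_lanes * fmas_per_lane_per_cycle * FLOP_PER_FMA)


def implied_power_watts(gflops: float, gflops_per_watt: float) -> float:
    """Power implied if both figures belong to the same operating point."""
    return gflops / gflops_per_watt


def matmul_flops(n: int) -> int:
    """FLOP count of a dense n x n matrix multiplication (n^3 multiply-adds)."""
    return 2 * n ** 3


if __name__ == "__main__":
    print(peak_dp_gflops(16, 1.0))        # 32.0
    print(required_clock_ghz(33.0, 16))   # 1.03125
    print(round(implied_power_watts(33.0, 41.0), 3))
    print(matmul_flops(256))              # 33554432
```

Under these assumptions, sixteen lanes at exactly 1 GHz would give 32 DP-GFLOPS, so a 33 DP-GFLOPS figure corresponds to a clock slightly above 1 GHz, which is consistent with the "more than 1 GHz" claim. The implied power of roughly 0.8 W is an estimate that holds only if the two "up to" figures were measured on the same configuration.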

Implications

  • Validation of RISC-V Vector Standard: Ara demonstrates a successful, high-performance physical implementation of the nascent RISC-V vector extension (v0.5), bolstering confidence in the ISA's suitability for high-throughput computing.
  • HPC and AI Acceleration: With its high efficiency (up to 41 DP-GFLOPS/W) and throughput (up to 33 DP-GFLOPS), Ara supports the case for RISC-V as a competitive, energy-efficient alternative to established architectures in HPC, data centers, and parallel processing tasks.
  • FD-SOI Process Utilization: The successful implementation in 22 nm FD-SOI technology highlights the viability of this process node for developing advanced, high-speed, and low-power parallel processors.
  • Future Architectural Guidance: The paper provides critical architectural insights into performance bottlenecks, particularly concerning suboptimal utilization on smaller matrix sizes, guiding subsequent RISC-V vector processor designs toward improved data flow and utilization strategies for diverse workloads.