Abstract
This paper presents "New Ara," the first open-source implementation of the RISC-V Vector (V) extension at version 1.0, and describes the resulting micro-architecture and design choices. The design uses a lane-based methodology and is optimized for performance in coupled scalar-vector processors. Compared with previous state-of-the-art vector engines that implement older RVV versions, it reports 15% better area and 6% higher throughput.
Report
Structured Report: A "New Ara" for Vector Computing
Key Highlights
- First Open-Source RVV 1.0 Implementation: The paper introduces the first publicly available, open-source implementation of the RISC-V V extension adhering to its final 1.0-Frozen specification.
- Superior PPA Metrics: The "New Ara" design achieves comparable or better Power, Performance, and Area (PPA) results than existing state-of-the-art vector engines that utilize older RVV specifications.
- Efficiency Gains: Specific improvements include 15% better area and 6% improved throughput compared to prior vector designs.
- High Utilization: FPU utilization exceeds 98.5% on crucial data-parallel kernels, meaning the floating-point units are busy in nearly every cycle.
- Design Focus: The research provides critical insights into optimizing performance for coupled scalar-vector processors and the micro-architectural requirements of the new V 1.0 specification.
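The FPU utilization figure above can be read as the ratio of achieved to peak floating-point throughput. The following is a minimal sketch of that calculation, assuming each lane retires one fused multiply-add (2 FLOPs) per cycle. The function name `fpu_utilization`, the lane count, and the cycle count are hypothetical illustrations, not measurements from the paper.

```python
def fpu_utilization(flops_executed: int, cycles: int, lanes: int,
                    flops_per_lane_per_cycle: int = 2) -> float:
    """Return achieved FLOPs as a fraction of peak FLOPs.

    Assumption: every lane can retire one FMA (2 FLOPs) per cycle.
    """
    if cycles <= 0 or lanes <= 0:
        raise ValueError("cycles and lanes must be positive")
    peak_flops = cycles * lanes * flops_per_lane_per_cycle
    return flops_executed / peak_flops


# Hypothetical example: an n x n matrix multiplication performs 2 * n^3 FLOPs.
n = 128
flops = 2 * n ** 3
lanes = 4                    # illustrative lane count
measured_cycles = 530_000    # made-up cycle count, for illustration only

util = fpu_utilization(flops, measured_cycles, lanes)
print(f"FPU utilization: {util:.2%}")
```

Under these assumptions, the ideal cycle count is `flops / (lanes * 2)`; any extra cycles (start-up latency, memory stalls, scalar overhead) push utilization below 100%.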
Technical Details
- ISA Standard: Implementation targets the RISC-V V extension at its official 1.0-Frozen status.
- Architecture Style: The design employs a lane-based micro-architecture, in which vector work is divided across parallel lanes so that several elements are processed per cycle.
- Coupled Processor: The vector unit is designed to be tightly integrated with a scalar processor for combined performance optimization.
- Optimization Target: Key micro-architectural changes are discussed relating to the new V 1.0 specification, focusing on optimizing instruction flow and data handling to maximize throughput.
- Performance Metrics: PPA comparisons are made against previous vector engines that implemented older RISC-V Vector versions (pre-1.0).
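To make the lane-based organization more concrete, the sketch below models how vector elements might be spread across lanes and how a long loop is split into vector-length-sized strips. It uses the RVV 1.0 definition VLMAX = LMUL × VLEN / SEW. The round-robin mapping (element i to lane i mod L), the simplification vl = min(remaining, VLMAX), and all names and parameter values are assumptions made for illustration; they are not taken from the paper.

```python
from fractions import Fraction


def vlmax(vlen_bits: int, sew_bits: int, lmul=Fraction(1)) -> int:
    """Maximum elements per vector register group: LMUL * VLEN / SEW."""
    return int(Fraction(vlen_bits, sew_bits) * lmul)


def distribute(vl: int, nr_lanes: int) -> list:
    """Assumed round-robin mapping: element i is handled by lane i % nr_lanes."""
    lanes = [[] for _ in range(nr_lanes)]
    for i in range(vl):
        lanes[i % nr_lanes].append(i)
    return lanes


def strip_mine(avl: int, vlen_bits: int, sew_bits: int, lmul=Fraction(1)):
    """Yield the vector length of each strip for an application vector length avl.

    Simplification: vl = min(remaining, VLMAX). The RVV specification allows
    implementations other choices when VLMAX < remaining < 2 * VLMAX.
    """
    limit = vlmax(vlen_bits, sew_bits, lmul)
    remaining = avl
    while remaining > 0:
        vl = min(remaining, limit)
        yield vl
        remaining -= vl


# Illustrative parameters: VLEN = 4096 bits, 64-bit elements, 4 lanes.
print(vlmax(4096, 64))                  # elements per register at LMUL = 1
print(distribute(8, 4))                 # which lane handles which element
print(list(strip_mine(150, 4096, 64)))  # strip lengths for a 150-element loop
```

With this mapping, consecutive elements land in different lanes, so a unit-stride operation keeps every lane busy as long as the vector length is a multiple of the lane count.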
Implications
- Accelerating RVV 1.0 Adoption: As the first open-source, high-efficiency reference implementation of RISC-V V 1.0, the design can help accelerate adoption, verification, and use of the standard across the community and industry.
- Setting a Performance Benchmark: The demonstrated PPA improvements (especially the 15% better area) set a new reference point for efficiency in RISC-V vector hardware design.
- Democratization of HPC Hardware: As an open-source design, it lowers the barrier to entry for research groups and companies looking to build highly efficient, data-parallel hardware accelerators based on RISC-V.
- Competitive Standing: This implementation helps RISC-V solidify its position as a major ISA for high-efficiency, data-parallel workloads, competing directly with established architectures like Arm SVE (as used in systems like the Fujitsu A64FX/Fugaku).