A "New Ara" for Vector Computing

Abstract

This paper presents "New Ara," the first open-source implementation of the RISC-V V (RVV) vector extension at its 1.0-Frozen specification. It discusses how the new specification shapes the micro-architecture of a lane-based design and offers insights into performance-oriented design of coupled scalar-vector processors. The system achieves comparable or better power, performance, and area (PPA) than state-of-the-art vector engines that implement older RVV versions: 15% better area, 6% improved throughput, and FPU utilization above 98.5% on crucial data-parallel kernels.

Report

Structured Report: A "New Ara" for Vector Computing

Key Highlights

  • First Open-Source RVV 1.0 Implementation: The paper introduces the first publicly available, open-source implementation of the RISC-V V extension that follows the 1.0-Frozen specification.
  • Superior PPA Metrics: The "New Ara" design achieves comparable or better Power, Performance, and Area (PPA) results than existing state-of-the-art vector engines that utilize older RVV specifications.
  • Efficiency Gains: Specific improvements include 15% better area and 6% improved throughput compared to prior vector designs.
  • High Utilization: The system demonstrates extremely high functional unit efficiency, achieving FPU utilization rates greater than 98.5% on crucial data-parallel kernels.
  • Design Focus: The research provides critical insights into optimizing performance for coupled scalar-vector processors and the micro-architectural requirements of the new V 1.0 specification.
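
The utilization figure above can be read as achieved floating-point throughput divided by the peak the hardware could deliver over the same number of cycles. The Python sketch below shows that ratio under stated assumptions: each lane is assumed to hold one fused multiply-add unit counted as 2 FLOPs per cycle, and the function name, its parameters, and the numbers in the usage lines are hypothetical, not taken from the paper.

```python
def fpu_utilization(useful_flops: int, cycles: int, lanes: int,
                    flops_per_lane_per_cycle: int = 2) -> float:
    """Return achieved FLOPs as a fraction of peak FLOPs.

    Assumption: every lane can retire `flops_per_lane_per_cycle`
    floating-point operations each cycle (2 models one FMA unit).
    """
    if cycles <= 0 or lanes <= 0 or flops_per_lane_per_cycle <= 0:
        raise ValueError("cycles, lanes and per-lane throughput must be positive")
    peak_flops = cycles * lanes * flops_per_lane_per_cycle
    return useful_flops / peak_flops


# Hypothetical example: a 128x128x128 matrix multiplication on 8 lanes.
n = 128
matmul_flops = 2 * n ** 3      # one multiply and one add per inner-loop step
measured_cycles = 265_000      # made-up cycle count, for illustration only
utilization = fpu_utilization(matmul_flops, measured_cycles, lanes=8)
print(f"FPU utilization: {utilization:.2%}")
```

With this definition, a utilization above 98.5% means that fewer than 1.5% of the available FPU issue slots were left idle during the kernel.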

Technical Details

  • ISA Standard: Implementation targets the RISC-V V extension at its official 1.0-Frozen status.
  • Architecture Style: The design employs a lane-based micro-architecture, in which the vector datapath is partitioned into parallel lanes that each process a share of a vector's elements.
  • Coupled Processor: The vector unit is designed to be tightly integrated with a scalar processor for combined performance optimization.
  • Optimization Target: The paper discusses the micro-architectural changes that the V 1.0 specification requires, with a focus on instruction flow and data handling that keep throughput high.
  • Performance Metrics: PPA comparisons are made against previous vector engines that implemented older RISC-V Vector versions (pre-1.0).
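
The lane-based style described above can be illustrated with a small behavioral model. The sketch below is a simplification, not the paper's design: it assumes elements are interleaved round-robin across lanes (element i goes to lane i mod L), and the function names are hypothetical. Real hardware may use a different byte layout inside each lane.

```python
from typing import List


def split_across_lanes(vector: List[int], lanes: int) -> List[List[int]]:
    """Distribute elements round-robin: element i lands in lane i % lanes."""
    if lanes <= 0:
        raise ValueError("lanes must be positive")
    return [vector[lane::lanes] for lane in range(lanes)]


def lane_vector_add(a: List[int], b: List[int], lanes: int) -> List[int]:
    """Add two vectors by letting each lane process only its own slice."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    a_lanes = split_across_lanes(a, lanes)
    b_lanes = split_across_lanes(b, lanes)
    # Each lane works independently; in hardware these run in parallel.
    partial = [[x + y for x, y in zip(la, lb)]
               for la, lb in zip(a_lanes, b_lanes)]
    # Re-interleave the per-lane results into architectural element order.
    result = [0] * len(a)
    for lane, chunk in enumerate(partial):
        result[lane::lanes] = chunk
    return result


print(split_across_lanes(list(range(8)), lanes=4))
print(lane_vector_add([1, 2, 3, 4, 5], [10, 20, 30, 40, 50], lanes=2))
```

Because each lane touches only its own slice, element-wise operations such as this addition need no communication between lanes, which is one reason the style scales well with lane count.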

Implications

  • Accelerating RVV 1.0 Adoption: Providing the first open-source, high-efficiency reference implementation of the finalized RISC-V V 1.0 standard significantly accelerates its adoption, verification, and use across the community and industry.
  • Setting a Performance Benchmark: The demonstrated PPA improvements (especially the 15% area improvement) set a new, high standard for efficiency in RISC-V vector hardware design.
  • Democratization of HPC Hardware: As an open-source design, it lowers the barrier to entry for research groups and companies looking to build highly efficient, data-parallel hardware accelerators based on RISC-V.
  • Competitive Standing: This implementation helps RISC-V solidify its position as a major ISA for high-efficiency, data-parallel workloads, competing directly with established architectures like Arm SVE (as used in systems like the Fujitsu A64FX/Fugaku).