SARIS: Accelerating Stencil Computations on Energy-Efficient RISC-V Compute Clusters with Indirect Stream Registers

Abstract

SARIS is a general methodology that leverages register-mapped indirect streams to accelerate performance-critical stencil computations on energy-efficient RISC-V compute clusters. Using this technique on an eight-core system, the authors demonstrate an average speedup of 2.72x and a 1.58x improvement in energy efficiency over an RV32G baseline, achieving near-ideal FPU utilization of 81%. Scaling estimates for a 256-core manycore system show competitive performance, reaching up to 15% higher fractions of peak compute than a leading GPU code generator.

Report

Key Highlights

  • Novel Methodology: Introduction of SARIS (Stencil Acceleration using Register-mapped Indirect Streams), designed to eliminate overheads associated with address calculation and irregular memory access in stencil codes.
  • Significant Speedup: Achieved an average speedup of 2.72x across various stencil codes on an eight-core RISC-V cluster.
  • High Efficiency: Demonstrated a 1.58x average improvement in energy efficiency compared to the RV32G baseline.
  • FPU Utilization: Achieved near-ideal floating-point unit (FPU) utilization, reaching 81% on the eight-core implementation.
  • Scalability Potential: Estimates for a larger 256-core manycore system predict an average speedup of 2.14x and competitive performance, with up to 15% higher fractions of peak compute than a leading GPU code generator.
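The relationship between the reported speedup and energy-efficiency gain can be made concrete with a small back-of-the-envelope calculation. This is a rough sketch, not a figure from the paper: it assumes that energy efficiency means useful work per unit of energy, that both configurations perform the same work, and that dividing the two reported averages is meaningful (averages of per-kernel ratios do not divide exactly). The function name `implied_power_ratio` is hypothetical.

```python
# Back-of-the-envelope sketch (assumption, not a result from the paper):
# for the same amount of work, efficiency gain = speedup / power ratio,
# so the implied average power ratio is speedup / efficiency gain.

def implied_power_ratio(speedup: float, efficiency_gain: float) -> float:
    """Average power of the accelerated run relative to the baseline."""
    return speedup / efficiency_gain

# Reported eight-core averages relative to the RV32G baseline.
ratio = implied_power_ratio(speedup=2.72, efficiency_gain=1.58)
print(f"implied power ratio: {ratio:.2f}")
```

Read this way, the cluster finishes sooner while drawing more power during the run, which is why the energy-efficiency gain is smaller than the speedup.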

Technical Details

  • Methodology: SARIS utilizes register-mapped indirect streams to streamline data flow and mitigate significant address calculation and irregular memory access overheads typical in stencil codes.
  • Target Platform: Energy-efficient RISC-V Compute Clusters, specifically tested on an eight-core configuration.
  • Hardware Enhancement: The acceleration is achieved through the integration of custom indirect stream registers within the RISC-V cluster.
  • Performance Metrics (8-core): Average speedup of 2.72x and energy efficiency gain of 1.58x, relative to an RV32G baseline.
  • Performance Metrics (256-core Estimate): Projects an average FPU utilization of 64% and an average speedup of 2.14x, suggesting that the approach scales to larger systems.
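The address-calculation overhead that the methodology targets can be illustrated with a small software emulation. This is a minimal sketch, not the authors' implementation: it assumes that an indirect stream gathers operands through a precomputed list of offsets, and it models that stream with a Python generator. The names `stencil_explicit`, `indirect_stream`, and `stencil_streamed`, and the choice of a 5-point stencil, are hypothetical.

```python
# Software emulation of the idea behind indirect streams (illustrative only).
# All function names here are hypothetical and not taken from the paper.

def stencil_explicit(grid, width, height, coeffs):
    """5-point stencil that computes every neighbor address in the loop body."""
    out = [0.0] * (width * height)
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            c = y * width + x
            out[c] = (coeffs[0] * grid[c]
                      + coeffs[1] * grid[c - 1]        # left neighbor
                      + coeffs[2] * grid[c + 1]        # right neighbor
                      + coeffs[3] * grid[c - width]    # upper neighbor
                      + coeffs[4] * grid[c + width])   # lower neighbor
    return out

def indirect_stream(grid, base, offsets):
    """Stand-in for a stream: yields grid[base + off] for each stored offset."""
    for off in offsets:
        yield grid[base + off]

def stencil_streamed(grid, width, height, coeffs):
    """Same stencil; the loop body only multiplies and accumulates operands."""
    offsets = [0, -1, 1, -width, width]  # computed once, reused for every point
    out = [0.0] * (width * height)
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            base = y * width + x
            acc = 0.0
            for k, v in zip(coeffs, indirect_stream(grid, base, offsets)):
                acc += k * v
            out[base] = acc
    return out

# A 4x4 grid holding a linear ramp; its discrete Laplacian is zero inside.
grid = [float(i) for i in range(16)]
laplace = [-4.0, 1.0, 1.0, 1.0, 1.0]
assert stencil_explicit(grid, 4, 4, laplace) == stencil_streamed(grid, 4, 4, laplace)
```

In this emulation the generator still costs instructions. The point of mapping streams to registers, as the summary describes it, is that the gather happens outside the compute loop, leaving the core's instruction stream free for floating-point work.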

Implications

  • Enhanced RISC-V Acceleration: SARIS significantly improves the capabilities of RISC-V platforms in handling memory-intensive, performance-critical workloads like stencil computations, crucial for domains such as scientific computing and mathematical software.
  • Energy Efficiency Advantage: The demonstrated 1.58x energy efficiency improvement strengthens the case for RISC-V in energy-constrained systems, including edge computing and embedded AI.
  • Competitive HPC Standing: By reaching up to 15% higher fractions of peak compute than a leading GPU code generator in scaled estimates, SARIS suggests that customized RISC-V clusters can be competitive alternatives to proprietary GPU solutions in highly parallel manycore environments.
  • Customization Validation: The success of SARIS underscores the RISC-V ecosystem's flexibility, demonstrating how tailored hardware extensions can remove performance bottlenecks that are difficult to address with general-purpose instruction sets.