SARIS: Accelerating Stencil Computations on Energy-Efficient RISC-V Compute Clusters with Indirect Stream Registers

Abstract

SARIS is a general methodology that uses register-mapped indirect streams to accelerate performance-critical stencil computations on energy-efficient RISC-V compute clusters. On an eight-core cluster, the authors report an average speedup of 2.72x and a 1.58x improvement in energy efficiency over an RV32G baseline, with near-ideal FPU utilization of 81%. Scaling estimates for a 256-core manycore system indicate competitive performance, with up to 15% higher fractions of peak compute than a leading GPU code generator.

Report

Key Highlights

Novel Methodology: Introduction of SARIS (Stencil Acceleration using Register-mapped Indirect Streams), designed to eliminate overheads associated with address calculation and irregular memory access in stencil codes.
Significant Speedup: Achieved an average speedup of 2.72x across various stencil codes on an eight-core RISC-V cluster.
High Efficiency: Demonstrated a 1.58x average improvement in energy efficiency compared to the RV32G baseline.
FPU Utilization: Achieved near-ideal floating-point unit (FPU) utilization, reaching 81% on the eight-core implementation.
Scalability Potential: Estimates for a larger 256-core manycore system predict an average speedup of 2.14x and competitive performance, with up to 15% higher fractions of peak compute than a leading GPU code generator.
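
The headline speedup and energy-efficiency figures can be related with a short calculation. This is a sketch under stated assumptions, not an analysis from the paper: it treats the two reported averages as if they described one representative stencil run, defines energy-efficiency gain G as baseline energy divided by SARIS energy for the same work, and assumes energy equals average cluster power P times runtime t. The symbols S, G, P and t are our own notation.

```latex
S = \frac{t_{\text{base}}}{t_{\text{SARIS}}} \approx 2.72, \qquad
G = \frac{E_{\text{base}}}{E_{\text{SARIS}}}
  = \frac{\bar{P}_{\text{base}}\, t_{\text{base}}}{\bar{P}_{\text{SARIS}}\, t_{\text{SARIS}}}
  = S \cdot \frac{\bar{P}_{\text{base}}}{\bar{P}_{\text{SARIS}}} \approx 1.58
\quad\Longrightarrow\quad
\frac{\bar{P}_{\text{SARIS}}}{\bar{P}_{\text{base}}} = \frac{S}{G} \approx \frac{2.72}{1.58} \approx 1.72
```

Under these assumptions, the accelerated cluster draws roughly 1.7x the baseline's average power, plausibly because its FPUs are kept busier, but it finishes 2.72x sooner, so the energy spent per stencil still falls by 1.58x.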

Technical Details

Methodology: SARIS uses register-mapped indirect streams, which deliver operands through ordinary register reads, to reduce the address-calculation and irregular-memory-access overheads typical of stencil codes.
Target Platform: Energy-efficient RISC-V compute clusters, evaluated on an eight-core configuration.
Hardware Enhancement: The acceleration is achieved through the integration of custom indirect stream registers within the RISC-V cluster.
Performance Metrics (8-core): Average speedup of 2.72x and energy efficiency gain of 1.58x, relative to an RV32G baseline.
Performance Metrics (256-core Estimate): Projects an average FPU utilization of 64% and an average speedup of 2.14x, suggesting that the approach scales to larger systems.
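
The core idea, moving address calculation out of the compute loop and into a stream that is configured once, can be illustrated with a small software model. The sketch below is a hypothetical Python simulation, not the SARIS implementation: the class IndirectStream, the helpers build_index_stream, stencil_baseline and stencil_streamed, and the choice of a 2D five-point stencil are all assumptions made for illustration. Each pop() stands in for a read of a stream-mapped FP register, which in hardware would return data[base + idx[k]] without integer instructions in the core's loop.

```python
from typing import List, Sequence


class IndirectStream:
    """Hypothetical model of a register-mapped indirect read stream.

    Configured once with a data array, an index array and a base offset.
    Each pop() returns data[base + indices[k]] for k = 0, 1, 2, ...,
    standing in for a read of a stream-mapped FP register.
    """

    def __init__(self, data: Sequence[float], indices: Sequence[int], base: int = 0) -> None:
        self._data = data
        self._indices = indices
        self._base = base
        self._k = 0

    def pop(self) -> float:
        # Address generation happens here, outside the compute loop.
        value = self._data[self._base + self._indices[self._k]]
        self._k += 1
        return value


def stencil_baseline(grid: Sequence[float], nx: int, ny: int,
                     coeffs: Sequence[float]) -> List[float]:
    """Five-point stencil with explicit index arithmetic in the loop body."""
    out = [0.0] * (nx * ny)
    for j in range(1, ny - 1):
        for i in range(1, nx - 1):
            c = j * nx + i  # integer work interleaved with the FP work
            out[c] = (coeffs[0] * grid[c] + coeffs[1] * grid[c - 1]
                      + coeffs[2] * grid[c + 1] + coeffs[3] * grid[c - nx]
                      + coeffs[4] * grid[c + nx])
    return out


def build_index_stream(nx: int, ny: int) -> List[int]:
    """Precompute the flat index of every operand, in consumption order."""
    offsets = (0, -1, 1, -nx, nx)  # centre, left, right, row above, row below
    indices: List[int] = []
    for j in range(1, ny - 1):
        for i in range(1, nx - 1):
            c = j * nx + i
            indices.extend(c + o for o in offsets)
    return indices


def stencil_streamed(grid: Sequence[float], nx: int, ny: int,
                     coeffs: Sequence[float]) -> List[float]:
    """Same stencil, with operands delivered by an indirect stream."""
    out = [0.0] * (nx * ny)
    src = IndirectStream(grid, build_index_stream(nx, ny))  # configured once
    # Output positions as a plain list; hardware could use a write stream here.
    dst = [j * nx + i for j in range(1, ny - 1) for i in range(1, nx - 1)]
    for c in dst:
        acc = 0.0
        for w in coeffs:  # only multiply-accumulates on streamed operands
            acc += w * src.pop()
        out[c] = acc
    return out
```

Both versions produce the same output; the difference is where the index arithmetic lives. In the streamed version the inner loop consists only of multiply-accumulates, which is the property that lets a core spend its issue slots on floating-point work. The index array costs memory and must be built up front, a trade-off this sketch does not try to quantify.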

Implications

Enhanced RISC-V Acceleration: SARIS improves how RISC-V platforms handle memory-intensive, performance-critical workloads such as stencil computations, which are central to scientific computing and mathematical software.
Energy Efficiency Advantage: The 1.58x energy-efficiency improvement strengthens the case for RISC-V in energy-constrained systems, including edge computing and embedded AI.
Competitive HPC Standing: Because the 256-core estimates reach up to 15% higher fractions of peak compute than a leading GPU code generator, SARIS suggests that customized RISC-V clusters can be competitive alternatives to proprietary GPU solutions for highly parallel manycore workloads.
Customization Validation: The success of SARIS underscores the flexibility of the RISC-V ecosystem, showing how tailored hardware extensions can relieve performance bottlenecks that are difficult to address with general-purpose instruction sets alone.
