SARIS: Accelerating Stencil Computations on Energy-Efficient RISC-V Compute Clusters with Indirect Stream Registers
Abstract
SARIS is a general methodology that uses register-mapped indirect streams to accelerate performance-critical stencil computations on energy-efficient RISC-V compute clusters. On an eight-core system, the authors demonstrate an average speedup of 2.72x and an average energy efficiency improvement of 1.58x over an RV32G baseline, with near-ideal FPU utilization of 81%. Scaling estimates for a 256-core manycore system show competitive performance, with up to 15% higher fractions of peak compute than a leading GPU code generator.
Report
Key Highlights
- Novel Methodology: Introduction of SARIS (Stencil Acceleration using Register-mapped Indirect Streams), designed to eliminate overheads associated with address calculation and irregular memory access in stencil codes.
- Significant Speedup: Achieved an average speedup of 2.72x across various stencil codes on an eight-core RISC-V cluster.
- High Efficiency: Demonstrated a 1.58x average improvement in energy efficiency compared to the RV32G baseline.
- FPU Utilization: Achieved near-ideal floating-point unit (FPU) utilization, reaching 81% on the eight-core implementation.
- Scalability Potential: Estimates for a larger 256-core manycore system predict an average speedup of 2.14x and competitive performance, with up to 15% higher fractions of peak compute than a leading GPU code generator.
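The reported speedup and energy efficiency figures can be related with a short back-of-the-envelope calculation. The sketch below assumes both configurations perform the same work, so that energy is average power times runtime; the function name `implied_power_ratio` is hypothetical, and the result is only a rough estimate because it combines two averaged figures rather than per-kernel measurements.

```python
def implied_power_ratio(speedup: float, efficiency_gain: float) -> float:
    """Estimate the average power ratio (accelerated / baseline).

    Assumes identical work in both runs, so that
        efficiency_gain = E_base / E_accel
                        = (P_base * T_base) / (P_accel * T_accel)
                        = speedup * (P_base / P_accel).
    Rearranging gives P_accel / P_base = speedup / efficiency_gain.
    """
    return speedup / efficiency_gain


# Figures reported in the summary for the eight-core cluster.
ratio = implied_power_ratio(speedup=2.72, efficiency_gain=1.58)
print(f"Implied average power ratio: {ratio:.2f}x")
```

Under these assumptions, the energy efficiency gain is smaller than the speedup because the faster code keeps the hardware busier and therefore draws more average power while finishing sooner.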
Technical Details
- Methodology: SARIS utilizes register-mapped indirect streams to streamline data flow and mitigate significant address calculation and irregular memory access overheads typical in stencil codes.
- Target Platform: Energy-efficient RISC-V Compute Clusters, specifically tested on an eight-core configuration.
- Hardware Enhancement: The acceleration is achieved through the integration of custom indirect stream registers within the RISC-V cluster.
- Performance Metrics (8-core): Average speedup of 2.72x and energy efficiency gain of 1.58x, relative to an RV32G baseline.
- Performance Metrics (256-core Estimate): Scaling estimates project an average FPU utilization of 64% and an average speedup of 2.14x, suggesting that the approach scales to manycore systems.
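To illustrate the address calculation overhead that the methodology targets, here is a minimal software sketch of a 5-point stencil written two ways. It is a conceptual model only, not the SARIS hardware, its ISA extension, or the authors' code: the names `stencil_baseline`, `indirect_stream`, and `stencil_streamed` are hypothetical, and a Python generator stands in for a register-mapped stream that delivers one operand per read.

```python
def stencil_baseline(grid, width, height, coeffs):
    """5-point stencil with explicit address arithmetic for every operand."""
    out = [0.0] * (width * height)
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            c = y * width + x
            # Each neighbor access needs its own index computation.
            out[c] = (coeffs[0] * grid[c]
                      + coeffs[1] * grid[c - 1]
                      + coeffs[2] * grid[c + 1]
                      + coeffs[3] * grid[c - width]
                      + coeffs[4] * grid[c + width])
    return out


def indirect_stream(memory, base, offsets):
    """Software stand-in for an indirect stream.

    Yields memory[base + off] for each precomputed offset, so the
    consumer never computes an operand address itself.
    """
    for off in offsets:
        yield memory[base + off]


def stencil_streamed(grid, width, height, coeffs):
    """Same stencil, with operands delivered by the modeled stream."""
    # The offset table is built once and reused for every grid point.
    offsets = [0, -1, 1, -width, width]
    out = [0.0] * (width * height)
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            base = y * width + x
            stream = indirect_stream(grid, base, offsets)
            acc = 0.0
            for k in coeffs:
                # The compute loop is a pure multiply-accumulate chain.
                acc += k * next(stream)
            out[base] = acc
    return out


w, h = 5, 4
grid = [float(i) for i in range(w * h)]
coeffs = [0.5, 0.125, 0.125, 0.125, 0.125]
assert stencil_baseline(grid, w, h, coeffs) == stencil_streamed(grid, w, h, coeffs)
```

In the streamed variant, the inner loop contains only floating-point work, and the irregular access pattern lives in a small offset table. In this model the generator still runs on the same processor, so it shows the separation of concerns rather than any actual speedup.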
Implications
- Enhanced RISC-V Acceleration: SARIS significantly improves the capabilities of RISC-V platforms in handling memory-intensive, performance-critical workloads like stencil computations, crucial for domains such as scientific computing and mathematical software.
- Energy Efficiency Advantage: The demonstrated 1.58x average energy efficiency improvement strengthens the case for RISC-V in energy-constrained systems, including edge computing and embedded AI.
- Competitive HPC Standing: With estimated fractions of peak compute up to 15% higher than those of a leading GPU code generator, SARIS suggests that customized RISC-V clusters can be competitive alternatives to proprietary GPU solutions in highly parallel manycore environments.
- Customization Validation: The success of SARIS underscores the RISC-V ecosystem's flexibility, demonstrating how tailored hardware extensions can remove performance bottlenecks that are difficult to address with general-purpose instruction sets.