SuperUROP: An FPGA-Based Spatial Accelerator for Sparse Matrix Operations

Abstract

Solving sparse linear systems is central to many numerical methods, but on current hardware it suffers from poor data reuse and complex data dependencies. This paper presents SuperUROP, an FPGA implementation of the Azul spatial accelerator designed to overcome these inefficiencies. The architecture uses a tiled grid of simple RISC-V cores connected by a Network-on-Chip (NoC), with custom RISC-V ISA augmentations that support a task-based programming model.

Report

Key Highlights

  • Problem Addressed: Inefficiency in state-of-the-art iterative solvers for sparse linear systems, primarily caused by poor short-term data reuse (leading to irregular memory access) and complex data dependencies (limiting parallelism).
  • Solution: An FPGA implementation (SuperUROP) of the existing SRAM-only spatial accelerator known as Azul.
  • Performance Goal: Achieve high memory bandwidth utilization and arithmetic intensity.
  • Core Technology: The accelerator uses simple RISC-V CPU cores as Processing Elements (PEs) within a distributed, tiled architecture.
  • Verification: The FPGA implementation was functionally verified, with performance equivalent to that obtained from architectural simulation of the Azul framework.
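
To make the irregular memory access concrete, here is a minimal sketch of sparse matrix-vector multiplication, a core kernel of iterative solvers. It assumes the common CSR (compressed sparse row) layout, which the summary does not name; the function name `spmv_csr` and the 3x3 matrix are hypothetical and chosen only for illustration.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR form (assumed layout)."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        # Nonzeros of this row occupy values[row_ptr[row]:row_ptr[row + 1]].
        for k in range(row_ptr[row], row_ptr[row + 1]):
            # x is read through col_idx, so consecutive reads may land far
            # apart in memory. This gather is the irregular access pattern.
            y[row] += values[k] * x[col_idx[k]]
    return y


# Illustrative matrix: [[4, 0, 1], [0, 3, 0], [2, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 3.0, 2.0, 5.0]
x = [1.0, 2.0, 3.0]
y = spmv_csr(row_ptr, col_idx, values, x)
```

Each nonzero is used once per multiplication, and the entries of `x` it needs are dictated by the sparsity pattern, which is why caches on conventional hardware see little short-term reuse.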

Technical Details

  • Accelerator Architecture: Based on the Azul framework, an SRAM-only hardware accelerator.
  • Spatial Design: Features a grid of tiles, where each tile contains a Processing Element (PE) and its own small, independent SRAM memory.
  • Interconnect: All tiles are connected via a Network-on-Chip (NoC) for high-speed communication.
  • Processor Choice: Implementation uses simple RISC-V CPU cores as PEs.
  • Memory Hierarchy: Builds its memory hierarchy from the different memory modules available on the FPGA.
  • Programming Model: Implements a task-based programming model for the PEs, facilitated through custom augmentations to the standard RISC-V Instruction Set Architecture (ISA).
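
The tiled, task-based organization described above can be sketched as a small software model. This is an illustration under stated assumptions, not the Azul ISA or the FPGA design: the classes `Tile` and `Grid`, the task functions, and the use of Manhattan distance as a stand-in for NoC hop count are all hypothetical.

```python
from collections import deque


class Tile:
    """One grid tile: a PE with private memory and a queue of pending tasks."""

    def __init__(self, x, y):
        self.x, self.y = x, y
        self.sram = {}        # tile-local memory, modeled as a dict (assumption)
        self.tasks = deque()  # tasks delivered to this tile over the NoC


class Grid:
    """A width x height grid of tiles joined by a modeled NoC."""

    def __init__(self, width, height):
        self.tiles = {(x, y): Tile(x, y)
                      for x in range(width) for y in range(height)}
        self.hops = 0  # total hops travelled by all messages

    def send(self, src, dst, task, payload):
        # Manhattan distance approximates the hop count of a mesh NoC
        # (assumed routing; the summary does not specify one).
        self.hops += abs(src[0] - dst[0]) + abs(src[1] - dst[1])
        self.tiles[dst].tasks.append((task, payload))

    def run(self):
        # Keep sweeping the tiles until no tile has work left.
        progress = True
        while progress:
            progress = False
            for tile in self.tiles.values():
                while tile.tasks:
                    task, payload = tile.tasks.popleft()
                    task(self, tile, payload)
                    progress = True


def accumulate(grid, tile, value):
    # Reduction task: add an incoming product to a local running sum.
    tile.sram["sum"] = tile.sram.get("sum", 0.0) + value


def multiply(grid, tile, x_value):
    # Multiply the locally stored coefficient, then forward to tile (0, 0).
    product = tile.sram["a"] * x_value
    grid.send((tile.x, tile.y), (0, 0), accumulate, product)


grid = Grid(2, 2)
coeffs = {(0, 0): 1.0, (1, 0): 2.0, (0, 1): 3.0, (1, 1): 4.0}
for pos, a in coeffs.items():
    grid.tiles[pos].sram["a"] = a               # data stays in local memory
    grid.send((0, 0), pos, multiply, 10.0)      # work travels as a task
grid.run()
result = grid.tiles[(0, 0)].sram["sum"]
```

The point of the model is that data never leaves the tile that owns it; only small task messages cross the network, and each task touches only its tile's local memory.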

Implications

  • Validation of RISC-V Extensibility: This work demonstrates the utility of the RISC-V ISA, showing how simple RISC-V cores can be customized with ISA augmentations to serve as processing elements in specialized, domain-specific accelerators (DSAs).
  • High-Performance Sparse Computing: By mapping the data structures of sparse matrix operations onto a localized, tiled, SRAM-only memory architecture, the design aims to sidestep the memory wall that limits standard CPU and GPU architectures, which could speed up fundamental scientific computing tasks.
  • FPGA Prototyping: Validating complex spatial architectures like Azul on FPGAs using RISC-V cores provides a fast, flexible pathway for developing and testing specialized hardware solutions before committing to ASIC fabrication, lowering the barrier to entry for custom hardware development.