GRVI Phalanx: A Massively Parallel RISC-V FPGA Accelerator Accelerator
Abstract
GRVI Phalanx is a massively parallel accelerator framework. It combines GRVI, an FPGA-efficient soft processor implementing the RISC-V RV32I ISA, with Phalanx, a framework for high-density parallel arrays of processors and accelerators. Processors and accelerators are grouped into shared memory clusters, and the clusters are interconnected by a 300-bit-wide Hoplite Network-on-Chip (NOC) sized for extreme-bandwidth traffic. An example system on a Kintex UltraScale KU040 FPGA fields 400 RISC-V cores, with a peak throughput of 100,000 MIPS and a peak shared memory bandwidth of 600 GB/s, while consuming only 13 W.
Report
Key Highlights
- Massive Parallelism: The system integrates 400 RISC-V soft cores onto a single Kintex UltraScale KU040 FPGA.
- Extreme Performance Density: Achieves a peak throughput of 100,000 MIPS on a single FPGA.
- High Bandwidth Interconnect: Utilizes a custom 300-bit-wide Hoplite Network-on-Chip (NOC).
- Power Efficiency: The entire 400-core system consumes only 13 W.
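The headline figures above reduce to per-core and per-watt numbers with simple arithmetic. The Python sketch below derives them from the published peak values only. `PeakFigures` is a hypothetical helper, and the implied clock rate depends on an assumed instructions-per-cycle (IPC) value that this summary does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeakFigures:
    """Published peak figures for a many-core system (hypothetical helper)."""
    cores: int
    peak_mips: float
    power_w: float

    def mips_per_core(self) -> float:
        return self.peak_mips / self.cores

    def mips_per_watt(self) -> float:
        return self.peak_mips / self.power_w

    def milliwatts_per_core(self) -> float:
        # Divides whole-system power evenly across cores. The 13 W covers the
        # entire system, not just the cores, so this overstates core power.
        return self.power_w / self.cores * 1000.0

    def implied_clock_mhz(self, ipc: float) -> float:
        # MIPS = clock (MHz) * IPC, so clock = MIPS / IPC.
        # The IPC value is an assumption supplied by the caller.
        return self.mips_per_core() / ipc


grvi_phalanx = PeakFigures(cores=400, peak_mips=100_000, power_w=13.0)

if __name__ == "__main__":
    print(f"{grvi_phalanx.mips_per_core():.1f} MIPS per core")
    print(f"{grvi_phalanx.mips_per_watt():.0f} MIPS per watt")
    print(f"{grvi_phalanx.milliwatts_per_core():.1f} mW per core")
    print(f"{grvi_phalanx.implied_clock_mhz(ipc=1.0):.0f} MHz if IPC = 1 (assumed)")
```

With 400 cores and 100,000 MIPS, each core contributes 250 peak MIPS. At an assumed IPC of 1 that corresponds to a 250 MHz clock, and any lower IPC would imply a proportionally higher clock. The per-core power figure simply splits the whole-system 13 W evenly, so it overestimates what a single core draws on average.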
Technical Details
- Core Architecture (GRVI): An FPGA-efficient soft implementation of the RISC-V RV32I Instruction Set Architecture.
- Parallel Framework (Phalanx): A structured array framework designed to accommodate both GRVI processors and specialized accelerators.
- Clustering: Processors and accelerators are organized into shared memory clusters.
- Interconnect Specs: The clusters are linked via the Hoplite NOC, providing a NOC bisection bandwidth of 700 Gbps (gigabits per second, roughly 87.5 GB/s).
- Memory Performance: The system provides an aggregate peak shared memory bandwidth of 600 GB/s (gigabytes per second).
- Target Hardware: The reference implementation is demonstrated on a Kintex UltraScale KU040 FPGA.
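To make the NOC description concrete, here is a simplified Python model of packet routing in a unidirectional 2D torus, the topology Hoplite is built on. It assumes X-then-Y dimension-ordered routing and ignores contention, so it does not model the deflections a bufferless Hoplite router applies under load. The function names and the 10 x 5 grid size are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple

Coord = Tuple[int, int]  # (x, y) position of a NOC router


def ring_distance(src: int, dst: int, size: int) -> int:
    """Hops from src to dst on a ring that carries traffic in the +1 direction only."""
    return (dst - src) % size


def dor_path(src: Coord, dst: Coord, cols: int, rows: int) -> List[Coord]:
    """Routers visited after src when routing X first, then Y (no contention modeled)."""
    x, y = src
    path: List[Coord] = []
    while x != dst[0]:  # travel along the X ring, wrapping at the edge
        x = (x + 1) % cols
        path.append((x, y))
    while y != dst[1]:  # then along the Y ring of the destination column
        y = (y + 1) % rows
        path.append((x, y))
    return path


def average_hops(cols: int, rows: int) -> float:
    """Mean minimal hop count over all ordered (src, dst) router pairs with src != dst."""
    total = 0
    pairs = 0
    for sx in range(cols):
        for sy in range(rows):
            for dx in range(cols):
                for dy in range(rows):
                    if (sx, sy) == (dx, dy):
                        continue
                    total += ring_distance(sx, dx, cols) + ring_distance(sy, dy, rows)
                    pairs += 1
    return total / pairs


if __name__ == "__main__":
    # Hypothetical 10 x 5 grid, for illustration only.
    print(dor_path((8, 1), (1, 3), cols=10, rows=5))
    print(round(average_hops(10, 5), 3))
```

Because each ring carries traffic in one direction only, a destination one column to the "west" is nine hops away on a 10-column ring, and the uncontended average on a 10 x 5 grid works out to about 6.6 hops. Longer paths are the trade-off unidirectional rings make in exchange for simpler routers.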
Implications
- RISC-V on FPGAs: This work supports RISC-V (specifically the RV32I base ISA) as a strong choice for building high-density, power-efficient soft processor arrays on FPGAs.
- Scalable FPGA Acceleration: GRVI Phalanx demonstrates a scalable approach to massively parallel acceleration on FPGAs, placing hundreds of cores on a single reconfigurable device and targeting levels of parallelism more often associated with ASICs or large GPU clusters.
- Data Center and Embedded Acceleration: The high MIPS/Watt ratio (100,000 peak MIPS at 13 W, roughly 7,700 MIPS/W) could make this architecture attractive for energy-sensitive workloads such as machine learning inference, signal processing, and low-latency network packet processing.
- Scalable Custom Computing: The Phalanx framework lets designers combine general-purpose RISC-V cores and custom hardware accelerators within shared memory clusters, enabling specialized heterogeneous computing solutions.
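The mix-and-match idea can be sketched as a configuration data structure. The Python below models a phalanx as a grid of shared memory clusters, each holding some GRVI cores and optional accelerators. Every class and function name, the `crc32` accelerator name, the 10 x 5 grid, the 8 cores per cluster, and the rule that one accelerator displaces two cores are hypothetical choices for illustration, not details of the actual design.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Coord = Tuple[int, int]  # (x, y) position of a cluster's NOC router


@dataclass
class ClusterSpec:
    """One shared memory cluster: GRVI cores plus optional accelerators."""
    grvi_cores: int
    accelerators: List[str] = field(default_factory=list)  # hypothetical names

    def memory_clients(self) -> int:
        # Every core and accelerator in the cluster shares the same memory.
        return self.grvi_cores + len(self.accelerators)


@dataclass
class PhalanxLayout:
    """A grid of clusters, one per NOC router position."""
    cols: int
    rows: int
    clusters: Dict[Coord, ClusterSpec]

    def __post_init__(self) -> None:
        for x, y in self.clusters:
            if not (0 <= x < self.cols and 0 <= y < self.rows):
                raise ValueError(f"cluster ({x}, {y}) lies outside the {self.cols}x{self.rows} grid")

    def total_cores(self) -> int:
        return sum(c.grvi_cores for c in self.clusters.values())

    def accelerator_counts(self) -> Dict[str, int]:
        counts: Dict[str, int] = {}
        for c in self.clusters.values():
            for name in c.accelerators:
                counts[name] = counts.get(name, 0) + 1
        return counts


def mixed_layout(cols: int, rows: int, cores_per_cluster: int,
                 accel: Optional[str] = None, accel_every: int = 0) -> PhalanxLayout:
    """Fill every grid slot with a cluster. If `accel` is given, every
    `accel_every`-th cluster (row-major order) trades two cores for one
    accelerator; the two-for-one trade is an arbitrary illustrative choice."""
    clusters: Dict[Coord, ClusterSpec] = {}
    for i in range(cols * rows):
        pos = (i % cols, i // cols)
        if accel is not None and accel_every > 0 and i % accel_every == 0:
            clusters[pos] = ClusterSpec(cores_per_cluster - 2, [accel])
        else:
            clusters[pos] = ClusterSpec(cores_per_cluster)
    return PhalanxLayout(cols, rows, clusters)


# Hypothetical 10 x 5 grid of 8-core clusters, chosen only because it totals 400 cores.
homogeneous = mixed_layout(10, 5, 8)
hybrid = mixed_layout(10, 5, 8, accel="crc32", accel_every=5)
```

The homogeneous layout totals 400 cores, while the hybrid one trades 20 cores for 10 accelerators (380 cores), showing how such a description keeps track of the resulting heterogeneous mix.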