GRVI Phalanx: A Massively Parallel RISC-V FPGA Accelerator Accelerator

Abstract

GRVI Phalanx is a massively parallel accelerator framework that pairs GRVI, an FPGA-efficient RISC-V RV32I soft processor, with Phalanx, a parallel processor and accelerator array framework. Processors and accelerators are grouped into shared memory clusters, and the clusters are interconnected by a 300-bit-wide Hoplite Network-on-Chip (NOC). An example system on a Kintex UltraScale KU040 FPGA fields 400 RISC-V cores with a peak throughput of 100,000 MIPS and a peak shared memory bandwidth of 600 GB/s, while using 13 W.

Report

Key Highlights

  • Massive Parallelism: The system integrates 400 RISC-V soft cores onto a single Kintex UltraScale KU040 FPGA.
  • Extreme Performance Density: Achieves a peak throughput of 100,000 MIPS.
  • High Bandwidth Interconnect: Utilizes a custom 300-bit-wide Hoplite Network-on-Chip (NOC).
  • Power Efficiency: The entire 400-core system operates at a low power consumption of 13 W.

Technical Details

  • Core Architecture (GRVI): An FPGA-efficient soft implementation of the RISC-V RV32I Instruction Set Architecture.
  • Parallel Framework (Phalanx): A structured array framework designed to accommodate both GRVI processors and specialized accelerators.
  • Clustering: Processors and accelerators are organized into shared memory clusters.
  • Interconnect Specs: The clusters are linked via the Hoplite NOC, providing a NOC bisection bandwidth of 700 Gbps.
  • Memory Performance: The system boasts a peak shared memory bandwidth of 600 GB/s.
  • Target Hardware: The reference implementation is demonstrated on a Kintex UltraScale KU040 FPGA.
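The headline figures above can be broken down into per-core values with simple arithmetic. The following is a minimal sketch, not something taken from the report: it assumes the peak numbers divide evenly across all 400 cores, and the variable and function names are illustrative only.

```python
# Headline figures quoted in this summary (all are peak values).
CORES = 400
PEAK_MIPS = 100_000           # million instructions per second, whole system
PEAK_SHARED_MEM_GB_S = 600    # gigabytes per second, whole system
NOC_BISECTION_GBIT_S = 700    # gigabits per second across the NOC bisection


def per_core(total, cores=CORES):
    """Average share of a system-wide figure, assuming an even split."""
    return total / cores


mips_per_core = per_core(PEAK_MIPS)                  # 250.0 MIPS
mem_gb_s_per_core = per_core(PEAK_SHARED_MEM_GB_S)   # 1.5 GB/s

# Convert the bisection bandwidth from gigabits to gigabytes per second
# so it can be compared with the shared memory figure.
noc_bisection_gb_s = NOC_BISECTION_GBIT_S / 8        # 87.5 GB/s

# Bytes of shared memory bandwidth available per instruction at peak.
bytes_per_instruction = (PEAK_SHARED_MEM_GB_S * 1e9) / (PEAK_MIPS * 1e6)  # 6.0

print(f"{mips_per_core:.0f} MIPS and {mem_gb_s_per_core} GB/s per core")
print(f"NOC bisection: {noc_bisection_gb_s} GB/s")
```

Under these assumptions, aggregate shared memory bandwidth (600 GB/s) is roughly seven times the NOC bisection bandwidth (87.5 GB/s), which is consistent with a design in which most memory traffic stays inside a cluster.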

Implications

  • RISC-V in FPGAs: This work supports RISC-V (specifically RV32I) as a practical ISA for building high-density, power-efficient soft processor arrays on FPGAs.
  • FPGA Accelerator Leadership: GRVI Phalanx demonstrates a scalable approach to massively parallel acceleration on FPGAs, reaching a peak throughput of 100,000 MIPS on a single reconfigurable device.
  • Data Center and Embedded Acceleration: The MIPS/Watt ratio achieved (100,000 peak MIPS at 13 W, or roughly 7,700 MIPS per watt) makes this architecture attractive for energy-sensitive applications such as machine learning inference, signal processing, and low-latency network packet processing.
  • Scalable Custom Computing: The Phalanx framework allows designers to easily mix and match general-purpose RISC-V cores and custom hardware accelerators within shared memory clusters, enabling specialized, highly optimized heterogeneous computing solutions.
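The efficiency claim above can be checked directly from the two quoted figures. This is a rough sketch under stated assumptions: the 13 W is treated as the power of the whole system (cores, memories, and NOC together), so the per-core and per-instruction values are averages at peak throughput, not measurements of a single core. All names are illustrative.

```python
# Figures quoted in this summary.
PEAK_MIPS = 100_000   # million instructions per second, whole system
POWER_W = 13          # watts, whole system
CORES = 400

# Peak throughput per watt.
mips_per_watt = PEAK_MIPS / POWER_W                   # about 7692 MIPS/W

# Average power budget per core, in milliwatts. This folds in the
# NOC and memory power, so it is an upper bound on the core itself.
milliwatts_per_core = POWER_W * 1000 / CORES          # 32.5 mW

# Average energy per instruction at peak, in picojoules:
# watts / (instructions per second) = joules per instruction.
instructions_per_second = PEAK_MIPS * 1e6
picojoules_per_instruction = POWER_W / instructions_per_second * 1e12  # about 130 pJ

print(f"{mips_per_watt:.0f} MIPS/W")
print(f"{milliwatts_per_core} mW per core (system average)")
print(f"{picojoules_per_instruction:.0f} pJ per instruction at peak")
```

Because the inputs are peak values, real workloads that stall on memory or the NOC would see fewer instructions per joule than this estimate.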