Manticore: A 4096-core RISC-V Chiplet Architecture for Ultra-efficient Floating-point Computing

Abstract

Manticore is a 4096-core RISC-V chiplet architecture designed for ultra-efficient, general-purpose data-parallel floating-point (FP) workloads. It is built from Snitch clusters, in which small integer cores each control a large floating-point unit (FPU), and it reaches FPU utilization above 90% with the help of two custom ISA extensions: Stream Semantic Registers (SSR) and Floating-point Repetition (FREP). A prototype of the chiplet's computational core, manufactured in Globalfoundries 22FDX, demonstrated more than 5x better energy efficiency than CPUs and GPUs on FP-intensive workloads.

Report

Key Highlights

  • 4096-Core Architecture: Manticore is a massive, highly parallel chiplet-based architecture built on the RISC-V instruction set.
  • Ultra-Efficiency: The primary design goal is achieving high energy and area efficiency for data-parallel floating-point computing.
  • Energy Performance: A prototype manufactured in the Globalfoundries 22FDX process achieved more than a 5x improvement in energy efficiency over existing CPUs and GPUs on FP-intensive benchmarks.
  • High FPU Utilization: The architecture achieves Floating-Point Unit (FPU) utilization rates exceeding 90%.
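
As a quick sanity check on scale, the headline figures can be combined with the cluster size given under Technical Details (eight cores per cluster, one FPU per core). The snippet below is plain arithmetic on those stated numbers. Applying the 90% utilization figure uniformly to the whole system is an illustrative assumption, not a reported result.

```python
# Figures quoted in this summary.
TOTAL_CORES = 4096
CORES_PER_CLUSTER = 8
FPU_UTILIZATION = 0.90  # quoted as a lower bound ("exceeding 90%")

clusters = TOTAL_CORES // CORES_PER_CLUSTER
fpus = TOTAL_CORES  # each integer core controls one FPU

# Assumption for illustration only: the utilization figure holds
# uniformly across every FPU in the system at the same time.
busy_fpus = fpus * FPU_UTILIZATION

print(f"Snitch clusters: {clusters}")
print(f"FPUs: {fpus}")
print(f"FPUs busy per cycle on average (assumed): {busy_fpus:.0f}")
```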

Technical Details

  • Cluster Design: Compute capability is provided by "Snitch clusters," with each cluster containing eight small integer cores.
  • Asymmetric Core Structure: Each small integer core is tightly coupled with and controls a large FPU.
  • Area Allocation: More than 40% of the core area is specifically dedicated to the FPU to maximize FP throughput.
  • Custom ISA Extensions: The core supports two custom RISC-V ISA extensions that together let a single-issue core minimize its instruction fetch bandwidth while keeping the FPU supplied with instructions:
    • SSR Extension: The Stream Semantic Register (SSR) extension elides explicit load and store instructions by encoding memory accesses as register reads and writes, which reduces the number of instructions the core must fetch.
    • FREP Extension: The Floating-point Repetition (FREP) extension decouples the integer core from the FPU, so floating-point instructions can be issued independently of the integer pipeline and keep the FPU saturated.
  • Fabrication Process: A prototype of the chiplet's computational core was manufactured in the Globalfoundries 22FDX process.
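
The effect of the two extensions on a single-issue core can be illustrated with a rough instruction-count model of a dot-product inner loop. The sketch below is not taken from the Manticore work: the per-element instruction mixes, the setup costs, and the function name `loop_stats` are all assumptions chosen for illustration, and the model ignores memory stalls and FPU pipeline latency.

```python
# Simplified model, NOT measured data: every instruction count is an assumption.

def loop_stats(n_elements, instrs_per_element, setup_instrs, uses_frep=False):
    """Return (fpu_utilization, instructions_fetched) for a single-issue core.

    One FP multiply-add is performed per element. Without FREP the core
    fetches and issues every instruction itself, one per cycle. With FREP
    the core fetches the FP instruction once and a hardware sequencer
    replays it, so the FPU can receive one instruction per cycle after setup.
    """
    if uses_frep:
        cycles = setup_instrs + n_elements
        fetched = setup_instrs + 1  # the repeated FP instruction is fetched once
    else:
        cycles = setup_instrs + n_elements * instrs_per_element
        fetched = cycles
    return n_elements / cycles, fetched


N = 1000  # hypothetical vector length

# Baseline: 2 loads + 1 multiply-add + 2 address increments + 1 branch per element.
baseline = loop_stats(N, instrs_per_element=6, setup_instrs=4)

# SSR only: loads and address updates turn into register reads handled by
# stream hardware; the multiply-add, a loop counter update, and a branch remain.
ssr_only = loop_stats(N, instrs_per_element=3, setup_instrs=12)

# SSR + FREP: the multiply-add is replayed by the sequencer, so no
# per-element loop overhead is fetched by the integer core.
ssr_frep = loop_stats(N, instrs_per_element=1, setup_instrs=14, uses_frep=True)

for name, (util, fetched) in [("baseline", baseline),
                              ("SSR", ssr_only),
                              ("SSR+FREP", ssr_frep)]:
    print(f"{name:9s} FPU utilization {util:6.1%}, instructions fetched {fetched}")
```

Under these assumed counts, the model puts FPU utilization near one sixth for the baseline and about one third with SSR alone, and it rises above 90% only once FREP removes the per-element loop overhead. That trend is qualitatively consistent with the utilization figure quoted in this summary, but the exact values depend entirely on the assumed instruction mix.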

Implications

  • Validation of RISC-V in HPC: Manticore provides strong validation for the RISC-V ecosystem's ability to create highly specialized, scalable, and ultra-efficient architectures for High-Performance Computing (HPC) and AI acceleration, challenging commercial CPU/GPU vendors.
  • Future of Chiplets: The 4096-core design demonstrates the potential of the chiplet paradigm for achieving extreme core counts and maximizing area efficiency for specialized computational needs.
  • ISA Customization Advantage: The significant efficiency gains realized through the custom SSR and FREP extensions showcase the critical flexibility of the open RISC-V ISA, allowing hardware designers to optimize the instruction set precisely for their target application (data-parallel FP workloads).