Manticore: A 4096-core RISC-V Chiplet Architecture for Ultra-efficient Floating-point Computing
Abstract
Manticore is a 4096-core RISC-V chiplet architecture designed for ultra-efficient, general-purpose data-parallel floating-point (FP) workloads. Its compute capability comes from Snitch clusters, in which each small integer core controls a large FPU. Two custom ISA extensions (SSR and FREP) let these single-issue cores keep their FPUs more than 90% utilized. A manufactured prototype of the chiplet's computational core demonstrated more than 5x better energy efficiency than conventional CPUs and GPUs on FP-intensive workloads.
Report
Key Highlights
- 4096-Core Architecture: Manticore is a massive, highly parallel chiplet-based architecture built on the RISC-V instruction set.
- Ultra-Efficiency: The primary design goal is achieving high energy and area efficiency for data-parallel floating-point computing.
- Energy Performance: A prototype of the chiplet's computational core, manufactured in the Globalfoundries 22FDX process, achieved more than 5x better energy efficiency than existing CPUs and GPUs on FP-intensive benchmarks.
- High FPU Utilization: The architecture achieves Floating-Point Unit (FPU) utilization rates exceeding 90%.
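The utilization claim is easier to appreciate with a back-of-envelope issue-slot count for a dot-product loop on a single-issue core. The sketch below is a toy model, not the paper's methodology. It assumes one instruction per cycle and no stalls (FMA latency is ignored). It also assumes that the FPU does useful work only on each element's multiply-add. The per-iteration instruction mix and the setup cost of 10 cycles are illustrative guesses, and the function names are hypothetical.

```python
# Toy issue-slot model of a dot-product loop on a single-issue core.
# Assumptions (illustrative, not from the Manticore paper): one instruction per
# cycle, no stalls (FMA latency is ignored), and the FPU does useful work only
# on the fused multiply-add (fmadd) of each element.

def baseline_cycles(n: int) -> int:
    # Per element: 2 FP loads, 1 fmadd, 2 pointer increments, 1 loop branch.
    return 6 * n


def ssr_cycles(n: int) -> int:
    # SSR turns both loads into register reads and the streamer generates the
    # addresses, leaving 1 fmadd, 1 counter update and 1 branch per element.
    return 3 * n


def ssr_frep_cycles(n: int, setup: int = 10) -> int:
    # FREP replays the fmadd n times on the FPU side, so after a fixed setup
    # (hypothetical cost: stream configuration plus the frep instruction)
    # only the n fmadds occupy cycles.
    return setup + n


def fpu_utilization(useful_fp_ops: int, cycles: int) -> float:
    # Fraction of cycles in which the FPU performs useful work.
    return useful_fp_ops / cycles


if __name__ == "__main__":
    n = 1000
    for name, cycles in [
        ("baseline", baseline_cycles(n)),
        ("SSR", ssr_cycles(n)),
        ("SSR+FREP", ssr_frep_cycles(n)),
    ]:
        print(f"{name}: {fpu_utilization(n, cycles):.1%}")
```

With n = 1000, this prints 16.7%, 33.3% and 99.0%. The trend matters here, not the exact figures. Removing the loads roughly doubles utilization. Taking the integer core's loop bookkeeping off the FPU's critical path then lets utilization approach 100% as n grows.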
Technical Details
- Cluster Design: Compute capability is provided by "Snitch clusters," with each cluster containing eight small integer cores.
- Asymmetric Core Structure: Each small integer core is tightly coupled with and controls a large FPU.
- Area Allocation: More than 40% of the core area is specifically dedicated to the FPU to maximize FP throughput.
- Custom ISA Extensions: The architecture incorporates two specialized RISC-V ISA extensions to boost efficiency:
  - SSR Extension: Elides explicit load and store instructions by encoding them as register reads and writes, so the integer core no longer has to fetch and issue them.
  - FREP Extension: Decouples the integer core from the FPU, allowing floating-point instructions to be issued independently. Combined with SSR, this lets the single-issue core minimize its own instruction fetch bandwidth while saturating the FPU's instruction bandwidth.
- Fabrication Process: A prototype of the chiplet's computational core was manufactured in Globalfoundries 22FDX technology.
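The following minimal Python sketch shows how SSR and FREP cooperate on a dot product. It is a behavioral toy under simplifying assumptions and makes no attempt to model the Snitch hardware interface or ISA encoding. A stream register returns the next element of a one-dimensional affine address pattern each time it is read, which replaces an explicit load. An FREP-style helper repeats a single multiply-add a fixed number of times using two streams as operands. `StreamRegister`, `frep_fmadd`, and their fields are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StreamRegister:
    """Toy SSR-style stream: each read returns the next element of the affine
    pattern memory[base + i * stride], so no explicit load is issued.
    Only a one-dimensional pattern is modeled; field names are hypothetical."""
    memory: List[float]
    base: int
    stride: int
    bound: int  # number of elements the stream delivers
    _i: int = field(default=0, init=False)

    def read(self) -> float:
        if self._i >= self.bound:
            raise RuntimeError("stream exhausted")
        value = self.memory[self.base + self._i * self.stride]
        self._i += 1
        return value


def frep_fmadd(stream_a: StreamRegister, stream_b: StreamRegister,
               reps: int, acc: float = 0.0) -> float:
    """Toy FREP: repeat one multiply-add `reps` times, taking both operands
    from stream registers. In hardware the integer core would be free to do
    other work meanwhile; here the loop simply runs sequentially."""
    for _ in range(reps):
        acc = stream_a.read() * stream_b.read() + acc  # acc += a[i] * b[i] (unfused here)
    return acc


if __name__ == "__main__":
    mem = [1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0, 40.0]
    a = StreamRegister(mem, base=0, stride=1, bound=4)
    b = StreamRegister(mem, base=4, stride=1, bound=4)
    print(frep_fmadd(a, b, reps=4))  # prints 300.0
```

The loop body contains no load and no address arithmetic. Address generation lives in the stream configuration (`base`, `stride`, `bound`), and the repetition count lives in the FREP call. This mirrors the division of labor the extensions aim for, though real hardware performs both in parallel rather than in a sequential Python loop.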
Implications
- Validation of RISC-V in HPC: Manticore provides strong validation of the RISC-V ecosystem's ability to produce highly specialized, scalable, and ultra-efficient architectures for High-Performance Computing (HPC) and AI acceleration, positioning it to challenge commercial CPU/GPU vendors.
- Future of Chiplets: The 4096-core design demonstrates the potential of the chiplet paradigm for achieving extreme core counts and maximizing area efficiency for specialized computational needs.
- ISA Customization Advantage: The significant efficiency gains realized through the custom SSR and FREP extensions showcase the critical flexibility of the open RISC-V ISA, allowing hardware designers to optimize the instruction set precisely for their target application (data-parallel FP workloads).