TeraPool: A Physical Design Aware, 1024 RISC-V Cores Shared-L1-Memory Scaled-Up Cluster Design With High Bandwidth Main Memory Link
Abstract
The TeraPool project introduces a highly scaled-up cluster architecture integrating 1024 RISC-V cores optimized for parallelism. Its key innovation is a physically design-aware implementation featuring a shared-L1-memory structure, which enables ultra-low-latency data exchange between the numerous cores. This design, complemented by a high-bandwidth main memory link, aims to overcome memory wall bottlenecks inherent in massive core count systems and establish a viable high-performance blueprint for the RISC-V ecosystem.
Report
TeraPool: Analysis Report
Key Highlights
- Massive Scale: The design successfully integrates 1024 individual RISC-V cores into a single scaled-up cluster configuration, named TeraPool.
- Novel Memory Architecture: It utilizes a shared-L1-memory structure, departing from traditional private L1 caches, to facilitate extremely fast, low-latency inter-core communication and data sharing.
- Physical Design Optimization: The architecture is explicitly "Physical Design Aware," meaning the layout, routing, and power delivery for the 1024 cores were co-optimized during the design phase to ensure real-world implementation efficiency and performance targets.
- Memory Bottleneck Mitigation: The cluster incorporates a dedicated High Bandwidth Main Memory Link, crucial for feeding data to and from the large pool of cores efficiently.
Technical Details
- Core Configuration: Features 1024 RISC-V processing elements, likely optimized for simple, in-order execution to maximize density and throughput.
- Shared L1 Structure: The core innovation requires a specialized, high-density interconnect and coherence protocol specifically designed to manage the shared L1 cache space among a large number of neighbors.
- Design Methodology: Emphasis is placed on managing the complexity of routing and clock distribution across a die handling 1024 processors, utilizing advanced floorplanning techniques to minimize latency paths critical for the shared L1 access.
- Interconnect Focus: The interconnect fabric is optimized for spatial locality, enabling adjacent cores to access shared data with latencies approaching those of local L1 hits.
Implications
- RISC-V High-Performance Viability: TeraPool provides strong evidence that RISC-V is a viable Instruction Set Architecture for massive-scale, high-performance computing (HPC) and data center applications, moving beyond embedded and small-scale accelerators.
- Architectural Exploration: The shared-L1 approach challenges established norms in multi-core design. If successful, it could pave the way for new programming models and synchronization primitives that leverage extremely fast, near-uniform data access across a large core cluster.
- Scalability Blueprint: This physically designed and validated 1024-core cluster acts as a critical reference architecture for future RISC-V scale-up projects, offering solutions for power, thermal, and bandwidth management at extreme densities.
- Addressing the Memory Wall: By directly focusing on increasing external memory bandwidth and optimizing internal data access (Shared L1), the TeraPool architecture provides a practical solution to the persistent memory bandwidth limitation encountered when scaling core counts far past 100.
Technical Deep Dive Available
This public summary covers the essentials. The Full Report contains exclusive architectural diagrams, performance audits, and deep-dive technical analysis reserved for our members.