Chiplet-Based RISC-V SoC with Modular AI Acceleration
Abstract
This paper introduces a novel chiplet-based RISC-V System-on-Chip designed to overcome the low manufacturing yields and inflexibility of monolithic edge AI devices by integrating its dies on a 30 mm × 30 mm silicon interposer. The architecture combines a 7nm RISC-V CPU chiplet with dual 5nm AI accelerators and 16GB of HBM3, and features four key innovations, including adaptive cross-chiplet DVFS and AI-aware UCIe protocol extensions. These system-level optimizations yield a 40.1% overall efficiency gain, achieving approximately 3.5 mJ per MobileNetV2 inference while maintaining sub-5 ms real-time latency across demanding workloads.
Report
Key Highlights
- Architectural Solution: Presentation of a chiplet-based RISC-V SoC utilizing a 30 mm × 30 mm silicon interposer to solve the yield and flexibility limitations of monolithic edge AI designs.
- Efficiency Gain: Achieved a substantial 40.1% overall efficiency gain over basic (non-optimized) chiplet implementations.
- Performance Metrics: Demonstrated a 14.7% latency reduction, 17.3% throughput improvement, and 16.2% power reduction.
- Energy Efficiency Benchmark: Achieved approximately 3.5 mJ per MobileNetV2 inference, an energy-efficiency level suitable for edge applications.
- Real-Time Capability: Maintained sub-5 ms real-time latency across all tested industry-standard and real-time video processing benchmarks.
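As a sanity check, the headline figures above can be related with simple arithmetic. The implied average power is derived here from the reported numbers; it is not stated in the report itself:

```python
# Relating the reported headline metrics. Assumes the 3.5 mJ figure and
# the 5 ms latency bound refer to the same MobileNetV2 workload; the
# implied power figure is our derivation, not a reported number.

energy_per_inference_j = 3.5e-3   # ~3.5 mJ per MobileNetV2 inference
latency_bound_s = 5e-3            # sub-5 ms real-time bound

# Average power implied if one inference spans the full latency budget:
implied_power_w = energy_per_inference_j / latency_bound_s
print(f"implied average power: {implied_power_w:.2f} W")  # 0.70 W

# Inferences per joule, a common edge-AI efficiency metric:
inferences_per_joule = 1.0 / energy_per_inference_j
print(f"{inferences_per_joule:.0f} inferences/J")
```

Under these assumptions the accelerator complex averages well under a watt during inference, consistent with the edge-deployment framing.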
Technical Details
- Interconnect Substrate: Integration is built upon a 30 mm × 30 mm silicon interposer.
- Core Configuration: The SoC includes a 7nm RISC-V CPU chiplet and dual 5nm AI accelerators.
- AI Compute Capacity: The total system provides 30 TOPS (INT8), with each accelerator contributing 15 TOPS.
- Memory: Integrated high-bandwidth memory using 16GB HBM3 stacks.
- Key Architectural Innovations: The design incorporates four primary system-level optimizations:
- Adaptive cross-chiplet Dynamic Voltage and Frequency Scaling (DVFS).
- AI-aware Universal Chiplet Interconnect Express (UCIe) protocol extensions, including streaming flow control units and compression-aware transfers.
- Distributed cryptographic security across heterogeneous chiplets.
- Intelligent sensor-driven load migration.
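The first of these innovations, adaptive cross-chiplet DVFS, is not specified in detail in this summary. As a rough illustration of the general technique, the sketch below steps each chiplet through a shared table of voltage/frequency operating points based on its recent utilization; all thresholds, operating points, and chiplet names are hypothetical:

```python
# Illustrative sketch of per-chiplet DVFS: each chiplet is assigned a
# (voltage, frequency) operating point from its recent utilization.
# All values are invented for illustration; the paper's actual
# controller is not described in this summary.

OPERATING_POINTS = [  # (volts, GHz), lowest to highest
    (0.60, 0.8),
    (0.75, 1.4),
    (0.90, 2.0),
]

def select_point(utilization: float) -> tuple[float, float]:
    """Map a 0..1 utilization sample to an operating point."""
    if utilization < 0.3:
        return OPERATING_POINTS[0]
    if utilization < 0.7:
        return OPERATING_POINTS[1]
    return OPERATING_POINTS[2]

def dvfs_epoch(utilizations: dict[str, float]) -> dict[str, tuple[float, float]]:
    """One DVFS epoch: choose a point for every chiplet independently."""
    return {name: select_point(u) for name, u in utilizations.items()}

# A lightly loaded CPU chiplet is clocked down while a busy accelerator
# is clocked up:
points = dvfs_epoch({"cpu": 0.2, "npu0": 0.9, "npu1": 0.5})
print(points)
```

A real cross-chiplet controller would additionally coordinate points across dies (e.g. to respect a shared interposer power budget), which this per-chiplet sketch omits.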
Implications
- Validation of Chiplets for Advanced Nodes: The results demonstrate that modular chiplet designs can achieve near-monolithic computational density while enabling cost efficiency, scalability, and upgradeability, directly addressing the low manufacturing yields (cited as below 16%) of large 360 mm² monolithic dies at advanced process nodes.
- Advancing RISC-V in Edge AI: By pairing RISC-V (known for customization) with modular AI acceleration, this work solidifies the viability of the RISC-V ecosystem for high-performance, energy-constrained edge computing environments.
- Evolving UCIe Standard: The introduction of AI-aware extensions to the UCIe protocol (specifically streaming and compression features) pushes the industry standard forward, optimizing inter-chiplet communication for data-intensive machine learning workloads.
- Setting a New Efficiency Standard: Achieving 3.5 mJ per inference sets a critical benchmark for energy efficiency in high-performance edge AI, which is essential for mass deployment in devices where power draw dictates operational lifetime and cost.
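The yield argument in the first bullet can be illustrated with the standard Poisson yield model, Y = exp(−A·D0). The defect density below is back-solved from the cited figures (16% yield at 360 mm²); the individual chiplet areas are assumptions chosen only for illustration:

```python
import math

# Standard Poisson yield model: Y = exp(-A * D0), with A in cm^2.
# D0 is back-solved from the report's cited figures; the chiplet area
# split below is a hypothetical example, not from the paper.

A_MONO_CM2 = 3.60                      # 360 mm^2 monolithic die
Y_MONO = 0.16                          # cited monolithic yield (<16%)
D0 = -math.log(Y_MONO) / A_MONO_CM2    # implied defect density, /cm^2

def poisson_yield(area_cm2: float) -> float:
    """Die yield under a Poisson defect distribution."""
    return math.exp(-area_cm2 * D0)

# Hypothetical split: 100 mm^2 CPU chiplet + two 80 mm^2 accelerators.
for area in [1.00, 0.80, 0.80]:
    print(f"{area * 100:.0f} mm^2 chiplet yield: {poisson_yield(area):.1%}")

print(f"implied D0: {D0:.2f} defects/cm^2")
```

Under these assumptions each small die yields roughly 60–67%, versus 16% for the monolithic die, which is the core economic case for disaggregating a large advanced-node design into chiplets.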
Technical Deep Dive Available
This public summary covers the essentials. The Full Report contains exclusive architectural diagrams, performance audits, and deep-dive technical analysis reserved for our members.