Chiplet-Based RISC-V SoC with Modular AI Acceleration

Abstract

This paper introduces a chiplet-based RISC-V system-on-chip (SoC) designed to overcome the low manufacturing yields and inflexibility of monolithic edge AI devices by integrating heterogeneous dies on a 30 mm × 30 mm silicon interposer. The architecture combines a 7 nm RISC-V CPU chiplet with dual 5 nm AI accelerators and 16 GB of HBM3, and introduces four key innovations, including adaptive cross-chiplet DVFS and AI-aware UCIe protocol extensions. These system-level optimizations yield a 40.1% overall efficiency gain, achieving approximately 3.5 mJ per MobileNetV2 inference while maintaining sub-5 ms real-time latency across demanding workloads.

Report

Key Highlights

  • Architectural Solution: Presentation of a chiplet-based RISC-V SoC built on a 30 mm × 30 mm silicon interposer to solve the yield and flexibility limitations of monolithic edge AI designs.
  • Efficiency Gain: Achieved a 40.1% overall efficiency gain compared to previous basic chiplet implementations.
  • Performance Metrics: Demonstrated a 14.7% latency reduction, 17.3% throughput improvement, and 16.2% power reduction.
  • Energy Efficiency Benchmark: Achieved approximately 3.5 mJ per MobileNetV2 inference, an energy budget well suited to edge applications.
  • Real-Time Capability: Maintained sub-5 ms latency across all tested industry-standard and real-time video-processing benchmarks.
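The latency and energy figures above can be cross-checked with simple arithmetic: energy per inference equals average power times inference time. The implied 0.7 W average power below is a back-of-envelope derivation from the summary's numbers, not a value reported by the paper, and the 10 Wh battery is a hypothetical deployment scenario.

```python
# Back-of-envelope check: energy per inference = average power × latency.
# The 5 ms latency bound and 3.5 mJ figure come from the summary; the
# implied average power is derived here, not reported in the source.

latency_s = 5e-3                 # sub-5 ms real-time bound (worst case)
energy_per_inference_j = 3.5e-3  # ~3.5 mJ per MobileNetV2 inference

implied_avg_power_w = energy_per_inference_j / latency_s
print(f"Implied average power: {implied_avg_power_w:.2f} W")  # → 0.70 W

# Deployment framing: inferences available from a small battery.
battery_wh = 10.0                # hypothetical 10 Wh edge battery
battery_j = battery_wh * 3600
inferences = battery_j / energy_per_inference_j
print(f"Inferences per 10 Wh battery: {inferences:.2e}")
```

At the worst-case 5 ms latency, 3.5 mJ per inference corresponds to roughly 0.7 W of average draw, which is consistent with the summary's positioning of the design for power-constrained edge deployment.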

Technical Details

  • Interconnect Substrate: Integration is built on a 30 mm × 30 mm silicon interposer.
  • Core Configuration: The SoC includes a 7 nm RISC-V CPU chiplet and dual 5 nm AI accelerators.
  • AI Compute Capacity: The system provides 30 TOPS (INT8) in total, with each accelerator contributing 15 TOPS.
  • Memory: Integrated high-bandwidth memory using 16 GB of HBM3 stacks.
  • Key Architectural Innovations: The design incorporates four primary system-level optimizations:
    1. Adaptive cross-chiplet Dynamic Voltage and Frequency Scaling (DVFS).
    2. AI-aware Universal Chiplet Interconnect Express (UCIe) protocol extensions, including streaming flow control units and compression-aware transfers.
    3. Distributed cryptographic security across heterogeneous chiplets.
    4. Intelligent sensor-driven load migration.
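The adaptive cross-chiplet DVFS in innovation 1 is not detailed in this summary. A minimal sketch of what a utilization-driven controller could look like follows; the operating points, thresholds, function names, and chiplet labels are all hypothetical illustrations, not the paper's actual mechanism.

```python
# Illustrative sketch of an adaptive cross-chiplet DVFS policy.
# Operating points, thresholds, and chiplet names are hypothetical;
# the paper's actual controller is not described in this summary.

OPERATING_POINTS = [  # (frequency MHz, voltage V), lowest to highest
    (400, 0.60),
    (800, 0.72),
    (1200, 0.85),
    (1600, 1.00),
]

def select_opp(utilization: float) -> tuple[int, float]:
    """Map a chiplet's utilization (0..1) to an operating point."""
    # Step up one operating point per 25% of utilization.
    idx = min(int(utilization * len(OPERATING_POINTS)),
              len(OPERATING_POINTS) - 1)
    return OPERATING_POINTS[idx]

def rebalance(chiplet_util: dict[str, float]) -> dict[str, tuple[int, float]]:
    """Pick a per-chiplet operating point from measured utilizations."""
    return {name: select_opp(u) for name, u in chiplet_util.items()}

# Example: CPU chiplet lightly loaded, both AI accelerators busy.
points = rebalance({"cpu": 0.20, "ai0": 0.90, "ai1": 0.75})
print(points)
```

The cross-chiplet aspect is what distinguishes this from conventional per-die DVFS: a single controller sees utilization for every die on the interposer, so a lightly loaded CPU chiplet can be throttled down while the accelerators run at full speed.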

Implications

  • Validation of Chiplets for Advanced Nodes: The results demonstrate that modular chiplet designs can achieve near-monolithic computational density while enabling cost efficiency, scalability, and upgradeability, directly addressing the low manufacturing yields (cited as below 16%) of large 360 mm² monolithic dies at advanced process nodes.
  • Advancing RISC-V in Edge AI: By pairing RISC-V (known for customization) with modular AI acceleration, this work solidifies the viability of the RISC-V ecosystem for high-performance, energy-constrained edge computing environments.
  • Evolving UCIe Standard: The introduction of AI-aware extensions to the UCIe protocol (specifically streaming and compression features) pushes the industry standard forward, optimizing inter-chiplet communication for data-intensive machine learning workloads.
  • Setting a New Efficiency Standard: Achieving 3.5 mJ per inference sets a critical benchmark for energy efficiency in high-performance edge AI, which is essential for mass deployment in devices where power draw dictates operational lifetime and cost.
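The summary names streaming flow control units and compression-aware transfers as the AI-aware UCIe extensions but does not specify their encoding. The sketch below illustrates one plausible flit-header layout with a compression flag; every field name and width here is an assumption for illustration, not part of the UCIe standard or the paper's actual protocol.

```python
# Hypothetical streaming-flit header for AI-aware UCIe extensions.
# Field names and widths are illustrative; the actual protocol
# extension is not specified in this summary.

STREAM_ID_BITS = 8   # which tensor stream this flit belongs to
COMPRESSED_BIT = 1   # payload carries compressed activations
LENGTH_BITS = 12     # payload length in bytes

def pack_header(stream_id: int, compressed: bool, length: int) -> int:
    """Pack header fields into a single integer word."""
    assert 0 <= stream_id < (1 << STREAM_ID_BITS)
    assert 0 <= length < (1 << LENGTH_BITS)
    word = stream_id
    word = (word << COMPRESSED_BIT) | int(compressed)
    word = (word << LENGTH_BITS) | length
    return word

def unpack_header(word: int) -> tuple[int, bool, int]:
    """Inverse of pack_header."""
    length = word & ((1 << LENGTH_BITS) - 1)
    word >>= LENGTH_BITS
    compressed = bool(word & 1)
    stream_id = word >> COMPRESSED_BIT
    return stream_id, compressed, length

hdr = pack_header(stream_id=3, compressed=True, length=1024)
print(unpack_header(hdr))  # → (3, True, 1024)
```

Tagging each flit with a stream identifier and a compression flag is what would let the receiving chiplet route tensor traffic without deep packet inspection, which is the kind of optimization the summary attributes to the AI-aware extensions.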
