A Heterogeneous In-Memory Computing Cluster For Flexible End-to-End Inference of Real-World Deep Neural Networks

Abstract

This paper presents a heterogeneous, tightly-coupled clustered architecture integrating 8 RISC-V cores, an analog In-Memory Computing (IMC) accelerator, and digital accelerators, designed for flexible end-to-end deep neural network (DNN) inference in TinyML devices. The system overcomes the functional limitations of pure IMC by coupling it with programmable RISC-V elements, enabling practical deployment of real-world DNNs like MobileNetV2. Benchmarks show significant improvements, achieving 11.5x performance and 9.5x energy efficiency gains over highly optimized parallel execution on the standard cores alone.

Report

A Heterogeneous In-Memory Computing Cluster Analysis

Key Highlights

  • Heterogeneous Architecture: The system uses a tightly-coupled clustered architecture that combines general-purpose RISC-V cores with specialized accelerators (a minimal sketch of this split follows this list).
  • Core Components: It integrates 8 RISC-V cores, an In-Memory Computing Accelerator (IMA) using Analog Non-Volatile Memory (NVM), and dedicated digital accelerators.
  • Performance Metrics (Bottleneck Layer): Achieves 11.5x performance improvement and 9.5x energy efficiency improvement compared to highly optimized parallel execution solely on the cores.
  • End-to-End Latency: For complete MobileNetV2 inference, the solution delivers an order-of-magnitude improvement in execution latency over existing programmable architectures.
  • Target Application: Designed specifically to address the high computational energy efficiency required for TinyML tasks on battery-constrained IoT devices, handling highly heterogeneous workloads.
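
To make the division of labor concrete, the sketch below (plain C, with the accelerator stubbed as an ordinary function) shows how a MobileNetV2 inverted-bottleneck block could be split across such a cluster: the 1x1 pointwise convolutions reduce to matrix-vector products and map naturally onto the analog IMC array, while the depthwise stage, which does not fit the crossbar's matrix-multiply model, stays on the RISC-V cores or digital accelerators. All function names and dimensions are illustrative assumptions, not the paper's API.

```c
/* Illustrative sketch: IMA/core split for one MobileNetV2
 * inverted-bottleneck block. The "IMA" here is a plain C stub so the
 * example compiles and runs; in the real system the weights live in
 * the NVM crossbar and the multiply happens in the analog domain. */
#include <stdio.h>

enum { CH_IN = 4, CH_EXP = 8 };   /* toy channel counts */

/* Stand-in for the analog IMC array: a 1x1 pointwise conv over one
 * pixel is exactly a matrix-vector product. */
static void ima_matvec(const float *w, const float *x, float *y,
                       int rows, int cols)
{
    for (int r = 0; r < rows; r++) {
        float acc = 0.0f;
        for (int c = 0; c < cols; c++)
            acc += w[r * cols + c] * x[c];
        y[r] = acc;
    }
}

/* Work that does not map onto the crossbar (standing in for the 3x3
 * depthwise conv) stays on the RISC-V cores / digital accelerator. */
static void core_depthwise(float *x, int n)
{
    for (int i = 0; i < n; i++)
        x[i] *= 0.5f;
}

int main(void)
{
    float w_expand[CH_EXP * CH_IN], w_project[CH_IN * CH_EXP];
    float in[CH_IN], mid[CH_EXP], out[CH_IN];

    for (int i = 0; i < CH_EXP * CH_IN; i++) {
        w_expand[i]  = 0.01f * (float)i;
        w_project[i] = 0.02f * (float)(i % 7);
    }
    for (int i = 0; i < CH_IN; i++)
        in[i] = 1.0f;

    /* Inverted-bottleneck pipeline:
     * 1x1 expand (IMA) -> depthwise (cores) -> 1x1 project (IMA). */
    ima_matvec(w_expand, in, mid, CH_EXP, CH_IN);
    core_depthwise(mid, CH_EXP);
    ima_matvec(w_project, mid, out, CH_IN, CH_EXP);

    for (int i = 0; i < CH_IN; i++)
        printf("out[%d] = %f\n", i, out[i]);
    return 0;
}
```

The residual shortcut and activations of the real block are omitted for brevity; the point is only the alternation between crossbar-friendly and crossbar-unfriendly stages, which is what makes a programmable companion to the IMA necessary.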

Technical Details

  • Architectural Focus: The key innovation is addressing the system-level challenge of integrating inherently inflexible Analog IMC arrays into a functionally flexible, programmable ecosystem.
  • IMC Role: Analog IMC arrays utilizing NVM serve as both the computation engine for matrix multiplication (DNN inference) and the on-chip memory storage for DNN weights.
  • Clustering: The components (RISC-V cores, IMA, digital accelerators) are organized in a tightly-coupled clustered structure, facilitating fast data movement and synchronization.
  • Benchmark Workload: Performance evaluation was conducted using the complex and highly heterogeneous Bottleneck layer of a MobileNetV2 DNN, and subsequently on the full end-to-end MobileNetV2 network.
  • Scalability: The architecture is designed to be scalable, exploring the requirements of end-to-end inference by scaling up to a multi-array accelerator configuration (see the tiling sketch below).
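
The scalability point can be made concrete with a small tiling exercise: once a layer's weight matrix exceeds the capacity of a single crossbar, it must be partitioned into array-sized tiles, and partial sums from tiles sharing the same output rows must be accumulated digitally. The C sketch below (all dimensions are illustrative assumptions, not taken from the paper) counts the arrays such a layer would need and emulates the tiled multiply.

```c
/* Illustrative sketch: tiling one layer's weight matrix across
 * several fixed-size crossbar arrays. Dimensions are made up. */
#include <stdio.h>
#include <string.h>

enum { ROWS = 96, COLS = 160, ARR = 64 };  /* weight matrix vs. array size */

static float W[ROWS][COLS], x[COLS], y[ROWS];

int main(void)
{
    for (int r = 0; r < ROWS; r++)
        for (int c = 0; c < COLS; c++)
            W[r][c] = 0.001f * (float)((r + c) % 11);
    for (int c = 0; c < COLS; c++)
        x[c] = 1.0f;

    int tiles_r = (ROWS + ARR - 1) / ARR;  /* ceil(96/64)  = 2 */
    int tiles_c = (COLS + ARR - 1) / ARR;  /* ceil(160/64) = 3 */
    printf("layer needs %d x %d = %d crossbar arrays\n",
           tiles_r, tiles_c, tiles_r * tiles_c);

    memset(y, 0, sizeof y);
    /* Each (tr, tc) tile is one array's job; column tiles in the same
     * row band produce partial sums that must be added digitally. */
    for (int tr = 0; tr < tiles_r; tr++)
        for (int tc = 0; tc < tiles_c; tc++)
            for (int r = tr * ARR; r < ROWS && r < (tr + 1) * ARR; r++)
                for (int c = tc * ARR; c < COLS && c < (tc + 1) * ARR; c++)
                    y[r] += W[r][c] * x[c];

    printf("y[0] = %f\n", y[0]);
    return 0;
}
```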

Implications

  • Validation of RISC-V in Heterogeneous Edge: This work makes a strong case for the RISC-V instruction set architecture as a programmable backbone for energy-constrained, highly specialized edge computing platforms (TinyML/IoT).
  • Enabling Emerging Technologies: RISC-V cores provide the crucial functional flexibility necessary to harness the massive energy efficiency potential of emerging memory technologies, such as Analog NVM-based IMC, which often lack the flexibility needed for complex real-world operations.
  • Custom Accelerator Integration: The success of this tightly-coupled cluster showcases the ease and efficiency with which RISC-V ecosystems can integrate specialized accelerators (IMA and digital units) for maximum efficiency, leveraging RISC-V's custom instruction capabilities and lightweight core design.
  • Competitive Performance: By achieving an order-of-magnitude latency improvement over competing heterogeneous IMC solutions, this RISC-V-based design positions the open ISA as a strong contender for high-performance, ultra-low-power DNN inference acceleration.