A near-threshold RISC-V core with DSP extensions for scalable IoT Endpoint Devices

Abstract

This paper presents the design of an open-source RISC-V processor core optimized for Near-Threshold (NT) operation within scalable multi-core clusters targeting energy-constrained IoT endpoint devices. The core integrates instruction extensions for Digital Signal Processing (DSP), including SIMD dot-products, and utilizes a smart L0 buffer to significantly reduce pressure on the shared memory hierarchy. These microarchitectural improvements yield substantial energy efficiency gains, achieving a peak efficiency of 193 MOPS/mW in a low-power 28nm FDSOI process.

Report

Structured Report: A near-threshold RISC-V core with DSP extensions for scalable IoT Endpoint Devices

Key Highlights

  • NT RISC-V Core: The core is designed specifically for Near-Threshold (NT) operation, which raises energy efficiency for IoT endpoints that must run within a tight power envelope of a few milliwatts.
  • Performance & Efficiency Gains: For typical data-intensive sensor processing workloads, the proposed core is on average 3.5x faster and 3.2x more energy-efficient, thanks to the smart L0 buffer, which reduces cache access contentions, and to support for compressed instructions. The abstract does not name the reference design for this comparison.
  • High Energy Efficiency: A four-core cluster reaches a peak efficiency of 67 MOPS/mW in 65nm bulk CMOS. In 28nm FDSOI technology the peak rises to 193 MOPS/mW (at 40MHz, 1mW).
  • Memory Optimization: SIMD extensions such as dot-products, combined with built-in L0 storage, reduce shared memory accesses by 8x, which in turn reduces contentions by 3.2x.
  • Scalability: Performance scales through multi-core clustering, and the four-core cluster is operational across a wide supply range, from 0.6V to 1.2V.
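The efficiency figures above can be converted into throughput and energy per operation with simple arithmetic. The sketch below assumes that MOPS/mW means throughput divided by power at a single operating point, and that 1mW is the total power at which the 28nm peak was measured. The summary does not say whether that throughput is per core or per cluster, or how operations are counted, so the derived numbers are only indicative. The function names are illustrative.

```python
# Back-of-envelope conversions for the efficiency figures quoted in the summary.
# Assumption: MOPS/mW = throughput / power, both taken at the same operating point.

def throughput_mops(efficiency_mops_per_mw, power_mw):
    """Throughput (MOPS) implied by an efficiency figure at a given power."""
    return efficiency_mops_per_mw * power_mw

def ops_per_cycle(mops, freq_mhz):
    """Average operations per clock cycle (the 'mega' prefixes cancel)."""
    return mops / freq_mhz

def energy_per_op_pj(efficiency_mops_per_mw):
    """Energy per operation: 1 mW / 1 MOPS = 1e-3 W / 1e6 op/s = 1e-9 J = 1000 pJ."""
    return 1000.0 / efficiency_mops_per_mw

# 28nm FDSOI peak, assuming 1 mW is the total power at the 40 MHz operating point.
mops_28nm = throughput_mops(193, 1.0)
print(f"28nm: {mops_28nm:.0f} MOPS, {ops_per_cycle(mops_28nm, 40):.1f} ops/cycle at 40 MHz")
print(f"energy/op: {energy_per_op_pj(193):.1f} pJ (28nm) vs {energy_per_op_pj(67):.1f} pJ (65nm)")
```

Under these assumptions, the 28nm point corresponds to roughly 193 MOPS, about 4.8 operations per cycle at 40MHz, and about 5.2pJ per operation. The 65nm figure of 67 MOPS/mW corresponds to about 14.9pJ per operation.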

Technical Details

  • Architecture: Open-source RISC-V processor core designed for operation in tightly coupled multi-core clusters.
  • Instruction Extensions: DSP-oriented instruction extensions, including SIMD operations such as dot-products, aimed at increasing computational density.
  • Microarchitectural Features: Support for compressed instructions and a smart L0 buffer that reduces access contention on the shared cache, easing pressure on the shared memory hierarchy.
  • Technology Metrics (65nm): A four-core cluster operating in 65nm bulk CMOS achieves a peak efficiency of 67 MOPS/mW.
  • Technology Metrics (28nm): In a low-power 28nm FDSOI process, peak efficiency rises to 193 MOPS/mW, measured at a 40MHz operating frequency and 1mW of power.
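The following is a minimal bit-level model of a packed "sum of dot products" operation that makes the SIMD dot-product idea concrete. It assumes 32-bit registers holding four signed 8-bit lanes, with lane 0 in the low byte. It also assumes an operation that multiplies the lanes pairwise and adds the four products to a 32-bit accumulator that wraps on overflow. The lane width, wrap-around behavior, and function names (`pack4_i8`, `sdot4_i8`, and so on) are assumptions for illustration, not the paper's instruction definitions.

```python
# Hypothetical bit-level model of a packed-SIMD "sum of dot products" step.
# Assumptions (not from the paper): 32-bit registers, four signed 8-bit lanes,
# lane 0 in the least significant byte, 32-bit accumulator that wraps on overflow.

def pack4_i8(values):
    """Pack four signed 8-bit integers into one 32-bit word."""
    if len(values) != 4 or not all(-128 <= v <= 127 for v in values):
        raise ValueError("expected four values in [-128, 127]")
    word = 0
    for lane, v in enumerate(values):
        word |= (v & 0xFF) << (8 * lane)  # two's-complement byte placed in its lane
    return word

def lane_i8(word, lane):
    """Extract one 8-bit lane and sign-extend it."""
    b = (word >> (8 * lane)) & 0xFF
    return b - 256 if b & 0x80 else b

def wrap_i32(x):
    """Wrap an integer to signed 32 bits, as a 32-bit register would."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x

def sdot4_i8(acc, a_word, b_word):
    """One packed step: acc plus the sum of the four lane-wise products."""
    return wrap_i32(acc + sum(lane_i8(a_word, i) * lane_i8(b_word, i) for i in range(4)))

def dot_simd(a, b):
    """Dot product of int8 vectors using one packed step per four elements."""
    if len(a) != len(b) or len(a) % 4:
        raise ValueError("vectors must have equal length, a multiple of 4")
    acc = 0
    for i in range(0, len(a), 4):
        acc = sdot4_i8(acc, pack4_i8(a[i:i + 4]), pack4_i8(b[i:i + 4]))
    return acc

def dot_scalar(a, b):
    """Reference result: one multiply-accumulate per element."""
    return wrap_i32(sum(x * y for x, y in zip(a, b)))

a = [1, -2, 3, 4, 127, -128, 5, 6]
b = [7, 8, -9, 10, -1, 2, 3, -4]
print(dot_simd(a, b), dot_scalar(a, b))  # prints: -388 -388
```

With eight-element vectors, the packed path needs two steps where the scalar path needs eight multiply-accumulates. Each packed operand is also a single 32-bit word, so four values arrive per load when the data is stored packed. This is one way such extensions can raise computational density and cut memory traffic. The paper's exact accounting is not reproduced here.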

Implications

  • Democratization of Low-Power Design: By providing an open-source, energy-optimized RISC-V core, the work lowers the barrier to entry for developing highly efficient edge and IoT hardware.
  • Addressing IoT Power Constraints: The NT operation and exceptional energy efficiency (up to 193 MOPS/mW) directly address the critical need for ultra-low-power processing necessary for long-lasting, battery-powered IoT endpoint devices.
  • Validation of RISC-V Extensibility: The successful integration of custom DSP and SIMD extensions (like dot-products) demonstrates the strength of the RISC-V ISA for building highly specialized, domain-specific processor cores without requiring entirely new instruction sets.
  • Future Multi-Core Edge Systems: The focus on scalability and multi-core clustering suggests a path for deploying high-performance systems capable of complex tasks (like local machine learning or advanced sensor fusion) while staying within restrictive milliwatt power budgets.
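RISC-V supports this kind of extension in two ways. It reserves major opcodes (custom-0 among them) for non-standard instructions. It also encodes instruction length in the low bits, so 16-bit compressed and 32-bit instructions can be mixed in one stream. The sketch below encodes an R-type instruction into the custom-0 opcode space and classifies instruction length from the first 16-bit parcel. The funct3/funct7 values, register numbers, and helper names are hypothetical. The summary does not give the encodings the paper actually uses.

```python
# Sketch: encoding a hypothetical custom instruction in RISC-V's custom-0 opcode space.
# The funct3/funct7 values used below are made up for illustration.

CUSTOM_0 = 0b0001011  # major opcode reserved for custom extensions (0x0B)

# R-type fields from bit 0 upward: opcode | rd | funct3 | rs1 | rs2 | funct7
FIELDS = (("opcode", 7), ("rd", 5), ("funct3", 3), ("rs1", 5), ("rs2", 5), ("funct7", 7))

def encode_r_type(opcode, rd, funct3, rs1, rs2, funct7):
    """Pack R-type fields into a 32-bit instruction word."""
    values = dict(opcode=opcode, rd=rd, funct3=funct3, rs1=rs1, rs2=rs2, funct7=funct7)
    word, shift = 0, 0
    for name, width in FIELDS:
        v = values[name]
        if not 0 <= v < (1 << width):
            raise ValueError(f"{name}={v} does not fit in {width} bits")
        word |= v << shift
        shift += width
    return word

def decode_r_type(word):
    """Split a 32-bit word back into its R-type fields."""
    fields, shift = {}, 0
    for name, width in FIELDS:
        fields[name] = (word >> shift) & ((1 << width) - 1)
        shift += width
    return fields

def instruction_length_bytes(first_parcel):
    """Length from the low bits of the first 16-bit parcel: 2 (compressed), 4, or None."""
    if (first_parcel & 0b11) != 0b11:
        return 2  # compressed (16-bit) instruction
    if ((first_parcel >> 2) & 0b111) != 0b111:
        return 4  # standard 32-bit instruction
    return None  # 48-bit and longer encodings are not modeled here

# Hypothetical custom op: rd = x10, rs1 = x11, rs2 = x12, with made-up funct values.
insn = encode_r_type(CUSTOM_0, rd=10, funct3=0b000, rs1=11, rs2=12, funct7=0b0000001)
print(hex(insn), decode_r_type(insn), instruction_length_bytes(insn & 0xFFFF))
```

Because the custom opcode has its low two bits set to `11`, a decoder treats it as an ordinary 32-bit instruction. It can coexist with compressed code without any change to the length-decoding logic.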