A near-threshold RISC-V core with DSP extensions for scalable IoT Endpoint Devices
Abstract
This paper presents the design of an open-source RISC-V processor core optimized for Near-Threshold (NT) operation in scalable multi-core clusters, targeting energy-constrained IoT endpoint devices. The core adds instruction extensions for Digital Signal Processing (DSP), including SIMD dot-products, and uses a smart L0 buffer to reduce pressure on the shared memory hierarchy. In a low-power 28nm FDSOI process, the design reaches a peak efficiency of 193 MOPS/mW.
Report
Key Highlights
- NT RISC-V Core: The core is designed specifically for Near-Threshold (NT) operation, targeting the high energy efficiency that IoT endpoints need when operating under a tight power envelope of a few milliwatts.
- Performance & Efficiency Gains: For typical data-intensive sensor processing workloads, the proposed core is 3.5x faster and 3.2x more energy-efficient than prior art.
- High Energy Efficiency: The design achieves a peak efficiency of 67 MOPS/mW in 65nm bulk CMOS and 193 MOPS/mW in 28nm FDSOI technology, the latter reported at 40MHz and 1mW.
- Memory Optimization: A built-in L0 storage (smart L0 buffer) reduces shared memory accesses by 8x and reduces contention by 3.2x.
- Scalability: The architecture supports multi-core clustering and operates effectively across a wide voltage range, from 0.6V to 1.2V.
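The efficiency figures above are easier to compare when restated as energy per operation. The following is a minimal sketch of that arithmetic, not code from the paper: the function names are illustrative, and it assumes that "MOPS" means 10^6 operations per second and that the 28nm figure corresponds to exactly 40MHz and 1mW.

```python
def picojoules_per_op(mops_per_mw):
    """Convert an efficiency in MOPS/mW into energy per operation in pJ.

    1 MOPS/mW = 1e6 ops/s per 1e-3 W = 1e9 ops per joule,
    so one operation costs 1e-9 / x joules, i.e. 1000 / x picojoules.
    """
    return 1000.0 / mops_per_mw


def ops_per_cycle(mops_per_mw, power_mw, freq_mhz):
    """Aggregate operations per clock cycle implied by an operating point.

    Throughput in MOPS is efficiency times power; dividing by the clock
    frequency in MHz gives operations per cycle, summed over however many
    cores the measurement covers (an assumption of this sketch).
    """
    return mops_per_mw * power_mw / freq_mhz


if __name__ == "__main__":
    for label, eff in (("65nm bulk CMOS", 67.0), ("28nm FDSOI", 193.0)):
        print(f"{label}: {picojoules_per_op(eff):.2f} pJ/op")
    print(f"28nm FDSOI at 40MHz, 1mW: {ops_per_cycle(193.0, 1.0, 40.0):.3f} ops/cycle")
```

Under these assumptions, 67 MOPS/mW corresponds to roughly 14.9 pJ per operation and 193 MOPS/mW to roughly 5.2 pJ per operation. The 28nm operating point implies about 4.8 operations per clock cycle in aggregate.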
Technical Details
- Architecture: Open-source RISC-V processor core designed for operation in tightly coupled multi-core clusters.
- Instruction Extensions: Includes custom instruction extensions, notably SIMD dot-products, and general DSP extensions aimed at increasing computational density.
- Microarchitectural Features: Key optimizations include support for compressed instructions and the implementation of a smart L0 buffer to manage data locality and reduce stress on the shared cache.
- Technology Metrics (65nm): A four-core cluster operating in 65nm bulk CMOS achieves a peak efficiency of 67 MOPS/mW.
- Technology Metrics (28nm): In a low-power 28nm FDSOI process, the peak efficiency rises to 193 MOPS/mW, reported at a 40MHz operating frequency and 1mW of power.
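To illustrate what a SIMD dot-product instruction computes, here is a bit-accurate reference model in Python. It is a sketch under stated assumptions, not the core's actual instruction definition: this summary does not give operand widths or mnemonics, so the 4x8-bit signed lane layout, the 32-bit wrapping accumulator, and the name `sdot4_ref` are all hypothetical.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        return value - (1 << bits)
    return value


def pack_i8x4(lanes):
    """Pack four signed 8-bit values into one 32-bit word, lane 0 in the low byte."""
    word = 0
    for lane, v in enumerate(lanes):
        word |= (v & 0xFF) << (8 * lane)
    return word


def unpack_i8x4(word):
    """Split a 32-bit word into four signed 8-bit lanes, lane 0 first."""
    return [to_signed(word >> (8 * lane), 8) for lane in range(4)]


def sdot4_ref(acc, a, b):
    """Hypothetical packed dot product with accumulate.

    Multiplies the four signed 8-bit lanes of `a` and `b` pairwise, adds the
    products to `acc`, and wraps the result to a signed 32-bit value, as a
    32-bit destination register would.
    """
    total = acc + sum(x * y for x, y in zip(unpack_i8x4(a), unpack_i8x4(b)))
    return to_signed(total, 32)


if __name__ == "__main__":
    a = pack_i8x4([1, -2, 3, -4])
    b = pack_i8x4([5, 6, -7, 8])
    print(sdot4_ref(100, a, b))
```

In this model a single operation performs four multiplications and four additions on operands that arrive in two 32-bit words. This is one way such an extension can raise computational density while also cutting the number of data-memory accesses per element.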
Implications
- Democratization of Low-Power Design: By providing an open-source, energy-optimized RISC-V core, the work lowers the barrier to entry for developing highly efficient edge and IoT hardware.
- Addressing IoT Power Constraints: NT operation and an energy efficiency of up to 193 MOPS/mW directly address the need for ultra-low-power processing in long-lasting, battery-powered IoT endpoint devices.
- Validation of RISC-V Extensibility: The successful integration of custom DSP and SIMD extensions (such as dot-products) shows that the RISC-V ISA can be specialized for domain-specific workloads without requiring an entirely new instruction set.
- Future Multi-Core Edge Systems: The focus on scalability and multi-core clustering suggests a path for deploying high-performance systems capable of complex tasks (like local machine learning or advanced sensor fusion) while staying within restrictive milliwatt power budgets.