Dustin: A 16-Cores Parallel Ultra-Low-Power Cluster with 2b-to-32b Fully Flexible Bit-Precision and Vector Lockstep Execution Mode

Abstract

Dustin is a 16-core parallel ultra-low-power RISC-V cluster that targets compute-intensive Deep Neural Network (DNN) workloads on edge devices. Its key feature is support for 2- to 32-bit arithmetic in any mixed-precision combination, which reduces memory footprint and raises throughput. The cluster also introduces a Vector Lockstep Execution Mode (VLEM), which broadcasts instructions from one core to the others and cuts power by 38% with less than 3% performance overhead. Implemented in 65 nm CMOS technology, Dustin reaches a peak energy efficiency of 1.15 TOPS/W.

Report

Structured Report: Dustin Ultra-Low-Power Cluster

Key Highlights

  • Core Count and Architecture: A fully programmable compute cluster integrating 16 RISC-V cores.
  • Flexible Precision: Supports arithmetic on operands from 2-bit up to 32-bit, in any mixed-precision combination.
  • Vector Lockstep Execution Mode (VLEM): A novel execution mode that reduces power consumption during highly data-parallel kernels.
  • Power Efficiency: VLEM reduces power by 38% with a performance overhead of less than 3%.
  • Performance Metrics: The cluster delivers a peak performance of 58 GOPS and a peak energy efficiency of 1.15 TOPS/W.
  • Fabrication Technology: Implemented using 65 nm CMOS technology.
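To make the flexible-precision highlight concrete, the following is a minimal sketch of how narrow operands can share a 32-bit word and feed a mixed-precision dot product. It is a software illustration only: the function names, the little-endian lane order, and the restriction to widths that divide 32 are assumptions of this sketch, not a description of Dustin's register layout or instructions.

```python
WORD_BITS = 32

def pack(values, bits):
    """Pack signed integers of width `bits` into 32-bit words (lane 0 in the low bits)."""
    if WORD_BITS % bits != 0:
        raise ValueError("this sketch only handles widths that divide 32")
    lanes = WORD_BITS // bits
    mask = (1 << bits) - 1
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    words = []
    for start in range(0, len(values), lanes):
        word = 0
        for lane, value in enumerate(values[start:start + lanes]):
            if not lo <= value <= hi:
                raise ValueError(f"{value} does not fit in {bits} signed bits")
            word |= (value & mask) << (lane * bits)
        words.append(word)
    return words

def unpack(words, bits, count):
    """Recover the first `count` signed values from packed 32-bit words."""
    lanes = WORD_BITS // bits
    mask = (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    out = []
    for word in words:
        for lane in range(lanes):
            raw = (word >> (lane * bits)) & mask
            out.append(raw - (1 << bits) if raw & sign_bit else raw)
    return out[:count]

def mixed_dot(act_words, act_bits, wgt_words, wgt_bits, count):
    """Dot product of two packed vectors that use different operand widths."""
    acts = unpack(act_words, act_bits, count)
    wgts = unpack(wgt_words, wgt_bits, count)
    return sum(a * w for a, w in zip(acts, wgts))

# Hypothetical data: 8-bit activations against 2-bit weights (range -2..1).
activations = [1, -2, 3, 4, -5, 6, 7, -8]
weights = [1, -1, 0, -2, 1, 1, -2, 0]
act_words = pack(activations, 8)   # 4 lanes per word
wgt_words = pack(weights, 2)       # 16 lanes per word
result = mixed_dot(act_words, 8, wgt_words, 2, len(activations))
```

In this example the eight 2-bit weights fit in a single word while the eight 8-bit activations need two, which is the footprint saving that narrow operands are meant to provide.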

Technical Details

  • Target Application: Optimized specifically for computationally intensive algorithms like Deep Neural Networks (DNNs) on resource-constrained, battery-powered edge devices.
  • Standard Execution: Supports the conventional Multiple-Instruction Multiple-Data (MIMD) processing paradigm.
  • VLEM Mechanism: In VLEM, a single designated core acts as the "leader," fetching instructions and broadcasting them to the remaining 15 "follower" cores.
  • Power Optimization Technique: VLEM saves power by clock-gating the Instruction Fetch (IF) stages and private caches of the follower cores, which are not needed while the leader fetches on their behalf.
  • Precision Range: The cluster handles integer operands from 2b to 32b, including mixed-precision combinations, which is essential for efficient quantization strategies in AI models.
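The leader/follower mechanism described above can be sketched as a small behavioral model. It only counts instruction fetches to show why follower fetch stages can be gated; it is not a power model, and the 38% figure comes from the source, not from this code. The `Core` class, the `("mac", i)` instruction format, and the sample data are hypothetical.

```python
from dataclasses import dataclass

NUM_CORES = 16  # from the text: 1 leader plus 15 followers

@dataclass
class Core:
    core_id: int
    acc: int = 0      # accumulator register
    fetches: int = 0  # instruction fetches issued by this core

def execute(core, instr, data):
    """Apply one ("mac", i) instruction to this core's private data slice."""
    op, i = instr
    if op != "mac":
        raise ValueError(f"unsupported op: {op}")
    a, w = data[core.core_id][i]
    core.acc += a * w

def run_mimd(program, data):
    """Conventional mode: every core fetches every instruction itself."""
    cores = [Core(c) for c in range(NUM_CORES)]
    for core in cores:
        for instr in program:
            core.fetches += 1
            execute(core, instr, data)
    return cores

def run_vlem(program, data):
    """Lockstep mode: only the leader fetches, then broadcasts to all cores."""
    cores = [Core(c) for c in range(NUM_CORES)]
    leader = cores[0]
    for instr in program:
        leader.fetches += 1
        for core in cores:  # broadcast: same instruction, per-core data
            execute(core, instr, data)
    return cores

# Hypothetical data-parallel kernel: each core runs the same 4 MACs on its own data.
program = [("mac", i) for i in range(4)]
data = [[(c + 1, i - 1) for i in range(4)] for c in range(NUM_CORES)]
mimd = run_mimd(program, data)
vlem = run_vlem(program, data)
```

Both modes produce the same accumulators, but the lockstep run issues one fetch per instruction instead of sixteen. The model assumes every core executes an identical instruction stream, which is why the mode suits highly data-parallel kernels.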

Implications

  • Edge AI Acceleration: Dustin addresses three constraints of deploying AI models at the far edge: memory footprint, throughput, and energy use.
  • RISC-V Ecosystem Advancement: Dustin shows that the RISC-V ISA can serve as the foundation for highly parallel, specialized processing clusters that integrate power-saving techniques like VLEM.
  • Energy Efficiency Benchmark: A peak of 1.15 TOPS/W sets a reference point for energy efficiency in programmable edge AI processors implemented in mature 65 nm technology.
  • Future of Quantization: Fully flexible 2b-to-32b precision supports research in mixed-precision and aggressive quantization (e.g., 2-bit or ternary networks), increasing computational density and reducing memory bandwidth requirements in a single programmable solution.
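The memory-footprint argument can be checked with simple arithmetic. The sketch below compares weight storage for a uniform 8-bit model against a mixed-precision assignment. The layer names, weight counts, and bit-width choices are hypothetical and are not taken from the Dustin paper.

```python
def weight_bytes(num_weights, bits):
    """Bytes needed to store `num_weights` values of `bits` bits each, densely packed."""
    return (num_weights * bits + 7) // 8

def footprint(layers):
    """Total weight storage in bytes for a list of (name, num_weights, bits) layers."""
    return sum(weight_bytes(n, bits) for _, n, bits in layers)

# Hypothetical network: (layer name, number of weights, weight bit-width).
mixed_layers = [
    ("conv1", 864, 8),     # first layer kept at 8-bit
    ("conv2", 18432, 4),   # middle layer quantized to 4-bit
    ("fc", 10240, 2),      # classifier quantized to 2-bit
]
uniform_layers = [(name, n, 8) for name, n, _ in mixed_layers]

mixed_total = footprint(mixed_layers)
uniform_total = footprint(uniform_layers)
saving = 1 - mixed_total / uniform_total
```

For these made-up layer sizes the mixed assignment needs 12640 bytes against 29536 bytes at uniform 8-bit, a reduction of roughly 57%. Whether a given network tolerates such bit-widths is an accuracy question that this arithmetic does not address.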