Decentor-V: Lightweight ML Training on Low-Power RISC-V Edge Devices

Abstract

This paper introduces Decentor-V, a framework that enables lightweight machine learning training directly on low-power RISC-V edge devices, overcoming the architectural limitation imposed by the absence of dedicated Floating-Point Units (FPUs). While adapting the L-SGD algorithm to this platform, the authors observed severe performance degradation with standard 32-bit floating-point arithmetic, which must be emulated in software on FPU-less cores. An optimized 8-bit quantized implementation of L-SGD mitigates this limitation, delivering a 4x reduction in memory usage and a 2.2x speedup in training time with minimal accuracy loss.

Report

Key Highlights

  • On-Device Training on RISC-V: Decentor-V enables resource-intensive ML training directly on low-power RISC-V Microcontroller Units (MCUs), an architecture previously lacking robust training support.
  • Federated Learning Enabler: The approach facilitates decentralized and collaborative training, addressing privacy concerns and reducing dependency on constant cloud connectivity.
  • L-SGD Optimization: The work successfully extends the lightweight L-SGD algorithm to the RISC-V platform.
  • Quantization Success: An 8-bit quantized implementation of L-SGD yielded substantial efficiency improvements over standard 32-bit floating-point arithmetic (a sketch of such an integer-only update appears after this list).
  • Performance Gains: The optimized 8-bit solution achieved a nearly 4x reduction in memory usage and a 2.2x speedup in training time.
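
To make the quantization point concrete, the following is a minimal sketch of an integer-only SGD weight update of the kind an 8-bit L-SGD implementation could use on an FPU-less core. The function name sgd_step_int8, the fixed-point learning rate, and the clamping scheme are illustrative assumptions; this summary does not specify Decentor-V's actual quantization format.

    #include <stdint.h>
    #include <stddef.h>

    /* Fixed-point learning rate: LR_NUM / (1 << LR_SHIFT) ~= 0.1 (assumed value). */
    #define LR_NUM   13
    #define LR_SHIFT 7

    static inline int8_t clamp_i8(int32_t v)
    {
        if (v > 127)  return 127;
        if (v < -128) return -128;
        return (int8_t)v;
    }

    /* One SGD step over an int8 weight vector using int8 gradients.
     * All arithmetic stays in integer registers, so no FPU (and no
     * soft-float emulation) is required on an FPU-less RISC-V MCU. */
    void sgd_step_int8(int8_t *w, const int8_t *grad, size_t n)
    {
        for (size_t i = 0; i < n; i++) {
            /* Widen to 32 bits, apply the fixed-point learning rate,
             * and round before the arithmetic shift. */
            int32_t step = ((int32_t)grad[i] * LR_NUM + (1 << (LR_SHIFT - 1)))
                           >> LR_SHIFT;
            w[i] = clamp_i8((int32_t)w[i] - step);
        }
    }

Because every operation maps to native integer instructions, the update avoids the soft-float library calls that dominate the FP32 path on FPU-less cores.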

Technical Details

  • Core Algorithm: Lightweight Stochastic Gradient Descent (L-SGD), previously validated on Arm Cortex-M platforms, is ported and optimized for RISC-V.
  • Architectural Constraint: Performance evaluation identified the lack of dedicated Floating-Point Units (FPUs) in low-power RISC-V MCUs as the primary bottleneck for 32-bit floating-point arithmetic on the target platforms.
  • Mitigation Strategy: The key innovation is the shift from FP32 to an 8-bit quantized implementation designed to bypass the software-emulation overhead of floating-point operations (see the sketch after this list).
  • Evaluation Baseline: Performance comparisons were made against both Arm and RISC-V platforms using 32-bit floating-point arithmetic.
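
As a rough illustration of why the missing FPU dominates, the sketch below contrasts an FP32 dot product (the kernel at the heart of both forward and gradient computations) with an int8 version; the function names are ours, not from the paper. On an FPU-less RV32 core, every float multiply and add in the first loop is lowered to a soft-float runtime call (e.g. __mulsf3, __addsf3), while the second loop compiles to plain integer MUL/ADD instructions.

    #include <stdint.h>
    #include <stddef.h>

    /* FP32 dot product: without an FPU, each * and + becomes a call
     * into the compiler's soft-float runtime. */
    float dot_fp32(const float *a, const float *b, size_t n)
    {
        float acc = 0.0f;
        for (size_t i = 0; i < n; i++)
            acc += a[i] * b[i];
        return acc;
    }

    /* int8 dot product with a 32-bit accumulator: integer-only,
     * no emulation overhead. */
    int32_t dot_int8(const int8_t *a, const int8_t *b, size_t n)
    {
        int32_t acc = 0;
        for (size_t i = 0; i < n; i++)
            acc += (int32_t)a[i] * (int32_t)b[i];
        return acc;
    }

The int8 representation also quarters the storage needed per weight and activation (1 byte instead of 4), which is consistent with the reported memory reduction.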

Implications

  • Maturation of RISC-V Edge ML: This work provides essential software infrastructure for deep learning training on the open RISC-V architecture, pushing it toward parity with more established platforms like Arm Cortex-M in edge AI capabilities.
  • Enhanced Privacy and Autonomy: By making lightweight training feasible locally, Decentor-V promotes true edge intelligence and enables practical adoption of privacy-preserving Federated Learning models in IoT deployments.
  • Efficiency Standard: The 4x memory reduction and 2.2x training speedup achieved with 8-bit quantization set a new benchmark for resource-efficient training on extremely constrained, FPU-less hardware, opening the door to more complex models on battery-powered devices.
