KWT-Tiny: RISC-V Accelerated, Embedded Keyword Spotting Transformer

Abstract

This paper presents KWT-Tiny, an aggressively quantized and retrained Keyword Transformer (KWT) model adapted for bare-metal RISC-V edge devices operating under a strict 64kB RAM constraint. Model optimization reduced the size from 2.42 MB to 1.65 kB (a 369x reduction) at the cost of only a 10% loss in accuracy. Crucially, custom RISC-V instructions accelerating the GELU and SoftMax operations delivered a 5x inference speedup and a corresponding reduction in power consumption.

Report

Key Highlights

  • Extreme Model Reduction: The KWT-1 model was retrained and quantized, achieving a 369x reduction in size, dropping from 2.42 MB to just 1.65 kB.
  • RISC-V Acceleration: Custom RISC-V instructions were developed specifically to accelerate the GELU and SoftMax operations, which are critical to Transformer performance (a quantized software baseline for GELU is sketched after this list).
  • Performance Gain: The hardware acceleration yielded a 5x speedup and a corresponding ~5x power reduction during inference.
  • Efficiency Metric: Inference clock cycle counts decreased dramatically from 26 million to 5.5 million.
  • Resource Constraint: The entire bare-metal implementation successfully targeted devices with only 64kB of available RAM.
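
The summary does not describe how GELU and SoftMax were computed in software before acceleration; in an int8-quantized pipeline, one common baseline is a lookup-table GELU. The C sketch below is illustrative only, under that assumption: the function names (gelu_lut_init, gelu_q8) and the scale parameters are hypothetical, not identifiers from the paper.

```c
#include <stdint.h>
#include <math.h>

/* Illustrative sketch only: an int8 GELU realised as a 256-entry lookup table.
 * Scale factors are hypothetical placeholders, not values from the paper. */
static int8_t gelu_lut[256];

/* Build the table once for a given input/output quantization scale.
 * On a bare-metal target without an FPU, this step would typically be run
 * offline and the result stored as a const array. */
void gelu_lut_init(float in_scale, float out_scale)
{
    for (int i = -128; i <= 127; i++) {
        float x = (float)i * in_scale;
        /* GELU(x) = 0.5 * x * (1 + erf(x / sqrt(2))) */
        float y = 0.5f * x * (1.0f + erff(x / 1.41421356f));
        long q = lroundf(y / out_scale);
        if (q > 127)  q = 127;
        if (q < -128) q = -128;
        gelu_lut[(uint8_t)i] = (int8_t)q;   /* index wraps -128..-1 to 128..255 */
    }
}

/* Element-wise GELU over a quantized activation buffer. */
void gelu_q8(int8_t *buf, int len)
{
    for (int i = 0; i < len; i++)
        buf[i] = gelu_lut[(uint8_t)buf[i]];
}
```

A per-element loop like gelu_q8 is the kind of hot path the custom instructions target, consistent with the reported drop from 26 million to 5.5 million cycles per inference.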

Technical Details

  • Target Model: Keyword Transformer (KWT), adapted from the ARM KWT model.
  • Model Optimization: The original KWT model's output classes were reduced from 35 to 2, facilitating model compression with only a 10% loss in accuracy.
  • Hardware Implementation: The model was run in bare-metal C using a custom-developed edge AI library on a RISC-V platform.
  • Custom Instructions: The RISC-V ISA was extended to accelerate computationally intensive functions, namely GELU (Gaussian Error Linear Unit) and SoftMax; a sketch of how such an instruction might be exposed to bare-metal C follows this list.
  • Area Overhead: The implementation of custom instructions resulted in a small hardware area overhead of approximately 29%.
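
The summary does not give the encodings of the new instructions, but on RISC-V a custom opcode is normally reachable from bare-metal C through the GNU assembler's .insn directive in inline assembly. The sketch below is a hedged illustration under that assumption: the custom-0 opcode (0x0B), the funct3/funct7 values, and the gelu_custom wrapper name are placeholders rather than details from the paper.

```c
#include <stdint.h>

/* Hypothetical wrapper around a custom R-type GELU instruction.
 * Opcode 0x0B is the RISC-V "custom-0" space; funct3 = 0 and funct7 = 0
 * are placeholders, since the paper's actual encoding is not given here. */
static inline int32_t gelu_custom(int32_t x)
{
    int32_t y;
    /* .insn r opcode, funct3, funct7, rd, rs1, rs2 */
    __asm__ volatile (".insn r 0x0B, 0x0, 0x00, %0, %1, x0"
                      : "=r"(y)
                      : "r"(x));
    return y;
}

/* Example use inside an activation loop: the surrounding C stays unchanged;
 * only the inner GELU call is swapped for the accelerated opcode. */
void gelu_q8_accel(int32_t *buf, int len)
{
    for (int i = 0; i < len; i++)
        buf[i] = gelu_custom(buf[i]);
}
```

A thin wrapper of this kind keeps the rest of the edge AI library in portable C; only the GELU and SoftMax inner loops need to know that the ISA has been extended.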

Implications

  • Transformer Viability on Edge: This research demonstrates a viable path for porting complex, typically resource-heavy Transformer-based deep learning models onto severely resource-constrained, low-power IoT devices (e.g., those with only 64kB of RAM).
  • RISC-V Ecosystem Maturity: It showcases the powerful potential and flexibility of the RISC-V architecture, specifically its custom instruction extension capability, for tailoring hardware to accelerate specific AI/ML workloads (like KWS).
  • Advancing Embedded AI: By achieving high performance (5x speedup) and power efficiency in a tiny footprint, KWT-Tiny facilitates the widespread deployment of advanced voice interfaces and keyword spotting capabilities in the rapidly expanding low-power IoT market.