KWT-Tiny: RISC-V Accelerated, Embedded Keyword Spotting Transformer
Abstract
This paper presents KWT-Tiny, an aggressively quantized and retrained Keyword Transformer (KWT) model adapted for bare-metal RISC-V edge devices operating under a strict 64kB RAM constraint. Model optimization reduced the size from 2.42 MB to 1.65 kB (a 369x reduction) with only a 10% loss in accuracy. Crucially, custom RISC-V instructions accelerating the GELU and SoftMax operations yielded a roughly 5x speedup and a corresponding power reduction during inference.
Report
Key Highlights
- Extreme Model Reduction: The KWT-1 model was retrained and quantized, achieving a 369x reduction in size, dropping from 2.42 MB to just 1.65 kB.
- RISC-V Acceleration: Custom RISC-V instructions were developed specifically to accelerate the GELU and SoftMax operations, critical for Transformer performance.
- Performance Gain: The hardware acceleration yielded a 5x speedup and a corresponding ~5x power reduction during inference.
- Efficiency Metric: Inference cost dropped from 26 million to 5.5 million clock cycles, a roughly 4.7x reduction.
- Resource Constraint: The entire bare-metal implementation successfully targeted devices with only 64kB of available RAM.
Technical Details
- Target Model: Keyword Transformer (KWT), adapted from the ARM KWT model.
- Model Optimization: The original KWT model's output classes were reduced from 35 to 2; combined with quantization and retraining, this compression cost only a 10% loss in accuracy.
- Hardware Implementation: The model was run in bare-metal C using a custom-developed edge AI library on a RISC-V platform.
- Custom Instructions: The RISC-V ISA was extended with custom instructions to accelerate computationally intensive functions such as GELU (Gaussian Error Linear Unit) and SoftMax.
- Area Overhead: The custom instructions incurred a hardware area overhead of approximately 29%.
Implications
- Transformer Viability on Edge: This research demonstrates a viable method for porting complex, typically resource-heavy Transformer-based deep learning models onto severely resource-constrained, low-power IoT devices (e.g., those with only 64kB of RAM).
- RISC-V Ecosystem Maturity: It showcases the powerful potential and flexibility of the RISC-V architecture, specifically its custom instruction extension capability, for tailoring hardware to accelerate specific AI/ML workloads (like KWS).
- Advancing Embedded AI: By achieving high performance (5x speedup) and power efficiency in a tiny footprint, KWT-Tiny facilitates the widespread deployment of advanced voice interfaces and keyword spotting capabilities in the rapidly expanding low-power IoT market.