KWT-Tiny: RISC-V Accelerated, Embedded Keyword Spotting Transformer
Abstract
This paper presents KWT-Tiny, an aggressively quantized and retrained Keyword Transformer (KWT) model adapted for bare-metal RISC-V edge devices operating under a strict 64kB RAM constraint. Model optimization reduced the size from 2.42 MB to 1.65 kB (a 369x reduction) with only a 10% loss in accuracy. Crucially, custom RISC-V instructions accelerating the GELU and SoftMax operations yielded a roughly 5x speedup and a corresponding power reduction during inference.
Report
Key Highlights
- Extreme Model Reduction: The KWT-1 model was retrained and quantized, achieving a 369x reduction in size, dropping from 2.42 MB to just 1.65 kB.
- RISC-V Acceleration: Custom RISC-V instructions were developed specifically to accelerate the GELU and SoftMax operations, critical for Transformer performance.
- Performance Gain: The hardware acceleration yielded a 5x speedup and a corresponding ~5x power reduction during inference.
- Efficiency Metric: Inference cost dropped from 26 million to 5.5 million clock cycles, a roughly 4.7x reduction.
- Resource Constraint: The entire bare-metal implementation successfully targeted devices with only 64kB of available RAM.
Technical Details
- Target Model: Keyword Transformer (KWT), adapted from the ARM KWT model.
- Model Optimization: The original KWT model's output classes were reduced from 35 to 2; combined with quantization and retraining, this compression cost only a 10% loss in accuracy.
- Hardware Implementation: The model was run in bare-metal C using a custom-developed edge AI library on a RISC-V platform.
- Custom Instructions: The RISC-V ISA was extended with custom instructions to accelerate computationally intensive functions such as GELU (Gaussian Error Linear Unit) and SoftMax.
- Area Overhead: The custom instructions incurred a hardware area overhead of approximately 29%.
Implications
- Transformer Viability on Edge: This research demonstrates a viable method for porting complex, typically resource-heavy Transformer-based deep learning models onto severely resource-constrained, low-power IoT devices (e.g., those with only 64kB of RAM).
- RISC-V Ecosystem Maturity: It showcases the powerful potential and flexibility of the RISC-V architecture, specifically its custom instruction extension capability, for tailoring hardware to accelerate specific AI/ML workloads (like KWS).
- Advancing Embedded AI: By achieving high performance (5x speedup) and power efficiency in a tiny footprint, KWT-Tiny facilitates the widespread deployment of advanced voice interfaces and keyword spotting capabilities in the rapidly expanding low-power IoT market.