TT-Edge: A Hardware-Software Co-Design for Energy-Efficient Tensor-Train Decomposition on Edge AI
Abstract
TT-Edge is a hardware-software co-designed framework engineered to overcome the high latency and energy costs associated with Tensor Train Decomposition (TTD) on resource-constrained edge AI devices. It achieves this by offloading compute-intensive Singular Value Decomposition (SVD) tasks to a specialized TTD Engine tightly integrated with a GEMM accelerator, significantly reducing data transfer overhead. Implemented on a RISC-V processor, the framework delivers a 1.7x speedup and a 40.2% reduction in energy consumption for model compression with minimal hardware overhead.
Report
Key Highlights
- Target: Enables efficient, high-ratio model compression using Tensor Train Decomposition (TTD) directly on resource-constrained edge devices.
- Performance Gain: Achieves a 1.7x speedup compared to a GEMM-only baseline when compressing a ResNet-32 model via TTD.
- Energy Efficiency: Reduces overall energy usage by 40.2% by minimizing frequent matrix-vector transfers.
- Specialized Architecture: Features a dedicated TTD Engine that tightly integrates with an existing General Matrix Multiplication (GEMM) accelerator.
- Low Overhead: The design uses a lightweight approach, reusing GEMM resources and employing a shared floating-point unit (FPU), resulting in only a 4% increase in total power.
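The TTD compression named in the highlights above sweeps truncated SVDs across successive unfoldings of a weight tensor. The following is a minimal NumPy sketch of the standard TT-SVD procedure, offered as a software reference point only, not TT-Edge's hardware datapath; the names `tt_svd` and `max_rank` are illustrative, not from the report:

```python
import numpy as np

def tt_svd(tensor, max_rank):
    """Decompose a d-way tensor into TT cores via repeated truncated SVDs."""
    dims = tensor.shape
    d = len(dims)
    cores = []
    rank = 1
    mat = tensor.reshape(rank * dims[0], -1)
    for k in range(d - 1):
        # SVD of the current unfolding; truncate to at most max_rank.
        U, S, Vt = np.linalg.svd(mat, full_matrices=False)
        r = min(max_rank, len(S))
        cores.append(U[:, :r].reshape(rank, dims[k], r))
        # Carry S @ Vt forward and refold for the next mode.
        mat = (S[:r, None] * Vt[:r]).reshape(r * dims[k + 1], -1)
        rank = r
    cores.append(mat.reshape(rank, dims[-1], 1))
    return cores
```

Each loop iteration performs one of the repeated SVDs that dominate TTD cost, which is precisely the work TT-Edge offloads to its dedicated engine.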
Technical Details
- Methodology: Hardware-Software Co-Design optimized for the challenging computational requirements of TTD, specifically focusing on the repeated Singular Value Decompositions (SVDs) and matrix multiplications.
- SVD Optimization: The compute-intensive SVD process is strategically split into two phases: bidiagonalization and diagonalization.
- TTD Engine Function: The specialized TTD Engine is responsible for executing the most computationally demanding phases of the split SVD, offloading the burden from the main processor.
- System Integration: The TTD Engine is designed for tight integration with the existing GEMM accelerator, which is crucial for curtailing the energy cost associated with frequent data movement between processing units.
- Platform and Testing: Implemented and validated on a RISC-V-based edge AI processor. Experimental results were verified using FPGA prototypes and post-synthesis power analysis conducted at the 45 nm technology node.
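The two-phase SVD split described above can be illustrated in software: Golub-Kahan Householder bidiagonalization reduces the input matrix to upper-bidiagonal form, after which only the diagonalization of that bidiagonal matrix remains. The sketch below is illustrative, not the TTD Engine's implementation; it assumes m ≥ n and uses NumPy's dense SVD as a stand-in for the implicit-shift QR diagonalization a hardware engine would typically run:

```python
import numpy as np

def house(x):
    """Householder vector v and scalar beta with (I - beta*v*v^T) @ x ∝ e1."""
    v = x.astype(float).copy()
    sigma = np.linalg.norm(v)
    if sigma == 0.0:
        return v, 0.0
    v[0] += np.copysign(sigma, v[0])  # sign choice avoids cancellation
    return v, 2.0 / np.dot(v, v)

def bidiagonalize(A):
    """Phase 1 of the split SVD: reduce A (m >= n) to upper-bidiagonal B with A = U @ B @ V.T."""
    B = A.astype(float).copy()
    m, n = B.shape
    U, V = np.eye(m), np.eye(n)
    for k in range(n):
        # Left reflection: zero out B[k+1:, k].
        v, beta = house(B[k:, k])
        B[k:, k:] -= beta * np.outer(v, v @ B[k:, k:])
        U[:, k:] -= beta * np.outer(U[:, k:] @ v, v)
        if k < n - 2:
            # Right reflection: zero out B[k, k+2:].
            v, beta = house(B[k, k + 1:])
            B[k:, k + 1:] -= beta * np.outer(B[k:, k + 1:] @ v, v)
            V[:, k + 1:] -= beta * np.outer(V[:, k + 1:] @ v, v)
    return U, B, V

def split_svd_singular_values(A):
    """Phase 2 (diagonalization), sketched here with a dense SVD of the small bidiagonal B."""
    _, B, _ = bidiagonalize(A)
    return np.linalg.svd(B, compute_uv=False)
```

Splitting the work this way concentrates the bulk of the floating-point operations in the regular, GEMM-friendly bidiagonalization phase, which is consistent with the report's rationale for pairing the TTD Engine with the GEMM accelerator and a shared FPU.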
Implications
- Advancing Edge AI Model Compression: TT-Edge addresses a fundamental bottleneck, the high computational cost of TTD, making high-quality, high-ratio model compression practical for distributed learning environments built on low-power edge devices.
- Validation of RISC-V Customization: The successful implementation on a RISC-V platform showcases the platform's suitability for sophisticated domain-specific acceleration. It confirms that the RISC-V ecosystem can efficiently host custom, highly optimized IP (like the TTD Engine) alongside standard accelerators (GEMM) with minimal power penalty.
- Shifting the Co-Design Focus: This work emphasizes that maximizing efficiency in modern AI hardware requires specific architectural solutions not just for inference (GEMM), but also for critical training and optimization primitives like TTD and SVD.
- Standard for Energy Efficiency: By delivering both significant speedup and energy reduction simultaneously, TT-Edge sets a new efficiency benchmark for complex numerical linear algebra operations in resource-constrained computing.