e-GPU: An Open-Source and Configurable RISC-V Graphic Processing Unit for TinyAI Applications
Abstract
This work introduces e-GPU, an open-source and highly configurable RISC-V Graphic Processing Unit designed to deliver parallel acceleration to ultra-low-power TinyAI devices. The platform utilizes a lightweight Tiny-OpenCL framework for programming and is integrated into the X-HEEP architecture for evaluation. Implemented in 16 nm technology, the e-GPU achieves up to a 15.1x speed-up and 3.1x energy reduction while adhering to a strict 28 mW power budget.
Report
Key Highlights
- Open-Source RISC-V GPU: e-GPU provides an open-source, configurable GPU platform specifically tailored for low-power edge devices (TinyAI).
- Power/Area Optimization: The design addresses the traditional challenge of high power and area requirements of GPUs, making parallel processing viable for resource-constrained environments.
- Lightweight Programming Framework: A dedicated Tiny-OpenCL implementation is used to provide a tailored, low-overhead programming environment.
- Significant Efficiency Gains: Benchmarks show the e-GPU achieving up to a 15.1x speed-up and up to a 3.1x reduction in energy consumption on bio-signal processing workloads.
- Low Power Budget: The system stays within a 28 mW power budget, in line with TinyAI constraints, and incurs a 2.5x area overhead in the high-range configuration.
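The speed-up and energy figures above can be related through E = P * t. The sketch below is a back-of-the-envelope calculation, not a result from the paper. It assumes the 15.1x and 3.1x peak figures refer to the same workload and configuration, which the "up to" wording does not guarantee. The function name `implied_power_ratio` is hypothetical.

```python
def implied_power_ratio(speedup, energy_reduction):
    """Average-power ratio (accelerated / baseline) implied by E = P * t.

    With E_base = P_base * t_base and E_acc = P_acc * t_acc:
        P_acc / P_base = (t_base / t_acc) / (E_base / E_acc)
                       = speedup / energy_reduction
    Assumption: both ratios were measured on the same workload.
    """
    return speedup / energy_reduction


# Figures quoted in this summary: 15.1x speed-up, 3.1x energy reduction.
ratio = implied_power_ratio(15.1, 3.1)
print(f"{ratio:.2f}x")  # prints "4.87x"
```

Under that assumption, the accelerated run draws roughly 4.9x the average power of the baseline while finishing about 15x sooner, so total energy still falls.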
Technical Details
- Architecture: Embedded GPU (e-GPU) integrated with the eXtendible Heterogeneous Energy-Efficient Platform (X-HEEP) to form an Accelerated Processing Unit (APU).
- Technology & Frequency: Implemented in TSMC's 16 nm SVT CMOS technology, operating at 300 MHz and 0.8 V.
- Configurability: The e-GPU's extensive configurability allows users to optimize area and power for specific application requirements.
- Programming Model: Uses Tiny-OpenCL, with GeMM benchmarks demonstrating that scheduling overhead becomes negligible for matrix sizes larger than 256x256.
- Performance Results (High-Range Configuration): The 16-thread configuration achieved up to a 15.1x speed-up and up to a 3.1x reduction in energy consumption compared to the baseline host.
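To make the programming model concrete, here is a minimal plain-Python emulation of how an OpenCL-style GeMM is split into work-items and spread across 16 threads. It is an illustrative sketch only. It does not use the Tiny-OpenCL API, whose interface is not described in this summary. The names `gemm_work_item` and `run_ndrange` and the round-robin distribution are assumptions.

```python
def gemm_work_item(gid, a, b, c, n):
    """Compute one output element, as a single OpenCL-style work-item would.

    Matrices are flat row-major lists of size n * n; gid is the global id.
    """
    row, col = divmod(gid, n)
    acc = 0
    for k in range(n):
        acc += a[row * n + k] * b[k * n + col]
    c[row * n + col] = acc


def run_ndrange(a, b, n, num_threads=16):
    """Emulate dispatching n * n work-items over num_threads threads.

    Assumption: work-items are handed out round-robin by global id.
    The loop over tid runs sequentially here; on hardware the threads
    would execute in parallel.
    """
    c = [0] * (n * n)
    for tid in range(num_threads):
        for gid in range(tid, n * n, num_threads):
            gemm_work_item(gid, a, b, c, n)
    return c


# Usage: multiplying by the identity matrix returns the input unchanged.
n = 4
a = list(range(n * n))
identity = [1 if i // n == i % n else 0 for i in range(n * n)]
c = run_ndrange(a, identity, n)
```

Each work-item performs n multiply-accumulates and there are n * n work-items, so compute grows as n^3. Any fixed per-launch scheduling cost therefore shrinks relative to the useful work as n grows, which is consistent with the overhead becoming negligible for large matrices.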
Implications
- Democratization of Parallel Computing: By offering an open-source and energy-efficient RISC-V based GPU, e-GPU lowers the barrier to entry for incorporating high-performance parallel processing into commercial TinyAI and IoT chips.
- Filling the RISC-V Accelerator Gap: This work provides a vital, validated IP core, expanding the available ecosystem of hardware accelerators for RISC-V, which often lacks high-performance, low-power GPU solutions.
- Enabling Complex Edge AI: The demonstrated energy efficiency and speed-up allow more sophisticated and computationally intensive AI algorithms (such as complex bio-signal processing) to run locally on the edge, reducing latency and improving privacy.
- Software Ecosystem Maturity: The introduction of Tiny-OpenCL encourages the development of standardized, lightweight programming frameworks suited to heterogeneous RISC-V architectures in resource-constrained environments.