OpenGeMM: A High-Utilization GeMM Accelerator Generator with Lightweight RISC-V Control and Tight Memory Coupling
Abstract
OpenGeMM is an open-source acceleration platform designed to address the difficulty of achieving high hardware utilization on generic RISC-V systems when running DNNs on resource-constrained edge devices. It integrates a parameterized Chisel-coded GeMM core, a lightweight RISC-V control processor, and a tightly coupled multi-banked scratchpad memory. The design consistently achieves hardware utilization between 81.89% and 99.34% and delivers a 3.58x to 16.40x speedup in normalized throughput over the state-of-the-art open-source Gemmini accelerator.
Report
Key Highlights
- High Utilization & Efficiency: OpenGeMM achieves consistently high hardware utilization, ranging from 81.89% to 99.34%, across diverse CNN and Transformer workloads.
- Performance Gain: The platform demonstrates a significant normalized throughput speedup, ranging from 3.58x to 16.40x, compared to the existing state-of-the-art open-source Gemmini accelerator.
- System Efficiency: OpenGeMM achieves a high system efficiency of 4.68 TOPS/W.
- Open-Source & Configurable: The platform is open-source and provides ease of configurability and programmability, addressing the rigid nature of bespoke accelerators.
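To make the utilization figures concrete, the sketch below shows one common way such a number can be computed: useful multiply-accumulate operations divided by the peak the array could have performed in the same number of cycles. This is an illustrative model, not the paper's measurement method. The function name `gemm_utilization`, the tile dimensions `mu`, `ku`, `nu`, the one-tile-per-cycle assumption, and the `overhead_cycles` parameter are all hypothetical.

```python
import math


def gemm_utilization(M, K, N, mu, ku, nu, overhead_cycles=0):
    """Fraction of peak MAC throughput used by an (M x K) @ (K x N) GeMM.

    Assumption: a hypothetical array computes one (mu x ku) @ (ku x nu)
    tile per cycle, and dimensions that are not tile multiples are padded.
    """
    useful_macs = M * K * N
    # Number of tiles after padding each dimension up to a tile multiple.
    tiles = math.ceil(M / mu) * math.ceil(K / ku) * math.ceil(N / nu)
    # One tile per cycle, plus any cycles in which the array sits idle
    # (configuration, waiting on memory, draining outputs).
    total_cycles = tiles + overhead_cycles
    peak_macs = total_cycles * mu * ku * nu
    return useful_macs / peak_macs


if __name__ == "__main__":
    # Perfectly tiled workload with no idle cycles.
    print(gemm_utilization(64, 64, 64, 8, 8, 8))
    # Same workload with 128 idle cycles of overhead.
    print(gemm_utilization(64, 64, 64, 8, 8, 8, overhead_cycles=128))
    # Padding loss when M is not a multiple of the tile height.
    print(gemm_utilization(60, 64, 64, 8, 8, 8))
```

Under this model, utilization drops for two reasons: idle cycles around the computation and padding when the workload does not match the array shape.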
Technical Details
- Core Architecture: The platform is built around a parameterized, Chisel-coded General Matrix Multiplication (GeMM) accelerator.
- Control System: Control is handled by a lightweight RISC-V processor, enabling flexibility and programmability.
- Memory Subsystem: Features a tightly coupled multi-banked scratchpad memory to minimize data transfer latency.
- Utilization Boosting Mechanisms: System efficiency and GeMM core utilization are optimized via three key mechanisms:
  - Configuration pre-loading.
  - Input pre-fetching combined with output buffering.
  - Programmable strided memory access.
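Programmable strided memory access can be pictured as a nested-loop address generator whose loop bounds and strides are set by software. The sketch below is a minimal software model of that idea, not OpenGeMM's actual hardware interface. The function name `strided_addresses` and its parameters `base`, `bounds`, and `strides` are hypothetical.

```python
from itertools import product


def strided_addresses(base, bounds, strides):
    """Yield addresses for a nested loop, outermost dimension first.

    address = base + sum(index[d] * strides[d]) over all dimensions d
    """
    if len(bounds) != len(strides):
        raise ValueError("bounds and strides must have the same length")
    for index in product(*(range(b) for b in bounds)):
        yield base + sum(i * s for i, s in zip(index, strides))


if __name__ == "__main__":
    # A 4 x 8 row-major matrix of 1-byte elements (row pitch = 8).
    # Fetch the 2 x 4 tile whose top-left corner is at row 0, column 4.
    tile = list(strided_addresses(base=4, bounds=(2, 4), strides=(8, 1)))
    print(tile)

    # Swapping the strides walks the same tile column by column,
    # giving a transposed access order without moving any data.
    transposed = list(strided_addresses(base=4, bounds=(4, 2), strides=(1, 8)))
    print(transposed)
```

The point of making bounds and strides programmable is that the same generator can serve different tile shapes and data layouts, so operands need not be rearranged in memory first.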
Implications
- Advancing RISC-V Acceleration: OpenGeMM provides a validated, highly efficient, and open-source template for integrating specialized accelerators with RISC-V cores. This directly addresses the common trade-off between flexibility (provided by RISC-V) and efficiency (provided by dedicated hardware).
- Edge AI Enablement: With hardware utilization of up to 99.34% and a system efficiency of 4.68 TOPS/W, OpenGeMM lowers the barrier to deploying deep neural networks, including Transformer models, on extremely resource-constrained edge devices.
- Hardware Generation: Because the GeMM core is parameterized and written in Chisel, the platform functions as an accelerator generator, enabling fast design space exploration and customization for specific application requirements.