CIMR-V: An End-to-End SRAM-based CIM Accelerator with RISC-V for AI Edge Device
Abstract
CIMR-V is an end-to-end SRAM-based Computing-in-Memory (CIM) accelerator integrated with a RISC-V core and designed for AI edge devices. The architecture addresses the long latency of loading large models from DRAM through techniques such as layer fusion and weight fusion, achieving an 85.14% latency reduction for keyword spotting. By leveraging custom CIM-type instructions, CIMR-V combines RISC-V programmability with CIM's high energy efficiency, yielding 3707.84 TOPS/W in TSMC 28nm technology.
Structured Analysis Report
Key Highlights
- Innovation: Introduction of CIMR-V, an end-to-end SRAM-based Computing-in-Memory (CIM) accelerator tightly coupled with a RISC-V core.
- Performance: Achieved 85.14% reduction in latency for the keyword spotting AI model by minimizing data movement.
- Energy Efficiency: Demonstrated exceptional energy efficiency of 3707.84 TOPS/W.
- Manufacturing: Implemented using TSMC 28nm technology.
- Throughput: Capable of 26.21 TOPS at a 50 MHz operating frequency.
Technical Details
- Core Technology: Utilizes SRAM-based Computing-in-Memory (CIM) for high parallelism and minimal data movement.
- Architecture Integration: The system integrates a RISC-V processor core, which manages the CIM accelerator.
- Customization: The design employs custom “CIM-type instructions” to enable seamless hardware/software interaction and end-to-end AI model inference.
- Optimization Strategies: Key architectural optimizations used to mitigate DRAM latency include:
  - CIM layer fusion.
  - A fused convolution/max-pooling pipeline.
  - Weight fusion.
- Scope: CIMR-V supports a full-stack flow and end-to-end AI model inference, addressing a common shortcoming of previous SRAM-based CIM architectures, which typically accelerated only part of the pipeline.
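To make the "custom CIM-type instructions" point concrete, here is a minimal Python sketch of how such an instruction could be encoded. The mnemonic `cim.mac` and its field assignments are hypothetical (the paper's actual instruction formats are not given in this summary); the one grounded detail is that RISC-V reserves the custom-0 opcode (0b0001011) for exactly this kind of vendor extension.

```python
def encode_r_type(opcode, rd, funct3, rs1, rs2, funct7):
    """Pack an R-type RISC-V instruction word from its bit fields."""
    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) \
         | (funct3 << 12) | (rd << 7) | opcode

CUSTOM_0 = 0b0001011  # opcode slot RISC-V reserves for custom extensions

# Hypothetical CIM multiply-accumulate: cim.mac rd, rs1, rs2
# (rs1/rs2 might carry CIM macro addresses; rd the accumulation target)
word = encode_r_type(CUSTOM_0, rd=10, funct3=0b000, rs1=11, rs2=12,
                     funct7=0b0000001)
print(hex(word))  # → 0x2c5850b
```

A decoder in the RISC-V core would route any word whose low seven bits match custom-0 to the CIM accelerator instead of the standard ALU, which is what makes the hardware/software interaction seamless.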
Implications
- Validation of RISC-V Customization: CIMR-V serves as a strong case study demonstrating the flexibility of the RISC-V Instruction Set Architecture (ISA), proving that custom instructions can be effectively added to handle specialized, high-efficiency compute paradigms like CIM.
- Advancing Edge AI: By achieving exceptional energy efficiency (3707.84 TOPS/W) while maintaining high programmability via RISC-V, this design significantly lowers the power-consumption barrier for deploying complex, large AI models on constrained edge devices.
- Solving CIM Bottlenecks: The successful implementation of fusion techniques addresses a critical weakness of CIM—the latency and energy cost incurred when fetching weights from external DRAM—making the CIM paradigm practical for commercial deployment.
- Ecosystem Development: The combination of a standardized, open core (RISC-V) with a high-performance accelerator provides a foundation for developing complete, efficient hardware/software stacks for next-generation machine learning silicon.
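The fusion idea behind the convolution/max-pooling pipeline can be illustrated with a minimal Python sketch (1-D shapes assumed for brevity; this is not the paper's actual dataflow). The point is that each pooling window's convolution outputs are consumed as soon as they are produced, so the intermediate feature map never has to be written back to memory:

```python
def conv1d(x, w):
    """Valid 1-D convolution (correlation form): len(x)-len(w)+1 outputs."""
    k = len(w)
    return [sum(x[i + j] * w[j] for j in range(k))
            for i in range(len(x) - k + 1)]

def maxpool(y, p):
    """Non-overlapping max pooling with window/stride p."""
    return [max(y[i:i + p]) for i in range(0, len(y) - p + 1, p)]

def fused_conv_maxpool(x, w, p):
    """Fused version: compute each pooling window's p convolution outputs
    on the fly and keep only their max, so the full intermediate feature
    map is never materialized."""
    k = len(w)
    n_out = len(x) - k + 1
    out = []
    for base in range(0, n_out - p + 1, p):
        out.append(max(sum(x[base + q + j] * w[j] for j in range(k))
                       for q in range(p)))
    return out

x = [1, 3, -2, 5, 0, 4, 2, -1]
w = [1, 0, -1]
assert fused_conv_maxpool(x, w, 2) == maxpool(conv1d(x, w), 2)
```

In hardware the same principle lets the pooling stage sit directly behind the CIM macro's outputs, which is how fusion cuts both latency and the data movement that dominates edge-AI energy budgets.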