MaRVIn: A Cross-Layer Mixed-Precision RISC-V Framework for DNN Inference, from ISA Extension to Hardware Acceleration
Abstract
MaRVIn is a cross-layer hardware-software co-design framework that introduces novel ISA extensions and micro-architectural enhancements to optimize mixed-precision Deep Neural Network (DNN) inference on RISC-V processors. It tackles the inefficiency of standard embedded cores by enhancing the ALU with configurable mixed-precision arithmetic (2, 4, and 8 bits) and by employing techniques such as soft SIMD and multi-pumping. The framework achieves an average 17.6x speedup over state-of-the-art ISA-agnostic RISC-V cores and up to 1.8 TOPs/W energy efficiency, with less than 1% accuracy loss.
Report
Structured Report: MaRVIn Cross-Layer Mixed-Precision RISC-V Framework
Key Highlights
- Cross-Layer Innovation: MaRVIn is a comprehensive hardware-software co-design framework addressing the lack of efficient mixed-precision support in existing embedded microprocessors.
- Performance Leap: The framework demonstrates an average 17.6x speedup over ISA-agnostic state-of-the-art RISC-V cores.
- High Energy Efficiency: MaRVIn delivers excellent power efficiency, reaching up to 1.8 tera operations per second per watt (TOPs/W).
- Accuracy Preservation: Performance gains are achieved with minimal impact on model quality, maintaining less than 1% accuracy loss on widely used DNNs and datasets (e.g., CIFAR-10, ImageNet).
- Custom RISC-V ISA Extension: The solution is built around novel ISA extensions designed specifically to eliminate the overheads of mixed-precision data handling, such as excessive packing and unpacking (a minimal sketch of that cost follows this list).
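To make the packing/unpacking overhead concrete, here is a minimal Python sketch (our illustration, not MaRVIn code) of the shift-and-mask work a baseline core performs when eight 4-bit weights share one 32-bit word; the helper names `pack_u4`/`unpack_u4` are hypothetical stand-ins for what a dedicated mixed-precision instruction could collapse into a single operation.

```python
# Hypothetical illustration (not MaRVIn code): the per-element shift/mask
# work a baseline core must do when 4-bit weights are packed 8-per-word.

def pack_u4(values):
    """Pack eight unsigned 4-bit values into one 32-bit word."""
    assert len(values) == 8 and all(0 <= v < 16 for v in values)
    word = 0
    for i, v in enumerate(values):
        word |= v << (4 * i)
    return word

def unpack_u4(word):
    """Unpack a 32-bit word back into eight 4-bit values.
    Every lane costs a shift plus a mask -- exactly the overhead a
    dedicated mixed-precision instruction is meant to eliminate."""
    return [(word >> (4 * i)) & 0xF for i in range(8)]

packed = pack_u4([1, 2, 3, 4, 5, 6, 7, 8])
assert unpack_u4(packed) == [1, 2, 3, 4, 5, 6, 7, 8]
```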
Technical Details
- Hardware Architecture: The micro-architecture enhances the Arithmetic Logic Unit (ALU) to support configurable mixed-precision arithmetic (2-bit, 4-bit, and 8-bit) for both weights and activations.
- Acceleration Methods: It implements soft Single Instruction, Multiple Data (SIMD) for efficient 2-bit operations (see the first sketch after this list) and applies multi-pumping to reduce execution latency.
- Software Optimization: The software layer combines a pruning-aware fine-tuning method for model compression with a greedy Design Space Exploration (DSE) approach that efficiently searches for Pareto-optimal mixed-quantized models (see the second sketch after this list).
- Power Optimization: The system incorporates voltage scaling capabilities to further boost overall power efficiency.
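The soft-SIMD idea can be shown schematically in Python, under our own simplifying assumptions (8-bit guard lanes and a shared weight small enough to avoid cross-lane carries); this is a sketch of the general technique, not the paper's hardware design.

```python
# Schematic soft SIMD (our illustration, not MaRVIn's RTL): four 2-bit
# activations ride in 8-bit guard lanes of one 32-bit word, so a single
# full-width multiply by a shared weight yields all four products at once.

LANE = 8      # lane width chosen so lane products cannot carry across lanes
NLANES = 4

def pack_lanes(acts):
    """Place four 2-bit activations into the 8-bit lanes of one word."""
    assert len(acts) == NLANES and all(0 <= a < 4 for a in acts)
    word = 0
    for i, a in enumerate(acts):
        word |= a << (LANE * i)
    return word

def soft_simd_mul(word, weight):
    """One wide multiply computes four lane-wise products simultaneously.
    Valid only while 3 * weight < 2**LANE, so no lane overflows."""
    assert 0 <= weight and 3 * weight < (1 << LANE)
    return word * weight

acts = [3, 0, 2, 1]
prod = soft_simd_mul(pack_lanes(acts), 5)
lanes = [(prod >> (LANE * i)) & 0xFF for i in range(NLANES)]
assert lanes == [a * 5 for a in acts]
```

The guard bits are the design trade-off: wider lanes waste datapath width but guarantee that partial products never interfere, which is what lets one ordinary multiplier emulate several narrow ones.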
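Likewise, a hedged sketch of a greedy DSE loop in the spirit described above: the bit-ops cost proxy, the toy accuracy model, and the names `greedy_dse` and `evaluate_accuracy` are our inventions, not MaRVIn's implementation. The accuracy budget mirrors the report's sub-1% loss target.

```python
# A greedy mixed-precision search sketch (assumed structure, not the
# paper's DSE): start every layer at 8 bits and repeatedly demote the
# single layer whose demotion costs the least accuracy.

BITWIDTHS = [8, 4, 2]   # supported weight/activation precisions

def greedy_dse(layers, evaluate_accuracy, max_acc_drop=1.0):
    """layers maps layer name -> MACs; returns the visited
    (cost, accuracy, config) points for later Pareto filtering."""
    config = {name: 8 for name in layers}        # bits per layer
    baseline = evaluate_accuracy(config)
    visited = []
    while True:
        best = None
        for name in layers:
            idx = BITWIDTHS.index(config[name])
            if idx + 1 == len(BITWIDTHS):
                continue                          # already at 2 bits
            trial = dict(config, **{name: BITWIDTHS[idx + 1]})
            acc = evaluate_accuracy(trial)
            if baseline - acc <= max_acc_drop and (best is None or acc > best[0]):
                best = (acc, trial)
        if best is None:
            return visited                        # no admissible demotion left
        acc, config = best
        cost = sum(config[n] * layers[n] for n in layers)  # crude bit-ops proxy
        visited.append((cost, acc, dict(config)))

# Toy usage with a made-up accuracy model (0.1% loss per bit removed):
layers = {"conv1": 1e6, "conv2": 4e6, "fc": 0.5e6}
points = greedy_dse(layers, lambda cfg: 92.0 - 0.1 * sum(8 - b for b in cfg.values()))
```

Greedy demotion keeps the number of accuracy evaluations linear in the layer count per step, instead of exploring the exponential space of all per-layer bit-width combinations.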
Implications
- Validating RISC-V Extensibility: MaRVIn serves as a strong example of how RISC-V's open and extensible ISA can be leveraged to create highly specialized, domain-specific accelerators, moving beyond general-purpose computing.
- Advancing Edge AI Inference: By delivering a 17.6x average speedup and up to 1.8 TOPs/W, MaRVIn makes RISC-V cores highly competitive for energy-constrained edge and IoT devices that require high-performance deep learning inference.
- Mitigating Quantization Overhead: The introduced ISA extensions directly address a major bottleneck in mixed-precision execution—the inefficiency of data handling—thereby setting a new standard for how low-precision operations should be architecturally supported.
- Future RISC-V Core Design: The findings suggest that future high-performance RISC-V cores aimed at AI workloads should incorporate similar configurable, mixed-precision ALU enhancements and specialized instructions rather than relying on standard vector or SIMD extensions alone.