MaRVIn: A Cross-Layer Mixed-Precision RISC-V Framework for DNN Inference, from ISA Extension to Hardware Acceleration

Abstract

MaRVIn is a cross-layer hardware-software co-design framework that introduces novel ISA extensions and micro-architectural enhancements to optimize mixed-precision Deep Neural Network (DNN) inference on RISC-V processors. It tackles the inefficiency of standard embedded cores by enhancing the ALU with configurable mixed-precision arithmetic (2, 4, and 8 bits) and employing techniques such as soft SIMD and multi-pumping. The framework achieves an average 17.6x speedup over state-of-the-art ISA-agnostic RISC-V cores and up to 1.8 TOPs/W energy efficiency, while keeping accuracy loss below 1%.

Report

Structured Report: MaRVIn Cross-Layer Mixed-Precision RISC-V Framework

Key Highlights

  • Cross-Layer Innovation: MaRVIn is a comprehensive hardware-software co-design framework addressing the lack of efficient mixed-precision support in existing embedded microprocessors.
  • Performance Leap: The framework achieves a significant performance improvement, demonstrating an average 17.6x speedup over ISA-agnostic state-of-the-art RISC-V cores.
  • High Energy Efficiency: MaRVIn delivers excellent power efficiency, reaching up to 1.8 Tera Operations per Watt (TOPs/W).
  • Accuracy Preservation: Performance gains are achieved with minimal impact on model quality, maintaining less than 1% accuracy loss on widely used DNNs and datasets (e.g., CIFAR10, ImageNet).
  • Custom RISC-V ISA Extension: The solution is built around novel ISA extensions specifically designed to eliminate overheads associated with mixed-precision data handling (e.g., excessive packing/unpacking).
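The packing/unpacking overhead mentioned above can be made concrete with a small sketch. The snippet below is a hypothetical illustration (not MaRVIn's actual code) of what a baseline core without mixed-precision instructions must do: every access to a 2-bit weight packed into a byte pays a shift, a mask, and a manual sign extension, which the custom ISA extensions fold into dedicated instructions.

```python
# Hypothetical baseline-core behavior: per-access unpack cost for
# four signed 2-bit weights stored in one byte (names are illustrative).

def unpack_2bit_signed(packed_byte: int, lane: int) -> int:
    """Extract lane `lane` (0-3) from a byte holding four 2-bit weights
    and sign-extend it to an int in [-2, 1]."""
    raw = (packed_byte >> (2 * lane)) & 0b11      # shift + mask per access
    return raw - 4 if raw & 0b10 else raw         # manual sign extension

def dot_packed_weights(packed: bytes, activations: list[int]) -> int:
    """Scalar dot product: every multiply pays the unpack overhead."""
    acc = 0
    for i, a in enumerate(activations):
        w = unpack_2bit_signed(packed[i // 4], i % 4)
        acc += w * a
    return acc
```

For example, the weights [1, -1, 0, 1] pack into the byte 0x4D, and `dot_packed_weights(bytes([0x4D]), [10, 20, 30, 40])` yields 10 - 20 + 0 + 40 = 30, with three extra ALU operations spent per weight just on data handling.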

Technical Details

  • Hardware Architecture: The micro-architecture enhances the Arithmetic Logic Unit (ALU) to support configurable mixed-precision arithmetic (2-bit, 4-bit, and 8-bit) for both weights and activations.
  • Acceleration Methods: It implements soft Single Instruction, Multiple Data (SIMD) for efficient 2-bit operations and utilizes multi-pumping techniques to reduce execution latency.
  • Software Optimization: The software layer integrates a pruning-aware fine-tuning method to optimize model compression alongside a greedy-based Design Space Exploration (DSE) approach. This DSE method efficiently searches for Pareto-optimal mixed-quantized models.
  • Power Optimization: The system incorporates voltage scaling capabilities to further boost overall power efficiency.
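To give a flavor of the soft SIMD idea for 2-bit operations, the sketch below uses the classic SIMD-within-a-register (SWAR) trick: sixteen 2-bit lanes packed into a 32-bit word are added in a single pass by masking off inter-lane carries. This is a generic illustration of the technique, not MaRVIn's ALU implementation, which additionally supports mixed 2/4/8-bit multiply-accumulate.

```python
# SWAR sketch: add sixteen 2-bit lanes of two 32-bit words in one pass.
WORD = 0xFFFFFFFF
LO = 0x55555555    # low bit of each 2-bit lane
HI = 0xAAAAAAAA    # high bit of each 2-bit lane

def swar_add2(x: int, y: int) -> int:
    """Lane-wise add; each 2-bit lane wraps modulo 4, and no carry
    ever crosses a lane boundary."""
    low = (x & LO) + (y & LO)            # add low bits; carry stays in-lane
    return (low ^ ((x ^ y) & HI)) & WORD  # carry-less add of the high bits
```

For instance, adding `0x55555555` to itself (all lanes equal to 1) gives `0xAAAAAAAA` (all lanes equal to 2) in one word-level operation instead of sixteen scalar adds.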
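The greedy DSE described above can be sketched as a loop that starts from a uniform 8-bit model and repeatedly lowers the precision of whichever layer yields the largest cost saving while staying inside the accuracy budget. This is a minimal sketch under assumed interfaces (`evaluate`, `cost` are placeholder callables), not the paper's exact algorithm.

```python
from typing import Callable

def greedy_precision_search(
    n_layers: int,
    evaluate: Callable[[list[int]], float],  # returns accuracy of a config
    cost: Callable[[list[int]], float],      # e.g. estimated energy/latency
    baseline_acc: float,
    max_drop: float = 0.01,                  # the paper's <1% accuracy budget
    levels: tuple[int, ...] = (8, 4, 2),
) -> list[int]:
    """Greedily assign per-layer bit-widths: start all layers at 8 bits,
    then lower the layer that saves the most cost within max_drop."""
    config = [levels[0]] * n_layers
    improved = True
    while improved:
        improved = False
        best = None
        for i in range(n_layers):
            idx = levels.index(config[i])
            if idx + 1 >= len(levels):
                continue                      # layer already at lowest precision
            trial = config.copy()
            trial[i] = levels[idx + 1]
            if baseline_acc - evaluate(trial) <= max_drop:
                saving = cost(config) - cost(trial)
                if best is None or saving > best[0]:
                    best = (saving, trial)
        if best and best[0] > 0:
            config = best[1]
            improved = True
    return config
```

Each accepted step is Pareto-improving (lower cost at acceptable accuracy), so the loop traces one path along the accuracy/cost Pareto front rather than exhaustively enumerating all per-layer precision combinations.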

Implications

  • Validating RISC-V Extensibility: MaRVIn serves as a strong example of how RISC-V's open and extensible ISA can be leveraged to create highly specialized, domain-specific accelerators, moving beyond general-purpose computing.
  • Advancing Edge AI Inference: By drastically increasing speedup (17.6x) and energy efficiency (1.8 TOPs/W), MaRVIn makes RISC-V cores highly competitive for energy-constrained edge and IoT devices requiring high-performance Deep Learning inference capabilities.
  • Mitigating Quantization Overhead: The introduced ISA extensions directly address a major bottleneck in mixed-precision execution—the inefficiency of data handling—thereby setting a new standard for how low-precision operations should be architecturally supported.
  • Future RISC-V Core Design: The findings suggest that future high-performance RISC-V cores aimed at AI workloads should incorporate similar configurable, mixed-precision ALU enhancements and specialized instructions rather than relying on standard vector or SIMD extensions alone.