MARVEL: An End-to-End Framework for Generating Model-Class Aware Custom RISC-V Extensions for Lightweight AI

Abstract

MARVEL is an automated, end-to-end framework designed to efficiently deploy deep neural networks (DNNs) on highly resource-constrained IoT devices operating in bare-metal environments. It achieves this by generating custom RISC-V Instruction Set Architecture (ISA) extensions tailored specifically to target DNN model classes, with a focus on convolutional neural networks (CNNs). The framework demonstrates significant performance gains, achieving a 2x speedup and up to 2x reduction in energy per inference compared to a baseline RISC-V core, at the cost of a 28.23% area overhead.

Report

MARVEL: Framework for Generating Model-Class Aware Custom RISC-V Extensions

Key Highlights

  • Goal: Deploy DNNs efficiently on highly resource-constrained IoT devices operating in bare-metal environments (without an OS).
  • Innovation: MARVEL is an end-to-end framework that automatically profiles high-level DNN models and generates custom RISC-V ISA extensions specifically optimized for the targeted model class.
  • Bare-Metal Deployment: The flow produces an optimized bare-metal C implementation, eliminating the need for heavyweight software dependencies or runtimes such as TensorFlow or PyTorch.
  • Performance Results: Achieved a 2x speedup in inference and up to 2x reduction in energy per inference across various tested models.
  • Hardware Cost: The custom core incurred a 28.23% area overhead relative to the baseline when synthesized on the target FPGA platform.
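To make the bare-metal claim concrete, the sketch below shows the general shape of runtime-free inference code: weights compiled into the binary, static buffers, and a plain C kernel. The layer sizes and values are illustrative, not taken from the paper.

```c
/* Sketch: the shape of bare-metal C that a flow like MARVEL emits --
   weights baked in at compile time, no dynamic allocation, no ML runtime.
   Layer sizes and values here are illustrative, not from the paper. */
#include <stdint.h>

#define IN  4
#define OUT 2

/* int8-quantized weights and int32 biases, stored in the binary itself. */
static const int8_t  W[OUT][IN] = {
    { 1, -2,  3,  0},
    { 0,  1, -1,  2},
};
static const int32_t B[OUT] = { 10, -5 };

/* Fully connected layer: y = W*x + b, with 32-bit accumulation. */
static void dense(const int8_t x[IN], int32_t y[OUT]) {
    for (int o = 0; o < OUT; o++) {
        int32_t acc = B[o];
        for (int i = 0; i < IN; i++)
            acc += (int32_t)W[o][i] * (int32_t)x[i];
        y[o] = acc;
    }
}
```

Because everything is resolved at compile time, such code links into a few kilobytes and boots with no OS or runtime initialization.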

Technical Details

MARVEL utilizes a three-stage toolchain flow to transition from Python-based high-level DNN representations to a specialized hardware core:

  1. Model Translation: Apache TVM is leveraged to translate the Python-based DNN models into highly optimized intermediate C code.
  2. ASIP Generation: Synopsys ASIP Designer is used to identify compute-intensive kernels, model the optimized instructions, and generate the custom, ISA-extended RISC-V core and associated compiler tools.
  3. FPGA Implementation: Xilinx Vivado is utilized for the physical implementation of the custom core onto the target hardware.
Evaluation Setup

  • Baseline Core: Synopsys trv32p3 RISC-V core.
  • Target Platform: AMD Zynq UltraScale+ ZCU104 FPGA.
  • Evaluated Models: LeNet-5, MobileNetV1, MobileNetV2, ResNet50, VGG16, and DenseNet121.
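The kind of kernel stage 2 targets can be sketched as follows. The intrinsic name `rv_mac4` is hypothetical: ASIP Designer would generate the real intrinsic and the matching hardware, while here a portable C fallback stands in for its semantics so the code runs anywhere.

```c
/* Sketch of a convolution inner loop restructured around a custom
   multiply-accumulate instruction. `rv_mac4` is a hypothetical 4-way
   int8 MAC intrinsic; on the extended core it would map to one fused
   instruction, and this C fallback models the same arithmetic. */
#include <stdint.h>

static inline int32_t rv_mac4(int32_t acc, const int8_t *a, const int8_t *b) {
    for (int k = 0; k < 4; k++)
        acc += (int32_t)a[k] * (int32_t)b[k];
    return acc;
}

/* 1-D convolution with an 8-tap kernel; the inner product is split into
   two 4-way MACs so each half maps onto the custom instruction. */
static void conv1d_8tap(const int8_t *x, const int8_t w[8],
                        int32_t *y, int n) {
    for (int i = 0; i + 8 <= n; i++) {
        int32_t acc = 0;
        acc = rv_mac4(acc, &x[i],     &w[0]);
        acc = rv_mac4(acc, &x[i + 4], &w[4]);
        y[i] = acc;
    }
}
```

Collapsing four multiplies and adds into one instruction is the kind of fusion that produces the reported speedup while adding datapath area, which is consistent with the 28.23% overhead figure.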

Implications

  • Advancing Edge AI: MARVEL provides a crucial solution for deploying deep learning acceleration in deeply embedded systems (IoT endpoints) where power budgets and memory constraints prohibit traditional software stacks and operating systems.
  • Validation of RISC-V Customization: The framework demonstrates the core advantage of the RISC-V architecture—its extensibility. By automating the generation of custom instructions from the application workload (model class), it improves hardware efficiency for specialized tasks.
  • Reduced Software Overhead and Security: The ability to run AI tasks using minimal, bare-metal C code drastically reduces the software stack complexity, lowering memory usage, improving boot times, and potentially reducing the attack surface compared to solutions reliant on full runtime environments.
  • Design Automation: By providing an automated, end-to-end toolchain, MARVEL lowers the barrier to entry for creating application-specific integrated processors (ASIPs) for the emerging lightweight AI domain.