MARVEL: An End-to-End Framework for Generating Model-Class Aware Custom RISC-V Extensions for Lightweight AI
Abstract
MARVEL is an automated, end-to-end framework for efficiently deploying deep neural networks (DNNs) on highly resource-constrained IoT devices operating in bare-metal environments. It does so by generating custom RISC-V Instruction Set Architecture (ISA) extensions tailored to target DNN model classes, with a focus on convolutional neural networks (CNNs). The framework achieves a 2x inference speedup and up to a 2x reduction in energy per inference compared to a baseline RISC-V core, at the cost of a 28.23% area overhead.
Report
MARVEL: Framework for Generating Model-Class Aware Custom RISC-V Extensions
Key Highlights
- Goal: Deploy DNNs efficiently on highly resource-constrained IoT devices operating in bare-metal environments (without an OS).
- Innovation: MARVEL is an end-to-end framework that automatically profiles high-level DNN models and generates custom RISC-V ISA extensions specifically optimized for the targeted model class.
- Bare-Metal Deployment: The flow produces an optimized bare-metal C implementation, eliminating the need for heavyweight software dependencies and runtimes such as TensorFlow or PyTorch.
- Performance Results: Achieved a 2x inference speedup and up to a 2x reduction in energy per inference over the baseline RISC-V core across the tested models.
- Hardware Cost: The custom core incurred a 28.23% area overhead relative to the baseline when synthesized for the target FPGA platform.
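To make the "optimized bare-metal C" output concrete, the kind of kernel such a flow emits can be sketched as a dependency-free, fixed-point convolution. This is a minimal illustration written for this summary, not MARVEL's actual generated code; the function name and Q7 (`int8_t`) data layout are assumptions.

```c
#include <stdint.h>

/* Illustrative sketch of a bare-metal CNN kernel: a 3x3 "valid"
 * convolution over a single-channel int8 feature map, accumulating
 * into int32. No OS, no libc beyond <stdint.h>, no dynamic memory --
 * the style of code a bare-metal deployment flow would produce. */
void conv3x3_q7(const int8_t *in, int in_w, int in_h,
                const int8_t *kernel, int32_t *out)
{
    int out_w = in_w - 2;              /* valid convolution shrinks by 2 */
    for (int y = 0; y <= in_h - 3; y++) {
        for (int x = 0; x <= in_w - 3; x++) {
            int32_t acc = 0;
            for (int ky = 0; ky < 3; ky++)
                for (int kx = 0; kx < 3; kx++)
                    acc += (int32_t)in[(y + ky) * in_w + (x + kx)]
                         * (int32_t)kernel[ky * 3 + kx];
            out[y * out_w + x] = acc;
        }
    }
}
```

The inner multiply-accumulate loop is exactly the kind of compute-intensive region that the ASIP stage of the flow would target with a custom instruction.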
Technical Details
MARVEL utilizes a three-stage toolchain flow to transition from Python-based high-level DNN representations to a specialized hardware core:
- Model Translation: Apache TVM is leveraged to translate the Python-based DNN models into highly optimized intermediate C code.
- ASIP Generation: Synopsys ASIP Designer is used to identify compute-intensive kernels, model the optimized instructions, and generate the custom, ISA-extended RISC-V core and associated compiler tools.
- FPGA Implementation: Xilinx Vivado is utilized for the physical implementation of the custom core onto the target hardware.
- Baseline Core: Synopsys trv32p3 RISC-V core.
- Target Platform: AMD Zynq UltraScale+ ZCU104 FPGA platform.
- Evaluated Models: LeNet-5, MobileNetV1, ResNet50, VGG16, MobileNetV2, and DenseNet121.
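The ASIP step above identifies hot kernels and exposes them as custom instructions. A common way such an extension surfaces in C is a small wrapper that falls back to plain arithmetic on the unmodified core. The sketch below is hypothetical: the `MARVEL_CUSTOM_ISA` guard, the `mac32` name, and the `.insn` encoding in the RISC-V custom-0 opcode space are all assumptions, not MARVEL's actual extension.

```c
#include <stdint.h>

/* Hypothetical wrapper for a fused multiply-accumulate extension
 * instruction. On a stock core (guard undefined) it compiles to the
 * portable fallback; on the extended core it would emit one custom
 * instruction via the assembler's .insn directive. */
static inline int32_t mac32(int32_t acc, int8_t a, int8_t b)
{
#ifdef MARVEL_CUSTOM_ISA
    /* Assumed encoding: R-type in the custom-0 opcode space (0x0B).
     * "+r"(acc) keeps the accumulator in rd as both input and output. */
    __asm__ (".insn r 0x0B, 0x0, 0x00, %0, %1, %2"
             : "+r"(acc)
             : "r"((int32_t)a), "r"((int32_t)b));
#else
    acc += (int32_t)a * (int32_t)b;    /* software fallback */
#endif
    return acc;
}
```

Keeping a fallback path like this lets the same C source run on both the baseline trv32p3 core and the ISA-extended core, which is useful when validating the generated extension against a reference.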
Implications
- Advancing Edge AI: MARVEL provides a crucial solution for deploying deep learning acceleration in deeply embedded systems (IoT endpoints) where power budgets and memory constraints prohibit traditional software stacks and operating systems.
- Validation of RISC-V Customization: The framework demonstrates the core advantage of the RISC-V architecture—its extensibility. By automating the generation of custom instructions from the application workload (model class), it maximizes hardware efficiency for specialized tasks.
- Reduced Software Overhead and Security: The ability to run AI tasks using minimal, bare-metal C code drastically reduces the software stack complexity, lowering memory usage, improving boot times, and potentially reducing the attack surface compared to solutions reliant on full runtime environments.
- Design Automation: By providing an automated, end-to-end toolchain, MARVEL lowers the barrier to entry for creating application-specific instruction-set processors (ASIPs) for the emerging lightweight AI domain.