Bare-Metal RISC-V + NVDLA SoC for Efficient Deep Learning Inference

Abstract

This paper introduces a System-on-Chip (SoC) architecture that tightly couples a 32-bit Codasip uRISC_V core with the open-source NVDLA for efficient deep learning inference on edge devices. The core innovation is a bare-metal toolflow that generates assembly application code, bypassing operating-system overhead to maximize execution speed and storage efficiency. Benchmarked on an AMD ZCU102 FPGA using the NVDLA-small configuration at a 100 MHz clock frequency, the system achieved an inference time of 16.2 ms for ResNet-18.

Report

Key Highlights

  • Presents a novel SoC architecture combining a RISC-V core with the open-source NVDLA for high-efficiency deep learning inference.
  • Utilizes a bare-metal toolflow that generates optimized assembly code, bypassing the high overhead associated with traditional operating systems to achieve greater execution speed.
  • The tightly coupled hardware and bare-metal software methodology significantly improves storage efficiency, specifically targeting resource-constrained edge computing solutions.
  • The system was successfully implemented and evaluated on an AMD ZCU102 FPGA using the NVDLA-small configuration.

Technical Details

  • Core Architecture: A 32-bit, 4-stage pipelined RISC-V core, specifically the Codasip uRISC_V, is used as the control processor.
  • Accelerator: The open-source NVIDIA Deep Learning Accelerator (NVDLA) is tightly coupled to the CPU.
  • Software Flow: Model acceleration offloading is handled by bare-metal application code generated directly in assembly, circumventing the need for an operating system.
  • Evaluation Platform: AMD ZCU102 FPGA board.
  • Clock Frequency: System evaluation performed at 100 MHz.
  • Performance Benchmarks (Inference Time):
    • LeNet-5: 4.8 ms
    • ResNet-18: 16.2 ms
    • ResNet-50: 1.1 s

Implications

  • Optimization for Edge AI: The bare-metal approach establishes a standard for minimizing latency and maximizing determinism in resource-constrained edge AI, addressing critical real-time performance requirements.
  • RISC-V and Open Hardware Validation: The project successfully demonstrates the feasibility and performance benefits of integrating two key open-source hardware components—the RISC-V ISA and the NVDLA IP—into a competitive, domain-specific accelerator.
  • Low-Overhead Solution: By proving that complex deep learning models can be executed efficiently without OS interference, this work encourages the development of highly specialized, low-power RISC-V based accelerators.