Work-in-Progress: Real-Time Neural Network Inference on a Custom RISC-V Multicore Vector Processor

Abstract

This work presents a custom RISC-V multicore vector processor architecture combined with a novel compiler-based deployment toolchain tailored for real-time neural network inference. The design addresses the gap between high-performance accelerators (which lack predictability) and conventional real-time hardware (which lacks resources). Predictability is ensured by using a central management core and a static, compile-time schedule for all shared external memory accesses, eliminating interference. This methodology allows for the precise calculation of the overall system's Worst-Case Execution Time (WCET) based on subtask estimates and transfer times.

Report

Key Highlights

  • Predictable High Performance: The primary innovation is solving the dichotomy between high-performance neural network acceleration and deterministic, predictable timing behavior required by real-time systems (e.g., autonomous driving).
  • Custom RISC-V Architecture: The proposed solution centers on a custom RISC-V multicore vector processor optimized for inference tasks.
  • Static Scheduling for Predictability: Shared memory interference, the main limiter of predictability in modern accelerators, is eliminated by a static memory access schedule computed at compile time.
  • Robust WCET Estimation: The approach allows the total system's Worst-Case Execution Time (WCET) to be accurately derived from the WCET estimates of individual subtasks, data transfer times, and scheduled shared memory access latencies.
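Because the static schedule removes inter-core interference, the system-level bound reduces to a sum of independent terms. The sketch below illustrates that composition; all names and cycle counts are hypothetical, not taken from the paper.

```python
# Hypothetical sketch of the additive WCET model described above: with a
# static, interference-free memory schedule, the system bound is the sum of
# per-subtask compute WCETs, transfer times, and scheduled slot-wait times.
# All figures are illustrative placeholders.
from dataclasses import dataclass

@dataclass
class Subtask:
    name: str
    wcet_cycles: int          # WCET of compute on one core (scratchpad-local)
    transfer_cycles: int      # time to move inputs/outputs via external memory
    memory_slot_wait: int     # worst-case wait for the statically assigned slot

def system_wcet(pipeline: list[Subtask]) -> int:
    """Under a static schedule, latencies add with no interference terms."""
    return sum(t.wcet_cycles + t.transfer_cycles + t.memory_slot_wait
               for t in pipeline)

layers = [
    Subtask("conv1", wcet_cycles=120_000, transfer_cycles=8_000, memory_slot_wait=500),
    Subtask("conv2", wcet_cycles=90_000,  transfer_cycles=6_000, memory_slot_wait=500),
    Subtask("fc",    wcet_cycles=30_000,  transfer_cycles=2_000, memory_slot_wait=500),
]
print(system_wcet(layers))  # 257500 cycles
```

The key property is that each term is independent of what the other cores are doing, which is exactly what dynamic caches and arbitrated buses fail to provide.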

Technical Details

  • Processor Type: Custom RISC-V Multicore Vector Processor.
  • Core Structure: Consists of multiple predictable processing cores, each utilizing local scratchpad memories instead of caches to maximize timing determinism.
  • Memory Hierarchy: Features local scratchpad memories and shared external memory.
  • Management Core: A central management core is responsible for orchestrating the overall computation and facilitating controlled access to the shared external memory.
  • Toolchain Methodology: The compiler exploits the inherent fixed data flow characteristic of neural networks to estimate the required resource usage and timing.
  • Scheduling Input: The static schedule calculation relies on WCET estimates for the subtasks running on individual cores.

Implications

  • Advancing Real-Time AI: This architecture provides a crucial advancement for incorporating complex neural networks into hard real-time and safety-critical domains, such as automotive and industrial control, where timing guarantees are non-negotiable.
  • RISC-V Ecosystem Expansion: It demonstrates the potential of RISC-V for highly specialized, deterministic accelerator design, opening new market segments (safety-critical embedded systems) for the instruction set.
  • Memory Determinism Paradigm: By replacing dynamic cache-based systems (which introduce timing interference) with static, schedule-driven scratchpad systems for memory access, the design establishes a viable pathway for deterministic multicore acceleration.
  • Compiler Dependence: The success of the predictability model is tied directly to the accompanying compiler, emphasizing the importance of sophisticated toolchains in next-generation hardware design.