Work-in-Progress: Real-Time Neural Network Inference on a Custom RISC-V Multicore Vector Processor
Abstract
This work presents a custom RISC-V multicore vector processor architecture combined with a novel compiler-based deployment toolchain tailored for real-time neural network inference. The design addresses the gap between high-performance accelerators (which lack predictability) and conventional real-time hardware (which lacks resources). Predictability is ensured by using a central management core and a static, compile-time schedule for all shared external memory accesses, eliminating interference. This methodology allows for the precise calculation of the overall system's Worst-Case Execution Time (WCET) based on subtask estimates and transfer times.
Report
Key Highlights
- Predictable High Performance: The primary innovation is resolving the tension between high-performance neural network acceleration and the deterministic, predictable timing behavior required by real-time systems (e.g., autonomous driving).
- Custom RISC-V Architecture: The proposed solution centers on a custom RISC-V multicore vector processor optimized for inference tasks.
- Static Scheduling for Predictability: Shared memory interference, the main limiter of predictability in modern accelerators, is eliminated by a static memory access schedule computed at compile time.
- Robust WCET Estimation: The approach allows the total system's Worst-Case Execution Time (WCET) to be accurately derived from the WCET estimates of individual subtasks, data transfer times, and scheduled shared memory access latencies.
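The compositional WCET claim above can be illustrated with a small numeric sketch. All function names, the TDMA-style slot model, and the cycle counts below are illustrative assumptions, not figures from the report; the point is only that with a static schedule, per-subtask bounds and worst-case slot waits add up to a system-level bound.

```python
# Hypothetical sketch: composing a system-level WCET bound from per-subtask
# compute WCETs, data transfer times, and a statically scheduled shared-memory
# access window. All numbers are illustrative.

def subtask_wcet(compute_wcet, transfer_cycles, slot_len, period):
    # Worst case: the core just missed its memory slot, so it waits a full
    # schedule period minus its own slot length before the transfer starts.
    wait = period - slot_len
    return compute_wcet + wait + transfer_cycles

def system_wcet(subtasks, slot_len, period):
    # With interference eliminated by the static schedule, per-subtask
    # bounds simply sum along the inference pipeline.
    return sum(subtask_wcet(c, t, slot_len, period) for c, t in subtasks)

# (compute WCET, transfer cycles) per pipeline stage
subtasks = [(4000, 800), (12000, 1500), (6000, 600)]
print(system_wcet(subtasks, slot_len=200, period=800))  # prints 26700
```

Because no term depends on what the other cores are doing at run time, the bound is exact with respect to the model rather than a measured estimate.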
Technical Details
- Processor Type: Custom RISC-V Multicore Vector Processor.
- Core Structure: Consists of multiple predictable processing cores, each utilizing local scratchpad memories instead of caches to maximize timing determinism.
- Memory Hierarchy: Features local scratchpad memories and shared external memory.
- Management Core: A central management core is responsible for orchestrating the overall computation and facilitating controlled access to the shared external memory.
- Toolchain Methodology: The compiler exploits the static dataflow inherent to neural network inference to estimate resource usage and timing at compile time.
- Scheduling Input: The static schedule calculation relies on Worst-Case Execution Time (WCET) estimates for subtasks running on individual cores.
Implications
- Advancing Real-Time AI: This architecture provides a crucial advancement for incorporating complex neural networks into hard real-time and safety-critical domains, such as automotive and industrial control, where timing guarantees are non-negotiable.
- RISC-V Ecosystem Expansion: It demonstrates the potential of RISC-V for highly specialized, deterministic accelerator design, opening new market segments (safety-critical embedded systems) for the instruction set.
- Memory Determinism Paradigm: By replacing dynamic cache-based systems (which introduce timing interference) with static, schedule-driven scratchpad systems for memory access, the design establishes a viable pathway for deterministic multicore acceleration.
- Compiler Dependence: The success of the predictability model is tied directly to the accompanying compiler, emphasizing the importance of sophisticated toolchains in next-generation hardware design.