Arrow: A RISC-V Vector Accelerator for Machine Learning Inference

Abstract

This paper introduces Arrow, a configurable hardware accelerator architecture that implements a subset of the RISC-V v0.9 vector ISA extension and targets edge machine learning inference. On a suite of vector and matrix benchmarks fundamental to ML inference, an Arrow co-processor implemented on a Xilinx FPGA runs 2x to 78x faster than a scalar RISC processor while consuming 20% to 99% less energy.

Report

Key Highlights

  • Novel Architecture: Introduces Arrow, a configurable hardware accelerator architecture that works as a vector co-processor.
  • RISC-V Basis: Arrow implements a subset of the RISC-V v0.9 vector ISA extension rather than the full extension.
  • Target Application: Focused on accelerating edge machine learning (ML) inference tasks.
  • Performance Gain: Executes vector and matrix benchmarks 2x to 78x faster than a scalar RISC processor.
  • Energy Efficiency: Consumes 20% to 99% less energy than the scalar RISC processor when implemented on a Xilinx FPGA.
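
Speedup and energy savings are linked by energy = average power × execution time. The following Python sketch shows that arithmetic. The function names and the example inputs are illustrative assumptions, not values from the Arrow paper; this summary does not break the reported results down into power and time.

```python
def energy_ratio(speedup: float, power_ratio: float) -> float:
    """Return accelerated energy divided by baseline energy.

    speedup     = baseline_time / accelerated_time
    power_ratio = accelerated_power / baseline_power
    Since energy = power * time, the ratio is power_ratio / speedup.
    """
    if speedup <= 0 or power_ratio < 0:
        raise ValueError("speedup must be positive and power_ratio non-negative")
    return power_ratio / speedup


def energy_saving_percent(speedup: float, power_ratio: float) -> float:
    """Return the percentage of baseline energy saved (negative if more is used)."""
    return (1.0 - energy_ratio(speedup, power_ratio)) * 100.0


if __name__ == "__main__":
    # Hypothetical inputs chosen only to show the arithmetic.
    print(energy_saving_percent(speedup=4.0, power_ratio=2.0))
```

With these hypothetical inputs, a design that runs 4x faster while drawing twice the power uses half the baseline energy. A speedup therefore saves energy only when the added power draw grows more slowly than the speedup.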

Technical Details

  • Accelerator Name: Arrow.
  • ISA Basis: RISC-V v0.9 vector ISA extension (subset).
  • Core Workloads: Execution of vector and matrix benchmarks that are fundamental to ML inference.
  • Implementation Platform: Validation and testing were performed using a Xilinx XC7A200T-1SBG484C FPGA.
  • Design Goal: The architecture is designed to be configurable, allowing optimization for specific edge ML requirements.
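
To make "vector and matrix benchmarks" concrete, here is a minimal Python sketch of strip-mining, the loop pattern commonly used with the RISC-V vector extension: each iteration requests a vector length no larger than the hardware maximum and processes that many elements at once. The names set_vl, vector_add, dot, and matvec and the default vlmax of 8 are hypothetical. This summary does not state Arrow's vector length, its supported instructions, or its exact benchmark kernels, and set_vl is a simplification of the ISA's vector-length rule.

```python
from typing import List, Sequence


def set_vl(remaining: int, vlmax: int) -> int:
    """Simplified stand-in for a vector-length request: min(remaining, vlmax)."""
    return min(remaining, vlmax)


def vector_add(a: Sequence[float], b: Sequence[float], vlmax: int = 8) -> List[float]:
    """Element-wise add, processed in chunks of at most vlmax elements."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    out: List[float] = []
    i, n = 0, len(a)
    while i < n:
        vl = set_vl(n - i, vlmax)          # elements handled this iteration
        # One "vector instruction" worth of work on vl elements.
        out.extend(x + y for x, y in zip(a[i:i + vl], b[i:i + vl]))
        i += vl
    return out


def dot(a: Sequence[float], b: Sequence[float], vlmax: int = 8) -> float:
    """Dot product: chunked multiply followed by a per-chunk reduction."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    acc = 0
    i, n = 0, len(a)
    while i < n:
        vl = set_vl(n - i, vlmax)
        acc += sum(x * y for x, y in zip(a[i:i + vl], b[i:i + vl]))
        i += vl
    return acc


def matvec(m: Sequence[Sequence[float]], v: Sequence[float], vlmax: int = 8) -> List[float]:
    """Matrix-vector product built from one dot product per matrix row."""
    return [dot(row, v, vlmax) for row in m]
```

The scalar baseline corresponds to vlmax = 1, where every loop iteration handles a single element. Larger values of vlmax reduce the number of iterations, which is the source of the speedup a vector unit can offer on kernels like these.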

Implications

  • Support for the RISC-V Vector ISA: The reported results suggest that even a subset of the RISC-V v0.9 vector extension is an effective basis for hardware acceleration of computationally intensive tasks such as ML inference.
  • Edge AI Enablement: The combination of 2x to 78x speedups and 20% to 99% energy savings makes Arrow a candidate for running ML inference directly on resource-constrained edge devices.
  • Open Hardware Ecosystem Growth: By demonstrating a vector accelerator built on the open RISC-V standard, this work encourages further development and adoption of specialized hardware within the RISC-V ecosystem.