Work-In-Progress: Accelerating Numpy With OpenBLAS For Open-Source RISC-V Chips


Abstract

This preliminary work presents a methodology for accelerating high-level Python applications, specifically those that use the Numpy library, on heterogeneous RISC-V Systems-on-Chip (SoCs). The approach modifies the OpenBLAS library so that it uses OpenMP to offload selected linear algebra kernels to a programmable manycore accelerator (PMCA). By linking Numpy against this modified library, the authors demonstrated acceleration of operators such as matrix multiplication on an open-source RISC-V platform emulated on an FPGA.

Report

Key Highlights

  • Numpy applications are accelerated by linking them against a custom version of the OpenBLAS library.
  • The primary innovation is modifying OpenBLAS to offload selected linear algebra kernels (e.g., matrix multiplication) to dedicated hardware.
  • The target platform is an open-source heterogeneous RISC-V System-on-Chip (SoC).
  • The acceleration mechanism utilizes OpenMP directives to manage kernel execution on the accelerator.
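The offloading pattern described above can be sketched with a standard OpenMP `target` construct. This is a minimal sketch assuming an OpenMP 4.5+ compiler; the function name `offload_dgemm` is hypothetical, and this is not the authors' modified OpenBLAS code. The `map` clauses express the data movement (inputs copied to the accelerator, the result copied back), and the loop nest is the kernel that runs there.

```c
#include <assert.h>

/* HYPOTHETICAL offloaded kernel (not the authors' code):
 * C = alpha * A * B + beta * C for column-major matrices, no transposes.
 * A is m x k, B is k x n, C is m x n. */
static void offload_dgemm(int m, int n, int k, double alpha,
                          const double *A, const double *B,
                          double beta, double *C)
{
    assert(m >= 0 && n >= 0 && k >= 0);

    /* map(to:) copies the inputs to the device; map(tofrom:) copies C in
     * (it is read when beta != 0) and copies the result back to the host.
     * By default, if no offload device is available, the region runs on
     * the host; without -fopenmp the pragma is ignored entirely. */
    #pragma omp target teams distribute parallel for collapse(2) \
        map(to: A[0:m*k], B[0:k*n]) map(tofrom: C[0:m*n])
    for (int j = 0; j < n; j++)
        for (int i = 0; i < m; i++) {
            double acc = 0.0;
            for (int p = 0; p < k; p++)
                acc += A[i + p * m] * B[p + j * k]; /* column-major indexing */
            C[i + j * m] = alpha * acc + beta * C[i + j * m];
        }
}
```

For example, with A = [[1, 2], [3, 4]] stored column-major as {1, 3, 2, 4}, B = [[5, 6], [7, 8]] stored as {5, 7, 6, 8}, alpha = 1 and beta = 0, the call leaves C = {19, 43, 22, 50}, i.e. [[19, 22], [43, 50]].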

Technical Details

  • Software Stack: The Python package Numpy is linked against the modified OpenBLAS library.
  • Offloading Mechanism: OpenMP is used to manage the data and kernel transfer to the accelerator.
  • Target Architecture: A heterogeneous SoC is used, combining two distinct types of processor:
    • Host: an rv64g core capable of running Linux.
    • Accelerator: a Programmable Manycore Accelerator (PMCA) built from rv32imafd cores.
  • Implementation: The entire heterogeneous platform is implemented and evaluated using FPGA emulation.
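Because only selected kernels are offloaded, a BLAS routine also needs a rule for when to stay on the host. The sketch below assumes a hypothetical size threshold (`OFFLOAD_MIN_MACS`, not taken from the paper) and uses the standard `omp_get_num_devices()` query together with the `if(target: ...)` clause, which makes the runtime execute the target region on the host when the condition is false. Both the rv64g host and the rv32imafd cores implement the D (double-precision floating-point) extension, so the same double-precision loop is valid on either side. The names `should_offload` and `dispatch_dgemm` are hypothetical.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* HYPOTHETICAL threshold (not from the paper): below this many
 * multiply-accumulates, copying operands to the accelerator is assumed
 * to cost more than it saves. */
#define OFFLOAD_MIN_MACS 65536LL

/* HYPOTHETICAL policy: offload only if a device exists and the problem
 * is large enough. */
static int should_offload(int m, int n, int k)
{
    long long work = (long long)m * n * k;
#ifdef _OPENMP
    return omp_get_num_devices() > 0 && work >= OFFLOAD_MIN_MACS;
#else
    (void)work;
    return 0; /* built without OpenMP: always stay on the host */
#endif
}

/* Column-major C = A * B (A is m x k, B is k x n, C is m x n).
 * if(target: offload) runs the region on the host when offload == 0,
 * while the parallel loop itself can still use host threads. */
static void dispatch_dgemm(int m, int n, int k,
                           const double *A, const double *B, double *C)
{
    assert(m >= 0 && n >= 0 && k >= 0);
    int offload = should_offload(m, n, k);
    (void)offload; /* avoids an unused-variable warning without OpenMP */

    #pragma omp target teams distribute parallel for collapse(2) \
        if(target: offload) map(to: A[0:m*k], B[0:k*n]) map(from: C[0:m*n])
    for (int j = 0; j < n; j++)
        for (int i = 0; i < m; i++) {
            double acc = 0.0;
            for (int p = 0; p < k; p++)
                acc += A[i + p * m] * B[p + j * k];
            C[i + j * m] = acc;
        }
}
```

The threshold reflects a general trade-off on heterogeneous SoCs: for small matrices, the cost of moving operands to the accelerator can exceed the compute time saved.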

Implications

  • Bridging the Software/Hardware Gap: Because Numpy delegates operations such as matrix multiplication to the BLAS library it is linked against, existing Python/Numpy code can use the accelerator without changes to the application source; the modifications are confined to OpenBLAS. This makes RISC-V hardware heterogeneity much easier to exploit from high-level scientific applications.
  • Enhancing RISC-V Capabilities: Accelerating core linear algebra (BLAS) operations could make open-source RISC-V chips more viable for workloads in scientific computing, machine learning, and data processing.
  • Fostering Open-Source Ecosystem: The focus on open-source hardware (RISC-V) and software (Numpy, OpenBLAS) promotes the development of a fully transparent and customizable high-performance computing environment.
