Accelerating GenAI Workloads by Enabling RISC-V Microkernel Support in IREE

Abstract

This paper describes the acceleration of Generative AI workloads by integrating optimized microkernel support for the RISC-V architecture into IREE, an MLIR-based machine learning compiler and runtime. The core technical approach lowers MLIR linalg dialect contraction operations to the linalg.mmt4d op for the RISC-V64 target and executes them with custom-developed microkernels. Performance was validated with the Llama-3.2-1B-Instruct model, showing gains over both upstream IREE and Llama.cpp.

Report

Key Highlights

  • The primary innovation is enabling optimized RISC-V microkernel support within the IREE machine learning compiler and runtime.
  • The goal is to significantly accelerate Generative AI (GenAI) workloads on RISC-V hardware.
  • The project uses the Llama-3.2-1B-Instruct model for benchmarking performance gains.
  • Results compare the optimized implementation against performance figures from standard upstream IREE and the common reference implementation, Llama.cpp.

Technical Details

  • Tooling: IREE (Intermediate Representation Execution Environment), an MLIR-based system, is used as the foundational compiler and runtime.
  • Target Architecture: The optimization is specifically developed for the RISC-V64 target architecture.
  • Optimization Pipeline: The core method involves modifying the IREE pass pipeline to handle MLIR linalg dialect contraction operations.
  • Specific Op Lowering: These contraction operations are lowered to the linalg.mmt4d operation.
  • Microkernels: Custom, optimized microkernels were developed specifically for RISC-V to execute the underlying matrix multiplication operations efficiently.
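To make the pipeline above concrete, the following is a minimal, portable C sketch of an mmt4d-style microkernel. It is not IREE's actual implementation: the tile sizes (M0 = N0 = K0 = 4) and function names are hypothetical, and a production RISC-V kernel would replace the scalar inner loop with RVV vector intrinsics. What the sketch does capture is the layout that linalg.mmt4d assumes, with the LHS packed as [M1][K1][M0][K0], the RHS packed as [N1][K1][N0][K0] (inner tile pre-transposed, the "t" in mmt4d), and the accumulator as [M1][N1][M0][N0], so the innermost loops walk contiguous memory.

```c
#include <stddef.h>

/* Hypothetical tile sizes for illustration only; real kernels pick these
 * to match the target's vector width and register count. */
enum { M0 = 4, N0 = 4, K0 = 4 };

/* Accumulate one M0 x N0 output tile: acc += lhs_tile * rhs_tile^T.
 * lhs_tile is M0 x K0, rhs_tile is N0 x K0 (already transposed),
 * acc is M0 x N0. All three tiles are contiguous in memory. */
static void mmt4d_tile_f32(const float *lhs_tile, const float *rhs_tile,
                           float *acc) {
  for (int i = 0; i < M0; ++i)
    for (int j = 0; j < N0; ++j) {
      float sum = acc[i * N0 + j];
      for (int k = 0; k < K0; ++k)
        sum += lhs_tile[i * K0 + k] * rhs_tile[j * K0 + k];
      acc[i * N0 + j] = sum;
    }
}

/* Full matmul over pre-packed operands: an M1 x K1 grid of LHS tiles,
 * an N1 x K1 grid of RHS tiles, and an M1 x N1 grid of output tiles. */
void mmt4d_f32(size_t M1, size_t N1, size_t K1, const float *lhs,
               const float *rhs, float *acc) {
  for (size_t m = 0; m < M1; ++m)
    for (size_t n = 0; n < N1; ++n)
      for (size_t k = 0; k < K1; ++k)
        mmt4d_tile_f32(lhs + ((m * K1 + k) * M0 * K0),
                       rhs + ((n * K1 + k) * N0 * K0),
                       acc + ((m * N1 + n) * M0 * N0));
}
```

The point of the packed layout is that the microkernel never computes strided addresses inside the hot loop: every tile it touches is a dense, contiguous block, which is what makes a straightforward vectorized inner loop efficient.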

Implications

  • Boosting RISC-V ML Performance: This work is crucial for establishing RISC-V as a high-performance, viable platform for demanding modern AI workloads, specifically large language models (LLMs).
  • IREE Validation: It validates IREE's efficacy and flexibility as a compiler and runtime infrastructure capable of supporting and deeply optimizing for emerging Instruction Set Architectures (ISAs) like RISC-V.
  • Edge AI Enablement: By improving the efficiency of core computation kernels, this enables more practical deployment of complex GenAI models onto RISC-V-based edge devices or specialized hardware, where performance and power efficiency are paramount.