Accelerating GenAI Workloads by Enabling RISC-V Microkernel Support in IREE
Abstract
This paper describes accelerating Generative AI workloads by integrating optimized microkernel support for the RISC-V architecture into IREE, an MLIR-based machine learning compiler and runtime. The core technical approach lowers MLIR linalg dialect contraction operations to the linalg.mmt4d op for the RISC-V64 target, followed by the development of custom RISC-V microkernels. Performance was validated with the Llama-3.2-1B-Instruct model, showing gains over both upstream IREE and Llama.cpp.
Report
Key Highlights
- The primary innovation is enabling optimized RISC-V microkernel support within the IREE machine learning compiler and runtime.
- The goal is to significantly accelerate Generative AI (GenAI) workloads on RISC-V hardware.
- The project uses the Llama-3.2-1B-Instruct model for benchmarking performance gains.
- Results compare the optimized implementation against both standard upstream IREE and the widely used Llama.cpp baseline.
Technical Details
- Tooling: IREE (Intermediate Representation Execution Environment), an MLIR-based system, is used as the foundational compiler and runtime.
- Target Architecture: The optimization is specifically developed for the RISC-V64 target architecture.
- Optimization Pipeline: The core method involves modifying the IREE pass pipeline to handle MLIR linalg dialect contraction operations.
- Specific Op Lowering: These contraction operations are lowered to the linalg.mmt4d operation.
- Microkernels: Custom, optimized microkernels were developed specifically for RISC-V to execute the underlying matrix multiplication operations efficiently.
Implications
- Boosting RISC-V ML Performance: This work is crucial for establishing RISC-V as a high-performance, viable platform for demanding modern AI workloads, specifically large language models (LLMs).
- IREE Validation: It validates IREE's efficacy and flexibility as a compiler and runtime infrastructure capable of supporting and deeply optimizing for emerging Instruction Set Architectures (ISAs) like RISC-V.
- Edge AI Enablement: By improving the efficiency of core computation kernels, this enables more practical deployment of complex GenAI models onto RISC-V-based edge devices or specialized hardware, where performance and power efficiency are paramount.