HetGPU: The pursuit of making binary compatibility towards GPUs

Abstract

HetGPU is a novel system designed to overcome the challenge of binary incompatibility across heterogeneous GPU vendors, including NVIDIA, AMD, Intel, and Tenstorrent. The system employs a compiler to generate an architecture-agnostic Intermediate Representation (IR) that is dynamically translated by a runtime abstraction layer to the target hardware's native code. This framework successfully handles diverse execution models (SIMT vs. MIMD) and facilitates vendor-agnostic GPU computing, including state serialization necessary for live workload migration.

Report

Key Highlights

  • Vendor-Agnostic Binary: HetGPU aims to let a single compiled GPU binary run on hardware from multiple vendors; the named targets are NVIDIA, AMD, Intel, and Tenstorrent.
  • System Components: The solution is composed of three main parts: a specialized compiler, a runtime environment, and a unifying abstraction layer.
  • Intermediate Representation (IR): The compiler outputs an architecture-agnostic GPU Intermediate Representation, augmented with essential metadata for execution state management.
  • Live Migration Support: The architecture includes mechanisms for state capture and reload, enabling the live migration of running GPU workloads across disparate hardware architectures with minimal reported overhead.
  • Execution Model Bridge: A core feature is bridging fundamentally different execution models, such as the warp-centric SIMT (NVIDIA/AMD) and the core-centric MIMD execution model found in Tenstorrent's RISC-V architecture.
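
To make the execution-model gap concrete, here is a minimal Python sketch of one kernel body run under two schedules: a warp-grouped SIMT-style loop and a per-core MIMD-style loop. It is an illustration only, not HetGPU's implementation. The names `vector_add_kernel`, `run_simt`, and `run_mimd`, along with the warp size and core count, are assumptions made for this example.

```python
# Hypothetical sketch: none of these names come from HetGPU itself.

def vector_add_kernel(tid, a, b, out):
    """Per-thread kernel body: one logical thread handles one element."""
    if tid < len(out):
        out[tid] = a[tid] + b[tid]

def run_simt(kernel, n_threads, warp_size, *args):
    """SIMT-like schedule: logical threads are grouped into fixed-size warps."""
    schedule = []
    for warp_start in range(0, n_threads, warp_size):
        warp = list(range(warp_start, min(warp_start + warp_size, n_threads)))
        for tid in warp:  # lanes of one warp follow the same instruction stream
            kernel(tid, *args)
        schedule.append(warp)
    return schedule

def run_mimd(kernel, n_threads, n_cores, *args):
    """MIMD-like schedule: each independent core loops over its own thread IDs."""
    schedule = []
    chunk = -(-n_threads // n_cores)  # ceiling division
    for core in range(n_cores):
        tids = list(range(core * chunk, min((core + 1) * chunk, n_threads)))
        for tid in tids:
            kernel(tid, *args)
        schedule.append(tids)
    return schedule

a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [10, 20, 30, 40, 50, 60, 70, 80]
out_simt = [0] * 8
out_mimd = [0] * 8
simt_schedule = run_simt(vector_add_kernel, 8, 4, a, b, out_simt)
mimd_schedule = run_mimd(vector_add_kernel, 8, 3, a, b, out_mimd)
```

Both schedules produce the same output even though the logical threads are partitioned differently. A bridging layer has to preserve that property, and a real system must additionally handle barriers and divergence, which this sketch leaves out.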

Technical Details

  • Core Translation Mechanism: The runtime dynamically translates the architecture-agnostic IR into the target GPU's native instruction set and manages scheduling and memory model discrepancies.
  • Abstraction Layer: Provides a uniform API for foundational parallel concepts, including threads, global/local memory access, and synchronization primitives, regardless of the underlying GPU vendor.
  • Supported Architectures: The design explicitly addresses challenges posed by existing NVIDIA CUDA and AMD architectures, as well as emerging RISC-V based solutions like Tenstorrent's many-core designs.
  • State Serialization: A state capture/reload mechanism serializes the full execution state of a running kernel, so that the kernel can be paused mid-execution on one GPU type and resumed on a different one (live migration).
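
The capture/reload idea can be sketched with a toy interpreter. The Python example below is a simplified illustration, not HetGPU's IR or runtime: the instruction format, the `ToyBackend` class, and the use of JSON as the serialization format are all assumptions. It pauses a program part-way through on one "backend", serializes the program counter and registers, and resumes on another.

```python
import json

# Hypothetical toy IR: (opcode, destination register, source, immediate).
PROGRAM = [
    ("load_const", "acc", 0, None),
    ("add_const", "acc", "acc", 5),
    ("add_const", "acc", "acc", 7),
    ("mul_const", "acc", "acc", 3),
]

class ToyBackend:
    """Stands in for one vendor target; interprets the toy IR step by step."""

    def __init__(self, name):
        self.name = name
        self.pc = 0      # index of the next instruction to execute
        self.regs = {}   # architecture-neutral register state

    def step(self, program):
        op, dest, src, imm = program[self.pc]
        if op == "load_const":
            self.regs[dest] = src
        elif op == "add_const":
            self.regs[dest] = self.regs[src] + imm
        elif op == "mul_const":
            self.regs[dest] = self.regs[src] * imm
        else:
            raise ValueError(f"unknown opcode: {op}")
        self.pc += 1

    def run(self, program, max_steps=None):
        steps = 0
        while self.pc < len(program) and (max_steps is None or steps < max_steps):
            self.step(program)
            steps += 1

    def capture(self):
        """Serialize execution state into a backend-independent blob."""
        return json.dumps({"pc": self.pc, "regs": self.regs})

    def reload(self, blob):
        """Restore execution state captured on any other backend."""
        state = json.loads(blob)
        self.pc = state["pc"]
        self.regs = state["regs"]

source = ToyBackend("vendor_a")
source.run(PROGRAM, max_steps=2)   # pause mid-execution
blob = source.capture()

target = ToyBackend("vendor_b")
target.reload(blob)
target.run(PROGRAM)                # resume to completion

reference = ToyBackend("reference")
reference.run(PROGRAM)             # uninterrupted run for comparison
```

The migrated run ends in the same state as the uninterrupted one because the captured state is expressed in terms of the IR rather than any one device's registers. That is the role the summary attributes to the metadata carried in the architecture-agnostic IR.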

Implications

  • Reduced Vendor Lock-in: GPU code today is tightly coupled to vendor-specific instruction sets and driver stacks. HetGPU challenges that coupling, giving developers and enterprises more freedom in choosing hardware.
  • Boost for RISC-V Adoption: By treating many-core RISC-V designs (like Tenstorrent's) as first-class targets alongside established GPU architectures from NVIDIA, AMD, and Intel, HetGPU lowers the barrier to entry and increases the utility of emerging, open-source-aligned accelerators.
  • Enhanced Cloud/HPC Flexibility: The ability to achieve binary compatibility and live migration across varied hardware allows cloud providers and HPC clusters to utilize resources more efficiently, dynamically shifting workloads to available or cheaper hardware without requiring developer recompilation or manual code adjustments.
  • Pathway to Standardization: An effective, architecture-agnostic GPU IR could catalyze industry efforts toward a working standard for heterogeneous parallel computing, much as portable intermediate representations have done in other areas of compiler technology.