Evaluating IOMMU-Based Shared Virtual Addressing for RISC-V Embedded Heterogeneous SoCs

Abstract

This work quantitatively evaluates Input-Output Memory Management Unit (IOMMU)-based Shared Virtual Addressing (SVA) for RISC-V embedded heterogeneous SoCs, enabling zero-copy data offloading between host and accelerator. While IO virtual address translation introduces significant overhead on systems with high memory latency (up to 17.6% of accelerator runtime), the authors demonstrate that integrating a Last-Level Cache reduces this cost to below 1%. This finding validates SVA as an efficient mechanism for zero-copy offloading in modern RISC-V heterogeneous architectures.

Report

Structured Report: Evaluating IOMMU-Based Shared Virtual Addressing for RISC-V Embedded Heterogeneous SoCs

Key Highlights

  • Innovation Focus: The paper evaluates the integration of IOMMU technology to enable Shared Virtual Addressing (SVA) in open-source RISC-V embedded heterogeneous systems, thereby facilitating zero-copy data transfer.
  • Problem Solved: SVA removes the performance bottleneck and poor memory utilization caused by mandatory data copies between the host's virtual address space and the accelerator's physical address space (see the OpenMP offload sketch after this list).
  • Performance Bottleneck Identified: On systems with high-latency memory, IO virtual address translation accounts for a significant performance hit, ranging from 4.2% up to 17.6% of the accelerator's runtime for compute kernels like gemm.
  • Effective Mitigation: The research demonstrates that adding a Last-Level Cache (LLC) virtually eliminates this overhead, dropping the address translation cost to between 0.4% and 0.7% of accelerator runtime under the same high-latency conditions.
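
As a concrete illustration of the zero-copy model, the following is a minimal sketch of an SVA-style OpenMP offload: the host allocates buffers with plain malloc, and the accelerator dereferences those host virtual addresses directly through the IOMMU, so no map-induced copies occur. It assumes an OpenMP 5.x toolchain that honors unified_shared_memory for the accelerator target; the kernel and sizes are illustrative, not the paper's RAJAPerf code.

```c
#include <stdlib.h>

/* Assumption: the OpenMP runtime and device support unified shared memory,
 * i.e. the accelerator's IOMMU can translate host virtual addresses (SVA). */
#pragma omp requires unified_shared_memory

#define N 1024

int main(void)
{
    /* Ordinary host allocations; under SVA these pointers are also valid
     * IO virtual addresses on the accelerator.                            */
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    for (int i = 0; i < N; i++) { a[i] = (double)i; b[i] = 0.0; }

    /* No map() clauses and no explicit copies: only the pointer values are
     * passed to the device, which accesses the data in place (zero-copy). */
    #pragma omp target teams distribute parallel for
    for (int i = 0; i < N; i++)
        b[i] = 2.0 * a[i];

    free(a);
    free(b);
    return 0;
}
```

Without SVA, the same offload would need map(to: a[0:N]) and map(tofrom: b[0:N]) clauses, and the runtime would copy both buffers into the accelerator's physical address space and back.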

Technical Details

  • Architecture Tested: An open-source heterogeneous RISC-V SoC design.
  • Component Configuration: The SoC consists of a 64-bit host processor coupled with a 32-bit accelerator cluster.
  • Core Mechanism: Input-Output Memory Management Units (IOMMUs) are used to map IO virtual addresses (IOVAs) used by accelerators to physical memory.
  • Evaluation Method: The system design was emulated on an FPGA platform to measure real-world performance characteristics.
  • Software Stack: Compute kernels were derived from the RajaPERF benchmark suite and implemented using heterogeneous OpenMP programming.
  • Latency Challenge: Resolving an IO virtual address can require up to three sequential memory accesses on an IOTLB (Input-Output Translation Lookaside Buffer) miss, making the translation overhead highly sensitive to DRAM access latency (see the page-table-walk sketch after this list).
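
To make the three-access figure concrete, below is a minimal sketch of an Sv39-style three-level page-table walk of the kind an IOMMU walker performs on an IOTLB miss; each level costs one physical memory access, which is why the penalty scales with DRAM latency (or shrinks when the walk hits in an LLC). The flat simulated memory, table layout, and single 4 KiB mapping are illustrative assumptions, not the paper's hardware or driver code; superpages and permission/A/D handling are omitted.

```c
#include <stdint.h>
#include <stdio.h>

#define PTE_V 0x1ULL                       /* valid bit                        */
#define PTE_R 0x2ULL                       /* readable => leaf in this sketch  */

static uint64_t phys_mem[4096];            /* 32 KiB of simulated physical mem */
static int      mem_accesses;              /* sequential PTE fetches performed */

static uint64_t read_pte(uint64_t pa)      /* one memory access by the walker  */
{
    mem_accesses++;
    return phys_mem[pa / 8];
}

/* Walk a 3-level, Sv39-style table (9-bit index per level, 4 KiB pages). */
static int translate_iova(uint64_t root_pa, uint64_t iova, uint64_t *pa)
{
    uint64_t table_pa = root_pa;
    for (int level = 2; level >= 0; level--) {
        uint64_t vpn = (iova >> (12 + 9 * level)) & 0x1FF;
        uint64_t pte = read_pte(table_pa + vpn * 8);   /* sequential access     */
        if (!(pte & PTE_V))
            return 0;                                  /* unmapped: IOMMU fault */
        if (pte & PTE_R) {                             /* leaf PTE              */
            *pa = ((pte >> 10) << 12) | (iova & 0xFFF);
            return 1;
        }
        table_pa = (pte >> 10) << 12;                  /* descend one level     */
    }
    return 0;
}

int main(void)
{
    /* Hand-build one mapping: IOVA 0x40001000 -> PA 0x3000.                  */
    uint64_t iova = 0x40001000ULL;
    uint64_t l2 = 0x0000, l1 = 0x1000, l0 = 0x2000;    /* table base addresses */

    phys_mem[(l2 + ((iova >> 30) & 0x1FF) * 8) / 8] = ((l1 >> 12) << 10) | PTE_V;
    phys_mem[(l1 + ((iova >> 21) & 0x1FF) * 8) / 8] = ((l0 >> 12) << 10) | PTE_V;
    phys_mem[(l0 + ((iova >> 12) & 0x1FF) * 8) / 8] =
        ((0x3000ULL >> 12) << 10) | PTE_V | PTE_R;

    uint64_t pa;
    if (translate_iova(l2, iova, &pa))
        printf("IOVA 0x%llx -> PA 0x%llx after %d memory accesses\n",
               (unsigned long long)iova, (unsigned long long)pa, mem_accesses);
    return 0;
}
```

On an IOTLB hit none of these accesses occur, and when they do occur the cost collapses once the walk is served by the LLC rather than DRAM, which is the mitigation the paper quantifies.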

Implications

  • Enabling Zero-Copy RISC-V: This work establishes a quantitative foundation for implementing efficient zero-copy offloading in RISC-V embedded SoCs, which is essential for maximizing accelerator throughput and reducing host CPU load.
  • System Design Guidance: It provides critical architectural guidance, proving that IOMMU integration is only fully efficient when memory access latency is aggressively mitigated, strongly favoring the inclusion of Last-Level Caches in these heterogeneous designs.
  • Competitive Advantage: The successful evaluation of SVA integration enhances the competitive viability of RISC-V in high-performance embedded markets (e.g., automotive, edge computing) where energy efficiency and high throughput are paramount.