A Direct Memory Access Controller (DMAC) for Irregular Data Transfers on RISC-V Linux Systems

Abstract

This paper introduces a Direct Memory Access Controller (DMAC) optimized for arbitrary transfers of small unit sizes, addressing the inefficiency of classical descriptor-based DMACs in heterogeneous computing environments. The design combines a lightweight descriptor format with a low-overhead speculative descriptor prefetching scheme and is integrated into a 64-bit Linux-capable RISC-V SoC. In the evaluation, the proposed DMAC launches transfers with 1.66x lower latency and increases bus utilization by up to 3.6x for 64-byte transfers in deep memory systems, while requiring fewer hardware resources.

Report

Key Highlights

  • Targeted Optimization: The DMAC is designed to efficiently handle irregular data transfers of small unit sizes, crucial for modern machine learning and heterogeneous systems.
  • Performance Gain: Achieves 1.66x lower latency when launching transfers compared to an off-the-shelf descriptor-based DMAC IP.
  • Bus Utilization: Increases bus utilization by up to 2.5x in ideal memory systems and extends this lead to 3.6x for 64-byte transfers in deep memory systems.
  • Resource Efficiency: Requires significantly fewer resources, utilizing 11% fewer lookup tables (LUTs), 23% fewer flip-flops, and zero block RAMs (BRAMs) than comparable IPs.
  • High Frequency: Synthesized using GlobalFoundries' GF12LP+ node, the DMAC operates at a clock frequency exceeding 1.44 GHz.
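
To see why small transfers are so sensitive to per-transfer overhead, the bus-utilization highlight can be read through a toy model: each transfer occupies the bus for a number of data beats, and any cycles spent waiting (for example on a descriptor fetch) carry no payload. The sketch below is an illustration only. The function name, the 8-byte bus width, and the overhead values are assumptions and are not taken from the paper.

```python
def bus_utilization(transfer_bytes: int, bus_bytes_per_beat: int, overhead_cycles: int) -> float:
    """Fraction of cycles that carry a data beat for back-to-back transfers of equal size.

    Toy model (assumption): one data beat per cycle while a transfer is active,
    plus a fixed number of idle cycles per transfer.
    """
    beats = -(-transfer_bytes // bus_bytes_per_beat)  # ceiling division
    return beats / (beats + overhead_cycles)


# Hypothetical numbers: 64-byte transfers on an 8-byte-wide bus.
exposed = bus_utilization(64, 8, overhead_cycles=8)  # overhead fully exposed
hidden = bus_utilization(64, 8, overhead_cycles=0)   # overhead fully hidden
```

In this model, a 64-byte transfer is only eight beats long, so even a handful of idle cycles per transfer removes a large share of the available bandwidth. Hiding those cycles is what a prefetching scheme aims for.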

Technical Details

  • Core Architecture: The DMAC employs a descriptor-based architecture optimized with a lightweight descriptor format to reduce static setup overhead.
  • Performance Mechanism: Includes a low-overhead speculative descriptor prefetching scheme designed to increase performance without introducing latency penalties upon misprediction.
  • Implementation Base: The DMAC is based on the AXI4 protocol.
  • Integration: The DMAC was integrated into a 64-bit Linux-capable RISC-V SoC.
  • Evaluation Environment: Performance was evaluated using emulation on a Kintex FPGA.
  • Physical Metrics: The synthesized DMAC occupies only 49.5 kGE (kilo Gate Equivalents) on the GF12LP+ node.
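
The combination of a descriptor chain and speculative prefetching can be sketched in software. The model below is a minimal sketch, not the paper's hardware design: the descriptor fields, the 32-byte descriptor size, the end-of-chain sentinel, and the "next descriptor is stored directly after the current one" guess are all assumptions made for illustration. It shows the property named above: a wrong guess is simply discarded and the real next descriptor is fetched as it would have been without speculation.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

DESC_SIZE = 32    # hypothetical descriptor size in bytes (assumption)
END_OF_CHAIN = 0  # hypothetical sentinel marking the last descriptor (assumption)


@dataclass
class Descriptor:
    src: int        # source address of the payload
    dst: int        # destination address of the payload
    length: int     # payload size in bytes
    next_addr: int  # address of the next descriptor, or END_OF_CHAIN


def run_chain(desc_mem: Dict[int, Descriptor], memory: bytearray, head: int) -> Tuple[int, int]:
    """Walk a descriptor chain, copy each payload, and count speculation hits and misses."""
    hits = misses = 0
    addr, desc = head, desc_mem[head]
    while True:
        # Speculate before next_addr is consumed: guess the next descriptor
        # sits directly behind the current one in memory.
        guess = addr + DESC_SIZE
        speculative = desc_mem.get(guess)

        # Perform the transfer described by the current descriptor.
        memory[desc.dst:desc.dst + desc.length] = memory[desc.src:desc.src + desc.length]

        nxt = desc.next_addr
        if nxt == END_OF_CHAIN:
            return hits, misses
        if nxt == guess and speculative is not None:
            hits += 1
            desc = speculative    # prefetched descriptor is usable immediately
        else:
            misses += 1
            desc = desc_mem[nxt]  # discard the guess and fetch the real descriptor
        addr = nxt
```

A chain with two descriptors stored back to back followed by one stored elsewhere produces one hit and one miss in this model, while the copied data is identical either way.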

Implications

  • Enhanced RISC-V Capabilities: Provides a robust, high-performance hardware accelerator IP essential for running complex, data-intensive applications (like ML) efficiently on RISC-V Linux systems.
  • Addressing Data Irregularity: Solves a major bottleneck in heterogeneous computing where transfers often involve small, scattered data units, allowing processors and accelerators to be decoupled more effectively.
  • Cost-Effective IP: The significant reduction in resource utilization (especially eliminating BRAMs) makes this DMAC highly attractive for deployment in resource-constrained environments or as a smaller, integrated component within large SoCs.
  • High Frequency/Advanced Node Viability: Achieving over 1.44 GHz on GF12LP+ demonstrates the DMAC's readiness for deployment in modern, high-performance computing platforms.