A Direct Memory Access Controller (DMAC) for Irregular Data Transfers on RISC-V Linux Systems
Abstract
This paper introduces a Direct Memory Access Controller (DMAC) optimized for arbitrary transfers of small unit sizes, a workload that classical descriptor-based DMACs handle inefficiently in heterogeneous computing environments. It combines a lightweight descriptor format with a low-overhead speculative descriptor prefetching scheme, and it is integrated into a 64-bit Linux-capable RISC-V SoC. In evaluation, the proposed DMAC launches transfers with 1.66x lower latency than an off-the-shelf descriptor-based DMAC IP and raises bus utilization by up to 3.6x for 64-byte transfers in deep memory systems, while using fewer hardware resources.
Report
Key Highlights
- Targeted Optimization: The DMAC is designed to efficiently handle irregular data transfers of small unit sizes, crucial for modern machine learning and heterogeneous systems.
- Performance Gain: Achieves 1.66x lower latency when launching transfers than an off-the-shelf descriptor-based DMAC IP.
- Bus Utilization: Increases bus utilization by up to 2.5x in ideal memory systems and by up to 3.6x for 64-byte transfers in deep memory systems.
- Resource Efficiency: Uses 11% fewer lookup tables (LUTs) and 23% fewer flip-flops than comparable IPs, and no block RAMs (BRAMs).
- High Frequency: Synthesized using GlobalFoundries' GF12LP+ node, the DMAC operates at a clock frequency exceeding 1.44 GHz.
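To see why per-transfer overhead matters so much for small transfers, consider a rough back-of-the-envelope model. It is only an illustrative sketch and not the paper's evaluation method: the function name `bus_utilization`, the 8-byte bus width, and the 24-cycle overhead are all assumed values chosen for the example.

```python
def bus_utilization(transfer_bytes, bus_bytes_per_beat, overhead_cycles):
    """Fraction of bus cycles that carry payload for back-to-back transfers.

    Simplified model (an assumption, not taken from the paper): each transfer
    costs its data beats plus a fixed number of idle cycles, for example the
    time spent waiting for the next descriptor to arrive from memory.
    """
    data_beats = -(-transfer_bytes // bus_bytes_per_beat)  # ceiling division
    return data_beats / (data_beats + overhead_cycles)


# Hypothetical numbers: an 8-byte-wide data bus, 24 idle cycles per transfer.
small_with_overhead = bus_utilization(64, 8, overhead_cycles=24)
small_no_overhead = bus_utilization(64, 8, overhead_cycles=0)
large_with_overhead = bus_utilization(4096, 8, overhead_cycles=24)

print(f"64 B,   24 idle cycles: {small_with_overhead:.2f}")
print(f"64 B,    0 idle cycles: {small_no_overhead:.2f}")
print(f"4096 B, 24 idle cycles: {large_with_overhead:.2f}")
```

Under these assumed numbers, the same fixed overhead that is barely visible for a 4096-byte transfer leaves the bus idle most of the time for 64-byte transfers, which is the regime the highlights above focus on.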
Technical Details
- Core Architecture: The DMAC employs a descriptor-based architecture optimized with a lightweight descriptor format to reduce static setup overhead.
- Performance Mechanism: Includes a low-overhead speculative descriptor prefetching scheme designed to increase performance without introducing latency penalties upon misprediction.
- Implementation Base: The DMAC is built around the AXI4 on-chip interconnect protocol.
- Integration: The DMAC was integrated into a 64-bit Linux-capable RISC-V SoC.
- Evaluation Environment: Performance was evaluated using emulation on a Kintex FPGA.
- Physical Metrics: The synthesized DMAC occupies only 49.5 kGE (kilo Gate Equivalents) on the GF12LP+ node.
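The descriptor format and prefetching scheme listed above can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not the DMAC's actual design: the 32-byte field layout, the `END_OF_CHAIN` marker, the `Descriptor` class, and the `walk_chain` function are hypothetical, and the guess that the next descriptor lies contiguously in memory is one plausible speculation policy chosen for the example.

```python
import struct
from dataclasses import dataclass

# Hypothetical compact layout: length, config, next pointer, source, destination.
DESC_FMT = "<IIQQQ"
DESC_SIZE = struct.calcsize(DESC_FMT)  # 32 bytes with this assumed layout
END_OF_CHAIN = 0xFFFF_FFFF_FFFF_FFFF   # assumed end-of-chain marker


@dataclass
class Descriptor:
    length: int
    config: int
    next_addr: int
    src: int
    dst: int

    def pack(self):
        return struct.pack(DESC_FMT, self.length, self.config,
                           self.next_addr, self.src, self.dst)

    @classmethod
    def unpack(cls, raw):
        return cls(*struct.unpack(DESC_FMT, raw))


def walk_chain(memory, head):
    """Follow a descriptor chain, speculatively fetching the contiguous slot.

    Returns (descriptors, hits, misses). A miss simply falls back to the
    ordinary fetch, so in this model a wrong guess is never slower than
    not speculating at all.
    """
    descriptors, hits, misses = [], 0, 0
    addr = head
    speculative = None  # (address, raw bytes) fetched ahead of time
    while addr != END_OF_CHAIN:
        if speculative is not None and speculative[0] == addr:
            raw = speculative[1]  # hit: descriptor is already on hand
            hits += 1
        else:
            if speculative is not None:
                misses += 1  # wrong guess: discard it and fetch normally
            raw = memory[addr:addr + DESC_SIZE]
        desc = Descriptor.unpack(raw)
        descriptors.append(desc)
        guess = addr + DESC_SIZE  # assume the next descriptor is contiguous
        if guess + DESC_SIZE <= len(memory):
            speculative = (guess, memory[guess:guess + DESC_SIZE])
        else:
            speculative = None
        addr = desc.next_addr
    return descriptors, hits, misses


# Four descriptor slots; the chain is 0 -> 32 -> 96, skipping the slot at 64.
memory = bytearray(4 * DESC_SIZE)
memory[0:32] = Descriptor(64, 0, 32, 0x1000, 0x2000).pack()
memory[32:64] = Descriptor(64, 0, 96, 0x1040, 0x2040).pack()
memory[96:128] = Descriptor(64, 0, END_OF_CHAIN, 0x3000, 0x4000).pack()

descs, hits, misses = walk_chain(memory, head=0)
print(len(descs), hits, misses)
```

In this example the second descriptor is found by the speculative fetch, while the jump to the slot at address 96 is a misprediction that is resolved by a normal fetch. In hardware the speculative request would be issued in parallel with processing the current descriptor; the sequential model here only captures the hit and miss bookkeeping.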
Implications
- Enhanced RISC-V Capabilities: Provides a high-performance DMAC IP that helps complex, data-intensive applications (such as ML) run efficiently on RISC-V Linux systems.
- Addressing Data Irregularity: Targets a bottleneck in heterogeneous computing, where transfers often involve small, scattered data units, so that processors and accelerators can be decoupled more effectively.
- Cost-Effective IP: The significant reduction in resource utilization (especially eliminating BRAMs) makes this DMAC highly attractive for deployment in resource-constrained environments or as a smaller, integrated component within large SoCs.
- High Frequency/Advanced Node Viability: Achieving over 1.44 GHz on GF12LP+ indicates that the DMAC is suited to modern, high-performance computing platforms.