Adding Explicit Load-Acquire and Store-Release Instructions to the RISC-V ISA
Abstract
This paper proposes adding explicit load-acquire and store-release instructions to the RISC-V ISA, a crucial step for managing synchronization on architectures with weak memory models. The authors demonstrate feasibility by integrating the new semantics into the herd formal memory model, the gem5 simulator, and the LLVM/Clang toolchain. They argue that standardizing these instructions is urgent, because divergent Application Binary Interface (ABI) implementations would fragment the RISC-V ecosystem.
Report
Key Highlights
- ISA Extension Proposal: The core contribution is the proposal and implementation exploration of explicit load-acquire and store-release instructions for the RISC-V ISA.
- Weak Memory Optimization: These instructions are necessary to maintain correctness (enforcing ordering) when using weak memory models, which otherwise simplify hardware design and improve performance.
- Ecosystem Integration: The semantics were successfully integrated into critical tools, including the herd formal memory model, the gem5 cycle-approximate simulator, and the LLVM/Clang compiler toolchain.
- Muted Performance Impact: Initial findings showed that for workloads characterized by high sharing and heavy contention, the anticipated performance benefits of reduced memory ordering were muted.
Technical Details
- Target: RISC-V Instruction Set Architecture (ISA).
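- Primitive Semantics: The proposed instructions implement acquire and release semantics, which let software express only the minimal ordering required for multiprocessor correctness. A load-acquire keeps later memory accesses from being reordered before it; a store-release keeps earlier memory accesses from being reordered after it.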
- Implementation Targets: The new instructions were implemented across:
  - herd: Used for formal verification of memory consistency.
  - gem5: Used for cycle-approximate simulation and performance measurement.
  - LLVM/Clang: Compiler toolchain support, crucial for generating code that utilizes the new instructions.
- Purpose of Explicit Instructions: To eliminate unnecessary memory fences or stronger atomic operations when only load-acquire or store-release guarantees are needed, which is common in high-level programming language atomics (e.g., C++11).
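To make the acquire/release pattern concrete, here is a minimal C++11 sketch of the classic message-passing idiom that such instructions are meant to serve. It is an illustration only, not code from the paper: the names payload, ready, producer, consumer, and run_message_passing are hypothetical, and the comments about instruction selection describe what a compiler could do, not what the authors' LLVM/Clang changes actually emit.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical shared state for illustration; names are not from the paper.
static int payload = 0;                 // plain, non-atomic data
static std::atomic<bool> ready{false};  // flag that publishes the data

// Producer: write the data, then publish it with a store-release.
// The release ordering keeps the write to `payload` from being
// reordered after the store to `ready`.
void producer(int value) {
    payload = value;
    ready.store(true, std::memory_order_release);
}

// Consumer: spin on a load-acquire, then read the data.
// The acquire ordering keeps the read of `payload` from being
// reordered before the load of `ready`.
int consumer() {
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return payload;
}

// Runs one producer/consumer pair and returns what the consumer observed.
int run_message_passing(int value) {
    payload = 0;
    ready.store(false, std::memory_order_relaxed);
    int seen = -1;
    std::thread t_consumer([&] { seen = consumer(); });
    std::thread t_producer(producer, value);
    t_producer.join();
    t_consumer.join();
    return seen;
}
```

Calling run_message_passing(42) should return 42 on any conforming implementation. Without dedicated instructions, a compiler has to express the one-directional ordering of these two accesses with a fence placed next to an ordinary load or store; an explicit load-acquire and store-release would let each access carry its own ordering annotation instead.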
Implications
- Standardization Urgency: The paper stresses the immediate necessity of ratifying these explicit synchronization instructions within the RISC-V standard.
- Mitigating Fragmentation: Failure to ratify the instructions quickly risks the creation of "multiple ABI implementations" across different vendors, leading to severe ecosystem fragmentation and portability issues for concurrent software.
- Hardware Efficiency: By allowing software to express precise ordering requirements, the RISC-V architecture can fully leverage the performance benefits of weak memory models without sacrificing programming correctness.
- Software Development: This standardization ensures that high-level synchronization primitives and memory models used by modern operating systems and languages (like C++ and Rust) can be mapped efficiently and reliably onto the RISC-V hardware.
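As an illustration of how a higher-level synchronization primitive reduces to acquire and release operations, the following is a minimal spinlock sketch in C++. It is not taken from the paper; SpinLock and count_with_spinlock are hypothetical names chosen for this example, and the lock is deliberately simplistic (no backoff, no fairness).

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical minimal spinlock; not a production-quality lock.
class SpinLock {
public:
    void lock() {
        // Atomic exchange with acquire ordering: accesses inside the
        // critical section cannot be reordered before the lock is taken.
        while (locked_.exchange(true, std::memory_order_acquire)) {
            std::this_thread::yield();
        }
    }

    void unlock() {
        // Plain store with release ordering: accesses inside the critical
        // section cannot be reordered after the lock is dropped. This is
        // the kind of operation a store-release instruction encodes directly.
        locked_.store(false, std::memory_order_release);
    }

private:
    std::atomic<bool> locked_{false};
};

// Increments a non-atomic counter from several threads under the lock
// and returns the final value.
long count_with_spinlock(int num_threads, int iters) {
    SpinLock lock;
    long counter = 0;
    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                lock.lock();
                ++counter;  // protected by the lock
                lock.unlock();
            }
        });
    }
    for (auto& th : threads) {
        th.join();
    }
    return counter;
}
```

With 4 threads and 10000 iterations each, count_with_spinlock should return 40000. The unlock path needs nothing stronger than release ordering on a single store, which is why a portable, standardized encoding for that operation matters to language runtimes and operating systems.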