SSR: A Stall Scheme Reducing Bubbles in Load-Use Hazard of RISC-V Pipeline
Abstract
This paper introduces SSR (Stall Scheme Reducing bubbles), a microarchitectural modification designed to reduce the performance degradation caused by load-use hazards in RISC-V pipelines. Instead of the traditional load-use stall detection between the ID and EXE stages, the proposed method uses a simple bypass unit placed between the EXE and MEM stages, thereby eliminating unnecessary pipeline bubbles. Implemented in the Rocket-chip generator and synthesized in 130-nm technology, the SSR scheme achieved a 6.9% performance increase on the Dhrystone benchmark.
Report
Key Highlights
- Load-Use Hazard Mitigation: The core focus is optimizing pipeline performance by efficiently handling load-use hazards, which commonly cause pipeline stalls.
- SSR Scheme: A novel stall scheme is introduced that reduces the insertion of costly pipeline bubbles.
- Architectural Change: The innovation involves adding a bypass unit between the Execute (EXE) stage and the Memory (MEM) stage.
- Performance Gain: The new scheme delivers a 6.9% performance increase on the Dhrystone benchmark.
- Implementation Base: The design was integrated into the open-source RISC-V SoC generator, Rocket-chip.
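As a rough reading of the headline number: if "performance" means the inverse of execution time (for Dhrystone, iterations completed per second) and the clock frequency is the same for the baseline and SSR cores, then a 6.9% gain corresponds to roughly 6.5% fewer cycles. Both conditions are assumptions of this sketch, not details confirmed by the summary, and the function names are illustrative only.

```python
def time_ratio(perf_gain: float) -> float:
    """T_new / T_old for a relative performance gain, taking performance = 1 / T."""
    return 1.0 / (1.0 + perf_gain)


def cycles_saved(perf_gain: float) -> float:
    """Fraction of cycles removed, ASSUMING the clock frequency is unchanged."""
    return 1.0 - time_ratio(perf_gain)


DHRYSTONE_GAIN = 0.069  # the 6.9% figure reported for SSR
print(f"T_new / T_old = {time_ratio(DHRYSTONE_GAIN):.4f}")   # 0.9355
print(f"cycles saved  = {cycles_saved(DHRYSTONE_GAIN):.2%}")  # 6.45%
```

A 6.9% performance increase is therefore not the same as a 6.9% cut in runtime; the runtime reduction is slightly smaller because the two are related by a reciprocal.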
Technical Details
- Traditional Scheme: Standard pipelined processors detect load-use hazards early (typically between the ID and EXE stages) and immediately stall the pipeline, inserting 'bubbles' (NOPs) until the loaded data can be forwarded.
- SSR Mechanism: The SSR scheme replaces this early hazard detection. Instead, it relies on a simple new bypass unit situated between the EXE and MEM stages to resolve the dependency later in the pipeline.
- Bubble Reduction: By deferring dependency resolution to the new bypass path, the scheme reduces the number of bubbles inserted between a load instruction and the subsequent instruction that uses the loaded data.
- Verification: The design was synthesized and evaluated in SMIC 130-nm technology.
- Benchmark Result: The 6.9% speedup was measured on the Dhrystone benchmark.
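The conventional check that SSR replaces can be sketched in a few lines. This model assumes a classic five-stage in-order pipeline with full forwarding, in which a load in EXE followed by an instruction in ID that reads the load's destination register costs exactly one bubble. The `Instr` record, the helper names, and the one-cycle penalty are illustrative assumptions, not Rocket-chip's actual interlock logic. The sketch only counts the bubbles the traditional ID/EXE interlock would insert, i.e. the cost SSR targets; it does not model the SSR bypass unit itself.

```python
from dataclasses import dataclass
from typing import List, Optional

# Load opcodes from the RISC-V base ISA (RV32I/RV64I); used to identify producers.
LOAD_OPS = {"lb", "lh", "lw", "ld", "lbu", "lhu", "lwu"}


@dataclass(frozen=True)
class Instr:
    """Hypothetical decoded-instruction record: opcode plus register indices."""
    op: str
    rd: Optional[int] = None   # destination register (None if nothing is written)
    rs1: Optional[int] = None  # first source register
    rs2: Optional[int] = None  # second source register

    @property
    def is_load(self) -> bool:
        return self.op in LOAD_OPS


def load_use_hazard(in_exe: Instr, in_id: Instr) -> bool:
    """Classic ID/EXE interlock: stall when a load in EXE writes a register that
    the instruction in ID reads. Writes to x0 never create a dependency."""
    if not in_exe.is_load or in_exe.rd in (None, 0):
        return False
    return in_exe.rd in (in_id.rs1, in_id.rs2)


def count_bubbles(trace: List[Instr]) -> int:
    """Bubbles the traditional interlock inserts in a straight-line trace,
    ASSUMING each back-to-back load-use pair costs one cycle (5-stage model)."""
    return sum(1 for prev, cur in zip(trace, trace[1:]) if load_use_hazard(prev, cur))


trace = [
    Instr("lw", rd=5, rs1=10),          # lw   x5, 0(x10)
    Instr("add", rd=6, rs1=5, rs2=7),   # add  x6, x5, x7   (uses x5 immediately)
    Instr("lw", rd=8, rs1=10),          # lw   x8, 4(x10)
    Instr("addi", rd=9, rs1=11),        # addi x9, x11, 1   (independent)
    Instr("sub", rd=12, rs1=8, rs2=9),  # sub  x12, x8, x9  (x8 consumed two slots later)
]
print(count_bubbles(trace))  # 1
```

The check is deliberately conservative: it fires on any source-register match, regardless of the stage in which the consumer actually needs the value.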
Implications
- Enhanced RISC-V Efficiency: SSR provides a practical and effective microarchitectural optimization for RISC-V cores, directly addressing a critical performance bottleneck associated with memory access.
- Relevance to Long Pipelines: The paper notes that the benefit is especially 'considerable' in long pipelines, where a load-use stall typically costs more cycles, suggesting larger throughput gains in deeper, higher-performance RISC-V implementations.
- Open-Source Integration: Because the scheme is built into the widely used Rocket-chip generator, researchers and commercial teams developing RISC-V SoCs can evaluate and adopt it with relatively little effort.
- Favorable Trade-off: The reported 6.9% performance gain comes at only a 'reasonable cost of area and power,' which makes the scheme attractive for embedded and low-power designs with tight area and power budgets.
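The long-pipeline point can be illustrated with a simple CPI model in which every load-use hazard adds a fixed number of bubble cycles. The workload parameters below (ideal CPI of 1.0, 10% of instructions stalled on a preceding load) are hypothetical and not taken from the paper, and treating the hazard frequency as independent of pipeline depth is a simplification. The point is only that the share of cycles lost to load-use bubbles, and hence the headroom for a bubble-reduction scheme, grows with the per-hazard penalty.

```python
def cpi_with_load_use(base_cpi: float, hazard_freq: float, penalty: int) -> float:
    """Simple CPI model: each load-use hazard adds `penalty` bubble cycles."""
    return base_cpi + hazard_freq * penalty


def stall_fraction(base_cpi: float, hazard_freq: float, penalty: int) -> float:
    """Share of all cycles spent in load-use bubbles."""
    return hazard_freq * penalty / cpi_with_load_use(base_cpi, hazard_freq, penalty)


# HYPOTHETICAL workload parameters, chosen only for illustration.
BASE_CPI = 1.0
HAZARD_FREQ = 0.10

for penalty in (1, 2, 3):
    cpi = cpi_with_load_use(BASE_CPI, HAZARD_FREQ, penalty)
    share = stall_fraction(BASE_CPI, HAZARD_FREQ, penalty)
    print(f"penalty={penalty}: CPI={cpi:.2f}, stall share={share:.1%}")
```

With these numbers the stall share rises from about 9% at a one-cycle penalty to about 23% at three cycles.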