Stream Semantic Registers: A Lightweight RISC-V ISA Extension Achieving Full Compute Utilization in Single-Issue Cores
Abstract
Stream Semantic Registers (SSR) is a lightweight, non-invasive RISC-V Instruction Set Architecture extension designed to overcome the von Neumann bottleneck in energy-efficient single-issue cores. SSR achieves full compute utilization by implicitly encoding memory accesses as register reads and writes, which eliminates a large number of explicit load and store instructions. The extension delivers a 2x to 5x architectural speedup and a 2x improvement in energy efficiency in exchange for an 11% increase in core area.
Report
Key Highlights
- Full Utilization: Achieves nearly 100% compute utilization in energy-efficient single-issue cores by eliminating cycles spent on data movement.
- Performance Gain: Provides an architectural speedup of 2x to 5x depending on the kernel, with sequential code running 3x faster on a single core.
- Energy Efficiency: Delivers a 2x energy efficiency improvement in multi-core clusters.
- Scaling Efficiency: Requires 3x fewer cores in a cluster to match the performance of the non-extended architecture.
- Compiler Transparency: Compilers can automatically map loop nests to SSRs, making the performance boost transparent to the programmer.
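The link between utilization and speedup can be illustrated with a small back-of-the-envelope model. The sketch below is not taken from the report: the instruction mix (two loads and one multiply-add per element, loop overhead ignored) and the helper name `utilization` are assumptions chosen only to show how removing loads from the issue stream raises utilization on a single-issue core.

```python
# Toy utilization model for a single-issue core.
# The instruction mixes below are illustrative assumptions,
# not figures from the report.

def utilization(compute_ops: int, other_ops: int) -> float:
    """Fraction of issue slots that perform useful computation."""
    return compute_ops / (compute_ops + other_ops)

# Assumed baseline inner loop, per element: 2 loads + 1 multiply-add.
# Loop overhead (pointer bumps, branch) is ignored for simplicity.
baseline = utilization(compute_ops=1, other_ops=2)

# Assumed streaming loop: both operands arrive through register reads,
# so only the multiply-add occupies an issue slot.
streamed = utilization(compute_ops=1, other_ops=0)

# With one instruction issued per cycle, speedup is the utilization ratio.
speedup = streamed / baseline

print(f"baseline utilization: {baseline:.2f}")
print(f"streamed utilization: {streamed:.2f}")
print(f"speedup: {speedup:.1f}x")
```

Under these assumptions the model yields a 3x speedup, which falls inside the 2x to 5x range quoted above. Kernels with a different ratio of loads and stores to compute instructions would land elsewhere in that range.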
Technical Details
- Innovation: Stream Semantic Registers (SSR) is a lightweight, non-invasive extension to the RISC-V ISA.
- Mechanism: Memory accesses (loads/stores) are implicitly encoded as standard register reads and writes, effectively hiding data movement from the execution pipeline.
- Implementation: The extension was implemented in RTL within an existing multi-core cluster and synthesized using a modern 22nm technology.
- Area Cost: The extension adds only 11% to the core area.
- Fetch Reduction: Eliminating load/store instructions reduces instruction fetches by up to 3.5x and instruction cache power consumption by up to 5.6x.
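The mechanism can be approximated in software to make the idea concrete. The following is a minimal sketch, not the authors' hardware design: the `StreamRegister` class, its `base`/`stride`/`count` parameters, and the affine address pattern are hypothetical names and assumptions introduced for illustration. It compares a dot product that issues explicit loads with one whose operands are delivered by reading a stream register.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class StreamRegister:
    """Hypothetical software model of a register with stream semantics.

    Each read returns the next element of the address sequence
    base, base + stride, base + 2*stride, ... and stands in for an
    implicit load performed by hardware.
    """
    memory: List[float]
    base: int
    stride: int
    count: int
    _issued: int = 0

    def read(self) -> float:
        if self._issued >= self.count:
            raise IndexError("stream exhausted")
        value = self.memory[self.base + self._issued * self.stride]
        self._issued += 1
        return value


def dot_explicit(memory: List[float], a_base: int, b_base: int, n: int) -> Tuple[float, int]:
    """Baseline: every operand costs one explicit load instruction."""
    acc, issued = 0.0, 0
    for i in range(n):
        a = memory[a_base + i]
        issued += 1  # load a[i]
        b = memory[b_base + i]
        issued += 1  # load b[i]
        acc += a * b
        issued += 1  # multiply-add
    return acc, issued


def dot_streamed(memory: List[float], a_base: int, b_base: int, n: int) -> Tuple[float, int]:
    """Streamed: operands arrive via register reads, so only compute is issued.

    The one-time cost of configuring the two streams is not counted here.
    """
    acc, issued = 0.0, 0
    stream_a = StreamRegister(memory, a_base, 1, n)
    stream_b = StreamRegister(memory, b_base, 1, n)
    for _ in range(n):
        acc += stream_a.read() * stream_b.read()
        issued += 1  # multiply-add only
    return acc, issued


memory = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
print(dot_explicit(memory, 0, 4, 4))
print(dot_streamed(memory, 0, 4, 4))
```

Both functions compute the same result, but the streamed version issues one instruction per element instead of three in this toy count. Real hardware would also pay a small setup cost to configure each stream, which the sketch deliberately leaves out.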
Implications
- Solving the Bottleneck: SSR directly addresses the von Neumann bottleneck, which severely limits the effective computation rate of simple, highly energy-efficient cores. Removing that limit makes such cores much more practical for high-throughput tasks.
- RISC-V Competitiveness: This extension enhances the performance-per-watt profile of RISC-V single-issue cores, making them highly attractive for embedded systems, IoT devices, and specialized accelerators where energy conservation is paramount.
- Cluster Density: The ability to achieve the same performance with 3x fewer cores suggests major cost and area savings for designers deploying large RISC-V multi-core clusters, without resorting to complex out-of-order execution pipelines.
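The area argument can be checked with simple arithmetic using the two figures quoted in this summary (3x fewer cores, 11% larger cores). The sketch below is a rough estimate under a stated assumption: it counts core area only and ignores shared memory, interconnect, and other cluster components, which the summary does not quantify. The function name `relative_core_area` is hypothetical.

```python
def relative_core_area(core_reduction: float, area_overhead: float) -> float:
    """Total core area of the extended cluster relative to the baseline.

    core_reduction: factor by which the core count shrinks (e.g. 3.0).
    area_overhead:  fractional area increase per core (e.g. 0.11).
    Assumption: only core area is counted, not shared cluster resources.
    """
    return (1.0 + area_overhead) / core_reduction


ratio = relative_core_area(core_reduction=3.0, area_overhead=0.11)
print(f"relative core area: {ratio:.2f}")
```

Under this assumption, the extended cluster needs roughly 37% of the baseline core area to reach the same performance.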