STRELA: STReaming ELAstic CGRA Accelerator for Embedded Systems

Abstract

STRELA is an elastic Coarse-Grained Reconfigurable Architecture (CGRA) integrated into an energy-efficient RISC-V System-on-Chip (SoC) designed for the embedded domain. Its microarchitecture supports conditionals and irregular loops, and its one-shot and multi-shot mapping strategies let it adapt to complex, domain-specific applications. Implemented in a 65 nm process, STRELA achieves peak speed-ups of up to 18.61x and energy savings of up to 11.10x compared to the host CPU.

Report

Key Highlights

  • Novel Architecture: Introduction of STRELA, an elastic Coarse-Grained Reconfigurable Architecture (CGRA) accelerator.
  • Integration: Integrated into an energy-efficient RISC-V-based SoC targeting embedded systems.
  • Control-Flow Support: The microarchitecture natively supports conditionals and irregular loops, which widens the range of application domains it can serve.
  • Performance Metrics: Achieved maximum speed-ups of 17.63x (one-shot) and 18.61x (multi-shot) compared to the software-programmable CPU.
  • Energy Efficiency: Energy savings of up to 11.10x in the overall SoC, with a best energy efficiency of 115.96 MOPs/mW for multi-shot kernels.
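
The headline figures above can be related with two lines of arithmetic. The sketch below is illustrative only: it assumes "MOPs" means millions of operations per second, and the function names are hypothetical. The power ratio is meaningful only if the speed-up and the energy saving refer to the same kernel and the same measurement scope, which this summary does not state.

```python
def energy_per_op_pj(efficiency_mops_per_mw: float) -> float:
    """Energy per operation in picojoules.

    1 MOPs/mW = 1e6 ops/s / 1e-3 W = 1e9 ops/J,
    so one operation costs 1000 / efficiency picojoules.
    """
    return 1000.0 / efficiency_mops_per_mw


def relative_power(speedup: float, energy_saving: float) -> float:
    """Average power of the accelerated run relative to the CPU-only run.

    From E = P * t:  P_acc / P_cpu = (E_acc / E_cpu) / (t_acc / t_cpu)
                                   = speedup / energy_saving.
    ASSUMPTION: both ratios describe the same kernel and the same scope.
    """
    return speedup / energy_saving


if __name__ == "__main__":
    # Figures quoted in the highlights (multi-shot best cases).
    print(f"{energy_per_op_pj(115.96):.2f} pJ per operation")
    print(f"{relative_power(18.61, 11.10):.2f}x average power vs. CPU-only")
```

Under that pairing assumption, a speed-up larger than the energy saving would mean the accelerated run draws more average power but finishes early enough to use less energy overall.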

Technical Details

  • Technology Node: Implemented in TSMC 65 nm process technology.
  • Frequency: Maximum operating frequency achieved is 250 MHz.
  • Architecture Type: Elastic CGRA. Elasticity is leveraged to efficiently map both simple applications (one-shot kernel, single reconfiguration) and complex/large applications (multi-shot kernels, multiple reconfigurations).
  • Peak Throughput: The design achieves a peak performance of 1.22 GOPs for one-shot kernels and 1.17 GOPs for multi-shot kernels.
  • Memory Structure: Incorporates independent memory nodes specifically to streamline and accelerate data accesses.
  • Power Management: Integrates standard power and clock-gating techniques tailored for the embedded domain to maintain high efficiency.
  • Framework: The CGRA accelerates parallelizable sections by mapping them spatially across its processing elements, while the RISC-V CPU handles control-intensive operations.
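
To make the "elastic" property in the list above concrete, here is a minimal cycle-level sketch of a generic valid/ready pipeline. It is not STRELA's microarchitecture: the `ElasticFifo` class, the buffer depth of 2, and the `run_pipeline` helper are all assumptions made for illustration. It shows the one property elasticity provides: tokens advance only when the next stage can accept them, so stalls change timing but never results.

```python
from collections import deque


class ElasticFifo:
    """Hypothetical bounded buffer standing in for one elastic stage."""

    def __init__(self, depth: int = 2):
        self.depth = depth
        self.q = deque()

    def ready(self) -> bool:   # can accept a token this cycle
        return len(self.q) < self.depth

    def valid(self) -> bool:   # has a token to offer this cycle
        return len(self.q) > 0


def run_pipeline(inputs, ops, sink_ready, max_cycles=10_000):
    """Stream inputs through ops[0] -> ops[1] -> ... -> sink.

    sink_ready(cycle) -> bool models back-pressure from the consumer.
    Returns (outputs, cycles_used).
    """
    stages = [ElasticFifo() for _ in ops]
    src, out, cycle = deque(inputs), [], 0
    while src or any(s.valid() for s in stages):
        if cycle >= max_cycles:
            raise RuntimeError("sink never became ready")
        # Walk from sink to source so a token moves at most one stage per cycle.
        if stages[-1].valid() and sink_ready(cycle):
            out.append(stages[-1].q.popleft())
        for i in range(len(stages) - 1, 0, -1):
            if stages[i - 1].valid() and stages[i].ready():
                stages[i].q.append(ops[i](stages[i - 1].q.popleft()))
        if src and stages[0].ready():
            stages[0].q.append(ops[0](src.popleft()))
        cycle += 1
    return out, cycle


if __name__ == "__main__":
    ops = [lambda x: x + 1, lambda x: x * 2]
    data = [1, 2, 3, 4]
    free, t_free = run_pipeline(data, ops, lambda c: True)
    stalled, t_stalled = run_pipeline(data, ops, lambda c: c % 3 == 0)
    print(free, t_free)
    print(stalled, t_stalled)
```

Running both calls yields the same output sequence, with the stalled run taking more cycles. This latency insensitivity is what allows data-dependent behavior such as conditionals and irregular loops to be mapped without a fixed static schedule.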

Implications

  • RISC-V Heterogeneity: STRELA provides a versatile and highly efficient template for integrating reconfigurable computing within the open-source RISC-V ecosystem, broadening its application scope in the energy-constrained embedded market.
  • Edge Computing Enablement: The high energy efficiency (up to 115.96 MOPs/mW) and performance gains support reconfigurable hardware as a viable option for demanding edge AI and signal processing tasks where power budgets are tight.
  • Increased Programmability: By supporting complex control flow features (conditionals and irregular loops), this CGRA overcomes traditional limitations of spatial accelerators, making it accessible for a wider range of high-level language applications.
  • Framework Validation: The results demonstrate the practical value of co-designing specialized accelerators with general-purpose RISC-V cores to build an adaptable, energy-efficient heterogeneous computing framework.