ARCANE: Adaptive RISC-V Cache Architecture for Near-memory Extensions

Abstract

ARCANE is an Adaptive RISC-V Cache Architecture that turns the traditional cache subsystem into a tightly-coupled compute-near-memory coprocessor, targeting the data movement bottleneck of von Neumann architectures. Its RISC-V cache controller executes custom instructions issued by the host CPU by dispatching vector operations to near-memory vector processing units. The reported results show a $30\times$ to $84\times$ performance improvement when operating on 8-bit data, relative to a traditional cached system executing a worst-case 32-bit CNN workload, at an area overhead of 41.3%.

Report

Key Highlights

  • Core Innovation: ARCANE (Adaptive RISC-V Cache Architecture for Near-memory Extensions) proposes a cache architecture that doubles as a tightly-coupled compute-near-memory (CnM) coprocessor.
  • Target: Directly mitigates the von Neumann bottleneck, characterized by extensive data movement, low throughput, and poor energy efficiency in data-driven applications.
  • Performance Gain: A $30\times$ to $84\times$ performance improvement when operating on 8-bit data, measured against a traditional cached system executing a worst-case 32-bit Convolutional Neural Network (CNN) workload.
  • Area Cost: The implementation incurs an area overhead of 41.3%.
  • Usability: Memory synchronization and data mapping are abstracted away from application software, improving usability over existing in- or near-memory solutions.

Technical Details

  • Architecture Type: Adaptive RISC-V Cache Architecture featuring near-memory computing capabilities.
  • Processing Units: Vector operations are dispatched to specialized Near-Memory Vector Processing Units (VPUs) embedded within the cache memory subsystem.
  • Control Mechanism: The RISC-V cache controller executes the custom instructions it receives from the host CPU.
  • Extensibility: The architecture supports software-based Instruction Set Architecture (ISA) extensibility, allowing applications to define and utilize custom compute instructions directly within the cache.
  • Software Abstraction: The architecture handles data synchronization and mapping internally, sparing application software a burden that is a major challenge for many compute-in-memory solutions.
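
The control flow described above (host issues a custom instruction, the cache controller dispatches vector work to near-memory units, software registers new operations) can be illustrated with a small behavioral model. This is only a sketch under stated assumptions: the class names, the `register_kernel` and `execute` methods, the address-keyed storage, and the even split of work across units are all hypothetical and are not taken from the ARCANE design.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Op = Callable[[int, int], int]  # element-wise binary operation


@dataclass
class VectorUnit:
    """Hypothetical near-memory vector unit: applies one element-wise op."""

    def run(self, op: Op, a: List[int], b: List[int]) -> List[int]:
        return [op(x, y) for x, y in zip(a, b)]


@dataclass
class CacheController:
    """Toy controller that owns the cached data and the vector units."""
    vpus: List[VectorUnit]
    lines: Dict[int, List[int]] = field(default_factory=dict)  # address -> data
    kernels: Dict[str, Op] = field(default_factory=dict)       # custom ops

    def register_kernel(self, name: str, op: Op) -> None:
        # Stand-in for software-based ISA extensibility (assumed mechanism).
        self.kernels[name] = op

    def execute(self, name: str, dst: int, src_a: int, src_b: int) -> None:
        # The host names operands by address only; the controller decides
        # how the data is split across units (assumed: contiguous chunks).
        op = self.kernels[name]
        a, b = self.lines[src_a], self.lines[src_b]
        chunk = max(1, -(-len(a) // len(self.vpus)))  # ceiling division
        out: List[int] = []
        for i, vpu in enumerate(self.vpus):
            lo, hi = i * chunk, (i + 1) * chunk
            out.extend(vpu.run(op, a[lo:hi], b[lo:hi]))
        self.lines[dst] = out


# Usage: the "host" registers a custom vector add and issues it once.
ctrl = CacheController(vpus=[VectorUnit(), VectorUnit()])
ctrl.lines[0x100] = [1, 2, 3, 4]
ctrl.lines[0x200] = [10, 20, 30, 40]
ctrl.register_kernel("vadd", lambda x, y: x + y)
ctrl.execute("vadd", dst=0x300, src_a=0x100, src_b=0x200)
print(ctrl.lines[0x300])
```

The point of the model is the division of labor: the caller never moves data or picks a unit, which mirrors the software abstraction claimed for the architecture.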

Implications

  • Future of RISC-V: ARCANE provides a high-performance blueprint for integrating computation directly into RISC-V memory hierarchies, enhancing the capability of RISC-V cores in accelerating AI and data-intensive workloads.
  • Data-Centric Computing: This solution offers a practical path toward mitigating the memory wall problem for modern data-driven applications, making high throughput and energy efficiency attainable without giving up the flexibility of the CPU.
  • Usability in CnM: By abstracting complex data placement and synchronization requirements, ARCANE addresses a major adoption barrier for near-memory computing, potentially accelerating the widespread use of CnM architectures in commercial products.
  • Efficiency: A $30\times$ to $84\times$ performance gain for a 41.3% area overhead makes a case for integrating this architecture into system-on-chip (SoC) designs.
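
One way to weigh the speedup against the area cost is to normalize the two reported figures against each other. The sketch below does that arithmetic; the function name `speedup_per_area` is hypothetical, and the calculation assumes the 41.3% overhead and the speedup are measured against the same baseline, which the summary does not state explicitly.

```python
def speedup_per_area(speedup: float, area_overhead: float) -> float:
    """Speedup divided by relative area, where the baseline area is 1.0."""
    return speedup / (1.0 + area_overhead)


AREA_OVERHEAD = 0.413  # 41.3%, as reported

for reported_speedup in (30.0, 84.0):
    normalized = speedup_per_area(reported_speedup, AREA_OVERHEAD)
    print(f"{reported_speedup:.0f}x speedup -> {normalized:.1f}x per unit area")
```

Under that assumption, the reported range corresponds to roughly 21.2x to 59.4x performance per unit area relative to the baseline.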