Abstract
This work presents the design and evaluation of a Processor Tracing System compliant with the RISC-V Efficient Trace specification for Instruction Branch Tracing. Integrated into the CVA6 edge architecture, the system was
Abstract
The large area consumption of the Vector Register File (VRF) in RISC-V vector engines hinders their deployment in low-cost CPUs designed for edge Machine Learning acceleration. This paper introduces "Register Dispersion,
Abstract
This preliminary work presents a methodology for accelerating high-level Python applications, specifically those utilizing the NumPy library, on heterogeneous RISC-V Systems-on-Chip (SoCs). The approach involves modifying the OpenBLAS library to utilize OpenMP
Abstract
This work introduces Fused-Tiled Layers (FTL), a novel algorithm designed for the automatic fusion of tiled layers in Deep Neural Networks (DNNs) to minimize excessive data movement. FTL addresses the common issue
Originally published on arXiv - Hardware Architecture
arXiv:2504.03675v1 (cs)
[Submitted on 21 Mar 2025]
Title: MemPool Flavors: Between Versatility and Specialization in a RISC-V Manycore Cluster