Research
DataMaestro: A Versatile and Efficient Data Streaming Engine Bringing Decoupled Memory Access To Dataflow Accelerators
Abstract
DataMaestro is a novel data streaming engine that applies a decoupled access/execute architecture to Deep Neural Network (DNN) dataflow accelerators to mitigate performance bottlenecks caused by data movement. It features programmable
Research
VEXP: A Low-Cost RISC-V ISA Extension for Accelerated Softmax Computation in Transformers
Abstract
The VEXP project introduces a low-cost RISC-V Instruction Set Architecture (ISA) extension designed to accelerate Softmax computation, a bottleneck in modern Transformer models. This is achieved by integrating a custom
Research
Unlimited Vector Processing for Wireless Baseband Based on RISC-V Extension
Abstract
This paper introduces the Unlimited Vector Processing (UVP) instruction set extension for RISC-V, specifically targeting performance improvements in wireless baseband processing (WBP). UVP overcomes conventional vector architecture constraints by implementing a novel
Research
AraOS: Analyzing the Impact of Virtual Memory Management on Vector Unit Performance
Abstract
This work introduces AraOS, an integrated environment enabling full operating system (Linux) support for the open-source Ara2 RISC-V vector processor by sharing the Memory Management Unit (MMU) of the CVA6 scalar core.
Research
Efficient Architecture for RISC-V Vector Memory Access
Abstract
Vector processors frequently suffer from inefficient memory accesses, particularly for strided and segment patterns, often relying on high-overhead crossbars or large transposition buffers. This paper presents EARTH, a novel RISC-V vector memory
Research
Empowering Vector Architectures for ML: The CAMP Architecture for Matrix Multiplication
Abstract
This study introduces the Cartesian Accumulative Matrix Pipeline (CAMP) architecture, a novel design leveraging a hybrid multiplier to significantly enhance matrix multiplication within Vector Architectures (VAs) and SIMD units, optimized for Quantized