Architecture
Research
CoroAMU: Unleashing Memory-Driven Coroutines through Latency-Aware Decoupled Operations
Abstract
CoroAMU is a hardware-software co-designed system addressing severe memory latency issues in data-intensive applications running on disaggregated memory systems. The system features a compiler that optimizes coroutine management by minimizing context and
Research
SynapticCore-X: A Modular Neural Processing Architecture for Low-Cost FPGA Acceleration
Abstract
SynapticCore-X is a modular and open-source neural processing architecture optimized for resource-efficient deployment on low-cost FPGA platforms like the Zynq-7020. The design tightly couples a lightweight RV32IMC RISC-V control core with a
Research
TeraPool: A Physical Design Aware, 1024 RISC-V Cores Shared-L1-Memory Scaled-Up Cluster Design With High Bandwidth Main Memory Link
Abstract
The TeraPool project introduces a highly scaled-up cluster architecture integrating 1024 RISC-V cores optimized for parallelism. Its key innovation is a physical-design-aware implementation featuring a shared-L1-memory structure, which enables ultra-low-latency data
Research
Decoupled Control Flow and Data Access in RISC-V GPGPUs
Abstract
This paper addresses the performance limitations of Vortex, an open-source RISC-V GPGPU, by tackling high micro-code overheads associated with control flow (CF) management and memory access. The core innovation introduces decoupled CF
Research
FPGA-Accelerated RISC-V ISA Extensions for Efficient Neural Network Inference on Edge Devices
Abstract
This paper presents novel FPGA-accelerated RISC-V instruction set architecture (ISA) extensions designed for efficient neural network inference on resource-constrained edge devices. The customized RISC-V core, featuring four domain-specific ISA extensions and integrated
Research
Hardware-Aware Neural Network Compilation with Learned Optimization: A RISC-V Accelerator Approach
Abstract
The XgenSilicon ML Compiler is an automated end-to-end framework that compiles high-level machine learning models into highly efficient RISC-V assembly code for custom ASIC accelerators. This system unifies software and hardware cost