Paper
Research
Occamy: A 432-Core Dual-Chiplet Dual-HBM2E 768-DP-GFLOP/s RISC-V System for 8-to-64-bit Dense and Sparse Computing in 12nm FinFET
Abstract
Occamy is a 432-core, 768-DP-GFLOP/s, dual-chiplet RISC-V system designed specifically to maximize compute efficiency across both dense and sparse FP8-to-FP64 ML and HPC workloads. Utilizing dual-HBM2E memory, a latency-tolerant interconnect, and
Research
Modern Hardware Security: A Review of Attacks and Countermeasures
Abstract
This paper reviews the state of hardware security, a concern made urgent by the proliferation of cloud, IoT, and smart devices and by the rapid evolution of computing architectures. It provides a comprehensive analysis of
Research
A Flexible Template for Edge Generative AI with High-Accuracy Accelerated Softmax & GELU
Abstract
This paper introduces a BFloat16 RISC-V acceleration template for edge Generative AI, specifically addressing the performance bottleneck caused by softmax and GELU non-linearities in Transformer models. The innovation lies in SoftEx, a
Research
Efficient transformer adaptation for analog in-memory computing via low-rank adapters
Abstract
This paper proposes Analog Hardware-Aware Low-Rank Adaptation (AHWA-LoRA) to efficiently adapt large transformer models for Analog In-Memory Computing (AIMC) hardware, circumventing the need for costly full model retraining or analog device reprogramming.
Research
From CISC to RISC: language-model guided assembly transpilation
Abstract
This paper introduces CRT, a lightweight LLM-based assembly transpiler that automatically converts x86 (CISC) code to RISC architectures such as ARM and RISC-V. This tool addresses the fundamental challenge of migrating legacy
Research
Teaching Experiences using the RVfpga Package
Abstract
The RVfpga course provides a robust, hands-on introduction to computer architecture using the RISC-V instruction set and FPGA technology. This paper details various successful teaching experiences, demonstrating its utility across undergraduate and