ACM
Research
The Configuration Wall: Characterization and Elimination of Accelerator Configuration Overhead
Abstract
This paper characterizes the 'Configuration Wall,' a critical bottleneck where the latency of setting up hardware accelerators consumes a dominant portion of the total execution time for fine-grained tasks. The
Research
Sequential Specifications for Precise Hardware Exceptions
Abstract
The paper introduces a formal methodology, termed "Sequential Specifications," designed to rigorously define and guarantee precise hardware exceptions, even in aggressive out-of-order processor implementations. This approach disentangles the complex specification
Research
CHERI-SIMT: Implementing Capability Memory Protection in GPUs
Abstract
CHERI-SIMT introduces the first unified architecture to integrate the CHERI (Capability Hardware Enhanced RISC Instructions) model directly into the Single Instruction, Multiple Threads (SIMT) execution pipeline of GPUs. This innovation addresses the
Research
A Data-Driven Dynamic Execution Orchestration Architecture
Abstract
This paper presents a novel Data-Driven Dynamic Execution Orchestration Architecture designed to enhance processor efficiency and performance predictability. The core innovation involves using runtime data insights to dynamically manage and schedule execution
Research
Efficient Timing Prediction and Optimization Using Derivable Gradient Boosting Machine Model at Placement Stage
Abstract
This paper presents a novel approach for electronic design automation (EDA) focusing on efficient timing closure during the physical design flow. It introduces a framework utilizing a Derivable Gradient Boosting Machine (D-GBM)
Research
TranSQL+: Serving Large Language Models with SQL on Low-Resource Hardware
Abstract
TranSQL+ is a novel framework designed for serving Large Language Models efficiently on low-resource hardware environments. It achieves this efficiency by utilizing a SQL interface for sophisticated data management and prompt serving,