From CISC to RISC: language-model guided assembly transpilation

Abstract

This paper introduces CRT, a lightweight LLM-based assembly transpiler that automatically converts x86 (CISC) assembly into assembly for RISC architectures such as ARM and RISC-V. The tool targets the problem of migrating legacy software stacks across this architectural gap while preserving program semantics. In the evaluation, CRT-transpiled code running on Apple M2 hardware achieves a 1.73x speedup over the Rosetta 2 virtualization engine, along with 2.41x memory efficiency and 1.47x better energy efficiency.

Report

Key Highlights

CISC to RISC Transpilation: The paper introduces CRT, a lightweight LLM-based transpiler for converting x86 assembly (CISC) to ARM and RISC-V assembly (RISC).
Performance Superiority: Transpiled code, when deployed on Apple M2 (ARMv8), achieved a 1.73x speedup compared to Apple's Rosetta 2 virtualization engine.
Efficiency Gains: CRT delivered 2.41x memory efficiency and 1.47x better energy efficiency than Rosetta 2.
High Accuracy: The system achieved a translation accuracy of 88.68% from x86 to RISC-V and 79.25% from x86 to ARMv5.

Technical Details

Method: CRT is an LLM-based assembly transpiler specifically designed to navigate the complexity of converting between the CISC and RISC instruction set architectures while preserving program semantics.
Source ISA: x86 assembly.
Target ISAs: ARM assembly and RISC-V assembly. For ARM, translation accuracy is reported for ARMv5, and the performance comparison was run on ARMv8 (Apple M2).
Evaluation Metrics: Translation accuracy, plus speedup, memory efficiency, and energy efficiency relative to Apple's Rosetta 2.
Availability: The authors plan to release code, models, training datasets, and benchmarks publicly.
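
As a rough illustration of how the metrics listed above could be computed, the following Python sketch treats translation accuracy as the share of benchmark programs whose transpiled output passes its checks, and treats speedup, memory, and energy gains as baseline-to-candidate ratios. These definitions are assumptions made for illustration; the summary does not spell out the paper's exact scoring procedure, and the names TranslationResult, translation_accuracy, and improvement_factor are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TranslationResult:
    """Outcome of transpiling one benchmark program (hypothetical record)."""
    program: str
    passed: bool  # True if the transpiled code passed its checks


def translation_accuracy(results):
    """Percentage of programs whose transpiled code passed (assumed definition)."""
    if not results:
        raise ValueError("no results to score")
    passed = sum(1 for r in results if r.passed)
    return 100.0 * passed / len(results)


def improvement_factor(baseline, candidate):
    """Ratio baseline / candidate for a lower-is-better quantity.

    Applies to runtime, memory use, or energy use: a value above 1.0
    means the candidate needs less of the resource than the baseline.
    """
    if candidate <= 0:
        raise ValueError("candidate measurement must be positive")
    return baseline / candidate


if __name__ == "__main__":
    # Made-up numbers, not results from the paper.
    demo = [
        TranslationResult("prog_a", True),
        TranslationResult("prog_b", True),
        TranslationResult("prog_c", False),
        TranslationResult("prog_d", True),
    ]
    print(f"accuracy: {translation_accuracy(demo):.2f}%")
    print(f"speedup:  {improvement_factor(10.0, 4.0):.2f}x")
```

Under this reading, a reported 1.73x speedup would mean the baseline takes 1.73 times as long as the transpiled code on the same workload.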

Implications

Accelerating ISA Migration: CRT could help accelerate the industry-wide shift toward energy-efficient RISC architectures by offering a performant, automated way to port large legacy x86 software stacks.
Enhancing RISC-V Ecosystem: The strong translation accuracy (88.68%) observed for RISC-V addresses a major hurdle in RISC-V adoption: the lack of native commercial software. This could make RISC-V platforms viable sooner.
Native Performance Substitute: By substantially outperforming conventional virtualization (Rosetta 2) in speed, memory, and energy, LLM-guided transpilation offers a path toward achieving near-native performance for cross-architecture applications.
LLMs in Low-Level Systems: This work supports the use of language models in highly complex, low-level programming domains, bridging architectural gaps that traditional compilers and static translators struggle with.
