From CISC to RISC: language-model guided assembly transpilation
Admin
Abstract
This paper introduces CRT, a lightweight LLM-based assembly transpiler that automatically converts x86 (CISC) assembly to RISC architectures such as ARM and RISC-V. The tool addresses a fundamental challenge in migrating legacy software stacks: bridging the architectural gap between instruction sets while preserving program semantics. On Apple M2 hardware, CRT-transpiled code runs 1.73x faster than under the Rosetta 2 virtualization engine, with 2.41x better memory efficiency and 1.47x better energy efficiency.
Report
Key Highlights
- CISC to RISC Transpilation: The paper introduces CRT, a lightweight LLM-based transpiler for converting x86 assembly (CISC) to ARM and RISC-V assembly (RISC).
- Performance Superiority: Transpiled code, when deployed on Apple M2 (ARMv8), achieved a 1.73x speedup compared to Apple's Rosetta 2 virtualization engine.
- Efficiency Gains: CRT delivered 2.41x better memory efficiency and 1.47x better energy efficiency than Rosetta 2.
- High Accuracy: The system achieved a translation accuracy of 88.68% from x86 to RISC-V and 79.25% from x86 to ARMv5.
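The headline figures are ratios against a baseline plus a pass rate. The sketch below shows one conventional way to compute them. It assumes that each "Nx better" figure is baseline (Rosetta 2) cost divided by transpiled-code cost, and that translation accuracy is the percentage of benchmark programs whose translated output passes their functional tests. The paper's exact measurement protocol may differ. The names `Measurement`, `compare`, and `translation_accuracy` and all sample numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    """Hypothetical per-benchmark measurements for one execution setup."""
    runtime_s: float
    peak_memory_mb: float
    energy_j: float


def improvement(baseline: float, candidate: float) -> float:
    """Return baseline / candidate; a value > 1 means the candidate uses less."""
    if candidate <= 0:
        raise ValueError("candidate measurement must be positive")
    return baseline / candidate


def compare(baseline: Measurement, transpiled: Measurement) -> dict[str, float]:
    """Assumed definitions: each ratio is baseline cost over transpiled cost."""
    return {
        "speedup": improvement(baseline.runtime_s, transpiled.runtime_s),
        "memory": improvement(baseline.peak_memory_mb, transpiled.peak_memory_mb),
        "energy": improvement(baseline.energy_j, transpiled.energy_j),
    }


def translation_accuracy(passed: list[bool]) -> float:
    """Percentage of translated programs that pass their functional tests."""
    if not passed:
        raise ValueError("no test results")
    return 100.0 * sum(passed) / len(passed)


if __name__ == "__main__":
    # Made-up numbers for illustration only; these are not the paper's data.
    rosetta = Measurement(runtime_s=4.0, peak_memory_mb=120.0, energy_j=9.0)
    crt = Measurement(runtime_s=2.5, peak_memory_mb=50.0, energy_j=6.0)
    print(compare(rosetta, crt))
    print(translation_accuracy([True, True, True, False]))
```

With these sample values the ratios come out to 1.6x speed, 2.4x memory, and 1.5x energy, and three passes out of four programs give 75% accuracy.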
Technical Details
- Method: CRT is an LLM-based assembly transpiler specifically designed to navigate the complexity of converting between the CISC and RISC instruction set architectures while preserving program semantics.
- Source ISA: x86 assembly.
- Target ISAs: ARM assembly (translation accuracy evaluated on ARMv5; performance measured on ARMv8, i.e., Apple M2) and RISC-V.
- Evaluation Metrics: Translation accuracy, speedup, memory consumption, and energy consumption against commercial virtualization solutions.
- Availability: The authors plan to release code, models, training datasets, and benchmarks publicly.
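An example makes the CISC-to-RISC gap concrete. x86 instructions can read and write memory directly, as in `add dword ptr [rdi+8], esi`. ARM and RISC-V, by contrast, are load/store architectures in which arithmetic operates only on registers. The toy rule-based lowering below splits one such x86 read-modify-write into a RISC-V load, ALU operation, and store. It is a hypothetical illustration, not CRT's LLM-based method. The register mapping (x86-64 System V argument registers onto RISC-V argument registers) and the scratch register `t0` are assumptions, and the function name `lower_rmw` is made up.

```python
import re

# Hypothetical register mapping: x86-64 System V argument registers (rdi, rsi, rdx)
# onto RISC-V argument registers (a0, a1, a2). 32-bit names alias the 64-bit ones.
REG_MAP = {"rdi": "a0", "rsi": "a1", "esi": "a1", "rdx": "a2", "edx": "a2"}

# 32-bit ALU opcodes: x86 mnemonic -> RISC-V register-register form.
# addw/subw give 32-bit wrap-around on RV64; and/or/xor need no w-variant.
OP_MAP = {"add": "addw", "sub": "subw", "and": "and", "or": "or", "xor": "xor"}

# Matches Intel-syntax memory-destination forms such as "add dword ptr [rdi+8], esi".
MEM_RMW = re.compile(
    r"^\s*(\w+)\s+dword\s+ptr\s+\[(\w+)(?:\s*\+\s*(\d+))?\]\s*,\s*(\w+)\s*$",
    re.IGNORECASE,
)


def lower_rmw(insn: str, scratch: str = "t0") -> list[str]:
    """Split one x86 read-modify-write instruction into RISC-V load/op/store.

    Toy example only: it ignores EFLAGS, which x86 ALU instructions update but
    RISC-V does not have, so code that branches on those flags needs more work.
    """
    m = MEM_RMW.match(insn)
    if m is None:
        raise ValueError(f"unsupported instruction: {insn!r}")
    op, base, disp, src = m.groups()
    op, base, src = op.lower(), base.lower(), src.lower()
    if op not in OP_MAP or base not in REG_MAP or src not in REG_MAP:
        raise ValueError(f"unsupported opcode or register: {insn!r}")
    offset = int(disp) if disp else 0
    if offset > 2047:
        # RISC-V load/store offsets are 12-bit signed immediates.
        raise ValueError("displacement does not fit a RISC-V 12-bit immediate")
    addr = f"{offset}({REG_MAP[base]})"
    return [
        f"lw {scratch}, {addr}",  # load the 32-bit memory operand
        f"{OP_MAP[op]} {scratch}, {scratch}, {REG_MAP[src]}",  # register-only ALU op
        f"sw {scratch}, {addr}",  # store the result back to memory
    ]


if __name__ == "__main__":
    for line in lower_rmw("add dword ptr [rdi+8], esi"):
        print(line)
```

Running it prints `lw t0, 8(a0)`, `addw t0, t0, a1`, and `sw t0, 8(a0)`, so one x86 instruction becomes three RISC-V instructions. A hand-written rule table like this quickly runs out of coverage once flags, complex addressing modes, and calling conventions come into play. Those are the kinds of gaps that static translators struggle with and that the paper targets with a language model.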
Implications
- Accelerating ISA Migration: CRT could accelerate the industry-wide shift toward energy-efficient RISC architectures by automating the porting of large legacy x86 software stacks while outperforming virtualization-based alternatives such as Rosetta 2.
- Enhancing the RISC-V Ecosystem: The strong x86-to-RISC-V translation accuracy (88.68%) addresses a major hurdle in RISC-V adoption, namely the scarcity of native commercial software. This could make RISC-V platforms viable sooner.
- Native Performance Substitute: By substantially outperforming conventional virtualization (Rosetta 2) in speed, memory, and energy, LLM-guided transpilation offers a path toward achieving near-native performance for cross-architecture applications.
- LLMs in Low-Level Systems: This work demonstrates the utility of language models in complex, low-level programming domains, bridging architectural gaps that traditional compilers and static translators struggle with.