Beyond Random Inputs: A Novel ML-Based Hardware Fuzzing

Admin

Abstract

This paper introduces ChatFuzz, a novel ML-based hardware fuzzer that uses Large Language Models (LLMs) and Reinforcement Learning (RL) to generate effective assembly code sequences for processor verification. On the RISC-V RocketCore testbed, ChatFuzz reached 75% condition coverage in 52 minutes, whereas a state-of-the-art fuzzer needed 30 hours to reach similar coverage. The approach identified more than 10 unique mismatches, including two previously unknown bugs in the RocketCore processor.

Report

Key Highlights

Novel Fuzzer: The core innovation is ChatFuzz, an ML-based hardware fuzzing tool designed to overcome the limitations of traditional random regression and formal verification methods.
Performance Leap: ChatFuzz achieved 75% condition coverage on the testbed processor in just 52 minutes, whereas an existing state-of-the-art fuzzer required 30 hours to reach similar coverage.
Bug Discovery: The fuzzer uncovered over 10 unique mismatches between the processor and its golden model, including two previously unknown bugs in the RocketCore processor.
Test Case Results: Out of 199,000 generated test cases, 6,000 produced discrepancies when compared against the processor's golden model.
Coverage Threshold: The method demonstrated the ability to reach 80% coverage within a 130-hour window, even with a limited resource pool of only 10 simulation licenses.
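The headline figures reduce to two simple ratios. The short calculation below uses only numbers quoted in this summary; the variable names are illustrative.

```python
# Figures quoted in the highlights above.
baseline_minutes = 30 * 60   # state-of-the-art fuzzer: 30 hours
chatfuzz_minutes = 52        # ChatFuzz: 52 minutes to 75% condition coverage
speedup = baseline_minutes / chatfuzz_minutes

tests, discrepant = 199_000, 6_000
discrepancy_rate = discrepant / tests  # share of test cases that disagreed with the golden model

print(f"time-to-coverage speedup: {speedup:.1f}x")    # 34.6x
print(f"discrepancy rate: {discrepancy_rate:.1%}")     # 3.0%
```

In other words, ChatFuzz reached the same coverage level roughly 35 times sooner, and about 3% of its test cases exposed a discrepancy.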

Technical Details

Target Architecture: The fuzzer targets security vulnerabilities in complex computing hardware and was evaluated on the open-source RISC-V RocketCore processor.
Input Generation: The approach uses Large Language Models (LLMs), such as ChatGPT, to learn the processor's native language (machine code) and generate relevant assembly code sequences.
Optimization Mechanism: Reinforcement Learning (RL) guides input generation: the generator is rewarded according to code coverage metrics, which steers it toward inputs that exercise intricate, hard-to-reach hardware states.
Coverage Metrics: Success is measured by code coverage, specifically the condition coverage rate (75% within 52 minutes).
Discrepancy Identification: The fuzzer identifies vulnerabilities by checking for discrepancies between the execution results of the tested processor and a reference (golden) ISA Simulator.
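To make the coverage-driven RL idea concrete, here is a toy sketch under heavy assumptions. It is not ChatFuzz's implementation: the LLM is replaced by a context-free categorical policy over a hypothetical 8-opcode vocabulary, RTL simulation is replaced by a hand-written condition model (`condition_outcomes`), and the RL algorithm is plain REINFORCE, chosen here as the simplest policy-gradient method. The reward, the number of (condition, value) outcomes not covered before, is likewise an assumption about the general shape of a coverage reward.

```python
import math
import random

# Hypothetical toy vocabulary; ChatFuzz's generator emits real RISC-V assembly.
OPCODES = ["add", "sub", "mul", "div", "lw", "sw", "beq", "csrrw"]
TOTAL_OUTCOMES = 4 * 2  # four toy conditions, each needs a True and a False outcome


def condition_outcomes(program):
    """Stand-in for RTL condition coverage: every (condition, value) pair observed."""
    hits = set()
    prev = None
    for op in program:
        hits.add(("is_mem", op in ("lw", "sw")))
        hits.add(("is_branch", op == "beq"))
        hits.add(("is_csr", op == "csrrw"))
        # Harder to reach: a divide issued directly after a multiply.
        hits.add(("mul_then_div", prev == "mul" and op == "div"))
        prev = op
    return hits


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(iterations=300, length=10, lr=0.05, seed=0):
    """REINFORCE on a categorical policy; reward = newly covered outcomes."""
    rng = random.Random(seed)
    logits = [0.0] * len(OPCODES)
    covered = set()
    baseline = 0.0  # running average of rewards, used to reduce variance
    history = []
    for _ in range(iterations):
        probs = softmax(logits)
        idx = rng.choices(range(len(OPCODES)), weights=probs, k=length)
        program = [OPCODES[i] for i in idx]
        new = condition_outcomes(program) - covered
        covered |= new
        reward = len(new)
        advantage = reward - baseline
        baseline = 0.9 * baseline + 0.1 * reward
        # Gradient of sum(log p(token)) w.r.t. logits is (counts - length * probs).
        for k in range(len(OPCODES)):
            logits[k] += lr * advantage * (idx.count(k) - length * probs[k])
        history.append(len(covered) / TOTAL_OUTCOMES)
    return history, covered


history, covered = train()
print(f"condition coverage after {len(history)} programs: {history[-1]:.0%}")
```

Rewarding only outcomes that have not been seen before makes the signal favor exploration: once every outcome is covered, the reward drops to zero and the policy stops being pushed in any particular direction.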
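Discrepancy identification is a form of differential testing: run the same program on the design under test (DUT) and on the golden model, and flag the first point where their results diverge. The sketch below illustrates this with a hypothetical three-opcode register machine rather than RocketCore RTL and the RISC-V ISA Simulator. The golden `div` follows the RISC-V rules (round toward zero, division by zero yields all ones, most-negative / -1 yields the dividend), while the toy DUT carries a deliberately injected bug. It also shows why many failing tests (6,000 in the paper) can collapse into a few unique mismatches; the grouping key used here is an assumption.

```python
import random

MASK = (1 << 64) - 1  # registers hold 64-bit values


def to_signed(x):
    """Interpret a 64-bit pattern as a two's-complement integer."""
    x &= MASK
    return x - (1 << 64) if x >> 63 else x


def golden_div(a, b):
    """Reference DIV following RISC-V rules: truncate toward zero, x/0 = -1,
    and the overflow case (most-negative / -1) returns the dividend."""
    a, b = to_signed(a), to_signed(b)
    if b == 0:
        return MASK
    if a == -(1 << 63) and b == -1:
        return a & MASK
    q = abs(a) // abs(b)
    return (q if (a < 0) == (b < 0) else -q) & MASK


def buggy_div(a, b):
    """Hypothetical DUT with an injected bug: divide-by-zero yields 0."""
    return 0 if b == 0 else golden_div(a, b)


def execute(program, div_impl):
    """Run (op, rd, rs1, rs2) tuples; return the (op, rd, value) commit trace."""
    regs = [0, 7, (-3) & MASK, 100]  # hypothetical initial register file
    trace = []
    for op, rd, rs1, rs2 in program:
        a, b = regs[rs1], regs[rs2]
        if op == "add":
            value = (a + b) & MASK
        elif op == "sub":
            value = (a - b) & MASK
        else:  # "div"
            value = div_impl(a, b)
        regs[rd] = value
        trace.append((op, rd, value))
    return trace


def diff_test(program):
    """Compare DUT and golden traces; return the first divergence or None."""
    dut = execute(program, buggy_div)
    ref = execute(program, golden_div)
    for i, (d, r) in enumerate(zip(dut, ref)):
        if d != r:
            # Later entries are not compared: the state has already diverged.
            return (program[i][0], i, d[2], r[2])
    return None


def fuzz(n_tests=1000, length=6, seed=1):
    """Random programs in; (failing test count, unique mismatch signatures) out."""
    rng = random.Random(seed)
    failing, unique = 0, set()
    for _ in range(n_tests):
        program = [(rng.choice(["add", "sub", "div"]), rng.randrange(4),
                    rng.randrange(4), rng.randrange(4)) for _ in range(length)]
        mismatch = diff_test(program)
        if mismatch is not None:
            failing += 1
            op, _, dut_value, ref_value = mismatch
            unique.add((op, dut_value, ref_value))  # hypothetical grouping key
    return failing, unique


failing, unique = fuzz()
print(f"{failing} failing tests collapse into {len(unique)} unique mismatch(es)")
```

Because the traces agree up to the first divergence, every failing test here traces back to the single injected divide-by-zero bug, so deduplication reports exactly one unique mismatch.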

Implications

Enhanced Hardware Trust: By significantly improving the efficiency and depth of hardware vulnerability detection, ChatFuzz reinforces the integrity of hardware used as the root of trust in modern computing systems.
Impact on RISC-V Ecosystem: Testing ChatFuzz on the open-source RocketCore processor contributes to the security and robustness of the RISC-V ecosystem by surfacing previously unknown bugs so they can be fixed.
Verification Paradigm Shift: Integrating LLMs and RL moves hardware fuzzing beyond purely random inputs toward goal-directed testing that can explore complex state spaces far faster than traditional methods, addressing the long-standing problem of coverage stalling at low levels (often below 70%).
Cost Efficiency: Cutting the time needed to reach 75% condition coverage from 30 hours to 52 minutes can reduce the computational cost and time-to-market of hardware design validation.
