Beyond Random Inputs: A Novel ML-Based Hardware Fuzzing

Abstract

This paper introduces ChatFuzz, an ML-based hardware fuzzer that combines Large Language Models (LLMs) with Reinforcement Learning (RL) to generate assembly code sequences for processor verification. Using the RISC-V RocketCore as a testbed, ChatFuzz reached 75% condition coverage in 52 minutes, whereas state-of-the-art fuzzers needed 30 hours to reach similar coverage. The approach identified more than 10 unique mismatches, including two previously unknown bugs in the RocketCore processor.

Report

Key Highlights

  • Novel Fuzzer: The core innovation is ChatFuzz, an ML-based hardware fuzzing tool designed to overcome the limitations of traditional random regression and formal verification methods.
  • Performance Leap: ChatFuzz reached 75% condition coverage on the testbed processor in 52 minutes. Existing state-of-the-art fuzzers required 30 hours to reach similar coverage, so ChatFuzz was roughly 35 times faster.
  • Bug Discovery: The fuzzer successfully uncovered over 10 unique hardware mismatches, including two newly identified bugs within the RocketCore processor.
  • Test Case Results: Out of 199,000 generated test cases, 6,000 produced discrepancies when compared against the processor's golden model.
  • Coverage Threshold: The method demonstrated the ability to reach 80% coverage within a 130-hour window, even with a limited resource pool of only 10 simulation licenses.
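
The headline figures above imply a speedup and a mismatch rate that are easy to check. The short Python calculation below uses only the numbers reported in this summary; the variable names are illustrative and not taken from the paper.

```python
# Figures as reported in the summary above.
baseline_minutes = 30 * 60   # 30 hours for existing state-of-the-art fuzzers
chatfuzz_minutes = 52        # ChatFuzz time to reach 75% condition coverage
total_tests = 199_000        # generated test cases
mismatching_tests = 6_000    # test cases that disagreed with the golden model

speedup = baseline_minutes / chatfuzz_minutes
mismatch_rate = mismatching_tests / total_tests

print(f"speedup: {speedup:.1f}x")             # speedup: 34.6x
print(f"mismatch rate: {mismatch_rate:.2%}")  # mismatch rate: 3.02%
```

Note that the 6,000 mismatching test cases are raw discrepancies; the summary reports that they reduce to just over 10 unique mismatches.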

Technical Details

  • Architecture: The fuzzer targets security vulnerabilities in complex computing hardware, specifically tested on the open-source RISC-V-based RocketCore processor.
  • Input Generation: The approach uses Large Language Models (LLMs), such as ChatGPT, to learn the structure of the processor's native language (machine code) and generate relevant assembly code sequences.
  • Optimization Mechanism: Reinforcement Learning (RL) is integrated to guide the input generation process. RL provides rewards based on code coverage metrics, ensuring the generated inputs actively explore intricate hardware states.
  • Coverage Metrics: Success is measured by code coverage, specifically condition coverage (75% reached in 52 minutes).
  • Discrepancy Identification: The fuzzer identifies vulnerabilities by checking for discrepancies between the execution results of the tested processor and a reference (golden) ISA Simulator.
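
The two mechanisms described above, a coverage-based reward and a comparison against a golden model, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not ChatFuzz's implementation: the summary does not specify the exact reward function or trace format, so `CoverageTracker`, `find_mismatches`, the "newly covered points" reward, and the `(register, value)` trace entries are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CoverageTracker:
    """Tracks which condition points have been hit (hypothetical helper)."""
    total_points: int
    covered: set = field(default_factory=set)

    def reward(self, hit_points):
        """Assumed reward: number of condition points hit for the first time."""
        new_points = set(hit_points) - self.covered
        self.covered |= new_points
        return len(new_points)

    def coverage(self):
        """Fraction of all condition points covered so far."""
        return len(self.covered) / self.total_points


def find_mismatches(dut_trace, golden_trace):
    """Return trace indices where the tested design and golden model disagree."""
    mismatches = [
        i for i, (dut, gold) in enumerate(zip(dut_trace, golden_trace))
        if dut != gold
    ]
    # A trace that ends early also counts as a disagreement.
    if len(dut_trace) != len(golden_trace):
        mismatches.append(min(len(dut_trace), len(golden_trace)))
    return mismatches


# Toy usage with made-up coverage points and register write-backs.
tracker = CoverageTracker(total_points=8)
first_reward = tracker.reward([0, 1, 2])   # three new points
second_reward = tracker.reward([1, 2, 3])  # only point 3 is new

dut = [("x1", 5), ("x2", 7)]
golden = [("x1", 5), ("x2", 8)]
diffs = find_mismatches(dut, golden)       # second entry differs
```

Rewarding only newly covered points is one common way to push a generator toward unexplored states: a test that repeats already-covered behavior earns nothing, so the RL signal favors inputs that reach new conditions.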

Implications

  • Enhanced Hardware Trust: By significantly improving the efficiency and depth of hardware vulnerability detection, ChatFuzz reinforces the integrity of hardware used as the root of trust in modern computing systems.
  • Impact on RISC-V Ecosystem: Testing ChatFuzz on the open-source RocketCore processor directly contributes to the security and robustness of the RISC-V ecosystem by identifying and helping patch critical, previously unknown bugs.
  • Verification Paradigm Shift: Integrating LLMs and RL moves hardware fuzzing beyond purely random inputs. Goal-directed input generation explores complex state spaces faster than traditional methods, whose coverage often stalls below 70%.
  • Cost Efficiency: Cutting the time to reach 75% condition coverage from 30 hours to 52 minutes reduces computational cost and time-to-market for hardware design validation.