CVA6 RISC-V Virtualization: Architecture, Microarchitecture, and Design Space Exploration

Abstract

This article describes the implementation and optimization of hardware virtualization support for the open-source RISC-V CVA6 core, covering both architecture and microarchitecture enhancements. The authors introduce dedicated structures, such as the G-Stage TLB (GTLB) and an L2 TLB, to alleviate the significant performance overhead that two-stage address translation adds under virtualization. The optimal configuration, identified through extensive Design Space Exploration, achieves an average functional performance speedup of 12.5% over a virtualization-aware but non-optimized design at negligible cost (0.78% area increase and 0.33% power increase).

Report

Structured Report: CVA6 RISC-V Virtualization

Key Highlights

  • RISC-V Virtualization Implementation: The work successfully integrates and optimizes hardware virtualization support into the open-source RISC-V CVA6 core.
  • Performance Optimization: The implemented microarchitectural enhancements yielded a substantial performance speedup of up to 16% (approximately 12.5% on average) compared to a virtualization-aware but non-optimized design.
  • Efficiency: This significant performance gain was achieved at a minimal resource cost, requiring only a 0.78% increase in area and a 0.33% increase in power.
  • Design Space Exploration (DSE): A comprehensive DSE was performed using both post-layout simulations (22nm FDX technology) and functional assessment on an FPGA platform (Genesys 2) to select the optimal design point.
  • Open Source: All architecture and microarchitecture work described is publicly available, allowing the community to utilize and further iterate on the designs.
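As a rough illustration of how the reported percentages combine, the short Python sketch below turns them into performance per unit of area and per unit of power, and into the share of baseline runtime saved. It assumes that a "speedup of X%" means the baseline-to-optimized execution-time ratio is 1 + X/100; the helper names relative_gain and runtime_reduction_pct are hypothetical and not part of the CVA6 work.

```python
# Back-of-the-envelope check of the reported speedup vs. cost figures.
# Assumption (not stated in the source): "speedup of X%" means
# baseline_time / optimized_time = 1 + X / 100.


def relative_gain(speedup_pct: float, overhead_pct: float) -> float:
    """Performance per unit of cost (area or power), relative to the baseline."""
    return (1 + speedup_pct / 100) / (1 + overhead_pct / 100)


def runtime_reduction_pct(speedup_pct: float) -> float:
    """Percentage of the baseline execution time that the speedup saves."""
    return (1 - 1 / (1 + speedup_pct / 100)) * 100


AVG_SPEEDUP = 12.5     # average speedup reported above (%)
MAX_SPEEDUP = 16.0     # peak speedup reported above (%)
AREA_OVERHEAD = 0.78   # area increase reported above (%)
POWER_OVERHEAD = 0.33  # power increase reported above (%)

if __name__ == "__main__":
    print(f"perf/area  (avg): {relative_gain(AVG_SPEEDUP, AREA_OVERHEAD):.4f}")
    print(f"perf/power (avg): {relative_gain(AVG_SPEEDUP, POWER_OVERHEAD):.4f}")
    print(f"runtime saved (avg): {runtime_reduction_pct(AVG_SPEEDUP):.1f}%")
    print(f"runtime saved (max): {runtime_reduction_pct(MAX_SPEEDUP):.1f}%")
```

Under this interpretation, a 12.5% speedup saves about 11% of the baseline runtime, and performance per unit area still improves by roughly 11.6% once the 0.78% area increase is charged against it.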

Technical Details

Aspect | Specification / Method
Target core | RISC-V CVA6 core
Virtualization mechanism | RISC-V hardware virtualization support
Key enhancements | G-Stage Translation Lookaside Buffer (GTLB) and L2 TLB
Optimization goal | Alleviating the performance overhead associated with two-stage address translation in virtualization
Technology node | Post-layout simulations based on 22nm FDX technology
Functional testing | FPGA mapping using the Genesys 2 platform
Software stack | MiBench benchmark running on Linux atop the Bao hypervisor (single-core configuration)
Metrics assessed | Performance, Power, and Area (PPA)
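To see why two-stage translation is costly and how a G-stage TLB helps: every guest (VS-stage) page-table pointer is a guest-physical address, so each guest PTE read first needs its own G-stage walk. With n_g guest levels and n_h G-stage levels, a full walk takes up to (n_g + 1)(n_h + 1) - 1 memory accesses, which is 15 for a 3-level Sv39 guest on a 3-level Sv39x4 G-stage. The Python model below is a simplified sketch under stated assumptions: 3-level G-stage walks, no caching of intermediate page-table entries, and an LRU GTLB that caches G-stage translations by guest page. The names GTLB and two_stage_walk_accesses are hypothetical and do not describe CVA6's actual RTL.

```python
from collections import OrderedDict
from typing import Optional, Sequence

PAGE_SHIFT = 12  # 4 KiB pages (assumption)


class GTLB:
    """Toy G-stage TLB: caches guest-physical page numbers with LRU eviction.

    Only hit/miss behaviour is modelled; the host-physical frame itself
    is irrelevant when counting memory accesses.
    """

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.pages: "OrderedDict[int, bool]" = OrderedDict()

    def hit(self, gpa: int) -> bool:
        page = gpa >> PAGE_SHIFT
        if page in self.pages:
            self.pages.move_to_end(page)  # refresh LRU position
            return True
        self.pages[page] = True  # fill after the G-stage walk completes
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # evict least recently used
        return False


def two_stage_walk_accesses(
    guest_pt_gpas: Sequence[int],
    final_gpa: int,
    g_levels: int = 3,  # assumption: Sv39x4 G-stage (3 levels)
    gtlb: Optional[GTLB] = None,
) -> int:
    """Count memory accesses for one VS-stage walk nested in G-stage walks.

    guest_pt_gpas: guest-physical addresses of the guest PTEs read at each
    VS-stage level (root first). final_gpa: the GPA produced by the leaf PTE.
    """
    accesses = 0
    for pte_gpa in guest_pt_gpas:
        if gtlb is None or not gtlb.hit(pte_gpa):
            accesses += g_levels  # G-stage walk to locate the guest PTE
        accesses += 1  # read the guest PTE itself
    if gtlb is None or not gtlb.hit(final_gpa):
        accesses += g_levels  # translate the final GPA to an HPA
    return accesses


if __name__ == "__main__":
    # Hypothetical guest page-table layout for a 3-level (Sv39) guest.
    pte_gpas = [0x8000_0000, 0x8000_1000, 0x8000_2000]
    data_gpa = 0x9000_0000
    print("no GTLB:", two_stage_walk_accesses(pte_gpas, data_gpa))  # prints "no GTLB: 15"
    gtlb = GTLB(capacity=8)
    print("cold GTLB:", two_stage_walk_accesses(pte_gpas, data_gpa, gtlb=gtlb))  # prints "cold GTLB: 15"
    print("warm GTLB:", two_stage_walk_accesses(pte_gpas, data_gpa, gtlb=gtlb))  # prints "warm GTLB: 3"
```

With a warm GTLB, the walk in this model collapses to the three guest PTE reads, which is the kind of overhead reduction the GTLB targets; real hardware behaviour also depends on page-walk caches, replacement policy, and GTLB size.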

Implications

This work provides a foundation for high-performance RISC-V systems that need efficient virtualization, a capability increasingly demanded in modern computing.

  • Validation of RISC-V Ecosystem: It shows that the RISC-V Hypervisor (H) extension can be implemented and optimized in practice, demonstrating that efficient hardware-assisted virtualization is feasible on open cores.
  • Reference Implementation: The CVA6 implementation serves as a crucial open-source reference design for other commercial and academic RISC-V projects looking to incorporate virtualization efficiently.
  • Resource Efficiency for Embedded Systems: By delivering speedups of up to 16% with less than 1% area and power overhead, the CVA6 virtualization implementation is well suited to embedded systems, edge computing, and automotive applications, where resource constraints are severe.
  • Performance Benchmarking: The quantified results, particularly the effectiveness of structures like the GTLB and L2 TLB in addressing translation latency, inform future microarchitectural development for RISC-V processors.
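One way to reason about how an L2 TLB reduces translation latency is a simple expected-cost model: every translation pays the L1 TLB lookup, L1 misses probe the L2 TLB, and only L2 misses pay the full two-stage page-table walk. The Python sketch below implements that model; the function name avg_translation_cycles and all parameter values are hypothetical and chosen for illustration, not measurements from CVA6.

```python
from typing import Optional


def avg_translation_cycles(
    l1_lookup: float,
    l1_miss_rate: float,
    walk: float,
    l2_lookup: Optional[float] = None,
    l2_miss_rate: Optional[float] = None,
) -> float:
    """Expected cycles per address translation.

    Every translation pays the L1 TLB lookup. On an L1 miss the request
    either goes straight to the page-table walker (no L2 TLB) or first
    probes the L2 TLB and walks only on an L2 miss. Miss rates are local:
    the fraction of requests reaching that level that miss there.
    """
    if l2_lookup is None or l2_miss_rate is None:
        return l1_lookup + l1_miss_rate * walk
    return l1_lookup + l1_miss_rate * (l2_lookup + l2_miss_rate * walk)


if __name__ == "__main__":
    # Hypothetical parameters, for illustration only (not measured on CVA6):
    # a nested walk of 15 memory accesses at about 20 cycles each.
    WALK = 15 * 20
    base = avg_translation_cycles(l1_lookup=1, l1_miss_rate=0.05, walk=WALK)
    with_l2 = avg_translation_cycles(
        l1_lookup=1, l1_miss_rate=0.05, walk=WALK, l2_lookup=3, l2_miss_rate=0.3
    )
    print(f"without L2 TLB: {base:.2f} cycles/translation")
    print(f"with L2 TLB:    {with_l2:.2f} cycles/translation")
```

Because a nested walk is so long, even a modest L2 hit rate removes most of the walk cost from the average; whether that pays off in practice depends on the L2 TLB's size, latency, and the area and power it adds, which is the trade-off a design space exploration has to weigh.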