Microarchitecture Design and Benchmarking of Custom SHA-3 Instruction for RISC-V

Abstract

This study designs and benchmarks a custom SHA-3 permutation instruction integrated directly into the RISC-V CPU microarchitecture, addressing the limitations of standalone cryptographic accelerators. In cycle-accurate gem5 simulations and FPGA prototyping, the integrated instruction achieved speedups of up to 8.02x on RISC-V-optimized SHA-3 software and up to 46.31x on Keccak-specific workloads. The hardware cost was modest: 15.09% more registers and 11.51% more Look-Up Tables (LUTs). These findings demonstrate that complex hashing operations can be embedded efficiently into the RISC-V Instruction Set Architecture (ISA).

Report

Key Highlights

  • Custom Instruction: A custom instruction for the SHA-3 permutation was designed and prototyped for direct integration into the RISC-V CPU microarchitecture.
  • Performance Gain: The integrated instruction improved performance by up to 8.02x on RISC-V-optimized SHA-3 software workloads.
  • Keccak Acceleration: On Keccak-specific workloads (Keccak is the algorithm underlying SHA-3), speedups reached up to 46.31x.
  • Low Overhead: The hardware cost was a 15.09% increase in registers and an 11.51% increase in Look-Up Table (LUT) utilization.

Technical Details

  • Target Architecture: RISC-V CPU.
  • Operation Focus: The study focuses on integrating the permutation at the core of SHA-3 (Keccak), whose permutation-based structure sets it apart from other hash functions.
  • Microarchitectural Goals: The design addressed challenges related to pipelined simultaneous execution, efficient storage utilization, and overall hardware cost.
  • Validation Methods: Benchmarking used two methods: cycle-accurate gem5 simulation and FPGA prototyping.
  • Design Context: The work is positioned as an attempt to overcome the complexities of direct microarchitectural integration, a problem often avoided by relying on external coprocessors or basic software optimizations for SHA-3.
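
For readers unfamiliar with the operation being accelerated, the following is a minimal software sketch of the Keccak-f[1600] permutation as specified in FIPS 202, wrapped in a SHA3-256 sponge so it can be checked against Python's hashlib. It is a reference model only, not the paper's hardware design; the function names (rol64, keccak_f1600, sha3_256) are chosen here for illustration, and the structure follows the well-known compact style of deriving round constants from an 8-bit LFSR.

```python
import hashlib

MASK64 = (1 << 64) - 1


def rol64(value, shift):
    """Rotate a 64-bit lane left by `shift` bits."""
    shift %= 64
    return ((value << shift) | (value >> (64 - shift))) & MASK64


def keccak_f1600(state):
    """Apply the 24-round Keccak-f[1600] permutation to a 200-byte state."""
    # lanes[x][y] is the little-endian 64-bit word at byte offset 8 * (x + 5 * y).
    lanes = [[int.from_bytes(state[8 * (x + 5 * y):8 * (x + 5 * y) + 8], "little")
              for y in range(5)] for x in range(5)]
    lfsr = 1  # 8-bit LFSR that generates the iota round constants
    for _ in range(24):
        # theta: mix each column's parity into the neighbouring columns
        col = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4]
               for x in range(5)]
        mix = [col[(x + 4) % 5] ^ rol64(col[(x + 1) % 5], 1) for x in range(5)]
        lanes = [[lanes[x][y] ^ mix[x] for y in range(5)] for x in range(5)]
        # rho and pi: rotate each lane and move it to a new position
        x, y = 1, 0
        current = lanes[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            current, lanes[x][y] = lanes[x][y], rol64(current, (t + 1) * (t + 2) // 2)
        # chi: the only non-linear step, applied along each row
        for y in range(5):
            row = [lanes[x][y] for x in range(5)]
            for x in range(5):
                lanes[x][y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        # iota: XOR a round constant into lane (0, 0)
        for j in range(7):
            lfsr = ((lfsr << 1) ^ ((lfsr >> 7) * 0x71)) % 256
            if lfsr & 2:
                lanes[0][0] ^= 1 << ((1 << j) - 1)
    out = bytearray(200)
    for x in range(5):
        for y in range(5):
            out[8 * (x + 5 * y):8 * (x + 5 * y) + 8] = lanes[x][y].to_bytes(8, "little")
    return out


def sha3_256(data):
    """SHA3-256 as a sponge over keccak_f1600 (rate 136 bytes, suffix 0x06)."""
    rate = 136
    padded = bytearray(data) + bytes(rate - len(data) % rate)
    padded[len(data)] ^= 0x06  # SHA-3 domain bits plus first padding bit
    padded[-1] ^= 0x80         # final padding bit
    state = bytearray(200)
    for offset in range(0, len(padded), rate):
        for i in range(rate):
            state[i] ^= padded[offset + i]
        state = keccak_f1600(state)
    return bytes(state[:32])


# Cross-check the sketch against the standard library implementation.
assert sha3_256(b"abc") == hashlib.sha3_256(b"abc").digest()
```

In this model every absorbed 136-byte block costs one full keccak_f1600 call, that is, 24 rounds over 25 lanes. That inner loop is the work a permutation instruction of the kind described above would take over from software.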

Implications

  • RISC-V Cryptography Roadmap: This work provides evidence supporting the inclusion of complex, multistage cryptographic instructions directly in the RISC-V ISA, mirroring efforts such as Intel's AES-NI and Arm's Cryptographic Extension.
  • Efficiency and Competitiveness: By achieving high acceleration (up to 46.31x) at a modest hardware cost, the integration makes RISC-V cores more competitive in security-sensitive domains such as networking hardware, secure computing, and embedded systems that require high-speed hashing.
  • Design Insight: The findings offer practical design considerations for developers working on future cryptographic instruction set extensions (CSIX), showing how to handle permutation-heavy algorithms efficiently at the microarchitectural level rather than relying on external acceleration units.