An Embedded RISC-V Core with Fast Modular Multiplication

Abstract

This work introduces an embedded RISC-V core that targets the security and power constraints of battery-operated IoT devices by accelerating cryptographic operations. Its key innovation is a custom instruction for modular multiplication with a Partial Execution mode, which blocks the processor for only two cycles regardless of operand size. The design achieves up to a 13x speedup and up to a 95% power reduction on elliptic-curve cryptography benchmarks while running at 136 MHz on ASIC.

Report

Key Highlights

  • Core Innovation: A custom instruction extension added to an embedded RISC-V core specifically for accelerating modular multiplication.
  • Performance Metric: Achieves up to a 13x speedup on elliptic-curve cryptography benchmarks compared to standard software implementations.
  • Power Efficiency: Reduces overall power consumption by up to 95%, critical for battery-operated IoT nodes.
  • Minimal Latency: In Partial Execution mode, the custom instruction typically blocks the processor for only two cycles, regardless of operand size.
  • Operating Frequency: The CPU with 128-bit modular multiplication runs at 136 MHz on ASIC and 81 MHz on FPGA.
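
For reference, the operation being accelerated is c = (a · b) mod m. A minimal software sketch of one classic way to compute it, interleaved shift-and-add with a reduction at each step, is given below. This is an illustrative baseline only and not the algorithm the core implements in hardware, which this summary does not specify; the function name `mod_mul` is an assumption.

```python
def mod_mul(a: int, b: int, m: int) -> int:
    """Return (a * b) % m using interleaved shift-and-add.

    Illustrative software baseline only; requires m > 0.
    """
    if m <= 0:
        raise ValueError("modulus must be positive")
    a %= m
    b %= m
    result = 0
    # Scan b from its most significant bit down to bit 0.
    for i in range(b.bit_length() - 1, -1, -1):
        # Double the accumulator, reducing so it stays below m.
        result <<= 1
        if result >= m:
            result -= m
        # Add a when the current bit of b is set, then reduce again.
        if (b >> i) & 1:
            result += a
            if result >= m:
                result -= m
    return result
```

The loop runs once per bit of `b`, so a 128-bit operand costs up to 128 rounds of shift, compare, and conditional subtract in software. That per-bit work is the kind of cost a dedicated instruction can move off the main pipeline.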

Technical Details

  • Base Architecture: The proof-of-concept CPU implements the embedded (E) base instruction set of RISC-V, which has 16 integer registers, together with the compressed (C) extension.
  • Acceleration Method: Modular multiplication is implemented as a custom instruction extension, which offers more flexibility than fixed-function hardware accelerators.
  • Area Overhead: The inclusion of the accelerator resulted in an average area overhead of 41% over the base RISC-V architecture.
  • Cryptography Focus: The design was benchmarked on recent Elliptic-Curve Cryptography (ECC) algorithms.
  • Scalability: The custom instruction is designed to handle modular multiplication of “any size” while keeping the two-cycle blocking behavior.
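
The RISC-V specification reserves the custom-0 major opcode (0b0001011) for non-standard extensions of this kind. The sketch below packs an R-type instruction word in that opcode space. The summary does not give the real encoding or operand convention of the modular multiplication instruction, so the opcode choice, the `funct3` and `funct7` values, and the helper name `encode_r_type` are all hypothetical.

```python
CUSTOM_0 = 0b0001011  # custom-0 major opcode reserved by the RISC-V spec
E_NUM_REGS = 16       # the E base ISA provides registers x0..x15 only


def encode_r_type(opcode: int, rd: int, funct3: int,
                  rs1: int, rs2: int, funct7: int) -> int:
    """Pack the fields of a 32-bit R-type instruction word."""
    for name, reg in (("rd", rd), ("rs1", rs1), ("rs2", rs2)):
        if not 0 <= reg < E_NUM_REGS:
            raise ValueError(f"{name}=x{reg} is not available in the E base ISA")
    if not (0 <= opcode < 128 and 0 <= funct3 < 8 and 0 <= funct7 < 128):
        raise ValueError("opcode, funct3, or funct7 out of range")
    return ((funct7 << 25) | (rs2 << 20) | (rs1 << 15)
            | (funct3 << 12) | (rd << 7) | opcode)


# Hypothetical instruction: rd=x10, rs1=x11, rs2=x12, funct3=funct7=0.
word = encode_r_type(CUSTOM_0, rd=10, funct3=0, rs1=11, rs2=12, funct7=0)
print(f"{word:#010x}")  # 0x00c5850b
```

Operands of 128 bits or more do not fit in a single integer register, so a real instruction of this kind would need some other way to pass them, for example via memory addresses or several registers; the source summary does not say which approach the authors chose.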

Implications

  • IoT Security: This development significantly lowers the power budget required for robust encryption and authentication on resource-constrained IoT end-nodes, addressing a major security concern.
  • RISC-V Extensibility: It showcases the flexibility of RISC-V’s custom instruction mechanism, which the work presents as a more adaptable alternative to fixed-function hardware accelerators.
  • Real-time Responsiveness: By minimizing the processor stall time to just two cycles, the design maintains better device response time to real-time events, mitigating a major drawback of traditional custom instruction solutions.
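
To put the two-cycle stall in concrete terms, the arithmetic below converts stall cycles to wall-clock time at the reported clock frequencies. The 128-cycle figure for a fully blocking instruction is a hypothetical comparison point (one cycle per operand bit) chosen for illustration, not a number from the source.

```python
def stall_time_ns(stall_cycles: int, freq_mhz: float) -> float:
    """Convert a stall length in clock cycles to nanoseconds."""
    return stall_cycles * 1000.0 / freq_mhz


PARTIAL_EXECUTION_STALL = 2        # cycles, as reported for Partial Execution mode
HYPOTHETICAL_BLOCKING_STALL = 128  # cycles, assumed for illustration only

for target, freq_mhz in (("ASIC", 136.0), ("FPGA", 81.0)):
    partial = stall_time_ns(PARTIAL_EXECUTION_STALL, freq_mhz)
    blocking = stall_time_ns(HYPOTHETICAL_BLOCKING_STALL, freq_mhz)
    print(f"{target}: partial {partial:.1f} ns, blocking {blocking:.1f} ns")
```

A two-cycle stall works out to roughly 14.7 ns at 136 MHz and 24.7 ns at 81 MHz, which bounds how long a pending real-time event could be delayed by the instruction itself.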