Exploiting long vectors with a CFD code: a co-design show case

Admin

a year ago (Updated)

Abstract

This paper presents a co-design methodology that combines iterative performance analysis with compiler autovectorization to exploit long vector architectures in HPC applications, demonstrated on a production CFD code. The optimizations aim to maximize vector efficiency while preserving code portability, and they were evaluated on a novel RISC-V platform with a wide vector unit. On a single core, the optimized code ran $7.6\times$ faster than the scalar implementation. Tests on other architectures, including Intel x86 and NEC SX-Aurora, confirmed that the changes remain portable.

Report

Key Highlights

Co-Design Focus: The research centers on a co-design methodology for exploiting long vector architectures (SIMD or vector extensions) to extract data parallelism in HPC.
Optimization Strategy: The primary method is compiler autovectorization, refined through iterative code changes guided by detailed analysis tools. The goal is to maximize efficiency while keeping code specialization to a minimum, so that the code stays portable.
Application Success: The methodology was applied to a production Computational Fluid Dynamics (CFD) code.
Performance Result: Achieved a significant single-core speedup of $7.6\times$ over the scalar implementation on the target platform.
Portability Demonstrated: On other major HPC architectures, including Intel x86 and NEC SX-Aurora, the optimized code either retained its performance benefits or showed no performance penalty.
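To make the optimization strategy concrete, here is a minimal sketch of one kind of source-level change that typically helps an autovectorizer. The kernel is a hypothetical example: the names `cell_aos`, `kinetic_energy_aos` and `kinetic_energy_soa` are illustrative and are not taken from the paper's code, and the paper's actual changes may differ. The first version uses an array-of-structs layout, so consecutive iterations read each field with a stride. The second switches to a structure-of-arrays layout with `restrict`-qualified pointers. Every access becomes unit stride, and the compiler may assume the arrays do not overlap.

```c
#include <stddef.h>

/* Hypothetical per-cell record in array-of-structs (AoS) layout. */
typedef struct {
    double rho, u, v, w;  /* density and velocity components */
} cell_aos;

/* Before: iteration i reads cells[i].rho, cells[i+1].rho, ... with a stride
   of 4 doubles, so a vectorized loop needs strided or gather loads. */
void kinetic_energy_aos(size_t n, const cell_aos *cells, double *ke) {
    for (size_t i = 0; i < n; ++i) {
        const cell_aos *c = &cells[i];
        ke[i] = 0.5 * c->rho * (c->u * c->u + c->v * c->v + c->w * c->w);
    }
}

/* After: structure-of-arrays (SoA) layout. All loads and stores are unit
   stride, and restrict tells the compiler the arrays do not alias, so it
   needs no runtime overlap checks before vectorizing. */
void kinetic_energy_soa(size_t n,
                        const double *restrict rho, const double *restrict u,
                        const double *restrict v, const double *restrict w,
                        double *restrict ke) {
    for (size_t i = 0; i < n; ++i)
        ke[i] = 0.5 * rho[i] * (u[i] * u[i] + v[i] * v[i] + w[i] * w[i]);
}
```

Both functions compute the same values. One way to drive the iterative loop the methodology describes is to compile with a vectorization report enabled, for example GCC's `-fopt-info-vec-missed` or Clang's `-Rpass-missed=loop-vectorize`. The report shows which loops the compiler declined to vectorize and why.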

Technical Details

Vectorization Method: Compiler autovectorization is preferred over methods that require extensive code modification (e.g., intrinsics or guided vectorization via pragmas).
Target Architecture: A novel configurable platform built around a RISC-V core.
Vector Unit Specification: The platform's wide vector unit operates on vectors of up to 256 double-precision elements ($256 \times 64$ bits = 16,384 bits).
HPC Context: The study responds to the current trend of HPC systems adopting SIMD or vector extensions to exploit data parallelism.
Validation Platforms: Performance was compared on the RISC-V platform, Intel x86, and NEC SX-Aurora.
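With autovectorization, the compiler rather than the programmer is responsible for splitting a loop's iteration space into chunks that fit the vector unit ("strip-mining"). The sketch below emulates this in scalar C for a 256-element maximum vector length. It is illustrative only, and several details are assumptions. `daxpy_stripmined` and `VLMAX_F64` are hypothetical names. The chunk rule vl = min(remaining, 256) is also an assumption: the RISC-V Vector specification allows an implementation to grant a smaller chunk when the remaining count lies between VLMAX and 2 × VLMAX.

```c
#include <stddef.h>

/* Assumed maximum vector length in doubles, matching the 256-element
   vector unit described above. */
#define VLMAX_F64 256

/* Scalar emulation of a strip-mined daxpy loop, y = a*x + y.
   Each outer iteration models one pass of a vector loop: the "hardware"
   grants vl = min(remaining, VLMAX_F64) elements (an assumption), and the
   inner loop stands in for one vector instruction over vl elements.
   Returns the number of strips (vector-loop iterations) executed. */
size_t daxpy_stripmined(size_t n, double a, const double *x, double *y) {
    size_t strips = 0;
    for (size_t i = 0; i < n; ) {
        size_t remaining = n - i;
        size_t vl = remaining < VLMAX_F64 ? remaining : VLMAX_F64;
        for (size_t j = 0; j < vl; ++j)  /* one vector op's worth of work */
            y[i + j] = a * x[i + j] + y[i + j];
        i += vl;
        ++strips;
    }
    return strips;
}
```

For n = 1000, this performs 4 strips (256, 256, 256 and 232 elements). The portability argument rests on the fact that the source loop is the same on every target and only the chunk size changes. For example, a 512-bit AVX-512 register holds 8 doubles, whereas the unit described here processes up to 256 per instruction.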

Implications

RISC-V Vector Validation: This work provides evidence that configurable RISC-V cores coupled with wide vector units can substantially accelerate demanding scientific workloads such as CFD. The $7.6\times$ speedup is measured against scalar code on the same core, so it shows how effectively the vector unit is exploited, not how the platform compares with other systems. It nonetheless supports the potential of the RISC-V Vector (RVV) ecosystem in high-performance computing.
Software Ecosystem Maturity: The results were obtained through compiler autovectorization rather than hand-written intrinsics. This suggests growing maturity in RISC-V toolchains and compilers, which can deliver high performance without requiring developers to write vendor-specific intrinsic code.
Co-Design Utility: The demonstrated iterative co-design approach offers a blueprint for hardware and software developers to jointly optimize scientific applications, accelerating the deployment and adoption of new RISC-V HPC hardware.
HPC Portability Solution: The improvements are made at the source level to enable efficient autovectorization. As a result, the code benefits, or at least is not penalized, across competing vector architectures (RISC-V, x86, NEC), which addresses a critical portability challenge in the heterogeneous HPC landscape.
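A simple Amdahl-style model can put the $7.6\times$ figure in perspective. This model is an illustrative assumption, not the paper's own analysis. Suppose a fraction $f$ of the scalar runtime is vectorized and that part runs $V$ times faster, while the rest is unchanged. The overall speedup $S$ and the fraction needed to reach it are then:

$$
S = \frac{1}{(1 - f) + f/V}
\quad\Longrightarrow\quad
f = \frac{1 - 1/S}{1 - 1/V}
$$

Take $S = 7.6$ and an ideal $V = 256$ (one instruction doing the work of 256 scalar ones). This gives $f = (1 - 1/7.6)/(1 - 1/256) \approx 0.8684 / 0.9961 \approx 0.872$. Under this idealized model, at least about 87% of the original scalar runtime would have to be vectorized. If the vectorized part ran less than 256 times faster, for instance because of memory bandwidth limits, the required fraction would be higher still.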