Ocelot3: Full Vector “V” Extension for BOOM
Abstract
Ocelot3 is the latest iteration of the open-source project that adds vector support to the BOOM RISC-V core, and it is fully compatible with the RVV 1.0 specification. This generation uses a decoupled Vector Processing Unit (VPU) connected through the Open Vector Interface (OVI), a choice intended to make community collaboration easier. The primary advance over Ocelot2 is the implementation of segmented vector memory access instructions, which require the data to be transposed as it is accessed.
Report
Key Highlights
- Full RVV 1.0 Support: Ocelot3 achieves complete compliance with the RISC-V Vector extension (RVV) version 1.0 standard.
- BOOM Core Integration: The project successfully adds vector capabilities to the open-source, high-performance BOOM (Berkeley Out-of-Order Machine) core.
- Decoupled Architecture: The design utilizes a decoupled Vector Processing Unit (VPU).
- Segmented Memory Access: A major update over Ocelot2 is the support for complex segmented vector memory access instructions.
- Open Interface: The VPU connects through the Open Vector Interface, promoting modularity and community development.
Technical Details
- Target Core: BOOM (Berkeley Out-of-Order Machine).
- Vector Standard: RVV 1.0 (Full support).
- VPU Connection: Utilizes the Open Vector Interface (OVI).
- Implementation Challenge: Segmented vector memory access instructions were difficult to implement because the data must be transposed during the access: fields that sit next to each other in memory end up in separate vector registers.
- Affiliations: Developed by Kishore Senthil Kumar and Kuan-Yu Chen, associated with Tenstorrent and the University of Michigan.
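The transposition mentioned under Implementation Challenge can be illustrated with a small software model of a unit-stride segment load. This is only a sketch of the instruction's architectural effect under simplifying assumptions (memory indexed in elements rather than bytes, one register per field, no masking); the function name `segment_load` and its parameters are hypothetical and say nothing about how Ocelot3 implements the operation in hardware.

```python
def segment_load(memory, base, nf, vl):
    """Software model of a unit-stride segment load (assumed simplification).

    memory: flat list of elements, with the nf fields of each segment
            stored next to each other.
    base:   element index of the first segment.
    nf:     number of fields per segment.
    vl:     number of segments (vector length) to load.
    Returns nf lists of vl elements, one list per destination register.
    """
    regs = [[0] * vl for _ in range(nf)]
    for i in range(vl):        # segment index
        for f in range(nf):    # field index within the segment
            # Field f of segment i lands in element i of register f,
            # so the memory layout is transposed on the way in.
            regs[f][i] = memory[base + i * nf + f]
    return regs


# Four segments of three fields each, interleaved in memory.
memory = [10, 11, 12, 20, 21, 22, 30, 31, 32, 40, 41, 42]
regs = segment_load(memory, base=0, nf=3, vl=4)
print(regs)  # [[10, 20, 30, 40], [11, 21, 31, 41], [12, 22, 32, 42]]
```

Memory is laid out segment by segment, while the registers are filled field by field, which is why the hardware has to reorder the data between the memory interface and the vector register file.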
Implications
- Accelerated Open-Source Performance: By providing full RVV 1.0 compliance on the BOOM core, Ocelot3 raises the performance potential of open-source RISC-V hardware, especially for data-parallel workloads such as AI/ML and scientific computing.
- Validation of RVV Standard: The successful, open-source implementation of all features, including challenging components like segmented loads/stores, helps validate the maturity and feasibility of the RVV 1.0 specification.
- Standardized Modularity: The use of the Open Vector Interface encourages broader ecosystem participation by standardizing how external vector units integrate with base RISC-V cores, fostering greater choice and innovation in VPU designs.
- Handling Complex Data: Segmented memory access lets the processor handle non-contiguous or structured data layouts, such as arrays of structures with interleaved fields, efficiently, which is critical for real-world application performance.
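As a rough illustration of why structured layouts matter, the sketch below splits interleaved RGB pixels into one list per channel, operates on whole channels at once, and then re-interleaves them. The pixel data and the helper names `deinterleave` and `interleave` are assumptions made for this example; they stand in for the effect of a segment load and a segment store and are not taken from the Ocelot3 design.

```python
def deinterleave(data, nf):
    """Split interleaved data into nf per-field lists (segment-load effect)."""
    return [data[f::nf] for f in range(nf)]


def interleave(fields):
    """Merge per-field lists back into interleaved order (segment-store effect)."""
    return [x for segment in zip(*fields) for x in segment]


# Three RGB pixels stored as an array of structures: R, G, B, R, G, B, ...
rgb = [255, 0, 0, 0, 255, 0, 0, 0, 255]

r, g, b = deinterleave(rgb, 3)

# With one channel per "register", the per-pixel average becomes a plain
# element-wise operation over whole vectors.
gray = [(ri + gi + bi) // 3 for ri, gi, bi in zip(r, g, b)]

# Storing the channels back restores the original interleaved layout.
roundtrip = interleave([r, g, b])
```

Without segment instructions, the same separation would need a strided access per field or extra shuffle steps after a contiguous load.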