Octopus: A Heterogeneous In-network Computing Accelerator Enabling Deep Learning for the Network
Abstract
The Octopus accelerator addresses the challenge of deploying Deep Learning (DL) models directly onto programmable in-network computing devices, which typically lack the necessary processing power and generality. It employs a heterogeneous architecture combining a dedicated feature extractor with a vector accelerator and a systolic array that work collaboratively, all governed by a RISC-V core. Validated on an FPGA, the design achieves 31 Mpkt/s feature extraction, 207 ns packet-based computing latency, and 90 kflow/s flow-based computing throughput.
Report
Key Highlights
- Novel Architecture: Proposes Octopus, a specialized heterogeneous in-network computing accelerator designed specifically to run Deep Learning (DL) models on the network data plane.
- Overcomes Key Limitations: Tackles the constraints in computing power, task granularity, and model generality that hold back existing in-network devices.
- High Throughput: Delivers 31 Mpkt/s feature extraction and 207 ns packet-based computing latency.
- Heterogeneous Collaboration: The core computing power comes from a Vector Accelerator and a Systolic Array working in concert, covering both low-latency packet-based tasks and high-throughput flow-based tasks.
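To make the division of labor concrete, here is a minimal sketch of how a controller might route work between the two units by task granularity. The dispatch policy and unit names are illustrative assumptions for this summary, not Octopus's documented scheduler.

```python
# Illustrative assumption: latency-sensitive packet-based work goes to
# the vector unit, throughput-oriented flow-based work to the systolic
# array. This is NOT Octopus's documented dispatch logic.

def dispatch(task_kind: str) -> str:
    """Route a task to the compute unit suited to its granularity."""
    if task_kind == "packet":   # per-packet inference: minimize latency
        return "vector_accelerator"
    if task_kind == "flow":     # per-flow inference: maximize throughput
        return "systolic_array"
    raise ValueError(f"unknown task kind: {task_kind}")

print(dispatch("packet"))  # vector_accelerator
print(dispatch("flow"))    # systolic_array
```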
Technical Details
- Core Architecture: Heterogeneous In-Network Computing (INC) design.
- Primary Components:
- Feature Extractor: Designed for fast and efficient initial data processing.
- Vector Accelerator & Systolic Array: Work collaboratively to provide general, low-latency/high-throughput computing for packet-and-flow-based tasks.
- RISC-V Core: Provides global control.
- Memory: Utilizes an on-chip memory fabric for storage and connectivity.
- Implementation Platform: The Octopus accelerator design was implemented and validated on an FPGA.
- Measured Performance:
- Feature Extraction Throughput: 31 Mpkt/s
- Packet-Based Computing Latency: 207 ns
- Flow-Based Computing Throughput: 90 kflow/s
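Systolic arrays like the one in Octopus typically accelerate the matrix operations at the heart of DL inference. The following is a hedged functional model of a weight-stationary systolic array computing a matrix-vector product; the dataflow and dimensions are generic assumptions about such arrays, not details taken from the Octopus design.

```python
# Functional model of a weight-stationary systolic array (illustrative
# only): each processing element (PE) holds one weight, multiplies the
# activation streaming through it, and passes the partial sum down its
# column. We model the end result of that dataflow, not its timing.

def systolic_matvec(weights, x):
    """Compute y = W^T x as an N x N weight-stationary array would."""
    n = len(weights)
    out = [0] * n
    for col in range(n):          # one output accumulates per PE column
        acc = 0
        for row in range(n):      # partial sums cascade down the column
            acc += weights[row][col] * x[row]
        out[col] = acc
    return out

W = [[1, 2],
     [3, 4]]
x = [10, 20]
print(systolic_matvec(W, x))  # [1*10 + 3*20, 2*10 + 4*20] = [70, 100]
```

In a real array the inner loop is fully unrolled in hardware, which is what yields the high-throughput, fixed-latency behavior the report attributes to flow-based computing.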
Implications
- Advancing Intelligent Networks: Octopus enables deployment of complex DL models (for tasks like traffic classification or intrusion detection) directly within the high-speed data plane, reducing reliance on external CPUs or servers.
- Validation of RISC-V in INC: The explicit use of a RISC-V core for global control reinforces its growing role as a flexible, open ISA for controlling highly specialized, domain-specific heterogeneous accelerators in networking hardware.
- Accelerating Data Plane Customization: This work validates the approach of combining customized processing units (Feature Extractors, Systolic Arrays) with a standard, open control processor (RISC-V) to achieve specialized application performance previously unattainable in general-purpose network processors.
- Future Hardware Trend: Octopus contributes to the paradigm shift toward integrating advanced AI inference capabilities directly into network switches and SmartNICs.
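For intuition on what a data-plane feature extractor feeds into a DL model for tasks like traffic classification, here is a small sketch of per-packet feature extraction. The field names and feature set (protocol, ports, length, inter-arrival time) are common choices in the literature and are assumptions for illustration, not the features Octopus actually extracts.

```python
# Hedged sketch: typical per-packet features for DL-based traffic
# classification. The Packet fields and chosen features are
# illustrative assumptions, not Octopus's documented feature set.
from dataclasses import dataclass

@dataclass
class Packet:
    src_ip: int
    dst_ip: int
    src_port: int
    dst_port: int
    proto: int      # IP protocol number (6 = TCP)
    length: int     # packet length in bytes
    ts_ns: int      # arrival timestamp in nanoseconds

def extract_features(pkt: Packet, prev_ts_ns: int) -> list:
    """Map one packet to a fixed-length numeric feature vector."""
    iat = pkt.ts_ns - prev_ts_ns  # inter-arrival time vs. previous packet
    return [pkt.proto, pkt.src_port, pkt.dst_port, pkt.length, iat]

p = Packet(0x0A000001, 0x0A000002, 443, 51000, 6, 1500, 2_000)
print(extract_features(p, 1_000))  # [6, 443, 51000, 1500, 1000]
```

A hardware feature extractor performs this mapping at line rate, producing the fixed-length vectors that the downstream vector accelerator and systolic array consume.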