TCN-CUTIE: A 1036 TOp/s/W, 2.72 uJ/Inference, 12.2 mW All-Digital Ternary Accelerator in 22 nm FDX Technology

Abstract

TCN-CUTIE is an all-digital ternary neural network (TNN) accelerator implemented in 22 nm FDX technology and integrated into a RISC-V SoC, designed for stringent TinyML constraints. It achieves a record peak energy efficiency of 1036 TOp/s/W, outperforming prior silicon-proven quantized accelerators by 1.67x. The accelerator supports both standard ternary convolutional networks and time-dilated temporal convolutional networks (TCNs), combining ultra-low energy consumption (2.72 uJ/Inference) with competitive accuracy (86% on CIFAR-10).

Report

TCN-CUTIE: Ternary Accelerator Analysis

Key Highlights

  • Peak Energy Efficiency: Achieves a record 1036 TOp/s/W.
  • Efficiency Improvement: Outperforms prior state-of-the-art, silicon-proven quantized TinyML accelerators by 1.67x.
  • Ultra-Low Power/Energy: Operates at 12.2 mW power consumption (at 0.5 V) and achieves 2.72 uJ/Inference for a 9-layer CNN.
  • Technology Node: Fabricated in 22 nm FDX (FD-SOI) technology.
  • Flexibility: Supports both standard Ternary Convolutional Neural Networks (TNNs) and time-dilated Temporal Convolutional Neural Networks (TCNs).
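To put the headline figure in perspective, TOp/s/W is equivalently operations per joule, so the peak efficiency translates directly into energy per ternary operation. This is a back-of-the-envelope conversion, not a figure reported in the source:

```python
# Back-of-the-envelope conversion of the reported peak efficiency.
# 1 TOp/s/W = 1e12 operations per second per watt, which is the same
# as 1e12 operations per joule; energy per operation is the reciprocal.

PEAK_EFFICIENCY_OPS_PER_JOULE = 1036e12   # 1036 TOp/s/W

energy_per_op_joules = 1.0 / PEAK_EFFICIENCY_OPS_PER_JOULE
energy_per_op_femtojoules = energy_per_op_joules * 1e15

print(f"{energy_per_op_femtojoules:.2f} fJ per ternary op")  # ~0.97 fJ
```

At under a femtojoule per operation, the 2.72 uJ inference budget leaves room for networks with billions of ternary operations.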

Technical Details

  • Architecture: All-Digital Ternary Neural Network (TNN) accelerator, ensuring robustness and simplicity.
  • Integration: Designed as a flexible accelerator block within a RISC-V-based System-on-Chip (SoC).
  • Ternary Operation: Utilizes ternary quantization (weights and/or activations limited to {-1, 0, +1}) to drastically reduce computational complexity and memory bandwidth.
  • TCN Support: Includes specialized extensions to handle temporal processing required by time-dilated TCNs, making it suitable for sequential and event-based data.
  • Benchmark Performance Metrics (CIFAR-10): 3200 Inferences/sec, 12.2 mW, 2.72 uJ/Inference, with 86% accuracy.
  • Benchmark Performance Metrics (DVS TCN): 8000 Inferences/sec, 12.2 mW, 5.5 uJ/Inference, with 94.5% accuracy.
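The ternary arithmetic described above can be sketched in a few lines of Python. The threshold, kernel, and dilation values below are illustrative assumptions for the sketch, not parameters taken from the TCN-CUTIE design:

```python
# Sketch of threshold-based ternary quantization and a dilated 1-D
# ternary convolution, illustrating the arithmetic a TNN/TCN
# accelerator performs. All constants here are illustrative.

def ternarize(values, threshold=0.05):
    """Map real values to {-1, 0, +1} by comparing against a threshold."""
    return [0 if abs(v) < threshold else (1 if v > 0 else -1) for v in values]

def dilated_ternary_conv1d(x, w, dilation=1):
    """1-D convolution with ternary weights; taps spaced `dilation` apart.

    With weights in {-1, 0, +1}, every multiply collapses to an add,
    a subtract, or a skip -- the source of TNN energy savings.
    """
    k = len(w)
    span = (k - 1) * dilation + 1          # receptive field of the kernel
    out = []
    for i in range(len(x) - span + 1):
        acc = 0
        for j, wj in enumerate(w):
            if wj == 1:
                acc += x[i + j * dilation]
            elif wj == -1:
                acc -= x[i + j * dilation]
        out.append(acc)
    return out

weights = ternarize([0.8, -0.02, -0.6])    # -> [1, 0, -1]
signal = [1, 2, 3, 4, 5, 6]
print(dilated_ternary_conv1d(signal, weights, dilation=2))  # -> [-4, -4]
```

Increasing the dilation widens the receptive field without adding weights, which is how time-dilated TCN layers cover long temporal contexts cheaply.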

Implications

  • Advancing TinyML: TCN-CUTIE sets a new benchmark for energy efficiency in TinyML hardware. It shows that uJ-per-inference budgets are achievable even with non-trivial neural networks, directly enabling highly resource-constrained edge devices.
  • RISC-V Ecosystem Enhancement: Its integration within a RISC-V SoC demonstrates the platform's versatility and suitability for hosting highly specialized, power-efficient AI accelerators, further cementing RISC-V's role in the embedded and edge AI market.
  • Quantization Validation: The design validates the benefits of extreme (ternary) quantization in VLSI implementations, showing significant power reduction without a severe accuracy penalty on key tasks such as image classification (CIFAR-10) and event-based sensing (DVS).
  • FDX Technology Adoption: Utilization of 22 nm FDX technology highlights its strength in achieving extremely low operating voltages (0.5 V) and maximizing energy efficiency critical for battery-powered, always-on applications.
