Assessing Tenstorrent's RISC-V MatMul Acceleration Capabilities

Abstract

This paper evaluates the performance and energy efficiency of the Tenstorrent Grayskull e75 RISC-V accelerator on the basic MatMul operations that underpin Large Language Models. The authors characterize the accelerator in detail and compare its throughput against state-of-the-art architectures, including Intel Sapphire Rapids processors and NVIDIA V100 and A100 GPUs. While NVIDIA retains the lead in raw performance, Grayskull offers a competitive trade-off between throughput and power, reaching a peak of 1.55 TFLOPs/Watt at BF16 precision.

Report

Key Highlights

  • Hardware Focus: The study specifically assesses the performance of the Tenstorrent Grayskull e75 RISC-V accelerator.
  • Efficiency Metric: Grayskull reached a peak energy efficiency of 1.55 TFLOPs/Watt with BF16 reduced numerical precision.
  • AI Workload: The evaluation focused on basic linear algebra kernels (MatMul), foundational operations for modern Large Language Models (LLMs).
  • Competitive Standing: While NVIDIA GPUs (V100 and A100) demonstrated superior raw performance, Grayskull provided a compelling trade-off between computational throughput and power consumption.
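
The efficiency metric above can be made concrete with a little arithmetic. The sketch below assumes the usual convention of counting 2·M·N·K floating-point operations for an M×K by K×N matrix product, then divides sustained throughput by average power draw. The function names and the sample inputs are illustrative assumptions and are not taken from the study.

```python
def matmul_flops(m: int, n: int, k: int) -> int:
    """FLOPs for C[m, n] = A[m, k] @ B[k, n].

    Each of the m*n outputs needs k multiplies and k adds,
    so the conventional count is 2 * m * n * k.
    """
    return 2 * m * n * k


def tflops_per_watt(m: int, n: int, k: int, seconds: float, watts: float) -> float:
    """Energy efficiency of one MatMul run, in TFLOPs per Watt.

    `seconds` is the measured kernel time and `watts` the average
    power draw over that interval (both supplied by the caller).
    """
    if seconds <= 0 or watts <= 0:
        raise ValueError("seconds and watts must be positive")
    tflops = matmul_flops(m, n, k) / seconds / 1e12
    return tflops / watts


if __name__ == "__main__":
    # Hypothetical inputs for illustration only, not measured values.
    print(tflops_per_watt(m=1024, n=1024, k=1024, seconds=1e-3, watts=50.0))
```

Because the metric is a ratio, a device with lower absolute throughput can still come out ahead if its power draw is proportionally lower, which is the trade-off the highlights describe.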

Technical Details

  • Accelerator: Tenstorrent Grayskull e75 (RISC-V-based).
  • Test Kernels: Basic linear algebra kernels, focusing on MatMul (Matrix Multiplication).
  • Precision: The evaluation used reduced numerical precision; peak efficiency was achieved with BF16 (bfloat16).
  • Comparison Benchmarks: Performance was measured against current industry-standard tensor accelerators, including Intel Sapphire Rapids processors and NVIDIA V100 and A100 GPUs.
  • Characterization Parameters: The analysis included a detailed examination of Grayskull’s execution model, grid size, matrix dimensions, data formats, and the impact of numerical precision on efficiency.
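
To illustrate what reduced precision means for BF16, the sketch below converts a float32 value to bfloat16 by keeping its top 16 bits (1 sign bit, 8 exponent bits, 7 mantissa bits) with round-to-nearest-even. This is a generic software model of the format, assuming finite inputs. It is not a description of how Grayskull rounds in hardware, and the helper names are hypothetical.

```python
import struct


def to_bf16_bits(x: float) -> int:
    """Return the 16-bit bfloat16 pattern for a finite float.

    The value is first stored as IEEE 754 float32, then the low
    16 mantissa bits are dropped using round-to-nearest-even.
    NaN handling is intentionally omitted in this sketch.
    """
    bits32 = struct.unpack("<I", struct.pack("<f", x))[0]
    # Adding 0x7FFF plus the lowest kept bit implements ties-to-even.
    rounding_bias = 0x7FFF + ((bits32 >> 16) & 1)
    return ((bits32 + rounding_bias) >> 16) & 0xFFFF


def from_bf16_bits(bits16: int) -> float:
    """Expand a bfloat16 bit pattern back to a Python float."""
    return struct.unpack("<f", struct.pack("<I", (bits16 & 0xFFFF) << 16))[0]


def bf16_round(x: float) -> float:
    """Round a finite float to the nearest bfloat16-representable value."""
    return from_bf16_bits(to_bf16_bits(x))


if __name__ == "__main__":
    for value in (1.0, 3.14159, 0.001):
        print(value, "->", bf16_round(value))
```

With only 7 explicit mantissa bits, bfloat16 carries roughly two to three significant decimal digits while keeping the float32 exponent range, which is why precision choice is one of the parameters worth characterizing.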

Implications

  • Validation of RISC-V in AI: The findings indicate that RISC-V-based architectures such as Grayskull can offer competitive power-performance ratios for energy-conscious AI and LLM deployments.
  • Alternative Compute Viability: Although Grayskull does not match the raw throughput of top-tier GPUs, its high TFLOPs/Watt figure positions Tenstorrent as a strong alternative for data centers or edge applications with constrained power budgets.
  • Ecosystem Growth: Public performance data on a major commercial RISC-V AI accelerator adds useful benchmarks and may encourage further development of, and investment in, the open-standard instruction set architecture for demanding fields such as AI/ML.