Ryzen AI · XDNA2 · VitisAI EP

Stress-test Stable Diffusion
on the AMD Ryzen AI NPU.

A Gradio-based endurance benchmark that runs Stable Diffusion natively on the AMD Ryzen AI NPU — measuring real inference throughput, stability and thermal behavior over time.

NPU stress test · not CPU · not GPU

XDNA2 NPU · ~50 TOPS · RyzenAI 1.7.1 · Windows 11 · MIT license

5 SD models · 60 scene prompts · 24 hr max endurance run · 1500 max rounds

Benchmark Models · 05

Five Stable Diffusion models.

AMD-optimized ONNX models running on XDNA2. Reference numbers were measured on an XDNA2 NPU. Download the models from AMD's Hugging Face model zoo.

SD 1.5 · LITE TIER
Folder: models/sd_15
Resolution: 512×512 · Steps: 20 · it/s: ~0.87 · Min RAM: 16 GB
Avg latency (reference): ~23 s (per 512px image · 20 steps)
Throughput: 156.7 img/hr

SD Turbo · LITE TIER
Folder: models/sd_turbo_bfp
Resolution: 512×512 · Steps: 1 · it/s: 1-step · Min RAM: 16 GB
Avg latency (reference): ~20 s (fastest · single-step model)
Throughput: 182.5 img/hr

SDXL Turbo · FULL TIER
Folder: models/sdxl_turbo_bfp
Resolution: 512×512 · Steps: 1 · it/s: 1-step · Min RAM: 28 GB
Avg latency (reference): ~27 s (XL quality · single-step)
Throughput: 134.2 img/hr

Segmind Vega · FULL TIER
Folder: models/Segmind-Vega_bfp
Resolution: 1024×1024 · Steps: 20 · it/s: ~0.60 · Min RAM: 28 GB
Avg latency (reference): ~33 s (1024px · compact SDXL)
Throughput: 107.6 img/hr

SDXL Base 1.0 · FULL TIER
Folder: models/sdxl-base-1.0_bfp
Resolution: 1024×1024 · Steps: 50 · it/s: ~0.54 · Min RAM: 28 GB
Avg latency (reference): ~93 s (heaviest · 50 steps · ~8GB weights)
Throughput: 38.6 img/hr

Note: SD Turbo and SDXL Turbo are 1-step models — total latency, not it/s, is the meaningful metric. For production use, img/hr is what matters.
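
The three reference metrics are linked by simple arithmetic. The sketch below assumes img/hr is 3600 divided by the average latency in seconds, and that it/s is the step count divided by that same latency (so it folds text encoding and VAE decoding into the figure). Both helper names are hypothetical and are not taken from the benchmark's source.

```python
def images_per_hour(avg_latency_s: float) -> float:
    """Throughput implied by an average per-image latency in seconds."""
    return 3600.0 / avg_latency_s


def iterations_per_second(steps: int, avg_latency_s: float) -> float:
    """Denoising steps per second, treating the whole latency as step time."""
    return steps / avg_latency_s


# Average latencies taken from the sample report on this page.
sd15_img_hr = round(images_per_hour(22.97), 1)            # 156.7
sd15_it_s = round(iterations_per_second(20, 22.97), 2)    # 0.87
turbo_it_s = round(iterations_per_second(1, 19.72), 2)    # 0.05

print(sd15_img_hr, sd15_it_s, turbo_it_s)
```

The last value shows why it/s says little about a 1-step model: a single step spread over roughly 20 s of total latency gives a tiny number that mostly reflects fixed per-image overhead.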

Capabilities · 04

GUI Visualization Dashboard.

A live Gradio dashboard — gallery, metrics, reports and stability analysis in one place.

2

Two test modes & Prompts

By Duration or By Rounds, with diverse scene prompts.

5 live

Live GUI Metrics

Real-time updates for images, failures, img/hr, it/s, and CPU utilization.

.txt

Reports & Auto-saving

Plain-text reports with stability analysis, auto-saved to JSON alongside checkpoints.

±5%

Stability & Safety

Monitors thermal throttling and automatically stops orphan processes on exit.
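
The ±5% figure reads like a tolerance band for latency drift. As a rough sketch only: assume the trend is the percent change in mean latency between the first and second half of a run, and that anything inside ±5% counts as STABLE. Both assumptions are mine, since the benchmark's actual method is not described on this page, and the names `latency_trend_pct`, `classify`, and the non-STABLE labels are hypothetical.

```python
from statistics import mean

STABLE_BAND_PCT = 5.0  # assumed meaning of the "±5%" tile


def latency_trend_pct(latencies_s):
    """Percent change in mean latency, second half of the run vs. first half."""
    if len(latencies_s) < 4:
        raise ValueError("need at least 4 samples to estimate a trend")
    half = len(latencies_s) // 2
    early = mean(latencies_s[:half])
    late = mean(latencies_s[half:])
    return (late - early) / early * 100.0


def classify(latencies_s, band_pct=STABLE_BAND_PCT):
    """Label a run; rising latency is treated as a sign of throttling."""
    trend = latency_trend_pct(latencies_s)
    if abs(trend) <= band_pct:
        return "STABLE"
    return "DEGRADING" if trend > 0 else "IMPROVING"


steady = classify([22.0, 22.0, 22.2, 22.2])   # about +0.9% drift
slowing = classify([20.0, 20.0, 22.0, 22.0])  # +10% drift
print(steady, slowing)
```

A sustained positive trend is the pattern you would expect from thermal throttling, which is why the sketch treats slower later runs as the failure direction.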

Self-Attention · Transformer Inference

Inside every NPU inference.

Visualizing the internal execution pipeline and data flow within the XDNA architecture.

Interactive figure: four attention-head panels (Head 0 · Q, Head 1 · K, Head 2 · V, Head 3 · QKᵀ).

Live Dashboard

Watch it generate, in real time.

Settings on the left, a live gallery and metrics on the right. Opens automatically on start.

Settings
Mode: By Duration
Duration: 1 hour
Tier: Full · 5 models
Seed: Random
Prompts: 60 · EN
RAM: 31.1 GB

Images: 87 · Failed: 0 · img/hr: 87.3 · it/s: 0.59 · CPU%: 34
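
The Mode and Duration settings map onto the two test modes, By Duration and By Rounds, with the 24 hr and 1500-round caps quoted at the top of the page. A minimal sketch of the stop condition follows. It assumes a round is a single call to a generation function and that over-limit requests are clamped to the cap; `RunLimit`, `should_stop`, and `run` are hypothetical names, not the benchmark's API.

```python
import time
from dataclasses import dataclass

MAX_DURATION_S = 24 * 3600  # "24hr max endurance run"
MAX_ROUNDS = 1500           # "1500 max rounds"


@dataclass
class RunLimit:
    mode: str     # "duration" (value in seconds) or "rounds" (value in rounds)
    value: float

    def __post_init__(self):
        if self.mode not in ("duration", "rounds"):
            raise ValueError(f"unknown mode: {self.mode!r}")
        cap = MAX_DURATION_S if self.mode == "duration" else MAX_ROUNDS
        # Assumption: out-of-range requests are clamped rather than rejected.
        self.value = max(0, min(self.value, cap))


def should_stop(limit, elapsed_s, rounds_done):
    """True once the configured duration or round count has been reached."""
    if limit.mode == "duration":
        return elapsed_s >= limit.value
    return rounds_done >= limit.value


def run(limit, generate_one, clock=time.monotonic):
    """Call generate_one(round_index) until the limit is hit; return rounds done."""
    start = clock()
    rounds_done = 0
    while not should_stop(limit, clock() - start, rounds_done):
        generate_one(rounds_done)
        rounds_done += 1
    return rounds_done


completed = run(RunLimit("rounds", 3), lambda i: None)
print(completed)  # 3
```

Passing the clock in as a parameter keeps the loop testable without waiting in real time.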
Performance Report

Every run, on the record.

Auto-saved to endurance_results/report_TIMESTAMP.txt — per-model breakdown plus stability trends.

report_20260603_204512.txt
============================================================
  NPU SD ENDURANCE TEST — PERFORMANCE REPORT
============================================================
  Generated   : 2026-06-03  20:45:12
  Duration    : 01:00:00          Mode : Duration
  Status      : COMPLETE
  Total runs  : 87  (OK: 87  FAIL: 0)   Success: 100.0%

  SYSTEM
  RAM   : 31.1 GB  (avail 14.8 GB)        Tier : FULL

  PER-MODEL RESULTS
  ──────────────────────────────────────────────────────
  Model           N   Avg(s)   Min    Max    Std    img/hr
  SD 1.5         20    22.97  22.10  24.50   0.62   156.7
  SD Turbo       20    19.72  18.90  21.30   0.58   182.5
  SDXL Turbo     17    26.82  25.50  28.10   0.72   134.2
  Segmind Vega   17    33.46  32.00  35.20   0.84   107.6
  SDXL Base      13    93.19  91.00  96.50   1.43    38.6
  ──────────────────────────────────────────────────────

  STABILITY ANALYSIS
  SD 1.5         : STABLE     trend +0.3%
  SD Turbo       : STABLE     trend +1.1%
  SDXL Turbo     : STABLE     trend -0.8%
  Segmind Vega   : STABLE     trend +0.5%
  SDXL Base      : STABLE     trend +1.2%

  Overall throughput : 87.3 images/hr
============================================================
  Design by HIKO1999GenAI
============================================================
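
The per-model columns and the report filename can be reproduced with the standard library. This is a sketch under stated assumptions: Std is taken to be the sample standard deviation (the report does not say which form it uses), and the timestamp format is inferred from the sample filename report_20260603_204512.txt. `summarize` and `report_path` are hypothetical helpers.

```python
from datetime import datetime
from pathlib import Path
from statistics import mean, stdev


def summarize(latencies_s):
    """Compute the N / Avg / Min / Max / Std / img/hr columns for one model."""
    avg = mean(latencies_s)
    return {
        "n": len(latencies_s),
        "avg": avg,
        "min": min(latencies_s),
        "max": max(latencies_s),
        # Assumption: sample standard deviation; undefined for a single run.
        "std": stdev(latencies_s) if len(latencies_s) > 1 else 0.0,
        "img_hr": 3600.0 / avg,
    }


def report_path(now, out_dir="endurance_results"):
    """Build endurance_results/report_YYYYMMDD_HHMMSS.txt for a given time."""
    return Path(out_dir) / f"report_{now:%Y%m%d_%H%M%S}.txt"


stats = summarize([20.0, 22.0, 24.0])  # made-up latencies, not measured data
path = report_path(datetime(2026, 6, 3, 20, 45, 12))
print(stats["avg"], stats["std"], round(stats["img_hr"], 1), path.name)
```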
Prerequisites

System Environment Preparation

Hardware

AMD Ryzen AI processor with an XDNA2 NPU.

Minimum 16 GB RAM. 32 GB recommended for Full tier with all 5 models.

Software

AMD RyzenAI 1.7.1 — sets up the conda env (xdna171) and all DLLs.

RyzenAI 1.7.1 is required. Versions 1.6 and 1.5 use incompatible model formats.

Models

AMD-optimized ONNX model folders, separate from standard Hugging Face checkpoints.

Auto-detected from the models/ folder; you don't need all 5.
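
Auto-detection could work along these lines. The folder names and minimum RAM figures come from the model cards on this page; the lookup table, the `detect_models` function, and the rule that a model is skipped when total RAM is below its minimum are assumptions for illustration, not the benchmark's actual logic.

```python
import tempfile
from pathlib import Path

# folder name -> (display name, minimum RAM in GB), as listed on this page
KNOWN_MODELS = {
    "sd_15": ("SD 1.5", 16),
    "sd_turbo_bfp": ("SD Turbo", 16),
    "sdxl_turbo_bfp": ("SDXL Turbo", 28),
    "Segmind-Vega_bfp": ("Segmind Vega", 28),
    "sdxl-base-1.0_bfp": ("SDXL Base 1.0", 28),
}


def detect_models(models_dir, total_ram_gb):
    """Return display names of models that are present and fit in RAM."""
    root = Path(models_dir)
    found = []
    for folder, (name, min_ram_gb) in KNOWN_MODELS.items():
        if (root / folder).is_dir() and total_ram_gb >= min_ram_gb:
            found.append(name)
    return found


# Demo with a throwaway folder that holds only two of the five models.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "sd_15").mkdir()
    (Path(tmp) / "sdxl_turbo_bfp").mkdir()
    lite = detect_models(tmp, total_ram_gb=16)
    full = detect_models(tmp, total_ram_gb=31.1)

print(lite)  # ['SD 1.5']
print(full)  # ['SD 1.5', 'SDXL Turbo']
```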

Get Started · 03 steps

Up and running in three steps.

STEP 01

Clone & place models

Clone the repo and drop each ONNX model folder into models/.

git clone <repo>
cd npu-benchmark-amd
# models/sd_15, sd_turbo_bfp, ...
STEP 02

Verify RyzenAI

Activate the env and confirm the NPU is present.

conda activate xdna171
xrt-smi.exe examine
# NPU Krackan → present
STEP 03

Run the benchmark

Set DD_ROOT and launch — the dashboard opens automatically.

set DD_ROOT=...\GenAI-SD
python npu_sd_endurance_gradio.py
# → 127.0.0.1:7862
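
Before launching, a small preflight check can catch the two most likely setup mistakes: a missing DD_ROOT and a missing models/ folder. This is a hypothetical helper, not part of the repository, and it only checks that the paths exist. It does not verify the NPU or the RyzenAI install.

```python
import os
from pathlib import Path


def preflight(env=None, models_dir="models"):
    """Return a list of human-readable setup problems (empty if none found)."""
    env = os.environ if env is None else env
    problems = []
    dd_root = env.get("DD_ROOT")
    if not dd_root:
        problems.append("DD_ROOT is not set")
    elif not Path(dd_root).is_dir():
        problems.append(f"DD_ROOT does not point to a folder: {dd_root}")
    if not Path(models_dir).is_dir():
        problems.append(f"models folder not found: {models_dir}")
    return problems


for problem in preflight():
    print("setup problem:", problem)
```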