Stress-test Stable Diffusion
on the AMD Ryzen AI NPU.
A Gradio-based endurance benchmark that runs Stable Diffusion natively on the AMD Ryzen AI NPU — measuring real inference throughput, stability and thermal behavior over time.
NPU stress test: not CPU, not GPU.
Five Stable Diffusion models.
AMD-optimized ONNX models running on XDNA2. Reference numbers were measured on an XDNA2-architecture NPU. Download the models from AMD's Hugging Face model zoo.
Note: SD Turbo and SDXL Turbo are 1-step models, so total latency per image, not iterations per second (it/s), is the meaningful metric. For production use, images per hour (img/hr) is what matters.
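The relationship between latency and img/hr is simple arithmetic: one hour divided by the average seconds per image. The sketch below is illustrative only; the function name is hypothetical and not taken from the benchmark's code.

```python
def images_per_hour(avg_latency_s: float) -> float:
    """Convert average end-to-end seconds per image into images per hour."""
    if avg_latency_s <= 0:
        raise ValueError("latency must be positive")
    return 3600.0 / avg_latency_s


if __name__ == "__main__":
    # Average latencies taken from the sample report on this page.
    for name, avg_s in [("SD 1.5", 22.97), ("SDXL Base", 93.19)]:
        print(f"{name}: {images_per_hour(avg_s):.1f} img/hr")
```

For a 1-step model the whole run is dominated by fixed costs such as text encoding and VAE decoding, which is why a per-image figure is more informative than a per-step one.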
GUI Visualization Dashboard.
A live Gradio dashboard — gallery, metrics, reports and stability analysis in one place.
Two Test Modes & Prompts
By Duration or By Rounds, with diverse scene prompts.
Live GUI Metrics
Real-time updates for images, failures, img/hr, it/s, and CPU utilization.
Reports & Auto-saving
Plain-text reports with stability analysis, auto-saved to JSON alongside checkpoints.
Stability & Safety
Monitors thermal throttling and automatically stops orphan processes on exit.
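As a rough illustration of the two test modes, a run loop can stop either after a time budget or after a fixed number of rounds. This is a minimal sketch under stated assumptions: `run_endurance`, `generate` and the injectable `clock` are hypothetical names, and the tool's real loop may define a round differently (for example, one pass over all models).

```python
import time
from typing import Callable, Optional


def run_endurance(
    generate: Callable[[], None],
    duration_s: Optional[float] = None,
    rounds: Optional[int] = None,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Call generate() repeatedly until the time budget or round count is reached."""
    if (duration_s is None) == (rounds is None):
        raise ValueError("set exactly one of duration_s or rounds")
    start = clock()
    ok = fail = 0
    while True:
        if rounds is not None and ok + fail >= rounds:
            break
        if duration_s is not None and clock() - start >= duration_s:
            break
        try:
            generate()
            ok += 1
        except Exception:
            # A failed generation is counted, not fatal, so the test keeps running.
            fail += 1
    return {"ok": ok, "fail": fail, "elapsed_s": clock() - start}
```

Passing the clock as a parameter keeps the loop easy to test without waiting for real time to pass.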
Inside every NPU inference.
Visualizing the internal execution pipeline and data flow within the XDNA architecture.
Watch it generate, in real time.
Settings on the left, a live gallery and metrics on the right. Opens automatically on start.
Every run, on the record.
Auto-saved to endurance_results/report_TIMESTAMP.txt — per-model breakdown plus stability trends.
============================================================
NPU SD ENDURANCE TEST — PERFORMANCE REPORT
============================================================
Generated  : 2026-06-03 20:45:12
Duration   : 01:00:00
Mode       : Duration
Status     : COMPLETE
Total runs : 87 (OK: 87 FAIL: 0) Success: 100.0%

SYSTEM
RAM  : 31.1 GB (avail 14.8 GB)
Tier : FULL

PER-MODEL RESULTS
──────────────────────────────────────────────────────
Model           N   Avg(s)    Min     Max    Std   img/hr
SD 1.5         20    22.97   22.10   24.50   0.62   156.7
SD Turbo       20    19.72   18.90   21.30   0.58   182.5
SDXL Turbo     17    26.82   25.50   28.10   0.72   134.2
Segmind Vega   17    33.46   32.00   35.20   0.84   107.6
SDXL Base      13    93.19   91.00   96.50   1.43    38.6
──────────────────────────────────────────────────────

STABILITY ANALYSIS
SD 1.5       : STABLE trend +0.3%
SD Turbo     : STABLE trend +1.1%
SDXL Turbo   : STABLE trend -0.8%
Segmind Vega : STABLE trend +0.5%
SDXL Base    : STABLE trend +1.2%

Overall throughput : 87.3 images/hr
============================================================
Design by HIKO1999GenAI
============================================================
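The report does not say how its per-model columns or the trend percentage are computed. The sketch below shows one plausible way to derive them from a list of per-image latencies. Everything here is an assumption for illustration: the function names, the use of population standard deviation, the first-half versus second-half trend definition, the 5% threshold and the DRIFTING label are not taken from the tool.

```python
from statistics import mean, pstdev
from typing import Dict, Sequence


def summarize(latencies_s: Sequence[float]) -> Dict[str, float]:
    """Per-model summary in the spirit of the report's result columns."""
    avg = mean(latencies_s)
    return {
        "n": len(latencies_s),
        "avg": avg,
        "min": min(latencies_s),
        "max": max(latencies_s),
        "std": pstdev(latencies_s),  # assumption: population std dev
        "img_per_hr": 3600.0 / avg,
    }


def trend_percent(latencies_s: Sequence[float]) -> float:
    """Percent change in mean latency from the first half of a run to the second."""
    half = len(latencies_s) // 2
    if half == 0:
        raise ValueError("need at least two samples")
    first = mean(latencies_s[:half])
    second = mean(latencies_s[-half:])
    return (second - first) / first * 100.0


def classify(trend: float, threshold: float = 5.0) -> str:
    """Label a run; the threshold and the DRIFTING label are illustrative."""
    return "STABLE" if abs(trend) <= threshold else "DRIFTING"
```

Under this definition a positive trend means images got slower over the run, which is the direction thermal throttling would push it.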
System Environment Preparation
Hardware
AMD Ryzen AI processor with an XDNA2 NPU.
Minimum 16 GB RAM. 32 GB recommended for Full tier with all 5 models.
Software
AMD RyzenAI 1.7.1 — sets up the conda env (xdna171) and all DLLs.
Version 1.7.1 is required; versions 1.6 and 1.5 use incompatible model formats.
Models
AMD-optimized ONNX model folders, separate from standard Hugging Face checkpoints.
Auto-detected from the models/ folder; you don't need all 5.
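Auto-detection of this kind can be as simple as scanning models/ for subfolders that contain ONNX files. The following is a minimal sketch, not the tool's actual logic: `detect_models` is a hypothetical name, and the rule "a folder counts as a model if it contains at least one .onnx file" is an assumption.

```python
from pathlib import Path
from typing import List


def detect_models(models_dir: str) -> List[str]:
    """Return names of subfolders of models_dir that contain at least one .onnx file."""
    root = Path(models_dir)
    if not root.is_dir():
        return []
    found = []
    for child in sorted(root.iterdir()):
        # rglob searches nested folders too, e.g. unet/ or vae_decoder/ subfolders.
        if child.is_dir() and any(child.rglob("*.onnx")):
            found.append(child.name)
    return found


if __name__ == "__main__":
    print(detect_models("models"))
```

A scan like this is what lets a partial install work: folders that are absent are simply not listed.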
Up and running in three steps.
Clone & place models
Clone the repo and drop each ONNX model folder into models/.
Verify RyzenAI
Activate the env and confirm the NPU is present.
Run the benchmark
Set DD_ROOT and launch — the dashboard opens automatically.
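Before launching, it can help to confirm that the environment from the steps above is actually in place. The sketch below checks the two things this page names: the active conda env (xdna171) and the DD_ROOT variable. It is an assumption that DD_ROOT should point at an existing directory, and `preflight` is a hypothetical helper, not part of the benchmark. CONDA_DEFAULT_ENV is the variable conda sets when an environment is activated.

```python
import os
from typing import List, Mapping


def preflight(env: Mapping[str, str] = os.environ,
              expected_env: str = "xdna171") -> List[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    active = env.get("CONDA_DEFAULT_ENV")
    if active != expected_env:
        problems.append(f"conda env is {active!r}, expected {expected_env!r}")
    dd_root = env.get("DD_ROOT")
    if not dd_root:
        problems.append("DD_ROOT is not set")
    elif not os.path.isdir(dd_root):  # assumption: DD_ROOT is a directory path
        problems.append(f"DD_ROOT does not point to a directory: {dd_root}")
    return problems


if __name__ == "__main__":
    issues = preflight()
    print("OK" if not issues else "\n".join(issues))
```

This check only inspects environment variables; it does not confirm that the NPU itself is present or that the driver is working.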
· The benchmark code (npu_sd_endurance_gradio.py) is released under the MIT License.
· AMD RyzenAI SDK, runtime DLLs and SD weights are subject to their respective licenses. Please support AI ecosystem toolchains from AMD and other vendors to boost AI development.
· Reference numbers were measured on an XDNA2-architecture NPU with RyzenAI 1.7.1 on Windows 11. Results vary by configuration and thermal conditions. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND.
· This is an independent community tool and is not an official AMD product. Ryzen, Ryzen AI and XDNA are trademarks of Advanced Micro Devices, Inc.