April 18, 2026
Benchmarks or it didn’t happen
My first impressions of ROCm and Strix Halo
AMD Strix Halo setup guide lands — comments demand “show the numbers”
TL;DR: A guide shows how to get AMD’s Strix Halo running AI workloads with ROCm and llama.cpp, but it shares steps without performance data. Commenters split three ways: “thanks, it works,” “give us benchmarks,” and a debate over whether ROCm is even needed versus Vulkan—highlighting early growing pains for new AMD AI rigs.
A clean, no-nonsense setup guide for running AI on AMD’s new Strix Halo chip just dropped — BIOS update, a couple of tweaks, ROCm (AMD’s GPU software), PyTorch, and llama.cpp for hosting a big Qwen model — and it immediately lit up the comments. The tutorial’s vibe is “here’s how I got it working,” but the community wants receipts. One top reply fires the flare: “Cool… but where are the numbers?” No screenshots, no speed tests, no memory usage — cue the classic meme: benchmarks or it didn’t happen.
That sparked a split. Pragmatists cheered the minimalist how‑to. “Perfect. No fluff,” one fan said, happy just to see PyTorch recognizing the GPU after a BIOS update fetched over Wi‑Fi (yes, the BIOS literally downloaded its own update — gamer side quest unlocked). But the performance police pushed for data: timings, throughput, anything. A hardware tweaker jumped in with tuning tips, warning that running in FP16 (half-precision) “isn’t ideal” on this chip and urging smarter quantization to save memory bandwidth — think smaller, faster models for real work.
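The quantization advice maps directly onto llama.cpp’s own tooling. A minimal sketch of what that tweak looks like in practice — the file paths and the Q4_K_M choice are illustrative assumptions, not details from the thread:

```shell
# Sketch only: re-quantize an FP16 GGUF down to a 4-bit K-quant.
# Paths and quant type are placeholders, not from the guide.
./llama-quantize ./qwen-f16.gguf ./qwen-q4_k_m.gguf Q4_K_M
```

Smaller weights mean less data pulled across the unified-memory bus per token, which is exactly the bandwidth saving the commenter was pointing at; Q4_K_M is a common size/quality trade-off, though nothing in the thread names a specific quant.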
Then came the spicy meta-take: Do we even need ROCm here? One commenter argued Strix Halo’s whole pitch was unified memory and a shift toward Vulkan, the graphics/compute standard. Translation: why wrestle ROCm when compute shaders might do? And so the thread turned into the internet’s favorite brawl: ROCm loyalists vs. Vulkan enjoyers vs. “just give me numbers.”
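The Vulkan camp isn’t arguing hypotheticals: llama.cpp ships a Vulkan backend that sidesteps ROCm entirely. A hedged build sketch, assuming the Vulkan SDK and drivers are already installed (the cmake flag is llama.cpp’s; the model path is illustrative):

```shell
# Build llama.cpp with its Vulkan backend instead of ROCm/HIP.
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release

# The server then runs on any Vulkan-capable GPU — no ROCm install required.
# (Model path is a placeholder.)
./build/bin/llama-server -m ./qwen-q4_k_m.gguf --port 8080
```

Whether that beats the ROCm path on Strix Halo is, fittingly, exactly the kind of question only benchmarks could settle.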
Key Points
- Ubuntu 24.04 LTS with official instructions was used to install and run ROCm on a Strix Halo system.
- A BIOS update was required for PyTorch to detect the GPU; reserved VRAM was set low (e.g., 512 MB) to rely on GTT shared memory.
- GRUB kernel parameters `ttm.pages_limit` and `amdgpu.gttsize` were set, with advice to leave 4–12 GB of system memory for the CPU for stability.
- PyTorch 2.11.0 with ROCm 7.2 and triton-rocm was installed via uv using the PyTorch ROCm wheel index; IPython was used to verify HIP and GPU availability.
- llama.cpp was run in a Podman container (`server-rocm`), serving a locally converted Qwen3.6 model in GGUF format; OpenCode was configured to use the local API endpoint.
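The steps above can be sketched as a shell session. Everything hardware-specific here is an assumption for illustration — the kernel-parameter values, wheel index URL, container image, and model path are not copied from the guide:

```shell
# 1. In BIOS, set reserved VRAM low (e.g., 512 MB) so GTT shared memory does the work.

# 2. GRUB kernel parameters (example values only; size the GTT so 4–12 GB
#    of system RAM stays free for the CPU). In /etc/default/grub:
#    GRUB_CMDLINE_LINUX_DEFAULT="... ttm.pages_limit=27648000 amdgpu.gttsize=108000"
sudo update-grub && sudo reboot

# 3. Install PyTorch for ROCm via uv (index URL pattern is an assumption
#    based on PyTorch's usual ROCm wheel layout):
uv pip install torch --index-url https://download.pytorch.org/whl/rocm7.2

# 4. Verify HIP sees the GPU (on ROCm builds, torch.cuda.* maps to HIP):
python -c "import torch; print(torch.version.hip, torch.cuda.is_available())"

# 5. Serve a local GGUF with llama.cpp's ROCm server container under Podman
#    (image tag and paths are illustrative):
podman run --device /dev/kfd --device /dev/dri \
  -v ./models:/models -p 8080:8080 \
  ghcr.io/ggml-org/llama.cpp:server-rocm \
  -m /models/qwen-q4_k_m.gguf --host 0.0.0.0 --port 8080
```

OpenCode (or any OpenAI-compatible client) would then point at `http://localhost:8080` as its API endpoint.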