December 16, 2025
PCIe or it didn’t happen
Show HN: Deterministic PCIe Diagnostics for GPUs on Linux
A no‑BS GPU lie detector drops; crowd cheers, “NVIDIA only?”
TLDR: A new Linux tool tells you if your GPU’s connection to the computer is running at full speed using only hard data, then labels it OK, DEGRADED, or UNDERPERFORMING. The community loves the proof‑based verdicts but debates NVIDIA‑only support and asks for extras like memory checks.
Hacker News just got a new toy: a command‑line “truth meter” for your GPU’s PCIe link, and the crowd came in hot. The tool checks whether your graphics card is talking to your computer at full speed, using only hard numbers (link speed, lane count, copy rates, and hardware counters), and spits out a dramatic verdict: OK, DEGRADED, or UNDERPERFORMING. No tweaks, no magic; just receipts. Fans love the no‑nonsense vibe: “Finally, proof my riser cable isn’t cursed.” Skeptics immediately asked the obvious: is this NVIDIA‑only? One commenter’s side‑eye said it all: if it needs CUDA (NVIDIA’s toolkit) and NVML (NVIDIA’s monitoring library), where does that leave AMD and Intel?
Then came the feature wishlist. Someone asked for bad memory block checks (translation: can it sniff flaky VRAM?), while practical folks begged for cross‑vendor support. The jokes flew fast: “My PCIe link negotiated x1—aka drinking a milkshake through a coffee straw.” Others loved the clean verdicts, calling them the “scarlet letters” of PC building—proof when BIOS updates sneak your GPU from x16 down to x8. Purists clapped back at anyone wanting auto‑fixes: this is a diagnostic, not a wrench. Cue the drama: team “observability is king” vs. team “fix my stuff, now.” Either way, it’s the rare tool that turns invisible bottlenecks into visible truth—and yes, it’s Linux‑first, NVIDIA‑powered, and unapologetically data‑driven.
Key Points
- A deterministic Linux tool validates GPU PCIe link health using only observable hardware data.
- It measures link gen/width via NVML, peak Host↔Device bandwidth via CUDA memcpy, and sustained utilization via NVML TX/RX counters.
- Rule-based verdicts (OK, DEGRADED, UNDERPERFORMING) are derived from measured link state and throughput relative to theoretical payload bandwidth.
- Requirements include NVIDIA GPU, CUDA Toolkit, and NVML; tested on Ubuntu 24.04.3 LTS and may need elevated privileges.
- Features include CSV/JSON logging, configurable telemetry duration for stability, and optional AER integrity checks via Linux sysfs.
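
For the curious, here is a rough Python sketch of how rule-based verdicts like these could be derived. It is not the tool's actual code. The per-lane rates and line encodings are standard PCIe figures, but the function names, the rule order, and the 0.7 efficiency threshold are assumptions made up for illustration, and the bandwidth ceiling here only accounts for line encoding, not packet overhead.

```python
# Hypothetical sketch: none of these names or thresholds come from the tool itself.

# Per-lane raw rate (GT/s) and line-encoding efficiency for each PCIe generation.
PCIE_GENERATIONS = {
    1: (2.5, 8 / 10),      # 8b/10b encoding
    2: (5.0, 8 / 10),      # 8b/10b encoding
    3: (8.0, 128 / 130),   # 128b/130b encoding
    4: (16.0, 128 / 130),
    5: (32.0, 128 / 130),
}


def theoretical_gb_per_s(gen: int, width: int) -> float:
    """Encoding-limited bandwidth in GB/s, one direction, for a gen/width pair."""
    rate_gt, efficiency = PCIE_GENERATIONS[gen]
    # One transfer carries one bit per lane, so divide by 8 to get bytes.
    return rate_gt * efficiency / 8 * width


def verdict(cur_gen: int, cur_width: int, max_gen: int, max_width: int,
            measured_gb_per_s: float, min_efficiency: float = 0.7) -> str:
    """Classify a link from its negotiated state and a measured copy rate.

    min_efficiency is an assumed cutoff, not a value taken from the tool.
    """
    # Assumed rule 1: a link negotiated below its maximum is DEGRADED.
    if cur_gen < max_gen or cur_width < max_width:
        return "DEGRADED"
    # Assumed rule 2: a full-speed link that copies too slowly is UNDERPERFORMING.
    if measured_gb_per_s < min_efficiency * theoretical_gb_per_s(cur_gen, cur_width):
        return "UNDERPERFORMING"
    return "OK"


if __name__ == "__main__":
    print(verdict(4, 8, 4, 16, 12.0))   # DEGRADED
    print(verdict(4, 16, 4, 16, 10.0))  # UNDERPERFORMING
    print(verdict(4, 16, 4, 16, 25.0))  # OK
```

In this sketch the link-state check runs first, so a card that silently dropped from x16 to x8 gets flagged even if its copy rate looks fine for the narrower link.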