Zml-smi: universal monitoring tool for GPUs, TPUs and NPUs

One tool to watch every AI chip—fans cheer, purists say use nvtop

TLDR: zml-smi debuts as a single tool to monitor GPUs, Google TPUs, and AWS Trainium. The thread splits between praise for its cross‑vendor coverage and criticism of a 'hacky' AMD file redirect, alongside calls to upstream the work into nvtop and extra side‑eye over 'NPU' apparently meaning Trainium only.

zml-smi just dropped as a “one screen to rule them all” monitor for GPUs (graphics cards), Google TPUs (AI accelerator chips), and AWS Trainium (Amazon’s AI chip). Think a mashup of nvidia-smi and nvtop with a live “top” view, plus host and process stats, all in a tidy, sandboxed download from the official mirror. On paper: fewer windows, more vibes.

But the comments? Spicy. One camp loves the cross‑vendor coverage: finally, a single dashboard for NVIDIA, AMD, TPUs, and Trainium. The other camp is side‑eyeing the AMD trick where the tool intercepts a file lookup to keep everything self‑contained—critics call it a fragile “hack,” not “sandboxing.” User mrflop lit the fuse: “brittle hack masquerading as ‘sandboxing,’” and asked why this isn’t just upstreamed to nvtop instead of “fragmenting” tools.

Then rdyro chimed in with a flex: nvtop can already handle TPUs via libtpuinfo, reviving the classic open‑source soap opera of building a new tool vs. upgrading the old one. Meanwhile, 152334H poked the naming bear: does “NPU” here basically mean Trainium only? Cue memes about “universal” meaning “universal…ish,” and quips like “my htop is crying” as another monitor joins the party.

Bottom line: zml-smi brings real-time performance, temps, memory, and process info across vendors in one pane—but the thread is a tug‑of‑war between people who want a bold new hub and folks who’d rather see the features land in the tools they already trust.

Key Points

• zml-smi is a universal, sandboxed monitoring tool for GPUs, TPUs, and NPUs, combining features of nvidia-smi and nvtop.
• It supports NVIDIA, AMD, Google TPU, and AWS Trainium, with plans to add more platforms as ZML expands.
• Installation is via a downloadable archive; the tool lists devices and offers real-time monitoring with the --top flag.
• Host and process metrics are provided, plus platform-specific metrics sourced through NVML (NVIDIA), AMD SMI/libdrm (AMD), TPU runtime gRPC (Google TPU), and libnrt.so (AWS Trainium).
• For AMD, zml-smi merges amdgpu.ids from Mesa and ROCm and uses a zmlxrocm.so interposer to redirect file access, enabling updated device recognition within the sandbox.
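The post says zml-smi merges the amdgpu.ids tables from Mesa and ROCm but doesn't say how. Here's a minimal Python sketch of what such a merge could look like, assuming the libdrm file layout (`#` comment lines, a bare version line like `1.0.0`, then `device_id,<TAB>revision_id,<TAB>product_name` rows) and assuming the later source (ROCm here) wins when both list the same device/revision pair. The function names and sample rows are hypothetical, not zml-smi's actual code or real device IDs.

```python
from typing import Dict, Iterable, Tuple

# Key: (device_id, revision_id) as uppercase hex strings, e.g. ("1234", "00").
IdKey = Tuple[str, str]


def parse_amdgpu_ids(text: str) -> Dict[IdKey, str]:
    """Parse libdrm-style amdgpu.ids text into {(device, revision): name}.

    Assumed layout: '#' comment lines, a bare version line such as '1.0.0',
    then rows of 'device_id,<TAB>revision_id,<TAB>product_name'.
    """
    table: Dict[IdKey, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # maxsplit=2 keeps any commas inside the product name intact
        parts = [p.strip() for p in line.split(",", 2)]
        if len(parts) != 3:
            continue  # the version line (e.g. '1.0.0') has no commas
        device, revision, name = parts
        table[(device.upper(), revision.upper())] = name
    return table


def merge_amdgpu_ids(sources: Iterable[str]) -> Dict[IdKey, str]:
    """Merge several amdgpu.ids texts; later sources override earlier ones
    on duplicate (device, revision) keys (an assumed policy)."""
    merged: Dict[IdKey, str] = {}
    for text in sources:
        merged.update(parse_amdgpu_ids(text))
    return merged


def render_amdgpu_ids(table: Dict[IdKey, str], version: str = "1.0.0") -> str:
    """Serialize a merged table back into the same tab-separated layout."""
    rows = [f"{dev},\t{rev},\t{name}" for (dev, rev), name in sorted(table.items())]
    return "\n".join([version, *rows]) + "\n"


if __name__ == "__main__":
    # Made-up sample rows standing in for the Mesa and ROCm copies.
    mesa = "# sample\n1.0.0\n1234,\t00,\tExample GPU (Mesa name)\n"
    rocm = "1.0.0\n1234,\t00,\tExample GPU (ROCm name)\nABCD,\tC1,\tNewer Example GPU\n"
    print(render_amdgpu_ids(merge_amdgpu_ids([mesa, rocm])), end="")
```

Keying on the (device, revision) pair means a newer ROCm name replaces an older Mesa one instead of producing a duplicate row. The other half of the AMD trick, redirecting reads of the system file to the merged copy via the zmlxrocm.so interposer, is a separate piece this sketch does not cover.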

Hottest takes

"feels like a brittle hack masquerading as 'sandboxing'" — mrflop

"nvtop can actually support TPUs too" — rdyro

"'NPU' seems to refer to Trainium only?" — 152334H

April 4, 2026

Comment wars go brrr