December 18, 2025
Open Source or Open Kimono?
Virtualizing Nvidia HGX B200 GPUs with Open Source
DIY GPU VMs wow devs while skeptics shout “not open”
TLDR: Engineers showed how to run virtual machines on NVIDIA’s HGX B200 using open-source tooling. Comments erupted over whether this is truly “open,” how safe multi-tenant isolation is, and if AMD and InfiniBand fit—mixing excitement with skepticism and turning a niche GPU guide into community drama.
A new guide claims you can spin up virtual machines on NVIDIA’s monster HGX B200 boards using open-source tools. The author jumps in saying the real headache was NVLink—NVIDIA’s super-fast GPU-to-GPU highway—because it makes isolating tenants tricky while keeping speed. Folks loved the transparency, with AMA invites flying, but that’s where the comment drama revved up.

One skeptic asked point-blank whether NVIDIA’s Fabric Manager—the control software for these GPU clusters—is even open source, quipping that this felt more like “open kimono” than open source, and pressed hard on security boundaries for multi-tenant setups. Another thread spun into the eternal Team Green vs. Team Red meme, with a commenter noting you can swap “nvidia” for “amdgpu” and much of the how-to still works—cue the AMD stans.

A practical crowd asked whether you can slice one GPU across multiple VMs without relying on NVIDIA’s MIG (their built-in slicing tech), and a speed freak wanted to know if InfiniBand—the fast network used in supercomputers—runs at full blast inside each VM. The vibe: builders are hyped, skeptics smell marketing, and everyone’s making jokes about cloud giants guarding secrets while indie hackers leak the DIY playbook. Chaos, memes, and GPU heat—classic.
Key Points
- The article explains how to virtualize NVIDIA HGX B200 GPUs using open-source tools on Linux.
- HGX B200 uses SXM modules with NVLink/NVSwitch, creating a uniform high-bandwidth fabric that complicates virtualization.
- Host setup includes switching drivers, permanently binding GPUs to vfio-pci, and aligning software versions between host and guest.
- A PCI topology mismatch can block device initialization; QEMU is used to craft custom PCI layouts for guests.
- A large-BAR stall issue is addressed by upgrading to QEMU 10.1+ or disabling BAR mmap with x-no-mmap=true.
- GPU partitioning is managed with NVIDIA Fabric Manager and its API.
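The host-setup and QEMU steps above might look roughly like the sketch below. This is not taken from the article: the PCI address, device ID, and VM parameters are placeholders you would replace with values from your own system (`lspci -nn -d 10de:` lists NVIDIA devices).

```shell
# Placeholder PCI address of one B200 GPU -- substitute your own.
GPU=0000:17:00.0

# Unbind the GPU from whatever driver holds it, then steer it to vfio-pci
# via driver_override and re-probe.
echo "$GPU" | sudo tee /sys/bus/pci/devices/$GPU/driver/unbind
echo vfio-pci | sudo tee /sys/bus/pci/devices/$GPU/driver_override
echo "$GPU" | sudo tee /sys/bus/pci/drivers_probe

# One common way to make the vfio-pci binding persistent across reboots.
# 10de:2901 is a placeholder -- use the vendor:device ID from `lspci -nn`.
echo "options vfio-pci ids=10de:2901" | sudo tee /etc/modprobe.d/vfio.conf

# Launch a guest with the GPU passed through. On QEMU older than 10.1,
# x-no-mmap=true is the workaround the article names for the large-BAR stall.
qemu-system-x86_64 \
  -machine q35 -accel kvm -cpu host \
  -smp 16 -m 64G \
  -device vfio-pci,host=$GPU,x-no-mmap=true \
  -drive file=guest.img,format=qcow2
```

On QEMU 10.1+ the `x-no-mmap=true` property can be dropped, per the article's recommendation to upgrade instead; the custom PCI topology the guide builds for multi-GPU guests would add further `-device` bridge/root-port arguments beyond this minimal single-GPU sketch.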