June 16, 2026
One tiny part, one huge meltdown
4× RTX Pro 6000 Blackwell on Water, and the One Card That Wouldn't Behave
A tiny missing part turned a monster AI machine into comment-section chaos
TLDR: A powerful liquid-cooled AI training machine kept failing because a tiny power part had accidentally been pulled off and left on the bench. Commenters turned it into a full spectacle: some were amazed the machine exists, others questioned its usefulness, and a few gleefully nitpicked the builder’s explanations.
This was supposed to be a victory lap: a giant homegrown AI training machine packed with four ultra-powerful graphics cards and enough liquid cooling to make a car jealous. Instead, the real plot twist was gloriously petty: one card kept crashing, and after all the usual software blame games, the culprit turned out to be a tiny piece of hardware left sitting on the workbench. Yes, the whole drama came down to one little part going AWOL during assembly.
And the comments? Absolutely the main event. One camp was impressed to the point of disbelief, basically gawking at the idea of a "little computer" trying to train large AI models at all, with one commenter asking the obvious outsider question: is this thing really training AI from scratch, and where does all the data even come from? Another faction skipped the detective story entirely and went straight into backseat-engineering mode, suggesting the builder should ditch the army of fans for one giant fan because, naturally, the internet can never resist redesigning someone else’s machine.
Then came the nerd-sniping. One sharp reply openly mocked the post’s explanation of the physical forces involved, joking that maybe the same AI wrote that part. Ouch. On the lighter side, there was also a mini-drama over someone misreading "at" as "Δt" before sheepishly walking it back, a perfect comment-thread cameo. The mood overall was a mix of respect, skepticism, nitpicking, and comedy, which is exactly what happens when someone builds a fire-breathing AI box and then reveals it was defeated by a part barely bigger than a crumb.
Key Points
- The article covers a four-GPU RTX PRO 6000 Blackwell workstation converted from air cooling to custom water cooling for sustained model-training workloads.
- The build uses four 600 W GPUs, dual 1500 W power supplies, two large Alphacool 1260 mm radiators, and four Bykski full-cover waterblocks.
- After conversion, three GPUs operated normally, but one card repeatedly failed under sustained load while appearing normal at idle and during short tasks.
- System logs showed NVRM Xid 79 and Xid 154 errors plus PCIe AER messages indicating a DPC containment event, pointing to a hardware or power-delivery issue rather than software.
- Inspection revealed that a VRM choke had detached during the waterblock conversion, likely while removing the stock thermal pad, explaining why the card failed only at higher sustained current draw.
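
For a sense of scale, the power figures in the key points can be turned into a rough budget. The sketch below is only back-of-the-envelope arithmetic on the nameplate numbers quoted above (four 600 W GPUs, two 1500 W power supplies). The function name `power_budget` is made up for illustration, and the calculation assumes the two supplies' ratings simply add, ignoring efficiency losses, transient spikes, and how the builder actually split the load.

```python
# Nameplate figures taken from the key points; nothing here is measured.
GPU_COUNT = 4
GPU_WATTS = 600    # per-GPU rating quoted in the article
PSU_COUNT = 2
PSU_WATTS = 1500   # per-supply rating quoted in the article


def power_budget(gpu_count, gpu_watts, psu_count, psu_watts):
    """Hypothetical helper: compare total GPU rating with total PSU rating."""
    gpu_load = gpu_count * gpu_watts
    capacity = psu_count * psu_watts  # assumption: ratings simply add
    return {
        "gpu_load_w": gpu_load,
        "psu_capacity_w": capacity,
        "headroom_w": capacity - gpu_load,
        "gpu_share": gpu_load / capacity,
    }


budget = power_budget(GPU_COUNT, GPU_WATTS, PSU_COUNT, PSU_WATTS)
print(budget)
```

On these numbers the GPUs alone account for 2400 W of a nominal 3000 W, leaving about 600 W for the CPU, pumps, fans, and everything else. That is a thin enough margin to make it plausible that sustained load, rather than idle or short tasks, is where a power-delivery fault on one card would show up.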