February 28, 2026
DIY Skynet, dial‑up vibes
Running a One-Trillion-Parameter LLM Locally on an AMD Ryzen AI Max+ Cluster
DIY trillion‑model flex has dial‑up vibes; commenters roast the tiny network
TLDR: AMD demoed a four‑PC home setup running a trillion‑parameter AI model with open tools. Commenters cheered the milestone but slammed the 1.5‑minute wait for a first reply, the 8.3‑token‑per‑second output, and the 5‑gigabit networking, calling it a $10k science project rather than a practical ChatGPT competitor.
AMD just showed how to run a trillion‑parameter AI model at home by chaining four Ryzen AI Max+ PCs, then using an open‑source tool called llama.cpp to make them act like one big brain. It runs Moonshot's open Kimi K2.5, which can handle code, long reasoning, and even video. Sounds epic, until the comments roll in. One user drops the mic: a 1.5‑minute "time‑to‑first‑token" (how long before it starts typing) and about 8.3 tokens per second (think slow words‑per‑second). Another fires back: ChatGPT usually replies in under a second at around 50 tokens per second. The vibe: "DIY Skynet, but with buffering."
Then the network wars begin. The cluster uses 5‑gigabit Ethernet, and people are roasting it. "Only 5‑gig? Where's Thunderbolt 40Gbps?" asks one. Another goes scorched earth, accusing the Framework Desktop of fake upgradability and sneering "5 GbE is a joke." Meanwhile, the price talk hits hard: a commenter pegs the setup at around $10k, with rumors there are barely any of these mini PCs in the wild. Others admit they can't even run a tiny 3‑billion‑parameter model without lag and ask, "So… how much for this?"
Bottom line: the community’s split between “wow, it works!” and “why would you?”—with memes about dial‑up‑speed mega‑brains stealing the show.
Key Points
- AMD provides a guide to run a one-trillion-parameter-class LLM locally using a four-node Ryzen AI Max+ cluster.
- The setup uses 4 Framework Desktop systems (Ryzen AI Max+ 395, 128GB each) with Ubuntu 24.04.3 LTS.
- Inference is orchestrated via llama.cpp RPC with AMD ROCm, using the Kimi K2.5 (UD_Q2_K_XL) model (~375GB).
- Networking between nodes is 5 Gbps Ethernet, enabling distributed local inference as a single logical accelerator.
- Memory configuration includes setting iGPU Memory Size to 512MB in BIOS and leveraging Linux TTM to extend VRAM allocation, with 96GB per node available as dedicated VRAM.
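The "single logical accelerator" setup above maps to llama.cpp's RPC backend: worker nodes run `rpc-server`, and the head node points at them with `--rpc`. A minimal sketch of what this might look like; the hostnames, port, model filename, and prompt are placeholders, not taken from AMD's guide:

```shell
# Hypothetical sketch of llama.cpp RPC distributed inference.
# Requires llama.cpp built with the RPC backend enabled (GGML_RPC=ON).

# On each of the three worker nodes: expose the local GPU over the network.
rpc-server -H 0.0.0.0 -p 50052

# On the head node: list the workers; llama.cpp then treats all four GPUs
# as one logical accelerator and splits the ~375GB model across them.
llama-cli -m kimi-k2.5-UD_Q2_K_XL.gguf \
  --rpc worker1:50052,worker2:50052,worker3:50052 \
  -ngl 99 -p "Hello"
```

Here `-ngl 99` asks for all model layers to be offloaded to GPU memory, which is why each node's available VRAM (96GB) matters for the total fit.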
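The last bullet, the 512MB BIOS carve-out plus TTM-extended VRAM, is typically done with kernel boot parameters. A sketch under stated assumptions: the `ttm.pages_limit`/`ttm.page_pool_size` parameter names follow the Linux TTM module, but the exact values and the GRUB edit below are illustrative, not AMD's published numbers:

```shell
# Hypothetical sketch: after setting iGPU Memory Size to 512MB in BIOS,
# raise the TTM limits so the iGPU can map most of system RAM as "VRAM".
# pages_limit is in 4 KiB pages; 96 GiB = 96 * 262144 = 25165824 pages.
sudo sed -i \
  's/GRUB_CMDLINE_LINUX_DEFAULT="/&ttm.pages_limit=25165824 ttm.page_pool_size=25165824 /' \
  /etc/default/grub
sudo update-grub && sudo reboot
```

The design idea: keep the fixed BIOS carve-out tiny (512MB) and let the driver allocate GPU memory dynamically from system RAM, so each 128GB node can dedicate roughly 96GB to the model instead of a fixed split.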