June 13, 2026

GPU soap opera, now with extra watts

RTX 5080 and RTX 3090 Setup: 80 Tok/s on Qwen 3.6 27B Q8

One gamer’s two-card AI monster wowed fans — and sparked a money-vs-DIY showdown

TLDR: A user combined two high-end graphics cards to make a very fast home AI setup, showing how far local tools have come. But the comments quickly turned into a brawl over whether this is a brilliant privacy-first hobby project or an expensive flex compared with cheap online AI access.

A hobbyist stitched together two powerful graphics cards — one newer, one older — and got a home AI chatbot setup spitting out answers at blazing speed. On paper, that’s the nerd dream: more memory, more power, more local control. But in the comments, the real show wasn’t the build itself — it was the instant split between the “this rules” crowd and the “why not just pay a few bucks online?” skeptics.

One camp was absolutely living for the do-it-yourself chaos. People swapped their own Frankenstein setups, from cheap Chinese adapter boards to spare power supplies, with a very strong “it works, don’t ask questions” energy. Another reader basically said the post read like a cooking recipe without the science, asking for more explanation about why the setup works instead of just a step-by-step guide. Translation: the gearheads wanted lore, not just instructions.

Then came the money discourse, because of course it did. One commenter dropped the cold-shower take: why spend well over $2,000 on hardware, plus electricity, when renting access to the same model online costs pocket change? That instantly turned the story into a classic tech culture argument: privacy and control versus convenience and cost. Then there was the quiet flex from users who said they now prefer their local AI to big-name paid tools, because when it messes up, it at least does so in a more obvious, less sneaky way. In other words: the machine may hallucinate, but the community drama is crystal clear.
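The skeptic's argument comes down to break-even arithmetic: a one-time hardware bill against a monthly API bill, minus what the local box costs to power. The sketch below is a minimal illustration, not anything posted in the thread. Every input except the $2,000 floor (taken from the "well over 2k" comment) is a made-up placeholder, and the function name `months_to_break_even` is hypothetical.

```python
import math


def months_to_break_even(
    hardware_usd: float,
    tokens_per_month: float,
    api_usd_per_million_tokens: float,
    avg_power_kw: float,
    hours_per_day: float,
    usd_per_kwh: float,
    days_per_month: float = 30.0,
) -> float:
    """Months until local hardware pays for itself versus a pay-per-token API.

    Returns math.inf when the monthly electricity bill is at least as large
    as the API bill, i.e. the local box never catches up.
    """
    api_monthly = tokens_per_month / 1e6 * api_usd_per_million_tokens
    kwh_monthly = avg_power_kw * hours_per_day * days_per_month
    power_monthly = kwh_monthly * usd_per_kwh
    savings = api_monthly - power_monthly
    if savings <= 0:
        return math.inf
    return hardware_usd / savings


# Placeholder inputs, NOT figures from the thread: $2,000 of hardware
# (the "well over 2k" comment used as a floor), 50M tokens per month,
# $0.60 per million API tokens, 600 W while busy for 4 h/day, $0.20/kWh.
months = months_to_break_even(2000, 50_000_000, 0.60, 0.6, 4, 0.20)
print(f"break-even after ~{months:.0f} months")  # → break-even after ~128 months
```

With these invented inputs the payback stretches past ten years, which is the skeptics' point; heavier daily use or pricier API tokens shorten it. The privacy and control arguments from the other camp don't appear in this arithmetic at all.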

Key Points

  • The article describes a dual-GPU local LLM setup using an RTX 5080 and a refurbished RTX 3090 to run Qwen 3.6 with higher throughput.
  • Adding the 24GB RTX 3090 let the author run Qwen 3.6 Q4 locally, with throughput rising from about 30 tok/s to 50–60 tok/s using MTP (multi-token prediction).
  • The build used an Asus Prime X570-Pro motherboard because it can split its PCIe x16 lanes into an x8/x8 configuration across two slots, one per GPU.
  • The article says the system must boot in UEFI mode rather than legacy BIOS/MBR mode, and lists the required BIOS settings: disable CSM, enable Above 4G Decoding, enable Re-Size BAR (Resizable BAR), and set both PCIe slots to Gen 4.
  • For mixing Nvidia GPU generations in one machine (here a Blackwell RTX 5080 and an Ampere RTX 3090), the article recommends the nvidia-open driver rather than patched open-gpu-kernel-modules, and shows both cards recognized in nvidia-smi output.
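Why a second card matters becomes clearer with rough memory arithmetic. The sketch below is a back-of-the-envelope estimate under stated assumptions, not the author's method: about 8.5 bits per weight for a Q8_0-style quant and 4.5 for a Q4_0-style quant, a flat 4 GiB overhead for KV cache and CUDA context, and a hypothetical 64-layer model. The function names are illustrative. It checks both the Q8 named in the headline and the Q4 in the key points, then splits layers across the 16 GiB RTX 5080 and the 24 GiB RTX 3090 in proportion to their VRAM.

```python
GIB = 1024 ** 3


def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights alone, in GiB."""
    return params_billion * 1e9 * bits_per_weight / 8 / GIB


def fits(params_billion: float, bits_per_weight: float,
         vram_gib: float, overhead_gib: float = 4.0) -> bool:
    """True if the weights plus an assumed fixed overhead (KV cache,
    CUDA context, activations) fit in the given amount of VRAM."""
    return weight_gib(params_billion, bits_per_weight) + overhead_gib <= vram_gib


def split_layers(n_layers: int, vram_gib: list[float]) -> list[int]:
    """Assign layers to GPUs in proportion to their VRAM, handing any
    leftover layers to the GPUs with the largest fractional remainders."""
    total = sum(vram_gib)
    exact = [n_layers * v / total for v in vram_gib]
    counts = [int(x) for x in exact]
    leftover = n_layers - sum(counts)
    by_remainder = sorted(range(len(exact)),
                          key=lambda i: exact[i] - counts[i], reverse=True)
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts


RTX_5080, RTX_3090 = 16.0, 24.0  # GiB of VRAM per card

for label, bpw in [("Q8", 8.5), ("Q4", 4.5)]:
    print(label, round(weight_gib(27, bpw), 1), "GiB of weights;",
          "fits on 5080 alone:", fits(27, bpw, RTX_5080),
          "| fits on both:", fits(27, bpw, RTX_5080 + RTX_3090))

# 64 layers is a placeholder, not the real layer count of Qwen 3.6 27B.
print(split_layers(64, [RTX_5080, RTX_3090]))  # → [26, 38]
```

Under these assumptions a 27B model needs roughly 26.7 GiB of weights at Q8 and 14.1 GiB at Q4, so neither fits on the 5080 alone once overhead is counted, while both fit across the two cards. The excerpt doesn't name the inference runtime; llama.cpp, for one, exposes a comparable proportional split through its `--tensor-split` option (e.g. `--tensor-split 16,24`).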

Hottest takes

"what’s essentially just a recipe" — ComputerGuru
"well over 2k, not to mention the electricity" — deng
"It works." — avyeed_desa
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.