RTX 5080 and RTX 3090 Setup: 80 Tok/s on Qwen 3.6 27B Q8

One gamer’s two-card AI monster wowed fans — and sparked a money-vs-DIY showdown

TLDR: A user combined two high-end graphics cards to make a very fast home AI setup, showing how far local tools have come. But the comments quickly turned into a brawl over whether this is a brilliant privacy-first hobby project or an expensive flex compared with cheap online AI access.

A hobbyist stitched together two powerful graphics cards — one newer, one older — and got a home AI chatbot setup spitting out answers at blazing speed. On paper, that’s the nerd dream: more memory, more power, more local control. But in the comments, the real show wasn’t the build itself — it was the instant split between the “this rules” crowd and the “why not just pay a few bucks online?” skeptics.

One camp was absolutely living for the do-it-yourself chaos. People swapped their own Frankenstein setups, from cheap Chinese adapter boards to spare power supplies, with a very strong “it works, don’t ask questions” energy. Not everyone was satisfied, though: one reader said the post read like a cooking recipe without the science, and asked for more explanation of why the setup works instead of just a step-by-step guide. Translation: the gearheads wanted lore, not just instructions.

Then came the money discourse, because of course it did. One commenter dropped the cold-shower take: why spend well over $2,000 on hardware and electricity when renting access to the same model online costs pocket change? That instantly turned the story into a classic tech culture argument: privacy and control versus convenience and cost. And then there was the quiet flex from users saying they now prefer their local AI to big-name paid tools because when it messes up, at least it does so in a more obvious, less sneaky way. In other words: the machine may hallucinate, but the community drama is crystal clear.

Key Points

• The article describes a dual-GPU local LLM setup using an RTX 5080 and a refurbished RTX 3090 to run Qwen 3.6 with higher throughput.
• Adding the 24GB RTX 3090 allowed the author to run Qwen 3.6 Q4 locally, with performance rising from about 30 tok/s to 50–60 tok/s using MTP.
• The build used an Asus Prime X570-Pro motherboard because it can split a PCIe x16 connection into 2x8 for two GPUs.
• The article says the system must not boot in BIOS/MBR mode and lists required BIOS settings including disabling CSM, enabling Above 4G Decoding, enabling ReSize BAR, and setting both PCIe slots to Gen 4.
• For mixed Nvidia GPU models, the article recommends the nvidia-open driver rather than patched open-gpu-kernel-modules, and shows both cards recognized in nvidia-smi output.
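For readers who want to sanity-check a similar build, the last two points can be sketched as a small script. This is not the author's code, just a minimal sketch under some assumptions: a Linux host, `nvidia-smi` on the PATH, and the helper names (`parse_gpu_csv`, `total_vram_gib`, `booted_in_uefi`, `query_gpus`) are made up for illustration. It checks whether the machine booted via UEFI rather than legacy BIOS, then lists the cards the driver can see and sums their memory.

```python
import os
import subprocess


def parse_gpu_csv(text):
    """Parse 'name, memory.total' CSV lines into (name, MiB) tuples."""
    gpus = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue
        # Split on the last comma so a comma inside a GPU name cannot break parsing.
        name, mem = line.rsplit(",", 1)
        gpus.append((name.strip(), int(mem.strip())))
    return gpus


def total_vram_gib(gpus):
    """Sum memory across all detected cards, converted from MiB to GiB."""
    return sum(mem for _, mem in gpus) / 1024


def booted_in_uefi():
    """On Linux, /sys/firmware/efi exists only when the kernel was booted via UEFI."""
    return os.path.isdir("/sys/firmware/efi")


def query_gpus():
    """Ask nvidia-smi for each card's name and total memory in MiB."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    print("UEFI boot:", booted_in_uefi())
    try:
        gpus = query_gpus()
    except (FileNotFoundError, subprocess.CalledProcessError):
        print("nvidia-smi not available; is the driver installed?")
    else:
        for name, mem in gpus:
            print(f"{name}: {mem} MiB")
        print(f"Total VRAM: {total_vram_gib(gpus):.1f} GiB across {len(gpus)} card(s)")
```

If only one card shows up in the list, the BIOS settings and driver choice from the points above are the first things to recheck.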

Hottest takes

"what’s essentially just a recipe" — ComputerGuru

"well over 2k, not to mention the electricity" — deng

"It works." — avyeed_desa

June 13, 2026

GPU soap opera, now with extra watts
