Furiosa: 3.5× efficiency over H100s

New AI box claims 3.5× Nvidia's efficiency; commenters demand real benchmarks

TLDR: FuriosaAI’s NXT RNGD Server promises 3.5× the power efficiency of Nvidia’s H100 for AI inference, fits into ordinary air‑cooled data centers, and is backed by results from LG AI Research. Commenters are split: some love the power savings, while others call the benchmarks cherry‑picked, worry about model support and vendor lock‑in, and demand apples‑to‑apples tests.

FuriosaAI just dropped the NXT RNGD Server, a plug‑and‑play box for AI “inference” (translation: making AI answer questions fast) that the company says is 3.5× more power‑efficient than Nvidia’s H100 and runs on regular, air‑cooled racks. It ships with Furiosa’s software, supports popular serving tools, and even has a real‑world shoutout: LG AI Research reported 60 tokens per second on its EXAONE 3.5 32B model. The pitch: more AI, less electricity, no exotic plumbing. The vibe: spicy.

Commenters rolled in like a QA team with espresso. One called the comparison chart “really weird” for pitting Furiosa against three H100 PCIe cards instead of a full 8‑GPU box, demanding an apples‑to‑apples watts‑versus‑words test. Another saw the Llama 3.1 8B benchmark and instantly yelled “hand‑tuned!” while asking for bigger, newer models. There’s tension between the efficiency crowd (“air‑cooled gang finally gets a W”) and the flexibility crew, who warn that GPUs may be inefficient but run everything. Usability questions popped up too: will buyers be locked into a niche ecosystem, or can they serve any model without drama? And the jokers chimed in with “wake me when it trains something,” plus a hot take that TSMC wants more Nvidia rivals anyway. Verdict: bold claims, sharper knives, popcorn ready.

Key Points

  • FuriosaAI launched the NXT RNGD Server, a turnkey AI inference system built around RNGD accelerators.
  • The server delivers up to 4 petaFLOPS FP8 per system with dual AMD EPYC CPUs and supports BF16, FP8, INT8, and INT4.
  • It is air‑cooled, runs at 3 kW, and integrates via standard PCIe, targeting deployment in existing data centers with 8 kW/rack or less.
  • LG AI Research validated performance: EXAONE 3.5 32B achieved 60 tokens/s (4K context) and 50 tokens/s (32K) on four RNGD cards.
  • Software stack includes a preinstalled Furiosa SDK and the Furiosa LLM runtime, plus vLLM compatibility, Kubernetes/Helm integration, and OpenAI API support (see the sketch after this list).
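
Since the box speaks the OpenAI API, existing client code should be able to point at it with little more than a URL change. Here is a minimal sketch using the official `openai` Python client; the endpoint address, API‑key handling, and model identifier are illustrative assumptions, not documented Furiosa values.

```python
# Minimal sketch: talking to an OpenAI-compatible inference endpoint.
# The host, port, and model name below are assumptions for illustration.
from openai import OpenAI

client = OpenAI(
    base_url="http://rngd-server.local:8000/v1",  # hypothetical address of the RNGD box
    api_key="unused",  # local OpenAI-compatible servers often ignore the key
)

response = client.chat.completions.create(
    model="EXAONE-3.5-32B-Instruct",  # assumed model identifier
    messages=[{"role": "user", "content": "Why does air cooling matter in a data center?"}],
    max_tokens=200,
)
print(response.choices[0].message.content)
```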

Hottest takes

"really weird graph where they're comparing to 3x H100 PCI-E" — darknoon
"Show me a benchmark for gpt-oss-120b" — zmmmmm
"Got excited, then I saw it was for inference. yawns" — whimsicalism

Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.