Trinity Large: An open 400B sparse MoE model

400B ‘open’ AI drops free to try — devs cheer, purists squint, wallets scream

TLDR: Arcee released a giant but efficient AI model you can try for free, with a pristine “TrueBase” version for researchers. The crowd’s split between hype over performance and cost, arguments about what “open” really means, and practical worries about whether anyone can run it without huge hardware.

Arcee just unleashed Trinity Large, a 400-billion-parameter “Mixture of Experts” model — think a giant brain where only a few specialists wake up per question — and the internet’s already speedrunning the vibes. It’s free to try on OpenRouter for now, and comes in three flavors: a chatty Preview, a benchmark-crushing Base, and a “TrueBase” that’s as untouched as researchers dream of (link).

The hype train? Loud. One camp is swooning over a real “true base” to study; another is side-eyeing the word “open” and demanding receipts on weights vs data. Budget gossip hit hard too: commenters are buzzing about a ~33-day, ~$20M sprint and claiming it hangs with Qwen and DeepSeek — cue the “fast, not cheap” memes. Non-nerds asked the question everyone’s thinking: can regular humans run this, or do you need a small power plant? With only about 13B parameters “awake” per request (4 of 256 experts), fans say it should be snappier; skeptics say you still need serious hardware.

Between flexes about 2–3x faster inference and jokes about “expert routing therapy,” the thread oscillates between celebration and audit mode. Verdict for now: impressive drop, spicy debate on openness, and a stampede of researchers lining up to poke the “TrueBase” beast.

Key Points

  • Arcee AI released Trinity Large, a 400B-parameter sparse MoE model, with Preview, Base, and TrueBase checkpoints.
  • The model uses 256 experts with 4 active per token (a 1.56% routing fraction) and 13B active parameters per token; dense layers were increased from 3 to 6 to stabilize routing (see the routing sketch after this list).
  • Training used 2,048 Nvidia B300 GPUs for just over 30 days, claimed as the largest publicly stated run on these machines.
  • High sparsity and an efficient attention mechanism enabled roughly 2–3x faster training and inference than peer models on the same hardware.
  • Training stability techniques include momentum-based expert load balancing (with tanh clipping of router logits and a per-sequence balance loss) and a z-loss to control the LM-head logit scale; a hedged sketch of those losses also follows the list.
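
For the curious, here is what the first mechanism above, top-k expert routing where each token's router picks 4 of 256 experts and only those run, could look like in code. This is a minimal PyTorch sketch; the layer name, hidden sizes, expert MLP shape, and gating details are illustrative assumptions, not Arcee's published implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    """Toy top-k mixture-of-experts layer: 4 of 256 experts fire per token.

    Sizes and the gating scheme are illustrative assumptions, not Arcee's code.
    """

    def __init__(self, d_model=512, num_experts=256, top_k=4):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, num_experts, bias=False)
        # Tiny MLPs stand in for the real FFN experts.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 2 * d_model),
                nn.SiLU(),
                nn.Linear(2 * d_model, d_model),
            )
            for _ in range(num_experts)
        )

    def forward(self, x):                          # x: (tokens, d_model)
        probs = F.softmax(self.gate(x), dim=-1)    # router scores over all experts
        topk_p, topk_idx = probs.topk(self.top_k, dim=-1)
        topk_p = topk_p / topk_p.sum(-1, keepdim=True)  # renormalize over the 4 winners

        out = torch.zeros_like(x)
        # Plain loops for clarity; real systems batch tokens per expert.
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                hit = topk_idx[:, slot] == e
                if hit.any():
                    out[hit] += topk_p[hit, slot].unsqueeze(1) * expert(x[hit])
        return out

# Toy usage: 8 token embeddings through the sparse layer.
layer = SparseMoELayer(d_model=64, num_experts=256, top_k=4)
y = layer(torch.randn(8, 64))
```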

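The stability recipe in the last bullet can be sketched too: a tanh soft-cap on router logits, a per-sequence balance loss, and a z-loss that penalizes the LM-head log-partition. Everything below is a hedged approximation: the function name, coefficients, cap value, and the Switch-Transformer-style balance formulation are assumptions, and the momentum-based bias update on expert scores is not shown.

```python
import torch
import torch.nn.functional as F

def moe_aux_losses(router_logits, lm_logits, top_k=4,
                   logit_cap=30.0, balance_coef=1e-2, z_coef=1e-4):
    """Hedged sketch of the stability losses mentioned in the post.

    router_logits: (batch, seq, num_experts) raw router scores
    lm_logits:     (batch, seq, vocab) final LM-head logits
    All coefficients and the cap value are made-up defaults.
    """
    # Soft-cap router logits with tanh so no expert score can run away.
    capped = logit_cap * torch.tanh(router_logits / logit_cap)
    probs = F.softmax(capped, dim=-1)                            # (B, S, E)
    num_experts = probs.shape[-1]

    # Which experts each token actually gets routed to (top-4 of 256).
    topk_idx = probs.topk(top_k, dim=-1).indices                 # (B, S, k)
    dispatch = F.one_hot(topk_idx, num_experts).sum(-2).float()  # (B, S, E)

    # Per-sequence balance loss: compare the fraction of routing slots each
    # expert receives with the average probability the router assigns it.
    load = dispatch.mean(dim=1) / top_k                          # (B, E), sums to 1
    importance = probs.mean(dim=1)                               # (B, E), sums to 1
    balance_loss = num_experts * (load * importance).sum(-1).mean()

    # z-loss: squared log-partition of the LM head keeps logit scale tame.
    z_loss = torch.logsumexp(lm_logits, dim=-1).pow(2).mean()

    return balance_coef * balance_loss + z_coef * z_loss

# Toy usage with random tensors (batch=2, seq=16, 256 experts, small vocab).
aux = moe_aux_losses(torch.randn(2, 16, 256), torch.randn(2, 16, 1000))
```
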
Hottest takes

"33 days for ~20m... Pretty impressive" — mynti
"What exactly does “open” mean... just weights?" — frogperson
"would it run well... or do you need to hold the full model in RAM" — mwcampbell
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.