Ternary Bonsai: Top Intelligence at 1.58 Bits

Pocket-size AI drops jaws — and sparks a "compare fair!" fight

TL;DR: PrismML unveiled tiny 1.58-bit "Ternary Bonsai" models that run fast on Apple devices while scoring near the top despite being much smaller. The thread erupts over fairness of comparisons and real-world adoption, with hardware fans cheering and skeptics asking why big labs aren't using it.

PrismML just dropped Ternary Bonsai, a trio of ultra-compact AI models that squeeze “big brain” vibes into 1.58-bit weights (think three simple levels: -1, 0, +1). The 8B model weighs about 1.75GB yet posts a 75.5 average on standard tests, trailing only a much larger rival while claiming ~9x smaller memory. It flies on Apple gear via MLX—about 82 tokens/sec on an M4 Pro and 27 on an iPhone 17 Pro Max—and ships under the Apache 2.0 license. Fans are calling it “intelligence density” done right; critics are already sharpening knives.
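Where does "1.58 bits" come from, and does the 1.75GB figure check out? A ternary weight carries log2(3) ≈ 1.585 bits of information, and one common packing scheme stores 5 ternary values per byte (3^5 = 243 fits in 256), i.e. 1.6 bits per weight in practice. A back-of-envelope sketch, assuming that packing plus the per-128-weight FP16 scales described in the release (the exact packing PrismML uses is an assumption):

```python
import math

# Each ternary weight carries log2(3) ~ 1.585 bits of information.
# Assumed packing: 5 trits per byte (3**5 = 243 <= 256) -> 1.6 bits/weight.
bits_per_weight = 8 / 5            # 1.6 bits per weight after packing
n_weights = 8e9                    # "8B" parameter count
group_size = 128                   # one FP16 scale per 128 weights

weight_bytes = n_weights * bits_per_weight / 8          # packed ternary codes
scale_bytes = (n_weights / group_size) * 2              # FP16 = 2 bytes each

total_gb = (weight_bytes + scale_bytes) / 1e9
print(f"log2(3) = {math.log2(3):.3f} bits")             # 1.585
print(f"~{total_gb:.2f} GB")                            # ~1.73 GB
```

That lands within a rounding error of the quoted 1.75GB; any gap is plausibly a few layers (e.g. embeddings) kept in higher precision, which ternary schemes typically do.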

The comments? A spectacle. One camp cheers the hardware win—“no multiplications at inference,” promises one enthusiast—imagining dirt-cheap chips running clever chatbots. Another camp throws shade at the charts: “Stop comparing to unquantized models,” scolds a skeptic, accusing the team of padding size wins. A pragmatist piles on: if this is so great, why aren’t the big labs doing it? Meanwhile, a veteran engineer roasts the test suites for being stuck in the 1970s (“hello, Intel 8085”), and dreamers ask what monstrosities we could cram into 20GB if these keep shrinking. The vibe: tiny models, giant debate—is this a true frontier shift or just compression cosplay? Either way, the comments are eating it up.
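The "no multiplications" cheer is less mystical than it sounds: when every weight is -1, 0, or +1, a dot product reduces to adding, subtracting, or skipping each activation. A minimal sketch of the idea (toy code, not PrismML's kernel; a real kernel would apply one per-group scale at the end):

```python
def ternary_dot(weights, activations):
    """Dot product with ternary weights: additions and subtractions only.

    weights: ints in {-1, 0, +1}; activations: floats.
    A per-group scale (omitted here) would multiply the sum once at the end.
    """
    acc = 0.0
    for w, x in zip(weights, activations):
        if w == 1:
            acc += x        # +1 -> add the activation
        elif w == -1:
            acc -= x        # -1 -> subtract it
        # 0 -> skip entirely; zero weights cost nothing
    return acc

print(ternary_dot([1, 0, -1, 1], [0.5, 2.0, 1.5, 1.0]))  # 0.5 - 1.5 + 1.0 = 0.0
```

That's why the hardware camp is excited: adders are far cheaper in silicon and energy than multipliers, which is exactly the "dirt-cheap chips" dream.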

Key Points

  • PrismML released Ternary Bonsai, a true 1.58-bit ternary LLM family in 8B, 4B, and 1.7B sizes.
  • The models use group-wise quantization with {-s, 0, +s} weights and an FP16 scale per 128 weights, achieving ~9× smaller memory than 16-bit models.
  • Ternary Bonsai 8B averages 75.5 on benchmarks (1.75 GB), a 5-point gain over 1-bit Bonsai 8B (70.5, 1.15 GB) with ~600 MB more memory.
  • Among peers, Ternary Bonsai 8B trails only Qwen3 8B (16.38 GB) and performs strongly across MMLU Redux, MuSR, GSM8K, HumanEval+, IFEval, and BFCLv3.
  • On-device performance: 82 toks/sec on M4 Pro (0.105 mWh/tok) and 27 toks/sec on iPhone 17 Pro Max (0.132 mWh/tok); runs via MLX with Apache 2.0 weights.
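The group-wise {-s, 0, +s} scheme in the second bullet can be sketched in a few lines. This uses the "absmean" scale popularized by BitNet b1.58 (whether Ternary Bonsai picks its scale exactly this way is an assumption; the {-s, 0, +s} output shape matches the release notes):

```python
def quantize_group(w, eps=1e-8):
    """Quantize one group of weights to ternary codes plus one scale.

    Scale choice: mean absolute value of the group ("absmean", per
    BitNet b1.58) -- an assumption, stored as FP16 in the real model.
    """
    s = sum(abs(v) for v in w) / len(w)                       # per-group scale
    q = [max(-1, min(1, round(v / (s + eps)))) for v in w]    # codes in {-1,0,1}
    return s, q

def dequantize_group(s, q):
    # Reconstructed weights take only three values: {-s, 0, +s}
    return [s * c for c in q]

group = [0.9, -0.05, -1.1, 0.4]   # toy group; real groups hold 128 weights
s, q = quantize_group(group)
print(s, q)                        # 0.6125 [1, 0, -1, 1]
```

Small weights snap to exactly zero, which is where both the ~9× memory saving and the skip-zero inference trick come from.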

Hottest takes

"Yet again they're comparing against unquantized versions of other models." — wmf
"None of them seem to be using this technique" — mchusma
"no multiplications at inference time" — yodon
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.