Unsloth Dynamic 2.0 GGUFs

Faster local AI sparks speed flexes, flip fears, and SEO side-eye

TLDR: Unsloth’s Dynamic 2.0 promises smaller, faster local AI without losing accuracy. Commenters split three ways: speed bragging, doubts about the modest metric gains, and real-world reports that better settings reduce answer flip‑flops, a divide that highlights the bigger fight over genuine progress versus hype.

Unsloth just dropped “Dynamic 2.0,” a new way to shrink AI models (aka quantization) while keeping answers closer to the original. They claim smarter, per-model compression, new formats that play nice with Apple chips, and a focus on two gut checks: MMLU (a big standardized test) and KL divergence (how much the model’s answers drift). But the comments stole the show.
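
For intuition on that KL divergence gut check, here’s a minimal sketch (not Unsloth’s actual evaluation code): it compares the next-token probability distributions of the full-precision and quantized models, where lower divergence means less answer drift. The distributions below are made up for illustration.

```python
# Minimal sketch of the KL divergence check: compare next-token probability
# distributions from the full-precision model (p) and the quantized model (q).
# Illustrative only; the distributions are hypothetical, not from a real model.
import numpy as np

def kl_divergence(p: np.ndarray, q: np.ndarray, eps: float = 1e-10) -> float:
    """KL(p || q) in nats; eps guards against division by or log of zero."""
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

# Hypothetical next-token distributions over a tiny 4-token vocabulary.
full_precision = np.array([0.70, 0.20, 0.08, 0.02])
quantized = np.array([0.60, 0.25, 0.10, 0.05])

print(f"KL divergence: {kl_divergence(full_precision, quantized):.4f} nats")
# Lower is better: 0.0 would mean the quantized model's token probabilities
# exactly match full precision.
```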

One camp came in hot with speed flexes. Maxious bragged about Qwen3.5 hitting “200k context” at ~63 tokens/sec on a local RTX 5080, and fans like electroglyph cheered the team. For home tinkerers, the promise is simple: smaller files, faster chat, fewer brain farts.

Then the skepticism. Havoc noted the KL gains look modest and asked what that actually means, while jychang side‑eyed the whole thread as “some weird SEO thing.” Unsloth’s own aside that reproducing the MMLU test was “nightmarish” only fueled the drama.

Meanwhile, practitioner tenpa0000 brought receipts: at tiny model sizes, ultra‑compressed settings make yes/no answers flip‑flop, so KL and “flips” matter more than leaderboard scores. Cue jokes about models needing flip‑flops and the classic “is this progress or PR?” brawl. If Dynamic 2.0’s per‑model tuning really holds up, local AI could get snappier without sounding dumber, and that’s why everyone’s loud.
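
And here’s the “flips” idea in miniature, again as a hedged sketch rather than tenpa0000’s actual benchmark: gather yes/no answers from both models on the same questions, then count how often quantization flips an answer the full-precision model got right. All three answer lists below are hypothetical.

```python
# Minimal sketch of the "flips" metric: count yes/no answers that the
# full-precision model gets right but the quantized model flips.
# All three lists are hypothetical, for illustration only.
full_precision_answers = ["yes", "no", "yes", "yes", "no", "yes"]
quantized_answers = ["yes", "no", "no", "yes", "yes", "yes"]
ground_truth = ["yes", "no", "yes", "yes", "no", "yes"]

flips = sum(
    1
    for fp, q, gt in zip(full_precision_answers, quantized_answers, ground_truth)
    if fp == gt and q != fp  # full precision was right; quantization flipped it
)
rate = 100 * flips / len(ground_truth)
print(f"Flips: {flips} of {len(ground_truth)} answers ({rate:.0f}% degraded)")
```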

Key Points

  • Unsloth released Dynamic v2.0, an upgraded quantization method for GGUF and safetensors, claiming improved 5-shot MMLU and KL Divergence.
  • Dynamic v2.0 performs intelligent, per-layer quantization tailored to each model and now supports both MoE and non-MoE architectures.
  • New quant formats (IQ4_NL, Q5_1, Q5_0, Q4_1, Q4_0) were added to optimize performance on Apple Silicon and ARM devices.
  • Unsloth built an evaluation framework to match official 5-shot MMLU for Llama 4 and Gemma 3, comparing Dynamic v2.0 against full precision, QAT, and imatrix GGUF.
  • The team emphasizes KL Divergence and 'flips' as key metrics, avoids calibration data that overfits to Wikipedia so comparisons stay fair, and reports that some models' official 5-shot MMLU scores were hard to replicate.

Hottest takes

"200k context running at 62.98 tokens per second" — Maxious
"Some weird SEO campaign thing?" — jychang
"Q2 starts flipping yes/no answers that Q4 gets right" — tenpa0000
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.