June 22, 2026

Your PC vs the AI refrigerator

Unsloth GLM-5.2 – How to Run Locally

This AI beast can live on a home computer—if your wallet survives the comments

TLDR: GLM-5.2 is a new ultra-powerful AI model that can now run on some top-end home computers instead of only in the cloud. Commenters are split between hype that local AI is catching up fast and skepticism that “it runs” really just means “it barely crawls unless you own a monster machine.”

A new giant AI model called GLM-5.2 just strutted onto the scene, and Unsloth’s guide makes a huge claim about it: it can now be run locally, meaning on your own machine instead of paying a big company every time you ask it a question. That’s the shiny headline. But in the comments, people immediately turned this into a full-blown hardware reality check. One camp was thrilled that a model this powerful can be squeezed down enough to fit on very high-end home setups. Another camp basically said: sure, it “fits,” but does it actually run in any fun or useful way?

That split is where the drama lives. One commenter mourned that their already-ridiculous setup (192GB of RAM plus a 24GB RTX 3090) was still not enough, calling it “so close!” Another dreamed that AMD’s next AI chip might finally make this easy, and even fantasized about near-top-tier AI power for under 2,000 euros. That sparked the bigger hot take: if local AI keeps improving this fast, are big AI companies about to get very nervous?

But the skeptics weren’t joining the victory lap. The spiciest pushback came from people warning that “it can fit” is doing a lot of work here: yes, the model may technically load, but only heavily quantized, and it could still be painfully slow compared with using an online service. There was also some nerdy side-eye over why this model seems so much smaller than rivals, with commenters sniffing around for hidden compromises. In other words: part celebration, part copium, part shopping spree.

Key Points

  • The article says Z.ai’s GLM-5.2 open model has 744B total parameters, of which 40B are active per token, plus a 1M-token context window, and targets coding, reasoning, and agentic tasks.
  • Unsloth Dynamic GGUF quantizations shrink GLM-5.2 from 1.51TB to 239GB for the 2-bit variant and 217GB for the 1-bit variant.
  • The recommended UD-IQ2_M 2-bit quant is described as fitting on a Mac with 256GB of unified memory, and as working on a 24GB GPU plus 256GB of RAM using MoE offloading (keeping the mixture-of-experts weights in system RAM).
  • GLM-5.2 supports non-thinking, high-thinking, and max-thinking modes, configurable in Unsloth Studio or via llama.cpp command-line options.
  • The article reports quantization results based on KL divergence (KLD), stating that the dynamic 4-bit and 5-bit variants are generally lossless, while the 1-bit and 2-bit variants retain about 76.2% and 82% top-1 accuracy, respectively.
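The KLD and top-1 figures in the last point compare a quantized model’s outputs against the unquantized original. Here is a minimal sketch of that kind of comparison, assuming you already have next-token logits from both models at the same positions of the same text: it averages KL(P_base || P_quant) over positions and counts how often the two models agree on the single most likely token. Reading “top-1 accuracy” as that agreement rate is an assumption. The function names are hypothetical, and this is not Unsloth’s actual evaluation code. Only the Python standard library is used.

```python
import math


def softmax(logits):
    """Convert raw logits into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(P || Q) in nats; entries where p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def argmax(values):
    """Index of the largest value (the model's top-1 token)."""
    return max(range(len(values)), key=values.__getitem__)


def compare_to_baseline(base_logits, quant_logits):
    """Mean KLD and top-1 agreement between a baseline and a quantized model.

    base_logits, quant_logits: one list of vocabulary logits per token
    position, produced by the two models on the same input text
    (hypothetical inputs for illustration).
    """
    if len(base_logits) != len(quant_logits) or not base_logits:
        raise ValueError("need the same, non-zero number of positions")
    total_kld = 0.0
    matches = 0
    for b, q in zip(base_logits, quant_logits):
        total_kld += kl_divergence(softmax(b), softmax(q))
        matches += argmax(b) == argmax(q)  # True counts as 1
    n = len(base_logits)
    return total_kld / n, matches / n


if __name__ == "__main__":
    base = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.0]]
    quant = [[1.8, 1.1, 0.2], [1.5, 1.2, 0.0]]  # second position flips its top token
    mean_kld, top1 = compare_to_baseline(base, quant)
    print(f"mean KLD = {mean_kld:.4f} nats, top-1 agreement = {top1:.0%}")
```

A mean KLD near zero means the quantized model’s next-token distributions barely differ from the baseline’s, which is the sense in which a quant can be called nearly lossless. Real evaluations run this over many thousands of positions and full vocabularies, typically with an array library rather than Python lists.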

Hottest takes

"So close! My machine with 192GB RAM + RTX 3090 24GB can almost run this" — xrd
"I am very excited for local LLMs... we may have GPT 5.5-xhigh level of performance for under 2000 EUR" — zuzululu
"'it can fit' on 256GB of RAM, but it will be heavily quantized and still run very slowly" — skiing_crawling
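Underneath, these takes are arguments about arithmetic. The rough sketch below makes some assumptions: every size is in decimal gigabytes, and the KV cache, context length, and OS or runtime overhead are all ignored. It works out the average bits stored per parameter from the sizes in the Key Points, then checks whether the weights alone fit in each memory setup mentioned above. The dictionary names and helper functions are illustrative and are not part of Unsloth or llama.cpp.

```python
# Back-of-the-envelope sizing using the figures quoted in this article.
# Assumption: all sizes are decimal GB (1 GB = 1e9 bytes); overhead is ignored.
PARAMS = 744e9  # total parameters reported for GLM-5.2

SIZES_GB = {
    "original": 1510.0,  # 1.51TB
    "UD-IQ2_M (2-bit)": 239.0,
    "1-bit": 217.0,
}

# Memory setups mentioned in the article and comments (hypothetical labels).
SETUPS_GB = {
    "256GB unified-memory Mac": 256.0,
    "24GB GPU + 256GB RAM": 24.0 + 256.0,
    "RTX 3090 24GB + 192GB RAM": 24.0 + 192.0,
}


def bits_per_weight(size_gb, params=PARAMS):
    """Average number of bits stored per parameter for a file of this size."""
    return size_gb * 1e9 * 8 / params


def fits(size_gb, memory_gb):
    """Crude check: do the weights alone fit in the combined memory?

    Ignores KV cache, context length, and OS/runtime overhead.
    """
    return size_gb <= memory_gb


if __name__ == "__main__":
    for name, size in SIZES_GB.items():
        print(f"{name}: {bits_per_weight(size):.2f} bits/weight")
    for setup, mem in SETUPS_GB.items():
        for name in ("UD-IQ2_M (2-bit)", "1-bit"):
            verdict = "fits" if fits(SIZES_GB[name], mem) else "does not fit"
            print(f"{setup} / {name}: {verdict}")
```

Under these assumptions, the 1.51TB original works out to about 16.2 bits per weight, which is consistent with 16-bit weights. The 2-bit and 1-bit files average about 2.57 and 2.33 bits, presumably because dynamic quantization keeps some layers at higher precision. If both numbers use the same units, the 24GB + 192GB machine totals 216GB, 1GB short of even the 217GB file before any KV cache is counted. If the file sizes are decimal GB while the RAM is counted in binary GiB, the picture shifts, so treat this as a rough check rather than a verdict.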
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.