Running local models on an M4 with 24GB memory

This tiny AI setup impressed tinkerers, but the comments turned into a memory-size brawl

TLDR: A user got a small offline AI assistant running decently on a 24GB M4 MacBook, showing local AI is possible on a regular laptop. But the comments stole the show, with readers fighting over memory limits, missing speed numbers, and whether this saves money or just wastes time.

A brave tinkerer ventured into local AI land with a 24GB M4 MacBook Pro and declared a small win: yes, you really can run a useful offline assistant on it, with no internet and a little less reliance on Big Tech. The setup that got the most love was a smaller Qwen model running fast enough for basic coding, planning, and research. But if the post was meant to be a celebration, the comment section instantly turned it into a nerdy reality check.

One of the first reactions was pure detective energy: wait, does an M4 even come with 24GB? That sparked instant confusion, with readers wondering if the machine specs were wrong before they even got to the AI part. Then came the speed police, demanding the one metric everyone cares about: how many tokens per second does this thing actually spit out? Nothing gets tech commenters fired up faster than missing benchmark numbers.

And then the real drama hit: is local AI on a laptop actually worth it, or is this just an expensive hobby? Some commenters were cautiously impressed, saying today’s smaller models feel about as good as the best systems from a year ago. Others were brutally unimpressed, basically saying 24GB is "cute" but nowhere near enough for serious work. One person flat-out warned that chasing local AI can become a time sink where tweaking the setup eats more hours than the work itself. Meanwhile, another commenter was already doing the most relatable math on earth: can a future 128GB Mac finally kill their monthly AI subscription bill? The vibe was equal parts hopeful, skeptical, and gloriously obsessed.

Key Points

  • The article describes the challenges of setting up local AI inference on a 24GB M4 MacBook Pro, including choosing runtimes, fitting models into memory, and tuning inference settings (a rough memory-math sketch follows this list).
  • The author tested multiple models—Qwen 3.6 Q3, GPT-OSS 20B, Devstral Small 24B, and Gemma 4B—and found that some fit in memory but were not practically usable, while Gemma 4B lacked strong tool use.
  • Qwen 3.5-9B in 4-bit quantization was identified as the most workable model, delivering roughly 40 tokens per second, tool use, thinking mode, and a 128K context window in LM Studio.
  • The article provides specific recommended parameters for coding in thinking mode and notes that enabling thinking in LM Studio required editing the prompt template.
  • Example configurations are included for using the setup with pi and OpenCode (a hedged client sketch appears below the list), and the article concludes that this local model remains well below state-of-the-art systems for complex long-duration tasks.
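
To put the memory constraint in perspective, here is a back-of-the-envelope sketch of how quantized weights and KV cache add up against 24GB of unified memory. The model shape, quantization overhead, and context length below are illustrative assumptions, not figures from the article.

```python
# Back-of-the-envelope check: do quantized weights plus KV cache fit in 24GB
# of unified memory? All numbers are rough estimates for illustration.

def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context_tokens: int, bytes_per_value: int = 2) -> float:
    """Approximate KV-cache size in GB (keys + values, fp16)."""
    return 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_value / 1e9

# Hypothetical shapes for a ~9B dense model (not taken from the article).
weights = weights_gb(params_billion=9, bits_per_weight=4.5)   # ~4-bit plus overhead
cache = kv_cache_gb(layers=36, kv_heads=8, head_dim=128, context_tokens=32_000)

print(f"weights ~= {weights:.1f} GB, KV cache ~= {cache:.1f} GB, "
      f"total ~= {weights + cache:.1f} GB of ~24 GB unified memory")
```

Under these assumptions the weights land around 5GB and a 32K-token cache adds roughly another 5GB; stretching toward the full 128K window about quadruples that second term, which is why long contexts, not the weights themselves, are the real squeeze on a 24GB machine.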

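For the agent-integration piece, the article's pi and OpenCode configurations are not reproduced here; below is only a minimal sketch of the general pattern they rely on: an OpenAI-compatible client pointed at LM Studio's local server (default http://localhost:1234/v1). The model id is a hypothetical placeholder, and the sampling values are Qwen's commonly published thinking-mode defaults rather than the article's recommended parameters.

```python
# Minimal sketch: talk to a model served by LM Studio over its local
# OpenAI-compatible endpoint. Assumes `pip install openai` and that a model
# is loaded with the local server running in LM Studio.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",  # LM Studio's default local endpoint
    api_key="lm-studio",                  # any non-empty string works locally
)

response = client.chat.completions.create(
    model="qwen-9b-4bit",  # hypothetical id; use the name LM Studio displays
    messages=[
        {"role": "system", "content": "You are a concise coding assistant."},
        {"role": "user", "content": "Write a shell one-liner to count lines of Python in this repo."},
    ],
    temperature=0.6,  # placeholder sampling values, not the article's settings
    top_p=0.95,
    max_tokens=1024,
)

print(response.choices[0].message.content)
```

Coding agents that speak this chat-completions protocol can typically be pointed at the same local endpoint, which is the general shape of the integrations the article walks through.
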
Hottest takes

"The M4, as far as I know, doesn’t have 24GB" — NBJack
"24GB is just a bit short" — canpan
"your time is not free" — sourc3