May 12, 2026

Small model, huge comment chaos

Show HN: Needle: We Distilled Gemini Tool Calling into a 26M Model

Tiny AI wowed builders, but the comments instantly turned into a demo-and-legal fight

TLDR: Needle is a very small AI model that claims it can turn plain-English requests into app actions (tool calls) on everyday computers, which could make personal AI much cheaper and easier to run. Commenters were impressed but quickly turned the launch into a mix of bug reports, product ideas, demands for a public demo, and legal worries over Google’s terms of service.

A tiny new AI project called Needle rolled onto Hacker News with a huge promise: a model small enough to run and even be retrained on an ordinary laptop, yet still good at turning plain-English requests into app actions like “check the weather.” That alone got people excited, because the dream here is obvious even to non-experts: smarter phones, gadgets, command lines, and personal tools without needing a giant cloud service hovering over everything.

But the real show was in the comments, where the community immediately split into three classic internet camps: the builders, the nitpickers, and the hall monitors. One early reaction came from Simon Willison, who tried the README and instantly hit links that weren't publicly accessible, basically turning the launch into a mini “works on my machine” scandal. He then twisted the knife with a very Hacker News suggestion: if it’s really that small, why not put up a live demo on a cheap server and let everyone poke at it?

Meanwhile, some readers were already dreaming up weird and wonderful uses, like turning natural language into command-line arguments or plugging it into a text-based game system. That sparked the funnier side of the thread: excitement mixed with the eternal developer panic of “great, now every app is going to ship with another 14 MB just to parse text.”

Then came the spiciest twist: one commenter warned that distilling Gemini may violate Google’s terms of service, abruptly yanking the mood from scrappy open-source triumph to possible legal side-eye. So yes, people loved the tiny-AI ambition, but they also wanted working links, public proof, and maybe a lawyer on standby.

Key Points

  • Needle is described as a 26M-parameter Simple Attention Network distilled from Gemini 3.1 for single-shot tool calling.
  • The project says the model is fully open, including weights and dataset generation, and can be fine-tuned locally on a Mac or PC.
  • The project reports production performance on Cactus of 6,000 tokens/sec prefill and 1,200 tokens/sec decode.
  • Reported training: 200B pretraining tokens on 16 TPU v6e chips in 27 hours, followed by 2B post-training tokens in 45 minutes.
  • The post includes quickstart, Python, and CLI workflows for inference, fine-tuning, training, evaluation, tokenization, synthetic data generation, and TPU management.
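To put the reported numbers in perspective, here is a back-of-envelope sketch in Python: the average training throughput implied by the quoted token counts and wall-clock times, plus a rough per-call latency at the quoted Cactus speeds. It assumes throughput was spread evenly across all 16 TPUs with no overhead, and the 300-token prompt and 40-token tool call are illustrative guesses, not figures from the post.

```python
# Back-of-envelope arithmetic using only the figures quoted in the key points.
PRETRAIN_TOKENS = 200e9      # 200B pretraining tokens
PRETRAIN_HOURS = 27
NUM_TPUS = 16                # 16 TPU v6e
POSTTRAIN_TOKENS = 2e9       # 2B post-training tokens
POSTTRAIN_MINUTES = 45
PREFILL_TPS = 6_000          # reported Cactus prefill, tokens/sec
DECODE_TPS = 1_200           # reported Cactus decode, tokens/sec


def tokens_per_second(tokens: float, seconds: float) -> float:
    """Average throughput over the whole run (ignores warmup, checkpoints, etc.)."""
    return tokens / seconds


pretrain_tps = tokens_per_second(PRETRAIN_TOKENS, PRETRAIN_HOURS * 3600)
# Assumes the work was split evenly across all chips.
per_chip_tps = pretrain_tps / NUM_TPUS
posttrain_tps = tokens_per_second(POSTTRAIN_TOKENS, POSTTRAIN_MINUTES * 60)


def call_latency_ms(prompt_tokens: int, output_tokens: int) -> float:
    """Hypothetical latency for one call: prefill the prompt, then decode the output."""
    return 1000 * (prompt_tokens / PREFILL_TPS + output_tokens / DECODE_TPS)


if __name__ == "__main__":
    print(f"pretraining: {pretrain_tps:,.0f} tokens/s total, {per_chip_tps:,.0f} per TPU")
    print(f"post-training: {posttrain_tps:,.0f} tokens/s")
    # 300-token prompt and 40-token tool call are illustrative, not from the post.
    print(f"example call: {call_latency_ms(300, 40):.1f} ms")
```

Run as a script, it prints about 2,057,613 tokens/s aggregate (roughly 128,601 per TPU) for pretraining, about 740,741 tokens/s for post-training, and about 83.3 ms for the example call. Real runs would differ once overhead and uneven utilization are counted.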

Hottest takes

"Looks like you need to open up access" — simonw
"it could be pretty bad if everyone started doing that" — ilaksh
"distilling Gemini is explicitly against the ToS" — ac29
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.