April 14, 2026

AI speaks gibberish, devs scream

The M×N problem of tool calling and open-source models

Developers are losing their minds over AI ‘secret languages’ and nobody can agree what to do about it

TLDR: The article says every AI model uses its own weird “tool language,” forcing developers to constantly rebuild translators, and some see this as a major hidden crisis. Commenters are split between demanding a shared standard, betting on existing formats or protocols, and dismissing it as a fixable nuisance.

The article explains a very nerdy problem: every AI chatbot speaks its own private “tool language,” so apps have to build custom translators for each one. But the real show is in the comments, where the crowd splits into camps faster than you can say “JSON.” One reader grumbles that the real bug is… the layout: that giant code indent was so annoying they turned on “hide distractions” just to cope. Priorities.

Others treat the piece like a wake‑up call. One commenter calls it “one of the most relevant posts of the year,” basically saying: this is the boring plumbing that could make or break the AI boom. Why, they ask, hasn’t the industry converged on one sane format already? Another wonders if OpenAI’s Harmony format is supposed to be the chosen one, waiting for its prophecy to be fulfilled in the next model generation.

Then the skeptics wade in. One shrugs and says they “fail to see why this is such a problem,” arguing that writing a library to handle multiple formats is trivial and modern AI could probably write it itself in an afternoon. Another asks if this isn’t exactly what MCP (Model Context Protocol) was meant to solve. The result: a classic internet brawl between “this is a crisis” and “lol just write the parser.”

Key Points

  • Open-source model tool calling depends on model-specific wire formats; unsupported formats lead to garbled outputs and missing tool calls.
  • Each model family encodes tool calls differently (e.g., gpt-oss/Harmony, DeepSeek, GLM5), forcing engines to implement custom parsers and generation logic.
  • Real-world issues (e.g., Gemma 4 in vLLM) show reasoning tokens being stripped or leaking into arguments, and generic parsers failing, prompting dedicated implementations (e.g., llama.cpp).
  • Generic parsing heuristics break because formats vary widely (Harmony’s <|channel|> with to=, GLM5 non-JSON arguments), leaving a long tail of per-model bugs.
  • Both grammar engines (generation-time constraints) and output parsers (post-generation extraction) require the same model-specific format knowledge, indicating a missing separation of concerns and an ongoing M×N burden: M model formats times N inference engines, each pairing needing its own implementation.
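The per-model dispatch the points above describe can be sketched in a few lines. This is a minimal illustration, not any real engine's code: the sample strings, parser functions, and registry names are all hypothetical, and the Harmony-style line only loosely mimics the `<|channel|>` / `to=` shape mentioned above.

```python
import json
import re

# Hypothetical raw model outputs: each family emits tool calls in its own wire format.
HARMONY_SAMPLE = '<|channel|>commentary to=get_weather<|message|>{"city": "Berlin"}'
PLAIN_JSON_SAMPLE = '{"name": "get_weather", "arguments": {"city": "Berlin"}}'

def parse_harmony_style(raw: str) -> dict:
    # Harmony-like shape: the tool name rides in a `to=` field on the channel
    # header, and the JSON arguments follow the message marker.
    m = re.search(r"to=(\S+)<\|message\|>(.*)", raw, re.DOTALL)
    if not m:
        raise ValueError("no tool call found")
    return {"name": m.group(1), "arguments": json.loads(m.group(2))}

def parse_plain_json(raw: str) -> dict:
    # A simpler family: the whole tool call is one JSON object.
    obj = json.loads(raw)
    return {"name": obj["name"], "arguments": obj["arguments"]}

# The M×N burden in miniature: every inference engine ends up keeping a
# registry like this, one hand-written entry per model family it supports.
PARSERS = {
    "harmony-style": parse_harmony_style,
    "plain-json": parse_plain_json,
}

def extract_tool_call(model_family: str, raw: str) -> dict:
    parser = PARSERS.get(model_family)
    if parser is None:
        # This is the failure mode the article describes: an unsupported
        # format means garbled output or silently dropped tool calls.
        raise ValueError(f"unsupported tool-call format for {model_family}")
    return parser(raw)
```

The sketch also hints at why generic heuristics fail: a regex tuned for one family's markers has nothing to grab onto in another family's output, so the registry (and its long tail of per-model bugs) keeps growing.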

Hottest takes

"One of the most relevant posts about AI on HN this year" — airstrike
"I fail to see why this is such a problem" — evelant
"Isn't this supposed to be the point of MCP?" — jiehong
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.