AI companies charge you up to 60% more based on your language, thanks to BPE tokens

Internet erupts: “Language tax!” vs “It’s just math”

TLDR: A viral post says non‑English users pay up to 60% more because AI bills by tiny “tokens” that vary by company, making the same text cost more in some languages. Comments split between “hidden language tax” outrage and “it’s just cost math,” with calls for CJK data and transparency.

An explosive newsletter claims AI companies charge non‑English speakers up to 60% more because they bill by tiny text chunks called “tokens,” not words. It name‑drops different token counters (OpenAI’s, Google’s, Anthropic’s) and even calls out Anthropic’s “black box” billing. It also flaunts a wild 420× price gap between fancy and bargain models. Cue the comment war.

One camp is furious, dubbing it a “language tax” and arguing the system quietly punishes anyone not typing in American English. They want receipts too: one commenter deadpanned that the piece didn’t even include Chinese/Japanese/Korean stats, implying the gap could be worse. Another roasted the tone as near‑satire. The mood: suspicious, spicy, and very online.

But the pushback is loud. Skeptics say this is obvious economics, not a conspiracy: tokens cost compute, and compute costs money. As one wag put it, “pay by token is priced by token; news at 11,” comparing it to French publishers paying more for paper. Engineers chimed in that billing scales linearly with tokens, so a bigger bill simply reflects a bigger serving cost, full stop.

In the middle, confused readers asked why the same sentence can cost different amounts across companies. The explainer: each company chops text differently, so “unbelievable” might be 2, 3, or 4 tokens depending on who’s counting. That’s where the drama lives—less about math, more about opacity and fairness. Is it a hidden tax or just poorly explained pricing? The comments are split down the middle, and the memes are writing themselves.
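
For the curious, here's that counting game in a minimal sketch using OpenAI's open-source tiktoken library (the same one name-dropped in the post). The encoding names are real OpenAI ones, but the sample strings are just illustrative; Google's and Anthropic's tokenizers would slice the same text their own way.

```python
# Token counts depend on which tokenizer does the counting.
# Requires: pip install tiktoken
import tiktoken

samples = [
    "unbelievable",
    "The same sentence, priced three ways.",
    "La misma oración, cobrada de tres maneras.",  # Spanish, for comparison
]

# Two real OpenAI encodings; other vendors use entirely different schemes.
for name in ("cl100k_base", "o200k_base"):
    enc = tiktoken.get_encoding(name)
    for text in samples:
        print(f"{name}: {len(enc.encode(text)):2d} tokens for {text!r}")
```

Run it and the same strings come back with different counts per encoding, which is exactly the cross-vendor confusion those readers were describing.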

Key Points

  • AI model usage is billed by tokens (subword units), not words, and token counts vary by provider tokenizer.
  • Different tokenizers (e.g., OpenAI’s tiktoken, Google’s SentencePiece, Anthropic’s proprietary method) split text differently, changing costs.
  • There is no standardized token definition; providers use distinct, sometimes opaque, tokenizers and vocabularies.
  • Non‑English languages generally tokenize into more tokens per word, leading to higher relative costs (e.g., Spanish ~1.6×, Hindi ~4.9× vs English; worked through in the sketch after this list).
  • As of March 2026, per‑million‑token prices vary widely across models, with a cited 420× gap between the cheapest and the most expensive options.
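
To make those multipliers concrete, here's a back-of-envelope sketch of how linear token billing turns per-language token ratios into bill ratios. The per-million-token price is a hypothetical placeholder, not any provider's real rate; the multipliers are the ones cited above.

```python
# Linear billing: cost = tokens / 1e6 * price per million tokens.
PRICE_PER_MILLION = 10.00  # hypothetical USD rate, purely illustrative

BASELINE_TOKENS = 1_000_000  # a month of usage, measured in English tokens
MULTIPLIERS = {"English": 1.0, "Spanish": 1.6, "Hindi": 4.9}  # cited above

for language, factor in MULTIPLIERS.items():
    tokens = BASELINE_TOKENS * factor
    cost = tokens / 1_000_000 * PRICE_PER_MILLION
    print(f"{language:8s} {tokens:>11,.0f} tokens -> ${cost:,.2f}")
```

Same text, same hypothetical rate: the Hindi bill comes out roughly 4.9× the English one, which is the entire "language tax" complaint in three lines of arithmetic.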

Hottest takes

"Funny they didn't include any CJK languages on their list." — Mindless2112
"‘Pay by token’… news at 11?" — lxgr
"You are charged more because it was more expensive to handle the request." — charcircuit