All of human cooking compressed into 2 megabytes

Scientists say they squeezed the world’s recipes tiny — commenters say, not so fast

TLDR: Researchers trained a tiny model on millions of recipes to map which ingredients go well together across cultures. Commenters liked the idea but roasted the headline, arguing it compressed ingredient relationships — not "all of human cooking" — and pounced on the paper’s wording.

A research team says it packed a huge chunk of the world’s food knowledge into a tiny 2-megabyte model by studying 4.14 million recipes in multiple languages and reducing messy ingredient names to 1,790 canonical ingredients. In plain English: they taught a small system that tomatoes and beef tend to belong together, garlic loves a crowd, and ingredients have relationship drama across cuisines. But in the comments, the real heat wasn’t in the kitchen; it was over the title.

Several readers instantly called out the headline as way too dramatic. One blunt reaction: this is not “all of human cooking,” because cooking is more than ingredient lists — it’s technique, timing, quantity, heat, and all the other stuff that stops dinner from becoming chaos. One commenter suggested the paper really compressed ingredients, not cooking itself, and that became the thread’s main reality check. Another person dryly dropped “Neat” and shared their own recipe schematic project, which gave the whole discussion strong “cool idea, oversold trailer” energy.

Then came the nitpickers, because of course they did. One reader zoomed in on the paper’s description of its AI classification as “deterministic” and basically said: running a model at low temperature reduces randomness, but that is not the same as being deterministic. Another pointed out the work came from a startup building automated restaurants, which added a little futuristic spice. The vibe? Genuinely impressed by the food data, deeply suspicious of the marketing, and very ready to dunk on imprecise wording.
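
For readers who want to see the nitpick in action, here is a minimal sketch of temperature sampling. The logits, the temperature of 0.2, and the function names are made up for illustration; this summary does not describe how the paper's classifier is actually configured.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample(logits, temperature, rng):
    """Draw one label index at random according to the tempered probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

def greedy(logits):
    """Always pick the highest-scoring label: same input, same output."""
    return max(range(len(logits)), key=lambda i: logits[i])

# Hypothetical scores for three candidate labels (assumption, not paper data).
logits = [2.0, 1.8, 0.5]

# Low temperature concentrates probability on the top label...
probs_low = softmax_with_temperature(logits, 0.2)

# ...but the runner-up keeps a nonzero chance, so repeated draws can disagree.
rng = random.Random(0)
draws = {sample(logits, 0.2, rng) for _ in range(1000)}
```

With two close scores, the runner-up still gets picked some of the time at low temperature, while `greedy` returns the same index on every call. That gap is the commenter's point.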

Key Points

  • Epicure is a family of three skip-gram ingredient embedding models trained from scratch on a multilingual recipe corpus.
  • The dataset aggregates 4.14 million recipes from 11 sources spanning multiple languages, with ingredient strings normalized to 1,790 canonical entries.
  • The pipeline uses a 203,508-edge ingredient-ingredient graph weighted by normalized pointwise mutual information (NPMI) and an 80,019-edge typed FlavorDB ingredient-compound graph.
  • The FlavorDB graph includes 2,247 typed compound nodes across 15 categories.
  • Three Metapath2Vec variants—Cooc, Chem, and Core—use different random-walk schemas to emphasize recipe co-occurrence, chemistry, or a blend of both.
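
To make the NPMI graph above a little more concrete, here is a rough sketch of how NPMI edge weights can be computed from recipe co-occurrence. The toy recipes and the `npmi_edges` helper are hypothetical, and the paper's actual pipeline (thresholds, smoothing, normalization) is not described in this summary.

```python
import math
from collections import Counter
from itertools import combinations

# Toy recipes; ingredient names are illustrative, not drawn from the paper's data.
recipes = [
    {"tomato", "beef", "garlic"},
    {"tomato", "beef", "onion"},
    {"garlic", "onion", "butter"},
    {"tomato", "garlic", "basil"},
]

def npmi_edges(recipes):
    """Return {(a, b): npmi} for every ingredient pair that co-occurs at least once."""
    n = len(recipes)
    single = Counter()  # number of recipes containing each ingredient
    pair = Counter()    # number of recipes containing each unordered pair
    for r in recipes:
        single.update(r)
        pair.update(combinations(sorted(r), 2))
    edges = {}
    for (a, b), count in pair.items():
        p_ab = count / n
        p_a = single[a] / n
        p_b = single[b] / n
        if p_ab == 1.0:
            # Pair appears in every recipe; -log(1) is 0, so use the limit value.
            edges[(a, b)] = 1.0
            continue
        pmi = math.log(p_ab / (p_a * p_b))
        # Dividing by -log p(a, b) bounds the score to the range [-1, 1].
        edges[(a, b)] = pmi / -math.log(p_ab)
    return edges

edges = npmi_edges(recipes)
```

In this toy data, `edges[("beef", "tomato")]` comes out positive because beef only ever appears alongside tomato, while `edges[("garlic", "tomato")]` is slightly negative because both are common and overlap less than chance would predict. Pairs that never co-occur, such as basil and butter, get no edge at all.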

Hottest takes

"I don't see why the title needs to be quite so grandiose" — suddenlybananas
"There is little to substantively nothing about the actual cooking" — epsteingpt
"low-temperature is not the same thing as deterministic" — Retr0id