Training a trillion parameter model to be funny

Silicon Valley’s joke-bot hits the stage — audience says “meh”

TL;DR: A researcher trained a huge AI to be “funny” using checklists instead of gut feelings. The crowd didn’t laugh, calling the jokes niche and stiff, while veterans say making AI genuinely funny is still unsolved — a reality check on machine-made humor’s limits.

A researcher just tried to teach a mega-sized AI to be funny by grading “funny” with checklists like relevance, recency, and commitment — think report cards for jokes. It’s called rubric-based reinforcement learning (RL), which means training by scoring specific traits instead of vibes. The bot’s punchlines? Weird, brainy monologues about corporate expense software as a hungry alien and mythical chips that demand “rituals.” Tech Twitter chuckled. The comments did not.

The loudest take: it’s not funny. One user deadpanned that the jokes land like a spreadsheet, while another slammed the material as “90% about AI and Silicon Valley,” fit only for Astral Codex Ten diehards. A veteran researcher jumped in with receipts: they tried homonym puns, double entendres, comedian transcripts, even crowdsourced joke ratings — and still couldn’t make the machine funny. That’s the drama — the internet is calling time on AI stand-up.

But there’s a twist. A few optimists say some models can do sharp one-liners, pointing to quips about “job security” and human ideas outpacing research. Meanwhile, a dream-post imagines a dystopia where bots keep us alive just to harvest our jokes, with a global open mic and a timer ticking down for humanity. The vibe: ambitious experiment, niche laughs, big debate over whether humor can be engineered at all.

Key Points

  • The author explores training a model for humor by decomposing “funny” into verifiable attributes rather than subjective judgments.
  • Tinker enabled post-training of Moonshot’s Kimi K2 (a 1T-parameter model) for this experiment.
  • Moonshot previously applied rubric-based RL to improve creative writing using rubrics like clarity, engagement, and tone.
  • For humor, the author’s rubrics emphasize recency, relevance, specificity (e.g., named entities, numbers), and writing without hedging as proxies for comedic quality.
  • The article provides sample outputs demonstrating the post-trained model’s style and specificity but does not present formal evaluation metrics.
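To make the rubric idea concrete: the core move is turning “funny” into a handful of checkable attributes and collapsing them into a scalar reward. The sketch below is a hypothetical illustration, not the author’s actual implementation — the checks (digit-based specificity, a toy hedging-word list, a length bound) are stand-ins for the article’s recency/relevance/specificity/no-hedging rubrics.

```python
import re

# Toy hedging vocabulary — an assumption for illustration, not the
# article's actual list.
HEDGES = {"maybe", "perhaps", "somewhat", "arguably", "kind of", "sort of"}

def rubric_score(joke: str) -> float:
    """Score a joke in [0, 1] as the fraction of rubric checks it passes.

    Each check is a cheap, verifiable proxy for one rubric attribute,
    in the spirit of 'report cards for jokes'.
    """
    lowered = joke.lower()
    words = lowered.split()
    checks = {
        # Specificity proxy: mentions a number or a capitalized,
        # named-entity-like token beyond the first word.
        "specific": bool(re.search(r"\d", joke))
        or any(w[:1].isupper() for w in joke.split()[1:]),
        # Commitment proxy: no hedging language.
        "committed": not any(h in lowered for h in HEDGES),
        # Punchiness proxy: short enough to be a one-liner, not a monologue.
        "punchy": 5 <= len(words) <= 40,
    }
    return sum(checks.values()) / len(checks)
```

In a rubric-based RL loop, a score like this would serve as the reward for each sampled completion, so the policy is pushed toward specific, committed, punchy outputs rather than toward a single subjective “was it funny?” judgment.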

Hottest takes

"these really aren't very funny" — suddenlybananas
"90% about AI and silicon valley, understandable only to people who subscribe to astralcodexten" — gipp
"We could not make it funny" — whacked_new
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.