microgpt

200-line DIY mini AI drops: fans swoon, skeptics ask if it runs on your laptop

TLDR: Karpathy dropped microgpt, a 200‑line Python script that trains a tiny text model to make new name-like words. Comments are split between applause and questions about real-world use, licensing, and whether a normal laptop can train it quickly, turning the thread into a showdown of AI basics vs. practicality.

Andrej Karpathy just launched microgpt, a tiny 200-line Python file that trains a baby-sized text model to invent new names, no extras needed. The code packs everything—data, tokenizer, model, optimizer—into one tidy script, and you can peek at it on his page. The demos spit out names like “karia” and “vialan,” and the vibe split instantly: some users sighed “Beautiful work,” calling it art; others demanded, “What is the prime use case”—toy, teaching tool, or secret startup seed?

Then the practical brigade stormed in: can this run “on a consumer grade laptop… in less than a week”? Performance hawks circled, pitching a “language shootout” to see whose version—Python, Rust, you name it—wins. Legal eagles chimed in with, “Which license is being used for this?” Meanwhile, the meme machine spun up: folks joked about baby showers themed after generated names and rebranding their side projects to “Keylen.” Beneath the jokes is a real tension: the romance of simplicity vs. the grind of production. microgpt isn’t a ChatGPT replacement—it’s a clear, bite-sized window into how these models tick, and the community is equal parts smitten, suspicious, and itching to benchmark. Grab popcorn—and maybe a newborn named “Kamon.”

Key Points

  • microgpt is a single-file (~200 lines) pure Python implementation that trains and runs a GPT-like model with no external dependencies.
  • The script includes a dataset handler, character-level tokenizer with a BOS token, autograd engine, GPT-2-like architecture, Adam optimizer, and training/inference loops.
  • Code is available as a GitHub gist (microgpt.py) and on a web page at karpathy.ai/microgpt.html.
  • The dataset comprises ~32,000 names (one per line); after training, the model generates plausible new name-like examples.
  • The guide contrasts the simple character-level tokenizer with production tokenizers such as tiktoken used by GPT-4.
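The character-level tokenizer with a BOS token mentioned above can be sketched roughly as follows. This is an illustrative reconstruction based on the description, not Karpathy's actual code; the class name and the choice of id 0 for BOS are assumptions.

```python
# Minimal sketch of a character-level tokenizer with a BOS token,
# in the spirit of the description above (not the actual microgpt code).

class CharTokenizer:
    def __init__(self, text):
        # Vocabulary: every unique character in the corpus, plus BOS at id 0.
        chars = sorted(set(text))
        self.bos_id = 0
        self.stoi = {ch: i + 1 for i, ch in enumerate(chars)}
        self.itos = {i: ch for ch, i in self.stoi.items()}

    @property
    def vocab_size(self):
        return len(self.stoi) + 1  # +1 for the BOS token

    def encode(self, name):
        # Prepend BOS so the model can learn where a name starts.
        return [self.bos_id] + [self.stoi[ch] for ch in name]

    def decode(self, ids):
        # Drop BOS when turning ids back into text.
        return "".join(self.itos[i] for i in ids if i != self.bos_id)


corpus = "emma\nolivia\nava\nkaria\nvialan"
tok = CharTokenizer(corpus)
ids = tok.encode("karia")
print(ids[0] == tok.bos_id)   # True: BOS comes first
print(tok.decode(ids))        # round-trips back to "karia"
```

A production tokenizer such as tiktoken instead maps multi-character byte-pair merges to ids, which is why GPT-4's vocabulary is tens of thousands of tokens rather than a few dozen characters.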

Hottest takes

"What is the prime use case" — tithos
"Which license is being used for this?" — ViktorRay
"on a consumer grade laptop... in less than a week" — profsummergig
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.