December 18, 2025
Code by bots, rage by humans
How I wrote JustHTML, a Python-based HTML5 parser, using coding agents
AI-made HTML parser drops; praise, 'port' accusations, and a 'not 100%' test fight
TLDR: An AI‑assisted Python HTML parser claims full test compliance and a sleek, no‑dependency design. Commenters quickly split the room: supporters cheer the ambition, while critics call it a Rust‑to‑Python port and dispute the “100%” claim with failing tests—turning the launch into a live fact‑check of AI‑built code.
JustHTML landed with a bang: a tiny Python library that reads messy webpages, built with coding agents (AI helpers) and boasting “100%” of the big HTML5 test suite, zero add‑ons, and even a search-by-CSS feature. The dev’s saga—17 steps, a pit stop rewriting the tokenizer in Rust, then a crisis over whether the world needs another parser—had readers hooked. Early hype from simonw called it “neat” and fully-tested, but the celebration didn’t last.
Commenters lit the fuse. minusf questioned originality, calling it a likely Python port of Rust’s html5ever and demanding clearer credit. Aloisius ran the tests and reported only 88.6% passing, poking holes in the “100%” story with example errors. furyofantares dunked on the blog’s AI-flavored formatting (“17 tiny sections?”). Amid the fire, one brave soul asked for a Postgres database plug‑in—because why not parse the web inside your spreadsheets. The author’s cheeky nod to the “adoption agency algorithm” and its “Noah’s Ark” rule (keep only three of a kind) spawned two‑by‑two memes, but the thread’s vibe was clear: cool demo, but receipts, please. Fans love the hustle and the no-dependency design; skeptics want real numbers, clearer credit, and fewer AI vibes in the write‑up.
Key Points
- •JustHTML is a pure-Python HTML5 parser with zero dependencies and a CSS selector query API.
- •The author claims JustHTML passes 100% of the html5lib test suite using extensive test-driven development.
- •Complex HTML5 parsing challenges are addressed, notably the adoption agency algorithm and the Noah’s Ark clause.
- •Development used coding agents (GitHub Copilot Agent mode, Claude Sonnet 3.7) with automated iteration against html5lib-tests.
- •Performance work included a Rust-based tokenizer to slightly surpass html5lib speed, and consideration of html5ever, with a pivot to a pure-Python approach.