How I wrote JustHTML, a Python-based HTML5 parser, using coding agents

AI-made HTML parser drops; praise, 'port' accusations, and a 'not 100%' test fight

TLDR: An AI‑assisted Python HTML parser claims full test compliance and a sleek, no‑dependency design. Commenters quickly split into camps: supporters cheer the ambition, while critics call it a Rust‑to‑Python port and dispute the “100%” claim by pointing to failing tests, turning the launch into a live fact‑check of AI‑built code.

JustHTML landed with a bang: a tiny Python library for parsing messy webpages. It was built with coding agents (AI helpers) and boasts a “100%” pass rate on the html5lib test suite, zero dependencies, and even CSS-selector search. The dev’s saga had readers hooked: 17 steps, a pit stop rewriting the tokenizer in Rust, then a crisis over whether the world needs another parser. Early hype from simonw called it “neat” and fully tested, but the celebration didn’t last.
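To make the “search-by-CSS” idea concrete: a selector query walks the parsed tree and returns, in document order, the elements a selector matches. The sketch below is hypothetical and is not JustHTML’s API. It assumes a toy `Node` class and supports only compound selectors such as `p.intro#lead`. It rejects combinators and anything else it does not understand.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy element node (hypothetical; not JustHTML's tree type)."""
    tag: str
    attrs: dict[str, str] = field(default_factory=dict)
    children: list["Node"] = field(default_factory=list)


# One selector part: a bare tag name, ".class", or "#id".
_PART = re.compile(r"([.#]?)([A-Za-z0-9_-]+)")


def _parse(selector: str) -> list[tuple[str, str]]:
    # Anything left after removing known parts (spaces, ">", "*", ...) is unsupported.
    if not selector or _PART.sub("", selector):
        raise ValueError(f"unsupported selector: {selector!r}")
    return _PART.findall(selector)


def _matches(node: Node, parts: list[tuple[str, str]]) -> bool:
    for prefix, name in parts:
        if prefix == "" and node.tag != name.lower():
            return False
        if prefix == "." and name not in node.attrs.get("class", "").split():
            return False
        if prefix == "#" and node.attrs.get("id") != name:
            return False
    return True


def select(root: Node, selector: str) -> list[Node]:
    """Return matching nodes (root included) in document, i.e. pre-order, order."""
    parts = _parse(selector)
    found, stack = [], [root]
    while stack:
        node = stack.pop()
        if _matches(node, parts):
            found.append(node)
        # Push children reversed so the leftmost child is visited first.
        stack.extend(reversed(node.children))
    return found
```

For example, `select(doc, "p.intro")` returns every `<p>` whose class list contains `intro`. A selector with a combinator, such as `div p`, raises `ValueError` instead of silently matching the wrong elements.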

Commenters lit the fuse. minusf questioned its originality, calling it a likely Python port of Rust’s html5ever and demanding clearer credit. Aloisius ran the tests, reported only 88.6% passing, and posted example failures that poked holes in the “100%” story. furyofantares dunked on the blog’s AI-flavored formatting (“17 tiny sections?”). Amid the fire, one brave soul asked for a Postgres plug‑in, because why not parse the web inside your database.

The author’s cheeky nod to the “adoption agency algorithm” and the related “Noah’s Ark” clause (keep at most three identical formatting elements) spawned two‑by‑two memes, but the thread’s vibe was clear: cool demo, but receipts, please. Fans love the hustle and the no-dependency design. Skeptics want real numbers, clearer credit, and fewer AI vibes in the write‑up.
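For readers who missed the joke: in the WHATWG HTML parsing spec, the “Noah’s Ark” clause runs whenever a formatting element such as `<b>` is pushed onto the list of active formatting elements. Suppose three entries with the same tag name, namespace, and attributes already sit after the last marker, or anywhere in the list if there is no marker. In that case the earliest of them is removed before the new one is added. Here is a minimal Python sketch of that spec step. It assumes a hypothetical `Element` class and uses `None` as the marker. It is an illustration, not JustHTML’s code.

```python
from dataclasses import dataclass, field
from typing import Optional

MARKER = None  # hypothetical: a marker entry in the list is represented as None


@dataclass
class Element:
    """Hypothetical formatting element (e.g. <b>, <i>, <a>)."""
    tag: str
    namespace: str = "http://www.w3.org/1999/xhtml"
    attrs: dict[str, str] = field(default_factory=dict)


def push_formatting_element(active: list[Optional[Element]], element: Element) -> None:
    """Push onto the list of active formatting elements, applying the Noah's Ark clause."""
    # Only entries after the last marker count (the whole list if there is none).
    start = 0
    for i in range(len(active) - 1, -1, -1):
        if active[i] is MARKER:
            start = i + 1
            break
    same = [
        i
        for i in range(start, len(active))
        if active[i].tag == element.tag
        and active[i].namespace == element.namespace
        and active[i].attrs == element.attrs  # dict equality ignores attribute order
    ]
    if len(same) >= 3:
        # Remove the earliest match so at most three identical entries remain.
        del active[same[0]]
    active.append(element)
```

Pushing five identical `<b>` elements leaves three in the list. A marker, which the spec inserts for example when a `<td>` is opened, starts a fresh count.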

Key Points

•JustHTML is a pure-Python HTML5 parser with zero dependencies and a CSS selector query API.
•The author claims JustHTML passes 100% of the html5lib test suite, a result reached through extensive test-driven development.
•Complex HTML5 parsing challenges are addressed, notably the adoption agency algorithm and the Noah’s Ark clause.
•Development used coding agents (GitHub Copilot in agent mode, Claude 3.7 Sonnet), iterating automatically against html5lib-tests.
•Performance work included a Rust-based tokenizer that only slightly surpassed html5lib’s speed and a look at html5ever, followed by a pivot to a pure-Python approach.
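At bottom, the pass-rate dispute is arithmetic over the html5lib-tests tree-construction files: passes divided by total. Those `.dat` files are plain text. Each case opens with `#data` and is followed by sections such as `#errors` and `#document`, where `#document` holds the expected tree as `| `-prefixed lines. Below is a minimal harness sketch. It assumes a hypothetical `dump_tree` callable that turns a test case into that same text format. It is not the author’s or Aloisius’s actual tooling.

```python
from typing import Callable, Iterable, Iterator

# Section headers used in html5lib-tests tree-construction .dat files.
SECTIONS = {"#data", "#errors", "#new-errors", "#document",
            "#document-fragment", "#script-on", "#script-off"}


def _finish(test: dict[str, list[str]], last_key: str) -> dict[str, str]:
    lines = test[last_key]
    if lines and lines[-1] == "":
        lines.pop()  # drop the blank line that separates tests (or the final newline)
    return {key: "\n".join(value) for key, value in test.items()}


def read_dat_tests(text: str) -> Iterator[dict[str, str]]:
    """Split a .dat file into dicts such as {"data": ..., "errors": ..., "document": ...}."""
    test: dict[str, list[str]] = {}
    key = None
    for line in text.split("\n"):
        if line == "#data" and test:
            yield _finish(test, key)  # a new #data header starts the next case
            test = {}
        if line in SECTIONS:
            key = line[1:]
            test[key] = []
        elif key is not None:
            test[key].append(line)
    if test:
        yield _finish(test, key)


def pass_rate(tests: Iterable[dict[str, str]],
              dump_tree: Callable[[dict[str, str]], str]) -> float:
    """Percentage of cases whose dumped tree equals the expected #document text.

    `dump_tree` is a hypothetical callable: it receives the whole case so it can
    honor #document-fragment / #script-off, and returns '| '-prefixed tree text.
    """
    results = [dump_tree(case) == case.get("document") for case in tests]
    return 100.0 * sum(results) / len(results) if results else 0.0
```

The denominator depends on which `.dat` files, fragment cases, and scripting variants are counted, so a pass rate is only comparable when the test selection is stated alongside it.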

Hottest takes

"isn't this more like a port of `html5ever` from rust to python using LLM" — minusf

"I'm not seeing 100% pass rates." — Aloisius

"Is it really too much to do a little more editing of the LLM output" — furyofantares

December 18, 2025

Code by bots, rage by humans
