November 5, 2025
Element of surprise
Parsing Chemistry
Dev teaches a quirky language to read molecules — cue SMILES vs SELFIES smackdown
TLDR: A coder built a tool that reads chemical formulas into neat counts, earning cheers and a swift reality check. Commenters split between loving the demo and urging real-world standards like SMILES/SELFIES and InChI, arguing the true challenge is handling messy chemistry notations — which matters for serious science apps.
A solo dev built a tiny tool that reads chemical formulas like H2O, (CH3)2, and even fractional counts like C1.5O3, and the internet did what it does best: clap, then argue. The hack, written in the Factor language, uses a parsing expression grammar (PEG) to turn strings into neat element counts. Cue the chorus: some folks shouted “insanely cool,” while others rolled in with “have you met real chemistry?” and dropped acronyms like confetti.
Standards stans arrived first. One commenter asked if SMILES — a popular way to write molecules as text — has a proper grammar, pointing to Wikipedia. Another chimed in with “use SMILES or SELFIES (a tougher, error-proof cousin)” and name-dropped PubChem lookups. Then came the rabbit-hole brigade: “What about InChI,” the official ID system, followed by links, talks, and a collective “send help.” Meanwhile, a chem hardliner threw cold water: does it handle water of hydration (like CaSO4·2H2O), gas tags like H2O(g), and keep subunits intact? Translation: parsing basic formulas is the easy part; deciding scope is the boss battle.
So yes, it’s a charming hack. But the crowd is split between “fun weekend project” and “welcome to chemical notation hunger games,” with grammar nerds and lab veterans squaring off over SMILES, SELFIES, and how deep this rabbit hole really goes.
Key Points
- The article reimplements chemparse-like functionality in Factor to parse chemical formulas.
- Factor’s EBNF syntax defines a PEG that supports symbols, numbers (including floats and exponents), and grouped/nested structures with parentheses and brackets.
- A split-formula step produces a nested representation of the formula, handling optional numeric prefixes and suffixes.
- A flatten-formula step recursively converts the parsed tree into an element-to-count associative mapping, multiplying and aggregating counts.
- Unit tests confirm functionality on examples like H2O, (CH3)2, and C1.5O3; code is available on GitHub.
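
For readers who want to see the two-step idea in action, here is a rough Python sketch. It is not the author's Factor code: the names split_formula and flatten_formula are borrowed from the key points above, the parser is a hand-written recursive descent rather than a PEG generated from EBNF, and it assumes counts only appear as suffixes (numeric prefixes, hydrates like CaSO4·2H2O, and state tags like H2O(g) are deliberately left out).

```python
import re

# Assumed token shapes: element symbols like "H" or "Fe", and counts that
# may be integers, decimals, or use an exponent (e.g. "2", "1.5", "1e3").
NUMBER = re.compile(r"\d+(?:\.\d+)?(?:[eE][+-]?\d+)?")
SYMBOL = re.compile(r"[A-Z][a-z]*")
CLOSERS = {"(": ")", "[": "]"}


def _count(text, pos):
    """Read an optional numeric suffix at pos; a missing suffix means 1."""
    match = NUMBER.match(text, pos)
    if not match:
        return 1, pos
    value = float(match.group())
    return (int(value) if value.is_integer() else value), match.end()


def _parse_group(text, pos, closer):
    """Parse items until `closer` (or end of text when closer is None)."""
    items = []
    while pos < len(text) and text[pos] != closer:
        ch = text[pos]
        if ch in CLOSERS:
            item, pos = _parse_group(text, pos + 1, CLOSERS[ch])
            if pos >= len(text):
                raise ValueError(f"missing {CLOSERS[ch]!r} in {text!r}")
            pos += 1  # step over the closing bracket
        else:
            match = SYMBOL.match(text, pos)
            if not match:
                raise ValueError(f"unexpected {ch!r} at position {pos}")
            item, pos = match.group(), match.end()
        n, pos = _count(text, pos)
        items.append((item, n))
    return items, pos


def split_formula(text):
    """Return a nested list of (item, count); item is a symbol or a sub-list."""
    items, _ = _parse_group(text, 0, None)
    return items


def flatten_formula(items, factor=1, totals=None):
    """Walk the nested list, multiplying group counts into element totals."""
    totals = {} if totals is None else totals
    for item, n in items:
        if isinstance(item, str):
            totals[item] = totals.get(item, 0) + n * factor
        else:
            flatten_formula(item, n * factor, totals)
    return totals


print(split_formula("(CH3)2"))
print(flatten_formula(split_formula("K4[Fe(CN)6]")))
```

Keeping the nested form as a separate step means the grouping survives until you decide to throw it away, which is roughly the "keep subunits intact" concern the chem hardliner raised. Anything outside the assumed syntax, such as a stray closing bracket or a hydrate dot, raises ValueError in this sketch instead of being guessed at.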