Learnings from training a font recognition model from scratch

DIY font finder sparks designer cheers, purist rage, and dev demands for receipts

TLDR: An engineer launched Lens, a fast, open-source tool that maps images to the closest Google Font. Designers beg for exact font IDs while devs demand architecture details and global script support, turning a neat demo into a debate over accuracy, transparency, and whether “close enough” is good enough.

An everyday coder just dropped “Lens,” a DIY font finder that maps any image to the closest Google Font in about 2–3 seconds, no fiddly letter selection needed. They open-sourced it (GitHub) and even explain that a “model” is really a full pipeline — image cleanup, OCR (optical character recognition) to find words, then classification and name mapping — not just a magic file. Cool story, but the comments? Absolute typography theater.

Designers stormed in cheering and jeering. One called mainstream tools “liars” and demanded exact font IDs, not just lookalikes, while others loved the fast, free-only promise. The biggest spat: close match vs. the actual font. Lens favors open-source lookalikes; purists want the precise commercial face used on the poster. Devs chimed in too: “Where’s the architecture breakdown?” asked the code detectives, poking for model details and benchmarks. International users asked whether it can handle non‑Latin scripts and brand‑new releases. Memes flew: “Comic Sans witness protection,” “Papyrus patrol,” and a proposed Lens vs. Adobe Retype cage match. Supporters left respectful praise (“rewarding repo”) and linked the font finder page; skeptics want roadmaps for multilingual support and exact-match hunting. Verdict: the tool slaps, the comment section slaps harder.

Key Points

  • The author built a font recognition model, Lens, to automatically map images to the closest Google Font without manual letterform selection.
  • Lens returns results in about 2–3 seconds and works across various font weights, styles, and image qualities.
  • The full source code for the model and its inference stack is publicly available on GitHub.
  • The author emphasizes that a practical “model” includes an entire pipeline: image retrieval, cleanup, OCR detection, cropping, classification, and output mapping.
  • A key lesson is to design model inputs and outputs carefully, with scope and pipeline steps clearly defined.
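The pipeline the author describes (cleanup → OCR → crop → classify → map to a font name) can be sketched roughly as below. This is an illustrative assumption, not Lens's actual code: the function names, the embedding-based nearest-match classifier, the toy font catalog, and the per-word voting are all hypothetical stand-ins for whatever the real model does.

```python
# Hypothetical sketch of a font-recognition pipeline:
# (cleanup and OCR are assumed to have already produced word crops)
# each crop carries an embedding from some classifier backbone,
# and the closest catalog font wins by cosine similarity.
from dataclasses import dataclass
from math import sqrt

@dataclass
class WordCrop:
    text: str
    embedding: list  # feature vector for the cropped word image (stand-in)

# Toy "catalog" of Google Font reference embeddings (illustrative values only).
FONT_CATALOG = {
    "Roboto": [0.9, 0.1, 0.2],
    "Lora":   [0.1, 0.8, 0.3],
    "Oswald": [0.2, 0.2, 0.9],
}

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def classify(crop: WordCrop) -> str:
    """Map one word crop's embedding to the closest catalog font."""
    return max(FONT_CATALOG,
               key=lambda name: cosine_similarity(crop.embedding, FONT_CATALOG[name]))

def run_pipeline(crops: list) -> str:
    """Classify every detected word, then vote: the modal font name wins."""
    votes = [classify(c) for c in crops]
    return max(set(votes), key=votes.count)

crops = [WordCrop("Hello", [0.88, 0.15, 0.25]),
         WordCrop("World", [0.85, 0.05, 0.30])]
print(run_pipeline(crops))  # → Roboto
```

The point of the sketch matches the author's lesson: the "model" is the whole chain of steps with clearly defined inputs (word crops) and outputs (a single font name), not just one weights file.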

Hottest takes

"Gemini and its competitors flat out lie" — Tommix11
"recognise font in a different language?" — hank_z
"why the model architecture wasn’t talked about at all?" — codemog
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.