Launch HN: Pulse (YC S24) – Production-grade unstructured document extraction

Startup says it tames messy PDFs; fans cheer, skeptics ask how it's different

TLDR: Pulse says it can turn messy documents into auditable, accurate text by mixing vision and OCR. Comments split three ways: a customer praising its handwriting results over Datalab, skeptics asking how it differs from fellow YC company Extend, and others claiming future AI models will make tools like this unnecessary.

Pulse hit Hacker News promising to turn chaotic PDFs, scans, and tables into clean, trustworthy text. The founders say they mix “see the page” vision with old-school OCR (optical character recognition), separate layout from reading, and tie every extracted number back to its spot on the page. They even dropped tough examples: a 10-K, a newspaper, and a rent roll. That’s the pitch, but the comments were the real show.

First came polite applause, then the knives: is this just another YC twin? One user asked how it differs from Extend, another wondered if it’s basically docling with better marketing. Then a plot twist — a real customer chimed in saying Pulse beat Datalab on messy handwriting. Suddenly, the vibe shifted from skepticism to “okay, maybe this thing slaps.” And just as the confetti fell, a futurist crashed the party: “AI will do all this natively.” Cue jokes about PDF PTSD and tables that lie. The mood: pragmatists love the auditing and “show your work” approach; prophets predict the category vanishes soon. For non-tech folks: Pulse is trying to make machines read documents accurately and show their uncertainty, not just guess pretty text.

Key Points

  • Pulse launched a hybrid document extraction system combining VLMs, OCR, and computer vision to produce LLM-ready text.
  • The approach separates layout analysis from language modeling, normalizing documents before schema mapping.
  • Extraction is constrained by predefined schemas and values are tied back to source locations for auditability.
  • The system targets real-world challenges: long PDFs, dense tables, mixed layouts, low-fidelity scans, and numeric accuracy.
  • Pulse offers a usage-based API/platform with examples (10-K, newspaper, rent roll) and acknowledges limitations on degraded scans/handwriting.
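
For the curious, here is a rough sketch of what "schema-constrained extraction tied back to source locations" could look like in code. This is not Pulse's API: every name below (Span, ExtractedValue, SCHEMA, parse_number, extract) is hypothetical, and the sketch assumes an earlier layout/OCR step has already produced candidate text spans with page numbers and bounding boxes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical record of where a piece of text sits on a page."""
    page: int
    bbox: tuple[float, float, float, float]  # (x0, y0, x1, y1), assumed units
    text: str


@dataclass(frozen=True)
class ExtractedValue:
    """A typed value that keeps a pointer to its source span."""
    field: str
    value: float
    source: Span


# Hypothetical schema: only these fields may appear in the output.
SCHEMA = {"monthly_rent": float, "unit_count": int}


def parse_number(raw: str) -> float:
    """Strip common currency formatting; raises ValueError on non-numbers."""
    return float(raw.replace("$", "").replace(",", "").strip())


def extract(candidates: dict[str, Span], schema: dict) -> list[ExtractedValue]:
    """Keep only schema fields that parse cleanly, each with its source span."""
    results = []
    for field, caster in schema.items():
        span = candidates.get(field)
        if span is None:
            continue  # field not found on the page: leave it out, do not guess
        try:
            value = caster(parse_number(span.text))
        except ValueError:
            continue  # unparseable text is dropped rather than invented
        results.append(ExtractedValue(field, value, span))
    return results


# Example input, as if produced by an upstream layout/OCR pass (made up).
candidates = {
    "monthly_rent": Span(page=3, bbox=(72.0, 410.5, 130.0, 422.0), text="$1,250.00"),
    "unit_count": Span(page=3, bbox=(200.0, 410.5, 215.0, 422.0), text="12"),
    "notes": Span(page=3, bbox=(300.0, 410.5, 420.0, 422.0), text="corner unit"),
}
values = extract(candidates, SCHEMA)
```

Each item in `values` carries both the parsed number and the page and box it came from, so a reviewer can check it against the original document. The `notes` span is ignored because it is not in the schema, which is the "constrained" part.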

Hottest takes

"How is this different from Extend(Also YC)?" — asdev
"It’s results were better than Datalab from our tests, especially in the handwriting category" — throw03172019
"AI models will do all this natively" — mikert89