Mistral OCR 4

Mistral’s new text-reading bot drops, and the comments are already judging everything

TLDR: Mistral launched a new document-reading tool that promises better accuracy, page layout detection, and support for 170 languages at a low price. Commenters were split between big praise from real-world users, sticker-shock amazement, and the usual launch-day skepticism over benchmarks, rivals, and even the website certificate.

Mistral showed up with OCR 4, its new tool for reading documents, spotting where text lives on a page, labeling chunks like tables or signatures, and even telling you how sure it is about each word. In plain English: it’s meant to turn messy files into something businesses can actually search, quote, and feed into their AI systems. Mistral is bragging hard, too — saying human reviewers picked it over big-name rivals most of the time, and that it works across 170 languages while also being small enough to run on your own servers. Translation: less sending sensitive paperwork out into the cloud, more “we can keep this in-house.”

But the real action was in the community, where the reactions swung from genuine applause to instant nitpicking. One user basically gave the sequel an unsolicited standing ovation, saying the previous version handled 55-year-old degraded paper files so well that Abbyy FineReader “didn’t even come close.” That’s the kind of comment that starts fan wars. Another commenter immediately interrupted the party with the digital equivalent of checking the fire exits: “Is there something wrong with their certificate?” Nothing kills launch-day glamour faster than browser trust issues.

Then came the price detectives and benchmark skeptics. One person did the math and gasped at $4 for 1,000 pages, while another wanted a showdown with Unlimited-OCR. And in peak engineer-comment-section fashion, somebody skipped the hype entirely and asked the painfully practical question: can it actually turn old plots into usable X,Y data? In other words, the crowd reaction was classic tech internet: half impressed, half suspicious, and fully ready for a cage match.

Key Points

  • Mistral OCR 4 adds structured document outputs including bounding boxes, typed block classification, and inline confidence scores alongside extracted text.
  • The model supports 170 languages across 10 language groups and accepts formats such as PDF, DOC, PPT, and OpenDocument.
  • Mistral says OCR 4 can run in a single container for fully self-hosted enterprise deployments with data residency and compliance needs.
  • The article states OCR 4 is integrated with the Mistral Search Toolkit for ingestion, retrieval, and evaluation workflows in enterprise search and RAG.
  • Mistral reports benchmark results including a top OlmOCRBench score of 85.20 and an average annotator preference win rate of 72%, with API pricing starting at $4 per 1,000 pages.
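
For anyone who wants to redo the commenter's math on a bigger pile of paper, here is a minimal sketch of the pricing arithmetic. It assumes a flat rate of $4 per 1,000 pages, which is only the "starting at" API price quoted above; actual billing may involve tiers, minimums, or discounts not covered in this summary. The function name `estimate_cost` is illustrative and is not part of any Mistral SDK.

```python
from decimal import Decimal

# Assumption: a flat $4 per 1,000 pages, the "starting at" price quoted above.
# Real billing may differ (tiers, minimums, discounts).
PRICE_PER_1000_PAGES = Decimal("4.00")


def estimate_cost(pages: int, price_per_1000: Decimal = PRICE_PER_1000_PAGES) -> Decimal:
    """Return the estimated cost in dollars, rounded to cents, for `pages` pages."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return (price_per_1000 * pages / 1000).quantize(Decimal("0.01"))


if __name__ == "__main__":
    # Print the estimate for a few archive sizes.
    for pages in (250, 1_000, 250_000):
        print(f"{pages:>7} pages -> ${estimate_cost(pages)}")
```

Under that flat-rate assumption, the per-page cost works out to $0.004, so a 250,000-page archive would land around $1,000.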

Hottest takes

"Abbyy Finereader ... didn’t even come close" — Ducki
"Is there something wrong with their certificate?" — jppope
"1000 pages for $4? damn" — ge96