14× faster embeddings: how we rebuilt the ONNX path in Manticore

Manticore’s embedding pipeline got wildly faster, and the comments turned into a speed war

TLDR: Manticore says it made its automatic text-embedding step about 14 times faster, a huge deal for anyone loading lots of data. The comments were impressed but instantly turned competitive, with people arguing this was obvious, demanding better models, and debating whether even more speed is still being left on the table.

Manticore just dropped a major speed upgrade for its automatic embedding feature, claiming it now runs about 14 times faster on the same machine after rebuilding its inference path around ONNX Runtime, a widely used engine for running AI models. In plain English: the database can now turn text into searchable AI fingerprints much faster, which means bulk loads and inserts move from painfully slow to actually useful. But the real spectacle was in the community reaction, where the comments instantly became a mix of victory lap, armchair coaching session, and low-key nerd cage match.

One of the loudest vibes was basically: “Well, duh — ONNX is the first thing people suggest if you want CPU speed.” That gave the whole announcement an amusing undertone of “congrats on discovering what the comment section already knew.” Then came the next wave of hot takes: sure, it’s faster, but is the model itself still good enough? One commenter immediately pivoted from celebration to “we need a better replacement” for the current popular model, kicking off the classic tech-world move of treating a huge win like a stepping stone to the next complaint.

And then came the optimization purists. Another commenter warned that piling on bigger batches can actually backfire on regular processors, arguing the real magic is in using chip-specific tricks. Translation: even after a 14x glow-up, the community still found a way to say, “nice, but you could go harder.” The mood was half impressed, half impossible-to-please — which, honestly, is how you know the launch mattered.

Key Points

  • Manticore rebuilt its embedding inference path around ONNX Runtime and released it in Manticore Search 27.1.5.
  • The company reports the new ONNX backend is about 14× faster on average than the previous SentenceTransformers-on-Candle path on the same 16-core, 32-thread server.
  • In tests with all-MiniLM-L12-v2, the old path stayed at 5–11 docs/sec while the new path reached 70–230 docs/sec across thread and batch configurations.
  • Reported single-insert latency dropped to about 14 ms with one client and 56 ms under 8-way concurrency, versus more than 200 ms for the older Candle path.
  • Manticore says the two most important implementation changes were disabling intra_op_spinning and removing document batching inside the worker, with no user-facing API changes required for ONNX-capable models.
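
For anyone who wants to sanity-check how these numbers hang together, here is a rough back-of-the-envelope sketch in Python. It only uses the figures quoted in the key points. The derived values are not reported by Manticore, and the throughput estimate assumes each client fires its next insert the moment the previous one returns, which the announcement does not confirm.

```python
# Figures quoted in the key points above (docs/sec and milliseconds).
OLD_DOCS_PER_SEC = (5, 11)        # old Candle path, low and high end
NEW_DOCS_PER_SEC = (70, 230)      # new ONNX path, low and high end
NEW_LATENCY_MS = {1: 14, 8: 56}   # concurrent clients -> per-insert latency
OLD_LATENCY_FLOOR_MS = 200        # old path was reported as "more than 200 ms"


def speedup_bounds(old, new):
    """Smallest and largest ratio from pairing any old rate with any new rate."""
    return min(new) / max(old), max(new) / min(old)


def implied_throughput(clients, latency_ms):
    """Inserts/sec, assuming every client issues inserts back to back (an assumption)."""
    return clients * 1000 / latency_ms


low, high = speedup_bounds(OLD_DOCS_PER_SEC, NEW_DOCS_PER_SEC)
print(f"throughput speedup between {low:.1f}x and {high:.1f}x")

for clients, ms in NEW_LATENCY_MS.items():
    rate = implied_throughput(clients, ms)
    print(f"{clients} client(s): about {rate:.0f} inserts/sec")

latency_gain = OLD_LATENCY_FLOOR_MS / NEW_LATENCY_MS[1]
print(f"single-insert latency improved by at least {latency_gain:.1f}x")
```

Under those assumptions the reported "about 14×" average sits comfortably inside the range the raw throughput figures allow, and the single-client latency drop works out to roughly the same factor. Whether the throughput ranges and the latency figures come from the same test runs is not stated, so treat the comparison as a plausibility check rather than a reproduction.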

Hottest takes

"ONNX is my first suggestion" — electroglyph
"We really need a replacement" — minimaxir
"batching inference won’t necessarily give you a speed boost" — ducviet00