Show HN: Autofit2 – End-to-end pipeline for multilingual text classification

Tiny training data, 50+ languages, and commenters asking: wait, isn’t this just SetFit?

TL;DR: Autofit2 promises an almost one-click way to build text classifiers in dozens of languages from very little labeled data. Commenters immediately zeroed in on the same question: is this a real step forward, or just SetFit with a nicer pipeline and packaging?

A new Show HN project called Autofit2 is pitching a big promise in very plain terms: give it a small pile of labeled text, and it can crank out text-sorting models across 50+ languages with one config file. The creator touts 95–99% precision, automated training, reusable runs, model cards, and even CO₂ tracking for the eco-conscious among us. In normal-person language, it's a machine that helps computers sort messages, reviews, or support tickets into categories without needing mountains of training data.
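To make the "one config file" idea concrete, here is a minimal sketch of what loading such a file might look like. The article only says a single JSON config covers preprocessing, fine-tuning, evaluation, and deployment; every key and value below (including the section names, `track_co2`, and the placeholder model name) is a hypothetical stand-in, not Autofit2's actual schema.

```python
import json

# Hypothetical section names mirroring the four stages the article lists;
# Autofit2's real schema may use entirely different keys.
REQUIRED_SECTIONS = ("preprocessing", "fine_tuning", "evaluation", "deployment")

# Illustrative config; all keys and values are made up for this sketch.
EXAMPLE_CONFIG = """
{
  "preprocessing": {"language": "pl", "lowercase": false},
  "fine_tuning": {"base_model": "multilingual-sbert-placeholder", "num_iterations": 20},
  "evaluation": {"metrics": ["precision"], "track_co2": true},
  "deployment": {"output_dir": "runs/support-tickets", "write_model_card": true}
}
"""


def load_pipeline_config(text: str) -> dict:
    """Parse a JSON config and check that every pipeline stage is present."""
    config = json.loads(text)
    missing = [s for s in REQUIRED_SECTIONS if s not in config]
    if missing:
        raise ValueError(f"config is missing sections: {', '.join(missing)}")
    return config


if __name__ == "__main__":
    cfg = load_pipeline_config(EXAMPLE_CONFIG)
    for stage in REQUIRED_SECTIONS:
        print(stage, "->", cfg[stage])
```

Failing fast on a missing stage is one plausible reason to keep everything in a single file: a run is reproducible only if the whole pipeline is described in one place.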

But the real action is in the comments, where the first big reaction was basically: "Cool… but how is this different from SetFit?" That was the main plot twist. Instead of instant applause, the thread opened with a polite but pointed side-eye from a user who said the Hugging Face version of SetFit already works well, especially for multilingual tasks. Even juicier, they casually dropped that they trained only on English and got Polish and German working "for free," which is the kind of flex that can make a whole launch thread feel like an accidental comparison test.

So the vibe wasn’t “this is fake,” but more “prove why this deserves its own spotlight.” The humor came from that classic Hacker News energy: someone shows up with a shiny new pipeline, and the crowd immediately turns into product detectives. The subtext? People love the idea of simpler AI tools, but they really love asking whether it’s genuinely new or just a prettier wrapper around something they already use.

Key Points

  • Autofit2 is presented as a fully automated few-shot text classification pipeline built on SetFit and SBERT embeddings.
  • The article claims 95–99% precision with only a few dozen labeled examples.
  • It supports more than 50 languages, with pretrained models for 20 languages and evaluation corpora for 50+, and is described as scalable to 100+ via Common Crawl.
  • The workflow is driven by a single JSON configuration covering preprocessing, fine-tuning, evaluation, and deployment.
  • Outputs include deployable model archives, generated model cards, bias evaluation details, and CO₂ emissions tracking.
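The write-up says Autofit2 is built on SetFit and SBERT embeddings but doesn't show its internals. As a rough illustration of the SetFit idea (not Autofit2's code), the first stage turns a handful of labeled texts into sentence pairs for contrastive fine-tuning of the embedding model, and a simple classifier head is then trained on the resulting embeddings. The sketch covers only the pair-generation step. It enumerates every pair for clarity, whereas SetFit itself samples pairs, and the example texts and labels are invented.

```python
from itertools import combinations


def make_contrastive_pairs(examples):
    """Build (text_a, text_b, target) triples from labeled examples.

    target is 1.0 when both texts share a label (pull their embeddings
    together) and 0.0 otherwise (push them apart). This enumerates every
    pair; SetFit samples pairs instead, so this is a simplified illustration.
    """
    pairs = []
    for (text_a, label_a), (text_b, label_b) in combinations(examples, 2):
        pairs.append((text_a, text_b, 1.0 if label_a == label_b else 0.0))
    return pairs


# Hypothetical support-ticket examples, made up for this sketch.
examples = [
    ("Where is my refund?", "billing"),
    ("I was charged twice", "billing"),
    ("The app crashes on login", "bug"),
    ("Error 500 when saving", "bug"),
]

pairs = make_contrastive_pairs(examples)
positives = sum(1 for *_, target in pairs if target == 1.0)
print(len(pairs), positives)  # 6 pairs, 2 of them positive
```

Because n labeled texts yield n(n-1)/2 pairs, a few dozen examples (say 24) already give 276 training pairs. That multiplication is part of why SetFit-style training can get by with so little labeled data.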

Hottest takes

"How does this differ from SetFit?" — nmstoker
"I found the HF version pretty effective" — nmstoker
"translations of our intents tended to work 'for free'" — nmstoker
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.