Can LLMs Beat Classical Hyperparameter Optimization Algorithms?

AI tried to outsmart old-school tuning tools, and the comments yelled: not so fast

TLDR: Researchers found that AI alone still loses to classic tuning methods in this test, but a hybrid of both did best. Commenters mostly treated that as the real headline: blunt skeptics said “no,” while others argued the winning formula is clearly humans’ old math tricks plus AI improvisation.

The paper asked a deceptively simple question: can today’s chatbot-style artificial intelligence beat the classic number-crunching tools used to tune a model’s training settings, known as hyperparameters? The answer, according to the results, is mostly no. When the researchers held the search space and compute budget fixed, the old-school methods kept winning because they were better at avoiding costly out-of-memory crashes and at keeping track of what earlier trials had already revealed. Even when the AI was allowed to directly rewrite the training code like a tiny overconfident intern with keyboard access, it narrowed the gap but still couldn’t fully catch up. The real plot twist? A hybrid called Centaur (part classic optimizer, part language model) came out on top, basically confirming the internet’s favorite compromise: “why not both?”

And oh, the comment section had thoughts. One user dropped the brutally efficient mic with: “TDLR: No.” Others were much more excited by the hybrid angle, calling Centaur “interesting and quite straightforward,” which in researcher-speak is practically a standing ovation. Another commenter said the field seems to be converging on this exact outcome: AI alone isn’t the hero, but as a sidekick to traditional tools, it might be very powerful. Still, not everyone was ready to crown the old guard forever. One person pushed back, saying that in expensive, niche situations, top-tier AI models can actually win — while also admitting they can fail abysmally elsewhere. That mix of confidence, caveats, and chaos gave the whole thread a very classic tech-forum vibe: part serious science debate, part “my benchmark can beat up your benchmark.”

Key Points

• The study uses the autoresearch repository to compare classical hyperparameter optimization algorithms with LLM-based methods on tuning a small language model under a fixed compute budget.
• In a fixed search space, classical methods such as CMA-ES and TPE consistently outperform LLM-based agents, with avoiding out-of-memory failures identified as especially important.
• Allowing LLMs to directly edit source code improves their performance relative to fixed-space setups, but they still do not match classical methods, even with frontier models like Claude Opus 4.6 and Gemini 3.1 Pro Preview.
• The authors report that LLMs struggle to track optimization state across trials, while classical methods lack the domain knowledge available to LLMs.
• The hybrid method Centaur, which shares CMA-ES internal state with an LLM, achieves the best result in the experiments, and even a 0.8B LLM is reported to outperform both classical-only and pure LLM methods in this setup.
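
For readers who want a feel for the moving parts in those key points, here is a rough Python sketch of the general pattern: a fixed search space, a fixed trial budget, crashes converted into a large penalty, and an outside "advisor" that gets to see the optimizer's internal state before suggesting one configuration per round. Nearly everything in it is an assumption made for illustration. `SimpleES` is a deliberately simplified Gaussian evolution strategy and not CMA-ES, `toy_objective` stands in for a real training run, the memory limit is invented, and `rule_based_advisor` is a hard-coded placeholder where a language model call might go. It is not the paper's Centaur implementation.

```python
import random

# Hypothetical two-parameter search space: log10(learning rate) and batch size.
BOUNDS = {"log_lr": (-5.0, -1.0), "batch_size": (8.0, 256.0)}
CRASH_LOSS = 1e9  # penalty recorded when a trial "runs out of memory"


def toy_objective(log_lr, batch_size):
    """Stand-in for a training run; raises MemoryError on large batches."""
    if batch_size > 192:  # assumed memory limit, purely illustrative
        raise MemoryError("simulated OOM")
    return (log_lr + 3.0) ** 2 + ((batch_size - 64.0) / 64.0) ** 2


def evaluate(params):
    """Run one trial, turning a crash into a large finite loss."""
    try:
        return toy_objective(params["log_lr"], params["batch_size"])
    except MemoryError:
        return CRASH_LOSS


def clip(value, low, high):
    return max(low, min(high, value))


class SimpleES:
    """Tiny Gaussian evolution strategy (a simplified stand-in, not CMA-ES)."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.mean = {k: (lo + hi) / 2 for k, (lo, hi) in BOUNDS.items()}
        self.sigma = {k: (hi - lo) / 4 for k, (lo, hi) in BOUNDS.items()}

    def ask(self):
        """Sample one candidate configuration inside the bounds."""
        return {
            k: clip(self.rng.gauss(self.mean[k], self.sigma[k]), *BOUNDS[k])
            for k in BOUNDS
        }

    def tell(self, scored):
        """Move the mean toward the best quarter of the population."""
        scored = sorted(scored, key=lambda pair: pair[1])
        elite = [p for p, _ in scored[: max(1, len(scored) // 4)]]
        for k in BOUNDS:
            self.mean[k] = sum(p[k] for p in elite) / len(elite)
            self.sigma[k] *= 0.9  # fixed shrink schedule, not adaptive

    def state(self):
        """Expose internal state so an outside advisor can read it."""
        return {"mean": dict(self.mean), "sigma": dict(self.sigma)}


def rule_based_advisor(state, history):
    """Placeholder for an LLM call: sees optimizer state, proposes one config."""
    proposal = dict(state["mean"])
    crashed = [p["batch_size"] for p, loss in history if loss >= CRASH_LOSS]
    if crashed:
        # Stay below the smallest batch size that has crashed so far.
        proposal["batch_size"] = min(proposal["batch_size"], 0.9 * min(crashed))
    return proposal


def run(budget=40, population=8, seed=0):
    """Spend a fixed trial budget; one slot per round goes to the advisor."""
    es, history = SimpleES(seed), []
    while len(history) < budget:
        candidates = [es.ask() for _ in range(population - 1)]
        candidates.append(rule_based_advisor(es.state(), history))
        scored = [(p, evaluate(p)) for p in candidates]
        history.extend(scored)
        es.tell(scored)
    return min(history, key=lambda pair: pair[1])


best_params, best_loss = run()
print(best_params, best_loss)
```

The design choice worth noticing is that the advisor never replaces the optimizer. It contributes one candidate per round and the optimizer's update treats that candidate like any other, which is one plausible reading of "sharing internal state" and should not be taken as a description of how the authors actually wired things together.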

Hottest takes

"TDLR: No." — josefritzishere

"the combination of two is the right way to do it" — cpard

"they fail abysmally" — deerstalker

June 9, 2026

Half horse, half hype
