Autoresearch for SAT Solvers

Self-taught AI claims puzzle crown — skeptics say it “borrowed tricks”

TL;DR: A self-taught AI coordinated via Git solved 220/229 tough MaxSAT puzzles and beat some 2024 scores. Commenters split across “borrowed tricks” skepticism (citing missing solvers like Z3), questions about what “cost” means, and alternative ideas like AlphaDev — making the win big, but the debate bigger.

An AI “intern” just blitzed a set of brutal logic puzzles (MaxSAT) by teaching itself new moves, coordinating across cloud machines, and pushing updates via Git — no humans in the loop. It solved 220 of 229 challenges, beat a few 2024 competition scores, and even cracked one nobody had before. The devs bragged about the numbers; the internet brought the drama.

Skeptics pounced first. One voice argued the AI may have learned from a popular solver that wasn’t in the 2024 contest, throwing shade that the “novel” techniques weren’t so novel. Another chimed in with academic receipts, name-dropping university research on AI agents tuning these tools — translation: “cool, but you’re not the only ones on this.” Meanwhile, a confused reader asked what “our cost” even means, and a helpful reply cut through the fog: it’s just the quality score of the best solution found vs. the best known score — lower is better, think golf, not speed.
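To make the scoreboard concrete: in weighted MaxSAT, the “cost” of an assignment is conventionally the total weight of the soft clauses it leaves unsatisfied, and lower really is better, golf-style. A minimal sketch of that scoring (not the project’s code; clauses as DIMACS-style integer literals, all names hypothetical):

```python
# Weighted MaxSAT cost: the sum of weights of soft clauses an assignment violates.
# Literals are DIMACS-style ints: positive = variable true, negative = variable false.

def clause_satisfied(clause, assignment):
    """assignment maps variable number -> bool; a clause is a list of int literals."""
    return any(assignment[abs(lit)] == (lit > 0) for lit in clause)

def maxsat_cost(soft_clauses, assignment):
    """Total weight of unsatisfied soft clauses; lower is better."""
    return sum(w for w, clause in soft_clauses
               if not clause_satisfied(clause, assignment))

# Example: two soft clauses over variables 1 and 2.
soft = [(3, [1, 2]),   # weight 3: x1 OR x2
        (5, [-1])]     # weight 5: NOT x1
print(maxsat_cost(soft, {1: True, 2: False}))   # → 5 (violates NOT x1)
print(maxsat_cost(soft, {1: False, 2: True}))   # → 0 (satisfies both)
```

The “our cost vs. best known cost” comparison in the thread is just this number for the agent’s best solution versus the best previously recorded one.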

Then came the hot take: maybe AlphaDev — a system for discovering algorithms — would be a better play here. Cue the vibes: part “AI just speedran optimization,” part “show your sources,” with a side of “explain the scoreboard like I’m five.” It’s a victory lap with footnotes — and comment-section nitpicks keeping pace.

Key Points

  • An autonomous AI agent tackled 229 weighted MaxSAT instances from the 2024 MaxSAT Evaluation without human guidance.
  • The system coordinates via a shared GitHub repo; multiple agents iterate, share solutions and knowledge, and synchronize through Git.
  • Deployment script automates setup on Amazon EC2, including dependency installation, benchmark download, and launching agents in tmux.
  • Results: 220/229 solved; 29 optimal, 4 better than competition bests, 1 novel solve; 9 remained unsolved due to scale or lack of references.
  • Discovered techniques include greedy SAT with selector variables, core-guided UNSAT core relaxation, WPM1, biased-SAT, clause-weighting local search (SATLike), tabu search, and multi-init with CaDiCaL and glucose4.
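The “selector variables” trick from the list above is simple enough to sketch: each soft clause gets a fresh relaxation literal appended, so a plain SAT solver can satisfy every (relaxed) clause while the selectors record which soft clauses were effectively given up. Minimizing the weighted sum of true selectors then minimizes the MaxSAT cost. A hedged illustration, not the agent’s actual code (function and variable names are made up):

```python
def add_selectors(soft_clauses, num_vars):
    """Append a fresh selector literal to each weighted soft clause.

    soft_clauses: list of (weight, clause) with DIMACS-style int literals.
    num_vars: highest variable number already used by the formula.
    Returns the relaxed clauses plus (weight, selector_var) pairs; setting a
    selector true "turns off" its clause, at that clause's weight in cost.
    """
    relaxed, selectors = [], []
    next_var = num_vars
    for weight, clause in soft_clauses:
        next_var += 1                       # fresh variable for this clause
        relaxed.append(clause + [next_var])
        selectors.append((weight, next_var))
    return relaxed, selectors

soft = [(3, [1, 2]), (5, [-1])]
relaxed, selectors = add_selectors(soft, num_vars=2)
print(relaxed)     # → [[1, 2, 3], [-1, 4]]
print(selectors)   # → [(3, 3), (5, 4)]
```

Core-guided approaches like WPM1 build on the same idea, using UNSAT cores from the solver to decide which selectors to relax next.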

Hottest takes

“the agent picked up on techniques from Z3” — ericpauley
“AlphaDev might be a better approach” — ktimespi
“its just comparing the cost of the best solution found” — chaisan
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.