March 18, 2026
Git-fueled solver smackdown
Autoresearch for SAT Solvers
Self-taught AI claims puzzle crown — skeptics say it “borrowed tricks”
TLDR: A self-taught AI coordinated via Git solved 220/229 tough MaxSAT puzzles and beat some 2024 scores. Commenters split between “borrowed tricks” skepticism (citing missing solvers like Z3), questions about what “cost” means, and alternative ideas like AlphaDev — making the win big, but the debate bigger.
An AI “intern” just blitzed a set of brutal logic puzzles (MaxSAT) by teaching itself new moves, coordinating across cloud machines, and pushing updates via Git — no humans in the loop. It solved 220 of 229 challenges, beat a few 2024 competition scores, and even cracked one nobody had before. The devs bragged numbers; the internet brought drama.
Skeptics pounced first. One voice argued the AI may have learned from a popular solver that wasn’t in the 2024 contest, throwing shade that the “novel” techniques weren’t so novel. Another chimed in with academic receipts, name-dropping university research on AI agents tuning these tools — translation: “cool, but you’re not the only ones on this.” Meanwhile, a confused reader asked what “our cost” even means, and a helpful reply cut through the fog: in weighted MaxSAT, cost is the total weight of the soft clauses your best solution leaves unsatisfied, compared against the best known value. Lower is better: think golf, not speed.
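For readers who want the scoreboard explained concretely: the cost of an assignment is the sum of weights of the soft clauses it fails to satisfy. A minimal sketch, with a tiny hypothetical clause encoding (the instance and weights below are illustrative, not from the benchmark):

```python
def maxsat_cost(assignment, soft_clauses):
    """Cost = total weight of soft clauses the assignment leaves unsatisfied.

    assignment: dict mapping variable number -> bool
    soft_clauses: list of (weight, clause), where a clause is a list of
    signed ints: positive n means variable n, negative n means its negation.
    """
    cost = 0
    for weight, clause in soft_clauses:
        satisfied = any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        if not satisfied:
            cost += weight
    return cost

# Toy instance: two variables, three weighted soft clauses.
clauses = [(3, [1, 2]), (2, [-1]), (5, [-2])]
print(maxsat_cost({1: True, 2: False}, clauses))  # only (2, [-1]) fails -> 2
```

Two solvers are then compared by this number on each instance: whichever leaves less weight unsatisfied wins, exactly the golf analogy from the thread.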
Then came the hot take: maybe AlphaDev — a system for discovering algorithms — would be a better play here. Cue the vibes: part “AI just speedran optimization,” part “show your sources,” with a side of “explain the scoreboard like I’m five.” It’s a victory lap with footnotes — and comment-section nitpicks keeping pace.
Key Points
- An autonomous AI agent tackled 229 weighted MaxSAT instances from the 2024 MaxSAT Evaluation without human guidance.
- The system coordinates via a shared GitHub repo; multiple agents iterate, share solutions and knowledge, and synchronize through git.
- A deployment script automates setup on Amazon EC2, including dependency installation, benchmark download, and launching agents in tmux.
- Results: 220/229 solved; 29 optimal, 4 better than competition bests, 1 novel solve; 9 remained unsolved due to scale or lack of references.
- Discovered techniques include greedy SAT with selector variables, core-guided UNSAT core relaxation, WPM1, biased-SAT, clause-weighting local search (SATLike), tabu search, and multi-init with CaDiCaL and glucose4.
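To make one of those techniques less abstract: clause-weighting local search (the family SATLike belongs to) keeps a dynamic weight per clause, flips variables inside falsified clauses, and bumps the weight of clauses that stay falsified so the search gets pulled toward persistently hard constraints. A toy sketch under those assumptions — SATLike’s real heuristics are far more elaborate, and every name and parameter here is illustrative:

```python
import random

def clause_weighting_local_search(n_vars, clauses, max_flips=2000, seed=0):
    """Toy clause-weighting local search for MaxSAT (unit soft weights).

    clauses: list of clauses, each a list of signed ints (DIMACS-style).
    Dynamic weights start at 1 and grow when a clause stays falsified,
    biasing which falsified clause gets attention next.
    """
    rng = random.Random(seed)
    assign = {v: rng.choice([True, False]) for v in range(1, n_vars + 1)}
    weights = [1] * len(clauses)

    def satisfied(i):
        return any(assign[abs(lit)] == (lit > 0) for lit in clauses[i])

    def n_falsified():
        return sum(1 for i in range(len(clauses)) if not satisfied(i))

    best, best_cost = dict(assign), n_falsified()
    for _ in range(max_flips):
        falsified = [i for i in range(len(clauses)) if not satisfied(i)]
        if not falsified:  # everything satisfied: optimal, stop early
            return dict(assign), 0
        # Pick a falsified clause, biased toward higher dynamic weight,
        # and flip one of its variables.
        i = rng.choices(falsified, weights=[weights[j] for j in falsified])[0]
        var = abs(rng.choice(clauses[i]))
        assign[var] = not assign[var]
        cost = n_falsified()
        if cost < best_cost:
            best, best_cost = dict(assign), cost
        else:
            weights[i] += 1  # bump: this clause keeps causing trouble
    return best, best_cost
```

On an unsatisfiable toy instance like `[[1], [2], [-1, -2]]`, the search quickly settles at the optimum of one falsified clause; the weight bumps are what let this style of solver climb out of local minima on real instances.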