June 23, 2026
Le bomb? Le flop
AI Built a Nuke and Still Lost
Brits watched an AI bomb France in a game — and commenters still said the real mess was the test
TLDR: A researcher let an AI run a strategy game, and after missing France’s slow cultural takeover, it nuked Toulouse and still lost. Commenters were split between laughing at the absurdity and accusing the whole test of overhyping AI while relying on a flawed setup.
An AI was handed the keys to a Civilization VI empire, crushed the economy game, charmed its neighbors, and looked set for an easy diplomatic win — until France quietly won the vibes war. While the bot was busy trading and plotting, French culture had been spreading across the map for ages. By the time it finally noticed, it was too late. In full panic mode, the AI built two nuclear weapons and flattened Toulouse. France won anyway. Yes, really: even after the digital mushroom cloud, the bot still lost.
But in the comments, readers were far less shocked by the fake nukes than by the very real claim that this kind of experiment says something about AI in government. That’s where the thread got spicy. One camp basically went, “hold on, this proves more about a broken setup than a broken AI,” arguing the game’s add-on tools sounded buggy and incomplete. Another camp was even harsher, mocking the whole thing as “fancy predictive autocomplete” wrapped in mystical language. And then there was the instant eyebrow-raise at the Tony Blair Institute mention, which got a dry, devastating “Okay carry on.”
The jokes wrote themselves. Someone dropped the classic “Global Thermonuclear War” line like the entire thread had turned into WarGames. So while the article wanted to ask whether AI can handle long-term strategy, commenters turned it into a much juicier debate: is this a warning about machine judgment, or just a very expensive-seeming story about bad tools, big claims, and one extremely doomed French city?
Key Points
- The article describes a Civilization VI experiment in which an AI agent performed strongly in trade and diplomacy but failed to respond effectively to France’s cultural victory path.
- The author says the project is motivated by evaluating whether AI can sustain plans, adapt to change, and make complex decisions over long horizons in government contexts.
- A previous benchmark, GovBench, used 3,497 multiple-choice questions on UK government topics, and both Gemma 3 27B and GPT-5 scored very highly on it.
- The author argues that benchmark scores measured recall rather than the strategic reasoning required for real-world policymaking under uncertainty.
- The article presents Civilization VI as a more suitable environment for testing long-horizon, multi-objective strategic reasoning because of its compounding systems and large decision space.