April 13, 2026

Arcade-mode hacker, real-world drama

Evaluation of Claude Mythos Preview's cyber capabilities

First AI to finish a fake company hack ignites cheers, side‑eye, and a stats war

TLDR: Claude Mythos Preview is the first AI reported to finish a full simulated company hack in 3 of 10 tries, but the thread devolved into hype vs. “meh.” Commenters battled over cost (~$10K/run), missing stats, and easy tests, while some insisted this marks real, useful autonomous hacking.

Anthropic’s new “Mythos” just got stamped a level‑up by the AI Security Institute: it chained dozens of steps to “take over” a fake corporate network in 3 of 10 runs and scored 73% on expert hacking puzzles. Cue the comments section going nuclear. One camp is hyped; the other’s yelling scoreboard.

Skeptics like thepasch squinted at the graphs and basically said, meh. They questioned the “stepwise” hype and compared it to rivals (“isn’t GPT‑5 already here?”), while jokers dubbed Mythos a CTF speedrunner that might just be farming points. Supporters clapped back, noting Mythos is the first to finish the mega‑range and that average progress (22/32 steps vs. Opus’s 16) actually matters. The vibes: breakthrough vs. bragging rights.

Then came the wallet shock. cbg0 pegged it at “around $10K” per successful takeover attempt at today’s prices, sparking memes like “press corporate card to pwn.” Methodology hawks piled on: no active defenders, static targets, and no penalty for noisy moves, aka arcade‑mode hacking. Cynddl slammed the post for missing confidence intervals and linked a NeurIPS review on shaky evals. Meanwhile, boosters like lebovic argued we’ve crossed the “useful autonomy” line, even though Mythos flubbed the “Cooling Tower” test, which birthed the meme: “cold tower, hot takes.”
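To see why the missing error bars drew fire: with only 10 runs, a 3/10 success rate is statistically mushy. A quick sketch using the Wilson score interval (one standard way to put a confidence interval on a small-sample proportion; the `wilson_interval` helper name is ours, not from the report):

```python
import math

def wilson_interval(successes, n, z=1.959964):
    """95% Wilson score confidence interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    margin = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - margin, center + margin

# The 3-of-10 full-takeover result from the article
lo, hi = wilson_interval(3, 10)
print(f"3/10 successes -> 95% CI: {lo:.0%} to {hi:.0%}")  # roughly 11% to 60%
```

In other words, the data is consistent with a “true” success rate anywhere from about one in nine to well over half, which is exactly the kind of spread the commenters wanted the post to acknowledge.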

Key Points

  • AISI evaluated Anthropic’s Claude Mythos Preview for cybersecurity, noting improved performance over prior frontier models.
  • On expert-level CTF tasks that were unsolved before April 2025, Mythos Preview achieved a 73% success rate.
  • In the 32-step TLO corporate network simulation, Mythos Preview completed the full sequence in 3/10 attempts and averaged 22 steps.
  • Claude Opus 4.6 was the next best on TLO, averaging 16 steps, indicating a performance gap in multi-step attack execution.
  • Tests used a 100M token budget and lacked active defenders and penalties for alerts, limiting conclusions about performance on well-defended systems.

Hottest takes

"those charts don’t look… particularly impressive" — thepasch
"around $10K for a full network takeover" — cbg0
"crossed the threshold of meaningfully useful capabilities" — lebovic
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.