New Prompt Injection Papers: Agents Rule of Two and The Attacker Moves Second

Choose Two, Trust None: Commenters fight over AI’s new safety rule

TLDR: Meta’s “Rule of Two” says an AI agent should combine at most two of three risky powers in a session without a human stepping in, while a major study shows adaptive attackers still beat many published defenses. Commenters are split between calling it necessary brakes, useless security theater, and even cheering for hacks that punish lazy AI use.

Meta just dropped a “Rule of Two” for AI agents: in any one session, your bot can do at most two of three things (read sketchy stuff, touch private data, or talk to and change things outside) before a human has to step in. Meanwhile, a heavyweight study says attackers can still sidestep a dozen so-called defenses. Translation: guardrails are in, but hackers still dance around them. Read the Meta take here and the punchy attack paper here.

The comments lit up. One camp says the rule kneecaps useful AI: ares623 argues the real value needs all three powers, otherwise you add a slow human rubber-stamp that’s basically the third power “with extra steps.” The snark squad, led by r0x0r007, calls this Security Theater 2.0: if we did this to normal apps, nothing would ship, but sure, at least bots write poems. Skeptics like kubb ask how this “rule” guarantees anything; security pros like ArcHound warn it’ll become an excuse to ignore real risk work. Then there’s the chaotic neutral take: behnamoh openly hopes prompt injection stays possible to jam lazy academic reviewers using AI. Cue Star Wars “Rule of Two” memes and “pick two, pray later” jokes. Bottom line: the research says defenses are flimsy, and the crowd is split between adding brakes now and rethinking how much freedom these bots should get at all.

Key Points

  • Meta AI published “Agents Rule of Two” on October 31, 2025, proposing limits on AI agent capabilities per session to reduce prompt injection risk.
  • The Rule of Two allows agents to satisfy at most two of three properties: handling untrusted inputs, accessing sensitive data/systems, and changing state or communicating externally.
  • If all three properties are needed in one session, the agent should not operate autonomously and should instead require supervision (e.g., human-in-the-loop) or validation (see the sketch after this list).
  • Simon Willison compares the Rule of Two to his “lethal trifecta” and Google Chrome’s “Rule of 2,” noting prompt injection remains an unsolved problem and filtering is unreliable.
  • An arXiv paper dated October 10, 2025, “The Attacker Moves Second,” by authors from OpenAI, Anthropic, and Google DeepMind, evaluates 12 defenses using adaptive attacks against prompt injection and jailbreaks.
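To make the rule concrete, here is a minimal sketch of the per-session check it implies. This is not Meta’s implementation; the class and function names are hypothetical, and the three booleans map directly to the three properties listed above.

```python
from dataclasses import dataclass


@dataclass
class SessionCapabilities:
    """The three risky properties named by the Rule of Two (hypothetical model)."""
    processes_untrusted_input: bool      # e.g. reads web pages, emails, user uploads
    accesses_sensitive_data: bool        # e.g. private files, credentials, internal systems
    changes_state_or_communicates: bool  # e.g. sends messages, writes files, calls external APIs


def requires_human_oversight(caps: SessionCapabilities) -> bool:
    """Rule of Two: a session may combine at most two of the three risky
    properties; all three together means the agent should not run autonomously."""
    risky_count = sum([
        caps.processes_untrusted_input,
        caps.accesses_sensitive_data,
        caps.changes_state_or_communicates,
    ])
    return risky_count >= 3


# Example: an email assistant that reads inbound mail (untrusted), sees the
# user's inbox (sensitive), and can send replies (external action) trips the rule.
email_agent = SessionCapabilities(True, True, True)
assert requires_human_oversight(email_agent)

# Dropping one property (e.g. replies are only drafted and sent after human
# approval) brings the session back within the rule.
draft_only_agent = SessionCapabilities(True, True, False)
assert not requires_human_oversight(draft_only_agent)
```

The check is deliberately per session: an agent’s full tool list can cover all three properties, as long as no single autonomous session combines them.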

Hottest takes

“doesn’t a huge value of LLMs for the general population necessitate all 3 of the circles?” — ares623
“Ooh, right, cause we couldn’t use them and a whole industry got created that’s called cybersecurity” — r0x0r007
“I actually want prompt injection to remain possible.” — behnamoh
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.