March 15, 2026
Bots enter, chaos ensues
Show HN: Open-source playground to red-team AI agents with exploits published
Open season on chatbots: first testers say the “judge” wobbles—bring a leaderboard
TLDR: Fabraix launched an open arena where people try to break real AI agents—and publish how they did it—to harden defenses. Early reaction: one tester says AI judging AI feels shaky and begs for a leaderboard, setting the tone for a competitive, public push to make AI safer and more trustworthy.
Fabraix just lit the fuse with the Playground, an open arena where real AI agents are unleashed and the internet is invited to break them—then publish the exploit. It’s not a demo; it’s a gladiator pit for bots. System prompts are public, winning hacks get written up, and every new takedown forces tougher defenses. Think “bug bash,” but for robot sidekicks.
The early vibe? Competitive chaos, please. One tester showed up swinging with clever encoding and language-switch tricks, only to report they “didn’t work so far.” But the real spice was the critique: using an AI as the referee might be a shaky defense. That’s got folks imagining hilarious scenes of bots grading other bots—cue the memes about “robots policing robots” and “fox guarding the henhouse.” And yes, the community mood is crystal clear: give us a leaderboard. If there’s a countdown clock and a crown for the quickest jailbreak, the hackers will come running.
While Fabraix frames this as a trust-building move—“break our agents so we can make them better”—the drama writes itself: transparency purists cheering the open docs, cautious voices side-eyeing the idea of publishing exploits, and everyone else chanting, “Make it a game.” It’s the Hunger Games for chatbots, and the audience wants popcorn.
Key Points
- Fabraix launched an open, community-driven Playground to red-team live AI agents with visible system prompts.
- Challenges are proposed and voted on by the community; the top challenge goes live with a timer, and the fastest successful jailbreak wins.
- Winning jailbreak techniques, including approach and reasoning, are published to improve defenses and shared understanding.
- Challenge configs and prompts are versioned openly; guardrail evaluation runs server-side to prevent tampering, and the agent runtime will be open-sourced separately.
- The project uses a React/TypeScript/Vite/Tailwind frontend, can run locally via npm, and invites contributions and discussion via Discord.