January 6, 2026
Hack Fight Night
Comparing AI agents to cybersecurity professionals in real-world pen testing
AI bot out-hacks humans on campus network — pros clap back
TLDR: ARTEMIS, a new AI agent, beat 9 of 10 human testers on a live university network and placed second overall, igniting hype over its speed and cost. Commenters split: some cheer the efficiency, while veterans warn it's early, prone to false alarms, and more sidekick than replacement for real-world hacking work.
A university-scale cyber obstacle course just turned into a reality show: the AI agent ARTEMIS placed second overall, finding 9 real weak spots with an 82% hit rate and beating 9 of 10 human "ethical hackers." Cue the gasps: one commenter quoted the WSJ saying the bot "trounced all except one," and the price tag set off fireworks. The study puts some ARTEMIS configurations at around $18/hour versus $60/hour for humans, and folks saw dollar signs. The hype crowd loved that the AI can methodically scan and try many things at once, like a tireless intern on energy drinks.
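That "try many things at once" advantage is mostly plain concurrency applied to enumeration. Here is a minimal sketch of the idea in Python, assuming a hypothetical subnet and port list; it illustrates parallel scanning in general, not ARTEMIS's actual code:

```python
# Minimal sketch of parallel service enumeration, the kind of grunt work
# AI agents (and classic scanners) grind through tirelessly. Illustrative
# only: the subnet and ports here are hypothetical, not from the study.
import socket
from concurrent.futures import ThreadPoolExecutor

TARGETS = [f"10.0.{subnet}.{host}" for subnet in range(2) for host in range(1, 255)]
PORTS = [22, 80, 443, 445, 3389]  # common services worth a first look

def probe(addr: str, port: int, timeout: float = 0.5) -> tuple[str, int] | None:
    """Return (addr, port) if a TCP connect succeeds, else None."""
    try:
        with socket.create_connection((addr, port), timeout=timeout):
            return addr, port
    except OSError:
        return None

pairs = [(addr, port) for addr in TARGETS for port in PORTS]

# One thread per in-flight probe; a human tester can't match this fan-out.
with ThreadPoolExecutor(max_workers=256) as pool:
    results = list(pool.map(lambda ap: probe(*ap), pairs))

for addr, port in filter(None, results):
    print(f"open: {addr}:{port}")
```

A classic scanner like nmap does this job more thoroughly; the novelty with agents is chaining what they find into follow-up exploitation steps without a human in the loop.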
But veterans rolled in with ice-cold water. tptacek warned it's way too early to crown a robot overlord, noting that network "pen testing" (hiring hackers to find weak spots) has been automated for decades; the real battles happen in messy apps, not simple network checks. The AI's weak spots sparked jokes and jabs: higher false alarms (aka false positives) and struggles with clicky, visual interfaces. "Sounds like they need another agent to detect false positives," quipped one user. Builder types flexed too: zerodayai is shipping a DIY hacker-bot framework, while pros like nullcathedral said LLMs (big text-prediction AIs) are killer sidekicks for grunt work like untangling weird code, but not human replacements. In short: bots are fast, cheap, and dramatic; humans still do the nuance.
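The false-positive jab points at a concrete design question: should an agent re-verify a finding before a human ever sees the ticket? A minimal sketch of that triage idea, with a hypothetical Finding shape and a made-up documentation-only target (nothing here is from the study):

```python
# Sketch of a triage pass to cut false positives: each candidate finding
# must be reproduced by a deterministic re-test before it gets reported.
# The Finding shape, the example check, and the URL are all hypothetical.
import urllib.request
from dataclasses import dataclass
from typing import Callable

@dataclass
class Finding:
    host: str
    issue: str
    reproduce: Callable[[], bool]  # deterministic re-test for this issue

def triage(candidates: list[Finding]) -> list[Finding]:
    """Keep only findings whose re-test succeeds twice in a row."""
    confirmed = []
    for f in candidates:
        if f.reproduce() and f.reproduce():
            confirmed.append(f)
        else:
            print(f"dropped as likely false positive: {f.host} / {f.issue}")
    return confirmed

def admin_panel_reachable() -> bool:
    """Re-test for a claimed exposed admin panel (192.0.2.10 is a doc-only IP)."""
    try:
        with urllib.request.urlopen("http://192.0.2.10/admin", timeout=3) as r:
            return r.status == 200
    except OSError:
        return False

report = triage([Finding("192.0.2.10", "exposed admin panel", admin_panel_reachable)])
```

Requiring two consecutive reproductions is crude, but it is basically the "another agent to detect false positives" the commenter was joking about, minus the LLM.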
Key Points
- Study evaluated 10 human penetration testers against six existing AI agents and the new ARTEMIS framework in a live enterprise setting.
- Testbed was a large university network with ~8,000 hosts across 12 subnets.
- ARTEMIS ranked second overall, finding 9 valid vulnerabilities with an 82% valid-submission rate.
- Existing AI scaffolds (e.g., Codex, CyAgent) underperformed most human participants.
- AI agents showed strengths in enumeration, parallel exploitation, and cost (~$18/hour vs ~$60/hour), but had higher false-positive rates and struggled with GUI tasks.