Show HN: We post-trained a model that pen tests instead of refusing

AI that hunts bugs instead of saying “sorry” has commenters cheering, side-eyeing, and memeing

TLDR: Cosine launched an AI security tool that claims it can inspect code and even test approved targets instead of refusing to help. Commenters were split between excitement over a more useful AI and suspicion that it’s just gated “dangerous AI” with a slick sales pitch.

A startup just strutted onto Hacker News with a bold pitch: an AI tool that will actually look for ways to break your software instead of nervously refusing. The product has two personalities in one command-line app: one mode reads your code and writes up a report of possible security holes in plain language, while the other tries approved attacks against systems you explicitly allow. It’s free to install, costs $20 a month to run scans, and the company insists the tool is locked down so it can’t secretly edit files or roam the wider internet without permission.

But the real fireworks were in the comments. Some readers were genuinely impressed and immediately wanted the behind-the-scenes gossip: how do you even train an AI to stop clutching its pearls and start pentesting? Others were much less starry-eyed. One skeptical commenter basically said, hang on, isn’t this just the same “we decide who gets the dangerous toy” policy used by the big AI companies, only with a different bouncer at the door? Another went straight for the safety panic button, asking why anyone would build an offensive tool at all if a safer code scanner would do.

And then came the comedy. The thread’s sharpest roast accused the team of launching a marketing page for a theoretical model rather than proving the thing is real. Ouch. So yes, this launch got attention, but the crowd split fast between “finally, useful” and “this feels like a very spicy trust exercise.”

Key Points

  • Cosine launched `cos`, a CLI with a read-only Security Scan mode and an authorised Pen Test mode.
  • Security Scan outputs a markdown report with grounded findings including location, severity, cause, and fix direction.
  • Running scans requires an active $20 per month Cosine subscription, although installation is free.
  • Safety controls are enforced by a Go harness that blocks mutating tool calls in Security Scan mode and restricts network egress in Pen Test mode.
  • Cosine says the tool uses its own post-trained offensive-security model, ships as a closed-source binary, and runs locally on the user’s machine.
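
To make the harness bullet above concrete, here is a minimal sketch of what a mode-based gate could look like in Go. It is not Cosine's code: the binary is closed-source, so every name here (`Mode`, `ToolCall`, `Policy`, `Check`) and the host allowlist used for egress are assumptions made purely for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// Mode is a hypothetical name for the two modes described in the post.
type Mode int

const (
	SecurityScan Mode = iota // read-only code review
	PenTest                  // authorised active testing
)

// ToolCall is an illustrative shape for a request the model sends to the harness.
type ToolCall struct {
	Name     string // e.g. "read_file", "write_file", "http_request"
	Mutating bool   // true if the tool can change files or state
	Host     string // target host for network tools, empty otherwise
}

// Policy holds the active mode and the hosts the user explicitly approved.
// The allowlist is an assumption; the post only says egress is restricted.
type Policy struct {
	Mode         Mode
	AllowedHosts map[string]bool // keys are lowercase host names
}

// Check returns nil if the call may proceed, or an error explaining the block.
func (p Policy) Check(c ToolCall) error {
	switch p.Mode {
	case SecurityScan:
		// Scan mode: refuse anything that could modify files or state.
		if c.Mutating {
			return fmt.Errorf("blocked: %q is a mutating call in scan mode", c.Name)
		}
	case PenTest:
		// Pen test mode: network calls may only reach approved hosts.
		host := strings.ToLower(c.Host)
		if host != "" && !p.AllowedHosts[host] {
			return fmt.Errorf("blocked: host %q is not on the approved list", c.Host)
		}
	default:
		// Unknown modes are denied rather than silently allowed.
		return fmt.Errorf("blocked: unknown mode %d", p.Mode)
	}
	return nil
}
```

In this sketch the check lives in ordinary code outside the model, so a refusal does not depend on the model choosing to behave. A caller would build a `Policy` once per session and run `Check` on every tool call before executing it.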

Hottest takes

"the same policy that Anthropic and OpenAI have" — cortesoft
"I can't think of any way to safely release an offensive tool publicly" — Catloafdev
"We told Claude to generate a marketing page for a theoretical pentesting model" — jrflowers