November 29, 2025
Reviewer #2 is a robot?
Major AI conference flooded with peer reviews written by AI
TLDR: A major AI conference found that about 21% of its peer reviews were written by AI, sparking a fierce debate. Some commenters shrugged that the figure is surprisingly low, while others slammed AI detectors as unreliable and biased. The fight now is over trust: who should judge AI work, humans, tools, or AI itself?
An elite AI conference just discovered its own peer reviews were… written by AI. Pangram Labs scanned submissions to ICLR 2026 and says 21% of reviews were fully AI-generated, with over half showing AI fingerprints. Cue internet meltdown. The shocker? Many commenters weren’t outraged; quite the opposite. The top vibe: “Wait, only 21%?” Folks expected a robot takeover and instead got something more like a robot internship.
Then the drama hit: skeptics roasted the detection itself. Some called the analysis “PR for a tool vendor,” warning that AI detectors can be wrong, biased, and unfair; one commenter didn’t mince words, labeling the product “garbage.” Meanwhile, researchers posted war stories: long, bullet-pointed reviews that demanded bizarre statistics and cited papers that never existed. CMU’s Graham Neubig even offered a reward on X for help proving the reviews were bot-made.
Philosophy class crashed the party too: if the conference is about AI, should robots get a seat at the reviewing table? Cue memes of “Reviewer #2 = ChatGPT” and jokes about snake-eating-its-tail peer review. Organizers say they’ll now use automated tools to police AI use, but the community is split between “this is fine,” “this is fraud,” and “this is the future.”
Key Points
- Pangram Labs analyzed 19,490 studies and 75,800 peer reviews submitted to ICLR 2026.
- About 21% of ICLR peer reviews were flagged as fully AI-generated; over half showed signs of AI use.
- Pangram’s tool also found 199 manuscripts (1%) to be fully AI-generated; 9% had more than 50% AI-generated text.
- ICLR organizers will deploy automated tools to check for AI policy breaches in submissions and reviews.
- Researchers, including Graham Neubig and Desmond Elliott, reported suspect reviews; Elliott’s flagged review gave the lowest rating.