Scientific production in the era of large language models [pdf]

AI flood hits science: shiny papers, shaky substance, and comment wars

TLDR: A huge preprint study says AI boosts paper output and polish, but makes fancy writing a bad signal of quality. Comments split between "AI democratizes science" and "robot‑wielding bullshitters will hijack careers," with calls to decouple publishing from rewards and to fix how good research gets found.

A new study of millions of research preprints says AI writing tools are turbo-charging paper output, helping non‑native English speakers, and widening the mix of citations—while making polished language a terrible shortcut for judging quality. Cue the comment chaos.

One top take: barishnamazov declares a "reversal" in how we read science—glossy prose no longer signals good work; it might just mean "ChatGPT did the writing." Others worry the paper's warning goes beyond style. hirenj fears the career ladder will be hijacked by "robot‑wielding bullshitters" unless we decouple publishing from rewards. felipeerias adds that writing papers wasn't the same as doing science even before AI, urging new ways to surface real breakthroughs "before we all drown in the noise."

There's nerd comedy too: conditionnumber applauds the data appendix, marvels that they pulled arXiv metadata from Kaggle, and jokes it'd cost about $1,000 to run the whole thing through an AI. Meanwhile, ggm dunks on the title: "It's about scientific PAPER production." The mood? Torn between AI as a democratizer and AI as a fog machine. The study's big ask of policy-makers lands hard: if language is no longer a reliable quality signal, the rules of science may need a rewrite.

Key Points

  • The study analyzes 2.1 million preprints from arXiv, bioRxiv, and SSRN spanning January 2018 to June 2024.
  • LLM usage is found to accelerate manuscript output, reduce barriers for non-native English speakers, and diversify discovery of prior literature.
  • A text-based AI detection algorithm compares token distributions of human-written and LLM-rewritten abstracts to identify probable LLM assistance.
  • GPT-3.5-turbo-0125 was used to rewrite pre-2023 abstracts, establishing the LLM-rewritten token distribution used for detection.
  • Traditional signals of scientific quality, such as language complexity, are becoming unreliable, prompting calls for policy adaptation of scientific institutions.
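The detection idea in the points above—comparing token distributions of human-written versus LLM-rewritten abstracts—can be illustrated with a toy sketch. This is not the study's actual algorithm; it assumes simple whitespace tokenization and uses a smoothed per-token log-odds score, where a positive average leans toward LLM-style text:

```python
import math
from collections import Counter

def token_log_odds(human_corpus, llm_corpus):
    """Per-token log-odds (LLM vs. human) with add-one smoothing.

    Each corpus is a list of abstract strings. Tokens are naive
    whitespace-split words; the real study's tokenization differs.
    """
    h = Counter(t for doc in human_corpus for t in doc.lower().split())
    l = Counter(t for doc in llm_corpus for t in doc.lower().split())
    vocab = set(h) | set(l)
    h_total = sum(h.values()) + len(vocab)  # add-one smoothing denominator
    l_total = sum(l.values()) + len(vocab)
    return {t: math.log((l[t] + 1) / l_total) - math.log((h[t] + 1) / h_total)
            for t in vocab}

def llm_score(abstract, log_odds):
    """Average log-odds over known tokens; > 0 suggests LLM assistance."""
    toks = [t for t in abstract.lower().split() if t in log_odds]
    return sum(log_odds[t] for t in toks) / len(toks) if toks else 0.0
```

In practice one would fit the two distributions on originals versus their GPT-3.5 rewrites (as the study did) and threshold the score on a held-out set; the sketch above only conveys the shape of the comparison.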

Hottest takes

"The key finding here is the reversal of the relationship between writing complexity and paper quality" — barishnamazov
"we are going to lose access to both the careers (to robot-weilding bullshitters) and even worse, the shared space where scientific communication took place" — hirenj
"Really badly named article at source. Scientific PAPER production in the era of..." — ggm
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.