Prompt Politeness Affects LLM Accuracy

Being mean to AI may work better — and the comments are having a meltdown

TLDR: A study found that rude wording got slightly better results from ChatGPT than polite wording, which immediately sparked a comments-section identity crisis. Some people defended good manners on principle, others joked about future robot revenge, and one critic went straight for the study's math.

A new study just lobbed a tiny grenade into internet etiquette: apparently rude prompts got slightly better answers from ChatGPT than super-polite ones. Researchers rewrote 50 quiz questions in five tones, from sweetly respectful to outright nasty, and the meanest version came out on top. Not by a mile (84.8% accuracy for Very Rude versus 80.8% for Very Polite), but enough to send the comment section straight into chaos. Suddenly, the age-old "should I say please to the bot?" debate became less about manners and more about whether kindness is secretly lowering your score.

And wow, people had thoughts. One camp instantly turned into manners defenders, with one commenter saying they use "please" and "thank you" not for the machine, but to protect their own habits. Another delivered the thread's most relatable sci-fi joke: be nice now so the robots remember you later. Meanwhile, the peanut gallery was thriving. One commenter side-eyed the study's math, basically saying, "Why are we using that test here at all?" — proving that even in a story about politeness, the real internet tradition is arguing about the spreadsheet. Then came the darker punchline: if rude prompts work better, what happens when we trust these systems with serious jobs like writing software? The whole thread felt like a mashup of classroom etiquette, robot apocalypse prep, and people wondering whether the internet has finally discovered that being snippy is now a productivity hack.

Key Points

  • The study tested how five levels of prompt politeness affected LLM accuracy on multiple-choice questions.
  • Researchers created 250 prompts by rewriting 50 base questions from mathematics, science, and history into five tone variants.
  • ChatGPT 4o answered the prompts, and the results were analyzed with paired-sample t-tests.
  • Very Rude prompts achieved 84.8% accuracy, while Very Polite prompts achieved 80.8% accuracy.
  • The findings differ from earlier studies that associated rude prompting with poorer model performance.
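
For anyone wondering what that setup looks like in practice, here is a rough Python sketch of the two mechanical pieces from the key points: the 50 × 5 prompt grid and a paired-sample t statistic. It is an illustration under stated assumptions, not the researchers' code. The three middle tone labels, the helper names `build_prompt_grid` and `paired_t_statistic`, and the accuracy numbers fed into the test are all made up for the example.

```python
import math
import statistics

# Only "Very Polite" and "Very Rude" are named in the summary;
# the three middle labels are assumed for illustration.
TONES = ["Very Polite", "Polite", "Neutral", "Rude", "Very Rude"]


def build_prompt_grid(num_questions, tones):
    """Return one (question_index, tone) pair per prompt variant."""
    return [(q, tone) for q in range(num_questions) for tone in tones]


def paired_t_statistic(scores_a, scores_b):
    """Paired-sample t statistic: mean difference divided by its standard error."""
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    # Note: raises ZeroDivisionError if every difference is identical.
    return mean_diff / (sd_diff / math.sqrt(n))


grid = build_prompt_grid(50, TONES)
print(len(grid))  # 250 prompts: 50 base questions x 5 tones

# Hypothetical paired accuracies (one pair per run), NOT data from the study.
very_rude = [0.86, 0.84, 0.85, 0.83, 0.86]
very_polite = [0.80, 0.82, 0.81, 0.80, 0.81]
t = paired_t_statistic(very_rude, very_polite)
print(round(t, 2))
```

The test is "paired" because each score in one list is matched with a score in the other, so it looks at the differences within each pair rather than treating the two tones as unrelated groups. Whether that is the right test for this kind of data is exactly what one commenter questioned.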

Hottest takes

"when the robots finally take over, they will remember i was nice to them" — TimCTRL
"why would anyone use a t-test" — 331c8c71
"I don't want to lose" — theanonymousone