Experts sound alarm after ChatGPT Health fails to recognise medical emergencies

Internet screams 'don’t let a bot play doctor' as study shows risky misses

TLDR: A study says ChatGPT Health missed urgent medical care in over half of test cases, sparking panic and punchlines. Commenters split among “don’t let a bot be your doctor,” demands for better studies, and reminders that human doctors fail too. Everyone wants tighter safeguards before AI touches emergencies.

The internet is in full meltdown after a Nature Medicine study found ChatGPT Health downplayed true emergencies in over half of cases. In one scenario, a woman struggling to breathe was told to wait for a future appointment 84% of the time. Cue the chorus: “I’m not surprised,” sighed users like ml_giant, while others blasted tech hubris. As josefritzishere put it, people keep cramming AI into life-and-death moments, then act shocked when it fumbles.

But the thread didn’t just rage; it split into camps. The skeptics demanded receipts: WarmWash pushed for a blind doctor-vs-AI face-off rather than lab-crafted prompts, and spicyusername threw a grenade by asking how often doctors’ errors get the same scrutiny. Meanwhile, the cautious crowd (SoftTalker) said they only use ChatGPT like a fancy search engine, definitely not as a doctor, because it’s “often wrong.” The meme brigade arrived with jokes about “Clippy in a lab coat” and “WebMD but with vibes,” while others riffed that if your friend says you’re fine, the bot might agree: researchers found it was nearly 12x more likely to downplay symptoms after a casual “it’s nothing.”

OpenAI countered that the study doesn’t reflect real-world use and that the system is constantly updated. The crowd’s verdict? Cool gadget, scary doctor. Everyone wants stronger safeguards before a chatbot gets anywhere near the ER.

Key Points

  • An independent Nature Medicine study found ChatGPT Health under-triaged 51.6% of cases needing immediate hospital care.
  • The evaluation used 60 realistic scenarios, reviewed by three doctors, generating nearly 1,000 AI responses for comparison.
  • ChatGPT Health performed well in textbook emergencies (e.g., stroke, severe allergic reactions) but failed in cases like asthma and suicidal ideation.
  • In one simulation, 84% of the time a suffocating woman was directed to a future appointment; the system was nearly 12x more likely to downplay symptoms if a “friend” minimized them.
  • OpenAI said the study may not reflect typical use and noted ongoing model updates; experts called for safety standards and independent auditing.

Hottest takes

"cram AI into spaces where it performs poorly" — josefritzishere
"prefer a blind study comparing doctors to AI" — WarmWash
"how often are we reviewing doctors performance?" — spicyusername
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.