ChatGPT Health Under-Triaged Most Emergencies: What It Means for AI in Healthcare
Artificial intelligence (AI) continues to reshape many aspects of our lives — from creative writing to customer service and even medical guidance. Among the most talked-about innovations is OpenAI’s ChatGPT Health, a specialized version of its popular conversational AI designed to assist users with health-related questions.
But a new independent study has raised serious concerns about the tool’s ability to assess the urgency of medical conditions — a role that could mean the difference between life and death. According to research published in Nature Medicine, ChatGPT Health under-triaged more than half of situations that physicians classified as genuine emergencies.
Let’s unpack what this means, why it matters, and the broader implications for the future of AI in healthcare.
What “Under-Triaging” Means
In clinical practice, triage refers to the evaluation of symptoms to determine how urgently someone needs medical attention. In emergency medicine, accurate triage guides decisions such as whether to call an ambulance, go directly to an emergency room, see a doctor soon, or manage a condition at home.
“Under-triaging” means classifying a serious condition — one that requires immediate medical care — as less urgent, suggesting delayed treatment or routine care instead. This is exactly what the new study found ChatGPT Health did in more than half the simulated emergency cases it evaluated.
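Put in code terms, triage levels form an ordered scale, and under-triage is a recommendation that falls below the physician-assigned level. Below is a minimal Python sketch of that definition; the four-level scale and its names are illustrative assumptions, not labels from the study.

```python
from enum import IntEnum

# Illustrative four-level urgency scale, ordered least to most urgent.
# The level names and granularity are assumptions, not the study's labels.
class Urgency(IntEnum):
    SELF_CARE = 0       # manage the condition at home
    ROUTINE = 1         # see a doctor within a day or two
    EMERGENCY_ROOM = 2  # go directly to an emergency room
    AMBULANCE = 3       # call an ambulance

def is_under_triage(predicted: Urgency, reference: Urgency) -> bool:
    """Under-triage: the tool's recommendation is less urgent than the
    physician-assigned reference level."""
    return predicted < reference

# A true ER case answered with "see a doctor soon" is under-triaged.
assert is_under_triage(Urgency.ROUTINE, Urgency.EMERGENCY_ROOM)
```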
How the Study Worked
Researchers at the Icahn School of Medicine at Mount Sinai designed an experiment to test ChatGPT Health’s triage accuracy. They created 60 clinical scenarios ranging from routine health concerns to true medical emergencies. Each scenario was presented to the AI in 16 variations that changed details such as patient gender or race (960 prompts in total), to check that recommendations stayed consistent across demographic details. The tool’s advice was then compared with assessments from three trained physicians using established clinical guidelines.
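The article does not reproduce the study’s actual prompts, but the variation scheme is easy to picture. The hypothetical sketch below shows how 16 demographic variants of a single scenario could be generated; the template, attributes, and values are all invented for illustration.

```python
from itertools import product

# Hypothetical scenario template; the study's real wording is not public
# in this article, so the text and attribute values are invented.
template = ("A {age}-year-old {race} {gender} reports crushing chest pain "
            "radiating down the left arm.")

ages = ["45", "70"]
races = ["white", "Black", "Hispanic", "Asian"]
genders = ["man", "woman"]

# 2 ages x 4 races x 2 genders = 16 variants of the same clinical scenario.
variants = [template.format(age=a, race=r, gender=g)
            for a, r, g in product(ages, races, genders)]

print(len(variants))   # 16
print(variants[0])
```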
The results showed that:
- In 51.6% of true emergency cases, ChatGPT Health recommended seeing a doctor within a day or two instead of advising immediate emergency care.
- It also over-triaged 64.8% of nonurgent cases, suggesting appointments when at-home care was appropriate.
- In examples involving suicidal thoughts or self-harm risk, the tool’s responses were inconsistent — sometimes failing to direct users to appropriate crisis support.
These discrepancies highlight how AI can misinterpret complex, nuanced clinical information — especially when symptoms don’t fit a textbook pattern.
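To make the reported percentages concrete, here is a minimal sketch of how under- and over-triage rates can be computed once each AI recommendation is paired with a physician reference label. The three cases below are invented; only the metric definitions mirror the study’s framing.

```python
# Urgency encoded as ordered integers (0 = home care, 1 = see a doctor soon,
# 2 = go to an ER, 3 = call an ambulance), as in the earlier sketch.
SELF_CARE, ROUTINE, ER, AMBULANCE = range(4)

# Invented (ai_recommendation, physician_reference) pairs for illustration.
cases = [
    (ROUTINE, ER),           # a true emergency answered with routine care
    (AMBULANCE, AMBULANCE),  # a correctly triaged emergency
    (ROUTINE, SELF_CARE),    # a nonurgent case bumped up to an appointment
]

emergencies = [(p, r) for p, r in cases if r >= ER]
nonurgent = [(p, r) for p, r in cases if r == SELF_CARE]

# Under-triage rate: share of emergencies given a less urgent recommendation.
under_rate = sum(p < r for p, r in emergencies) / len(emergencies)
# Over-triage rate: share of nonurgent cases given a more urgent one.
over_rate = sum(p > r for p, r in nonurgent) / len(nonurgent)

print(f"Under-triage rate on emergencies: {under_rate:.0%}")    # 50%
print(f"Over-triage rate on nonurgent cases: {over_rate:.0%}")  # 100%
```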
Why Triaging Accurately Matters
Medical emergencies aren’t always dramatic, textbook scenarios like heart attacks or seizures. Many begin with subtle warning signs — early respiratory failure, diabetic complications, or evolving infections — that demand professional attention before they become unmistakable.
When an AI suggests it’s safe to wait, a user might delay urgent care, with potentially harmful consequences. Experts warn that under-triage can lead to delayed diagnosis, prolonged suffering, preventable complications, and even death. Conversely, over-triage can strain medical resources, leading healthy people to seek care unnecessarily.
AI’s Strengths and Limitations
Supporters of AI tools like ChatGPT Health note that these systems offer round-the-clock access to medical guidance, which can be especially valuable in regions with limited healthcare access. Millions of people already turn to general chatbots for health questions — and these tools can help explain medical terminology, summarize test results, or offer general wellness advice.
Yet the study suggests that the technology is not yet reliable as a standalone decision-maker for urgent care. Although ChatGPT can recall and synthesize medical knowledge, and even performs well on written medical exams, real-world clinical decision-making involves nuance, context, and judgment that current AI has not mastered.

What OpenAI Says
In response to the study’s findings, an OpenAI spokesperson emphasized that ChatGPT Health is not meant for diagnosis or treatment and that users can ask follow-up questions to clarify their situation. They further noted that the product is still in a limited rollout phase and is expected to improve over time.
While incremental improvement is positive, experts point out that public health and safety should be the priority. Thomas Mafi, a physician unaffiliated with the study, says that any tool capable of influencing urgent healthcare decisions should be rigorously tested before wide adoption.
The Bigger Picture: AI in Healthcare
This study serves as a wake-up call in the broader conversation about AI’s role in medicine. As powerful as large language models may be, they also carry risks, especially when used outside controlled environments.
Medical professionals generally agree that AI should augment rather than replace clinical judgment. AI tools may excel at handling routine information or reducing paperwork, but the complexity of human health demands professional oversight, deep clinical training, and real-world validation.
Studies like this one push the field toward improved training data, better safety protocols, and clearer boundaries around how these tools are deployed — especially when patient lives are at stake.
Final Thoughts
The study revealing that ChatGPT Health “under-triaged” more than half of genuine emergencies highlights a core truth: AI has extraordinary potential, but it is not yet ready to be a trusted gatekeeper for medical emergencies.
As we continue to embrace digital health innovation, we must ensure that tools are safe, transparent, and backed by rigorous independent evaluation. Until then, AI should be seen as an assistant, not a decision-maker, and users should always consult qualified healthcare professionals when urgent health concerns arise.