ChatGPT Health Under-Triaged Most Emergencies: What It Means for AI in Healthcare
Artificial intelligence (AI) continues to reshape many aspects of our lives — from creative writing to customer service and even medical guidance. Among the most talked-about innovations is OpenAI’s ChatGPT Health, a specialized version of its popular conversational AI designed to assist users with health-related questions.
But a new independent study has raised serious concerns about the tool’s ability to assess the urgency of medical conditions — a role that could mean the difference between life and death. According to research published in Nature Medicine, ChatGPT Health under-triaged more than half of situations that physicians classified as genuine emergencies.
Let’s unpack what this means, why it matters, and the broader implications for the future of AI in healthcare.
What “Under-Triaging” Means
In clinical practice, triage refers to the evaluation of symptoms to determine how urgently someone needs medical attention. In emergency medicine, accurate triage guides decisions such as whether to call an ambulance, go directly to an emergency room, see a doctor soon, or manage a condition at home.
“Under-triaging” means classifying a serious condition — one that requires immediate medical care — as less urgent, suggesting delayed treatment or routine care instead. This is exactly what the new study found ChatGPT Health did in more than half the simulated emergency cases it evaluated.
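Put in code terms, triage levels form an ordered scale, and under-triage is a recommendation that falls below the physician-assigned level. Below is a minimal Python sketch of that definition; the four-level scale and its names are illustrative assumptions, not labels from the study.

```python
from enum import IntEnum

# Illustrative four-level urgency scale, ordered least to most urgent.
# The level names and granularity are assumptions, not the study's labels.
class Urgency(IntEnum):
    SELF_CARE = 0       # manage the condition at home
    ROUTINE = 1         # see a doctor within a day or two
    EMERGENCY_ROOM = 2  # go directly to an emergency room
    AMBULANCE = 3       # call an ambulance

def is_under_triage(predicted: Urgency, reference: Urgency) -> bool:
    """Under-triage: the tool's recommendation is less urgent than the
    physician-assigned reference level."""
    return predicted < reference

# A true ER case answered with "see a doctor soon" is under-triaged.
assert is_under_triage(Urgency.ROUTINE, Urgency.EMERGENCY_ROOM)
```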
How the Study Worked
Researchers at the Icahn School of Medicine at Mount Sinai designed an experiment to test ChatGPT Health’s triage accuracy. They created 60 clinical scenarios ranging from routine health concerns to true medical emergencies. Each scenario was presented to the AI in 16 variations that changed details such as patient gender or race (960 prompts in total), to check that recommendations stayed consistent across demographic details. The tool’s advice was then compared with assessments from three trained physicians using established clinical guidelines.
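The article does not reproduce the study’s actual prompts, but the variation scheme is easy to picture. The hypothetical sketch below shows how 16 demographic variants of a single scenario could be generated; the template, attributes, and values are all invented for illustration.

```python
from itertools import product

# Hypothetical scenario template; the study's real wording is not public
# in this article, so the text and attribute values are invented.
template = ("A {age}-year-old {race} {gender} reports crushing chest pain "
            "radiating down the left arm.")

ages = ["45", "70"]
races = ["white", "Black", "Hispanic", "Asian"]
genders = ["man", "woman"]

# 2 ages x 4 races x 2 genders = 16 variants of the same clinical scenario.
variants = [template.format(age=a, race=r, gender=g)
            for a, r, g in product(ages, races, genders)]

print(len(variants))   # 16
print(variants[0])
```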
The results showed that:
- In 51.6% of true emergency cases, ChatGPT Health recommended seeing a doctor within a day or two instead of advising immediate emergency care.
- It also over-triaged 64.8% of nonurgent cases, suggesting appointments when at-home care was appropriate.
- In examples involving suicidal thoughts or self-harm risk, the tool’s responses were inconsistent — sometimes failing to direct users to appropriate crisis support.
These discrepancies highlight how AI can misinterpret complex, nuanced clinical information — especially when symptoms don’t fit a textbook pattern.
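To make the reported percentages concrete, here is a minimal sketch of how under- and over-triage rates can be computed once each AI recommendation is paired with a physician reference label. The three cases below are invented; only the metric definitions mirror the study’s framing.

```python
# Urgency encoded as ordered integers (0 = home care, 1 = see a doctor soon,
# 2 = go to an ER, 3 = call an ambulance), as in the earlier sketch.
SELF_CARE, ROUTINE, ER, AMBULANCE = range(4)

# Invented (ai_recommendation, physician_reference) pairs for illustration.
cases = [
    (ROUTINE, ER),           # a true emergency answered with routine care
    (AMBULANCE, AMBULANCE),  # a correctly triaged emergency
    (ROUTINE, SELF_CARE),    # a nonurgent case bumped up to an appointment
]

emergencies = [(p, r) for p, r in cases if r >= ER]
nonurgent = [(p, r) for p, r in cases if r == SELF_CARE]

# Under-triage rate: share of emergencies given a less urgent recommendation.
under_rate = sum(p < r for p, r in emergencies) / len(emergencies)
# Over-triage rate: share of nonurgent cases given a more urgent one.
over_rate = sum(p > r for p, r in nonurgent) / len(nonurgent)

print(f"Under-triage rate on emergencies: {under_rate:.0%}")    # 50%
print(f"Over-triage rate on nonurgent cases: {over_rate:.0%}")  # 100%
```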
Why Triaging Accurately Matters
Medical emergencies aren’t always dramatic, textbook scenarios like heart attacks or seizures. Many begin with subtle warning signs — early respiratory failure, diabetic complications, or evolving infections — that demand professional attention before they become unmistakable.
When an AI suggests it’s safe to wait, a user might delay urgent care, with potentially harmful consequences. Experts warn that under-triage can lead to delayed diagnosis, prolonged suffering, preventable complications, and even death. Conversely, over-triage can strain medical resources, leading healthy people to seek care unnecessarily.
AI’s Strengths and Limitations
Supporters of AI tools like ChatGPT Health note that these systems offer round-the-clock access to medical guidance, which can be especially valuable in regions with limited healthcare access. Millions of people already turn to general chatbots for health questions — and these tools can help explain medical terminology, summarize test results, or offer general wellness advice.
Yet the study suggests that the technology is not yet reliable as a standalone decision-maker for urgent care. Although ChatGPT can recall and synthesize medical knowledge, and even performs well on written medical exams, real-world clinical decision-making involves nuance, context, and judgment that current AI has not mastered.

What OpenAI Says
In response to the study’s findings, an OpenAI spokesperson emphasized that ChatGPT Health is not meant for diagnosis or treatment and that users can ask follow-up questions to clarify their situation. They further noted that the product is still in a limited rollout phase and is expected to improve over time.
While incremental improvement is positive, experts point out that public health and safety should be the priority. Thomas Mafi, a physician unaffiliated with the study, says that any tool capable of influencing urgent healthcare decisions should be rigorously tested before wide adoption.
The Bigger Picture: AI in Healthcare
This study serves as a wake-up call in the broader conversation about AI’s role in medicine. As powerful as large language models may be, they also carry risks, especially when used outside controlled environments.
Medical professionals generally agree that AI should augment rather than replace clinical judgment. AI tools may excel at handling routine information or reducing paperwork, but the complexity of human health demands professional oversight, deep clinical training, and real-world validation.
Studies like this one push the field toward improved training data, better safety protocols, and clearer boundaries around how these tools are deployed — especially when patient lives are at stake.
Final Thoughts
The study revealing that ChatGPT Health “under-triaged” more than half of genuine emergencies highlights a core truth: AI has extraordinary potential, but it is not yet ready to be a trusted gatekeeper for medical emergencies.
As we continue to embrace digital health innovation, we must ensure that tools are safe, transparent, and backed by rigorous independent evaluation. Until then, AI should be seen as an assistant, not a decision-maker, and users should always consult qualified healthcare professionals when urgent health concerns arise.