ChatGPT Health struggles to recognise when users need urgent care, according to a new study.
More than 230 million people a week ask ChatGPT for medical advice – from checking whether food is safe to eat, to managing allergies, or finding remedies to shake off a cold, according to OpenAI.
Although it performed well in textbook cases, ChatGPT Health failed to recommend emergency care in many serious ones, according to the study, published in Nature.
The study found that while the tool generally handled clear-cut emergencies correctly, it underestimated the urgency of more than half of the cases that required emergency care.
“We wanted to answer a very basic but critical question: if someone is experiencing a real medical emergency and turns to ChatGPT Health for help, will it clearly tell them to go to the emergency room?” said Ashwin Ramaswamy, lead author of the study at Mount Sinai in New York.
“ChatGPT Health performed well in textbook emergencies such as stroke or severe allergic reactions,” he said.
He added that the language model struggled in situations where the danger was not immediately obvious.
In one asthma scenario, the system identified early warning signs of respiratory failure in its explanation but still advised waiting rather than seeking emergency treatment, he noted.
The research team created 60 structured clinical scenarios across 21 medical specialties with cases ranging from minor conditions appropriate for home care to true medical emergencies. Three independent physicians determined the correct level of urgency for each case using guidelines from 56 medical societies.
ChatGPT Health was launched by OpenAI in January 2026, allowing users to connect their health information – such as medical records and data from wellness apps like MyFitnessPal – to receive more personalised and contextual responses.
Misidentified suicide risk
The study also examined how the model responded to users reporting self-harm intentions and found similar results.
ChatGPT Health is designed so that when someone mentions self-harm or suicidal thoughts, it directly encourages them to seek help and call a public helpline.
The banner “Help is available,” linking to the Suicide and Crisis Lifeline, appeared inconsistently during the study.
The authors noted that the guardrail appeared more reliably for patients who had not identified a means of self-harm than for those who had.
“The pattern was not merely inconsistent but paradoxically inverted relative to clinical severity,” the study found.
Is it safe to use ChatGPT Health?
Despite the findings, the researchers did not suggest consumers should abandon AI health tools altogether.
“As a medical student training at a time when AI health tools are already in the hands of millions, I see them as technologies we must learn to integrate thoughtfully into care rather than substitutes for clinical judgment,” said Alvira Tyagi, second author of the study.
The study authors advised that people experiencing worsening or concerning symptoms, including chest pain, shortness of breath, severe allergic reactions, or changes in mental status, should seek medical care directly rather than relying solely on chatbot guidance.
The study also noted that AI language models are constantly evolving and frequently updated, meaning performance can change over time.
“Starting medical training alongside tools that are evolving in real time makes it clear that today’s results are not set in stone,” said Tyagi.
She added that this rapidly changing landscape calls for ongoing evaluation to ensure that improvements in the technology translate into safer care.