Your AI Doctor Is Wrong Half the Time — New Study Exposes a Dangerous Health Risk
Millions of people around the world now turn to AI chatbots before they call a doctor, and it is easy to understand why. These tools are fast, free, and available at any hour of the day. But a landmark new study, published in the peer-reviewed journal BMJ Open and covered extensively by The New York Times, reveals something deeply unsettling: popular AI chatbots give problematic medical advice roughly 50% of the time. That is not a minor glitch. That is a coin flip with your health on the line.
What the Study Actually Found
Researchers from the United States, Canada, and the United Kingdom conducted a rigorous evaluation of five of the most widely used AI platforms: ChatGPT, Gemini, Meta AI, Grok, and DeepSeek. Each platform was presented with 10 questions spanning five major health categories. The results, published in BMJ Open, were alarming. About 50% of all responses were classified as problematic. Even more concerning, nearly 20% of responses were deemed highly problematic, meaning they could pose a direct risk to someone acting on that advice.
To make matters worse, not a single chatbot produced a fully complete and accurate reference list in response to any prompt. Citations were frequently incomplete or outright fabricated. In fact, only 32% of more than 500 citations pulled from ChatGPT, ScholarGPT, and DeepSeek were verified as accurate, and nearly half were at least partially made up. This is not a small margin of error. It is a fundamental reliability problem at the heart of the rise of the AI doctor and the growing public dependence on these tools.
Which AI Chatbots Performed the Worst?
Not every platform performed equally, though none emerged with a clean record. Grok returned the highest share of problematic responses at 58%. ChatGPT followed closely at 52%, while Meta AI came in at 50%. Gemini and DeepSeek produced their own share of problematic answers. The study makes clear that no chatbot is reliably safe as a standalone medical advisor, regardless of how advanced or popular it may be.
The Most Dangerous Part: Confidence Without Accuracy
Here is what makes this situation especially treacherous. The chatbots did not stammer, hedge, or express uncertainty when they gave wrong answers. They delivered flawed responses with a tone of confidence and authority. For the average user who has no medical background, there is little reason to question an answer that sounds so sure of itself. This is the crux of the danger: an AI that sounds like a doctor but reasons like a probability engine trained on imperfect data.
Chatbots do not understand information the way human clinicians do. They generate responses based on statistical patterns from their training data. When that data is incomplete, biased, or outdated, the output reflects those flaws, but the tone rarely does. The study's authors noted that these systems generate responses that sound authoritative but can be deeply flawed, and that combination is uniquely hazardous in a healthcare context.
Where AI Gets It Wrong the Most
The study identified clear patterns in where AI chatbots struggle the most. They performed relatively better on closed-ended questions and on topics like vaccines and cancer where the medical consensus is well established. However, performance dropped sharply when questions were open-ended or touched on complex subjects like stem cell therapy and nutrition. These are precisely the areas where people are most vulnerable to misinformation, and where the stakes of bad advice can be highest.
The chatbots also responded to adversarial or leading questions without adequate caution. They rarely refused to answer, even in situations where the responsible response would have been to direct the user to a qualified healthcare provider. This eagerness to answer, regardless of the complexity or sensitivity of the question, amplifies the risk for users who may not know when to seek a second opinion. And the risk only grows as AI is increasingly positioned as a gateway to better patient outcomes in healthcare.
Emergency Situations: A Separate and Serious Problem
Separate research highlighted by NPR adds another alarming dimension. In that study, researchers presented AI bots with simulated medical emergency scenarios. In 52% of emergency cases, the bots under-triaged, meaning they treated the ailment as less serious than it actually was. In one example, an AI failed to direct a hypothetical patient suffering from diabetic ketoacidosis and impending respiratory failure to go to the emergency department. That is not a minor error. That is the kind of mistake that can cost a life.
Dr. Girish Nadkarni, a physician and AI researcher at Mount Sinai, noted that when there was a textbook medical emergency, ChatGPT often got it right. The problem arose in more nuanced cases where time was a critical factor. In those scenarios, the AI both over- and under-estimated how urgently a patient needed care. For a person in a real emergency relying on a chatbot for guidance, that uncertainty could be fatal.
200 Million People Ask ChatGPT Health Questions Every Week
Let that number sink in. According to OpenAI, more than 200 million people ask ChatGPT health and wellness questions every single week. That is an enormous portion of humanity seeking medical guidance from a system that the latest research finds is wrong about half the time. Even by the more conservative estimate cited elsewhere, 40 million daily users consulting these platforms for health information, the scale of potential harm is staggering.
The appeal is understandable. Healthcare is expensive, complicated, and often inaccessible. In many parts of the world, seeing a specialist can take weeks or months. AI chatbots fill that gap with instant responses, zero cost, and no judgment. Visionaries like Bill Gates have even described a future where AI functions as an always-on medical resource for everyone. That vision is inspiring, but convenience is not the same as competence, and the BMJ Open study makes it very clear that these tools are not yet ready to bear the weight of that trust.
AI Companies Are Expanding into Healthcare Anyway
Despite these findings, AI companies are moving aggressively into the healthcare space. OpenAI launched new health-focused tools for both everyday users and clinicians earlier in 2026. Anthropic announced a healthcare offering for its Claude platform around the same time. Other competitors are rolling out similar products, intensifying the race to dominate AI-driven health information. This expansion makes the study's findings all the more urgent. As these tools become more embedded in how people seek and interpret health information, the potential for widespread harm scales accordingly.
A spokesperson from OpenAI responded to earlier research by arguing that the study in question used an older version of ChatGPT and that the company has since addressed some of the concerns raised. That may be true in part. But the BMJ Open study, which evaluated current platforms, suggests the problems are far from resolved across the industry as a whole.
What Researchers and Doctors Are Calling For
The authors of the BMJ Open study were direct in their conclusions. They stated that the findings highlight important behavioral limitations and the urgent need to reevaluate how AI chatbots are deployed in public-facing health and medical communication. They called specifically for public education, professional training, and regulatory oversight to ensure that generative AI supports rather than erodes public health. These are not radical demands. They are the minimum standards anyone should expect when technology intersects with human life.
Some physicians take a more measured view of the long-term picture. As NPR reported, doctors are increasingly acknowledging that AI and medicine are already deeply entangled. One physician expressed hope that AI could one day function as an extension of a human relationship, partnering with both doctors and patients to improve communication and cut through medical bureaucracy. That vision is compelling. But it depends on AI reaching a standard of reliability it has not yet demonstrated.
The Risk of Misinformation at Scale
One of the study's most pointed warnings concerns the amplification of misinformation. When a single piece of bad medical advice spreads through a chatbot used by millions, the downstream effect is not comparable to one person reading a bad article. The reach is exponential. The study authors warned explicitly that deploying chatbots without public education and oversight creates a serious risk of amplifying health misinformation at a scale that was previously impossible.
There is also a subtler risk that researchers pointed to. AI chatbots have shown a tendency to favor answers that match what the user seems to believe, rather than what the evidence supports. This confirmation bias in a medical context is particularly dangerous. A person who already distrusts vaccines, for example, may receive responses that reinforce rather than correct that belief, depending on how the question is framed and how the chatbot's training data skews.
How to Use AI for Health Information More Safely
None of this means you must stop using AI chatbots entirely for health-related questions. It does mean you need to use them very differently than many people currently do. Treat AI responses as a starting point for research, not a final answer. Use chatbots to help you understand terminology, generate questions to ask your doctor, or get a broad overview of a condition. Do not use them to self-diagnose, self-medicate, or decide whether or not a symptom requires emergency care.
The quality of your prompt also matters significantly. Research shows that AI tools perform better with specific, closed-ended questions than with vague, open-ended ones. Asking "What are the common side effects of metformin?" will generally yield a more reliable response than "Is my blood sugar problem serious?" The more context and specificity you provide, and the more critically you evaluate the answer, the safer your interaction with these tools will be.
The Bottom Line: A Useful Tool, Not a Safe Doctor
AI chatbots are genuinely impressive tools. They have democratized access to information in ways that would have seemed extraordinary a decade ago. But impressive is not the same as reliable, and when it comes to your health, reliability is non-negotiable. The BMJ Open study is not an argument against AI in healthcare. It is an argument for honesty about where these tools currently stand, and for ensuring that the public, policymakers, and companies building these products take that gap between perception and reality seriously.
A 50% rate of problematic advice is not a beta-stage quirk to be ironed out quietly. It is a public health issue that deserves the same attention and urgency as any other widespread source of medical misinformation. Until AI chatbots can demonstrate a level of accuracy that justifies the trust millions of people are already placing in them, the only genuinely safe approach is to keep your doctor in the loop. Your health is worth more than a fast answer.