
Image by macrovector, from Freepik
A Typo Could Change Your AI Medical Advice, Study Warns
New research finds that AI used in healthcare alters medical advice based on typos, slang, and gender, raising urgent concerns about algorithmic fairness.
In a rush? Here are the quick facts:
- Minor typos in messages reduced AI accuracy by up to 9%.
- Female patients got worse advice 7% more often than male patients.
- AI changed recommendations based on tone, slang, and pronouns.
A new study reveals that large language models (LLMs) used in healthcare can be influenced by seemingly irrelevant details in patient messages.
This can result in inconsistent and even biased treatment recommendations. Presented at the 2025 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’25), the research raises serious concerns about the reliability of AI tools in medical decision-making.
The study found that even minor changes in how a patient writes a message, such as typos, extra spaces, or a shift in tone, can significantly alter the AI’s treatment suggestions.
For instance, when patients used uncertain language like “I think I might have a headache,” the AI was 7–9% more likely to suggest self-care over professional medical attention, even in cases where further evaluation was warranted.
These changes weren’t just theoretical. Researchers used AI to simulate thousands of patient notes written in different tones and formats, mimicking patients with limited English proficiency, poor typing skills, or an emotional writing style.
Messages also included gender-neutral pronouns and stylized writing, showing how the way someone communicates can sway an AI’s diagnosis.
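The authors have published their own perturbation and evaluation framework; as a rough illustration of the general idea only, here is a minimal Python sketch of a consistency check. The perturbation helpers (swapped characters, extra spaces, hedging prefixes) and the `get_recommendation` stub are hypothetical stand-ins, not the study’s actual methods.

```python
import random

random.seed(0)

HEDGES = ["I think ", "I might be wrong, but ", "Not sure, but "]


def add_typos(text: str, rate: float = 0.05) -> str:
    """Randomly swap adjacent letters to mimic typing errors."""
    chars = list(text)
    for i in range(len(chars) - 1):
        if chars[i].isalpha() and chars[i + 1].isalpha() and random.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def add_extra_whitespace(text: str, rate: float = 0.1) -> str:
    """Randomly double the space after some words."""
    words = text.split(" ")
    return " ".join(w + " " if random.random() < rate else w for w in words)


def add_uncertain_tone(text: str) -> str:
    """Prefix the message with hedging language."""
    return random.choice(HEDGES) + text[0].lower() + text[1:]


def get_recommendation(message: str) -> str:
    """Stand-in for a call to the model under test (e.g. a hosted chat API).
    A real check would send `message` to the LLM and parse its advice.
    The dummy heuristic below just lets the sketch run end to end."""
    return "self-care" if "i think" in message.lower() else "seek evaluation"


def consistency_check(message: str, n_variants: int = 20) -> float:
    """Fraction of perturbed, clinically identical variants whose
    recommendation matches the recommendation for the original message."""
    baseline = get_recommendation(message)
    perturbations = [add_typos, add_extra_whitespace, add_uncertain_tone]
    matches = 0
    for _ in range(n_variants):
        variant = random.choice(perturbations)(message)
        if get_recommendation(variant) == baseline:
            matches += 1
    return matches / n_variants


if __name__ == "__main__":
    note = "I have had a severe headache and blurred vision since yesterday."
    print(f"Recommendation consistency: {consistency_check(note):.0%}")
```

A score well below 100% on checks like this would signal that surface-level wording, rather than clinical content, is driving the model’s advice.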
Gender bias also emerged as a major issue. Female patients were 7% more likely than male patients to receive incorrect self-management advice when non-clinical language cues were introduced.
Follow-up tests showed that AI models were more likely than human doctors to shift treatment suggestions based on perceived gender or communication style, even when clinical symptoms remained the same.
The performance of these models worsened in more realistic, conversational chat settings. Diagnostic accuracy dropped by over 7% when minor text changes were introduced into these AI-patient interactions.
This matters because AI is increasingly being used to diagnose illness, respond to patient questions, and draft clinical notes. But the study shows that the way a message is written, including its tone, errors, and structure, can distort the AI’s reasoning.
This could lead to under-treatment of vulnerable groups such as women, non-binary people, individuals with health anxiety, non-native English speakers, and those less familiar with digital communication.
“Insidious bias can shift the tenor and content of AI advice, and that can lead to subtle but important differences,” said Karandeep Singh of the University of California, San Diego, who was not involved in the research, as reported by New Scientist.
Lead researcher Abinitha Gourabathina emphasized, “Our findings suggest that AI models don’t just process medical facts—they’re influenced by how information is presented. This could deepen healthcare disparities if not addressed before deployment.”
The researchers tested multiple leading AI models, including OpenAI’s GPT-4, Meta’s Llama-3 models, and Writer’s healthcare-specific Palmyra-Med model. All showed the same weakness: changes to format and tone led to less reliable advice. Writer, for its part, states that its models should not be used for clinical decision-making without a human in the loop.
Experts warn that as generative AI becomes more common in health records and patient services, better evaluation systems are urgently needed.
To prevent harm, the research team is urging more rigorous testing of AI medical tools to ensure they remain fair and accurate, regardless of how patients express their concerns. They’ve made their bias evaluation framework public to help developers improve AI systems in healthcare.