AI Is Transforming Medicine, but Experts Say It Can't Replace Clinicians
Rapid deployment of diagnostic and workflow algorithms brings both benefits and risks, prompting calls for local validation, continuous monitoring and designs that support, not supplant, human judgment.

Artificial intelligence is rapidly moving into hospitals and clinics, interpreting tests and aiding decisions, but physicians and researchers say it should augment—not replace—human clinicians.
More than 1,000 health-related AI tools have been authorized by the U.S. Food and Drug Administration (FDA), and a recent American Medical Association survey found that more than two-thirds of physicians use AI to some degree. Proponents point to faster diagnoses, automated charting, personalized patient support and new drug-discovery pathways as examples of substantial gains. Critics and some researchers warn, however, that unchecked or poorly integrated systems can produce errors, weaken clinician skills and work unevenly across different patient groups.
Researchers have documented both benefits and important limitations in recent studies. A team at Duke University evaluated an FDA-cleared algorithm designed to detect swelling and microbleeds on brain MRIs from patients with Alzheimer’s disease. The tool improved experts’ ability to identify subtle spots but also increased false positives, frequently mislabeling harmless artifacts as dangerous findings. The authors concluded the algorithm can be useful as a second opinion, but warned against using it as the first or sole reader.
A separate European study of gastroenterologists found a different risk: an AI-assisted system for polyp detection increased initial detection rates during colonoscopies, but physicians subsequently detected fewer precancerous polyps when returning to procedures without the AI. The researchers said the pattern suggests clinicians may develop overreliance on the assistive system, and that the phenomenon of “deskilling” observed in other domains could blunt human vigilance.
Other work has documented mechanisms that could underlie such effects. One study found that reliance on computerized aids can narrow clinicians' visual scanning, reducing attention to peripheral information. A broader survey of more than 600 people of varying ages and education levels reported an association between heavier AI use and weaker critical-thinking skills, a pattern researchers describe as cognitive off-loading. These studies do not establish causation in every clinical setting, but they have prompted calls for caution.
Experts emphasize that AI systems are powerful pattern recognizers and predictors but are not infallible. Performance can vary by population, imaging device, clinical workflow and local disease prevalence, meaning tools cleared by regulators may not perform identically in every hospital. Few health systems independently validate commercial algorithms before deployment, and clinicians sometimes assume regulatory clearance guarantees local suitability.
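What local validation can look like in practice is straightforward to sketch. The Python snippet below is a minimal illustration, not any hospital's actual pipeline: the data, subgroup labels and column names are invented stand-ins for a health system's own adjudicated records, and the check simply reports the algorithm's sensitivity and specificity separately for each patient subgroup, the kind of breakdown that regulatory clearance alone does not provide.

```python
# Hypothetical local-validation sketch: compare a vendor algorithm's
# predictions against locally adjudicated ground truth, stratified by
# patient subgroup, before relying on the tool. The data and column
# names are invented stand-ins for a hospital's own records.
import pandas as pd

cases = pd.DataFrame({
    "subgroup": ["A", "A", "A", "B", "B", "B", "B", "A"],
    "y_true":   [1,   0,   1,   1,   0,   0,   1,   0],  # adjudicated label
    "y_pred":   [1,   0,   0,   1,   1,   0,   1,   1],  # algorithm output
})

def sensitivity_specificity(g: pd.DataFrame) -> dict:
    """Basic operating characteristics for one subgroup."""
    tp = int(((g.y_true == 1) & (g.y_pred == 1)).sum())
    fn = int(((g.y_true == 1) & (g.y_pred == 0)).sum())
    tn = int(((g.y_true == 0) & (g.y_pred == 0)).sum())
    fp = int(((g.y_true == 0) & (g.y_pred == 1)).sum())
    return {
        "n": len(g),
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }

# Clearance reflects the vendor's study population; these per-subgroup
# numbers show how the tool behaves on this hospital's own case mix.
for name, group in cases.groupby("subgroup"):
    print(name, sensitivity_specificity(group))
```

Even a check this simple can reveal that a tool cleared on a vendor's study population behaves differently on a hospital's own scanners, workflows and case mix.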
Former FDA commissioner Robert Califf and other regulators have urged continuous post-market monitoring of medical AI tools. Both the algorithms and the ways clinicians interact with them can change over time, altering risks and benefits. Clinical leaders and health systems are being urged to conduct local quality checks, monitor outcomes, train staff and maintain clinicians' ability to work without AI support.
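As a hedged sketch of what continuous post-market monitoring might involve, the snippet below tracks a rolling agreement rate between an algorithm's output and clinician-confirmed outcomes, and flags drift past a locally chosen threshold. The window size, baseline and alert margin are invented for illustration; real monitoring programs are more elaborate.

```python
# Hypothetical post-market monitoring sketch: track the algorithm's
# rolling agreement with clinician-confirmed outcomes and raise a flag
# when performance drifts below a locally chosen baseline. The window
# size and thresholds are illustrative, not regulatory guidance.
from collections import deque

WINDOW = 200          # most recent adjudicated cases to consider
BASELINE = 0.90       # agreement rate observed during local validation
ALERT_MARGIN = 0.05   # tolerated drop before review is triggered

recent = deque(maxlen=WINDOW)

def record_case(model_output: int, confirmed_outcome: int) -> None:
    """Log one adjudicated case and check for drift."""
    recent.append(model_output == confirmed_outcome)
    if len(recent) == WINDOW:
        agreement = sum(recent) / WINDOW
        if agreement < BASELINE - ALERT_MARGIN:
            # In practice this would notify a quality and safety team,
            # not just print to a console.
            print(f"ALERT: rolling agreement {agreement:.2f} is below "
                  f"baseline {BASELINE:.2f}; review tool and workflow.")

# Example: feed in cases as clinicians adjudicate them.
record_case(model_output=1, confirmed_outcome=1)
```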
Design choices matter. Researchers and clinicians advocating for a different approach recommend systems that bolster rather than supplant clinical judgment. One proposed model, called Intelligent Choice Architecture (ICA), aims to nudge clinicians to look again and weigh alternatives rather than delivering definitive labels. Instead of presenting an absolute diagnosis, an ICA-style tool might highlight an image region and prompt a careful review, reinforcing human oversight.
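The difference between the two interface philosophies can be made concrete with a short sketch. The class names and fields below are invented for illustration; they contrast a tool that emits a single definitive label with an ICA-style output that surfaces a region, the model's own uncertainty, plausible alternatives and a prompt to look again.

```python
# Hypothetical sketch contrasting a "definitive label" interface with an
# ICA-style output that prompts re-examination instead of deciding.
# All names and fields here are invented for illustration.
from dataclasses import dataclass, field

@dataclass
class DefinitiveOutput:
    label: str  # e.g. "microbleed present" -- the verdict-style interface

@dataclass
class ChoiceArchitectureOutput:
    region_of_interest: tuple      # (x, y, width, height) on the image
    confidence: float              # model's own uncertainty, surfaced
    alternatives: list = field(default_factory=list)  # benign explanations
    prompt: str = ""               # a nudge, not a verdict

finding = ChoiceArchitectureOutput(
    region_of_interest=(112, 84, 24, 24),
    confidence=0.62,
    alternatives=["imaging artifact", "calcification"],
    prompt="Low-confidence flag in this region: compare with prior "
           "studies and consider the listed alternatives before signing off.",
)
print(finding.prompt)
```

The design choice is the point: the second interface keeps the clinician in the loop by making uncertainty and alternatives visible rather than collapsing them into one answer.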
An example from India illustrates the concept in practice. Apollo Hospitals, the country’s largest private health system, has begun using an ICA-style tool to help clinicians assess heart-attack risk. Earlier models provided a single risk score; the newer system breaks the score into personalized contributing factors, helping clinicians and patients see which risk elements to address. Health system leaders say this approach supports clinical decision-making without removing clinician autonomy.
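The shift from a single number to per-factor contributions can be illustrated with a toy additive risk model. The factors, weights and logistic form below are invented for the example and are not Apollo's actual system; the point is only that an additive score decomposes naturally into pieces a clinician and patient can act on.

```python
# Toy illustration of breaking a single risk score into per-factor
# contributions. The factors, weights and model form are invented;
# real cardiovascular risk models are more sophisticated.
import math

WEIGHTS = {  # hypothetical log-odds contribution per unit of each factor
    "systolic_bp": 0.03,
    "ldl_cholesterol": 0.02,
    "smoker": 0.9,
    "hba1c": 0.25,
}
INTERCEPT = -10.0

def risk_breakdown(patient: dict) -> tuple:
    """Return overall risk plus each factor's contribution to the score."""
    contributions = {k: WEIGHTS[k] * patient[k] for k in WEIGHTS}
    logit = INTERCEPT + sum(contributions.values())
    risk = 1 / (1 + math.exp(-logit))  # logistic transform of the score
    return risk, contributions

risk, parts = risk_breakdown(
    {"systolic_bp": 150, "ldl_cholesterol": 160, "smoker": 1, "hba1c": 7.5}
)
print(f"estimated risk: {risk:.1%}")
for factor, value in sorted(parts.items(), key=lambda kv: -kv[1]):
    print(f"  {factor}: +{value:.2f} to the score")  # what to address first
```

Presenting the sorted contributions rather than only the headline risk is what, on this toy account, lets a clinician and patient see which modifiable factors matter most.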
The debate over AI’s role in medicine has broader educational implications. Medical educators and professional organizations are discussing how to train students and clinicians to use AI tools critically, maintain hands-on diagnostic skills and recognize when an algorithm may be wrong. Advocates note that medicine has long incorporated technologies that amplify human abilities—the stethoscope enhanced auscultation, and laboratory tests expanded diagnostic reach—without replacing fundamental clinical reasoning.
Regulators, hospital leaders and developers are being urged to ask whether new AI products make clinicians more thoughtful or less. If a tool reduces observation or judgment, it may not be ready for broad clinical use or may require redesigned workflows and safeguards. Developers and health systems alike are being encouraged to prioritize transparency, local validation and ongoing monitoring so that AI’s potential to improve outcomes is realized without eroding the clinical skills and judgment that remain central to patient care.