
Brilliant or Bumbling? AI Tools Aren’t Quite Ready for Genetic Diagnosis

Large language models show promise as assistants to clinical experts, but they’re not ready for prime time

Image by Mohamed Hassan

Two presentations at the recent annual meeting of the American Society of Human Genetics showed both sides of the same AI coin for clinical genetics applications. In separate studies, researchers used large language models (LLMs) to diagnose genetic diseases. Results from one of those studies looked so promising that the speaker had to assure attendees that there would continue to be a role for real humans in genetic diagnosis. In the other study, results were so awful that attendees wound up laughing out loud.

The first presentation came from Ivy Bethea, a recently minted genetic counselor who performed this study as part of her graduate work. She and her team began with a real challenge facing the clinical community: a shortage of clinical geneticists who have the training required to understand potentially complex results from genetic tests. “Access to genetic services is limited,” she reported. That places the burden of interpreting results on other healthcare professionals who often lack the training and the confidence to field patients’ questions about their test reports.

This problem is expected to get even worse as more and more people receive genetic tests as part of their healthcare. Bethea’s team wanted to understand whether LLMs could help fill the gap, allowing clinical genetics experts to focus their time on the cases where they’re needed most. They created 40 cases based on real-world genetic diseases and disorders, including family history and other data as it might be reported to genetic counselors or clinicians. That information was fed into several LLMs, including ChatGPT and the AI offerings from Meta and Google, which were then prompted to give a likely diagnosis, as well as other possible diagnoses. For the purposes of comparison, the team sent the same cases to board-certified geneticists for their interpretation. Results were assessed with a tiered scoring model that considered a variety of factors beyond whether the initial diagnosis was correct.

The study outcome was stark: LLMs outperformed clinical experts by a wide margin. This was partly because LLMs were more likely to present the correct diagnosis first, and partly because clinical geneticists were more likely to offer a high-level diagnosis that was less specific. Even with these results, though, Bethea saw LLMs serving in the clinic as no more than assistants to the real experts. “They don’t replace clinical reasoning,” she said. “They can help with the what of a diagnosis but not the how.”

In a separate study reported by Sarath Babu Krishna Murthy from Columbia University’s Center for Precision Medicine and Genomics, a team evaluated the use of LLMs for genetic counseling related to kidney diseases. In this case, they developed six likely clinical scenarios and presented them to ChatGPT and several other freely available LLMs. In almost every case, the AI tools spat out a lot of irrelevant information or poorly reasoned logic. Three of the five tools tested made recommendations that directly contradicted clinical guidelines.

I’ve written about the limitations of AI tools for clinical use in the past, but it’s worth taking a closer look. We know that even the best-performing tools available today are prone to making stuff up — hallucinating facts, events, and resources that don’t exist. While AI tools may become far more reliable in the future, current versions are not ready for mainstream medical use. They are worth assessing, though, to help clinical experts figure out where they could be used and where they couldn’t.

These latest studies show both the potential and the drawbacks of current LLMs — important information that can feed into a broader understanding of how these tools need to be improved to make a difference for real-world patient cases.

Short Takes

In other news, inadvertently all about aging …

Scientists studied bowhead whales, which can live hundreds of years, in a hunt for the secret behind their longevity. They found that these enormous creatures have finely tuned proteins that repair damaged DNA — damage that can lead to cancer and other life-limiting conditions.

In another study, researchers analyzed immune cells from nearly 100 people ranging in age from 25 to 90 years. They discovered that memory T cells, a critical component of the immune response, don’t perform as well as people age, which could explain why older people are more likely to get seriously ill from an infection.

Catch Up on Salisbury’s Take