Study highlights current and potential roles of AI in glaucoma diagnosis

Key Takeaways

  • GPT-4o showed potential in differential diagnosis of glaucoma, matching human ophthalmologists in accuracy and surpassing them in completeness.
  • The AI model underperformed in primary diagnosis, highlighting its limitations in cognitive reasoning and clinical information prioritization.

According to researchers, GPT-4o demonstrates potential in generating comprehensive differential glaucoma diagnoses but falls short in primary diagnostic accuracy, underscoring its role as a complementary tool rather than a standalone diagnostic solution.

(Image credit: Adobe Stock/lucegrafiar)


Artificial intelligence (AI), particularly large language models (LLMs) like GPT-4o, is emerging as a potential tool for enhancing diagnostic accuracy in healthcare. This study evaluated the diagnostic performance of GPT-4o compared to human ophthalmologists in glaucoma cases, exploring its strengths and limitations.

Study objectives and design

The primary aim was to assess GPT-4o’s performance in primary and differential diagnoses of glaucoma relative to human ophthalmologists. While GPT-4o demonstrated potential in generating comprehensive differential diagnoses, it fell short in primary diagnosis accuracy and completeness, highlighting the current limitations of AI as a standalone diagnostic tool.1

According to a group of Chinese researchers, the prospective, observational study was conducted at a tertiary care ophthalmology center. Twenty-six glaucoma cases, encompassing both primary and secondary types, were selected from publicly available databases and institutional records. These cases were analyzed by GPT-4o and three ophthalmologists with varying levels of experience.

Performance assessment and results

Diagnostic accuracy and completeness were evaluated using 10-point and 6-point Likert scales, respectively. Statistical analyses, including Kruskal–Wallis and Mann–Whitney U tests, revealed significant differences in performance:

  • Primary diagnosis:
    GPT-4o achieved a mean accuracy score of 5.500, significantly lower than the highest-performing ophthalmologist, Doctor C, who scored 8.038 (p < 0.001). GPT-4o’s completeness score (3.077) was also inferior even to that of the lowest-scoring ophthalmologist, Doctor B (3.615; p < 0.001).
  • Differential diagnosis:
    In contrast, GPT-4o demonstrated accuracy comparable to the ophthalmologists’ in differential diagnosis, scoring 7.577 versus 7.615 for Doctor A and 7.673 for Doctor C (p < 0.0001). Notably, GPT-4o achieved the highest completeness score (4.096), outperforming Doctor C (3.846), Doctor A (2.923), and Doctor B (2.808) (p < 0.0001).
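For readers unfamiliar with the tests named above, the analysis pattern can be sketched in Python with SciPy. The scores below are hypothetical Likert-scale ratings invented for illustration, not the study’s data, and the variable names are the author’s own:

```python
from scipy.stats import kruskal, mannwhitneyu

# Hypothetical 10-point Likert accuracy scores for 26 cases
# (illustrative only; NOT the study's data).
gpt4o    = [5, 6, 4, 7, 5, 6, 5, 4, 6, 7, 5, 6, 5,
            4, 6, 5, 7, 5, 6, 4, 5, 6, 5, 6, 5, 6]
doctor_c = [8, 9, 7, 8, 8, 9, 8, 7, 8, 9, 8, 8, 7,
            8, 9, 8, 8, 7, 8, 8, 9, 8, 8, 7, 8, 8]

# Kruskal-Wallis: omnibus nonparametric test for differences
# across two or more raters' score distributions.
h_stat, p_omnibus = kruskal(gpt4o, doctor_c)

# Mann-Whitney U: pairwise nonparametric comparison of two raters.
u_stat, p_pair = mannwhitneyu(gpt4o, doctor_c, alternative="two-sided")

print(f"Kruskal-Wallis H = {h_stat:.2f}, p = {p_omnibus:.4g}")
print(f"Mann-Whitney  U = {u_stat:.2f}, p = {p_pair:.4g}")
```

Nonparametric tests like these are a standard choice for ordinal Likert-scale data, where means and variances are less meaningful than rank order.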

Insights and context

Primary diagnosis involves cognitive reasoning and prioritization of clinical information, areas where GPT-4o underperformed compared to human ophthalmologists. These findings align with prior research showing that while AI models like ChatGPT have improved in general medicine and certain specialties, they continue to struggle in highly specialized fields such as neuro-ophthalmology and ocular pathology.1

Conversely, GPT-4o’s superior performance in differential diagnosis reflects its capacity to reference extensive databases and generate exhaustive lists of potential diagnoses. However, this comprehensive approach may overwhelm clinicians, increasing the risk of cognitive bias and misdiagnosis.

Limitations and implications

The study’s limitations include its small sample size (n = 26) and case selection, which emphasized a broad range of glaucoma subtypes over real-world prevalence patterns. Future studies should address these constraints by utilizing larger, more representative datasets.

“Recognizing this limitation, future research with a larger, more representative sample that aligns with real-world prevalence rates would improve the applicability of these findings across diverse clinical settings,” the researchers noted.

According to the researchers, the findings underscore the importance of improving AI models for primary diagnosis by incorporating clinician feedback, enhancing training datasets, and exploring more sophisticated reasoning algorithms. Evaluating AI’s utility in primary care or non-specialist settings, where it might serve as an adjunct to optometrists or general practitioners, is another avenue for future research.

Conclusions

GPT-4o demonstrated promise as a complementary tool for differential diagnosis in glaucoma cases but remains inadequate for primary diagnosis. The study also highlighted concerning gaps in diagnostic accuracy among human ophthalmologists, emphasizing the need for continuous self-evaluation and transparent communication with patients.

“Future advancements in AI may eventually enhance diagnostic accuracy, but until then, it should be viewed as a complementary tool, not a replacement for human expertise,” the researchers concluded.

Reference
1. Zhang, J., Ma, Y., Zhang, R. et al. A comparative study of GPT-4o and human ophthalmologists in glaucoma diagnosis. Sci Rep 14, 30385 (2024). Published December 5, 2024. Accessed December 6, 2024. https://doi.org/10.1038/s41598-024-80917-x
