Digital Edition

Ophthalmology Times: December 2024
Volume 49, Issue 12

Generative AI: It’s only just begun

Key Takeaways

  • ChatGPT 4.0 showed a 12% hallucination/incorrect answer rate and poor agreement with experts in cornea subspecialty scenarios.
  • The AI model performed better in subcomponents but struggled with open-ended clinical scenarios, reflecting real-world complexities.

The use of ChatGPT to replace anterior segment experts may be premature.

(Image Credit: AdobeStock/Smart Future)

Artificial intelligence (AI) may someday be a valuable asset in medical practice by achieving equivalency with medical experts, but that time has not yet arrived in the cornea subspecialty, according to a recent study evaluating AI’s performance in cataract, cornea, and refractive surgery clinical scenarios.

ChatGPT 4.0 demonstrated suboptimal absolute agreement with expert users and a 12% hallucination or incorrect answer rate, though it performed relatively better in some areas, according to Laura Palazzolo, MD, ABO, and Gaurav Prakash, MD, FRCS. The investigators are, respectively, from New York University Grossman School of Medicine, in Huntington Station, and the Department of Ophthalmology, University of Pittsburgh School of Medicine, in Pennsylvania.

By way of background, large language models (LLMs) represent an evolving frontier in generative AI, capable of learning language patterns and nuances, including medical terminology and concepts, through extensive training data. ChatGPT 4.0, the study’s focus, is one of the most popular LLMs and was trained with 1.7 trillion parameters, the investigators explained.

Several studies have assessed LLMs’ capabilities with single-correct-response multiple-choice questions.1-5 ChatGPT’s performance in more complex, open-ended clinical scenarios has been explored through small studies on retina, glaucoma, and neuro-ophthalmology.6-8

However, Palazzolo and Prakash noted a lack of robust data comparing international experts and LLMs on open-ended clinical cases in cataract, cornea, and refractive surgery.

To address this gap, they evaluated ChatGPT 4.0, a commercially available LLM, on technically nuanced ophthalmic clinical scenarios and compared its performance with published expert answers.

They presented ChatGPT with open-ended clinical scenarios previously published9 on PubMed (2019 to 2023). The published experts had been instructed as follows: Assume you are an experienced cornea, refractive, and anterior segment surgeon, analyze the given clinical scenario, and list your suggestions in bulleted points.
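
The article does not state how ChatGPT 4.0 was queried. As an illustration only, the sketch below shows how the published instruction could be posed programmatically, assuming the OpenAI Python client rather than the ChatGPT web interface; the scenario text and model name are placeholders.

    # Illustrative sketch only: posing the published instruction to a
    # GPT-4-class model with the OpenAI Python client. The study does not
    # specify how the model was accessed; the scenario text is a placeholder.
    from openai import OpenAI

    client = OpenAI()  # reads the OPENAI_API_KEY environment variable

    instruction = (
        "Assume you are an experienced cornea, refractive, and anterior "
        "segment surgeon, analyze the given clinical scenario, and list "
        "your suggestions in bulleted points."
    )
    scenario = "<open-ended clinical scenario text>"

    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": instruction},
            {"role": "user", "content": scenario},
        ],
    )
    print(response.choices[0].message.content)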

The investigators explained that the published expert answers (available behind a paywall and not presented to ChatGPT) were compared with ChatGPT’s responses, with each ChatGPT answer considered as a bulleted point. Cornea specialists evaluated the expert responses.

The answers, the study's end point measurements, were labeled as correct, incorrect, or incomplete. Two concordance rates were calculated: the absolute concordance rate (ACR), the proportion of full questions for which every bulleted point agreed with the experts, and the subcomponent concordance rate (SCR), the proportion of individual bulleted points that agreed with the experts. Hallucination or incorrect answer rates and incomplete answer rates were also measured.
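
For illustration, the sketch below shows how such end points can be computed from per-point gradings; the labels, question set, and function are hypothetical and are not taken from the study.

    # Minimal sketch of the end point calculations described above, using
    # hypothetical gradings (not the study's data). Each question maps to the
    # labels assigned to ChatGPT's bulleted points for that question.
    from typing import Dict, List

    def concordance_metrics(graded: Dict[str, List[str]]) -> Dict[str, float]:
        points = [label for labels in graded.values() for label in labels]
        n_points = len(points)
        n_questions = len(graded)

        # SCR: share of individual bulleted points graded correct.
        scr = sum(label == "correct" for label in points) / n_points

        # ACR: "AND"-gated at the question level -- a question counts only
        # if every one of its bulleted points is correct.
        acr = sum(all(label == "correct" for label in labels)
                  for labels in graded.values()) / n_questions

        # Hallucination/incorrect and incomplete rates at the point level.
        incorrect = sum(label == "incorrect" for label in points) / n_points
        incomplete = sum(label == "incomplete" for label in points) / n_points

        return {"SCR": scr, "ACR": acr,
                "hallucination_or_incorrect": incorrect,
                "incomplete": incomplete}

    # Toy example with two questions (illustrative values only):
    graded = {
        "Q1": ["correct", "correct", "incomplete"],
        "Q2": ["correct", "incorrect", "correct", "correct"],
    }
    print(concordance_metrics(graded))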

Because LLMs generate text based on the prompt given, they may produce factually incorrect information presented as though it were accurate, known as hallucinations. A hallucination occurs when the AI generates an answer from limited data, whereas an incorrect answer stems from wrong or outdated information. Because of the black box nature of LLMs, hallucination and incorrect answer rates were combined.

Expert and ChatGPT comparison

ChatGPT responded to 33 questions, yielding 275 bulleted points.

“The SCR was 76.7% (211 of 275), which dropped to 24.2% (8 of 33) when absolute (‘AND’-gated) clustering was performed at the question level for ACR (P = .02, chi-square test). ChatGPT covered all points noted by experts in only 36.4% (12 of 33) of cases,” the authors reported.

Additionally, ChatGPT concurred with the editors’ differential diagnosis in 20 of 33 cases, compared with 33 of 33 for the experts (P < .001, Fisher’s exact test). At the bulleted-point level, the hallucination/incorrect answer rate was 12.4% (34 of 275), the incomplete answer rate was 10.5% (29 of 275), and the correct-to-incorrect answer ratio was 6.2:1.
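
The differential-diagnosis comparison can be checked from the counts reported above (20 of 33 cases for ChatGPT vs 33 of 33 for the experts) by placing them in a 2 × 2 table and running Fisher's exact test. The sketch below uses SciPy and is an illustrative check assembled from the article's figures, not the authors' own analysis code.

    # Fisher's exact test on the differential-diagnosis counts reported
    # above. Illustrative check only; the 2x2 table is built from the
    # article's figures, not taken from the authors' analysis.
    from scipy.stats import fisher_exact

    table = [
        [20, 13],  # ChatGPT: concurred with the editors' diagnosis / did not
        [33, 0],   # experts: concurred / did not
    ]
    odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
    print(f"Fisher exact P = {p_value:.2g}")  # comes out well below .001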

“ChatGPT 4.0 showed poor absolute agreement with expert users and had a 12% hallucination/incorrect answer rate,” Palazzolo and Prakash concluded. “Its relatively better performance in the subcomponents aligns with more optimistic published results in closed-set answers (multiple-choice questions). Open-ended clinical scenarios reflect real-world circumstances, and ChatGPT appears premature for this use.”

Gaurav Prakash, MD, FRCS

E: drgauravprakash@gmail.com

Prakash has no financial interests related to the content of this article.

Laura Palazzolo, MD, ABO

E: laurabelle729@gmail.com

Palazzolo has no financial interests related to the content of this article. The data were presented at the American Society of Cataract and Refractive Surgery Annual Meeting, April 5-9, 2024, in Boston, Massachusetts (paper session: Surgical Outcomes III).

References
  1. Gilson A, Safranek CW, Huang T, et al. How does ChatGPT perform on the United States Medical Licensing Examination (USMLE)? The implications of large language models for medical education and knowledge assessment. JMIR Med Educ. 2023;9:e45312. doi:10.2196/45312
  2. Lin SY, Chan PK, Hsu WH, Kao CH. Exploring the proficiency of ChatGPT-4: an evaluation of its performance in the Taiwan advanced medical licensing examination. Digit Health. 2024;10:20552076241237678. doi:10.1177/20552076241237678
  3. Panthier C, Gatinel D. Success of ChatGPT, an AI language model, in taking the French language version of the European Board of Ophthalmology examination: a novel approach to medical knowledge assessment. J Fr Ophtalmol. 2023;46(7):706-711. doi:10.1016/j.jfo.2023.05.006
  4. Teebagy S, Colwell L, Wood E, Yaghy A, Faustina M. Improved performance of ChatGPT-4 on the OKAP examination: a comparative study with ChatGPT-3.5. J Acad Ophthalmol (2017). 2023;15(2):e184-e187. doi:10.1055/s-0043-1774399
  5. Fowler T, Pullen S, Birkett L. Performance of ChatGPT and Bard on the official part 1 FRCOphth practice questions. Br J Ophthalmol. 2024;108(10):1379-1383. doi:10.1136/bjo-2023-324091
  6. Maywood MJ, Parikh R, Deobhakta A, Begaj T. Performance assessment of an artificial intelligence chatbot in clinical vitreoretinal scenarios. Retina. 2024;44(6):954-964. doi:10.1097/IAE.0000000000004053
  7. Delsoz M, Raja H, Madadi Y, et al. The use of ChatGPT to assist in diagnosing glaucoma based on clinical case reports. Ophthalmol Ther. 2023;12(6):3121-3132. doi:10.1007/s40123-023-00805-x
  8. Madadi Y, Delsoz M, Lao PA, et al. ChatGPT assisting diagnosis of neuro-ophthalmology diseases based on case reports. medRxiv. Preprint posted online September 14, 2023. doi:10.1101/2023.09.13.23295508
  9. Nuijts RMMA, Kartal S. Epithelial ingrowth after LASIK September consultation #1. J Cataract Refract Surg. 2021;47(9):1242. doi:10.1097/j.jcrs.0000000000000764