ChatGPT may have a future use in glaucoma

(Image Credit: AdobeStock/Diego)

Large language models (LLMs) show great promise in the realm of glaucoma with additional capabilities of self-correction, a recent study found.1 However, use of the technology in glaucoma is still in its infancy, and further research and validation are needed, according to first author Darren Ngiap Hao Tan, MD, a researcher from the Department of Ophthalmology, National University Hospital, Singapore, Singapore.

He and his colleagues wanted to determine if LLMs were useful in medicine. “Most LLMs available for public use are based on a general model and are not trained nor fine-tuned specifically for the medical field, let alone a specialty such as ophthalmology,” they explained.

Tan and colleagues evaluated the responses of the artificial intelligence chatbot ChatGPT (version GPT-3.5, OpenAI),2 which is based on an LLM and was trained on a massive dataset of text (570 gigabytes of data, with a model size of 175 billion parameters).3 While previous studies4-8 showed that ChatGPT could be leveraged in the healthcare industry, no studies had evaluated its performance in answering queries pertaining to glaucoma.

The investigators recounted that they curated 24 clinically relevant questions in 4 categories of glaucoma: diagnosis, treatment, surgeries, and ocular emergencies. An expert panel of 3 glaucoma specialists, with combined experience of more than 30 years in the field, graded the LLM's response to each question. When the responses were poor, the LLM was prompted to self-correct, and the expert panel then re-evaluated the subsequent responses.

The main outcome measures were the accuracy, comprehensiveness, and safety of the responses of ChatGPT. The scores were ranked from 1 to 4, where 4 represents the best score with a complete and accurate response.

ChatGPT performance

The investigators reported a total of 72 graded responses to the 24 questions, a count consistent with each of the 3 specialists grading every response.

“The mean score of the expert panel was 3.29 with a standard deviation of 0.484. Of the 24 question-response pairs, 7 (29.2%) had a mean inter-grader score of 3 or less. The mean score of the original seven question-response pairs was 2.96, which rose to 3.58 after an opportunity to self-correct (z-score − 3.27, p = 0.001, Mann–Whitney U). The 7 of the 24 question-response pairs that performed poorly were given a chance to self-correct. After self-correction, the proportion of responses obtaining a full score increased from 22/72 (30.6%) to 12/21 (57.1%), (p = 0.026, χ2 test),” the study authors reported.
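The proportions and the χ2 result quoted above can be checked from the reported counts alone. The following is a minimal sketch, not the authors' analysis: it assumes a Pearson χ2 test on a 2×2 table without continuity correction (the article does not say which variant was used), and the function and variable names are illustrative only.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the chi-square upper-tail probability
    # equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Counts quoted in the article: full scores out of all gradings.
before_full, before_total = 22, 72  # original responses
after_full, after_total = 12, 21    # responses after self-correction

stat, p_value = chi_square_2x2(
    before_full, before_total - before_full,
    after_full, after_total - after_full,
)

print(f"before: {before_full / before_total:.1%}")
print(f"after:  {after_full / after_total:.1%}")
print(f"chi-square = {stat:.2f}, p = {p_value:.3f}")
```

Under that assumption, the sketch gives 30.6% and 57.1% for the two proportions and a P value of about 0.026, in line with the figures the authors reported. The Mann-Whitney U result cannot be rechecked the same way, because the article does not give the individual scores.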

Tan and colleagues concluded, “LLMs show great promise in the realm of glaucoma with additional capabilities of self-correction, with the caveat that the application of LLMs in glaucoma is still in its infancy and requires further research and validation.”

References:
  1. Tan DNH, Tham Y-C, Koh V, et al. Evaluating Chatbot responses to patient questions in the field of glaucoma. Front Med. 2024;11; https://doi.org/10.3389/fmed.2024.1359073
  2. Brown T, Mann B, Ryder N, et al. Language models are few-shot learners. In: Larochelle H, Ranzato M, Hadsell R, et al., editors. Advances in Neural Information Processing Systems. Curran Associates, Inc. 2020;1877–901; https://proceedings.neurips.cc/paper_files/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf
  3. Cascella M, Montomoli J, Bellini V, Bignami E. Evaluating the feasibility of ChatGPT in healthcare: an analysis of multiple clinical and research scenarios. J Med Syst. 2023;47:33; doi: 10.1007/s10916-023-01925-4
  4. Potapenko I, Boberg-Ans LC, Stormly Hansen M, et al. Artificial intelligence-based chatbot patient information on common retinal diseases using ChatGPT. Acta Ophthalmol. 2023;101:829–31.
  5. Gilson A, Safranek CW, Huang T, et al. How does ChatGPT perform on the United States medical licensing examination? The implications of large language models for medical education and knowledge assessment. JMIR Med Educ. 2023;9:e45312; doi: 10.2196/45312
  6. Antaki F, Touma S, Milad D, et al. Evaluating the performance of ChatGPT in ophthalmology: an analysis of its successes and shortcomings. Ophthalmol Sci. 2023;3:100324; doi: 10.1016/j.xops.2023.100324
  7. Mihalache A, Popovic MM, Muni RH. Performance of an artificial intelligence Chatbot in ophthalmic knowledge assessment. JAMA Ophthalmol. 2023;141:589–97; doi: 10.1001/jamaophthalmol.2023.1144
  8. Delsoz M, Raja H, Madadi Y, et al. The use of ChatGPT to assist in diagnosing glaucoma based on clinical case reports. Ophthalmol Ther. 2023;12:3121–32; doi: 10.1007/s40123-023-00805-x
© 2024 MJH Life Sciences

All rights reserved.