A study of ChatGPT found the artificial intelligence tool correctly answered fewer than half of the questions from a study resource commonly used by physicians preparing for board certification in ophthalmology.
Using artificial intelligence to prepare for ophthalmic board certification through the Ophthalmic Knowledge Assessment Program (OKAP) and Written Qualifying Exam (WQE) examinations likely won’t make the process any easier, according to a new study.
Researchers found that ChatGPT correctly answered approximately half of the multiple-choice questions presented to it when prompted.
The study, published in JAMA Ophthalmology1 and led by St. Michael’s Hospital, a site of Unity Health Toronto, found ChatGPT correctly answered 46 percent of questions when the test was first conducted in January 2023. When researchers conducted the same test one month later, ChatGPT scored more than 10 percentage points higher.
In a news release, St. Michael’s Hospital noted the use of AI in medicine and exam preparation has received plenty of attention since ChatGPT became publicly available in November 2022. The technology has also raised concerns about the potential for incorrect information and cheating in academia. ChatGPT is free, available to anyone with an internet connection, and works in a conversational manner.
“ChatGPT may have an increasing role in medical education and clinical practice over time, however it is important to stress the responsible use of such AI systems,” Rajeev H. Muni, MD, MSc, FRCSC, principal investigator of the study and a researcher at the Li Ka Shing Knowledge Institute at St. Michael’s, said in the news release. “ChatGPT as used in this investigation did not answer sufficient multiple choice questions correctly for it to provide substantial assistance in preparing for board certification at this time.”
According to the news release, ChatGPT is an AI chatbot developed by OpenAI that can interact with users conversationally and act as an educational tool when used appropriately. The authors of the current study noted that responsible use of ChatGPT in medical education and clinical practice will be vital going forward.
Although a past study found that ChatGPT has knowledge equivalent to that of a third-year medical student when answering questions related to the United States Medical Licensing Examination, the performance of ChatGPT in other disciplines is unclear. The current study aimed to assess the knowledge of ChatGPT against practice questions used for board certification examinations for ophthalmology.
All questions were collected from the free trial of OphthoQuestions, which provides practice questions for the OKAP and WQE tests. Questions that required input of images or videos were excluded, while text-based questions were retained.
The researchers’ primary outcome was the performance of ChatGPT in answering the questions; secondary outcomes included whether ChatGPT provided explanations, the mean length of questions and responses, performance in answering questions without multiple-choice options, and changes in performance.
“ChatGPT is an artificial intelligence system that has tremendous promise in medical education. Though it provided incorrect answers to board certification questions in ophthalmology about half the time, we anticipate that ChatGPT’s body of knowledge will rapidly evolve,” said Marko Popovic, MD, a co-author of the study and a resident physician in the Department of Ophthalmology and Vision Sciences at the University of Toronto.
All conversations in ChatGPT were cleared before asking each question to avoid responses being influenced by past conversations. A new account was also used to avoid any past history influencing the answers. The primary analysis used the January 9 version of ChatGPT, whereas the secondary analysis used the February 13 version. All answers were manually reviewed by the authors.
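The study queried the public ChatGPT web interface directly, so no code was involved, but the fresh-context principle it describes is easy to illustrate. The following is a minimal sketch, assuming the OpenAI Python API and a placeholder model name rather than the authors’ actual workflow, in which each question is sent as its own stand-alone conversation so no earlier exchange can influence the answer:

```python
# Illustrative sketch only: the study used the public ChatGPT web interface,
# not the API, and the model name below is a placeholder assumption.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask_isolated(question: str) -> str:
    """Send a single question with no prior conversation history."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": question}],  # one clean chat per question
    )
    return response.choices[0].message.content

questions = [
    "Which multiple-choice option is most consistent with the findings described? A) ... B) ... C) ... D) ...",
]
answers = [ask_isolated(q) for q in questions]
```

Because every call starts from an empty message list, the model sees no accumulated context, mirroring the researchers’ practice of clearing conversations and using a fresh account.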
ChatGPT answered questions from January 9 to 16, 2023, in the primary analysis and on February 17, 2023, in the secondary analysis. Of the 166 available questions, 125 text-based questions were presented to ChatGPT and analyzed. All included questions were designated high yield for board certification examinations by OphthoQuestions.
ChatGPT was experiencing high demand when responding to 44 questions (35%), and its mean (SD) response time was 17.8 (14.4) seconds. ChatGPT answered 58 of 125 questions (46.4%) correctly in January 2023. General medicine questions had the best results, with 11 of 14 (79%) answered correctly. Retina and vitreous questions had the worst results, with ChatGPT answering all of them incorrectly.
Additional insight or explanations were provided for 79 of 125 questions (63%); the proportion of questions given explanations or insights was similar between the questions answered incorrectly and correctly (difference, 5.8%; 95% CI, –11.0% to 22.0%). Length of questions was similar between questions that were answered correctly and incorrectly (difference, 21.4 characters; SE, 36.8; 95% CI, –51.5 to 94.3) and length of answers was also similar regardless of accuracy (difference, –80.0 characters; SE, 65.4; 95% CI, –209.5 to 49.5).
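These comparisons are reported as a difference between two proportions with a normal-approximation 95% confidence interval. As a rough illustration of how such an interval is computed, here is a minimal sketch with made-up counts; it assumes a simple Wald-type interval and is not the authors’ actual analysis code:

```python
from math import sqrt

def diff_in_proportions_ci(x1: int, n1: int, x2: int, n2: int, z: float = 1.96):
    """Wald-type 95% CI for the difference between two proportions (p1 - p2)."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, (diff - z * se, diff + z * se)

# Made-up counts for illustration only (not the study's data):
# 40 of 60 questions in one group vs 30 of 65 in the other received explanations.
diff, (low, high) = diff_in_proportions_ci(x1=40, n1=60, x2=30, n2=65)
print(f"difference = {diff:.1%}, 95% CI {low:.1%} to {high:.1%}")
```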
ChatGPT closely matched how trainees answer questions, and selected the same multiple-choice response as the most common answer provided by ophthalmology trainees 44 percent of the time. ChatGPT selected the multiple-choice response that was least popular among ophthalmology trainees 11 percent of the time, second least popular 18 percent of the time, and second most popular 22 percent of the time.
Andrew Mihalache, lead author of the study and an undergraduate student at Western University, noted that ChatGPT performed most accurately on general medicine questions, answering 79 percent of them correctly.
“On the other hand, its accuracy was considerably lower on questions for ophthalmology subspecialties. For instance, the chatbot answered 20 percent of questions correctly on oculoplastics and zero percent correctly from the subspecialty of retina,” he said. “The accuracy of ChatGPT will likely improve most in niche subspecialties in the future.”
ChatGPT improved in the February 2023 analysis, answering 73 of the 125 questions (58%) correctly. ChatGPT performed similarly on stand-alone questions without multiple-choice options, answering 42 of 78 (54%) correctly (difference, 4.6%; 95% CI, –9.2% to 18.3%).
Internet speed, online traffic, and delays in response time could have biased certain parameters, the authors wrote in discussing the study’s limitations. Because ChatGPT generates unique answers, a separate study could yield different results. Questions that were not text based were excluded from the study, and questions without multiple-choice options may have been answered more broadly, which could have led to an incorrect response.
The researchers concluded that ChatGPT was not able to “answer sufficient multiple-choice questions correctly for it to provide substantial assistance in preparing for board certification at this time.” However, they acknowledged that future studies should evaluate the progression of AI chatbots’ performance.
Reference
1. Mihalache A, Popovic MM, Muni RH. Performance of an artificial intelligence chatbot in ophthalmic knowledge assessment. JAMA Ophthalmol. Published online April 27, 2023. doi:10.1001/jamaophthalmol.2023.1144