Additional testing is needed to ensure accuracy across different groups.
Reviewed by Damon Wong, PhD
Machine learning (ML) seems to be the wave of the future in medicine and, when perfected, it will be a most valuable diagnostic asset. However, right now the technology is in its infancy and the kinks have to be worked out and diagnostic capabilities perfected.
A recent study1 found that ML is still wanting: its performance in detecting glaucoma reproduced poorly across data sets from different ethnic groups, according to senior author Damon Wong, PhD, from the Singapore Eye Research Institute (SERI), Singapore National Eye Centre; SERI-Nanyang Technological University Advanced Ocular Engineering; School of Chemical and Biomedical Engineering, Nanyang Technological University, all in Singapore; and the Institute of Molecular and Clinical Ophthalmology, Basel, Switzerland.
This result is in contrast to other studies2-6 that used ML approaches to detect glaucoma. While most of those studies reported high diagnostic accuracies (area under the receiver operating characteristic curve [AUC] = 0.88-0.98) for glaucoma detection, they did not assess the models with independently sampled data from a different ethnic group (external test), which limits the generalizability of the models across ethnicities,7 Wong and colleagues explained.
In light of this deficiency, the investigators conducted a prospective, cross-sectional study to externally validate the ability of ML models to detect glaucoma using optical coherence tomography (OCT) images. A total of 514 Asian patients, ie, 257 patients with glaucoma and 257 controls without glaucoma, were enrolled to construct ML models for glaucoma detection. The models then were evaluated in 356 Asian patients, ie, 183 with glaucoma and 173 controls without glaucoma, and also in 138 Caucasian patients, ie, 57 with glaucoma and 81 controls without glaucoma.
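To make the internal-versus-external comparison concrete, the following is a minimal sketch of how an AUC can be computed separately for each test set. It is not the investigators' code; the function name `auc` and all scores below are made up for illustration and are not study data.

```python
def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney pairwise comparison.

    scores: higher means "more likely glaucoma".
    labels: 1 = glaucoma, 0 = control.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    # Count case/control pairs where the case scores higher; ties count half.
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


# Hypothetical model outputs (NOT study data).
internal_scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
internal_labels = [1, 1, 1, 0, 0, 0]
external_scores = [0.9, 0.4, 0.6, 0.5, 0.2, 0.7]
external_labels = [1, 1, 1, 0, 0, 0]

internal_auc = auc(internal_scores, internal_labels)
external_auc = auc(external_scores, external_labels)
print(f"internal AUC = {internal_auc:.2f}, external AUC = {external_auc:.2f}")
```

In this toy example the same scoring rule separates the first group perfectly but ranks some controls above cases in the second group. That kind of gap is what an external test is designed to expose.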
Retinal nerve fiber layer (RNFL) thickness values were used in 2 forms to build 2 classifiers: values produced by the compensation model, which the authors described as a multiple regression model fitted on healthy subjects that corrects the RNFL profile for anatomic factors, and the original (measured) OCT data.
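The idea behind regression-based compensation can be illustrated with a deliberately simplified sketch. The study's model is a multiple regression with several anatomic factors. The example below uses a single hypothetical covariate (`factor`) for brevity, and the function names and all numbers are assumptions for illustration only, not the authors' implementation or data.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("covariate has no variance")
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx, slope


def compensate(rnfl, factor, slope, reference):
    """Shift a measured thickness to what it would be at the reference anatomy."""
    return rnfl - slope * (factor - reference)


# Hypothetical healthy eyes (NOT study data): thickness rises with the factor.
healthy_factor = [1.0, 2.0, 3.0, 4.0]
healthy_rnfl = [102.0, 104.0, 106.0, 108.0]

# Fit on healthy subjects only, as the article describes for the compensation model.
intercept, slope = fit_line(healthy_factor, healthy_rnfl)
reference = mean(healthy_factor)

compensated = [
    compensate(r, f, slope, reference)
    for r, f in zip(healthy_rnfl, healthy_factor)
]
print(slope, compensated)
```

After compensation, the healthy eyes in this toy data share the same thickness, so anatomy no longer drives the spread. Under the same assumption, any remaining difference in a new eye is less likely to reflect anatomy alone.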
Data evaluation
With the exception of the foveal distance (P = .029), the investigators found no significant differences between the training data set and the Asian test data set (P ≥ .174).
They did find significant differences in the demographic data between the training data set and the Caucasian test data set, ie, the participants in the external test data set were younger, more were female, and more eyes had mild and moderate glaucoma.
In addition, the ocular characteristics also differed between those 2 data sets; specifically, the Caucasian patients had significantly shorter foveal distances, smaller foveal angles, less elliptical optic discs (ratio closer to 1.0), and thicker retinal vessel densities (P ≤ .009), the authors reported.
In the glaucoma data set, the Caucasian patients had significantly less elliptical optic discs, higher optic disc orientations, and thicker retinal vessel densities; were more hyperopic; and had greater RNFL thicknesses (P ≤ .001).
“Both the ML models (AUC = 0.96 and accuracy = 92%) outperformed the measured data (AUC = 0.93; P < .001) for glaucoma detection in the Asian data set. However, in the Caucasian data set, the ML model trained with compensated data (AUC = 0.93 and accuracy = 84%) outperformed the ML model trained with original data (AUC = 0.83 and accuracy = 79%; P < .001) and measured data (AUC = 0.82; P < .001) for glaucoma detection,” Wong and colleagues reported.
In commenting on their findings, the investigators said, “The results showed poor reproducibility of the performance with the ML model trained on original RNFL data across different data sets. The performance of the ML model trained on compensated RNFL seemed to be maintained. To the best of our knowledge, our study is the first to assess the performance of ML classifiers to detect glaucoma between ethnicities.”
Evaluating ML performance in different ethnic groups is the next critical step in determining a model’s generalizability to other populations, they explained, and they advised that care must be exercised in cohorts of patients representing different ethnic groups.