Effects of Study Population, Labeling and Training on Glaucoma Detection Using Deep Learning Algorithms

Masaki Tanito, Yuri Fujino, Akram Belghith, James A. Proudfoot, Mark Christopher, Jeffrey M. Liebmann, Ryo Asaoka, Linda M. Zangwill, Naoto Shibata, Yoshiaki Kiuchi, Michael H. Goldbaum, Massimo A. Fazio, Robert N. Weinreb, Gustavo De Moraes, Christopher A. Girkin, Kana Tokumo, Masato Matsuura, Kenichi Nakahara, Hiroshi Murata, Jasmin Rezapour, Christopher Bowd

subject

0301 basic medicine; Aging; genetic structures; Fundus Oculi; African descent; Population; Biomedical Engineering; Glaucoma; Primary care; Neurodegenerative; optic disc; 03 medical and health sciences; 0302 clinical medicine; Deep Learning; Ophthalmology and Optometry; Artificial Intelligence; medicine; Humans; education; Mild disease; education.field_of_study; Receiver operating characteristic; business.industry; Special Issue; imaging; medicine.disease; eye diseases; Ophthalmology; 030104 developmental biology; machine learning; 030221 ophthalmology & optometry; Population study; business; Psychology; Algorithm; Algorithms

description

Author(s): Christopher, Mark; Nakahara, Kenichi; Bowd, Christopher; Proudfoot, James A; Belghith, Akram; Goldbaum, Michael H; Rezapour, Jasmin; Weinreb, Robert N; Fazio, Massimo A; Girkin, Christopher A; Liebmann, Jeffrey M; De Moraes, Gustavo; Murata, Hiroshi; Tokumo, Kana; Shibata, Naoto; Fujino, Yuri; Matsuura, Masato; Kiuchi, Yoshiaki; Tanito, Masaki; Asaoka, Ryo; Zangwill, Linda M

Abstract:

Purpose: To compare the performance of independently developed deep learning algorithms for detecting glaucoma from fundus photographs and to evaluate strategies for incorporating new data into models.

Methods: Two fundus photograph datasets, one from the Diagnostic Innovations in Glaucoma Study/African Descent and Glaucoma Evaluation Study and one from Matsue Red Cross Hospital, were used to independently develop deep learning algorithms for detection of glaucoma at the University of California, San Diego, and the University of Tokyo. We compared three versions of the University of California, San Diego, and University of Tokyo models: original (no retraining), sequential (retraining only on new data), and combined (training on combined data). Independent datasets were used to test the algorithms.

Results: The original University of California, San Diego, and University of Tokyo models performed similarly (area under the receiver operating characteristic curve = 0.96 and 0.97, respectively) for detection of glaucoma in the Matsue Red Cross Hospital dataset, but not in the Diagnostic Innovations in Glaucoma Study/African Descent and Glaucoma Evaluation Study data (0.79 and 0.92, respectively; P < .001). Model performance was higher when classifying moderate-to-severe disease than when classifying mild disease (area under the receiver operating characteristic curve = 0.98 and 0.91, respectively; P < .001). Models trained with the combined strategy generally performed better across all datasets than models using the original strategy.

Conclusions: Deep learning glaucoma detection can achieve high accuracy across diverse datasets with appropriate training strategies. Because model performance was influenced by the severity of disease, labeling, training strategies, and population characteristics, reporting accuracy stratified by relevant covariates is important for cross-study comparisons.

Translational relevance: High sensitivity and specificity of deep learning algorithms for moderate-to-severe glaucoma across diverse populations suggest a role for artificial intelligence in the detection of glaucoma in primary care.
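The abstract recommends reporting accuracy stratified by covariates such as disease severity. The following is a minimal sketch of what that could look like for the area under the receiver operating characteristic curve (AUC). It is not the authors' code: the function names, the choice to compare each severity group against all healthy eyes, and the toy scores are assumptions made purely for illustration.

```python
def auc(scores, labels):
    """AUC via the rank-sum identity: the probability that a randomly
    chosen positive scores higher than a randomly chosen negative,
    counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_by_severity(scores, labels, severity):
    """AUC of each severity group of positives against ALL negatives.
    (Assumed design: healthy eyes carry no severity and are shared.)"""
    neg_scores = [s for s, y in zip(scores, labels) if y == 0]
    groups = {}
    for s, y, sev in zip(scores, labels, severity):
        if y == 1:
            groups.setdefault(sev, []).append(s)
    return {
        sev: auc(pos_scores + neg_scores,
                 [1] * len(pos_scores) + [0] * len(neg_scores))
        for sev, pos_scores in groups.items()
    }


# Hypothetical model outputs (probability of glaucoma); not study data.
scores = [0.1, 0.2, 0.3, 0.4,      # healthy eyes
          0.35, 0.6, 0.25,         # mild glaucoma
          0.8, 0.9, 0.7]           # moderate-to-severe glaucoma
labels = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
severity = [None] * 4 + ["mild"] * 3 + ["moderate-to-severe"] * 3

overall = auc(scores, labels)
by_severity = auc_by_severity(scores, labels, severity)
print("overall AUC:", overall)
for sev, value in by_severity.items():
    print(sev, "AUC:", value)
```

On toy data like this, a single pooled AUC can hide a sizeable gap between severity groups, which is the kind of difference that stratified reporting is meant to expose.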

year	journal
2020-04-01	Translational Vision Science & Technology

10.1167/tvst.9.2.27 http://europepmc.org/articles/PMC7396194