Computer-aided detection AI reduces interreader variability in grading hip abnormalities with MRI


  • BACKGROUND: Accurate interpretation of hip MRI is time-intensive and difficult, is prone to inter- and intrareviewer variability, and lacks a universally accepted grading scale for evaluating morphological abnormalities.
    PURPOSE: To 1) develop and evaluate a deep-learning-based model for binary classification of hip osteoarthritis (OA) morphological abnormalities on MR images, and 2) develop an artificial intelligence (AI)-based assist tool to determine whether using the model predictions improves interreader agreement in hip grading.
    STUDY TYPE: Retrospective study aimed at evaluating a technical development.
    POPULATION: A total of 764 MRI volumes (364 patients) obtained from two studies (242 patients from LASEM [FORCe] and 122 patients from UCSF), split 65%/25%/10% into training, validation, and test sets for network training.
    FIELD STRENGTH/SEQUENCE: 3T MRI, 2D T2 FSE, PD SPAIR.
    ASSESSMENT: Automatic binary classification of cartilage lesions, bone marrow edema-like lesions, and subchondral cyst-like lesions using the MRNet; interreader agreement before and after using network predictions.
    STATISTICAL TESTS: Receiver operating characteristic (ROC) curve, area under the curve (AUC), specificity and sensitivity, and balanced accuracy.
    RESULTS: For cartilage lesions, bone marrow edema-like lesions, and subchondral cyst-like lesions, the AUCs were 0.80 (95% confidence interval [CI] 0.65, 0.95), 0.84 (95% CI 0.67, 1.00), and 0.77 (95% CI 0.66, 0.85), respectively. The sensitivity and specificity of the radiologist for binary classification were, respectively: 0.79 (95% CI 0.65, 0.93) and 0.80 (95% CI 0.59, 1.02); 0.40 (95% CI -0.02, 0.83) and 0.72 (95% CI 0.59, 0.86); 0.75 (95% CI 0.45, 1.05) and 0.88 (95% CI 0.77, 0.98). The interreader balanced accuracy increased from 53%, 71%, and 56% to 60%, 73%, and 68%, respectively, after using the network predictions and saliency maps.
    DATA CONCLUSION: We have shown that a deep-learning approach achieved high performance in clinical classification tasks on hip MR images, and that using the predictions from the deep-learning model improved the interreader agreement for all three pathologies.
    LEVEL OF EVIDENCE: 3. TECHNICAL EFFICACY STAGE: 1. J. Magn. Reson. Imaging 2020;52:1163-1172.
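For readers unfamiliar with the metrics named under STATISTICAL TESTS, the following is a minimal sketch of how sensitivity, specificity, and balanced accuracy are conventionally computed for a binary task. It is not the authors' code: the function name `binary_metrics` and the toy labels are hypothetical, and it assumes one set of labels is treated as the reference, which the abstract does not specify.

```python
import math


def binary_metrics(y_ref, y_pred):
    """Return (sensitivity, specificity, balanced_accuracy) for 0/1 labels.

    y_ref  : reference labels (assumed ground truth; hypothetical here)
    y_pred : labels being evaluated (a model's or a reader's binary calls)
    """
    if len(y_ref) != len(y_pred):
        raise ValueError("label lists must have the same length")

    # Count the four cells of the 2x2 confusion matrix.
    tp = sum(1 for r, p in zip(y_ref, y_pred) if r == 1 and p == 1)
    tn = sum(1 for r, p in zip(y_ref, y_pred) if r == 0 and p == 0)
    fp = sum(1 for r, p in zip(y_ref, y_pred) if r == 0 and p == 1)
    fn = sum(1 for r, p in zip(y_ref, y_pred) if r == 1 and p == 0)

    # Sensitivity: fraction of reference positives that were detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else math.nan
    # Specificity: fraction of reference negatives that were correctly cleared.
    specificity = tn / (tn + fp) if (tn + fp) else math.nan
    # Balanced accuracy: mean of the two, robust to class imbalance.
    balanced_accuracy = (sensitivity + specificity) / 2
    return sensitivity, specificity, balanced_accuracy


if __name__ == "__main__":
    # Toy labels for illustration only; these are not data from the study.
    reference = [1, 1, 1, 0, 0, 0, 0, 1]
    evaluated = [1, 0, 1, 0, 0, 1, 0, 1]
    print(binary_metrics(reference, evaluated))
```

Balanced accuracy averages sensitivity and specificity, so it is not inflated when one class (for example, lesion-free hips) dominates the sample.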


  • Tibrewala, R
  • Ozhinsky, E
  • Shah, R
  • Flament, I
  • Crossley, Kay
  • Srinivasan, R
  • Souza, R
  • Link, TM
  • Pedoia, V
  • Majumdar, S

publication date

  • October 1, 2020