Comprehensive analysis of machine learning models for prediction of sub-clinical mastitis: Deep Learning and Gradient-Boosted Trees outperform other models


  • Sub-clinical bovine mastitis decreases milk quality and production. Moreover, sub-clinical mastitis leads to the use of antibiotics, with a consequent increased risk of the emergence of antibiotic-resistant bacteria. Therefore, early detection of infected cows is of great importance. The Somatic Cell Count (SCC) day-test used for mastitis surveillance gives data that fluctuate widely between days, raising questions about its reliability and early prediction power. The recent identification of risk parameters of sub-clinical mastitis from milking parameters by machine learning models is emerging as a promising new tool to enhance early prediction of mastitis occurrence. To develop the optimal approach for early sub-clinical mastitis prediction, we implemented 2 steps: (1) finding the best statistical models to accurately link patterns of risk factors to sub-clinical mastitis, and (2) extending this application from the farms tested to new farms (method generalization). Herein, we applied various machine learning-based prediction systems to a large milking dataset to uncover the best predictive models of sub-clinical mastitis. Data from 364,249 milking instances were collected by an electronic automated in-line monitoring system in which milk volume, lactose concentration, electrical conductivity (EC), protein concentration, peak flow and milking time were measured for each sample. To provide a platform for applying the models developed to other farms, the Z transformation approach was employed. Following this, various prediction systems [Deep Learning (DL), Naïve Bayes, Generalized Linear Model, Logistic Regression, Decision Tree, Gradient-Boosted Tree (GBT) and Random Forest] were applied to the non-transformed milking dataset and to a Z-standardized dataset. ROC (Receiver Operating Characteristic) curves, AUC (Area Under the Curve), and high accuracy demonstrated the high sensitivity of GBT and DL in detecting sub-clinical mastitis. GBT was the most accurate model (accuracy of 84.9%) in prediction of sub-clinical bovine mastitis. These data demonstrate how these models could be applied for prediction of sub-clinical mastitis in multiple bovine herds regardless of the size and sampling techniques.
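
The Z transformation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the transformation was computed, so standardizing each milking parameter within its own farm is an assumption made here for illustration, as are the function name `z_standardize_by_farm`, the record layout, and the example values. The idea is that a model trained on z-scores sees each reading relative to its herd's mean and spread, which is one plausible way to make readings comparable across farms.

```python
from collections import defaultdict
from statistics import mean, pstdev


def z_standardize_by_farm(records, features, farm_key="farm"):
    """Return copies of `records` with each feature z-scored within its farm.

    Assumption (not from the source): standardization is done per farm,
    using the population standard deviation of that farm's readings.
    """
    # Collect the raw values of every feature, grouped by farm.
    values = defaultdict(lambda: defaultdict(list))
    for rec in records:
        for f in features:
            values[rec[farm_key]][f].append(rec[f])

    # Compute (mean, standard deviation) per farm and feature.
    stats = {
        farm: {f: (mean(v), pstdev(v)) for f, v in per_feature.items()}
        for farm, per_feature in values.items()
    }

    # Apply z = (x - mean) / sd, keeping the original record order.
    standardized = []
    for rec in records:
        out = dict(rec)
        for f in features:
            mu, sigma = stats[rec[farm_key]][f]
            # A constant feature has no spread; map it to 0 instead of dividing by zero.
            out[f] = 0.0 if sigma == 0 else (rec[f] - mu) / sigma
        standardized.append(out)
    return standardized


if __name__ == "__main__":
    # Hypothetical readings, not data from the study.
    milkings = [
        {"farm": "A", "ec": 4.0, "lactose": 4.8},
        {"farm": "A", "ec": 6.0, "lactose": 4.4},
        {"farm": "B", "ec": 10.0, "lactose": 5.0},
        {"farm": "B", "ec": 14.0, "lactose": 4.6},
    ]
    for row in z_standardize_by_farm(milkings, ["ec", "lactose"]):
        print(row)
```

In this sketch the two farms report electrical conductivity on very different raw scales, yet after standardization the higher reading on each farm maps to the same z-score. Whether the study standardized per farm or over the pooled dataset is not stated in the abstract.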

publication date

  • 2019