Hierarchical pattern recognition in milking parameters predicts mastitis prevalence
Academic Article


  • The aim of this study was to develop a predictive model for mastitis incidence, independent of Somatic Cell Count (SCC), to provide an alternative, simple, and cost-effective approach for mastitis risk management based on available milking parameters. The test-day SCC is the most common indicator for Sub-Clinical Mastitis (SCM) surveillance in dairy industries worldwide. However, SCC is highly variable between days, raising major concerns about its reliability. This caveat highlights the need for longitudinal/frequent monitoring of SCC and/or developing alternative approaches for SCM surveillance. A considerable proportion of available milking data, such as Milk Volume, Protein, Lactose, Electrical Conductivity (EC), Milking Time, and Peak Flow, offers the possibility of pattern recognition and model discovery for mastitis occurrence. Developing a predictive model involves: (1) finding the threshold (cutoff) of different predictive milking parameters and (2) finding the best combination of features that lead to mastitis and their hierarchical pattern/order. Here, in a large-scale study on 346,248 milking records, for the first time, we evaluated four different decision tree algorithms (Decision Tree, Stump Decision Tree, Parallel Decision Tree and Random Forest Decision Tree) with four different criteria (Accuracy, Info Gain, Gini Index and Gain Ratio) run on 11 datasets (original dataset and 10 created datasets by attribute weighting selection algorithms). Therefore, 572 models were evaluated and compared by 10-fold cross validation. The performance of each decision tree in drawing an inverted tree, with the most important feature at the root and less important variables as the leaves, was calculated by 10-fold cross validation. Random Forest Decision Tree with Gini Index criterion was the best model for predicting mastitis from milking parameters, with a high accuracy of 90%. 
Decision Tree models identified a strong pattern for SCM in milking data where all (100%) of cows with low levels of lactose (Lactose ≤ 4.5 g/L) and low milk volume (Volume ≤ 21.7 L) had mastitis. In addition, a significant pattern was found for identifying healthy cows by high levels of lactose (Lactose ≥ 4.5 g/L) and low levels of EC (EC ≤ 5.2). This study documents that milking parameters mined by the Random Forest Decision Tree model can be utilised to accurately predict SCM. The findings can be employed to increase the reliability of test-day SCC or as SCC-independent and cost-effective predictors of SCM.
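
The two modelling steps named in the abstract, finding a cutoff for a parameter and arranging parameters hierarchically, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The helper names (`gini`, `best_cutoff`, `classify`) are hypothetical, and `TOY_LACTOSE` and `TOY_LABELS` are made-up values chosen only to show the mechanics; they are not data from the study. Only the three cutoffs in `classify` (lactose 4.5, volume 21.7, EC 5.2) come from the abstract. Because the abstract states both "≤ 4.5" and "≥ 4.5" for lactose, the sketch assumes a value of exactly 4.5 falls on the low-lactose branch.

```python
from typing import List, Optional, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity of binary labels (1 = mastitis, 0 = healthy)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_cutoff(values: List[float], labels: List[int]) -> Tuple[float, float]:
    """Return (cutoff, weighted_gini) for the split `value <= cutoff`
    that minimises the size-weighted Gini impurity of the two children."""
    best = (float("nan"), float("inf"))
    n = len(values)
    # Skip the largest value: splitting there would leave the right child empty.
    for cutoff in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= cutoff]
        right = [y for x, y in zip(values, labels) if x > cutoff]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (cutoff, score)
    return best


def classify(lactose: float, volume: float, ec: float) -> Optional[str]:
    """Apply the two patterns reported in the abstract as a small hierarchy.

    Lactose is the root; volume and EC are checked one level down.
    Returns None where the abstract reports no pattern.
    """
    if lactose <= 4.5:            # assumption: exactly 4.5 counts as low lactose
        if volume <= 21.7:
            return "mastitis"     # low lactose and low milk volume
        return None
    if ec <= 5.2:
        return "healthy"          # high lactose and low EC
    return None


# Made-up illustrative values, NOT study data.
TOY_LACTOSE = [4.2, 4.3, 4.4, 4.5, 4.7, 4.8, 4.9, 5.0]
TOY_LABELS = [1, 1, 1, 1, 0, 0, 0, 0]

if __name__ == "__main__":
    print(best_cutoff(TOY_LACTOSE, TOY_LABELS))
    print(classify(lactose=4.2, volume=20.0, ec=6.0))
    print(classify(lactose=4.8, volume=25.0, ec=5.0))
```

The Gini Index is used here because the abstract names it as the criterion of the best-performing model. The other criteria listed (Accuracy, Info Gain, Gain Ratio) would slot into the same threshold search by swapping the scoring function.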


publication date

  • 2018