Most livestock metabolomic studies involve relatively small, homogeneous populations of animals. However, livestock farming systems are heterogeneous, and larger, more diverse datasets are required to ensure that biomarkers are robust. The aims of this study were therefore to (1) investigate the feasibility of using a large and diverse dataset for untargeted proton nuclear magnetic resonance (1H NMR) serum metabolomic profiling, and (2) investigate the impact of fixed effects (farm of origin, parity and stage of lactation) on the serum metabolome of early-lactation dairy cows. First, we used multiple linear regression to correct a large spectral dataset (707 cows from 13 farms) for fixed effects prior to multivariate statistical analysis with principal component analysis (PCA). Farm of origin accounted for up to 57% of overall spectral variation, and nearly 80% of the variation in some individual metabolite concentrations. Parity and week of lactation had much smaller effects, both on the spectra as a whole (<3%) and on individual metabolites (<20%). To assess how fixed effects influence prediction accuracy and biomarker discovery, we used orthogonal partial least squares (OPLS) regression to quantify the relationship between NMR spectra and concentrations of the current gold standard serum biomarker of energy balance, β-hydroxybutyrate (BHBA). Models constructed using data from multiple farms provided reasonably robust predictions of serum BHBA concentration (0.05 ≤ RMSE ≤ 0.18). Fixed effects influenced the results of biomarker discovery; however, these impacts could be controlled using the proposed method of linear regression spectral correction.
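The correction step described above can be illustrated with a minimal sketch. It is not the study's implementation: it assumes that each spectral bin is regressed on dummy-coded fixed effects (farm, parity and week of lactation are all treated as categorical here, which is an assumption), that the residuals plus the grand mean are taken as the corrected spectra, and that PCA is then run on the result. The function names (`one_hot`, `correct_spectra`, `pca_scores`) and the synthetic data are hypothetical and chosen only for illustration.

```python
import numpy as np


def one_hot(labels):
    """Dummy-code a categorical factor, dropping the first level as reference."""
    levels = np.unique(labels)
    return (labels[:, None] == levels[None, 1:]).astype(float)


def correct_spectra(spectra, farm, parity, week):
    """Regress every spectral bin on the fixed effects and keep the residuals.

    Returns (corrected, explained), where `explained` is the fraction of the
    total spectral variance captured by the fixed-effects model.
    """
    n = spectra.shape[0]
    # Intercept plus dummy variables for each factor (assumed categorical).
    design = np.column_stack(
        [np.ones(n), one_hot(farm), one_hot(parity), one_hot(week)]
    )
    grand_mean = spectra.mean(axis=0)
    # One least-squares fit solves all bins at once (one column per bin).
    coef, *_ = np.linalg.lstsq(design, spectra, rcond=None)
    residuals = spectra - design @ coef
    total_ss = ((spectra - grand_mean) ** 2).sum()
    explained = 1.0 - (residuals ** 2).sum() / total_ss
    # Add the grand mean back so corrected spectra stay on the original scale.
    return residuals + grand_mean, explained


def pca_scores(data, n_components=2):
    """PCA scores via SVD of the mean-centred data matrix."""
    centered = data - data.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


# Synthetic stand-in for real spectra: a farm-specific shift plus noise.
rng = np.random.default_rng(0)
n_cows, n_bins, n_farms = 120, 40, 4
farm = rng.integers(0, n_farms, n_cows)
parity = rng.integers(1, 4, n_cows)
week = rng.integers(1, 4, n_cows)
farm_shift = rng.normal(0.0, 2.0, (n_farms, n_bins))
spectra = farm_shift[farm] + rng.normal(0.0, 1.0, (n_cows, n_bins))

corrected, explained = correct_spectra(spectra, farm, parity, week)
scores = pca_scores(corrected)
print(f"Variance explained by fixed effects: {explained:.1%}")
```

Because the residuals are orthogonal to the design matrix, refitting the same model to the corrected spectra explains essentially none of the remaining variance, which gives a quick check that the correction has been applied.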