BACKGROUND: Language impairment (LI) in the preschool years is known to vary over time. Stability in the diagnosis of LI may be influenced by children’s individual variability, the measurement error of commonly used assessment instruments and the cut-points used to define impairment.
AIMS: To investigate the agreement between two different age-based versions of a language assessment instrument and the stability of the classification of LI using the two measures over a 12-month period.
METHODS & PROCEDURES: A total of 945 participants completed the Clinical Evaluation of Language Fundamentals (CELF-Preschool 2 or 4th Edn) at 4 and 5 years of age. Agreement and stability were analysed using Bland–Altman plots, correlation and odds ratios. Sensitivity and specificity were calculated for two thresholds of the CELF-P2 using the diagnostic category on the child’s subsequent CELF-4.
OUTCOMES & RESULTS: For all CELF scores, mean differences for the cohort between 4 and 5 years were within 1.5 scale score units. In contrast, at the individual level variability was found across the range of scores and was of a greater magnitude than previously reported. Stability in LI classification was low, with 36% of 5-year-olds with LI (defined as a standard score below –1.25) classified as typical at 4 years, even though odds ratios calculated from classifications at the two time points suggested that 4-year-olds with LI had 23 times greater odds than their typical peers of receiving a diagnosis of LI at 5 years. The CELF-P2 did not demonstrate adequate levels of diagnostic accuracy for LI at 5 years: sensitivity of 64% and specificity of 92.9%.
CONCLUSIONS: Substantial variability across the entire range of possible CELF scores was observed in this community cohort between the ages of 4 and 5 years. The stability of LI classification was lower than that reported in previous research conducted primarily on smaller clinical cohorts.
The current study’s results suggest that the variability observed in developmental language pathways is the result of a combination of limitations in measurement instruments, individual children’s abilities and the arbitrary nature of the boundaries defining LI.