Multiple imputation methods for handling missing values in longitudinal studies with sampling weights: Comparison of methods implemented in Stata


  • Many analyses of longitudinal cohorts require sampling weights to account for participants' unequal sampling probabilities, as well as multiple imputation (MI) to deal with missing data. However, there is no guidance on how MI and sampling weights should be implemented together. We simulated a target population based on the Australian Bureau of Statistics Estimated Resident Population and drew 1000 random samples, with selection dependent on three design variables, to mimic the Longitudinal Study of Australian Children. The target analysis was the weighted prevalence of overweight/obesity over childhood. We evaluated the performance of several MI approaches available in Stata, based on multivariate normal imputation (MVNI), fully conditional specification (FCS) and twofold FCS. The approaches were: a weighted imputation model, imputing missing data separately within each quintile group of the sampling weights, including the design stratum indicator in the imputation model, and using the sampling weights as a covariate in the imputation model. Approaches based on available cases and on inverse probability weighting (IPW) with time-varying weights were also compared. We observed severe convergence issues with FCS and twofold FCS. All MVNI-based approaches performed similarly, producing minimal bias and nominal coverage, except when imputation was conducted separately for each quintile sampling weight group. IPW performed as well as the MVNI-based approaches in terms of bias but was less precise. In similar longitudinal studies, we recommend using MVNI with the design stratum as a covariate in the imputation model. If the design stratum is unknown, including the sampling weight as a covariate is an appropriate alternative.
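
To make the target analysis concrete, the following is a minimal Python sketch, not the authors' Stata implementation, of how a weighted prevalence could be computed in each imputed dataset and then pooled with Rubin's rules. The function names, the toy data and the within-imputation variances are all hypothetical and chosen only for illustration; the imputation step itself (MVNI, FCS or twofold FCS) is assumed to have been carried out elsewhere.

```python
from statistics import mean, variance


def weighted_prevalence(outcomes, weights):
    """Weighted proportion of 1s in `outcomes`, using sampling weights."""
    total_weight = sum(weights)
    return sum(w * y for y, w in zip(outcomes, weights)) / total_weight


def pool_rubin(estimates, within_variances):
    """Pool m per-imputation estimates with Rubin's rules.

    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(within_variances)   # average within-imputation variance
    between = variance(estimates)     # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical toy data: two imputed copies of a binary overweight/obesity
# indicator for four children, sharing the same sampling weights.
weights = [2.0, 1.0, 1.0, 1.0]
imputed_outcomes = [
    [1, 0, 1, 0],
    [1, 0, 0, 0],
]
estimates = [weighted_prevalence(y, weights) for y in imputed_outcomes]

# Assumed within-imputation variances; in practice these would come from a
# design-based variance estimator applied to each imputed dataset.
within_variances = [0.01, 0.01]

pooled_estimate, total_variance = pool_rubin(estimates, within_variances)
print(estimates, pooled_estimate, total_variance)
```

The sketch covers only the analysis and pooling stages. The approaches compared in the abstract differ in how the imputation model uses the design information (weights, weight quintiles or design stratum), and that step is not shown here.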

publication date

  • 2020