Comparison of methods for imputing ordinal data using multivariate normal imputation: a case study of non-linear effects in a large cohort study Academic Article uri icon

abstract

  • BACKGROUND: Multiple imputation is becoming increasingly popular for handling missing data, with Markov chain Monte Carlo assuming multivariate normality (MVN) a commonly used approach. Imputing categorical variables (which are clearly non-normal) using MVN imputation is challenging, and several approaches have been suggested. However, it remains unclear which approach should be preferred. METHODS: We explore methods for imputing ordinal variables using MVN imputation, including imputing as a continuous variable and as a set of indicators, and various methods for assigning imputed values to the possible categories (rounding), for estimating a non-linear association between an ordinal exposure and binary outcome. We introduce a new approach where we impute as continuous and assign imputed values into categories based on the mean indicators imputed in a separate round of imputation. We compare these approaches in a simple setting where we make 50% of data in an ordinal exposure missing completely at random, within an otherwise complete real dataset. RESULTS: Methods that impute the ordinal exposure as continuous distorted the non-linear exposure-outcome association by biasing the relationship towards linearity irrespective of the rounding method. In contrast, imputing using indicators preserved the non-linear association but not the marginal distribution of the ordinal variable. CONCLUSIONS: Imputing ordinal variables as continuous can bias the estimation of the exposure-outcome association in the presence of non-linear relationships. Further work is needed to develop optimal methods for handling ordinal (and nominal) variables when using MVN imputation.

publication date

  • 2012