A maximum likelihood method for secondary analysis of nested case-control data Academic Article uri icon


  • Many epidemiological studies use a nested case-control (NCC) design to reduce cost while maintaining study power. Because NCC sampling is conditional on the primary outcome, routine application of logistic regression to analyze a secondary outcome will generally be biased. Recently, many studies have proposed several methods to obtain unbiased estimates of risk for a secondary outcome from NCC data. Two common features of all current methods requires that the times of onset of the secondary outcome are known for cohort members not selected into the NCC study and the hazards of the two outcomes are conditionally independent given the available covariates. This last assumption will not be plausible when the individual frailty of study subjects is not captured by the measured covariates. We provide a maximum-likelihood method that explicitly models the individual frailties and also avoids the need to have access to the full cohort data. We derive the likelihood contribution by respecting the original sampling procedure with respect to the primary outcome. We use proportional hazard models for the individual hazards, and Clayton's copula is used to model additional dependence between primary and secondary outcomes beyond that explained by the measured risk factors. We show that the proposed method is more efficient than weighted likelihood and is unbiased in the presence of shared frailty for the primary and secondary outcome. We illustrate the method with an application to a study of risk factors for diabetes in a Swedish cohort.

publication date

  • 2014