Analysis of incidence and prognosis from 'extreme' case-control designs
Academic Article


  • The significant investment in measuring biomarkers has prompted investigators to improve cost-efficiency by sub-sampling in non-standard study designs. For example, investigators studying prognosis may assume that any differences in biomarkers are likely to be most apparent in an extreme sample of the earliest deaths and the longest-surviving controls. Simple logistic regression analysis of such data does not exploit the information available in the survival time, and statistical methods that model the sampling scheme may be more efficient. We derive likelihood equations that reflect the complex sampling scheme in unmatched and matched 'extreme' case-control designs. We investigated the performance and power of the method in simulation experiments, with a range of underlying hazard ratios and study sizes. Our proposed method resulted in hazard ratio estimates close to those obtained from the full cohort. The standard error estimates also performed well when compared with the empirical variance. In an application to a study investigating markers for lethal prostate cancer, an extreme case-control sample of lethal cases and the longest-surviving controls provided estimates of the effect of Gleason score in close agreement with analysis of all the data. By using the information in the sampling design, our method enables efficient and valid estimation of the underlying hazard ratio from a study design that is intuitive and easily implemented.
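The sampling design described in the abstract can be illustrated with a small simulation. The sketch below is not the likelihood method derived in the article; it only draws an 'extreme' sample (earliest deaths as cases, longest survivors as controls) from a simulated cohort and computes the naive odds ratio that ignores survival time. Everything in it is an assumption made for illustration: the binary biomarker, exponential survival times, administrative censoring, the parameter values, and the function names `simulate_cohort`, `extreme_sample`, and `naive_odds_ratio`.

```python
import numpy as np


def simulate_cohort(n, hazard_ratio, base_hazard=0.1, censor_time=10.0, seed=0):
    """Simulate a cohort with a binary biomarker and exponential survival.

    Assumed setup (not from the article): biomarker prevalence 0.5,
    constant baseline hazard, administrative censoring at censor_time.
    """
    rng = np.random.default_rng(seed)
    x = rng.integers(0, 2, size=n)                # biomarker: 0 or 1
    hazard = base_hazard * hazard_ratio ** x      # proportional hazards
    t_event = rng.exponential(1.0 / hazard)       # latent time of death
    time = np.minimum(t_event, censor_time)       # observed follow-up time
    died = t_event <= censor_time                 # death observed before censoring
    return x, time, died


def extreme_sample(time, died, n_cases, n_controls):
    """Pick the earliest deaths as cases and the longest survivors as controls."""
    idx = np.arange(time.size)
    deaths = idx[died]
    cases = deaths[np.argsort(time[deaths])[:n_cases]]
    remaining = np.setdiff1d(idx, cases)
    controls = remaining[np.argsort(time[remaining])[::-1][:n_controls]]
    return cases, controls


def naive_odds_ratio(x, cases, controls):
    """Odds ratio from the 2x2 table, with a 0.5 continuity correction.

    This ignores survival time and the sampling scheme entirely.
    """
    a = x[cases].sum() + 0.5            # exposed cases
    b = (1 - x[cases]).sum() + 0.5      # unexposed cases
    c = x[controls].sum() + 0.5         # exposed controls
    d = (1 - x[controls]).sum() + 0.5   # unexposed controls
    return (a * d) / (b * c)


# Usage: cohort of 5000 with a true hazard ratio of 2 (illustrative values).
x, time, died = simulate_cohort(n=5000, hazard_ratio=2.0)
cases, controls = extreme_sample(time, died, n_cases=200, n_controls=200)
print("naive odds ratio:", naive_odds_ratio(x, cases, controls))
```

Under these assumptions the naive odds ratio is not an estimate of the underlying hazard ratio, because subjects were selected on their survival times. This is the gap that motivates modelling the sampling scheme explicitly, as the abstract describes.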


publication date

  • 2014