Yaqi Cao

Semiparametric Maximum Likelihood Estimation with Two-phase Stratified Case-control Sampling

Yaqi Cao, Biostatistics

Postdoctoral Researcher in Biostatistics
Department of Biostatistics, Epidemiology and Informatics
University of Pennsylvania Perelman School of Medicine


Y Cao¹, L Chen², Y Yang³, J Chen²

  1. University of Pennsylvania; Tsinghua University
  2. University of Pennsylvania
  3. Tsinghua University


We study statistical inference methods for fitting logistic regression models to data arising from the two-phase stratified case-control sampling design, in which a subset of covariates is available only for a portion of cases and controls, selected on the basis of case-control status and the fully observed covariates. We are additionally interested in characterizing the distribution of the incomplete covariates conditional on the fully observed ones. It is desirable to include all subjects in the analysis to achieve consistent parameter estimation and optimal statistical efficiency. We develop a semiparametric maximum likelihood approach under the rare disease assumption, in which parameter estimates are obtained through a novel reparameterized profile likelihood technique. We study the large-sample distribution theory of the proposed estimator and demonstrate through simulation studies that it performs well in finite samples and is more statistically efficient than existing approaches. We apply the proposed method to analyze a stratified case-control study of breast cancer nested within the Breast Cancer Detection and Demonstration Project, in which one breast cancer risk predictor, percent mammographic density, was measured only for a subset of the study women.


Keywords: Logistic regression model; Profile likelihood; Semiparametric maximum likelihood; Stratified case-control study; Two-phase sampling.
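To make the sampling design concrete, the following Python sketch simulates a hypothetical two-phase stratified case-control study and fits one simple baseline. It is not the semiparametric maximum likelihood estimator described above. Everything in it is an illustrative assumption: a binary phase-1 covariate `Z` that defines the strata, a normally distributed phase-2 covariate `X`, a cohort of 20,000, a quota of 150 subjects per (case-control status, `Z`) stratum, and the coefficient values. The helpers `simulate_cohort`, `phase2_sample`, and `weighted_logistic` are hypothetical names. The fit is an inverse-probability-weighted (Horvitz-Thompson) logistic regression, a common design-based approach for two-phase data that uses phase-1-only subjects only through the sampling weights; whether it is among the existing approaches compared in the poster is not stated here.

```python
import numpy as np


def simulate_cohort(n, beta, rng):
    """Phase 1 (hypothetical): D and Z are observed for everyone; X is not yet measured."""
    z = rng.binomial(1, 0.4, size=n)                 # assumed binary stratification covariate
    x = rng.normal(loc=0.5 * z, scale=1.0, size=n)   # assumed: X depends on Z
    eta = beta[0] + beta[1] * x + beta[2] * z        # logistic model for disease status D
    d = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))
    return d, z, x


def phase2_sample(d, z, quota, rng):
    """Phase 2: draw up to `quota` subjects from each (D, Z) stratum to measure X.

    Returns a boolean selection mask and Horvitz-Thompson weights
    N_stratum / n_stratum for selected subjects (0 for unselected ones).
    """
    selected = np.zeros(d.size, dtype=bool)
    weights = np.zeros(d.size)
    for dv in (0, 1):
        for zv in (0, 1):
            idx = np.flatnonzero((d == dv) & (z == zv))
            m = min(quota, idx.size)
            if m == 0:
                continue
            pick = rng.choice(idx, size=m, replace=False)
            selected[pick] = True
            weights[pick] = idx.size / m             # inverse of the sampling fraction
    return selected, weights


def weighted_logistic(design, y, w, iters=50, tol=1e-10):
    """Weighted logistic regression fitted by Newton-Raphson on the weighted score."""
    beta = np.zeros(design.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-design @ beta))
        score = design.T @ (w * (y - p))
        info = design.T @ (design * (w * p * (1 - p))[:, None])
        step = np.linalg.solve(info, score)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Hypothetical simulation settings (assumptions, not taken from the poster).
rng = np.random.default_rng(2023)
true_beta = np.array([-4.0, 0.7, 0.5])               # small intercept: rare outcome
d, z, x = simulate_cohort(20000, true_beta, rng)
sel, w = phase2_sample(d, z, quota=150, rng=rng)

# Only phase-2 subjects have X, so the weighted fit uses them alone.
design = np.column_stack([np.ones(sel.sum()), x[sel], z[sel]])
beta_ipw = weighted_logistic(design, d[sel], w[sel])
print("phase-2 size:", sel.sum(), "IPW estimate:", np.round(beta_ipw, 2))
```

Because each phase-2 subject is weighted by N_stratum / n_stratum, the weighted fit targets the cohort-level logistic coefficients, intercept included. The approach in the abstract instead aims for higher efficiency by also characterizing the distribution of the incomplete covariates given the fully observed ones and including all subjects in the analysis.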

© 2023 Trustees of the University of Pennsylvania. All rights reserved.
