Semiparametric Maximum Likelihood Estimation with Two-phase Stratified Case-control Sampling
Postdoctoral researcher of Biostatistics
Department of Biostatistics, Epidemiology and Informatics
University of Pennsylvania Perelman School of Medicine
We study statistical inference methods for fitting logistic regression models to data arising from the two-phase stratified case-control sampling design, in which a subset of covariates is available only for a portion of the cases and controls, who are selected based on case-control status and the fully collected covariates. We are additionally interested in characterizing the distribution of the incomplete covariates conditional on the fully observed ones. It is desirable to include all subjects in the analysis to achieve consistency of parameter estimation and optimal statistical efficiency. We develop a semiparametric maximum likelihood approach under the rare disease assumption, where parameter estimates are obtained through a novel reparametrized profile likelihood technique. We study the large sample distribution theory for the proposed estimator, and demonstrate through simulation studies that it performs well in finite samples and has improved statistical efficiency compared with existing approaches. We apply the proposed method to analyze a stratified case-control study of breast cancer nested within the Breast Cancer Detection and Demonstration Project, where one breast cancer risk predictor, percent mammographic density, was measured only for a subset of the women in the study.
Keywords: Logistic regression model; Profile likelihood; Semiparametric maximum likelihood; Stratified case-control study; Two-phase sampling.
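To make the sampling design described in the abstract concrete, the following is a minimal simulation sketch of two-phase stratified case-control sampling. It illustrates only the data structure (an expensive covariate observed for a stratified subset), not the proposed semiparametric estimator. The function name `simulate_two_phase`, the covariate distributions, the regression coefficients, and the per-stratum sample size are all hypothetical choices made for illustration and do not come from the talk.

```python
import numpy as np

def simulate_two_phase(n_pop=200_000, n_per_stratum=100, seed=0):
    """Simulate a two-phase stratified case-control sample (illustrative only)."""
    rng = np.random.default_rng(seed)

    # Cohort: Z is a cheap binary covariate, X an expensive one that depends on Z.
    # (Distributions and coefficients below are assumptions, not from the source.)
    z = rng.binomial(1, 0.3, size=n_pop)
    x = rng.normal(loc=0.5 * z, scale=1.0)

    # Logistic disease model with a low intercept, so the disease is rare.
    logit = -5.0 + 0.5 * z + 0.7 * x
    d = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))

    # Phase one: all cases plus an equal number of randomly chosen controls.
    cases = np.flatnonzero(d == 1)
    controls = rng.choice(np.flatnonzero(d == 0), size=cases.size, replace=False)
    phase1 = np.concatenate([cases, controls])
    d1, z1, x1 = d[phase1], z[phase1], x[phase1]

    # Phase two: within each (D, Z) stratum, measure X for at most
    # n_per_stratum subjects chosen at random.
    selected = np.zeros(phase1.size, dtype=bool)
    for dv in (0, 1):
        for zv in (0, 1):
            members = np.flatnonzero((d1 == dv) & (z1 == zv))
            take = min(n_per_stratum, members.size)
            selected[rng.choice(members, size=take, replace=False)] = True

    # X is missing by design for phase-one subjects not selected in phase two.
    x_obs = np.where(selected, x1, np.nan)
    return d1, z1, x_obs, selected
```

In this sketch, an analysis restricted to the rows with `selected == True` would discard the phase-one subjects whose `x_obs` is missing; the abstract's point is that including all subjects is desirable for consistency and efficiency.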