Bryan Blette

Assessing Treatment Effect Heterogeneity in the Presence of Missing Effect Modifier Data in Cluster-Randomized Trials




Bryan Blette is a Postdoctoral Fellow at the Center for Causal Inference and DBEI. He completed his Ph.D. in Biostatistics at UNC-Chapel Hill in 2021. His main methodological interests are in causal inference, measurement error, and Bayesian methods.


B Blette¹, F Li², M Harhay¹

  1. University of Pennsylvania
  2. Yale University


Understanding whether and how treatment effects vary across individuals is crucial to inform clinical practice and recommendations. Accordingly, the assessment of heterogeneous treatment effects (HTE) based on pre-specified potential effect modifiers has become a common goal in modern randomized trials. However, when one or more potential effect modifiers are missing, complete-case analysis may lead to bias, under-coverage, inflated type I error, or low power. While statistical methods for handling missing data have been proposed and compared for individually randomized trials with missing effect modifiers, few guidelines exist for the cluster-randomized setting, where intracluster correlations in the effect modifiers, outcomes, or even missingness mechanisms may introduce further threats to accurate assessment of HTE. In this paper, the performance of various missing data methods is neutrally compared in a simulation study of cluster-randomized trials with missing effect modifier data, and a Bayesian multilevel multiple imputation approach is proposed and evaluated. Thereafter, we impose controlled missing data scenarios on potential effect modifiers from the Work, Family, and Health Study to illustrate the proposed missing data method.


Cluster Randomized Trials, Effect Modification, Heterogeneous Treatment Effects, Missing Data, Multilevel Data, Multiple Imputation


Very nice presentation, Bryan.
What was the sample size and/or average cluster size in your simulations? Can you provide intuition on why the coverage decreases with increasing number of clusters in the mis-specified models?

Thanks Knashawn. Each cluster had a sample size simulated as a random Poisson variable with mean 50. I think the coverage decreases as the number of clusters increases simply because there was stable (greater than 0) bias for those misspecified models. I didn't have enough space to put bias results in the poster, but if, say, the relative bias was 50% at 20, 50, and 100 clusters alike, then as the number of clusters increases, the standard error decreases and the confidence intervals become narrower around the biased estimate, reducing empirical coverage. Thanks!
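The mechanism described above can be illustrated with a small Monte Carlo sketch. This is not the poster's simulation design: the 50% relative bias, the true effect of 1, and the assumption that the standard error shrinks like 1/sqrt(number of clusters) are all hypothetical numbers chosen only to show how coverage of a 95% interval collapses around a stably biased estimate as clusters accumulate.

```python
import numpy as np

def empirical_coverage(n_clusters, rel_bias=0.5, true_effect=1.0,
                       n_sims=2000, seed=0):
    """Monte Carlo coverage of a nominal 95% CI around a biased estimator.

    Hypothetical setup: the estimator targets true_effect * (1 + rel_bias)
    rather than true_effect, and its standard error shrinks like
    1 / sqrt(n_clusters).
    """
    rng = np.random.default_rng(seed)
    se = 1.0 / np.sqrt(n_clusters)
    # Simulated point estimates centered on the *biased* target
    est = rng.normal(true_effect * (1 + rel_bias), se, n_sims)
    # Fraction of 95% intervals (est +/- 1.96 * se) that cover the truth
    covered = np.abs(est - true_effect) <= 1.96 * se
    return covered.mean()

for k in (20, 50, 100):
    print(k, "clusters -> empirical coverage:", empirical_coverage(k))
```

With the bias held fixed, the printed coverage drops steeply as the cluster count grows, matching the intuition in the answer above.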


© 2023 Trustees of the University of Pennsylvania. All rights reserved.
