Eun Jeong Oh
Risk Prediction for Partially Heterogeneous Subgroups via Fusion
Accurate risk modeling using electronic health record (EHR) data is challenging partly because baseline risk and risk predictors vary across patient subgroups. Such risk heterogeneity, if left unrecognized, can lead to unfair predictions with compromised accuracy. While this challenge can be overcome by developing separate models for each subgroup, in practice the data for many subgroups are rarely rich enough to support this. Recognizing that subgroups may share some common predictors, we propose a partially heterogeneous model that includes predictors that are common to all groups while allowing type-specific prognostic factors. The model is fitted by extending a fusion technique that encourages similarity among the group-specific parameters of the common predictors while selecting group-specific prognostic factors from the high-dimensional EHR variables. We derive upper bounds on the ℓ2-norm error of local optima of the estimators. Results from extensive simulation studies show that our method greatly improves model calibration across subgroups and accurately identifies subgroup-specific risk predictors. The proposed method is applied to predict short-term risk of mortality using structured data extracted from EHRs for oncology patients in the University of Pennsylvania Health System.
Keywords: Risk prediction, Heterogeneity, Fused lasso, Electronic health records, Variable selection
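As a rough illustration of the kind of objective the abstract describes, a fusion-penalized fit over K subgroups might be written as below. This is a sketch under assumptions, not the authors' exact formulation: the symbols K, n_k, ℓ_k, β_k (subgroup-k coefficients of the common predictors), γ_k (subgroup-k coefficients of the high-dimensional EHR variables), and the tuning parameters λ1, λ2 are all introduced here for illustration. The ℓ1 penalties are used for simplicity; the abstract's reference to local optima suggests the actual penalty may differ.

```latex
% Hypothetical notation: \ell_k is the log-likelihood for subgroup k
% (e.g. logistic), n_k its sample size.
\[
  \min_{\{\beta_k,\gamma_k\}_{k=1}^{K}}
  \; -\sum_{k=1}^{K} \frac{1}{n_k}\,\ell_k(\beta_k,\gamma_k)
  \;+\; \lambda_1 \sum_{1 \le k < k' \le K} \lVert \beta_k - \beta_{k'} \rVert_1
  \;+\; \lambda_2 \sum_{k=1}^{K} \lVert \gamma_k \rVert_1
\]
```

In this sketch the first penalty pulls the subgroup-specific coefficients of the common predictors toward one another (fusion), and the second performs sparse selection of subgroup-specific prognostic factors.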