The identification of comorbidity risk given disease history via disease-disease network: an application to pre-eclamptic women in the UK Biobank
Vivek Sriram is a second-year PhD student in the Graduate Group in Genomics and Computational Biology (GCB) and a member of Dr. Dokyoon Kim's lab for Integrative Omics and Biomedical Informatics. His research interests include translational bioinformatics and personalized medicine, network analysis, deep learning and interpretable machine learning, data visualization, and the genomics of human disease.
Pre-eclampsia, a hypertensive disease that occurs during pregnancy, can lead to adverse health outcomes and an increased risk of comorbidities. A disease-disease network (DDN), a graph in which nodes represent phenotypes and edges represent SNPs shared between phenotypes, can help visualize the genetic relationships across diseases. By applying graph-based semi-supervised learning (GBSSL), a machine learning approach that propagates signal according to the topology of a network, we aim to identify novel comorbidity correlations and rank phenotypes according to their genetic similarity to source diseases.
We constructed a SNP-based DDN from UK Biobank (UKBB) PheWAS summary data, which covered roughly 1,400 phenotypes and 28 million imputed variants. We then assigned “+1” labels to pre-eclampsia phenotypes and “0” labels to all other phenotypes to establish source nodes for GBSSL. Our method identified several known comorbidities of pre-eclampsia, including placenta previa, abruptio placentae, and hemorrhage during pregnancy. We also found diseases that have not been clearly demonstrated to be associated with pre-eclampsia, such as subarachnoid hemorrhage and nonspecific abnormal findings on examination of the biliary tract.
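The network construction and labeling steps described above can be illustrated with a small sketch. The abstract does not state which GBSSL formulation was used, so the closed-form label spreading below (symmetric normalization with a damping parameter `alpha`) is an assumption, as are the toy phenotype names, the SNP identifiers, and the helper names `build_ddn` and `propagate`.

```python
import numpy as np

# Hypothetical toy input: phenotype -> set of associated SNP identifiers.
phenotype_snps = {
    "pre-eclampsia": {"rs1", "rs2", "rs3"},
    "phenotype_A": {"rs2", "rs3", "rs4"},
    "phenotype_B": {"rs4", "rs5"},
    "phenotype_C": {"rs9"},
}

def build_ddn(phenotype_snps):
    """Return phenotype names and a weight matrix of shared-SNP counts."""
    names = list(phenotype_snps)
    n = len(names)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            shared = len(phenotype_snps[names[i]] & phenotype_snps[names[j]])
            W[i, j] = W[j, i] = shared
    return names, W

def propagate(W, y, alpha=0.5):
    """Label spreading (assumed formulation): solve (I - alpha * S) f = y,
    where S = D^{-1/2} W D^{-1/2}. Isolated nodes keep their initial label."""
    deg = W.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    connected = deg > 0
    inv_sqrt[connected] = 1.0 / np.sqrt(deg[connected])
    S = inv_sqrt[:, None] * W * inv_sqrt[None, :]
    return np.linalg.solve(np.eye(len(y)) - alpha * S, y)

names, W = build_ddn(phenotype_snps)

# "+1" for the source phenotype, "0" for every other phenotype.
y = np.array([1.0 if name == "pre-eclampsia" else 0.0 for name in names])

scores = propagate(W, y)
ranking = sorted(zip(names, scores), key=lambda pair: -pair[1])
```

In this toy network, a phenotype that shares no SNPs directly with the source can still receive a nonzero score through an intermediate phenotype, which is the behavior that distinguishes propagation over the full network from considering direct associations only.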
To evaluate our results, we used only clinical data from the UKBB electronic health records to compare occurrences of diseases in pre-eclamptic patients to occurrences in controls. We determined that our GBSSL method, which considers the full network structure, had an accuracy of 96.11%, compared to 92.63% when considering just direct genetic associations. This result suggests that our methodology holds promise as a clinical tool for the identification of disease risk given prior disease history and genetic background.
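One way such an evaluation could be set up is sketched below. The abstract does not give the exact procedure, so everything here is an assumption for illustration: a phenotype is treated as clinically comorbid when it is more frequent among cases than controls, the network predictions are taken as already binarized, and the counts, phenotype names, and helper names are hypothetical.

```python
def ehr_comorbid(case_count, n_cases, control_count, n_controls):
    """Assumed criterion: the phenotype is more frequent in cases than controls."""
    return case_count / n_cases > control_count / n_controls

def accuracy(predicted, observed):
    """Fraction of phenotypes where the prediction matches the EHR-derived label."""
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)

# Hypothetical cohort sizes and per-phenotype (case, control) occurrence counts.
n_cases, n_controls = 100, 1000
counts = {
    "phenotype_A": (12, 40),
    "phenotype_B": (3, 50),
    "phenotype_C": (1, 10),
}

observed = [
    ehr_comorbid(case_count, n_cases, control_count, n_controls)
    for case_count, control_count in counts.values()
]

# Hypothetical binarized network predictions, in the same phenotype order.
predicted = [True, True, False]

acc = accuracy(predicted, observed)
```

The same `accuracy` function could be applied to predictions from the full-network scores and to predictions from direct associations only, so that the two approaches are compared against one clinical reference.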
Keywords: pre-eclampsia; disease-disease network; PheWAS; comorbidity; risk scoring; graph-based semi-supervised learning