Construction of scalable and robust relationships across multiple phenotypes from biobank-scaled PheWAS data
Motivation: Biobank-scale phenome-wide association studies (PheWAS) provide associations between many phenotypes and common genetic variants. A disease-disease network (DDN) built from PheWAS summary data gives an intuitive view of the relationships across multiple phenotypes. Unlike a genome-wide association study, which identifies the genetic variants associated with one particular disease, a DDN constructed from PheWAS shows different associations between diseases depending on the significance threshold chosen. We therefore developed a novel method that simultaneously strengthens the associations and reduces the uncertainty introduced by the arbitrary choice of significance level when constructing a DDN.
Results: We constructed an enhanced disease-disease network (eDDN) among 421 phenotypes using UK Biobank (UKBB) PheWAS summary data. We developed a novel method, variant frequency-inverse phenotype frequency, which weights disease-SNP associations by incorporating phenome-wide SNP importance and significance to enhance associations across multiple diseases. To show the utility of the eDDN and validate the improvement in disease-disease associations, we applied graph-based semi-supervised learning to the eDDN to obtain predicted scores for co-occurring diseases. Ground truth for co-occurrence was generated from UKBB inpatient data. Compared with conventional DDNs built at a fixed significance level, the eDDN achieved a higher AUC. To show that the topology of the eDDN can be translated into clinical significance, we examined the co-occurring diseases predicted from the eDDN when myocardial infarction was given as the index disease of interest. These findings were further validated with external co-occurrence information generated from the Penn Medicine BioBank.
Keywords: PheWAS, Disease-Disease Network, Co-occurrence disease, Graph-based semi-supervised learning
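
The abstract does not give the exact weighting formula, so the following is only a minimal sketch of the general idea, assuming a TF-IDF-style analogue: each disease-SNP association is weighted by its normalized significance (the "variant frequency" term) multiplied by a log-scaled penalty for SNPs associated with many phenotypes (the "inverse phenotype frequency" term). The function names, the cosine-similarity network, the Laplacian-regularized score propagation, and the random p-values are all illustrative assumptions, not the authors' implementation or data.

```python
import numpy as np

def vf_ipf_weights(pvals, threshold=0.05):
    """Assumed TF-IDF-style weighting of a (phenotypes x SNPs) p-value matrix."""
    signif = -np.log10(pvals)                       # strength of each association
    row_sum = signif.sum(axis=1, keepdims=True)
    vf = signif / np.where(row_sum == 0, 1.0, row_sum)  # normalize within phenotype
    n_pheno = pvals.shape[0]
    pheno_count = (pvals < threshold).sum(axis=0)   # phenotypes sharing each SNP
    ipf = np.log((1 + n_pheno) / (1 + pheno_count)) # widely shared SNPs weigh less
    return vf * ipf

def cosine_network(weights):
    """Disease-disease similarity as cosine similarity of weighted SNP profiles."""
    norms = np.linalg.norm(weights, axis=1, keepdims=True)
    unit = weights / np.where(norms == 0, 1.0, norms)
    sim = unit @ unit.T
    np.fill_diagonal(sim, 0.0)                      # no self-loops
    return sim

def propagate_scores(sim, index_disease, mu=1.0):
    """Graph-based semi-supervised scores: solve (I + mu * L) f = y."""
    n = sim.shape[0]
    labels = np.zeros(n)
    labels[index_disease] = 1.0                     # the index disease is the only label
    laplacian = np.diag(sim.sum(axis=1)) - sim
    return np.linalg.solve(np.eye(n) + mu * laplacian, labels)

# Hypothetical toy input: 5 phenotypes, 20 SNPs, random p-values.
rng = np.random.default_rng(0)
pvals = rng.uniform(1e-8, 1.0, size=(5, 20))
network = cosine_network(vf_ipf_weights(pvals))
scores = propagate_scores(network, index_disease=0)
```

In this sketch, phenotypes with higher entries in `scores` are those most strongly connected to the index disease in the weighted network; with real data, such scores could be compared against observed co-occurrence to compute an AUC.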