DeepADEMiner: A Deep Learning Pharmacovigilance Pipeline for Extraction and Normalization of Adverse Drug Effect Mentions on Twitter
Arjun Magge is a researcher at the Center for Health Language Processing in the Department of Biostatistics, Epidemiology and Informatics at the Perelman School of Medicine, University of Pennsylvania. He is an expert in building information extraction and natural language processing pipelines using optimized deep learning architectures for applications in public health. He graduated from Arizona State University with a Master of Science in Computer Science in 2016 and a PhD in Biomedical Informatics in 2019.
Objective: Research on pharmacovigilance from social media data has focused on mining adverse drug effects (ADEs) using annotated datasets, with publications generally addressing one of three tasks: ADE classification, named entity recognition for identifying the span of ADE mentions, and ADE mention normalization to standardized terminologies. While the common goal of such systems is to detect ADE signals that can be used to inform public policy, progress toward that goal has been impeded largely by the scarcity of end-to-end solutions for large-scale analysis of social media reports for different drugs.
Materials and Methods: We present a dataset for training and evaluation of ADE pipelines where the ADE distribution is closer to the average "natural balance", with ADEs present in about 7% of the tweets. The deep learning architecture involves an ADE extraction pipeline with individual components for all three tasks.
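To make the three-stage structure concrete, the following is a minimal sketch of how a classification, span extraction, and normalization pipeline could be chained. It is not the DeepADEMiner implementation: the deep learning components are replaced by a toy dictionary lookup, and every name here (`ADEMention`, `TOY_LEXICON`, `classify`, `extract_spans`, `normalize`, `run_pipeline`) as well as the placeholder concept identifiers are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ADEMention:
    """One extracted ADE mention with its character span and resolved concept."""
    text: str
    start: int
    end: int
    concept_id: Optional[str]


# Hypothetical toy lexicon. The identifiers are placeholders,
# not codes from any real terminology.
TOY_LEXICON = {"headache": "CONCEPT_0001", "nausea": "CONCEPT_0002"}


def classify(tweet: str) -> bool:
    """Stage 1 (stand-in for a trained classifier): does the tweet mention an ADE?"""
    lowered = tweet.lower()
    return any(term in lowered for term in TOY_LEXICON)


def extract_spans(tweet: str) -> List[Tuple[int, int]]:
    """Stage 2 (stand-in for an NER model): character spans of ADE mentions."""
    lowered = tweet.lower()
    spans = []
    for term in TOY_LEXICON:
        idx = lowered.find(term)
        while idx != -1:
            spans.append((idx, idx + len(term)))
            idx = lowered.find(term, idx + 1)
    return sorted(spans)


def normalize(mention: str) -> Optional[str]:
    """Stage 3 (stand-in for a normalizer): map a mention to a concept identifier."""
    return TOY_LEXICON.get(mention.lower())


def run_pipeline(tweet: str) -> List[ADEMention]:
    """Run the three stages in sequence; tweets rejected at stage 1 yield no mentions."""
    if not classify(tweet):
        return []
    return [
        ADEMention(tweet[s:e], s, e, normalize(tweet[s:e]))
        for s, e in extract_spans(tweet)
    ]
```

Because each stage only sees what the previous stage lets through, errors compound: a tweet missed by the classifier can never contribute a span or a normalized concept downstream.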
Results: The system presented achieved state-of-the-art performance on comparable datasets and scored a classification performance of F1 = 0.63, span extraction performance of F1 = 0.44 and an end-to-end entity resolution performance of F1 = 0.34 on the presented dataset.
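For readers unfamiliar with the metric, F1 is the harmonic mean of precision and recall. The sketch below computes it from counts and applies it to span extraction under an exact-match criterion. The exact-match criterion and the function names `f1_score` and `span_f1` are assumptions for illustration; the abstract does not state which matching scheme was used.

```python
from typing import Iterable, Tuple

# A span is identified by (tweet_id, start, end); this tuple layout is an assumption.
Span = Tuple[str, int, int]


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 from true positive, false positive, and false negative counts."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def span_f1(gold: Iterable[Span], predicted: Iterable[Span]) -> float:
    """Exact-match span F1: a prediction counts only if it equals a gold span."""
    gold_set, pred_set = set(gold), set(predicted)
    tp = len(gold_set & pred_set)
    fp = len(pred_set - gold_set)
    fn = len(gold_set - pred_set)
    return f1_score(tp, fp, fn)
```

Under this criterion, a predicted span that is off by a single character counts as both a false positive and a false negative, which is one reason span-level scores tend to fall below tweet-level classification scores.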
Discussion: The performance of the models continues to highlight multiple challenges when deploying pharmacovigilance systems that use social media data. We discuss the implications of such models in the downstream tasks of signal detection and suggest future enhancements.
Conclusion: Mining ADEs from Twitter posts using a pipeline architecture requires the different components to be trained and tuned based on input data imbalance in order to ensure optimal performance on the end-to-end resolution task.
Keywords: Social Media Mining, Natural Language Processing, Information Extraction, Pharmacovigilance, Drug Safety