Automatically Identifying Twitter Users for PrEP-Related Interventions
Ari Z. Klein, PhD, is a Staff Scientist in the Health Language Processing Center in the Division of Informatics.
Pre-exposure prophylaxis (PrEP) is highly effective at preventing the acquisition of human immunodeficiency virus (HIV). There is a substantial gap, however, between the number of people in the United States who have indications for PrEP and the number who are prescribed it. While Twitter has been analyzed as a source of PrEP-related data (e.g., barriers), methods have not been developed to enable the use of Twitter as a platform for implementing interventions. The objectives of this study were to (1) develop an automated natural language processing (NLP) pipeline for identifying men in the United States who have reported on Twitter that they are gay, bisexual, or other men who have sex with men (MSM), the population most affected by HIV, and (2) assess the extent to which these users demographically represent MSM with new HIV diagnoses. Between September 2020 and January 2021, our pipeline identified more than 10,000 users, with a precision of 0.85. Based on validated NLP tools, the majority of the users identified by our pipeline are in the top 10 states for new HIV diagnoses, in counties or states considered priority jurisdictions by the Ending the HIV Epidemic initiative, and in the same two age groups as the majority of MSM with new HIV diagnoses. Our pipeline can therefore be used to identify MSM in the United States who may be at risk of acquiring HIV, laying the groundwork for using Twitter on a large scale to target PrEP-related interventions directly at this population.
Keywords: natural language processing; social media; data mining; PrEP; HIV; AIDS