
Dubé E, Gagnon D, Nickels E, Jeram S, Schuster M. Mapping vaccine hesitancy-Country-specific characteristics of a global phenomenon. Vaccine.

Monitoring Real-time Spatial Public Health Discussions in the Context of Vaccine Hesitancy

Michael C. Smith, M.S.E.
Mark Dredze, Ph.D.
Sandra Crouse Quinn, Ph.D.
David A. Broniatowski, Ph.D.


The George Washington University, Washington, DC, USA
The Johns Hopkins University
The University of Maryland, College Park, Maryland, USA


Social media offer the potential to follow public discussions more quickly, at lower cost, and at potentially finer granularity and broader scope than traditional surveys9. This paper describes a preliminary system for real-time geographical monitoring and analysis, applied to the vaccine-hesitancy discussion across the United States. That discussion is a valuable test case for such a system because vaccination discussions are diverse and impactful as they appear, change, and influence the public12,20. We combine several machine learning methods to geolocate, categorize, and classify vaccination discussions on Twitter. As a proof of concept, we present analyses of a prominent anti-vaccine discussion that validate the system against results from traditional surveys and that also add spatial statistical power to such surveys, displayed on maps of the United States. We detail limitations and future work, and conclude that the system and the answers it enables are important as a first step towards monitoring people's views, because they will allow more targeted and effective communication about, and reaction to, the discussion.

Introduction

To validate the system, one can look at ground truth on how people make decisions in similar contexts; Quinn and colleagues have produced a body of work with such aims13,24,25,27. Overall, they showed via surveys that factors such as public trust, demographics, risk perception, and social norms influence vaccine decision making25,27. For example, in a qualitative study of postal workers’ reactions during the 2001 anthrax attacks, Quinn et al. showed that dimensions of public trust affect medical decisions27. These “attitudinal and experience variables [and] demographic characteristics”13 provide insight into how rationales about vaccination decisions may vary. They provide a means of validating the system, a starting point for exploring the spatial and sociodemographic variability in vaccination decisions, and an opportunity to confirm that hypotheses established in limited survey environments hold in wider contexts. For example, is there a significant pocket of people in a certain area who are hesitant to vaccinate because they do not trust the government?

The novel contribution of this work is a monitoring system that can test such survey and spatial hypotheses broadly, cheaply, and quickly in this context. Specifically, it is A) a processing system for classifying messages and their sentiment that B) integrates existing analyses for topic and location and C) provides an extensible framework for statistically testing spatial hypotheses about vaccine hesitancy, given the messages and metadata generated on social media. The system uses targeted methodologies and leverages the theoretical advantages of using social media data in concert with survey data. How might such a system shed light on vaccine-hesitancy discussions across the USA, what are its limitations, and how could it serve as a first stepping stone to augment survey methods?

Methods

Given our context of vaccination discussion, our approach combines natural language processing and geospatial techniques as follows: we collect social media posts on Twitter and classify them for relevance to vaccination; categorize these posts by sentiment, location, and topic; interpret the topics related to vaccine refusal and hesitancy; and spatially join and aggregate the results. This process enables evaluation of our survey hypotheses using the spatial topic clusters and, because it can be re-run over time, spatial examination of new discussions. The system's sub-processes are described below.

Methods – Data

The data related to vaccines for our context, also described briefly by Dredze et al.9, are Twitter posts (tweets) from the USA that we began collecting around the time of the measles outbreak at Disneyland. The system collects the data and classifies them for relevance and sentiment. Initially we flagged tweets by keyword using the Twitter Streaming API[i], specifying more than fifty hand-chosen keywords such as 'vaccine', 'shot', and 'immunization', similar to and validated by common practice2,6,31.
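The keyword flagging step can be illustrated with a minimal sketch. It assumes whole-word, case-insensitive matching with an optional plural "s", which is a choice made here for illustration and not a description of how the Streaming API matches terms. The keyword tuple holds only the three keywords named above; the full list of more than fifty is not reproduced, and `flag_tweet` is a hypothetical helper name.

```python
import re

# Only the three keywords named in the text; the full hand-chosen list
# (more than fifty entries) is not reproduced here.
KEYWORDS = ("vaccine", "shot", "immunization")

# Whole-word, case-insensitive match with an optional plural "s".
# This matching rule is an assumption made for illustration.
KEYWORD_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in KEYWORDS) + r")s?\b",
    re.IGNORECASE,
)


def flag_tweet(text: str) -> bool:
    """Return True if the tweet text contains at least one keyword."""
    return KEYWORD_PATTERN.search(text) is not None


if __name__ == "__main__":
    for sample in (
        "Got my flu shot today",
        "Vaccines save lives",
        "Great weather for a walk",
    ):
        print(flag_tweet(sample), sample)
```

A keyword such as 'shot' is ambiguous, so a flag of this kind is only a coarse first pass; the relevance classifier applied afterwards is what separates vaccine-related tweets from the rest.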

Methods – Relevance and Sentiment

We trained supervised machine learning algorithms that, as part of the system, automatically classify tweets for relevance to vaccination and for sentiment; sentiment analysis produces a measure of the opinion expressed in messages21. We obtained labeled training data using Amazon Mechanical Turk[ii] on randomly chosen subsets of our tweets, which were 1) tagged as relevant to the topic of vaccines or not; 2) of those relevant, randomly chosen and tagged as bearing sentiment toward vaccines or not (non-neutral or neutral); and 3) of those bearing sentiment, randomly chosen and tagged as having positive or negative sentiment. While training the classifiers we conducted cross-validation and maximized precision and recall given tunable parameters.[iii] See Dredze et al. for further details on these classifiers9. Given these classifiers and the ability to run them over any tweets (our dataset and those incoming in real time), we have the first part of the system, namely a real-time categorization of vaccine-related Twitter posts: those relevant to vaccines; of those relevant, those that bear sentiment; and of those that bear sentiment, their sentiment polarity.

[i] https://dev.twitter.com/streaming/overview
[ii] https://www.mturk.com/mturk/welcome
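The three-stage categorization (relevance, then presence of sentiment, then polarity) can be sketched as a simple cascade. This is an illustrative sketch only: the function `categorize` and the toy keyword rules below are hypothetical stand-ins, and the trained classifiers themselves are described by Dredze et al.9, not here.

```python
from typing import Callable, Dict, Optional

# A classifier is modelled as any function from tweet text to True/False.
Classifier = Callable[[str], bool]


def categorize(
    text: str,
    is_relevant: Classifier,
    has_sentiment: Classifier,
    is_positive: Classifier,
) -> Dict[str, Optional[object]]:
    """Apply the three classifiers in order, stopping when a stage says no."""
    result: Dict[str, Optional[object]] = {
        "relevant": False,
        "has_sentiment": None,
        "polarity": None,
    }
    if not is_relevant(text):
        return result
    result["relevant"] = True
    result["has_sentiment"] = has_sentiment(text)
    if result["has_sentiment"]:
        result["polarity"] = "positive" if is_positive(text) else "negative"
    return result


# Toy keyword rules standing in for the trained classifiers (assumptions).
def toy_is_relevant(text: str) -> bool:
    return "vaccin" in text.lower()


def toy_has_sentiment(text: str) -> bool:
    lowered = text.lower()
    return any(w in lowered for w in ("great", "love", "fascist", "dangerous"))


def toy_is_positive(text: str) -> bool:
    lowered = text.lower()
    return any(w in lowered for w in ("great", "love"))


if __name__ == "__main__":
    for sample in ("nice weather today", "this vaccine law is fascist"):
        print(categorize(sample, toy_is_relevant, toy_has_sentiment, toy_is_positive))
```

Because each stage runs only on tweets that passed the previous one, errors in an early stage propagate to the later ones.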

Methods – Location Classification

The next part of the system is location classification. We use the Carmen Geolocation Toolkit10 to automatically classify a tweet's location; such geolocation has been shown to be appropriate and effective in other public health studies10. Carmen improves upon the location information provided by the Streaming API and, where possible, returns location information at the country, state, county, and latitude/longitude levels.

Methods – Topic Classification

The system uses topic modeling to determine the content discussed in our relevant tweets. Latent Dirichlet Allocation (LDA) is a commonly used machine learning algorithm that automatically determines topics in collections of text1; it is common practice for extracting patterns and groups from text collections, of which social media data are a prime example. LDA assumes that words co-occur within documents (and possibly across documents) because they are related; the algorithm collects and reports groups of such related words, with each group representing a topic. Using the MAchine Learning for LanguagE Toolkit (MALLET)18, the system runs LDA over our tweet dataset (the documents labeled as relevant to vaccination) to evaluate topics relevant to vaccine hesitancy. This produces an overall list of topics and a parameterization of each tweet by topic (roughly proportional to the tweet's relative composition by topic). Note that LDA is unsupervised; in general there is no guarantee that the algorithm will return a specific topic, and it is up to the analyst to determine the topics' relevance and substance by analyzing the words and groups returned3. We leverage public health researchers' domain expertise to make such determinations. We note that in our context, LDA will return relevant topics because we have collected and categorized the tweets to fit a specific meta-topic (that of vaccines). By contrast, the substance of the relevant topics is an output of the system, enabling hypothesis testing of our ground truth factors and exploration beyond them.
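For reference, the standard generative model behind LDA1 can be written as follows. The notation is chosen here and does not come from this paper: K is the number of topics, D the number of documents (tweets), and α and β are Dirichlet hyperparameters. The per-document vector θ_d corresponds to the per-tweet topic composition described above.

```latex
\begin{align*}
\phi_k &\sim \mathrm{Dirichlet}(\beta), & k &= 1, \dots, K
  && \text{(word distribution of topic } k\text{)} \\
\theta_d &\sim \mathrm{Dirichlet}(\alpha), & d &= 1, \dots, D
  && \text{(topic proportions of document } d\text{)} \\
z_{d,n} \mid \theta_d &\sim \mathrm{Categorical}(\theta_d)
  &&&& \text{(topic of the } n\text{th word in document } d\text{)} \\
w_{d,n} \mid z_{d,n}, \phi &\sim \mathrm{Categorical}(\phi_{z_{d,n}})
  &&&& \text{(the observed word)}
\end{align*}
```

Only the words w are observed; inference estimates φ and θ, which is why the topics must be interpreted by an analyst after the fact.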

Methods – Joins and Aggregations

The system enables nonspatial aggregation and analysis of the tweets by sentiment and topic. More central to this paper, however, the tweets also have location data, which one may spatially join and aggregate using ArcMap (version 10.3), part of ArcGIS. ArcGIS is geographic information system software that can generate maps of aggregated data and can calculate and display spatial statistics on those maps. One such statistic is the Getis-Ord Gi* statistic for hotspot analysis14, which is valuable to the system because it flags a point as statistically significantly high (low) when that point and its neighbors are high (low) in terms of some common variable.[iv] Using these maps and statistics, one may spatially analyze where vaccine tweets (our point data) occur, where sentiment occurs, and where topics occur (our common variables), along with how often they occur and whether statistically significant differences exist. Accordingly, the system provides a geographic result to accompany our topic-substance result concerning the survey results.
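The Gi* statistic mentioned above is commonly written in the following form, which we reproduce for orientation; we assume, but the text does not state, that this is the form the software evaluates. Here x_j is the value of the common variable at point j, w_{i,j} is the spatial weight between points i and j (with w_{i,i} nonzero, which is what distinguishes Gi* from Gi), and n is the number of points.

```latex
\[
G_i^* =
\frac{\displaystyle\sum_{j=1}^{n} w_{i,j}\, x_j \;-\; \bar{X} \sum_{j=1}^{n} w_{i,j}}
     {S \sqrt{\dfrac{n \displaystyle\sum_{j=1}^{n} w_{i,j}^{2} - \Bigl(\displaystyle\sum_{j=1}^{n} w_{i,j}\Bigr)^{2}}{n - 1}}},
\qquad
\bar{X} = \frac{1}{n}\sum_{j=1}^{n} x_j,
\qquad
S = \sqrt{\frac{1}{n}\sum_{j=1}^{n} x_j^{2} - \bar{X}^{2}}
\]
```

The result is a z-score: large positive values indicate a hot spot, large negative values a cold spot.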

Results

Results – Topics

Running the topic model over all tweets in our dataset, we obtained information about topics and their distribution over our tweets. The system also produces classification results for each tweet in terms of its relevance, sentiment, and location. We may use a tweet's ID (e.g. “532385146419560448”) to link its sentiment, location, and topic distribution. Given the locations of relevant messages, we may filter by classification category and weight by topic distribution to find hotspots for a given discussion.

Specifically, the topic information consists of a relative weighting parameter for each topic for each tweet (roughly proportional to the proportion of each topic in the tweet), so one can retrieve the messages most representative of each topic. We ran the topic model on all messages, filtered by the regular expression *vacc* to prune irrelevant or noisy topics in advance, and qualitatively interpreted the topics. Because the number of topics must be specified, we chose 50 to capture enough variability in our large dataset. As a proof of concept, we considered topic 46 in our further analyses.

Topic 46 pertains to the California government's bill eliminating exemptions from vaccinations for schoolchildren. Below are example messages from this topic; our domain experts, who performed identification and validation, looked at both the tokens in the topic and representative messages when doing so, as is good practice5.

• “california governor signs strict school vaccine legislation gov jerry brown signs california bill imposing...”
• “jim carrey brands governor 'fascist' over vaccine law jim carrey called california gov jerry brown”
• “ahf criticizes dumb amp dumber star jim carrey for calling gov brown a fascist”
• “calif gov jerry brown launching frosted mercury flakes children's cereal to accompany vaccine mandate”

We chose this topic for two reasons: it is an arguably prominent anti-vaccination discussion in our data, and it is pertinent to a hypothesis validated by Quinn's previous work, namely that “public trust / trust in government” affects attitudes about medical decisions such as vaccination, a common thread for validation. The analysis steps are the same regardless of the topic chosen.

[iii] Relevance classifier (recall .91, precision .96); if relevant, whether the tweet contains sentiment about vaccines (recall .28, precision .63); if it contains sentiment, whether that sentiment is positive or negative (recall .85, precision .75). We chose to maximize precision in the second case because we were relying on the precision of our results in the positive/negative classifier. Such low recall is not an issue given the size of our dataset.
[iv] The definition of ‘neighbor’ is variable; what is appropriate depends highly on the input data. Two of many possible options for our topic proportions and tweet data are k-nearest-neighbors (weighting influence such that all points have k neighbors) and weighting influence by inverse Euclidean distance. We chose the former for ease of interpretation and calculation.
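The linking of per-tweet outputs by tweet ID, and the retrieval of the messages most representative of a topic, can be sketched as follows. This is a minimal illustration under stated assumptions: the dictionary layout and the helper names `link_by_id` and `top_messages` are hypothetical, and every value in the example data is invented (only the first tweet ID is taken from the text above).

```python
def link_by_id(sentiment, location, topics):
    """Join three dicts keyed by tweet ID, keeping IDs present in all three."""
    shared = sentiment.keys() & location.keys() & topics.keys()
    return {
        tid: {
            "sentiment": sentiment[tid],
            "location": location[tid],
            "topics": topics[tid],
        }
        for tid in shared
    }


def top_messages(linked, topic, n=3, sentiment=None):
    """Return up to n tweet IDs with the largest weight for `topic`.

    If `sentiment` is given, only tweets with that label are considered.
    """
    rows = [
        (record["topics"].get(topic, 0.0), tid)
        for tid, record in linked.items()
        if sentiment is None or record["sentiment"] == sentiment
    ]
    rows.sort(reverse=True)  # largest topic weight first
    return [tid for _weight, tid in rows[:n]]


# Hypothetical example data: labels, places, and weights are invented.
SENTIMENT = {"532385146419560448": "negative", "1": "negative", "2": "positive"}
LOCATION = {"532385146419560448": "California", "1": "Arkansas",
            "2": "Maryland", "3": "Ohio"}
TOPICS = {
    "532385146419560448": {46: 0.8, 12: 0.2},
    "1": {46: 0.1, 12: 0.9},
    "2": {46: 0.6, 12: 0.4},
}

if __name__ == "__main__":
    linked = link_by_id(SENTIMENT, LOCATION, TOPICS)
    print(top_messages(linked, topic=46, n=2, sentiment="negative"))
```

Filtering by sentiment before ranking mirrors the order used in the analysis: restrict to a classification category first, then weight by topic.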

Results – Hotspots for Topics

To identify hotbeds of these vaccine hesitancy discussions, we used the “Hot Spot Analysis” tool in ArcMap, which calculates the Gi* statistic. We continued the proof of concept by considering only the contiguous United States, but the analysis is identical for other geographical boundaries (e.g. an individual state or a different country). We also limited our hot spot analysis to the tweets classified as having negative sentiment about vaccines, because our chosen topic, topic 46, is an anti-vaccination discussion. As the definition of a neighborhood may vary depending on the input data, we chose to spatially weight our input data via the k-nearest-neighbor (KNN) algorithm (using the default value of 8 neighbors suggested by ArcMap) to allow for such variations. This yielded the following map.

The hot-spot map of topic 46 shows statistically significant areas in the contiguous USA where the highest proportion of discussion of topic 46 occurs in negative-sentiment vaccine messages on Twitter. For example, topic 46 is often discussed near LA and in the northern Appalachian region, among other areas. Such maps may be created for any combination of classification and topic, and would reveal any statistically significant results among the spatial data for each combination. Note that this statistic does not merely highlight points that contain many messages; it highlights points whose message totals differ to a statistically significant degree from those of neighboring points. Such significant results would (and, in the case of topic 46, do) yield findings convergent with survey data. Future work will more rigorously relate and apply this mixed-methods approach.
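To make the hot spot computation concrete, here is a minimal pure-Python sketch of the Gi* statistic with binary k-nearest-neighbor weights. It is not ArcMap's implementation. It assumes planar coordinates with Euclidean distance, includes each point in its own neighborhood, and uses the commonly cited z-score form of Gi*; the function names and the example data are hypothetical.

```python
import math


def knn_weights(points, k):
    """Binary weights: 1 for a point and its k nearest neighbours, else 0."""
    weights = []
    for i, (xi, yi) in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.hypot(points[j][0] - xi, points[j][1] - yi),
        )
        row = [0.0] * len(points)
        row[i] = 1.0  # Gi* counts the focal point itself
        for j in others[:k]:
            row[j] = 1.0
        weights.append(row)
    return weights


def getis_ord_gi_star(values, weights):
    """Return the Gi* z-score of every point.

    Assumes the values are not all equal and that no neighbourhood
    covers every point; otherwise the denominator is zero.
    """
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean * mean)
    scores = []
    for row in weights:
        w_sum = sum(row)
        w_sq_sum = sum(w * w for w in row)
        numerator = sum(w * v for w, v in zip(row, values)) - mean * w_sum
        denominator = s * math.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
        scores.append(numerator / denominator)
    return scores


# Hypothetical example: ten points on a line, high values at one end.
POINTS = [(float(x), 0.0) for x in range(10)]
VALUES = [10.0, 10.0, 10.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

if __name__ == "__main__":
    for point, z in zip(POINTS, getis_ord_gi_star(VALUES, knn_weights(POINTS, k=2))):
        print(point, round(z, 3))
```

With real tweet data, the values would be the topic weights of the filtered tweets and k would be 8, as in the analysis above; geographic coordinates would also need to be projected before Euclidean distance is meaningful.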

Discussion

The results outlined above yielded, as a proof of concept, statistically significant geographic hot and cold spots for individual topics in negative-sentiment vaccine messages on Twitter. Such hotspots in a topic correspond to a discussion being statistically prevalent, and more prevalent in certain areas than in others. That discussions pertinent to the trust-in-government results from Quinn's surveys (topic 46) are statistically significant in the first place both validates our approach and supports Quinn's findings on a larger scale. The fact that no significant cold spots are found in the topic 46 negative-sentiment map also validates our approach, as one would expect only hotspots in topics pertaining to anti-vaccine discussions.

This proof of concept showed that social media contain valuable information that is more granular and available more cheaply and quickly than through traditional survey methods. With further refinement, this information may be leveraged to replicate and compare with survey results. In addition, such hot spot information is immediately actionable from a public health perspective, a valuable quality in the context of vaccine hesitancy. For example, one may target messages towards public policy think tanks in Arkansas to foster a more balanced approach to the debate about government mandates on vaccination. Identifying such geolocated issues is valuable to public health officials because it provides low-hanging fruit to address if interventions are known. For example, officials might value being able to reach all of Arkansas in a messaging campaign by messaging only Little Rock (if that were the only hotspot). The other side of the coin is also valuable, however, as evidence-based interventions may not yet exist. Officials may have been unaware of a specific geographic area and its opinions on a sub-issue of vaccine hesitancy, as hesitancy itself has been shown to vary across regions and within countries, without a successful strategy to address it11.

Thus the system’s analysis of its real-time sentiment-topic data allowed us to identify individual discussions within the aggregate meta-topic, suggested the ability to verify survey hypotheses relating to those discussions, and suggested spatial targets for more effective use of public health resources. Given expertise in both content and data analysis to fully understand and leverage the social media data, the system provides a promising opportunity to monitor views in real time.

Discussion – Limitations

However, this system and its underlying approach may be improved. First, the ability of Carmen10 to augment location information could be increased so that it identifies locations at a more granular level in more messages. This would affect the geospatial hotspot analysis, as more and better location information would allow results to be grouped by level of granularity. Second, this proof-of-concept topic analysis returned 50 topics; a sensitivity analysis on this number is warranted, as increasing or decreasing it could reduce noise. Third, the open debate in social media analysis applies here as well: whether social media discussions are a valid and accurate proxy for the rationales of the population at large. This applies both in terms of users' demographics (see below) and in terms of the potential for fake users, whom recent research may help to filter4. Fourth, one should be cognizant of the (limited but nonzero) amount of technical supervision required: the system requires computational capacity and server administration, and it requires creating machine learning classifiers9.

Discussion – Future Work

An additional limitation is that the interpretation of topics in LDA is subjective; alternative models, and means of interpretation associated with them, could be employed. Paul and Dredze created an elegant framework for supervised topic models17,23 that could be adapted to our system and would return topics seeded by specific a priori values (i.e., those in Quinn's survey results). Such seeded topics would remove the subjectivity of topic interpretation by quantifiably associating topics with pre-determined results. Secondly, LDA is merely a long-running industry standard; an alternative is Linguistic Inquiry and Word Count (LIWC)33. In contrast to LDA, which returns words that co-occur, LIWC counts psychologically relevant words by category, producing output along dimensions such as “negative emotion words” or “tentative language”. These categories and their relative frequencies paint a picture of how the author(s) regard their subject matter, in this case discussions about vaccines. Using LIWC would provide an alternative viewpoint that may be more easily interpreted within the framework of psychology.

Another aim of future work involves more explicit relations to traditional survey methods. One immediate improvement would be to aggregate tweets by user, which would enable user demographic classification7,30 and other user-level statistics, such as comparisons to known outbreaks of disease or to news coverage. With this information, and with analysis related to retweets and news mentions, one might operationalize survey questions to individuals as different slices of our dataset. This would allow, for example, exploring and validating whether demographics are related to one's rationales and opinions, especially opinions relating to trust in government, as previous work has suggested25,26.

Aggregating information by user would also allow the system to address the question of whether social media may be used as a proxy for the population at large, both in terms of demographics and in terms of coverage of topic discussion. The representativeness of social media users is an open question, whether relating to pro- or anti-vaccination communities or to the population as a whole. The analyses in this paper, combined with demographic classification, would allow us to determine how representative our social media users are of our target population(s).
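The dictionary-based counting that LIWC performs, raised above as an alternative to LDA, can be sketched in a few lines. This is an illustration only: the two category names follow the dimensions quoted above, but the word lists are invented for this example and are not LIWC's own dictionary, and `category_frequencies` is a hypothetical helper.

```python
import re
from collections import Counter

# Hypothetical miniature dictionary. The word lists are invented for
# illustration and are not taken from LIWC.
CATEGORIES = {
    "negative_emotion": {"afraid", "angry", "harm", "worried"},
    "tentative": {"maybe", "might", "perhaps", "possibly"},
}


def category_frequencies(text):
    """Return each category's share of the tokens in `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for token in tokens:
        for category, words in CATEGORIES.items():
            if token in words:
                counts[category] += 1
    total = len(tokens)
    return {
        category: (counts[category] / total if total else 0.0)
        for category in CATEGORIES
    }


if __name__ == "__main__":
    print(category_frequencies("Maybe vaccines might harm kids"))
```

Unlike LDA, the categories here are fixed in advance, so no post hoc interpretation of topics is needed; the trade-off is that only words already in the dictionary are counted.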

Conclusions

Given the problem of tracking and understanding discussion in a population, and the context of vaccine hesitancy, we have as a first step created a pipeline of natural language processing and geospatial techniques that enables real-time statistical analysis of different discussions in a population across space. This system showed statistically significant spatial hotspots of discussion in the USA that provide actionable insights for this time-sensitive context. Given the financial and computational ease of gathering and processing swaths of social media data, the system can be used to monitor views in real time and, being easily extensible, suggests the ability to verify findings from traditional survey methods in broader spatial contexts.

Acknowledgements

Thank you to Amelia Jamison for her helpful feedback and topic analysis.

Dr. Dredze has received consulting fees from Directing Medicine LLC and Sickweather LLC, who use social media for public health surveillance.

References

1. Blei, Ng, Jordan. Latent Dirichlet Allocation. J Mach Learn Res. 2003 Mar;3:993-1022.

2. Broniatowski, Paul, Dredze. National and Local Influenza Surveillance through Twitter: An Analysis of the 2012-2013 Influenza Epidemic. PLOS ONE. 2013 Dec 9;8(12):e83672.

3. Chang, Boyd-Graber, Gerrish, Wang, Blei. Reading tea leaves: How humans interpret topic models. In: Nips [Internet]. 2009 [cited 2017 Mar 8]. p. 1-9. Available from: https://papers.nips.cc/paper/3700-reading-tea-leaves-how-humans-interpret-topic-models.pdf

4. Cheng, Danescu-Niculescu-Mizil, Leskovec J. Antisocial Behavior in Online Discussion Communities. arXiv:1504.00680 [cs, stat] [Internet]. 2015 Apr 2 [cited 2016 Dec 8]; Available from: http://arxiv.org/abs/1504.00680

5. Chuang, Manning, Heer. Termite: Visualization Techniques for Assessing Textual Topic Models. In: Proceedings of the International Working Conference on Advanced Visual Interfaces [Internet]. New York, NY, USA: ACM; 2012 [cited 2017 Mar 8]. p. 74-77. (AVI '12). Available from: http://doi.acm.org/10.1145/2254556.2254572

6. Conover, Goncalves, Ratkiewicz, Flammini, Menczer. Predicting the Political Alignment of Twitter Users. In: 2011 IEEE Third International Conference on Privacy, Security, Risk and Trust (PASSAT) and 2011 IEEE Third International Conference on Social Computing (SocialCom). 2011. p. 192-9.

7. Culotta, Ravi NK, Cutler J. Predicting Twitter User Demographics using Distant Supervision from Website Traffic Data. Journal of Artificial Intelligence Research. 2016;55:389-408.

8. Culotta. Towards Detecting Influenza Epidemics by Analyzing Twitter Messages. In: Proceedings of the First Workshop on Social Media Analytics [Internet]. New York, NY, USA: ACM; 2010 [cited 2016 Mar 10]. p. 115-122. (SOMA '10). Available from: http://doi.acm.org/10.1145/1964858.1964874

9. Dredze, Broniatowski, Smith, Hilyard. Understanding Vaccine Refusal: Why We Need Social Media Now. American Journal of Preventive Medicine. 2016 Apr;50(4):550-2.

10. Dredze, Paul, Bergsma, Tran. Carmen: A Twitter geolocation system with applications to public health. In: AAAI Workshop on Expanding the Boundaries of Health Informatics Using AI (HIAI). Citeseer; 2013. p. 20-24.

11. Dubé, Gagnon, MacDonald NE. Strategies intended to address vaccine hesitancy: Review of published reviews. Vaccine. 2015 Aug 14;33(34):4191-203.

, MacDonald NE. Strategies intended to address vaccine hesitancy: Review of published reviews . Vaccine. 2015 Aug 14 ; 33 ( 34 ): 4191 - 203 .