Monitoring Real-time Spatial Public Health Discussions in the Context of Vaccine Hesitancy

Michael C Smith, M.S.E., The George Washington University, Washington DC, USA

Mark Dredze, Ph.D., The Johns Hopkins University

Sandra Crouse Quinn, Ph.D., The University of Maryland, College Park, Maryland, USA

David A Broniatowski, Ph.D., The George Washington University, Washington DC, USA

Social media provide the potential to keep up with public discussions more quickly, at lower cost, and at potentially higher granularity and scope than do traditional surveys 9 . This paper details a preliminary system of real-time geographical monitoring and analysis using the context of the vaccine-hesitancy discussion across the United States, a valuable backdrop for such a system because of the diverse and impactful nature of the vaccination discussions as they appear, change, and influence the public 12,20 . We combine various methods in machine learning to geolocate, categorize, and classify vaccination discussions on Twitter. As a proof of concept, we show analyses with a prominent anti-vaccine discussion that validate the system with results from traditional surveys, yet also provide valuable spatial statistical power on top of such surveys on maps of the United States. We detail limitations and future work, yet still conclude that the system and the answers it enables are important because they will allow for more targeted and effective communication and reaction to the discussion as a first step towards monitoring people's views.

Introduction

Achieving herd immunity is a critical element of a vaccine's global effectiveness because it is vital to limiting transmission and reducing incidence. In 2007, most vaccine-preventable diseases were at an all-time low incidence 28 , but in recent years the spread of vaccine-preventable diseases has increased. For example, there were outbreaks of measles in Disneyland in 2014 15 , of pertussis in 2014 34 , and others. This urgent rise in such diseases has highlighted recent increases in vaccine hesitancy as prominent and controversial issues seen as potential threats to herd immunity 20 .

Reasons for this vaccine hesitancy, defined by the WHO as "refer[ring] to delay in acceptance or refusal of vaccines despite availability of vaccination services...complex and context specific varying across time, place and vaccines...includ[ing] factors such as complacency, convenience and confidence" 16 , are varied. Not only do myriad drivers of these decisions vary spatially 12 , but no "strategies intended to address vaccine hesitancy" were universally effective, if effective at all 11 . Because of the potential impact of these decisions, the public health community needs to track and understand these rationales as they appear, change, and influence the public. This paper details a preliminary system of real-time geographical monitoring and analysis in the context of the vaccine hesitancy discussion across the United States. The system and the answers it enables, with accompanying theoretical advantages of using social media and survey data together, are important because they will allow a first step towards more targeted and effective communication and reaction to widespread discussion, critical to reducing such hesitancy.

Literature Review

This paper aims to produce an efficient, broad monitoring system for the discussions that exist in the United States about vaccine hesitancy. We detail the reasons for the system's potential for improvement over survey methods, and pertinent survey research as an option for validating the system.

Exploring new factors relevant to people's vaccination decisions across the nation would be slow and expensive using traditional survey methods; regardless of the methods chosen, it would be difficult to keep up with new outbreaks of hesitancy or disease. Enter social media, "which provide unprecedented, realtime access to the attitudes, beliefs, and behaviors of people from across demographic groups" 9 . Social media are already being used to effectively, quickly, and cheaply track health issues such as disease incidence 2,8,22,29 . Not only do social media successfully track disease, but studies have shown that they may also track public opinion related to medical and health issues 19,32 . Vaccination decisions are opinions of sections of the public; therefore, social media can and should be used to track vaccine hesitancy 9 . This work aims to spatially track discussion about vaccination using Twitter messages as a starting point towards capturing and validating opinions; to our knowledge no system exists with such spatial granularity and capability in this context.

To validate the system one can look at ground truth of how people make decisions in similar contexts; Quinn and colleagues have produced a body of work with such aims 13,24,25,27 . Overall, they showed via surveys that such factors as public trust, demographics, risk perception, and social norms influence vaccine decision making 25,27 . For example, in a qualitative study, Quinn et al. showed that dimensions of public trust affect medical decisions in a study about postal workers' reactions during the 2001 anthrax attacks 27 . These "attitudinal and experience variables [and] demographic characteristics" 13 provide insight into how rationales about vaccination decisions may vary. They provide a means of validating the system, a starting point for exploring the spatial and sociodemographic variability in vaccination decisions, and an opportunity to confirm that hypotheses established in limited survey environments hold in wider contexts. For example, is there a significant pocket of people in a certain area who are hesitant to vaccinate because they do not trust the government?

The monitoring system able to broadly, cheaply, and quickly test such survey and spatial hypotheses in this context is the novel contribution of this work. Specifically, it is A) a processing system for classifying messages and their sentiment that B) integrates existing analyses for topic and location and C) provides an extensible framework for statistically testing spatial hypotheses about vaccine hesitancy given the generated messages and metadata on social media. This system involves using targeted methodologies and leverages theoretical advantages from using social media data in concert with survey data. How might such a system shed light on vaccine-hesitancy discussions across the USA, what are its limitations, and how could it be used as a first stepping stone to augment survey methods?

Methods

Given our context of vaccination discussion, our approach to the system is the following combination of natural language processing and geospatial techniques: we collect and classify social media posts on Twitter related to vaccination; then categorize these posts by their sentiment, location and topic; then interpret the topics related to vaccine refusal and hesitancy; then spatially join and aggregate the results. This process enables evaluation of our survey hypotheses using the spatial topic clusters, as well as spatial examination of new discussions as it can be re-run over time. What follows are descriptions of each of the system's sub-processes.

Methods -Data

The data related to vaccines for our context, also described briefly by Dredze et al. 9 , are Twitter posts (tweets) from the USA that we began collecting around the aforementioned measles outbreak in Disneyland. The system collects the data and classifies for relevance and sentiment. Initially we flagged tweets by keyword using the Twitter Streaming API i , specifying more than fifty hand-chosen keywords such as 'vaccine', 'shot', and 'immunization' similar to and validated by common practice 2,6,31 .

Table 1: Keywords used to filter Twitter data

vaccine, vaccines, shot, mmr, tdap, flushot, hpv, polio, rotavirus, chickenpox, smallpox, hepatitis, hepa, hepb, dtap, meningitis, shingles, vaccinate, vaccinated, vaccine, vaccines, vacine, vacines, tetanus, diptheria, pertussis, whoopingcough, dtp, dtwp, chickenpox, measles, mumps, rubella, varicella, diphtheria, haemophilus, papillomavirus, meningococcal, pneumococcal, rabies, tuberculosis, typhoid, yellowfever, immunizations, immunization, imunization, immune, imune, cholera, globulin, encephalitis, lyme

The Twitter Streaming API returns all tweets that match a given search (or a random 1% sample if a size threshold is exceeded); we obtained all matching tweets in the US during this time period, on the order of millions of tweets. See Dredze et al. 9 for further details on these data.
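As a rough illustration of the keyword flagging described above, the sketch below matches tweet text against a small subset of the Table 1 keywords. In the system itself the matching is performed by the Streaming API; the tokenizer and the function names here are assumptions made only for this example.

```python
import re

# A subset of the Table 1 keywords, chosen only for illustration.
KEYWORDS = {"vaccine", "vaccines", "shot", "mmr", "measles", "immunization"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matches_keywords(text, keywords=KEYWORDS):
    """Return True if any token of the tweet text is one of the keywords."""
    return any(token in keywords for token in tokenize(text))
```

Whole-token matching like this flags many off-topic tweets (for example, 'shot' in a sports context), which is one reason a relevance classifier follows the keyword step.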

Methods -Relevance and Sentiment

We trained supervised machine learning algorithms that, as part of the system, automatically classify tweets for relevance to vaccination and for sentiment, as sentiment analysis produces a measure of the expressed opinion in messages 21 . We obtained labeled training data using Amazon Mechanical Turk ii on randomly chosen subsets of our tweets to 1) tag them as being relevant to the topic of vaccines or not; 2) of those relevant, randomly choose and tag as having sentiment toward vaccines (neutral or non-neutral); 3) and of those that bear sentiment, randomly choose and tag as having positive or negative sentiment. While training the classifiers we conducted cross-validation and maximized precision and recall given tunable parameters. iii See Dredze et al., for further details on these classifiers. 9 Given these classifiers and the ability to run them over any tweets (our dataset and those incoming in real-time), we thus have the first part of the system, namely a real-time categorization of vaccine-related Twitter posts: those relevant to vaccines; of those relevant, those that bear sentiment; and of those that bear sentiment, their sentiment polarity.
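The three-stage categorization can be pictured as a cascade in which each classifier runs only on tweets that passed the previous stage. The sketch below is a minimal illustration under that assumption; the three `stub_*` functions are hypothetical keyword rules that stand in for the trained classifiers, which are not reproduced here.

```python
def classify_tweet(text, is_relevant, has_sentiment, is_negative):
    """Run the relevance -> sentiment -> polarity cascade on one tweet."""
    result = {"relevant": False, "has_sentiment": None, "polarity": None}
    if not is_relevant(text):
        return result  # irrelevant tweets skip the later stages
    result["relevant"] = True
    result["has_sentiment"] = bool(has_sentiment(text))
    if result["has_sentiment"]:
        result["polarity"] = "negative" if is_negative(text) else "positive"
    return result


# Hypothetical stand-ins for the trained classifiers (illustration only).
def stub_is_relevant(text):
    return "vaccine" in text.lower()


def stub_has_sentiment(text):
    return any(w in text.lower() for w in ("love", "hate", "dangerous", "safe"))


def stub_is_negative(text):
    return any(w in text.lower() for w in ("hate", "dangerous"))
```

Because each stage only sees the output of the one before it, errors compound down the cascade, which is why the precision of the earlier stages matters for the polarity results.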

Methods -Location Classification

The next part of the system involves location classification. We use the Carmen Geolocation Toolkit 10 to automatically classify a tweet's location; such geolocation has been shown to be appropriate and effective in other public health studies 10 . Carmen improves upon information provided by the Streaming API, and returns location information at the country, state, county, and latitude/longitude levels if able.

Methods -Topic Classification

The system uses topic modeling to determine the content discussed in our relevant tweets. Latent Dirichlet Allocation (LDA) is a commonly used machine learning algorithm that automatically determines topics in collections of text 1 ; it is common practice for extracting patterns and groups from text collections, of which social media data are a prime example. LDA assumes words in documents co-locate near other words (possibly across documents) because they are related, and the algorithm collects and reports groups of such related words, with the groups representing topics. Using the MAchine Learning for LanguagE Toolkit (MALLET) 18 , the system involves running LDA over our tweet dataset (the documents labeled as relevant to vaccination) to evaluate topics relevant to vaccine hesitancy. This produces an overall list of topics, and a parameterization of each tweet by topic (which is roughly proportional to relative composition by topic). Note that LDA is unsupervised; in general there is no guarantee that the algorithm will return a specific topic, and it is up to the analyst to determine topics' relevance and substance by analyzing the words and groups returned 3 . We leverage public health researchers' domain expertise to make such determinations. We note that in our context, LDA will show relevant topics because we have collected and categorized the tweets to fit a specific meta-topic (that of vaccines). By contrast, the substance of the relevant topics will be outputs of the system enabling hypothesis testing of our ground truth factors and exploration beyond.
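To make the two LDA outputs concrete (a word list per topic and a topic proportion vector per document), the following is a minimal collapsed Gibbs sampler written for illustration. It is not MALLET and not the configuration used in the system; the hyperparameters `alpha` and `beta`, the iteration count, and the function name are assumptions.

```python
import random


def lda_gibbs(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA over tokenized documents."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]          # counts per document
    topic_word = [[0] * n_words for _ in range(num_topics)]  # counts per topic
    topic_total = [0] * num_topics
    assign = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        row = []
        for w in doc:
            k = rng.randrange(num_topics)
            row.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assign.append(row)
    # Resample each token's topic conditioned on all other assignments.
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    # Smoothed per-document topic proportions and top words per topic.
    theta = [
        [(doc_topic[d][t] + alpha) / (len(doc) + num_topics * alpha)
         for t in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    top_words = [
        [vocab[v] for v in sorted(range(n_words), key=lambda v: -topic_word[t][v])[:5]]
        for t in range(num_topics)
    ]
    return theta, top_words
```

The per-document rows of `theta` play the role of the per-tweet topic parameterization described above, and `top_words` is what an analyst would inspect when interpreting each topic.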

Methods -Joins and Aggregations

The system enables nonspatial aggregation and analysis on the tweets by sentiment and topic. More central to this paper, however: the tweets also have location data, which one may spatially join and aggregate using ArcMap (version 10.3), part of ArcGIS. ArcGIS is a geographic information system software that can generate maps of aggregated data and can calculate and display spatial statistics on those maps. One such statistic is the Getis-Ord Gi* statistic for hotspot analysis 14 , valuable to the system because it indicates statistically significant high (low) point data if a point and its neighbors are high (low) in terms of some common variable. iv Using these maps and statistics, one may spatially analyze where vaccine tweets (our point data) occur, where sentiment occurs, and where topics occur (our common variables), with notions of how often they occur and whether statistically significant differences exist. Accordingly, the system provides a geographic result to accompany our topic-substance result concerning the survey results.
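For readers unfamiliar with the Gi* statistic, the sketch below computes it with binary k-nearest-neighbor weights, using the standard z-score form of the statistic and counting each point as part of its own neighborhood. It is an illustration only: the system relies on ArcMap's implementation, and the function names and planar Euclidean distance are assumptions (it also assumes more than k + 1 points with non-constant values).

```python
import math


def knn_indices(points, i, k):
    """Indices of the k points nearest to point i (Euclidean), excluding i."""
    xi, yi = points[i]
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: (points[j][0] - xi) ** 2 + (points[j][1] - yi) ** 2)
    return others[:k]


def getis_ord_gi_star(points, values, k=8):
    """Gi* z-score per point, with binary weights on the k nearest neighbors."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum(v * v for v in values) / n - mean * mean)
    scores = []
    for i in range(n):
        neighborhood = knn_indices(points, i, k) + [i]  # Gi* includes the point itself
        w_sum = len(neighborhood)      # sum of binary weights
        w_sq_sum = len(neighborhood)   # sum of squared binary weights
        local_sum = sum(values[j] for j in neighborhood)
        numerator = local_sum - mean * w_sum
        denominator = std * math.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
        scores.append(numerator / denominator)
    return scores
```

A large positive score marks a point whose neighborhood total is higher than expected given the global mean (a hot spot), and a large negative score marks a cold spot; scores are compared against standard normal thresholds.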

Results

Running the topic model over all tweets in our dataset, we obtained information about topics and their distribution over our tweets. The system also produces classification results for each tweet in terms of its relevance, sentiment, and location. We may use a tweet's ID (e.g. "532385146419560448") to link its sentiment, location, and topic distribution. Given locations of relevant messages, we may filter by classification category and weight by topic distribution to find hotspots for a given discussion.
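The linking step can be sketched as a join on tweet ID across three lookup tables. The dictionary layout, the function name, and the example values below are hypothetical; only the example tweet ID comes from the text.

```python
def join_and_weight(sentiment, location, topics, topic_id, polarity="negative"):
    """Join per-tweet outputs by tweet ID and attach one topic's weight.

    sentiment: {tweet_id: "positive" | "negative"}
    location:  {tweet_id: (latitude, longitude)}
    topics:    {tweet_id: {topic_id: proportion}}
    """
    rows = []
    for tweet_id, label in sentiment.items():
        if label != polarity:
            continue  # keep only the requested classification category
        if tweet_id not in location or tweet_id not in topics:
            continue  # drop tweets that could not be geolocated or modeled
        lat, lon = location[tweet_id]
        weight = topics[tweet_id].get(topic_id, 0.0)
        rows.append((tweet_id, lat, lon, weight))
    return rows
```

Each returned row is a weighted point, which is the form of input a hot spot analysis expects.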

Results -Topics

iii Relevance classifier (recall .91, precision .96); if relevant, whether contains sentiment about vaccines (recall .28, precision .63); if contains sentiment, is it positive vs negative (recall .85, precision .75). We chose to maximize precision in the second case because we were relying on the precision of our results in the positive/negative classifier. Such low recall is not an issue given the size of our dataset.

iv The definition of 'neighbor' is variable; what is appropriate depends highly on the input data. Two of many possible options for our topic proportions and tweet data are k-nearest-neighbors (weighting influence such that all points have k neighbors) and weighting influence by inverse Euclidean distance. We chose the former due to ease of interpretation and calculation.

Specifically, the topic information consists of a relative weighting parameter for each topic for each tweet (roughly proportional to the proportion of each topic in the tweet), so one can get messages most representative of each topic. We ran the topic model on all messages, filtered by the regular expression *vacc* to prune irrelevant / noisy topics in advance, and qualitatively interpreted the topics. Needing to specify the number of topics, we chose 50 to capture enough variability in our large dataset. As a proof of concept, we considered topic 46 in our further analyses. Topic 46 pertains to the California government's bill eliminating exemptions from vaccinations in schoolchildren. Below are example messages from this topic; our domain experts who performed identification and validation looked at both the tokens in the topic and representative messages when doing so, as is good practice 5 .

• "california governor signs strict school vaccine legislation gov jerry brown signs california bill imposing..."

• "jim carrey brands governor 'fascist' over vaccine law jim carrey called california gov jerry brown"

• "ahf criticizes dumb amp dumber star jim carrey for calling gov brown a fascist"

• "calif gov jerry brown launching frosted mercury flakes children's cereal to accompany vaccine mandate"
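Representative messages such as those above can be retrieved by ranking messages on their weight for a topic. The sketch below assumes the *vacc* filter means "contains the substring vacc" and that topic weights are available as a per-message dictionary; both the interpretation and the function names are assumptions for illustration.

```python
import re

# Assumed reading of the *vacc* filter: keep messages containing "vacc".
VACC_PATTERN = re.compile(r"vacc", re.IGNORECASE)


def filter_messages(messages):
    """Keep only messages that mention a 'vacc' stem."""
    return [m for m in messages if VACC_PATTERN.search(m)]


def most_representative(weighted_messages, topic_id, n=3):
    """Return the n messages with the highest weight for topic_id.

    weighted_messages: list of (message, {topic_id: weight}) pairs.
    """
    ranked = sorted(
        weighted_messages,
        key=lambda pair: pair[1].get(topic_id, 0.0),
        reverse=True,
    )
    return [message for message, _ in ranked[:n]]
```

Reading the top-ranked messages alongside a topic's highest-probability tokens is the kind of check the domain experts performed when labeling topics.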

We chose this topic for two reasons: it is an arguably prominent anti-vaccination discussion in our data, and it is pertinent to a hypothesis validated by Quinn's previous work that "public trust / trust in government" affects such attitudes about medical decisions as vaccination, a common thread for validation. The analysis steps are the same regardless of topic chosen.

Results -Hotspots for Topics

To identify hotbeds of these vaccine hesitancy discussions, we used the "Hot Spot Analysis" tool in ArcMap, which calculates the Gi* statistic. We continued the proof of concept by considering only the contiguous United States, but the analysis is identical using different geographical boundaries (e.g. an individual state or a different country). We also limited our hot spot analysis to the tweet messages classified as having negative sentiment about vaccines, since our chosen topic (topic 46) is an anti-vaccination discussion. As the definition of a neighborhood may vary depending on input data, we chose to spatially weight our input data via the k-nearest-neighbor (KNN) algorithm (using the default value of 8 neighbors suggested by ArcMap) to allow for such variations. This yielded the map in Figure 1. The hot-spot map of topic 46 shows statistically significant areas in the contiguous USA where the highest proportion of discussion of topic 46 is occurring in negative-sentiment vaccine messages on Twitter. For example, topic 46 is often discussed near LA and in the northern Appalachian region, among other areas. Such maps may be created for any permutation of classification and topic, and would yield any statistically significant results to be found among the spatial data for each permutation. Note that this statistic does not merely highlight points that contain a lot of messages, but highlights points with statistically significant differences of message totals compared to neighboring points. Such significant results would (and do in the case of topic 46) yield convergent findings with survey data. Future work will more rigorously relate and apply this mixed methods approach.

Discussion

The results outlined above yielded statistically significant geographic hot and cold spots in terms of individual topics in negative-sentiment vaccine messages on Twitter as a proof of concept. Such hotspots in a topic correspond to a discussion being statistically prevalent, and more prevalent in certain areas than others. That discussions pertinent to the trust in government results from Quinn's surveys (topic 46) are statistically significant in the first place both validates our approach and supports Quinn's findings on a larger scale. The fact that no significant cold spots are found among the topic 46 negative-sentiment map also validates our approach, as one would expect only hotspots in such topics pertaining to anti-vaccine discussions. This proof of concept showed that social media contains valuable information that is more granular and available more cheaply and quickly than through traditional survey methods. With further refinement, this information may be leveraged to replicate and compare with survey results.

In addition, such hot spot information is immediately actionable from a public health perspective, a valuable quality in the context of vaccine hesitancy. For example, one may target messages towards public policy think tanks in Arkansas to foster a more balanced approach to the debate about the government mandates on vaccination. Identification of such geolocated issues is valuable to public health officials as it provides low hanging fruit to address if interventions are known. For example, officials might value being able to reach all of Arkansas in a messaging campaign by only messaging Little Rock (if that were the only hotspot). The other side of the coin is also valuable, however, as evidence-based interventions may not yet exist. Officials may have been unaware of a specific geographic area and its opinions on a sub-issue of vaccine hesitancy, as hesitancy itself has been shown to vary across regions and within countries without a successful strategy. 11 Thus the system's analysis of its real-time sentiment-topic data allowed us to identify individual discussions from the aggregate meta-topic, suggested the ability to verify survey hypotheses relating to those discussions, and suggested spatial targets for more effective use of public health resources. With expertise in both content and data analysis to fully understand and leverage the social media data, the system provides a promising opportunity to monitor real-time views.

Discussion -Limitations

However, this system and its underlying approach may be improved. For example, the ability of Carmen 10 to augment location information could be increased such that it identifies information at a more granular level in more messages. This would affect the geospatial hotspot analysis, as one could improve results by grouping by levels of granularity with more and better location information. In addition, this proof of concept topic analysis returned 50 topics, but a sensitivity analysis on this number is warranted, as up- or downsizing could reduce noise. Thirdly, the open debate of social media analysis applies as well: whether social media discussions are a valid and accurate proxy for the rationales of the population at large. This applies both in terms of users' demographics (see below) and in terms of the potential for fake users, which recent research may be used to filter 4 . Fourthly, one should be cognizant of the (limited but nonzero) amount of technical supervision required: the system requires computational capacity and server administration, and it requires creating machine learning classifiers. 9

Discussion -Future Work

An additional limitation is that the topic models in LDA are subjective; there are alternative models and means of interpretation associated with them that could be employed. Paul and Dredze created an elegant framework for supervised topic models 17,23 , which could be adapted to our system, that would return topics seeded by specific a priori values (i.e., those in Quinn's survey results). Such seeded topics would remove subjectivity of topic interpretation, quantifiably associating topics with pre-determined results. Secondly, LDA is merely a long-running industry standard; an alternative is Linguistic Inquiry and Word Count (LIWC) 33 . In contrast to LDA which returns words that are co-located, LIWC counts psychologically relevant words into categories, producing output along dimensions such as "negative emotion words" or "tentative language". These categories and their relative frequencies paint a picture of how the word user(s) consider their subject matter, in this case discussions about vaccines. Using LIWC would provide an alternative viewpoint that may be more easily interpreted using the framework of psychology.

Another aim of future work involves more explicit relations to traditional survey methods. One immediate improvement would be to aggregate tweets by user, which will enable user demographic classification 7,30 and other user-level statistics such as comparisons to known outbreaks of disease or to news coverage. With this information, and analysis related to retweets and news mentions, one might operationalize survey questions to individuals as different slices of our dataset, which for example would allow exploring and validating if demographics are related to one's rationales and opinions, especially those opinions relating to trust in government, as previous work has suggested 25,26 . Aggregating information by user would also allow the system to further the question of whether social media may be used as a proxy for the population at large, both in terms of demographics and in terms of coverage of topic discussion. The representativeness of social media users is an open question, whether relating to pro- or anti-vaccination communities or to the population as a whole. The analyses in this paper combined with demographic classification would allow us to determine how representative our social media users are of our target population(s).

Conclusions

Given the problem of tracking and understanding discussion in a population and the context of vaccine hesitancy, we have as a first step created a pipeline of natural language processing and geospatial techniques that enable real-time statistical analysis of different discussions in a population across space. This system showed statistically significant spatial hotspots of discussion in the USA that provide actionable insights for the time-sensitive context. Given the financial and computational ease of gathering and processing swaths of social media data, this system can be used to monitor real-time views, and, easily extensible, suggests the ability to verify traditional survey methods in broader spatial contexts.

Figure 1: Hotspots of the proportion of discussion of Topic 46 in the contiguous USA

i https://dev.twitter.com/streaming/overview

ii https://www.mturk.com/mturk/welcome

Acknowledgements

Thank you to Amelia Jamison for her helpful feedback and topic analysis.

Dr. Dredze has received consulting fees from Directing Medicine LLC and Sickweather LLC, who use social media for public health surveillance.

References

1. Blei DM, Ng AY, Jordan MI. Latent Dirichlet Allocation. J Mach Learn Res. 2003 Mar;3.

2. Broniatowski DA, Paul MJ, Dredze M. National and Local Influenza Surveillance through Twitter: An Analysis of the 2012-2013 Influenza Epidemic. PLOS ONE. 2013 Dec 9;8(12):e83672.

3. Chang J, Boyd-Graber JL, Gerrish S, Wang C, Blei DM. Reading tea leaves: How humans interpret topic models. NIPS. 2009 [cited 2017 Mar 8].

4. Cheng J, Danescu-Niculescu-Mizil C, Leskovec J. Antisocial Behavior in Online Discussion Communities. arXiv:1504.00680 [cs, stat]. 2015 Apr 2 [cited 2016 Dec 8].

5. Chuang J, Manning CD, Heer J. Termite: Visualization Techniques for Assessing Textual Topic Models. In: Proceedings of the International Working Conference on Advanced Visual Interfaces (AVI '12). New York, NY, USA: ACM; 2012 [cited 2017 Mar 8]. p. 74-77. doi:10.1145/2254556.2254572. http://doi.acm.org/10.1145/2254556.2254572

6. Conover MD, Goncalves B, Ratkiewicz J, Flammini A, Menczer F. Predicting the Political Alignment of Twitter Users. In: 2011 IEEE Third International Conference on Privacy, Security, Risk and Trust (PASSAT) and 2011 IEEE Third International Conference on Social Computing (SocialCom). 2011.

7. Culotta A, Ravi NK, Cutler J. Predicting Twitter User Demographics using Distant Supervision from Website Traffic Data. Journal of Artificial Intelligence Research. 2016;55.

8. Culotta A. Towards Detecting Influenza Epidemics by Analyzing Twitter Messages. In: Proceedings of the First Workshop on Social Media Analytics (SOMA '10). New York, NY, USA: ACM; 2010 [cited 2016 Mar 10]. p. 115-122. doi:10.1145/1964858.1964874. http://doi.acm.org/10.1145/1964858.1964874

9. Dredze M, Broniatowski DA, Smith MC, Hilyard KM. Understanding Vaccine Refusal: Why We Need Social Media Now. American Journal of Preventive Medicine. 2016 Apr;50(4).

10. Dredze M, Paul MJ, Bergsma S, Tran H. Carmen: A twitter geolocation system with applications to public health. In: AAAI Workshop on Expanding the Boundaries of Health Informatics Using AI (HIAI). Citeseer; 2013.

11. Dubé E, Gagnon D, MacDonald NE. Strategies intended to address vaccine hesitancy: Review of published reviews. Vaccine. 2015 Aug 14;33(34).

12. Dubé E, Gagnon D, Nickels E, Jeram S, Schuster M. Mapping vaccine hesitancy-Country-specific characteristics of a global phenomenon. Vaccine. 2014 Nov 20;32(49).

13. Freimuth VS, Musa D, Hilyard K, Quinn SC, Kim K. Trust during the early stages of the 2009 H1N1 pandemic. Journal of Health Communication. 2014;19(3).

14. Getis A, Ord JK. The Analysis of Spatial Association by Use of Distance Statistics. Geographical Analysis. 1992 Jul 1;24(3).

15. Halsey NA, Salmon DA. Measles at Disneyland, a Problem for All Ages. Ann Intern Med. 2015 May 5;162(9).

16. MacDonald NE. Vaccine hesitancy: Definition, scope and determinants. Vaccine. 2015 Aug 14;33(34).

17. McAuliffe JD, Blei DM. Supervised Topic Models. In: Platt JC, Koller D, Singer Y, Roweis ST, editors. Advances in Neural Information Processing Systems 20. Curran Associates, Inc.; 2008 [cited 2016 Mar 17]. p. 121-128. http://papers.nips.cc/paper/3328-supervised-topic-models.pdf

18. McCallum AK. MALLET: A Machine Learning for Language Toolkit. 2002.

19. Mollema L, Harmsen IA, Broekhuizen E, Clijnk R, De Melker H, Paulussen T. Disease Detection or Public Opinion Reflection? Content Analysis of Tweets, Other Social Media, and Online Newspapers During the Measles Outbreak in the Netherlands in 2013. J Med Internet Res. 2015 May 26 [cited 2016 Mar 4];17(5).

20. Omer SB, Salmon DA, Orenstein WA, deHart MP, Halsey N. Vaccine Refusal, Mandatory Immunization, and the Risks of Vaccine-Preventable Diseases. New England Journal of Medicine. 2009 May 7;360(19).

21. Pang B, Lee L. Opinion Mining and Sentiment Analysis. Found Trends Inf Retr. 2008 Jan;2(1-2).

22. Paul MJ, Dredze M, Broniatowski D. Twitter Improves Influenza Forecasting. PLoS Currents. 2014 [cited 2016 Mar 5].

23. Paul MJ, Dredze M. SPRITE: Generalizing Topic Models with Structured Priors. Transactions of the Association for Computational Linguistics. 2015 Jan 20;3.

24. Quinn SC, Hilyard K, Castaneda-Angarita N, Freimuth VS. Public acceptance of peramivir during the 2009 H1N1 influenza pandemic: implications for other drugs or vaccines under emergency use authorizations. Disaster Med Public Health Prep. 2015 Apr;9(2).

25. Quinn SC, Kumar S, Freimuth VS, Kidwell K, Musa D. Public willingness to take a vaccine or drug under Emergency Use Authorization during the 2009 H1N1 pandemic. Biosecurity and Bioterrorism: Biodefense Strategy, Practice, and Science. 2009;7(3).

26. Quinn SC, Parmer J, Freimuth VS, Hilyard KM, Musa D, Kim KH. Exploring communication, trust in government, and vaccination intention later in the 2009 H1N1 pandemic: results of a national survey. Biosecurity and Bioterrorism: Biodefense Strategy, Practice, and Science. 2013;11(2).

27. Quinn SC, Thomas T, Kumar S. The Anthrax Vaccine and Research: Reactions from Postal Workers and Public Health Professionals. Biosecur Bioterror. 2008 Dec;6(4).

28. Roush SW, Murphy TV, Vaccine-Preventable Disease Table Working Group. Historical comparisons of morbidity and mortality for vaccine-preventable diseases in the United States. JAMA. 2007 Nov 14;298(18).

29. Salathé M, Freifeld CC, Mekaru SR, Tomasulo AF, Brownstein JS. Influenza A (H7N9) and the Importance of Digital Epidemiology. New England Journal of Medicine. 2013 Aug 1;369(5).

30. Sap M, Park G, Eichstaedt JC, Kern ML, Stillwell DJ, Kosinski M. Developing Age and Gender Predictive Lexica over Social Media. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics; 2014 [cited 2016 Jun 8].

31. Signorini A, Segre AM, Polgreen PM. The Use of Twitter to Track Levels of Disease Activity and Public Concern in the U.S. during the Influenza A H1N1 Pandemic. PLOS ONE. 2011 May 4;6(5):e19467.

32. Smith MC, Broniatowski DA, Paul MJ, Dredze M. Towards Real-Time Measurement of Public Epidemic Awareness: Monitoring Influenza Awareness through Twitter. In: AAAI Spring Symposium on Observational Studies through Social Media and Other Human-Generated Content. Stanford, CA; 2016.

33. Tausczik YR, Pennebaker JW. The Psychological Meaning of Words: LIWC and Computerized Text Analysis Methods. Journal of Language and Social Psychology. 2010 Mar 1;29(1).

34. Winter K, Glaser C, Watt J, Harriman K. Pertussis Epidemic - California, 2014. 2014 [cited 2017 Mar 8].