<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Dubé E, Gagnon D, Nickels E, Jeram S, Schuster M. Mapping vaccine hesitancy-Country-specific characteristics of a global phenomenon. Vaccine.</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Monitoring Real-time Spatial Public Health Discussions in the Context of Vaccine Hesitancy</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Michael C. Smith</string-name>
          <degrees>M.S.E.</degrees>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mark Dredze</string-name>
          <degrees>Ph.D.</degrees>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sandra Crouse Quinn</string-name>
          <degrees>Ph.D.</degrees>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>David A. Broniatowski</string-name>
          <degrees>Ph.D.</degrees>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>The George Washington University</institution>
          ,
          <addr-line>Washington, DC</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>The Johns Hopkins University</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>The University of Maryland</institution>
          ,
          <addr-line>College Park, Maryland</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2014</year>
      </pub-date>
      <volume>32</volume>
      <issue>49</issue>
      <fpage>121</fpage>
      <lpage>128</lpage>
      <abstract>
        <p>Social media offer the potential to follow public discussions more quickly, at lower cost, and at potentially finer granularity and broader scope than traditional surveys<sup>9</sup>. This paper describes a preliminary system for real-time geographical monitoring and analysis, set in the context of the vaccine-hesitancy discussion across the United States. That context is a valuable backdrop for such a system because vaccination discussions are diverse and impactful as they appear, change, and influence the public<sup>12,20</sup>. We combine several machine learning methods to geolocate, categorize, and classify vaccination discussions on Twitter. As a proof of concept, we present analyses of a prominent anti-vaccine discussion that validate the system against results from traditional surveys and add spatial statistical power to such surveys through maps of the United States. We detail limitations and future work, and conclude that the system and the answers it enables are important because they allow more targeted and effective communication about, and reaction to, the discussion, as a first step towards monitoring people's views.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>To validate the system, one can look at ground truth about how people make decisions in similar contexts; Quinn and colleagues have produced a body of work with such aims<sup>13,24,25,27</sup>. Overall, they showed via surveys that factors such as public trust, demographics, risk perception, and social norms influence vaccine decision making<sup>25,27</sup>. For example, in a qualitative study of postal workers’ reactions during the 2001 anthrax attacks, Quinn et al. showed that dimensions of public trust affect medical decisions<sup>27</sup>. These “attitudinal and experience variables [and] demographic characteristics”<sup>13</sup> provide insight into how rationales about vaccination decisions may vary. They provide a means of validating the system, a starting point for exploring the spatial and sociodemographic variability in vaccination decisions, and an opportunity to confirm that hypotheses established in limited survey environments hold in wider contexts. For example, is there a significant pocket of people in a certain area who are hesitant to vaccinate because they do not trust the government?</p>
      <p>The novel contribution of this work is a monitoring system that can test such survey and spatial hypotheses broadly, cheaply, and quickly in this context. Specifically, it is A) a processing system for classifying messages and their sentiment that B) integrates existing analyses for topic and location and C) provides an extensible framework for statistically testing spatial hypotheses about vaccine hesitancy, given the messages and metadata generated on social media. The system uses targeted methodologies and leverages the theoretical advantages of using social media data in concert with survey data. How might such a system shed light on vaccine-hesitancy discussions across the USA, what are its limitations, and how could it serve as a first stepping stone to augment survey methods?</p>
    </sec>
    <sec id="sec-2">
      <title>Methods</title>
    </sec>
    <sec id="sec-3">
      <title>Methods – Data</title>
      <p>Given our context of vaccination discussion, our approach to the system is the following combination of natural language processing and geospatial techniques: we collect and classify social media posts on Twitter related to vaccination; then categorize these posts by their sentiment, location, and topic; then interpret the topics related to vaccine refusal and hesitancy; then spatially join and aggregate the results. This process enables evaluation of our survey hypotheses using the spatial topic clusters and, because it can be re-run over time, spatial examination of new discussions. What follows are descriptions of each of the system's sub-processes.</p>
      <p>The vaccine-related data for our context, also described briefly by Dredze et al.<sup>9</sup>, are Twitter posts (tweets) from the USA that we began collecting around the time of the measles outbreak at Disneyland. The system collects the data and classifies them for relevance and sentiment. Initially we flagged tweets by keyword using the Twitter Streaming API<sup>i</sup>, specifying more than fifty hand-chosen keywords such as 'vaccine', 'shot', and 'immunization', an approach similar to and validated by common practice<sup>2,6,31</sup>.</p>
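      <p>The keyword flagging step can be illustrated with a minimal sketch. It assumes only the three keywords quoted above (the full list of more than fifty is not reproduced here), and the pattern and the helper name <monospace>is_flagged</monospace> are hypothetical. It shows the matching logic only, not the connection to the Streaming API.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import re

# Illustrative subset only: the three keywords quoted in the text.
KEYWORDS = ["vaccine", "shot", "immunization"]

# Case-insensitive match on words that start with a keyword, so that
# "vaccines" and "shots" are flagged as well (an assumption of this sketch).
PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in KEYWORDS) + r")\w*",
    re.IGNORECASE,
)


def is_flagged(text):
    """Return True if the text contains a word beginning with any keyword."""
    return PATTERN.search(text) is not None


tweets = [
    "Got my flu shot today",
    "New vaccines bill signed in California",
    "Traffic is terrible this morning",
]
flagged = [t for t in tweets if is_flagged(t)]
```
]]></preformat>
      <p>Prefix matching of this kind deliberately over-collects (for example, 'shotgun' would also be flagged); in this sketch such noise is left for a later relevance classification step to remove.</p>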
    </sec>
    <sec id="sec-4">
      <title>Methods – Relevance and Sentiment</title>
      <p>We trained supervised machine learning algorithms that, as part of the system, automatically classify tweets for relevance to vaccination and for sentiment; sentiment analysis produces a measure of the opinion expressed in messages<sup>21</sup>. We obtained labeled training data using Amazon Mechanical Turk<sup>ii</sup> on randomly chosen subsets of our tweets, asking workers to 1) tag tweets as relevant to the topic of vaccines or not; 2) of those relevant, tag a random subset as having sentiment toward vaccines or not (non-neutral or neutral); and 3) of those that bear sentiment, tag a random subset as having positive or negative sentiment. While training the classifiers we conducted cross-validation and maximized precision and recall given tunable parameters.<sup>iii</sup> See Dredze et al. for further details on these classifiers.<sup>9</sup> Given these classifiers and the ability to run them over any tweets (our dataset and those arriving in real time), we have the first part of the system, namely a real-time categorization of vaccine-related Twitter posts: those relevant to vaccines; of those relevant, those that bear sentiment; and of those that bear sentiment, their sentiment polarity.<fn id="fn1"><label>i</label><p>https://dev.twitter.com/streaming/overview</p></fn><fn id="fn2"><label>ii</label><p>https://www.mturk.com/mturk/welcome</p></fn></p>
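      <p>The three-stage categorization can be sketched as a simple cascade. The three classifier arguments below stand in for trained models; the toy keyword rules used to exercise the cascade are placeholders invented for this illustration, not the classifiers described by Dredze et al. A small helper for precision and recall is included because those are the quantities tuned during training.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def categorize(tweet, is_relevant, has_sentiment, is_positive):
    """Run the three-stage cascade and return one label for the tweet."""
    if not is_relevant(tweet):
        return "irrelevant"
    if not has_sentiment(tweet):
        return "neutral"
    return "positive" if is_positive(tweet) else "negative"


def precision_recall(gold, predicted):
    """Precision and recall of binary predictions against gold labels."""
    true_pos = sum(1 for g, p in zip(gold, predicted) if g and p)
    n_predicted = sum(1 for p in predicted if p)
    n_gold = sum(1 for g in gold if g)
    precision = true_pos / n_predicted if n_predicted else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    return precision, recall


# Toy stand-ins for trained classifiers (hypothetical keyword rules).
def toy_relevant(t):
    return "vaccine" in t.lower()


def toy_sentiment(t):
    return any(w in t.lower() for w in ("love", "hate", "dangerous", "safe"))


def toy_positive(t):
    return any(w in t.lower() for w in ("love", "safe"))


labels = [
    categorize(t, toy_relevant, toy_sentiment, toy_positive)
    for t in [
        "Vaccines are safe",
        "I hate this vaccine mandate",
        "Vaccine clinic opens at 9am",
        "Nice weather today",
    ]
]

# A high-precision, low-recall classifier: one confident hit out of three.
precision, recall = precision_recall([1, 1, 1, 0], [1, 0, 0, 0])
```
]]></preformat>
      <p>Because each stage only sees tweets that passed the previous one, errors in an early stage propagate downstream, which is one reason a stage might be tuned for precision at the expense of recall.</p>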
    </sec>
    <sec id="sec-5">
      <title>Methods – Location Classification</title>
      <p>The next part of the system involves location classification. We use the Carmen Geolocation Toolkit<sup>10</sup> to automatically classify a tweet's location; such geolocation has been shown to be appropriate and effective in other public health studies<sup>10</sup>. Carmen improves upon the information provided by the Streaming API and returns location information at the country, state, county, and latitude/longitude levels where available.</p>
    </sec>
    <sec id="sec-6">
      <title>Methods – Topic Classification</title>
      <p>The system uses topic modeling to determine the content discussed in our relevant tweets. Latent Dirichlet Allocation (LDA) is a commonly used machine learning algorithm that automatically determines topics in collections of text<sup>1</sup>; it is common practice to use it to extract patterns and groups from text collections, of which social media data are a prime example. LDA assumes that words co-occur with other words in documents (and across documents) because they are related; the algorithm collects and reports groups of such related words, with each group representing a topic. Using the MAchine Learning for LanguagE Toolkit (MALLET)<sup>18</sup>, the system runs LDA over our tweet dataset (the documents labeled as relevant to vaccination) to evaluate topics relevant to vaccine hesitancy. This produces an overall list of topics and a parameterization of each tweet by topic (roughly proportional to the tweet's relative composition by topic). Note that LDA is unsupervised: in general there is no guarantee that the algorithm will return a specific topic, and it is up to the analyst to determine the topics' relevance and substance by analyzing the words and groups returned<sup>3</sup>. We leverage public health researchers' domain expertise to make such determinations. In our context, LDA will return relevant topics because we have collected and categorized the tweets to fit a specific meta-topic (that of vaccines). By contrast, the substance of the relevant topics is an output of the system, enabling hypothesis testing of our ground-truth factors and exploration beyond them.</p>
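      <p>To make the two outputs of LDA concrete (groups of related words, and a per-document weighting over topics), the following is a toy collapsed Gibbs sampler. It is a sketch for illustration only and is not MALLET's implementation; the function name, the hyperparameter values, and the four tiny example documents are all assumptions.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import random
from collections import Counter


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA over tokenized documents."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    # z[d][i] is the topic currently assigned to token i of document d.
    z = [[rng.randrange(n_topics) for _ in doc] for doc in docs]
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [Counter() for _ in range(n_topics)]
    topic_total = [0] * n_topics
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove this token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    # Smoothed per-document topic proportions; each row sums to 1.
    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic)
    ]
    # Up to three most frequent words currently assigned to each topic.
    top_words = [
        [w for w, c in counts.most_common(3) if c > 0] for counts in topic_word
    ]
    return theta, top_words


docs = [
    "governor signs school vaccine bill".split(),
    "governor vaccine bill mandate school".split(),
    "flu shot clinic open today".split(),
    "flu shot clinic pharmacy today".split(),
]
theta, top_words = lda_gibbs(docs, n_topics=2)
```
]]></preformat>
      <p>Here <monospace>theta</monospace> plays the role of the per-tweet parameterization by topic, and <monospace>top_words</monospace> is what an analyst would inspect to judge a topic's substance. Which topic index ends up holding which word group is arbitrary, which is why interpretation is left to the analyst.</p>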
    </sec>
    <sec id="sec-7">
      <title>Methods – Joins and Aggregations</title>
      <p>The system enables nonspatial aggregation and analysis of the tweets by sentiment and topic. More central to this paper, however, the tweets also have location data, which one may spatially join and aggregate using ArcMap (version 10.3), part of ArcGIS. ArcGIS is geographic information system software that can generate maps of aggregated data and can calculate and display spatial statistics on those maps. One such statistic is the Getis-Ord Gi* statistic for hotspot analysis<sup>14</sup>, valuable to the system because it flags a point as statistically significantly high (low) when the point and its neighbors are high (low) in terms of some common variable.<sup>iv</sup> Using these maps and statistics, one may spatially analyze where vaccine tweets (our point data) occur, where sentiment occurs, and where topics occur (our common variables), along with how often they occur and whether statistically significant differences exist. Accordingly, the system provides a geographic result to accompany our topic-substance result when comparing against the survey results.</p>
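      <p>For readers without access to ArcMap, the Gi* statistic can be sketched directly. The version below uses the standard z-score form of Gi* and assumes binary weights over each point and its k nearest neighbors by planar Euclidean distance; ArcMap's exact weighting, distance handling, and tie-breaking may differ. The function name and the 5-by-5 example grid are invented for illustration.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math


def getis_ord_gi_star(points, values, k):
    """Return one Gi* z-score per point using binary k-nearest-neighbor weights.

    Requires more than k + 1 points and values that are not all identical.
    """
    n = len(points)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean * mean)
    scores = []
    for i, (xi, yi) in enumerate(points):
        # Indices of all other points, nearest first (planar distance).
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.hypot(points[j][0] - xi, points[j][1] - yi),
        )
        # Gi* (unlike Gi) includes the focal point in its own neighborhood.
        neighborhood = [i] + others[:k]
        # With binary weights, sum(w) and sum(w^2) both equal the count.
        w_sum = len(neighborhood)
        local_sum = sum(values[j] for j in neighborhood)
        numerator = local_sum - mean * w_sum
        denominator = s * math.sqrt((n * w_sum - w_sum ** 2) / (n - 1))
        scores.append(numerator / denominator)
    return scores


# A 5x5 grid with high values in one 2x2 corner and low values elsewhere.
points = [(x, y) for x in range(5) for y in range(5)]
values = [10.0 if x < 2 and y < 2 else 1.0 for x, y in points]
scores = getis_ord_gi_star(points, values, k=3)
```
]]></preformat>
      <p>In this example the corner of the high-valued block receives the largest z-score, while the opposite corner of the grid, surrounded only by low values, is not significant at the usual 1.96 threshold.</p>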
    </sec>
    <sec id="sec-8">
      <title>Results</title>
    </sec>
    <sec id="sec-9">
      <title>Results – Topics</title>
      <p>Running the topic model over all tweets in our dataset, we obtained information about topics and their distribution over our tweets. The system also produces classification results for each tweet in terms of its relevance, sentiment, and location. We may use a tweet's ID (e.g. “532385146419560448”) to link its sentiment, location, and topic distribution. Given the locations of relevant messages, we may filter by classification category and weight by topic distribution to find hotspots for discussion of a given topic.<fn id="fn3"><label>iii</label><p>Relevance classifier (recall .91, precision .96); if relevant, whether the tweet contains sentiment about vaccines (recall .28, precision .63); if it contains sentiment, whether it is positive or negative (recall .85, precision .75). We chose to maximize precision in the second case because we were relying on the precision of our results in the positive/negative classifier. Such low recall is not an issue given the size of our dataset.</p></fn><fn id="fn4"><label>iv</label><p>The definition of ‘neighbor’ is variable; what is appropriate depends highly on the input data. Two of the many possible options for our topic proportions and tweet data are k-nearest-neighbors (weighting influence such that all points have k neighbors) and weighting influence by inverse Euclidean distance. We chose the former for ease of interpretation and calculation.</p></fn></p>
      <p>Specifically, the topic information consists of a relative weighting parameter for each topic for each tweet (roughly proportional to the proportion of each topic in the tweet), so one can retrieve the messages most representative of each topic. We ran the topic model on all messages matching the wildcard pattern *vacc* (messages containing “vacc”), a filter applied to prune irrelevant or noisy topics in advance, and qualitatively interpreted the topics. Needing to specify the number of topics, we chose 50 to capture enough variability in our large dataset. As a proof of concept, we considered topic 46 in our further analyses. Topic 46 pertains to the California government's bill eliminating exemptions from vaccinations for schoolchildren. Below are example messages from this topic; our domain experts who performed identification and validation looked at both the tokens in the topic and representative messages when doing so, as is good practice<sup>5</sup>.</p>
      <list list-type="bullet">
        <list-item><p>“california governor signs strict school vaccine legislation gov jerry brown signs california bill imposing...”</p></list-item>
        <list-item><p>“jim carrey brands governor 'fascist' over vaccine law jim carrey called california gov jerry brown”</p></list-item>
        <list-item><p>“ahf criticizes dumb amp dumber star jim carrey for calling gov brown a fascist”</p></list-item>
        <list-item><p>“calif gov jerry brown launching frosted mercury flakes children's cereal to accompany vaccine mandate”</p></list-item>
      </list>
      <p>We chose this topic for two reasons: it is an arguably prominent anti-vaccination discussion in our data, and it is pertinent to a hypothesis validated by Quinn's previous work, namely that “public trust / trust in government” affects attitudes about medical decisions such as vaccination, which gives a common thread for validation. The analysis steps are the same regardless of the topic chosen.</p>
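      <p>Linking sentiment, location, and topic distribution by tweet ID, then filtering and weighting, can be sketched with plain dictionaries. The tweet IDs, coordinates, and weights below are made-up illustrative values, and <monospace>weighted_points</monospace> is a hypothetical helper, not part of the system described here.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def weighted_points(sentiment, location, topic_weights, topic, wanted):
    """Join the three per-tweet tables on tweet ID.

    Keeps tweets whose sentiment label equals `wanted` and that have a
    location, and returns (tweet_id, (lat, lon), weight) tuples sorted so
    that the messages most representative of `topic` come first.
    """
    rows = []
    for tweet_id, label in sentiment.items():
        if label != wanted or tweet_id not in location:
            continue
        weight = topic_weights.get(tweet_id, {}).get(topic, 0.0)
        rows.append((tweet_id, location[tweet_id], weight))
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Made-up example data keyed by hypothetical tweet IDs.
sentiment = {"t1": "negative", "t2": "positive", "t3": "negative", "t4": "negative"}
location = {"t1": (34.05, -118.24), "t2": (40.71, -74.01), "t3": (38.58, -121.49)}
topic_weights = {
    "t1": {46: 0.2, 3: 0.8},
    "t2": {46: 0.9, 3: 0.1},
    "t3": {46: 0.7, 3: 0.3},
    "t4": {46: 0.6, 3: 0.4},
}

points_for_topic_46 = weighted_points(
    sentiment, location, topic_weights, topic=46, wanted="negative"
)
```
]]></preformat>
      <p>The resulting list of located, weighted points is the kind of input a hot spot analysis would take: the coordinates give the point data and the topic weight gives the variable being tested.</p>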
    </sec>
    <sec id="sec-10">
      <title>Results – Hotspots for Topics</title>
      <p>To identify hotbeds of these vaccine hesitancy discussions, we used the “Hot Spot Analysis” tool in ArcMap, which calculates the Gi* statistic. We continued the proof of concept by considering only the contiguous United States, but the analysis is identical for different geographical boundaries (e.g. an individual state or a different country). Because our chosen topic, topic 46, is an anti-vaccine discussion, we also limited our hot spot analysis to the tweets classified as having negative sentiment about vaccines. As the definition of a neighborhood may vary depending on the input data, we chose to spatially weight our input data via the k-nearest-neighbor (KNN) algorithm (using the default value of 8 neighbors suggested by ArcMap) to allow for such variations. This yielded the following map.</p>
      <p>The hot-spot map of topic 46 shows statistically significant areas in the contiguous USA where the highest proportion of discussion of topic 46 occurs in negative-sentiment vaccine messages on Twitter. For example, topic 46 is often discussed near Los Angeles and in the northern Appalachian region, among other areas. Such maps may be created for any combination of classification and topic, and would show any statistically significant results to be found in the spatial data for each combination. Note that this statistic does not merely highlight points that contain many messages; it highlights points with statistically significant differences in message totals compared to neighboring points. Such significant results would (and, in the case of topic 46, do) yield findings convergent with survey data. Future work will more rigorously relate and apply this mixed-methods approach.</p>
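      <p>Turning a Gi* z-score into a hot, cold, or not-significant label can be sketched as follows. This assumes a two-sided test against a standard normal distribution at a 0.05 level with no correction for multiple testing; the function name is hypothetical, and ArcMap's own binning of results may differ.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math


def classify_hotspot(z, alpha=0.05):
    """Label a Gi* z-score as a hot spot, cold spot, or not significant."""
    # Two-sided p-value under a standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    if p_value >= alpha:
        return "not significant"
    return "hot spot" if z > 0 else "cold spot"


labels = [classify_hotspot(z) for z in (4.9, -0.93, -2.6)]
```
]]></preformat>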
    </sec>
    <sec id="sec-11">
      <title>Discussion</title>
      <p>As a proof of concept, the results outlined above yielded statistically significant geographic hot and cold spots for individual topics in negative-sentiment vaccine messages on Twitter. A hotspot in a topic corresponds to a discussion that is statistically prevalent, and more prevalent in certain areas than in others. That discussions pertinent to the trust-in-government results from Quinn's surveys (topic 46) are statistically significant in the first place both validates our approach and supports Quinn's findings on a larger scale. The fact that no significant cold spots appear on the topic 46 negative-sentiment map also validates our approach, as one would expect only hotspots in topics pertaining to anti-vaccine discussions. This proof of concept showed that social media contain valuable information that is more granular, and available more cheaply and quickly, than information from traditional survey methods. With further refinement, this information may be leveraged to replicate and compare with survey results.</p>
      <p>In addition, such hot spot information is immediately actionable from a public health perspective, a valuable quality in the context of vaccine hesitancy. For example, one may target messages towards public policy think tanks in Arkansas to foster a more balanced approach to the debate about government mandates on vaccination. Identifying such geolocated issues is valuable to public health officials because it provides low-hanging fruit to address when interventions are known. For example, officials might value being able to reach all of Arkansas in a messaging campaign by messaging only Little Rock (if that were the only hotspot). The other side of the coin is also valuable, however, as evidence-based interventions may not yet exist. Officials may have been unaware of a specific geographic area and its opinions on a sub-issue of vaccine hesitancy, as hesitancy itself has been shown to vary across regions and within countries without a successful strategy.<sup>11</sup></p>
      <p>Thus the system’s analysis of its real-time sentiment-topic data allowed us to identify individual discussions within the aggregate meta-topic, suggested the ability to verify survey hypotheses relating to those discussions, and suggested spatial targets for more effective use of public health resources. Given expertise in both content and data analysis to fully understand and leverage the social media data, the system provides a promising opportunity to monitor real-time views.</p>
    </sec>
    <sec id="sec-12">
      <title>Discussion – Limitations</title>
      <p>However, this system and its underlying approach may be improved. First, the ability of Carmen<sup>10</sup> to augment location information could be increased so that it identifies locations at a more granular level in more messages. This would affect the geospatial hotspot analysis, as more and better location information would allow grouping by level of granularity. Second, this proof-of-concept topic analysis returned 50 topics; a sensitivity analysis on this number is warranted, as increasing or decreasing it could reduce noise. Third, the open debate about social media analysis applies here as well: whether social media discussions are a valid and accurate proxy for the rationales of the population at large. This applies both to users' demographics (see below) and to the potential for fake users, which recent research may help to filter<sup>4</sup>. Fourth, one should be cognizant of the (limited but nonzero) amount of technical supervision required: the system requires computational capacity and server administration, and it requires creating machine learning classifiers.<sup>9</sup></p>
    </sec>
    <sec id="sec-13">
      <title>Discussion – Future Work</title>
      <p>An additional limitation is that interpreting LDA topic models is subjective; alternative models and means of interpretation could be employed. First, Paul and Dredze created an elegant framework for supervised topic models<sup>17,23</sup> that could be adapted to our system and would return topics seeded by specific a priori values (i.e., those in Quinn's survey results). Such seeded topics would remove the subjectivity of topic interpretation by quantifiably associating topics with pre-determined results. Second, LDA is merely a long-running industry standard; an alternative is Linguistic Inquiry and Word Count (LIWC)<sup>33</sup>. In contrast to LDA, which returns words that co-occur, LIWC counts psychologically relevant words by category, producing output along dimensions such as “negative emotion words” or “tentative language”. These categories and their relative frequencies paint a picture of how the writers regard their subject matter, in this case discussions about vaccines. Using LIWC would provide an alternative viewpoint that may be more easily interpreted within the framework of psychology.</p>
      <p>Another aim of future work is to relate the system more explicitly to traditional survey methods. One immediate improvement would be to aggregate tweets by user, which would enable user demographic classification<sup>7,30</sup> and other user-level statistics, such as comparisons to known outbreaks of disease or to news coverage. With this information, and with analysis of retweets and news mentions, one might operationalize survey questions to individuals as different slices of our dataset. This would allow, for example, exploring and validating whether demographics are related to one's rationales and opinions, especially opinions relating to trust in government, as previous work has suggested<sup>25,26</sup>. Aggregating information by user would also allow the system to address the question of whether social media may be used as a proxy for the population at large, both in terms of demographics and in terms of coverage of topic discussion. The representativeness of social media users is an open question, whether relating to pro- or anti-vaccination communities or to the population as a whole. The analyses in this paper, combined with demographic classification, would allow us to determine how representative our social media users are of our target population(s).</p>
    </sec>
    <sec id="sec-14">
      <title>Conclusions</title>
      <p>Given the problem of tracking and understanding discussion in a population and the context of vaccine hesitancy, we have as a first step created a pipeline of natural language processing and geospatial techniques that enables real-time statistical analysis of different discussions in a population across space. This system showed statistically significant spatial hotspots of discussion in the USA that provide actionable insights for the time-sensitive context. Given the financial and computational ease of gathering and processing swaths of social media data, this system can be used to monitor real-time views. Because it is easily extensible, it also suggests the ability to verify traditional survey methods in broader spatial contexts.</p>
    </sec>
    <sec id="sec-15">
      <title>Acknowledgements</title>
      <p>Thank you to Amelia Jamison for her helpful feedback and topic analysis.</p>
      <p>Dr. Dredze has received consulting fees from Directing Medicine LLC and Sickweather LLC, who use social media
for public health surveillance.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Blei</surname>
            <given-names>DM</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ng</surname>
            <given-names>AY</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jordan</surname>
            <given-names>MI</given-names>
          </string-name>
          .
          <article-title>Latent Dirichlet Allocation</article-title>
          .
          <source>J Mach Learn Res</source>
          .
          <year>2003</year>
          Mar;
          <volume>3</volume>
          :
          <fpage>993</fpage>
          -
          <lpage>1022</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Broniatowski</surname>
            <given-names>DA</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paul</surname>
            <given-names>MJ</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dredze</surname>
            <given-names>M.</given-names>
          </string-name>
          <article-title>National and Local Influenza Surveillance through Twitter: An Analysis of the 2012-2013 Influenza Epidemic</article-title>
          .
          <source>PLOS ONE</source>
          .
          <year>2013</year>
          Dec 9;
          <volume>8</volume>
          (
          <issue>12</issue>
          ):
          <fpage>e83672</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Chang</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boyd-Graber</surname>
            <given-names>JL</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gerrish</surname>
            <given-names>S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            <given-names>C</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blei</surname>
            <given-names>DM</given-names>
          </string-name>
          .
          <article-title>Reading tea leaves: How humans interpret topic models</article-title>
          .
          In:
          <source>NIPS</source>
          [Internet].
          <year>2009</year>
          [cited 2017 Mar 8]. p.
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          . Available from: https://papers.nips.cc/paper/3700-reading-tea-leaves-how-humans-interpret-topic-models.pdf
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Cheng</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Danescu-Niculescu-Mizil</surname>
            <given-names>C</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leskovec</surname>
            <given-names>J</given-names>
          </string-name>
          .
          <article-title>Antisocial Behavior in Online Discussion Communities</article-title>
          .
          <source>arXiv:1504.00680 [cs, stat]</source>
          [Internet].
          <year>2015</year>
          Apr 2 [cited 2016 Dec 8]; Available from: http://arxiv.org/abs/1504.00680
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Chuang</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            <given-names>CD</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heer</surname>
            <given-names>J</given-names>
          </string-name>
          .
          <article-title>Termite: Visualization Techniques for Assessing Textual Topic Models</article-title>
          . In:
          <source>Proceedings of the International Working Conference on Advanced Visual Interfaces</source>
          [Internet]. New York, NY, USA: ACM;
          <year>2012</year>
          [cited 2017 Mar 8]. p.
          <fpage>74</fpage>
          -
          <lpage>77</lpage>
          . (AVI '12). Available from: http://doi.acm.org/10.1145/2254556.2254572
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Conover</surname>
            <given-names>MD</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goncalves</surname>
            <given-names>B</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ratkiewicz</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Flammini</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Menczer</surname>
            <given-names>F</given-names>
          </string-name>
          .
          <article-title>Predicting the Political Alignment of Twitter Users</article-title>
          .
          In:
          <source>2011 IEEE Third International Conference on Privacy, Security, Risk and Trust (PASSAT) and 2011 IEEE Third International Conference on Social Computing (SocialCom)</source>
          .
          <year>2011</year>
          . p.
          <fpage>192</fpage>
          -
          <lpage>9</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Culotta</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ravi</surname>
            <given-names>NK</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cutler</surname>
            <given-names>J</given-names>
          </string-name>
          .
          <article-title>Predicting Twitter User Demographics using Distant Supervision from Website Traffic Data</article-title>
          .
          <source>Journal of Artificial Intelligence Research</source>
          .
          <year>2016</year>
          ;
          <volume>55</volume>
          :
          <fpage>389</fpage>
          -
          <lpage>408</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Culotta</surname>
            <given-names>A</given-names>
          </string-name>
          .
          <article-title>Towards Detecting Influenza Epidemics by Analyzing Twitter Messages</article-title>
          .
          In:
          <source>Proceedings of the First Workshop on Social Media Analytics</source>
          [Internet]. New York, NY, USA: ACM;
          <year>2010</year>
          [cited 2016 Mar 10]. p.
          <fpage>115</fpage>
          -
          <lpage>122</lpage>
          . (SOMA '10). Available from: http://doi.acm.org/10.1145/1964858.1964874
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Dredze</surname>
            <given-names>M</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Broniatowski</surname>
            <given-names>DA</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smith</surname>
            <given-names>MC</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hilyard</surname>
            <given-names>KM</given-names>
          </string-name>
          .
          <article-title>Understanding Vaccine Refusal: Why We Need Social Media Now</article-title>
          .
          <source>American Journal of Preventive Medicine</source>
          .
          <year>2016</year>
          Apr;
          <volume>50</volume>
          (
          <issue>4</issue>
          ):
          <fpage>550</fpage>
          -
          <lpage>2</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Dredze</surname>
            <given-names>M</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paul</surname>
            <given-names>MJ</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bergsma</surname>
            <given-names>S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tran</surname>
            <given-names>H</given-names>
          </string-name>
          .
          <article-title>Carmen: A twitter geolocation system with applications to public health</article-title>
          . In:
          <source>AAAI Workshop on Expanding the Boundaries of Health Informatics Using AI (HIAI)</source>
          . Citeseer;
          <year>2013</year>
          . p.
          <fpage>20</fpage>
          -
          <lpage>24</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Dubé</surname>
            <given-names>E</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gagnon</surname>
            <given-names>D</given-names>
          </string-name>
          ,
          <string-name>
            <surname>MacDonald</surname>
            <given-names>NE</given-names>
          </string-name>
          .
          <article-title>Strategies intended to address vaccine hesitancy: Review of published reviews</article-title>
          .
          <source>Vaccine</source>
          .
          <year>2015</year>
          Aug 14;
          <volume>33</volume>
          (
          <issue>34</issue>
          ):
          <fpage>4191</fpage>
          -
          <lpage>203</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>