<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>OMG U got flu? Analysis of shared health messages for bio-surveillance</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Nguyen Truong Son</string-name>
          <email>ntson@fit.hcmus.edu.vn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ngoc Mai Nguyen</string-name>
          <email>maintn@uit.edu.vn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
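        <contrib contrib-type="author">
          <string-name>Nigel Collier</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>National Institute of Informatics</institution>
          ,
          <addr-line>1-2-1 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430</addr-line>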
          ,
          <country country="JP">Japan</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>VietNam National University at HCMC HoChiMinh City</institution>
          ,
          <addr-line>VietNam</addr-line>
        </aff>
      </contrib-group>
      <fpage>18</fpage>
      <lpage>26</lpage>
      <abstract>
        <p>Micro-blogging services such as Twitter offer the potential to crowdsource epidemic surveillance in real time. However, Twitter posts ('tweets') are often ambiguous and reactive to media trends. In order to ground user messages in epidemic response, we focused on tracking self-protective behaviour such as avoiding public gatherings or increasing sanitation as the basis for further risk analysis. In initial experiments on influenza tracking, we report results for unigrams, bigrams and regular expressions employed in two supervised classifiers (SVM and Naive Bayes) to classify tweets into four self-reported protective behaviour categories plus a self-reported diagnosis. We report moderately strong Spearman's Rho correlation between the classifiers' output and WHO/NREVSS laboratory data for A(H1N1) in the USA during the 2009-2010 influenza season.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Rising awareness of infectious disease outbreaks and the high costs of extending traditional sensor networks mean that we have an opportunity to harness new forms of social communication for crisis surveillance. The trend is already underway, with automatic map generation from Twitter reports for earthquakes and typhoons
        <xref ref-type="bibr" rid="ref17 ref7">(Earle, 2010; Sakaki et
al., 2010)</xref>
        , the symptom-based influenza
tracking portal Flutracking
        <xref ref-type="bibr" rid="ref4">(Dalton et al.,
2009)</xref>
        as well as the humanitarian
portal Ushahidi
        <xref ref-type="bibr" rid="ref14">(Okolloh, 2009)</xref>
        . Despite a risk of high false reporting rates, there is nevertheless strong potential in having multiple sensor sources for verification, robustness and redundancy. In the case of earthquake detection, Earle notes that Twitter messages (tweets) can be available up to 20 minutes before the official report from the US Geological Survey. With epidemics, too, the time from signal to detection is critical. Recent studies such as
        <xref ref-type="bibr" rid="ref1">(Cheng et al., 2009)</xref>
        estimate that the average delay in receiving
and disseminating data from traditional
sentinel physician networks is about two
weeks.
      </p>
      <p>
        A small but growing number of early warning systems have already been developed to mine event information from low-cost Web sources, mainly focussing on edited newswire reports; for a survey, see
        <xref ref-type="bibr" rid="ref9">(Hartley et al., 2010)</xref>
        . Success in operationalizing such systems has crucially depended on building close collaborations with government and international public health agencies in order to perform detailed verification and risk assessment.
      </p>
      <p>
        Recent studies on alerting from
newswire reports
        <xref ref-type="bibr" rid="ref2">(Collier, 2010)</xref>
        are beginning to make clear the operational boundaries of such systems in terms of selectivity, volume and timeliness. In earlier work Collier noted the issue of late warnings, i.e. cases where there is a known outbreak in a country but true alerts at the province or city level are occluded by the aggregated system data for the country as a whole. Micro-blogging might have a role to play in overcoming this problem. Micro-blogs may also be able to help with very early epidemic detection, i.e. at the pre-diagnostic stage, where there is maximum scientific uncertainty about symptoms, transmission routes and infectivity rates. Automatic geo-coding and the ability to send messages from many types of mobile device are key advantages in this respect.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Background</title>
      <p>
        In micro-blogging services such as Twitter, users describe their experiences directly, in near real time, in tweets of at most 140 characters. As of April 2010 it was estimated that Twitter had approximately 106 million registered users, with 300,000 new users being added each month. Despite their potential coverage, timeliness and low overhead, tweets present their own unique challenges. Because the reports are unedited and pre-diagnostic, there is a large trust issue to resolve within the modeling technique. Social media can also reflect a high degree of reactivity to risk perception, as seen during the 2009 H1N1 pandemic, when many messages redistributed links or requested information rather than describing users' own experience. To a degree this reflects newswire coverage and the amount of uncertainty readers feel. Re-tweets may in themselves provide a useful signal, but their role has yet to be quantified. Despite these obvious challenges, we believe there is potential for using very short messages to detect epidemic trends, as hinted at by the success of Google Flu Trends
        <xref ref-type="bibr" rid="ref8">(Ginsberg et al., 2009)</xref>
        , which harnesses users’ search queries.
      </p>
      <p>
        In order to do this we propose to employ aberration detection to detect sharp rises in the features that signal epidemics. A precursor to this is identifying reliable features. In this study we started by looking at precautionary actions identified by Jones and Salathé in their survey of behavioural responses to A(H1N1)
        <xref ref-type="bibr" rid="ref11 ref13">(Jones and Salathé, 2009)</xref>
        . Modeling individual risk perception based on local health information appears to be an understudied area in event alerting, and one that may add signal to early detection models.
      </p>
      <p>
        Recently a number of studies have
appeared looking at the effectiveness of
search queries and social media.
        <xref ref-type="bibr" rid="ref12">(Lampos et al., 2010)</xref>
        studied tweets in the 49 most populated urban centres of the UK and found a strong linear correlation with Health Protection Agency influenza-like illness (ILI) data from general practitioner (GP) consultations during the 2009-2010 influenza season. Studies of user query data from Google Flu Trends have also shown strong correlations with sentinel network data.
        <xref ref-type="bibr" rid="ref18">(Valdivia et al.,
2010)</xref>
        showed that, for the 2009 influenza A(H1N1) pandemic, Google Flu Trends data had a strong Spearman’s Rho correlation with ILI and acute respiratory infection (ARI) data from sentinel networks in Europe.
      </p>
      <p>
        Nevertheless, challenges remain in interpreting query and social networking data.
        <xref ref-type="bibr" rid="ref15">(Ortiz et al., 2010)</xref>
        , for example, discuss the potential for confusion in Google Flu Trends between ILI and non-influenza illnesses. They compared influenza data from Google Flu Trends, the CDC outpatient surveillance network and the US influenza virologic surveillance system. Whilst correlation with ILI was found to be high, correlation with positive influenza test results was lower. This result highlights the fact that both social media and user queries are secondary indicators that should be correlated with patient-reported symptoms. Significant deviation between users’ search behaviour and ILI rates was noted for the 2003-2004 influenza season, when influenza activity, pediatric deaths and news media coverage of influenza were particularly high. This highlights another understudied issue: the need to work hard to remove elements of reporting bias by users during media storms.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Method</title>
      <sec id="sec-3-1">
        <title>Annotation</title>
        <p>Taking Jones and Salathé’s behaviour responses as a starting point, we surveyed potential messages on Twitter relating to H1N1 influenza topics. Because of low frequency counts, we decided to conflate an initial group of thirteen categories into a final grouping of four. For example, avoiding people who cough or sneeze, avoiding large gatherings of people, avoiding school or work and avoiding travel to infected areas were merged into a general ‘avoidance behaviour’ category. To these we added a final category for direct reporting of influenza. The final list of categories is: (A) Avoidance behaviour - behaviours which avoid agents thought to carry a risk of infection; (I) Increased sanitation - sanitation measures to promote individual health and prevent infection; (P) Seeking pharmaceutical intervention - seeking clinical advice or using medicine or vaccines; (W) Wearing a mask; and (S) Self-reported diagnosis - reporting that one has influenza.</p>
        <p>As expected, there are a number of caveats to each of these broad classes. We list only a representative sample here: (1) a message is only tagged positive if the user or a close family member is the subject of the tweet; (2) if the message indicates that the action is hypothetical, then the classification is negative; (3) the time of the reported event should be within one week of the current time; (4) messages can belong to more than one category. Examples of (anonymized) messages are shown in Table 1.</p>
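        <p>The caveats above are guidelines for human annotators rather than an algorithm, but caveats (1) to (4) can be made concrete as a small data structure. In the minimal sketch below, <monospace>Candidate</monospace> and <monospace>positive_labels</monospace> are hypothetical names; the judgements about the subject, hypotheticality and event time are assumed to come from the annotator, and measuring the one-week window from the posting time is our reading of caveat (3).</p>

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# The five annotation classes described above.
CATEGORIES = {
    "A": "Avoidance behaviour",
    "I": "Increased sanitation",
    "P": "Seeking pharmaceutical intervention",
    "W": "Wearing a mask",
    "S": "Self-reported diagnosis",
}


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record of an annotator's judgements about one tweet."""
    labels: frozenset                 # caveat (4): several labels are allowed
    subject_is_self_or_family: bool
    is_hypothetical: bool
    event_time: datetime
    post_time: datetime


def positive_labels(c: Candidate) -> frozenset:
    """Return the labels that survive caveats (1)-(3); an empty set means negative."""
    if not c.subject_is_self_or_family:                        # caveat (1)
        return frozenset()
    if c.is_hypothetical:                                      # caveat (2)
        return frozenset()
    if abs(c.post_time - c.event_time) > timedelta(weeks=1):   # caveat (3)
        return frozenset()
    # Keep only labels that belong to the five-class scheme.
    return frozenset(label for label in c.labels if label in CATEGORIES)
```
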
        <p>At a practical level, the problem of identifying self-protection messages can be characterised as classifying highly skewed data. To handle this we adopted two stages of filtering. The first stage used a bag of seven keywords to select tweets on topics related to influenza (flu, influenza, H1N1, H5N1, swine flu, pandemic, bird flu). For 1 March 2010 to 30 April 2010 this resulted in a pool of about 225,000 tweets. This first stage of filtering was also designed to reduce the ambiguity of keywords such as ‘fever’ and ‘cough’, which occur in a wide variety of contexts.</p>
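        <p>The first filtering stage can be sketched as a single compiled regular expression over the seven keywords. The paper does not say how keywords were matched, so this sketch assumes case-insensitive whole-word matching: "flu" then matches "Flu" and "#flu" but not "fluffy" or "influence". <monospace>first_stage_filter</monospace> is a hypothetical name.</p>

```python
import re

# The seven first-stage keywords listed above.
KEYWORDS = ["flu", "influenza", "H1N1", "H5N1", "swine flu", "pandemic", "bird flu"]

# Assumption: case-insensitive matching on word boundaries.
KEYWORD_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def first_stage_filter(tweets):
    """Keep only tweets that mention at least one influenza keyword."""
    return [t for t in tweets if KEYWORD_RE.search(t)]
```

<p>Because "fever" and "cough" are not keywords, a tweet such as "Bad cough and fever" is excluded unless it also mentions one of the influenza terms, which is the disambiguating effect described above.</p>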
        <p>The second stage used hand-built patterns to select a total of 14,508 tweets. From these we randomly chose 7,412 tweets spread across the five classes. After all duplicates were removed, 5,283 messages remained, and these were then classified by hand by a single annotator, as detailed in Table 2. Mean character length and standard deviation showed no category-specific trend, except to illustrate the wide variety of message lengths.</p>
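        <p>The sampling and de-duplication step can be sketched as follows. The paper does not specify how duplicates were identified or whether the random sample was stratified by class, so this sketch assumes that two tweets are duplicates when they are identical after lower-casing and collapsing whitespace, and it draws a uniform random sample from the whole pool. <monospace>normalise</monospace> and <monospace>sample_and_deduplicate</monospace> are hypothetical names.</p>

```python
import random


def normalise(text: str) -> str:
    """Hypothetical duplicate key: lower-case and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def sample_and_deduplicate(candidates, k, seed=0):
    """Draw k candidates uniformly at random, then drop duplicates (first copy kept)."""
    rng = random.Random(seed)          # fixed seed for a reproducible sample
    sample = rng.sample(candidates, k)
    seen = set()
    unique = []
    for text in sample:
        key = normalise(text)
        if key not in seen:
            seen.add(key)
            unique.append(text)
    return unique
```
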
        <p>In order to test the stability of the annotation scheme and our assumptions about its reproducibility, we calculated kappa for 2,116 messages balanced across all the classes. For this, a second annotator was chosen who had not taken part in creating the guidelines and was not a co-investigator in this study. The simple agreement ratio (the total number of matched class assignments divided by the total number of messages) was 0.88. Kappa was calculated as κ = (p<sub>A</sub> − p<sub>E</sub>)/(1 − p<sub>E</sub>), where p<sub>A</sub> is the observed agreement (0.88) and p<sub>E</sub> is the agreement expected by chance (0.12), giving κ = 0.76/0.88 ≈ 0.86. Both results reveal a high level of agreement in the annotation scheme and give us confidence to move ahead with automated classification.</p>
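        <p>The kappa arithmetic can be checked with a short sketch. <monospace>kappa_from_proportions</monospace> reproduces the reported figure from p<sub>A</sub> and p<sub>E</sub>; <monospace>cohens_kappa</monospace> shows how both proportions would be obtained from two annotators' label lists under standard single-label Cohen's kappa. That is an assumption, since the scheme allows a message to belong to more than one category and the paper does not say how p<sub>E</sub> was computed.</p>

```python
from collections import Counter


def kappa_from_proportions(p_a, p_e):
    """kappa = (p_A - p_E) / (1 - p_E)."""
    return (p_a - p_e) / (1 - p_e)


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who each give one label per item."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement p_A: fraction of items given the same label.
    p_a = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement p_E: sum over labels of the product of marginal rates.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return kappa_from_proportions(p_a, p_e)


# The figures reported above: (0.88 - 0.12) / (1 - 0.12) = 0.76 / 0.88
print(round(kappa_from_proportions(0.88, 0.12), 2))  # 0.86
```
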
      </sec>
      <sec id="sec-3-2">
        <title>Models</title>
        <p>
          We employed two widely used
classification models implemented in the Weka
Toolkit
          <xref ref-type="bibr" rid="ref10">(Holmes et al., 1994)</xref>
          , Naive
Bayes and Support Vector Machines
(SVMs)
          <xref ref-type="bibr" rid="ref3">(Cristianini and Shawe-Taylor,
2000)</xref>
          , to classify each of the five data sets into positive or negative. The SVM used a radial basis function (RBF) kernel, with grid search to find the best parameter settings. Since we hypothesized that custom-built regular expressions might have more traction in achieving precision, we decided to use a freely available toolkit called the Simple Rule Language (SRL)
          <xref ref-type="bibr" rid="ref13">(McCrae et al., 2009)</xref>
          for this purpose. SRL comes with an interface for maintaining the rule base, which can be run in testing mode to convert surface expressions into structured information.
        </p>
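        <p>For reference, the RBF kernel and an exhaustive grid search over its two hyperparameters, the penalty C and the kernel width gamma, can be sketched as below. This is not Weka's implementation: the grid ranges are hypothetical (an exponentially spaced grid of the kind commonly used for RBF SVMs), and <monospace>score</monospace> is a stand-in for whatever evaluation was used, for example the cross-validated F-score of an SVM trained with those settings.</p>

```python
import itertools
import math


def rbf_kernel(x, y, gamma):
    """Radial basis function kernel: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * sq_dist)


def grid_search(score, cs, gammas):
    """Evaluate score(C, gamma) on every grid point; return (best_score, C, gamma)."""
    return max((score(c, g), c, g) for c, g in itertools.product(cs, gammas))


# Hypothetical exponentially spaced grid (powers of two).
CS = [2.0 ** k for k in range(-5, 16, 2)]
GAMMAS = [2.0 ** k for k in range(-15, 4, 2)]
```
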
        <p>SRL rules were built from a held-out set of tweets not used in training. Rules consist of string literals, skip expressions, word lists, named entity classes and guard expressions for limiting the scope of matched entities. Rule building took approximately 10 hours of work. The rule book contains specialised synonym sets to recognize common and slang terms for medicines (e.g. shot, vaccine, drug, tamiflu, jab, medicine, vacc), physicians (e.g. doctor, doc, dr, physician) and other key domain entities. Verb lists are maintained for specialised lexical classes such as prescribe (e.g. prescribe, perscribe*). Lists are also built for pronouns, common temporal adverbs, modal verbs and negations. Special rules were built to recognize past events. The exception was class I (increased sanitation), for which we were not able to identify enough examples with confidence to build meaningful rules by hand; for this class only unigrams and bigrams were used to train the classifiers.</p>
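        <p>SRL has its own rule syntax, which is not reproduced here. As a rough analogue only, the sketch below expresses two hypothetical class P rules as Python regular expressions built from the synonym sets quoted above. The first requires either a first-person subject or the start of the message (since tweets often drop the subject) and allows up to three skipped tokens before a medicine term; the second matches a physician term followed by a possibly misspelled form of "prescribe". The rule names and patterns are illustrative assumptions, not the authors' rule book.</p>

```python
import re

# Synonym sets quoted in the text; "perscribe" is a common misspelling.
MEDICINE = ["shot", "vaccine", "drug", "tamiflu", "jab", "medicine", "vacc"]
PHYSICIAN = ["doctor", "doc", "dr", "physician"]
PRESCRIBE = ["prescribe", "perscribe"]


def alternation(words):
    """Build a non-capturing regex group matching any of the given words."""
    return "(?:" + "|".join(re.escape(w) for w in words) + ")"


# Hypothetical rule: start of message or "I"/"we", an optional "just", a
# past-tense verb, up to three skipped tokens, then a medicine term.
RULE_GOT_MEDICINE = re.compile(
    r"(?:^|\b(?:i|we)\s+)(?:just\s+)?(?:got|had|took)\b"
    r"(?:\s+\S+){0,3}?\s+" + alternation(MEDICINE) + r"s?\b",
    re.IGNORECASE,
)

# Hypothetical rule: a physician term, up to two skipped tokens, then any
# inflection of "prescribe" or its misspelling.
PRESCRIBE_STEMS = [w[:-1] for w in PRESCRIBE]  # "prescrib", "perscrib"
RULE_DOCTOR_PRESCRIBED = re.compile(
    r"\b" + alternation(PHYSICIAN) + r"\b"
    r"(?:\s+\S+){0,2}?\s+" + alternation(PRESCRIBE_STEMS) + r"\w*",
    re.IGNORECASE,
)
```
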
        <p>We found that the language used in tweets to express users’ behaviour is very diverse and idiosyncratic, so it is challenging to achieve a high degree of coverage in the rules with certainty. With this in mind, we combined features from the rules with unigrams and bigrams. If a rule matched a tweet, its feature value was set to 1, otherwise to 0.</p>
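        <p>The combined feature representation can be sketched as a sparse dictionary of binary features. The tokeniser, the binary (presence rather than count) encoding of unigrams and bigrams, and the feature-name prefixes are assumptions; the rule features follow the 1/0 scheme described above. <monospace>extract_features</monospace> is a hypothetical name, and its output would still need to be vectorised (for Weka, for example, written out in its ARFF format) before training.</p>

```python
import re

# Assumed tokeniser: lower-cased runs of letters, digits, @, # and apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9@#']+")


def tokenize(text):
    """Lower-case a tweet and split it into simple word tokens."""
    return TOKEN_RE.findall(text.lower())


def extract_features(text, rules):
    """Binary unigram, bigram and rule-match features for one tweet.

    rules maps a rule name to a compiled regular expression; each rule
    contributes one feature set to 1 if it matches the tweet, else 0.
    """
    tokens = tokenize(text)
    features = {}
    for tok in tokens:
        features["uni=" + tok] = 1
    for left, right in zip(tokens, tokens[1:]):
        features["bi=" + left + "_" + right] = 1
    for name, pattern in rules.items():
        features["rule=" + name] = 1 if pattern.search(text) else 0
    return features
```
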
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Results 1: Classification experiments</title>
      <p>All test runs used 10-fold cross-validation on each of the five data sets. We calculated recall, precision and F-score for each category. As Tables 3 and 4 show, SVM performs better overall on all categories except W (wearing a mask). Both models’ performance generally follows the amount of training data, except for S (self-reported diagnosis), where the F-score is slightly lower than the trend in other classes despite the large number of examples. The overall trend for Naive Bayes is to have higher recall than precision, whereas for SVM precision is generally higher than recall.</p>
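      <p>The evaluation measures can be sketched for one binary category as follows. Precision, recall and F-score use their standard definitions, with F the harmonic mean of precision and recall. The fold assignment is a simple deterministic interleaving chosen for illustration; a real evaluation would typically shuffle and stratify the folds by class.</p>

```python
def precision_recall_f1(gold, predicted):
    """Precision, recall and F-score for the positive class (boolean labels)."""
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if p and not g)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def k_fold_indices(n, k=10):
    """Yield (train, test) index lists for k-fold cross-validation (no shuffling)."""
    folds = [list(range(i, n, k)) for i in range(k)]  # item j goes to fold j mod k
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test
```
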
      <p>The results suggest that our SRL rule book offers substantial benefits when combined with unigrams but less certain improvements when combined with unigrams plus bigrams. Looking more closely at the results, we found a correlation between message length and classification accuracy for both Naive Bayes and SVM. For Naive Bayes, message length did not seem to make much difference to the false negative rate, which remained roughly constant at about 0.2 to 0.25 for messages of 34 to 144 characters, but it had a greater impact on false positives (from 0.23 for shorter messages of 34 to 56 characters down to 0.08 for messages of 122 to 144 characters). For SVM there appeared to be a general reduction in both false positives and false negatives as message length increased.</p>
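      <p>The length analysis can be reproduced by binning messages by character length and computing error rates per bin. The paper does not define its false positive and false negative rates, so the sketch assumes FP/(FP + TN) and FN/(FN + TP); bins are inclusive character ranges such as the 34 to 56 and 122 to 144 ranges mentioned above. <monospace>error_rates_by_length</monospace> is a hypothetical name.</p>

```python
def error_rates_by_length(lengths, gold, predicted, bins):
    """False positive and false negative rates per message-length bin.

    bins is a list of inclusive (low, high) character ranges. A rate is
    None when its bin contains no items with the relevant gold label.
    """
    results = {}
    for low, high in bins:
        fp = tn = fn = tp = 0
        for length, g, p in zip(lengths, gold, predicted):
            if length not in range(low, high + 1):
                continue
            if p and not g:
                fp += 1
            elif not p and not g:
                tn += 1
            elif g and not p:
                fn += 1
            else:
                tp += 1
        fp_rate = fp / (fp + tn) if fp + tn else None
        fn_rate = fn / (fn + tp) if fn + tp else None
        results[(low, high)] = (fp_rate, fn_rate)
    return results
```
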
      <p>As expected, frequent misspellings,
abbreviated word forms, slang and lack
of punctuation complicated the
classification task. Missing auxiliary verbs
and articles need to be compensated for
within the SRL rules in order to ensure
successful matching.</p>
      <p>Issues of duplication through re-tweeting, which we have not modelled in this study, potentially remain. Clearly we should have more confidence in the alerting model if a larger number of independent sources report an event at the same time; modelling this will form part of our future work. Future work will also need to ensure that the classification model remains relevant over time as the content of tweets shifts.</p>
    </sec>
    <sec id="sec-5">
      <title>Results 2: Comparison to CDC data</title>
      <p>
        In order to provide a proof of concept
we operationalised the classifiers and ran
them on a corpus of Twitter data called
the Edinburgh Corpus
        <xref ref-type="bibr" rid="ref16">(Petrovic et al.,
2010)</xref>
        . The Edinburgh Corpus holds 97 million tweets from 9 million users for the period November 11th 2009 to February 1st 2010. This represents over 2 billion words in a variety of languages. Of these, 12.5 million are reported as topic tags, 55 million are @ replies and 20 million are links.
      </p>
      <p>
        We applied to the Edinburgh Corpus the same keyword filtering method used in the first set of experiments and obtained 52,193 tweets for the period of study. We then applied the SVM UNI model and compared the week-by-week volumes against laboratory results for weeks 47 to 5 of the 2009-2010 influenza season in the USA
        <xref ref-type="bibr" rid="ref6">(CDC Influenza Division, 2010)</xref>
        . Counts are shown in Table 5. Several interesting trends can be observed: (a) the total volume of positively identified tweets was relatively small compared to the volume of tweets as a whole; (b) wearing a mask was totally absent from our classified data; (c) the aggregated counts for self-protection (A+I+P, data not shown) seem to correlate closely with the CDC results (data not shown).
      </p>
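      <p>The week-by-week comparison requires aggregating positive tweets into weekly counts across the 2009/2010 year boundary. A minimal stdlib sketch follows. It assumes ISO weeks (Monday to Sunday) from Python's <monospace>date.isocalendar()</monospace>, whereas CDC surveillance (MMWR) weeks run Sunday to Saturday, so counts near week boundaries may shift by a day; note also that ISO year 2009 has 53 weeks, so 1 January 2010 falls in week 53 of 2009. Sorting on (year, week) keeps week 47 of 2009 ahead of week 5 of 2010. <monospace>weekly_counts</monospace> is a hypothetical name.</p>

```python
from collections import Counter
from datetime import date


def weekly_counts(dates):
    """Count positive tweets per (ISO year, ISO week), in chronological order."""
    counts = Counter(tuple(d.isocalendar())[:2] for d in dates)
    return sorted(counts.items())
```
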
      <p>To measure correlation we calculated Spearman’s Rho<fn id="fn1"><p>See http://en.wikipedia.org/wiki/Spearman's_rank_correlation_coefficient</p></fn> between the counts of positive messages in each class and the CDC laboratory data for A(H1N1). Table 6 shows moderately strong correlations. The strongest correlation appeared when A, I and P were combined. Besides W, which failed to provide any data, the weakest evidence came from increased sanitation (I). Discrepancies with the CDC data could be due to (a) the global geographic coverage of tweets in our collection; and (b) the fact that the syndromes covered by our self-protection behaviour and self-reporting messages are wider than A(H1N1) and could actually be other diseases such as common colds, strep throat, adenovirus infection and so on.</p>
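      <p>Spearman's rho is the Pearson correlation computed on ranks. The paper does not say which implementation was used, and a library routine such as <monospace>scipy.stats.spearmanr</monospace> would normally be used instead; the stdlib sketch below gives tied values their average rank, the usual convention. The two input series would be, for example, weekly positive-message counts for one class and the corresponding CDC laboratory counts for the same weeks.</p>

```python
import math
from itertools import groupby


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    position = 0
    for _, group in groupby(order, key=lambda i: values[i]):
        members = list(group)
        # Tied items occupy ranks position+1 .. position+len(members).
        mean_rank = position + (len(members) + 1) / 2
        for i in members:
            ranks[i] = mean_rank
        position += len(members)
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho: the Pearson correlation of the two series' average ranks."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("series must be non-empty and of equal length")
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("Spearman's rho is undefined for a constant series")
    return cov / (sd_x * sd_y)
```
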
      <p>Drill-down analysis reveals that we still need to do more to remove false positives by strengthening the linguistic features within the limits of the 140-character message length. Examples of false positives include interrogative sentences, hypothetical sentences, reports of events that took place in the distant past, comments on influenza advice from others, etc.</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>In this paper we have taken the first steps towards classifying Twitter messages according to self-reported risk behaviour. The results have shown moderately strong correlation with CDC A(H1N1) data, but we still need to make further progress in order to achieve the high degrees of correlation reported between Google Flu Trends and sentinel influenza data. The next step will be to extend our training data, strengthen the linguistic features and see whether we can use these signals to detect emerging disease outbreaks. Jones and Salathé showed that, after an initial peak in levels of risk concern, anxiety faded once the immediate threat of the A(H1N1) pandemic had passed. In follow-up work we intend to look at how closely these signals track epidemic case data.</p>
      <p>We also believe that the signals we have modelled are applicable to a wide range of diseases within the respiratory syndrome, and we intend to explore how these features can be used to detect diseases other than influenza.</p>
      <p>
        Besides disaster alerting, results from analysis of behavioural responses may also help in the future to evaluate the success of official prevention campaigns. For example, it is known that little notice was taken of advice on antiviral therapies, goggles or mask wearing in the Netherlands after the avian influenza epidemic was introduced to Europe
        <xref ref-type="bibr" rid="ref5">(De Zwart et
al., 2007)</xref>
        . Conversely, empirical studies of individual risk perception of disease severity and susceptibility may help official agencies in the future to avoid ‘over-hyping’ epidemic threats and to tune risk communication strategies more effectively.
      </p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgement</title>
      <p>We gratefully acknowledge support from the National Institute of Informatics Global Liaison Office for internship funding for this study. The study was conceived and directed by NC; corpus collection and analysis were done by NM; and the machine learning experiments were done by NS.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Cheng</surname>
            ,
            <given-names>C. K.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>E. H.</given-names>
            <surname>Lau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. K.</given-names>
            <surname>Ip</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. S.</given-names>
            <surname>Yeung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. M.</given-names>
            <surname>Ho</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B. J.</given-names>
            <surname>Cowling</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>A profile of the online dissemination of national influenza surveillance data</article-title>
          .
          <source>BMC Public Health</source>
          ,
          <volume>9</volume>
          (
          <issue>1</issue>
          ):
          <fpage>339</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Collier</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <year>2010</year>
          .
          <article-title>What's unusual in online disease outbreak news?</article-title>
          <source>Journal of Biomedical Semantics</source>
          ,
          <volume>1</volume>
          (
          <issue>1</issue>
          ), March. doi:10.1186/2041-1480-1-2.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Cristianini</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Shawe-Taylor</surname>
          </string-name>
          .
          <year>2000</year>
          .
          <article-title>An introduction to support vector machines and other kernel-based learning methods</article-title>
          . Cambridge University Press, Cambridge, England; ISBN 0521780195.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Dalton</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Durrheim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Fejsa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Francis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Carlson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Tursan d'Espaignet</surname>
          </string-name>
          , et al
          .
          <year>2009</year>
          .
          <article-title>Flutracking: A weekly Australian community online survey of influenza-like illness in 2006, 2007 and 2008</article-title>
          .
          <source>Communicable Disease Intelligence</source>
          ,
          <volume>33</volume>
          (
          <issue>3</issue>
          ):
          <fpage>316</fpage>
          -
          <lpage>322</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>De Zwart</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>I. K.</given-names>
            <surname>Veldhuijzen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Elam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. R.</given-names>
            <surname>Aro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Abraham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. D.</given-names>
            <surname>Bishop</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. H.</given-names>
            <surname>Richardus</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Brug</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>Avian influenza risk perception, Europe and Asia</article-title>
          .
          <source>Emerging Infectious Diseases</source>
          ,
          <volume>13</volume>
          (
          <issue>2</issue>
          ):
          <fpage>290</fpage>
          -
          <lpage>293</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <collab>CDC Influenza Division</collab>
          .
          <year>2010</year>
          .
          <article-title>FluView: 2009-2010 influenza season week 20 ending May 22, 2010</article-title>
          .
          <source>Technical report, Centers for Disease Control and Prevention</source>
          , May. Available at http://www.cdc.gov/flu/weekly/.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Earle</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <year>2010</year>
          .
          <article-title>Earthquake Twitter</article-title>
          .
          <source>Nature Geoscience</source>
          ,
          <volume>3</volume>
          (
          <issue>4</issue>
          ):
          <fpage>221</fpage>
          -
          <lpage>222</lpage>
          . doi:10.1038/ngeo832.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Ginsberg</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mohebbi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Patel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Brammer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Smolinski</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Brilliant</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Detecting influenza epidemics using search engine query data</article-title>
          .
          <source>Nature</source>
          ,
          <volume>457</volume>
          :
          <fpage>1012</fpage>
          -
          <lpage>1014</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>Hartley</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Nelson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Walters</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Arthur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Yangarber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Madoff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Linge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mawudeku</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Collier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Brownstein</surname>
          </string-name>
          , G. Thinus, and
          <string-name>
            <given-names>N.</given-names>
            <surname>Lightfoot</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>The landscape of international biosurveillance</article-title>
          .
          <source>Emerging Health Threats J.</source>
          ,
          <volume>3</volume>
          (
          <issue>e3</issue>
          ), January.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>Holmes</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Donkin</surname>
          </string-name>
          , and
          <string-name>
            <given-names>I. H.</given-names>
            <surname>Witten</surname>
          </string-name>
          .
          <year>1994</year>
          .
          <article-title>WEKA: a machine learning workbench</article-title>
          .
          <source>Technical report</source>
          , Department of Computer Science, Waikato University, New Zealand, September.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <surname>Jones</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Salathé</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Early assessment of anxiety and behavioral response to novel swine-origin influenza A(H1N1)</article-title>
          .
          <source>PLoS One</source>
          ,
          <volume>4</volume>
          (
          <issue>12</issue>
          ):
          <fpage>e8032</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <surname>Lampos</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>De Bie</surname>
          </string-name>
          , and
          <string-name>
            <given-names>N.</given-names>
            <surname>Cristianini</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Flu detector - tracking epidemics on Twitter</article-title>
          .
          <source>In Machine Learning and Knowledge Discovery in Databases</source>
          , volume
          <volume>6223</volume>
          of Lecture Notes in Computer Science, pages
          <fpage>599</fpage>
          -
          <lpage>602</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <surname>McCrae</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Conway</surname>
          </string-name>
          , and
          <string-name>
            <given-names>N.</given-names>
            <surname>Collier</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Simple rule language editor</article-title>
          .
          <source>Google Code project</source>
          , September. Available from: http://code.google.com/p/srleditor/.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <surname>Okolloh</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Ushahidi, or 'testimony': Web 2.0 tools for crowdsourcing crisis information</article-title>
          .
          <source>Participatory Learning and Action</source>
          ,
          <volume>59</volume>
          (
          <issue>1</issue>
          ):
          <fpage>65</fpage>
          -
          <lpage>70</lpage>
          , June.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <surname>Ortiz</surname>
            ,
            <given-names>J. R.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. K.</given-names>
            <surname>Shay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. M.</given-names>
            <surname>Neuzil</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C. H.</given-names>
            <surname>Goss</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Does Google influenza tracking correlate with laboratory tests positive for influenza?</article-title>
          .
          <source>In Proc. , USA.</source>
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <surname>Petrovic</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Osborne</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V.</given-names>
            <surname>Lavrenko</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>The Edinburgh Twitter corpus</article-title>
          .
          <source>In Proc. #SocialMedia Workshop: Computational Linguistics in a World of Social Media</source>
          , LA, USA, pages
          <fpage>25</fpage>
          -
          <lpage>26</lpage>
          , June.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <surname>Sakaki</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Okazaki</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Matsuo</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Earthquake shakes Twitter users: real-time event detection by social sensors</article-title>
          .
          <source>In Proc. of the 19th International World Wide Web Conference</source>
          , Raleigh, NC, USA.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <surname>Valdivia</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lopez-Alcalde</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Vicente</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pichiule</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ruiz</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Ordobas</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Monitoring influenza activity in Europe with Google Flu Trends: comparison with the findings of sentinel physician networks - results for 2009-10</article-title>
          .
          <source>Eurosurveillance</source>
          ,
          <volume>15</volume>
          (
          <issue>29</issue>
          ):
          <fpage>2</fpage>
          -
          <lpage>7</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>