<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards Cross-lingual Alerting for Bursty Epidemic Events</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Nigel Collier</string-name>
          <email>collier@nii.ac.jp</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>National Institute of Informatics and the Japan Science and Technology Agency 1-2-1 Hitotsubashi, Chiyoda-ku, Tokyo</institution>
          ,
          <addr-line>101-8430</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
      </contrib-group>
      <fpage>9</fpage>
      <lpage>17</lpage>
      <abstract>
        <p>Online reports are increasingly becoming a source for early warning systems that detect natural disasters. Harnessing the massive volume of information available from multilingual newswire presents as many challenges as opportunities due to the patterns of reporting complex spatio-temporal events. In this paper we propose a role for an automated system based on cross-language text mining. We track the evolution of 16 disease outbreaks using 5 aberration detection algorithms on text-mined events. Using ProMED reports as a silver standard, news data for 13 languages over a 129-day trial period showed improved recall and timeliness using cross-lingual events.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        As electronic data expands, online
reports are coming to represent a new
modality in early warning surveillance
for natural disasters such as epidemics
        <xref ref-type="bibr" rid="ref7">(Hartley et al., 2010)</xref>
        , typhoons and
earthquakes
        <xref ref-type="bibr" rid="ref14 ref6">(Earle, 2010; Sakaki et al.,
2010)</xref>
        . Recent studies in disease
surveillance such as
        <xref ref-type="bibr" rid="ref5">(Collier, 2010)</xref>
        have shown
that significant challenges still exist for
fine-grained automated understanding of
event dynamics.
      </p>
      <p>
        Since 2006, BioCaster
        <xref ref-type="bibr" rid="ref4">(Collier et al.,
2008)</xref>
        has been gathering, semantically
analysing and mapping global
news reports to provide a near-real-time
summary of human epidemics. The
system is used regularly by both national and
international health agencies as well as a
growing base of individual users. Recent
advances include expanding the number
of diseases to include animal pathogens
as well as extending the number of
languages from 4 to 13. With the increase in
data came an understanding that public
health analysts needed more help finding
novel trends in the event stream.
      </p>
      <p>In order to support the task of
detecting the unusual, we compare five widely
used aberration detection algorithms to
look for spikes in the geo-temporal event
stream. In particular this paper seeks to
explore the hypothesis that cross-lingual
events from text mining can provide
improved detection rates for novel events.
Although we focus here on newswire as a
source we believe the results should have
applicability for other unverified reports
such as email lists and the rapidly
emerging blog space.
</p>
    </sec>
    <sec id="sec-2">
      <title>2 Related Work</title>
      <p>
        The 2009 H1N1 pandemic illustrated
how dependent each country is on the
surveillance capacity in other states.
Reducing public health risk depends on
an overall strengthening of global health
event monitoring as well as locally
available sources such as clinical data and
over-the-counter sales data. The Web
provides a low cost surveillance
infrastructure that has been shown to offer
a timely means of detecting epidemics
such as SARS
        <xref ref-type="bibr" rid="ref11">(Mawudeku and Blench,
2006)</xref>
        that is often several days ahead of
the official reporting curve. In addition
to work on BioCaster, there is a small but
growing body of work looking at the
issues of online public health monitoring
such as GPHIN
        <xref ref-type="bibr" rid="ref11">(Mawudeku and Blench,
2006)</xref>
        and MedISys/PULS
        <xref ref-type="bibr" rid="ref16">(Yangarber et
al., 2008)</xref>
        . However, studies
providing details of recall/precision/timeliness
for end user tasks in media-based health
surveillance are still surprisingly limited.
To the best of our knowledge no
previous study has explored the multilingual
effects in this area.
      </p>
      <p>
        Several characteristics of early
epidemic detection make the problem
particularly challenging. Firstly, we want
to catch epidemics as early as
possible before they develop into
humanitarian crises; secondly, not every
epidemic is of equal importance - those that
are of most concern to the international
community are described by the
International Health Regulations
        <xref ref-type="bibr" rid="ref9">(Lawrence and
Gostin, 2004)</xref>
        ; thirdly, patterns of
media coverage are complex
        <xref ref-type="bibr" rid="ref13">(Olsen et al.,
2002)</xref>
        , at times focussing on dramatic
and emotive imagery, at others
prioritizing the reader’s security and economic
interests. The connection
between media interest and the
population at risk is therefore often blurred.
      </p>
      <p>
        How is this work different to various
research in topic detection and tracking
(TDT)
        <xref ref-type="bibr" rid="ref1">(Allen et al., 1998)</xref>
        that has been
undertaken for the last 14 years? Whilst
both tasks look for events that are highly
localized in time and space, the task we
undertake begins with a predefined event
semantics and a desire to distinguish the
unexpected from the typical. Put another
way, bursts in media interest do not
always correspond to public health
significance. The stream of work here seeks
to uncover underlying trends and factors.
Neither is this task entirely the same as
TDT’s first topic detection since we
measure performance partly by the number
of days before the silver standard that we
can capture an event.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3 Method</title>
      <sec id="sec-3-1">
        <title>3.1 Evaluation</title>
        <p>
          In general it is extremely difficult to
determine ground truth for the actual
numbers and durations of disease outbreaks.
As a silver standard we have chosen the
best publicly available human network of
reporters which is ProMED-mail
          <xref ref-type="bibr" rid="ref10">(Madoff and Woodall, 2005)</xref>
          . ProMED-mail
is a program of the International Society
for Infectious Diseases with many expert
volunteer reporters globally and a
sophisticated staged editorial process.
Outbreak reports are distributed to 40,000
subscribers by email, RSS feed and Web
portal - precisely the audience we target
in our automated system.
        </p>
        <p>In this study we have used quite
coarse-grained granularity by choosing
countries and days as the units. This is
due to the current limits of reliable
location detection in the system and also the
frequency of news that we observe. The
recorded time for each event was
normalized to system download time which
takes place every hour of each day.</p>
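        <p>As an illustration of the country and day granularity described above, the following minimal sketch buckets time-stamped events into daily counts per country-disease pair. The tuple layout, the sample events and the function name daily_counts are assumptions made for this example and are not taken from the BioCaster system.</p>
        <preformat><![CDATA[
```python
from collections import Counter
from datetime import datetime


def daily_counts(events):
    """Count events per (country, disease, date).

    `events` is an iterable of (country, disease, download_time) tuples,
    where download_time is a datetime. This layout is hypothetical.
    """
    counts = Counter()
    for country, disease, download_time in events:
        # Drop the time of day so that all events on one day share a bucket.
        counts[(country, disease, download_time.date())] += 1
    return counts


# Invented sample events, for illustration only.
events = [
    ("Angola", "cholera", datetime(2010, 3, 10, 9, 0)),
    ("Angola", "cholera", datetime(2010, 3, 10, 17, 0)),
    ("Angola", "cholera", datetime(2010, 3, 14, 8, 0)),
]
counts = daily_counts(events)
```
]]></preformat>
        <p>Each key of the resulting counter corresponds to one cell of a country-disease event stream, which is the unit on which the alerting models operate.</p>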
        <p>Evaluation uses the standard
classification test measures of sensitivity
(recall), specificity, positive predictive value
(PPV or precision), negative predictive
value (NPV) and timeliness. We also
measured the average number of system
alarms per 100 days and compared this to
the silver standard. The F-measure (F1)
is calculated in the usual way as the
harmonic mean of sensitivity and PPV.</p>
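        <p>The measures listed above follow directly from the four confusion counts. The sketch below shows one way to compute them; the function name and the example counts are invented for illustration and are not results from this study.</p>
        <preformat><![CDATA[
```python
def classification_measures(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV, NPV and F1 from raw counts."""

    def ratio(numerator, denominator):
        # Return 0.0 for an empty denominator instead of dividing by zero.
        return numerator / denominator if denominator else 0.0

    sensitivity = ratio(tp, tp + fn)   # recall
    specificity = ratio(tn, tn + fp)
    ppv = ratio(tp, tp + fp)           # precision
    npv = ratio(tn, tn + fn)
    # F1 is the harmonic mean of sensitivity and PPV.
    f1 = ratio(2 * ppv * sensitivity, ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
    }


# Invented counts, for illustration only.
measures = classification_measures(tp=6, fp=4, fn=2, tn=88)
```
]]></preformat>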
        <p>The standard for a true positive was
to obtain a system alert on a
country-disease event on or before the silver
standard alert. The period for a qualifying
system alert was set as up to 7 days prior
to and including a qualifying ProMED
report on the same topic. True positives
were increased by 1 if there was any
system alert that fell within the 7 day
period. Multiple system alerts did not count
twice. False positives were increased by
1 for each system alert that fell outside
of the 7 day window. False negatives
were counted as the number of qualifying
alert periods when there were no system
alerts. True negatives were counted as
the number of days outside of any
qualifying alert period when no system alert
was given.</p>
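        <p>The counting rules above can be sketched for a single country-disease stream as follows. Days are represented as integer indices, and the function name, argument names and sample data are assumptions made for this illustration rather than the scoring code used in the study.</p>
        <preformat><![CDATA[
```python
def score_alerts(system_alert_days, promed_days, n_days, window=7):
    """Return (tp, fp, fn, tn) for one event stream.

    Days are integer indices from 0 to n_days - 1 (an assumption of this
    sketch). Each ProMED report defines a qualifying period covering the
    `window` days before it and the report day itself.
    """
    system = set(system_alert_days)
    periods = [set(range(max(0, d - window), d + 1)) for d in promed_days]
    covered = set().union(*periods)

    # One true positive per qualifying period that contains any system
    # alert; several alerts in the same period do not count twice.
    tp = sum(1 for period in periods if period & system)
    # Qualifying periods without any system alert.
    fn = len(periods) - tp
    # Every system alert that falls outside all qualifying periods.
    fp = len(system - covered)
    # Days outside all qualifying periods on which no alert was given.
    tn = sum(1 for d in range(n_days) if d not in covered and d not in system)
    return tp, fp, fn, tn


# Invented example: ProMED reports on days 10 and 25, system alerts on
# days 5, 6 and 15 of a 30-day stream.
result = score_alerts([5, 6, 15], [10, 25], n_days=30)
```
]]></preformat>
        <p>In this invented example the alerts on days 5 and 6 jointly earn one true positive, the alert on day 15 is a false positive, and the second ProMED report is missed.</p>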
        <p>In testing we tried to maximize F1
together with timeliness.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2 Data</title>
        <p>Figure 1 shows the 16 event streams that
we explored. The events chosen for this
study were determined based on
diversity of geographical and media coverage
rather than random selection. The 16
event streams contain 2064 surveillance
days with 153 events (7.4% of
alerting days)1. Since we wanted to explore
the hypothesis that linguistic coverage in
multiple languages could strengthen
detection rates and timeliness we compared
English news coverage against all
languages including English for each of 16
disease outbreaks. Because cross-lingual
events on the 13 languages were only
available from December 2009, the trial
period was from January to May 2010.</p>
        <p>
          ProMED reports used in the silver
standard excluded those that fell outside
our case definition, based on the
International Health Regulations
          <xref ref-type="bibr" rid="ref9">(Lawrence
and Gostin, 2004)</xref>
          decision tree
instrument. Excluded reports were, for example, requests for
information, reports primarily focussed on
control measures and aggregated summary
reports not arising from specific events.
        </p>
        <p>1 Note that system data from the study will be
made publicly available online for re-use.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3 Text mining system</title>
        <p>The text mining system we explored
involves a semantic pipeline of modules
running on a high throughput cluster
computer with 48 Xeon cores.
Throughput is approximately 9000 articles per
day. System news was gathered from
multiple news sources through Google
News and MeltWater News as well as
specialized sources such as the European
Media Monitor, IRIN and ReliefWeb.
(Note that ProMED-mail messages
were excluded from the system data for
this study by blocking on the
Internet domain and message title.) In
total this gives us access to over 80,000
news sources globally. The languages
used in the study (in ISO-639-1) are:
ar, zh, nl, en, fr, de, it, ko, pt, ru, es, vi and th.</p>
        <p>
          Underlying the system is a publicly
available multilingual application
ontology
          <xref ref-type="bibr" rid="ref3">(Collier et al., 2006)</xref>
          which is used
within the rule books to make basic
inferences such as countries from names
of provinces, or diseases from causal
pathogens. The BioCaster ontology
(BCO) rules also allow us to unify
variant forms of terms such as the 11 forms
of A(H1N1).
        </p>
        <p>
          After data sourcing, translation takes
place from the twelve non-English
languages used in this study using Google
Translate. Following this, text
classification using Naive Bayes (F-score 0.93)
removes non-disease outbreak news
before text mining is applied. Rules are
based on a regular expression matching
toolkit called the Simple Rule Language
          <xref ref-type="bibr" rid="ref12">(McCrae et al., 2009)</xref>
          and divided
between 18 entity types and template rules.
        </p>
        <p>The final structured event frames in XML
include slot values normalized to BCO
root terms for disease, pathogen (virus
or bacterium), time period, country and
province. We additionally identify
15 aspects of public health events
critical to risk assessment. For the purpose
of this study we only made use of disease
and country slots. Events in the 13
languages are treated in this study as being
part of a univariate model for comparison
purposes against English events.</p>
        <p>Latitude and longitude of events down
to the province level are found
automatically using Google’s API up to a limit
of 15000 lookups per day, and then
using lookup on 5000 country and province
names harvested from Wikipedia.</p>
      </sec>
      <sec id="sec-3-4">
        <title>3.4 Alerting models</title>
        <p>
          We experimented with a range of popular
models for early alerting used in the
public health community: the Early
Aberration Reporting System (EARS) C2,
C3 and W2 models as well as the
F-statistic and the Exponentially Weighted
Moving Average (EWMA). All were
implemented in Excel for the purpose of
this study. The models are what might be
termed ‘snapshot’ models because they
all use short 7 day baselines that assume
a relatively stationary background, i.e.
ignoring medium to long term periodic
variations such as seasonal cycles. The
baselines are used to predict future trends
against which the current day values are
compared. All models also use a 2 day
‘guard period’ just before the target day
t to prevent the current day’s data from
being included in the baseline. All
models use a minimally supervised method
by setting a threshold parameter which
we determined using the same 5 held out
data sets used by
          <xref ref-type="bibr" rid="ref5">(Collier, 2010)</xref>
          . These
were 0.2 (C2 and W2), 0.3 (C3), 0.6
(F-statistic) and 2.0 (EWMA). A minimum
standard deviation was set at 0.2 and a
frequency purge was applied to remove
event counts of 1 per day.
        </p>
        <p>
          C2.
The Early Aberration Reporting
System (EARS) algorithms
          <xref ref-type="bibr" rid="ref8">(Hutwagner et
al., 2003)</xref>
          are based on cumulative sum
calculations commonly used in quality
control. C2 triggers an alert when a test
statistic S<sub>t</sub> exceeds a number k of
standard deviations above the baseline mean:
          <disp-formula><label>(1)</label><tex-math>S_t = \max\left(0, \frac{C_t - (\mu_t + k\sigma_t)}{\sigma_t}\right)</tex-math></disp-formula>
where C<sub>t</sub> is the event count on the
target day, and μ<sub>t</sub> and σ<sub>t</sub> are the mean and
standard deviation of the counts during the
baseline period. We set k to 1 for all
experiments.
        </p>
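        <p>A minimal sketch of the C2 statistic in equation (1) is given below, using the 7 day baseline, the 2 day guard period and the minimum standard deviation of 0.2 described earlier. The function name, the use of the sample standard deviation and the example counts are assumptions of this sketch; the study itself implemented the models in Excel.</p>
        <preformat><![CDATA[
```python
from statistics import mean, stdev


def c2_statistic(counts, t, k=1.0, baseline=7, guard=2, min_sd=0.2):
    """Return the C2 test statistic for day index t.

    `counts` is a list of daily event counts. The baseline covers the
    `baseline` days that end just before the `guard` days preceding t.
    """
    start = t - guard - baseline
    if start < 0:
        raise ValueError("not enough history before day t")
    window = counts[start:t - guard]
    mu = mean(window)
    # Assumption: sample standard deviation, floored at min_sd.
    sigma = max(stdev(window), min_sd)
    return max(0.0, (counts[t] - (mu + k * sigma)) / sigma)


# Invented counts with a spike on the last day.
counts = [2, 3, 2, 4, 3, 2, 3, 3, 4, 9]
s_spike = c2_statistic(counts, t=9)
s_flat = c2_statistic([3] * 10, t=9)
alert = s_spike > 0.2  # 0.2 is the C2 threshold reported above
```
]]></preformat>
        <p>With a completely flat series the standard deviation floor keeps the denominator away from zero, and the statistic stays at 0.</p>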
        <p>
          C3.
C3 is a modified version of C2 in which the
previous 2 observations (within the guard
period) are added to the test statistic if
the counts on those days do not exceed
a threshold of 3 standard deviations plus
the mean on those days. The rationale
here is to extend the sensitivity of C2.
        </p>
        <p>
          W2.
W2
          <xref ref-type="bibr" rid="ref15">(Tokars et al., 2009)</xref>
          is a stratified
version of C2 which compensates for
weekend data outages by removing
Saturday and Sunday data counts from the
baseline. Alerts can nevertheless be raised
on any day.
        </p>
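        <p>The weekend stratification of W2 can be sketched as below. This sketch assumes that the baseline is extended backwards until 7 weekday counts have been collected and that dates missing from the input count as zero; neither detail is stated in the text, and the function name and sample data are likewise hypothetical.</p>
        <preformat><![CDATA[
```python
from datetime import date, timedelta
from statistics import mean, stdev


def w2_statistic(counts_by_day, target, k=1.0, baseline=7, guard=2, min_sd=0.2):
    """Return a W2-style statistic for the date `target`.

    `counts_by_day` maps datetime.date to a daily event count.
    Saturdays and Sundays are skipped when building the baseline.
    """
    window = []
    day = target - timedelta(days=guard + 1)
    while len(window) < baseline:
        if day.weekday() < 5:  # Monday is 0, Friday is 4
            window.append(counts_by_day.get(day, 0))
        day -= timedelta(days=1)
    mu = mean(window)
    sigma = max(stdev(window), min_sd)
    observed = counts_by_day.get(target, 0)
    return max(0.0, (observed - (mu + k * sigma)) / sigma)


# Invented data: 3 events on every weekday, none at weekends,
# and 10 events on the target day (Friday 19 March 2010).
start = date(2010, 3, 1)
counts_by_day = {}
for offset in range(19):
    d = start + timedelta(days=offset)
    counts_by_day[d] = 3 if d.weekday() < 5 else 0
target = date(2010, 3, 19)
counts_by_day[target] = 10
s_w2 = w2_statistic(counts_by_day, target)
```
]]></preformat>
        <p>Because the empty weekend days never enter the baseline in this sketch, they neither lower the baseline mean nor inflate its standard deviation.</p>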
      </sec>
      <sec id="sec-3-5">
        <title>F-statistic</title>
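        <p>
          The calculation for the F-statistic
          <xref ref-type="bibr" rid="ref2">(Burkom, 2005)</xref>
          is:
          <disp-formula><label>(2)</label><tex-math>S_t = \sigma_t^2 + \sigma_b^2</tex-math></disp-formula>
where σ<sub>t</sub><sup>2</sup> approximates the variance
during the testing window and σ<sub>b</sub><sup>2</sup>
approximates the variance during the
baseline window.</p>
        <p>Calculation is as follows:
          <disp-formula><label>(3)</label><tex-math>\sigma_t^2 = \frac{1}{n_t} \sum_{\mathrm{test}} (C_t - \mu_b)^2</tex-math></disp-formula>
          <disp-formula><label>(4)</label><tex-math>\sigma_b^2 = \frac{1}{n_b} \sum_{\mathrm{baseline}} (C_t - \mu_b)^2</tex-math></disp-formula>
        </p>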
      </sec>
      <sec id="sec-3-6">
        <title>EWMA</title>
        <p>Unlike other models in our test, the
Exponentially Weighted Moving Average
(EWMA) provides for a non-uniformly
weighted baseline by down-weighting
counts that are on days further from the
target day:</p>
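        <p>
          <disp-formula><label>(5)</label><tex-math>Y_1 = C_1</tex-math></disp-formula>
          <disp-formula><label>(6)</label><tex-math>Y_t = \lambda C_t + (1 - \lambda) Y_{t-1}</tex-math></disp-formula>
where 0 &lt; λ &lt; 1 is a parameter that
controls the degree of smoothing. The
optimal value of λ on held out data
was found to be 0.2. The test statistic is
calculated as:
          <disp-formula><label>(7)</label><tex-math>S_t = \frac{Y_t - \mu_t}{\sigma_t \sqrt{\lambda / (2 - \lambda)}}</tex-math></disp-formula>
        </p>
        <p>As above, μ<sub>t</sub> and σ<sub>t</sub> are the mean and
standard deviation on the baseline
window.</p>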
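        <p>The EWMA recursion and its test statistic can be sketched as follows. The sketch assumes that the baseline mean and standard deviation are taken over the raw counts with the same 7 day baseline, 2 day guard period and minimum standard deviation of 0.2 as the other models; the function names and sample counts are invented for illustration.</p>
        <preformat><![CDATA[
```python
from statistics import mean, stdev


def ewma_series(counts, lam=0.2):
    """Return the smoothed series: Y_1 = C_1, Y_t = lam*C_t + (1-lam)*Y_{t-1}."""
    smoothed = [float(counts[0])]
    for c in counts[1:]:
        smoothed.append(lam * c + (1 - lam) * smoothed[-1])
    return smoothed


def ewma_statistic(counts, t, lam=0.2, baseline=7, guard=2, min_sd=0.2):
    """Return the EWMA test statistic for day index t."""
    start = t - guard - baseline
    if start < 0:
        raise ValueError("not enough history before day t")
    window = counts[start:t - guard]
    mu = mean(window)
    # Assumption: sample standard deviation of raw counts, floored at min_sd.
    sigma = max(stdev(window), min_sd)
    y_t = ewma_series(counts[:t + 1], lam)[-1]
    return (y_t - mu) / (sigma * (lam / (2 - lam)) ** 0.5)


# Invented counts with a spike on the last day.
counts = [2, 3, 2, 4, 3, 2, 3, 3, 4, 9]
s_ewma = ewma_statistic(counts, t=9)
alert = s_ewma > 2.0  # 2.0 is the EWMA threshold reported above
```
]]></preformat>
        <p>Smaller values of λ give more weight to the history of the series, so a single-day spike moves the smoothed value less than it moves the raw count.</p>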
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4 Results</title>
      <p>Interestingly we found that 80% of
news reports covered only about half the
ProMED-mail alert disease-country
topics, implying that the remaining 20% of
news has to provide coverage for almost
half the topics. Surprisingly, the trend
was broadly similar for both English and
all language news. Although the
sample size is relatively small, given that the
events we chose were from all regions of
the world, this implies that having news
in more languages may have a deepening
effect rather than a broadening effect on
event coverage. The three notable
exceptions were in the cases of FMD in China
(e4 in Figure 1), Dengue in Brazil (e12)
and Dengue in Bolivia (e13).</p>
      <p>Results for global events on English
(Table 1) show an advantage for the
F-statistic if we are primarily concerned
with sensitivity (recall) and alerting rates
(shown in column B). However the
F-statistic has a clear disadvantage with
PPV (precision) which impacts heavily
on the number of false alarms. This can
be seen clearly by comparing the alarm
rate per 100 days of 16.2 in column A
with the ProMED average of 7.4. Both
advantages and disadvantages are
amplified when we add cross-lingual events.</p>
      <p>Whilst the F-statistic has the highest
overall F1, its high rate of false alarms
reflected in the PPV makes it potentially
an undesirable choice. If we seek the
best balance of F1 and timeliness with a
minimum of false alarms then C3 looks
like a more desirable alternative.</p>
      <p>Cross-lingual event capture seemed to
extend sensitivity in all models,
improving F1 and timeliness. To see if we could
harden our intuitions about these effects
we looked specifically at South East Asia
- a region where we would expect the
representation of Chinese to be
proportionately greater than English. Table 2 shows
results which largely mirror those for the
world as a whole. The notable
exception though is that EWMA shows a large
drop in performance.</p>
    </sec>
    <sec id="sec-5">
      <title>5 Discussion</title>
      <p>Although the sample size is limited, the
data suggest trends in model
performance. C3 seems to perform best when
we consider that the high false alarm
rate for the F-statistic could desensitize
users. Cross-language events generally
seem to improve F1 performance by
several points across most models except
for EWMA. The benefits come from an
extension in sensitivity but could be
focussed on topics where we already have
large coverage of English news. This is
not to say that multilingual news is not
useful; as we comment below, it could
be that it has a greater role to play in
extending detection rates of novel events
at lower levels of geographic granularity
than the country.</p>
      <p>The study raises several questions
about factors in the imbalance of
reporting: why did Dengue in Brazil (e12) or
FMD in China (e4) receive such
massive local coverage but
disproportionately less in the English media? Why
did cholera in Angola (e9) or influenza
in Romania (e8) receive comparatively
low coverage overall? We also observed
that the USA epidemics (e15 and e16)
were widely reported in English but not
so greatly in other languages.</p>
      <p>In order to illustrate the potential
complexity of the task we provide a
detailed drill-down analysis of one of the
outbreaks in our data set, i.e. cholera
in Angola. Just to put the
reporting of this outbreak into context:
Angola itself is a former Portuguese colony
which has suffered major outbreaks (e.g.
2006 to 2008) of cholera due to poor
sanitation, drinking water infrastructure
and environmental conditions. Although
UNICEF has commented on recent
advances, the country remains at risk,
especially during the rainy season from
January to mid-May.</p>
      <p>21/1/2010: BioCaster detects 1 report in
Spanish of 31 cases of cholera from
October to December 2009. The
report is republished in English and
again in Spanish and Portuguese
over the next few days. Since the
report is for a historical outbreak (&gt; 3
weeks old) it is a false positive.</p>
      <p>19/2/2010: ProMED-mail issues a
report in English on cholera in
Bocoio, Angola between 12/2/2010
and 18/2/2010. The cited source
was the Angola Press Agency on
19/2/2010. BioCaster failed to
capture this, so it is a false negative.</p>
      <p>4/3/2010: Angop issues a report of 8
deaths from cholera in the province
of Namibe. The report is cited by
ProMED-mail on 19/3/2010.</p>
      <p>6/3/2010: BioCaster detects the 4/3/2010
report in its Spanish version. The
status of this report is a false
positive in the silver standard but should
be considered as a true positive
since it is a direct translation of a
cited source used by ProMED-mail.</p>
      <p>10/3/2010: BioCaster detects reports in
Spanish of a prevention campaign
against cholera in Luanda. This
seems to be a false positive but
several such reports raise a
system alarm. The high average
number of reports means that smaller
peaks of true positives on 14/3/2010
and 16/3/2010 do not raise system
alarms. The F1 scores for
multilingual reports are therefore lower than
for English.</p>
      <p>14/3/2010: BioCaster detects 1 report
in French from the Governor of
Luanda requesting civil protection
measures to prevent the
proliferation of cholera following heavy rain
and flooding. The indication of
infrastructure stress is highly
indicative of a true positive. Due to the
high frequency of Spanish reports
on 10/3/2010 no alarm is given.</p>
      <p>19/3/2010: ProMED-mail issues a
report in English of cholera deaths
in Tombua (Tombwa), southern
Namibe, between 1/3/2010 and
3/3/2010. The cited source was
Angop on 4/3/2010.</p>
      <p>In this case BioCaster was more
successful for English than for the
multilingual system because a false spike of
reports occluded subsequent true positives.
In the case of the silver standard report on
19/3/2010, the cited English source was
not detected but its Spanish translation
was found a few days later - still much
earlier than the ProMED-mail report.</p>
      <p>The example is a relatively
special case that illustrates an event that
was not widely re-reported. The
reports were made in English, Portuguese,
French and Spanish from Angop.
Externally, the 4/3/2010 article from
Angop was republished in allafrica.com and
africanseer.com on 4 March. It was
also referenced in a blog by the Namibia
online community.</p>
      <p>[Tables 1 and 2: results by alerting model. A: alarms per 100 days; B: mean number of days that alerts were given before ProMED-mail reports. Figures in parentheses show 95% CI. The mean number of ProMED-mail alerts per 100 days was 8.1. NPV column entries in order of appearance: 0.9 (0.93,0.97), 0.92 (0.93,0.97), 0.92 (0.92,0.96), 0.82 (0.94,0.97), 0.91 (0.92,0.96); 0.89 (0.94,0.97), 0.91 (0.94,0.98), 0.91 (0.94,0.97), 0.79 (0.95,0.98), 0.89 (0.92,0.96).]</p>
    </sec>
    <sec id="sec-6">
      <title>6 Conclusion</title>
      <p>Automated health surveillance using text
mining is not intended as a substitute
for skilled human analysis, but as these
results show it does have the potential
to reduce the information burden on analysts if
informed choices are used to govern the
selection of models.</p>
      <p>Obvious improvements to the
techniques described here could take place
by modeling lower geographic
granularity and reducing size differences
between geo-units. More sophisticated
approaches might incorporate proximity
information between events or model how
events propagate through news space.</p>
      <p>A more subtle effect of the
granularity restriction is that the models we
presented do not allow us to follow what
might be called ‘late warning’ signals,
i.e. follow-on events within the
country’s borders. For this reason detecting
events below the country level is
desirable. Future work will need to
concentrate on maximizing system sensitivity to
overcome the fragmentation of the event
distribution that occurs when we bucket
events into smaller geographic units.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgements</title>
      <p>I gratefully acknowledge the comments by
the anonymous reviewers. Funding
support was provided in part by the Japan
Science and Technology Agency under
the PRESTO programme.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Allen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , J. Carbonell, G. Doddington,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yamron</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          .
          <year>1998</year>
          .
          <article-title>Topic detection and tracking pilot study final report</article-title>
          .
          <source>In DARPA Broadcast News Transcription and Understanding Workshop</source>
          , Lansdowne, Virginia.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Burkom</surname>
            ,
            <given-names>H. S.</given-names>
          </string-name>
          <year>2005</year>
          .
          <article-title>Accessible alerting algorithms for biosurveillance</article-title>
          .
          <source>In National Syndromic Surveillance Conference.</source>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Collier</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kawazoe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Shigematsu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dien</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Barrero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Takeuchi</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Kawtrakul</surname>
          </string-name>
          .
          <year>2006</year>
          .
          <article-title>A multilingual ontology for infectious disease surveillance: rationale, design and challenges</article-title>
          .
          <source>Language Resources and Evaluation</source>
          ,
          <volume>40</volume>
          (
          <issue>3-4</issue>
          ).
          DOI: 10.1007/s10579-007-9019-7.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Collier</surname>
            , N.,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Doan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Kawazoe</surname>
            ,
            <given-names>R. Matsuda</given-names>
          </string-name>
          <string-name>
            <surname>Goodwin</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Conway</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          <string-name>
            <surname>Tateno</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          <string-name>
            <surname>Ngo</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Dien</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Kawtrakul</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <string-name>
            <surname>Takeuchi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Shigematsu</surname>
            , and
            <given-names>K.</given-names>
          </string-name>
          <string-name>
            <surname>Taniguchi</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>BioCaster: detecting public health rumors with a web-based text mining system</article-title>
          .
          <source>Bioinformatics</source>
          ,
          <volume>24</volume>
          (
          <issue>24</issue>
          ):
          <fpage>2940</fpage>
          -
          <lpage>1</lpage>
          , December. doi: 10.1093/bioinformatics/btn534.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Collier</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <year>2010</year>
          .
          <article-title>What's unusual in online disease outbreak news?</article-title>
          <source>Biomedical Semantics</source>
          ,
          <volume>1</volume>
          (
          <issue>1</issue>
          ), March. doi: 10.1186/2041-1480-1-2.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Earle</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <year>2010</year>
          .
          <article-title>Earthquake twitter</article-title>
          .
          <source>Nature Geoscience</source>
          ,
          <volume>3</volume>
          (
          <issue>4</issue>
          ):
          <fpage>221</fpage>
          -
          <lpage>222</lpage>
          . doi: 10.1038/ngeo832.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Hartley</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Nelson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Walters</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Arthur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Yangarber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Madoff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Linge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mawudeku</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Collier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Brownstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Thinus</surname>
          </string-name>
          , and
          <string-name>
            <given-names>N.</given-names>
            <surname>Lightfoot</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>The landscape of international biosurveillance</article-title>
          .
          <source>Emerging Health Threats J.</source>
          ,
          <volume>3</volume>
          (
          <issue>e3</issue>
          ), January. doi:10.3134/ehtj.10.003.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Hutwagner</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Thompson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. G.</given-names>
            <surname>Seeman</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Treadwell</surname>
          </string-name>
          .
          <year>2003</year>
          .
          <article-title>The bioterrorism preparedness and response Early Aberration Reporting System (EARS)</article-title>
          .
          <source>J. Urban Health</source>
          ,
          <volume>80</volume>
          (
          <issue>2</issue>
          ):
          <fpage>i89</fpage>
          -
          <lpage>i96</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>Gostin</surname>
            ,
            <given-names>L. O.</given-names>
          </string-name>
          .
          <year>2004</year>
          .
          <article-title>International infectious disease law - revision of the World Health Organization's International Health Regulations</article-title>
          .
          <source>J. American Medical Association</source>
          ,
          <volume>291</volume>
          (
          <issue>21</issue>
          ):
          <fpage>2623</fpage>
          -
          <lpage>2627</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>Madoff</surname>
            ,
            <given-names>L. C.</given-names>
          </string-name>
          and
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Woodall</surname>
          </string-name>
          .
          <year>2005</year>
          .
          <article-title>The internet and the global monitoring of emerging diseases: Lessons from the first 10 years of ProMED-mail</article-title>
          .
          <source>Archives of Medical Research</source>
          ,
          <volume>36</volume>
          (
          <issue>6</issue>
          ):
          <fpage>724</fpage>
          -
          <lpage>730</lpage>
          . Infectious Diseases: Revisiting Past Problems and Addressing Future Challenges.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <surname>Mawudeku</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Blench</surname>
          </string-name>
          .
          <year>2006</year>
          .
          <article-title>Global public health intelligence network (GPHIN)</article-title>
          .
          <source>In Proc. 7th Int. Conf. of the Association for Machine Translation in the Americas</source>
          , Cambridge, MA, USA, August 8-12.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <surname>McCrae</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Conway</surname>
          </string-name>
          , and
          <string-name>
            <given-names>N.</given-names>
            <surname>Collier</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Simple rule language editor</article-title>
          .
          <source>Google Code project</source>
          , September. Available from: http://code.google.com/p/srleditor/.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <surname>Olsen</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Carstensen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Hoyen</surname>
          </string-name>
          .
          <year>2002</year>
          .
          <article-title>Forgotten humanitarian crises, conference on the role of the media, decision-makers and humanitarian agencies, Copenhagen</article-title>
          . In
          <source>Humanitarian Crises: What determines the level of emergency assistance? Media coverage, donor interests, and the aid business</source>
          , 23 October.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <surname>Sakaki</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Okazaki</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Matsuo</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Earthquake shakes Twitter users: real-time event detection by social sensors</article-title>
          .
          <source>In Proc. of the 19th International World Wide Web Conference</source>
          , Raleigh, NC, USA.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <surname>Tokars</surname>
            ,
            <given-names>J. I.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Burkom</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Xing</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>English</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bloom</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Cox</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Pavlin</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Enhancing time-series detection algorithms for automated biosurveillance</article-title>
          .
          <source>Emerging Infectious Diseases</source>
          ,
          <volume>15</volume>
          (
          <issue>4</issue>
          ):
          <fpage>533</fpage>
          -
          <lpage>539</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <surname>Yangarber</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>von Etter</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Steinberger</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>Content collection and analysis in the domain of epidemiology</article-title>
          .
          <source>In Proc. Int. Workshop on Describing Medical Web Resources (DRMED 2008)</source>
          , Gothenburg, Sweden, May 27th.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>