Proceedings of the conference Terminology and Artificial Intelligence 2015 (Granada, Spain)



                  Constructing a Syndromic Terminology Resource
                            for Veterinary Text Mining
Lenz Furrer, Institute of Computational Linguistics, University of Zurich, lenz.furrer@uzh.ch
Susanne Küker, Veterinary Public Health Institute, University of Bern, susanne.kueker@vetsuisse.unibe.ch
John Berezowski, Department of Clinical Research and Veterinary Public Health, University of Bern, john.berezowski@vetsuisse.unibe.ch
Horst Posthaus, Institute of Animal Pathology, University of Bern, horst.posthaus@vetsuisse.unibe.ch
Flavie Vial, Veterinary Public Health Institute, University of Bern, flavie.vial@vetsuisse.unibe.ch
Fabio Rinaldi, Institute of Computational Linguistics, University of Zurich, fabio.rinaldi@uzh.ch

Abstract

Public health surveillance systems rely on the automated monitoring of large amounts of text. While building a text mining system for veterinary syndromic surveillance, we exploit automatic and semi-automatic methods for terminology construction at different stages. Our approaches include term extraction from free text, grouping of term variants based on string similarity, and linking to an existing medical ontology.

1 Introduction

In the project Veterinary Pathology Text Mining, we are developing tools to exploit veterinary post-mortem data for epidemiological surveillance and early detection of animal diseases. This paper describes the work in progress on the construction of a veterinary terminology resource as a basis for a text mining tool to classify, with minimal human intervention, free-text veterinary reports with respect to multiple clinical syndromes that can be monitored.

In human medicine, text mining has been successfully applied to clinical records in many public health surveillance systems (Botsis et al., 2011; Steinberger et al., 2008; Brownstein et al., 2008; Wagner et al., 2004). The approaches range from hand-written rule-based systems to fully automated methods using machine learning. For example, Chapman et al. (2004) use heuristic keyword-driven techniques as well as supervised machine learning techniques (a Naïve-Bayes classifier) for detecting mentions of fever in free-text clinical records. Similarly, the BioCaster system (Collier et al., 2006; Collier et al., 2008) relies on a carefully constructed medical ontology combined with a Naïve-Bayes classifier as an input filter. Friedlin et al. (2008) use a regular-expression-based term extraction system to find positive and negative mentions of methicillin-resistant Staphylococcus aureus in culture reports. Hartley et al. (2010) give an overview of surveillance systems that mainly focus on world-wide monitoring of web sources, including news feeds and informal medical networks.

The text mining of veterinary reports faces additional challenges such as multiple species and a less controlled vocabulary (Smith-Akin et al., 2007; Santamaria and Zimmerman, 2011). Up to this point, approaches for classifying veterinary diagnostic data into syndromes for surveillance have been restricted to the use of rule-based classifiers (Dórea et al., 2013; Anholt et al., 2014). To build these classifiers, a group of experts manually creates a large set of rules. The rules are then used to classify veterinary diagnostic submissions into syndromes based on the presence or absence of specific words within various fields in the diagnostic submission data.

We propose to develop a process for using text mining methodologies (natural language processing) to efficiently extract relevant health information from veterinary diagnostic submission data with minimal human intervention. Given a sufficient amount of data (i. e. at least a few hundred manually classified reports), a machine
learning approach will allow us to directly classify these data into syndromes that can be monitored for surveillance.

As recognized in the Swiss Animal Health Strategy 2010+, methods for early disease detection, based on the increasing abundance of data on animal health stored in national databases, can contribute to valuable and highly efficient surveillance activities. Post-mortem data, available from pathology services, are often under-exploited. The main purpose of post-mortem investigations of food production animals is to provide information about the cause of disease or death with regard to treatment and prevention options for the affected herd. Besides these major diagnoses, all additional pathological findings are also recorded as text and electronically archived as necropsy reports. In addition to the value of this information for veterinarians and farmers, systematic evaluation of necropsy data may be of value for the early detection of spatio-temporal clusters of syndromes which may result from a new disease emerging into a population or from changing patterns of endemic diseases. As such, it has the potential to be of value for both nation-wide and international (veterinary) public health early-warning systems.

The rest of this paper is organized as follows: We present our efforts in constructing and exploiting a veterinary terminology resource in Section 2. Section 3 describes our work towards report classification in the context of building a surveillance tool. The next steps and further application scenarios are given in Section 4.

2 Terminology Construction

In the process of report classification, we have put a lot of effort into the construction of a terminology resource that suited our needs. The resulting term inventory is tailored to a very specific task. Still, the methods, insights and even the resource itself can be of use for other applications. Similar to the work by Rinaldi et al. (2002), we extracted a set of terms from a collection of raw text and used automatic methods to organize them into a hierarchical structure. Section 2.1 introduces the categories we used for classification. In Sections 2.2 and 2.3, we describe the steps that led to the construction of the term inventory. Sections 2.4 and 2.5 show how this resource can be automatically enhanced for a more general usage.

2.1 Syndrome and Diagnosis Classification

The work described here is based on post-mortem reports that were compiled by the Institute of Animal Pathology (ITPA) of the Vetsuisse faculty at the University of Bern. The data were entered into a database by veterinary pathologists between 2000 and 2011. We used a subset of approximately 9 000 report entries regarding pigs and cattle. The reports are written in German, with a small fraction (less than 3 %) in English and French.

For subsequent quantitative analysis, we classified all reports using two categorization levels. As a coarse-grained categorization, we annotated each report with the syndromic groups that were affected by a medical issue. Each report was assigned zero, one or more of 9 syndrome categories (gastro-intestinal, respiratory, urinary, cardio-vascular, lymphatic, musculo-skeletal, reproductive, neural, other). This categorization approximately meets the level of granularity found in other work (Dórea et al., 2013; Warns-Petit et al., 2010). For a finer-grained categorization of the reports, we additionally annotated post-mortem diagnoses mentioned (directly or implicitly) in the reports, such as enteritis, lipidosis, or injuries from foreign bodies. The set of diagnoses was not defined a priori, but continuously updated in the classification process. The final set comprised some 100 classes and is shown in Table 1. The diagnoses are modeled as subcategories of the syndromes. While some category names occur in more than one syndromic category, this does not mean that they are ambiguous, as they are triggered by different terms. For example, atresia is classified as a congenital abnormality of the gastro-intestinal system, whereas the ventricular septal defect is a congenital abnormality of the cardio-vascular system.

2.2 Term Normalization

The medical reports have a high number of surface variants per term. The variation is caused by inflection, inconsistent spelling and typographical errors. On a higher level, variation is increased by synonymy, i. e. the use of different terms for the same concept (e. g. Lipidose/Verfettung ‘lipidosis’). From the perspective of the given text mining task, certain derivative forms can be considered synonymous variants as well (e. g. Ulzeration besides Ulkus).

We split the report texts into tokens, which we

gastro-intestinal: abomasal ulcer 178; abomasitis 120; acidosis 41; cheilitis 6; cholangitis 84; colitis 368; congenital abnormality 21; dilatation 159; displaced abomasum 55; duodenitis 12; enteritis 2458; esophagitis 46; gastric ulcer 206; gastritis 121; glossitis 16; hepatitis 200; HIS 361; Hoflund syndrome 10; icterus 39; ileitis 60; invagination 59; jejunitis 30; lipidosis 93; obstipation 13; omasitis 18; pancreatitis 2; parasites 13; perforation 35; pharyngitis 9; proctitis 13; reticulitis 98; rumenitic ulcer 4; rumenitis 92; sialoadenitis 2; steatorrhea 105; stenosis 10; stomatitis 57; trauma 25; typhlitis 37; volvulus 479

respiratory: bronchiolitis 256; bronchitis 466; bronchopneumonia 1040; laryngitis 23; pharyngitis 1; pleuritis 40; pneumonia 769; rhinitis 11; rhinitis atrophicans 196; sinusitis 8; tracheitis 28

urinary: congenital abnormality 1; cystitis 56; hydronephrosis 30; nephritis 416; renal degeneration 116; trauma 8; urolithiasis 75

cardio-vascular: cardiomyopathia 46; congenital abnormality 84; endocarditis 179; epicarditis 62; heart degeneration 50; hydropericard 319; myocarditis 77; pericarditis 427; pleuritis 41

lymphatic: lymphadenopathy 245; splenitis 77; tonsillitis 88

musculo-skeletal: arthritis 231; arthrosis 31; bone degeneration 17; callus 14; congenital abnormality 30; fracture 94; luxation 9; myodegeneration 70; myopathy 56; osteochondrosis 30; osteomyelitis 136; polyarthritis 343; synovitis 45; tendinitis 25; tendovaginitis 18

reproductive: abortion 642; congenital abnormality 8; dystocia 9; metritis 70; perforation 4; placentitis 126; retained placenta 3; uterine perforation 4; uterine torsion 3; vaginitis 8

neural: congenital abnormality 5; encephalitis 116; hydrocephalus 18; intoxication 5; meningitis 226; myelitis 13; neural degeneration 54; neuropathy 78

other: crushed 81; dermatitis 184; enterotoxemia 285; eye related 22; foreign body 118; hernia 73; hydrothorax 104; inanition 74; intoxication 95; iron deficiency 65; mastitis 66; neoplasia 68; otitis 20; perforation 257; peritonitis 866; pleuritis 643; pododermatitis 19; polyserositis 297; rumen drinker 33; sepsis 647; splenic torsion 18; umbilicus related 117


       Table 1: The diagnoses used for classification, grouped by syndrome, with number of occurrences.
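
The two-level scheme of Section 2.1 and Table 1 can be pictured as a lookup from trigger terms to (syndrome, diagnosis) pairs. The following is only a minimal sketch: the category names are taken from Table 1, but the dictionary entries, the name `TERM_CATEGORIES` and the function `classify` are illustrative assumptions, not the project's actual inventory or code.

```python
from collections import defaultdict

# Hypothetical trigger terms mapped to (syndrome, diagnosis) pairs.
# Category names follow Table 1; the entries themselves are illustrative only.
TERM_CATEGORIES = {
    "atresia": ("gastro-intestinal", "congenital abnormality"),
    "ventricular septal defect": ("cardio-vascular", "congenital abnormality"),
    "volvulus": ("gastro-intestinal", "volvulus"),
}


def classify(found_terms):
    """Return {syndrome: set of diagnoses} for the terms found in one report."""
    result = defaultdict(set)
    for term in found_terms:
        if term in TERM_CATEGORIES:
            syndrome, diagnosis = TERM_CATEGORIES[term]
            result[syndrome].add(diagnosis)
    return dict(result)
```

Because a diagnosis is always stored together with its syndrome, the name "congenital abnormality" can occur under several syndromes without becoming ambiguous, and a report without any trigger term simply receives zero categories.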


defined as consecutive runs of alphanumeric characters or hyphens. We then performed a series of normalization steps in order to reduce the number of term variants when compiling an index.

The bulk of the spelling variation stems from Latin/Greek-originated terms, such as Zäkum ‘cecum’. Besides the German spelling (using the letters ä, ö, z/k), the Latin spelling is often used (ae, oe, c, respectively), and even combinations of the two are encountered. For the previous example, the following variants are present, among others: Caecum, caecum, Cäcum, Cäkum, Zaecum. We normalized the usage of these letters by replacing ä with ae and ö with oe unconditionally, while treating c differently based on its right context: before a front vowel it was replaced by z, before h and k it was kept as c, and in all other cases (including word-final position) we replaced it with k. The complexity of this rule is due to the fact that this normalization is applied to all words, i. e. including originally German words like Kinn/Zinn ‘chin’/‘tin’, which would be confused by an unconditional conflation of c, k, z. As a side effect, the normalization of German terms occasionally captured closely spelled English terms (which were not systematically gathered), such as Enzephalitis/encephalitis.

Subsequently, we removed inflectional suffixes using the NLTK[1] implementation of the “Snowball” stemmer for German (Porter, 1980). Stemming is the process of removing inflectional and (partially) derivational affixes, thus truncating words to their stems. For example, minimally and minimize are both reduced to minim in Porter’s English stemmer, which is not a proper word, but nev-

[1] Natural Language Toolkit: www.nltk.org

variants                                              normalized form     explanation
Zäkumtorsion, Caecumtorsion                           zaekumtorsion       ä/ö/c/k/z normalization
Kokzidiose, Coccidiose                                kokzidios           ä/ö/c/k/z normalization
Aborts, Abort, Abortes, Aborte, Aborten               abort               stemming
perforierter Ulcus, perforierten Ulkus                perforiert ulkus    both
Kardiomyopathie, Cardiomyopathie, Kardiomyopathien    kardiomyopathi      both

                                          Table 2: Normalization examples.
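
The tokenization and spelling normalization of Section 2.2, illustrated in Table 2, might be sketched as follows. This is a hedged reconstruction, not the authors' code: lowercasing is inferred from the normalized forms in Table 2, the set of "front vowels" (e, i, y and the digraphs ae/oe) is an assumption, and all function names are made up for illustration. The stemming step uses NLTK's German Snowball stemmer if NLTK is installed and is skipped otherwise.

```python
import re

# Tokens: consecutive runs of alphanumeric characters or hyphens (Section 2.2).
TOKEN_PATTERN = re.compile(r"(?:[^\W_]|-)+")

# Assumption: "front vowel" covers e, i, y and the digraphs ae/oe (from ä/ö).
C_BEFORE_FRONT_VOWEL = re.compile(r"c(?=ae|oe|[eiy])")
# Any remaining c that is not followed by h or k (this includes word-final c).
C_ELSEWHERE = re.compile(r"c(?![hk])")

try:  # Optional dependency: German Snowball stemmer from NLTK.
    from nltk.stem.snowball import SnowballStemmer
    _stem = SnowballStemmer("german").stem
except ImportError:  # Without NLTK, the stemming step is a no-op.
    def _stem(word):
        return word


def tokenize(text):
    """Split a report text into tokens."""
    return TOKEN_PATTERN.findall(text)


def normalize_spelling(token):
    """Conflate ä/ae, ö/oe and c/k/z spellings of a single token."""
    word = token.lower().replace("ä", "ae").replace("ö", "oe")
    word = C_BEFORE_FRONT_VOWEL.sub("z", word)  # c before a front vowel -> z
    return C_ELSEWHERE.sub("k", word)           # ch and ck stay, other c -> k


def normalize(token):
    """Spelling normalization followed by (optional) stemming."""
    return _stem(normalize_spelling(token))
```

Under these assumptions, `normalize_spelling` maps Caecum, Cäcum, Cäkum and Zaecum to the same key, while leaving k and z untouched, so that German words such as Kinn and Zinn remain distinct.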


ertheless a useful key for lumping together etymologically related words.

Stemming is based on orthographical regularities and uses only a minimal amount of lexical information. Although the method is not flawless – it may be prone to errors with very short and irregularly inflected words – it generally works well for languages with alphabetic script and has been successfully applied to many European languages. Using a stemmer, we were able to considerably reduce the number of inflectional/derivational variants. However, a number of inflectional forms were still missed by the stemmer – especially plural forms with Latin inflection, such as Ulkus/Ulzera, or Enteritis/Enteritiden, which are not covered by the stemming rules for general German grammar. The stemmer also failed to capture most of the spelling errors. Table 2 illustrates the conflation with examples.

2.3 Focus Terms

For the syndromic classification of the veterinary reports, we manually created a list of focus terms which served as indicators for the clinical syndromes and diagnoses. Starting from a frequency-ranked list of the words found in all of the reports (already grouped by their normalized form), we manually selected terms that were likely to indicate (positive) diagnoses in the reports. The list was refined by inspecting the reports that produced hits for the focus terms.

The focus terms typically consist of a single token, but we also allowed multi-word expressions. The terms are grouped by diagnosis. Thus, each diagnosis refers to a set of terms which either constitute a common name of the diagnosis or describe some of its aspects. For example, concerning injuries caused by foreign bodies, we consider Draht ‘wire’ and Nagel ‘nail’ as focus terms, even though these words only refer to the cause, but not to the injuries themselves.

As each focus term is represented by its normalized form, a number of variant forms is already matched, as described above. We aimed to additionally cover variants produced by misspellings as well as inflected forms not recognized by the stemmer. Using approximate string matching, we searched the reports for similar terms for each of the focus terms. We used the simstring tool (Okazaki and Tsujii, 2010) for retrieving similarly spelled terms among the entire text collection. Approximate matching is a difficult task, as it is hard in general to formally define similarity among (the orthographical representations of) words in a way consistent with human judgement. simstring measures similarity as a function of the number of shared n-grams (runs of n characters) in two words, which is only a rough approximation of the task. However, compared to other similarity measures – e. g. Levenshtein’s edit distance[2] – it is considerably more efficient for retrieval in large amounts of text. In the inevitable trade-off of good precision and high recall, we strove for recall by choosing a low similarity threshold for retrieval. As expected, this resulted in a high number of hits, including many false positives, i. e. words with a high n-gram similarity score that are not actually similar to the input term (e. g. arthritis and arteritis). Due to the limited number of focus terms it was feasible to manually clean the list of similar words.

[2] For a study of agreement between human judgement and different similarity measures, see e. g. Efremova et al. (2014); for a general overview of similarity measures cf. Navarro (2001) and Christen (2006).

Figure 1 illustrates how term variants were gathered around the concept of a diagnosis. A number of synonymous and hyponymous terms were added to a specific diagnosis by a human expert. These terms were used as seeds to automatically find more variants, such as inflectional and spelling variants as well as misspellings. Please note that the labeled edges are only added for illustration purposes – the relations between term

[Figure 1: graph with the diagnosis volvulus at the centre. Node labels: Volvulus, Volvolus, Vovulus, Voluvlus, Darmtorsion, Darmtorsionen, Dünndarmtorsion, Dünndarmdrehungen, Dünndarmvolvulus, Caecumtorsion, Zäkumtorsion, Zäkumstorsion, Caecum-Torsion. Edge labels: canonical term, misspellings, synonym, synonyms, hyponym, inflection, spelling variants.]
                                Figure 1: Term variants for the diagnosis volvulus.
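
The recall-oriented retrieval of similarly spelled words described in Section 2.3 can be approximated in a few lines. The sketch below is a plain-Python stand-in and does not use the simstring library or its API; the padding characters, the count-based cosine over character trigrams, the threshold value and the function names are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def char_ngrams(word, n=3):
    """Count the character n-grams of a lowercased, padded word."""
    padded = "^" * (n - 1) + word.lower() + "$" * (n - 1)
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine_similarity(a, b, n=3):
    """Cosine similarity of the n-gram count vectors of two words."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    shared = sum(grams_a[g] * grams_b[g] for g in grams_a.keys() & grams_b.keys())
    norm_a = sqrt(sum(count * count for count in grams_a.values()))
    norm_b = sqrt(sum(count * count for count in grams_b.values()))
    return shared / (norm_a * norm_b) if norm_a and norm_b else 0.0


def find_similar(term, vocabulary, threshold=0.3, n=3):
    """Return (score, word) pairs at or above the threshold, best first."""
    hits = [(cosine_similarity(term, word, n), word) for word in vocabulary]
    return sorted((hit for hit in hits if hit[0] >= threshold), reverse=True)
```

With a low threshold, misspellings such as Volvolus and Vovulus are retrieved for Volvulus, but unrelated look-alikes such as arthritis/arteritis pass as well, which is why the resulting candidate lists still need manual cleaning.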


forms (such as synonym, misspelling) were not captured during this phase, as they were not needed for syndrome/diagnosis classification. However, we examined ways to partly recover this underlying structure in an automated way, as is described in the following sections.

2.4 Further Term Conflation

The UMLS Metathesaurus[3] is a large collection of various medical terminology resources. One of its key features is the assignment of unique concept identifiers to entries from different vocabularies in many languages, thus establishing equivalence relations across them. By creating links to Metathesaurus concepts, we can enrich our own terminology resource with information contained in the Metathesaurus, as well as make it more valuable when sharing it with others.

[3] www.nlm.nih.gov/pubs/factsheets/umlsmeta.html

We used the 2014AA release of the Metathesaurus for this work. For each concept that was represented in a German vocabulary, we normalized its lemma and tried to match it against an entry among our focus terms. With this approach, we were able to establish a link to one or more UMLS concepts for 80.6 % of the diagnoses.

Since our data were organized by diagnosis, each covering a number of terms with sometimes quite disparate meanings, the connection to the Metathesaurus produced a high number of one-to-many mappings (cf. Figure 2). This difference in granularity hinders the exploitation of the linked information, as the meaning of many diagnoses appears highly ambiguous in terms of the Metathesaurus. In order to better match the semantic range of the UMLS concepts, we went on to perform the mapping at the level of terms rather than diagnoses. This required us to add a hierarchical layer to our data structure: We needed to distinguish term variants (spelling and inflectional alternations, such as Caecumtorsion vs. Zäkumstorsion) from separate terms (e. g. Zäkumstorsion vs. Darmtorsion). Please note that synonyms such as Darmtorsion and Darmdrehung are considered separate terms, even though they have the same meaning.

Figure 2: Ontology matching before (a) and after (b) term conflation. In both graphics, the left-hand side represents a diagnosis as a set of terms, some of which are linked to a UMLS concept (connected bullets) on the right-hand side.

For each diagnosis, we organized all term forms into groups of term variants. The arrangement was performed automatically, based on string similarity. While string similarity is only an unreliable approximation of human similarity judgement, and while there are a number of competing ways of computing it, it is also difficult to determine a
threshold that clearly separates similar from dissimilar pairs of words. We therefore chose to perform supervised machine learning, i. e. automatic learning by example. We compiled a training set of positive instances of inflectional/spelling alternation as well as negative instances, i. e. pairs of unrelated words. For each pair, we computed two different string similarity measures (cf. Figure 3):

[Figure 3: plot of variant pairs and non-variant pairs. x-axis: character-trigram cosine similarity (0 to 1); y-axis: Levenshtein ratio (0 to 1).]

Figure 3: Two different similarity measures for pairs of similar and dissimilar words.

generation’ and Muskelfaserdegeneration ‘muscle fiber degeneration’, which might even be regarded equal in a less strict evaluation. As for the false negatives, the number of misses could be reduced by extending the stemmer with Latin-inflection endings like Ulkus – Ulzera.

2.5 Connecting to UMLS

Each group of term variants was then linked to a UMLS concept if there was a match between at least one member of the group (i. e. a term variant) and at least one of the German descriptions of the concept. Only exact agreement of the normalized forms was counted as a match, as preliminary experiments had shown that fuzzy matching introduced a large number of false positives (connections between similarly spelled, but otherwise unrelated words) while adding only very few desired links. However, we were able to improve the linkage with simple heuristics, such as the removal of boilerplate expressions like nicht näher bezeichnet ‘not otherwise specified’.

For 42.1 % of the terms, we could find a match with a UMLS concept. Only 6.7 % of the matching terms point to more than one concept, which means that 93.3 % of the terms with a match can be mapped to the Metathesaurus unambiguously.
cosine similarity of character trigram vectors, and
                                                                                                         However, for more than half of the terms no cor-
Levenshtein ratio. These two measures cover dif-
                                                                                                         responding UMLS concept could be found at all,
ferent aspects of similarity, and thus their combi-
                                                                                                         which is mainly due to the different domains of
nation might capture more information than just
                                                                                                         our veterinary texts and the predominantly human-
one of them. We trained a Support Vector Ma-
                                                                                                         medicine-based UMLS. Table 3 shows some ex-
chine on the two-dimensional space of the similar-
                                                                                                         amples of the mapping.
ity measures, using a polynomial kernel function.
   The automatic term grouping yielded very satis-                                                          The connections to the Metathesaurus allowed
factory results. We manually evaluated the result-                                                       us to further enrich our data. For example, ev-
ing groups, requiring that all members be ortho-                                                         ery UMLS concept has a semantic type assigned
graphical or inflectional variations of each other.                                                      to it, such as “Disease or Syndrome” or “Patho-
We also allowed derivational variants (e. g. Weiss-                                                      logic Function”. Additionally, we used the concept
muskelkrankheit/…erkrankung ‘white muscle dis-                                                           descriptions in Metathesaurus to find more focus
ease’) to be in the same group, although the                                                             terms. By matching the descriptions of connected
separation of derivatives (e. g. Ulkus/Ulzeration)                                                       concepts against our text collection, we were able
was not counted as false negative. We found                                                              to enlarge the set of focus terms by almost 10 %.
that less than 6.7 % of the groups contained un-                                                            As next steps, we plan to create links to other
equal terms (false positives), and only 1.9 % of                                                         widely-used terminology resources, such as the
the groups were erroneously isolated instead of                                                          Central key for health data recording by the
being merged with the correct equivalents (false                                                         International Committee for Animal Recording
negatives). Many false positive judgements were                                                          (ICAR).4
caused by terms with only small differences in
                                                                                                           4
meaning, such as Muskeldegeneration ‘muscle de-                                                                See www.icar.org
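
The two similarity measures used as features for the term grouping (character-trigram cosine similarity and Levenshtein ratio) can be illustrated with a minimal Python sketch. The text does not specify the details, so the boundary padding, the lower-casing and the normalisation of the edit distance by the longer word are assumptions, and all function names are hypothetical.

```python
import math
from collections import Counter


def char_trigrams(word):
    """Count character trigrams; '#' padding marks word boundaries (assumption)."""
    padded = f"##{word.lower()}##"
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def trigram_cosine(a, b):
    """Cosine similarity of the two trigram count vectors."""
    va, vb = char_trigrams(a), char_trigrams(b)
    dot = sum(count * vb[gram] for gram, count in va.items())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def levenshtein(a, b):
    """Edit distance with unit costs for insertion, deletion and substitution."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(previous[j] + 1,                       # deletion
                               current[j - 1] + 1,                    # insertion
                               previous[j - 1] + (char_a != char_b)))  # substitution
        previous = current
    return previous[-1]


def levenshtein_ratio(a, b):
    """Similarity in [0, 1]: 1 - distance / length of the longer word (assumption)."""
    longest = max(len(a), len(b))
    return 1.0 - levenshtein(a, b) / longest if longest else 1.0


def features(a, b):
    """Two-dimensional feature vector for a pair of words."""
    return (trigram_cosine(a, b), levenshtein_ratio(a, b))


print(features("Rectumstenose", "Rektumstenose"))
print(features("Ulkus", "Ulzera"))
```

Feature pairs of this kind could then be passed to any SVM implementation with a polynomial kernel, for instance scikit-learn's `SVC(kernel="poly")`; which implementation was actually used is not stated in the text.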
    diagnosis/terms                                           UMLS
    stenosis
    Darmstenose                                               C0267465 Darmstenose/Darmstriktur/Stenose des…
    Dünndarmstenose                                           C0151924 Dünndarmstenose/Stenose des Dünndarms
    Rectumstenose, Rektumstenose                              –
    myodegeneration
    Belastungsmyopathie                                       –
    Muskelfaserdegeneration, Muskelfaserndegeneration         C0234958 Muskeldegeneration/Degeneration des …
    Muskelfasernekrose                                        –
    Muskelläsionen                                            –
    Muskelnekrose, Muskelnekrosen                             C0235957 Muskelnekrose/Myonekrose
    Myodegeneration, myodegeneration                          –
    Myonekrose, myonecrosis                                   C0235957 Muskelnekrose/Myonekrose
    Rhabdomyolyse                                             C0035410 Rhabdomyolyse
    Weiss-Muskel-Krankheit, Weiss-Muskelkrankheit,            C0043153 Muskeldystrophie, nutritive/
       Weissmuskelerkrankung, Weissmuskelkrankeit,              Weißmuskelkrankheit
       Weissmuskelkrankheit

                                  Table 3: Mapping to the UMLS Metathesaurus.
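
The linking step of Section 2.5 (exact matching of normalized forms after removing boilerplate) might look roughly like the following sketch. The normalization shown here (lower-casing and stripping punctuation) is only an assumed stand-in for the normalization actually used, the boilerplate list contains just the one expression named in the text, and the function names are hypothetical. The concept IDs and descriptions are taken from Table 3.

```python
import re

# Only this expression is named in the text; a real list would be longer.
BOILERPLATE = ("nicht näher bezeichnet",)


def normalize(text):
    """Lower-case, drop boilerplate, collapse punctuation and whitespace (assumed)."""
    text = text.lower()
    for phrase in BOILERPLATE:
        text = text.replace(phrase, " ")
    return re.sub(r"[^\w]+", " ", text).strip()


def build_index(descriptions):
    """Map each normalized German description to the set of concept IDs using it."""
    index = {}
    for cui, description in descriptions:
        index.setdefault(normalize(description), set()).add(cui)
    return index


def link_group(variants, index):
    """Concepts matched exactly by at least one term variant of the group."""
    matches = set()
    for variant in variants:
        matches |= index.get(normalize(variant), set())
    return matches


# German concept descriptions as listed in Table 3.
index = build_index([
    ("C0235957", "Muskelnekrose"),
    ("C0235957", "Myonekrose"),
    ("C0035410", "Rhabdomyolyse"),
])

print(link_group(["Myonekrose", "myonecrosis"], index))        # {'C0235957'}
print(link_group(["Myodegeneration", "myodegeneration"], index))  # set()
```

A group with more than one returned concept would correspond to the ambiguous cases mentioned in the text, and an empty set to terms without a UMLS match.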


3     Annotation Tool

The terminology resource described above is a key component in our efforts to create a
veterinary surveillance system. We wrote a pipeline of Python scripts that assists our
semi-automatic annotation of the pathology reports. The tool performs automatic
annotation of syndromes and diagnoses based on the term resource, while also keeping
track of manual verifications and rejections. Through a web interface, it accepts a
Microsoft Excel workbook as input and produces a modified version in the same format,
which allows a veterinary domain expert to inspect and modify the automatic
annotations. All relevant information – such as the term resource and the assigned
categories, negations (see below), and the previous manual annotations – is contained
within this file.

3.1 Negation Detection

In a keyword-based system for detecting evidence, negative expressions can play a
crucial role. Occasionally, negative outcomes of an analysis are reported in the texts,
and suspected diagnoses are rejected quite frequently, such as keine Hinweise auf eine
Pneumonie ‘no evidence of a pneumonia’. Therefore, we aimed at identifying occurrences
of focus terms that are mentioned in a negated context.
   Besides the identification of negated expressions, negation detection heavily depends
on the correct determination of their scope. Tanushi et al. (2013) compare different
approaches to negation scope detection in Swedish clinical reports. According to them,
“[e]mploying a simple, rule-based approach with a small amount of negation triggers and
a fixed context window for determining scope is very efficient and useful, if results
around 80 % F-score are sufficient for a given purpose” (Tanushi et al., 2013, p. 393).
We included a simple negation-detection module in our pipeline, which looks for a set
of negative expressions in a context window of 5 tokens to either side of the focus
term. The context can be restricted for each expression (e. g. only to the right of or
only immediately preceding a focus term). The context window is shortened at sentence
boundaries and other indicators of a break. However, as the results of the negation
detection are not yet satisfactory, we plan to integrate an existing library for this
task, e. g. the Python package pyConTextNLP (Chapman et al., 2011).

3.2 Inter-Annotator Agreement

In order to validate the quality of our annotations, we organized a multi-annotator
evaluation. We performed an experiment with six experts of veterinary pathology, who
were asked to classify a number of reports with respect to the syndromic categories
described in Section 2.1. For this purpose, we created a web interface which displayed
the report text together with some metadata, one report at a time, and allowed the
annotator to mark each of the syndromes as present or absent. The reports were randomly
sampled, keeping the distribution
of species and year of creation as close to the entire collection as possible
(approaching stratified sampling). Each annotator was provided with a sample of 20
reports, which was extended to twice or three times the size when an annotator asked
for more. In order to increase sample size, the same report was given to only two or
three annotators, rather than all of them. In total, 81 distinct reports were annotated.
   We evaluated the inter-annotator agreement with Krippendorff’s Alpha (Krippendorff,
2013, pp. 267–309), as is shown in Table 4. For computing the agreement, we regarded
each syndrome as an independent, binary variable (each syndrome is either present or
absent in a report). The agreement value α ranges from 1 (perfect agreement) to 0
(agreement as by chance) or even below (systematic disagreement). A high agreement
means that identifying syndromes is a clear task, while a low agreement indicates that
the decisions cannot be easily made. Most of the syndromes have a good (>0.8) or
acceptable (>0.6) α score,5 whereas some are clearly identified as problematic. For the
lymphatic system, the sparse representation (only 3 reports) does not allow for valid
conclusions; further investigation is required in this case. The “catch-all” class
other, however, most likely suffers from having an unclear scope. As a consequence of
this evaluation, we decided to reduce the ambiguity of other by including additional
classes in the next revision of the syndromic categorization.

 syndrome         reports      Do        De        α
 gastro-int.      52 (13)     0.059     0.251    0.764
 respiratory      28 (10)     0.045     0.207    0.781
 urinary           9  (3)     0.014     0.083    0.836
 cardio-vasc.     15  (9)     0.041     0.115    0.644
 lymphatic         3  (3)     0.014     0.018    0.240
 musc.-skel.      13  (3)     0.014     0.125    0.891
 reproductive      9  (1)     0.005     0.094    0.952
 neural            5  (2)     0.009     0.052    0.825
 other            38 (21)     0.095     0.226    0.577
 avg.                                            0.723

Table 4: Inter-annotator agreement of the syndromic categories, measured with
Krippendorff’s Alpha. The second column gives the number of reports where at least one
annotator marked the corresponding syndrome as present; following in parentheses is the
number of reports with disagreement. Do and De are the observed and expected
disagreement, respectively (α = 1 − Do/De).

   5 For a discussion of the interpretation of absolute agreement scores see Artstein
and Poesio (2008, p. 591).

4 Outlook

We will assess the performance of the text-mining tool based on a small number of
diseases which have been relevant in Switzerland in the last 10 years:

 1. Bovine Viral Diarrhoea in cattle (an eradication campaign for the disease was
    introduced in 2008)

 2. Porcine Circovirus type 2 infection in pigs

 3. Gastro-intestinal syndromes in pigs (for which we observe an increasing number of
    pathology submissions)

Time-series analyses will be performed to quantify trends, seasonality and other
effects (day of week, day of month etc.) on the number of submissions for syndromes
potentially related to these diseases. For each disease, “in-control” data (data
collected in the absence of an outbreak) will be used to establish a baseline model
describing the amount of normal “noise” in the data (expected number of submissions in
the absence of disease outbreaks). Retrospective analyses of the time-series will be
done to see whether alerts (signals) were produced when the number of submissions for
syndromes potentially linked to the disease was higher than expected from our baseline
model (event detection). This will allow us to evaluate whether the system would have
worked as an early-warning system.
   The tools developed in this project will be adapted to reports from different
pathology institutes throughout Switzerland, thus contributing to a nation-wide
syndromic surveillance system. Similarly, the methodology developed may be applicable
to the analysis of text-based disease information which is recorded in other contexts.
For example, there is great potential in using such a system to systematically analyse
health data recorded by veterinary practitioners in their practice management software,
slaughter data, or data recorded by animal health services in their central database.

Acknowledgements

This work was funded by the Swiss Federal Food Safety and Veterinary Office (Bundesamt
für Lebensmittelsicherheit und Veterinärwesen).
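
As an illustration of the window-based negation detection described in Section 3.1, the following minimal Python sketch checks for negation triggers within 5 tokens to either side of a focus term and cuts the window at break indicators. The trigger list, the side restrictions and the set of break tokens are hypothetical placeholders, not the ones used in the actual pipeline, and only single-token focus terms are handled.

```python
import re

WINDOW = 5  # tokens to either side of the focus term, as stated in the text

# Hypothetical triggers: each maps to the side(s) of the focus term it may occur on.
TRIGGERS = {"kein": "left", "keine": "left", "ohne": "left", "nicht": "both"}

# Assumed indicators of a break (sentence boundaries and similar).
BREAKS = {".", ";", ":", "!", "?"}


def tokenize(text):
    """Split lower-cased text into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def context(tokens, start, step):
    """Collect up to WINDOW tokens from `start` in direction `step`, stopping at a break."""
    window = []
    i = start
    while 0 <= i < len(tokens) and len(window) < WINDOW:
        if tokens[i] in BREAKS:
            break
        window.append(tokens[i])
        i += step
    return window


def is_negated(text, focus_term):
    """True if a trigger occurs on a permitted side within the context window."""
    tokens = tokenize(text)
    focus = focus_term.lower()
    if focus not in tokens:
        return False
    position = tokens.index(focus)
    left = context(tokens, position - 1, -1)
    right = context(tokens, position + 1, +1)
    for trigger, side in TRIGGERS.items():
        if side in ("left", "both") and trigger in left:
            return True
        if side in ("right", "both") and trigger in right:
            return True
    return False


print(is_negated("Keine Hinweise auf eine Pneumonie.", "Pneumonie"))          # True
print(is_negated("Pneumonie. Keine Hinweise auf Enteritis.", "Pneumonie"))   # False
```

The second call shows the effect of shortening the window at a sentence boundary: the trigger in the following sentence is not attributed to the focus term. A finer restriction such as "only immediately preceding" would need an additional per-trigger distance limit, which this sketch omits.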
References

R. Michele Anholt, John Berezowski, Iqbal Jamal, Carl Ribble, and Craig Stephen. 2014.
  Mining free-text medical records for companion animal enteric syndrome surveillance.
  Preventive Veterinary Medicine, 113(4):417–422.
Ron Artstein and Massimo Poesio. 2008. Inter-coder agreement for Computational
  Linguistics. Computational Linguistics, 34(4):555–596.
Taxiarchis Botsis, Michael D. Nguyen, Emily Jane Woo, Marianthi Markatou, and Robert
  Ball. 2011. Text mining for the Vaccine Adverse Event Reporting System: medical text
  classification using informative feature selection. Journal of the American Medical
  Informatics Association, 18(5):631–638.
John S. Brownstein, Clark C. Freifeld, Ben Y. Reis, and Kenneth D. Mandl. 2008.
  Surveillance Sans Frontieres: Internet-based emerging infectious disease intelligence
  and the HealthMap project. PLoS Med, 5(7):e151.
Wendy W. Chapman, John N. Dowling, and Michael M. Wagner. 2004. Fever detection from
  free-text clinical records for biosurveillance. Journal of Biomedical Informatics,
  37(2):120–127.
Brian E. Chapman, Sean Lee, Hyunseok Peter Kang, and Wendy W. Chapman. 2011.
  Document-level classification of CT pulmonary angiography reports based on an
  extension of the ConText algorithm. Journal of Biomedical Informatics, 44(5):728–737.
Peter Christen. 2006. A comparison of personal name matching: Techniques and practical
  issues. Technical Report TR-CS-06-02, The Australian National University, Dec.
Nigel Collier, Ai Kawazoe, Lihua Jin, Mika Shigematsu, Dinh Dien, Roberto A. Barrero,
  Koichi Takeuchi, and Asanee Kawtrakul. 2006. A multilingual ontology for infectious
  disease surveillance: rationale, design and challenges. Language Resources and
  Evaluation, 40(3-4):405–413.
Nigel Collier, Son Doan, Ai Kawazoe, Reiko Matsuda Goodwin, Mike Conway, Yoshio Tateno,
  Quoc-Hung Ngo, Dinh Dien, Asanee Kawtrakul, Koichi Takeuchi, et al. 2008. BioCaster:
  detecting public health rumors with a Web-based text mining system. Bioinformatics,
  24(24):2940–2941.
Fernanda C. Dórea, C. Anne Muckle, David Kelton, J. T. McClure, Beverly J. McEwen,
  W. Bruce McNab, Javier Sanchez, and Crawford W. Revie. 2013. Exploratory analysis of
  methods for automated classification of laboratory test orders into syndromic groups
  in veterinary medicine. PLOS one, 8(3):e57334.
Julia Efremova, Bijan Ranjbar-Sahraei, and Toon Calders. 2014. A hybrid disambiguation
  measure for inaccurate cultural heritage data. In Proceedings of the 8th Workshop on
  Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH),
  pages 47–55, Gothenburg, Sweden, April. Association for Computational Linguistics.
Jeff Friedlin, Shaun Grannis, and J. Marc Overhage. 2008. Using natural language
  processing to improve accuracy of automated notifiable disease reporting. AMIA Annual
  Symposium Proceedings, 2008:207–211.
David Hartley, Noele Nelson, Ronald Walters, Ray Arthur, Roman Yangarber, Larry Madoff,
  Jens Linge, Abla Mawudeku, Nigel Collier, John Brownstein, Germain Thinus, and Nigel
  Lightfoot. 2010. Landscape of international event-based biosurveillance. Emerging
  Health Threats Journal, 3(e3).
Klaus Krippendorff. 2013. Content Analysis: An Introduction to Its Methodology. Sage
  Publications, Thousand Oaks, CA, 3rd edition.
Gonzalo Navarro. 2001. A guided tour to approximate string matching. ACM Computing
  Surveys, 33(1):31–88, March.
Naoaki Okazaki and Jun’ichi Tsujii. 2010. Simple and efficient algorithm for
  approximate dictionary matching. In Proceedings of the 23rd International Conference
  on Computational Linguistics, COLING ’10, pages 851–859, Beijing, China, August.
Martin F. Porter. 1980. An algorithm for suffix stripping. Program, 14(3):130–137.
Fabio Rinaldi, James Dowdall, Michael Hess, Kaarel Kaljurand, Mare Koitand, Kadri
  Vider, and Neeme Kahusk. 2002. Terminology as knowledge in answer extraction. In
  TKE-2002: 6th International Conference on Terminology and Knowledge Engineering,
  Nancy, France, August.
Suzanne L. Santamaria and Kurt L. Zimmerman. 2011. Uses of informatics to solve real
  world problems in veterinary medicine. Journal of veterinary medical education,
  38(2):103–109.
Kimberly A. Smith-Akin, Charles F. Bearden, Stephen T. Pittenger, and Elmer V.
  Bernstam. 2007. Toward a veterinary informatics research agenda: An analysis of the
  PubMed-indexed literature. International Journal of Medical Informatics,
  76(4):306–312.
Ralf Steinberger, Flavio Fuart, Erik van der Goot, Clive Best, Peter von Etter, and
  Roman Yangarber. 2008. Text mining from the web for medical intelligence. In
  Françoise Fogelman-Soulié, Domenico Perrotta, Jakub Piskorski, and Ralf Steinberger,
  editors, Mining Massive Data Sets for Security, volume 19 of NATO Science for Peace
  and Security Series – D: Information and Communication Security, pages 295–310. IOS
  Press.
Hideyuki Tanushi, Hercules Dalianis, Martin Duneld, Maria Kvist, Maria Skeppstedt, and
  Sumithra Velupillai. 2013. Negation scope delimitation in clinical text using three
  approaches: NegEx, PyConTextNLP and SynNeg. In Proceedings of the
  19th Nordic Conference of Computational Linguistics (NODALIDA 2013), pages 387–397,
  Oslo, Norway.
Michael M. Wagner, J. Espino, F-C. Tsui, P. Geste-
  land, W. Chapman, O. Ivanov, A. Moore, W. Wong,
  J. Dowling, and J. Hutman. 2004. Syndrome
  and outbreak detection using chief-complaint data
  – experience of the Real-Time Outbreak and Dis-
  ease Surveillance project. Morbidity and Mortality
  Weekly Report, 53:28–31.
Eva Warns-Petit, Eric Morignat, Marc Artois, and Di-
  dier Calavas. 2010. Unsupervised clustering of
  wildlife necropsy data for syndromic surveillance.
  BMC Veterinary Research, 6(56):1–11.