=Paper=
{{Paper
|id=Vol-1495/paper_8
|storemode=property
|title=Constructing a Syndromic Terminology Resource for Veterinary Text Mining
|pdfUrl=https://ceur-ws.org/Vol-1495/paper_8.pdf
|volume=Vol-1495
|dblpUrl=https://dblp.org/rec/conf/tia/FurrerKBPVR15
}}
==Constructing a Syndromic Terminology Resource for Veterinary Text Mining==
Proceedings of the conference Terminology and Artificial Intelligence 2015 (Granada, Spain)
Lenz Furrer, Institute of Computational Linguistics, University of Zurich, lenz.furrer@uzh.ch
Susanne Küker, Veterinary Public Health Institute, University of Bern, susanne.kueker@vetsuisse.unibe.ch
John Berezowski, Department of Clinical Research and Veterinary Public Health, University of Bern, john.berezowski@vetsuisse.unibe.ch
Horst Posthaus, Institute of Animal Pathology, University of Bern, horst.posthaus@vetsuisse.unibe.ch
Flavie Vial, Veterinary Public Health Institute, University of Bern, flavie.vial@vetsuisse.unibe.ch
Fabio Rinaldi, Institute of Computational Linguistics, University of Zurich, fabio.rinaldi@uzh.ch
Abstract

Public health surveillance systems rely on the automated monitoring of large amounts of text. While building a text mining system for veterinary syndromic surveillance, we exploit automatic and semi-automatic methods for terminology construction at different stages. Our approaches include term extraction from free-text, grouping of term variants based on string similarity, and linking to an existing medical ontology.

1 Introduction

In the project Veterinary Pathology Text Mining, we are developing tools to exploit veterinary post-mortem data for epidemiological surveillance and early detection of animal diseases. This paper describes the work in progress on the construction of a veterinary terminology resource as a basis for a text mining tool to classify, with minimal human intervention, free-text veterinary reports with respect to multiple clinical syndromes that can be monitored.

In human medicine, text mining has been successfully applied to clinical records in many public health surveillance systems (Botsis et al., 2011; Steinberger et al., 2008; Brownstein et al., 2008; Wagner et al., 2004). The approaches range from hand-written rule-based systems to fully automated methods using machine learning. For example, Chapman et al. (2004) use heuristical keyword-driven as well as supervised machine learning techniques (Naïve-Bayes classifier) for detecting mentions of fever in free-text clinical records. Similarly, the BioCaster system (Collier et al., 2006; Collier et al., 2008) relies on a carefully constructed medical ontology combined with a Naïve-Bayes classifier as an input filter. Friedlin et al. (2008) use a regular-expression based term extraction system to find positive and negative mentions of methicillin-resistant Staphylococcus aureus in culture reports. Hartley et al. (2010) give an overview of surveillance systems that mainly focus on world-wide monitoring of web sources, including news feeds and informal medical networks.

The text mining of veterinary reports faces additional challenges such as multiple species and a less controlled vocabulary (Smith-Akin et al., 2007; Santamaria and Zimmerman, 2011). Up to this point, approaches for classifying veterinary diagnostic data into syndromes for surveillance have been restricted to the use of rule-based classifiers (Dórea et al., 2013; Anholt et al., 2014). To build these classifiers, a group of experts manually creates a large set of rules. The rules are then used to classify veterinary diagnostic submissions into syndromes based on the presence or absence of specific words within various fields in the diagnostic submission data.

We propose to develop a process for using text mining methodologies (natural language processing) to efficiently extract relevant health information from veterinary diagnostic submission data with minimal human intervention. Given a sufficient amount of data (i. e. at least a few hundreds of manually classified reports), a machine
learning approach will allow us to directly classify these data into syndromes that can be monitored for surveillance.

As recognized in the Swiss Animal Health Strategy 2010+, methods for early disease detection, based on the increasing abundance of data on animal health stored in national databases, can contribute to valuable and highly efficient surveillance activities. Post-mortem data, available from pathology services, are often under-exploited. The main purpose of post-mortem investigations of food production animals is to provide information about the cause of disease or death with regard to treatment and prevention options for the affected herd. Besides these major diagnoses, all additional pathological findings are also recorded as text and electronically archived as necropsy reports. In addition to the value of this information for veterinarians and farmers, systematic evaluation of necropsy data may be of value for the early detection of spatio-temporal clusters of syndromes which may result from a new disease emerging into a population or from changing patterns of endemic diseases. As such, it has the potential to be of value for both nation-wide and international (veterinary) public health early-warning systems.

The rest of this paper is organized as follows: We present our efforts in constructing and exploiting a veterinary terminology resource in Section 2. Section 3 describes our work towards report classification in the context of building a surveillance tool. The next steps and further application scenarios are given in Section 4.

2 Terminology Construction

In the process of report classification, we have put a lot of effort into the construction of a terminology resource that suits our needs. The resulting term inventory is tailored to a very specific task. Still, the methods, insights and even the resource itself can be of use for other applications. Similar to the work by Rinaldi et al. (2002), we extracted a set of terms from a collection of raw text and used automatic methods to organize them into a hierarchical structure. Section 2.1 introduces the categories we used for classification. In Sections 2.2 and 2.3, we describe the steps that led to the construction of the term inventory. Sections 2.4 and 2.5 show how this resource can be automatically enhanced for a more general usage.

2.1 Syndrome and Diagnosis Classification

The work described here is based on post-mortem reports that were compiled by the Institute of Animal Pathology (ITPA) of the Vetsuisse faculty at the University of Bern. The data were entered into a database by veterinary pathologists between 2000 and 2011. We used a subset of approximately 9 000 report entries regarding pigs and cattle. The reports are written in German, with a small fraction (less than 3 %) in English and French.

For subsequent quantitative analysis, we classified all reports using two categorization levels. As a coarse-grained categorization, we annotated each report with the syndromic groups that were affected by a medical issue. Each report was assigned zero, one or more of 9 syndrome categories (gastro-intestinal, respiratory, urinary, cardio-vascular, lymphatic, musculo-skeletal, reproductive, neural, other). This categorization approximately meets the level of granularity found in other work (Dórea et al., 2013; Warns-Petit et al., 2010). For a finer-grained categorization of the reports, we additionally annotated post-mortem diagnoses mentioned (directly or implicitly) in the reports, such as enteritis, lipidosis, or injuries from foreign bodies. The set of diagnoses was not defined a priori, but continuously updated in the classification process. The final set comprised some 100 classes and is shown in Table 1. The diagnoses are modeled as subcategories of the syndromes. While some category names occur in more than one syndromic category, this does not mean that they are ambiguous, as they are triggered by different terms. For example, atresia is classified as a congenital abnormality of the gastro-intestinal system, whereas the ventricular septal defect is a congenital abnormality of the cardio-vascular system.

2.2 Term Normalization

The medical reports have a high number of surface variants per term. The variation is caused by inflection, inconsistent spelling and typographical errors. On a higher level, variation is increased by synonymy, i. e. the use of different terms for the same concept (e. g. Lipidose/Verfettung ‘lipidosis’). From the perspective of the given text mining task, certain derivative forms can be considered synonymous variants as well (e. g. Ulzeration besides Ulkus).

We split the report texts into tokens, which we
gastro-intestinal: abomasal ulcer 178, abomasitis 120, acidosis 41, cheilitis 6, cholangitis 84, colitis 368, congenital abnormality 21, dilatation 159, displaced abomasum 55, duodenitis 12, enteritis 2458, esophagitis 46, gastric ulcer 206, gastritis 121, glossitis 16, hepatitis 200, HIS 361, Hoflund syndrome 10, icterus 39, ileitis 60, invagination 59, jejunitis 30, lipidosis 93, obstipation 13, omasitis 18, pancreatitis 2, parasites 13, perforation 35, pharyngitis 9, proctitis 13, reticulitis 98, rumenitic ulcer 4, rumenitis 92, sialoadenitis 2, steatorrhea 105, stenosis 10, stomatitis 57, trauma 25, typhlitis 37, volvulus 479
respiratory: bronchiolitis 256, bronchitis 466, bronchopneumonia 1040, laryngitis 23, pharyngitis 1, pleuritis 40, pneumonia 769, rhinitis 11, rhinitis atrophicans 196, sinusitis 8, tracheitis 28
urinary: congenital abnormality 1, cystitis 56, hydronephrosis 30, nephritis 416, renal degeneration 116, trauma 8, urolithiasis 75
cardio-vascular: cardiomyopathia 46, congenital abnormality 84, endocarditis 179, epicarditis 62, heart degeneration 50, hydropericard 319, myocarditis 77, pericarditis 427, pleuritis 41
lymphatic: lymphadenopathy 245, splenitis 77, tonsillitis 88
musculo-skeletal: arthritis 231, arthrosis 31, bone degeneration 17, callus 14, congenital abnormality 30, fracture 94, luxation 9, myodegeneration 70, myopathy 56, osteochondrosis 30, osteomyelitis 136, polyarthritis 343, synovitis 45, tendinitis 25, tendovaginitis 18
reproductive: abortion 642, congenital abnormality 8, dystocia 9, metritis 70, perforation 4, placentitis 126, retained placenta 3, uterine perforation 4, uterine torsion 3, vaginitis 8
neural: congenital abnormality 5, encephalitis 116, hydrocephalus 18, intoxication 5, meningitis 226, myelitis 13, neural degeneration 54, neuropathy 78
other: crushed 81, dermatitis 184, enterotoxemia 285, eye related 22, foreign body 118, hernia 73, hydrothorax 104, inanition 74, intoxication 95, iron deficiency 65, mastitis 66, neoplasia 68, otitis 20, perforation 257, peritonitis 866, pleuritis 643, pododermatitis 19, polyserositis 297, rumen drinker 33, sepsis 647, splenic torsion 18, umbilicus related 117
Table 1: The diagnoses used for classification, grouped by syndrome, with number of occurrences.
defined as consecutive runs of alphanumeric characters or hyphens. We then performed a series of normalization steps in order to reduce the number of term variants when compiling an index.

The bulk of the spelling variation stems from Latin/Greek-originated terms, such as Zäkum ‘cecum’. Besides the German spelling (using the letters ä, ö, z/k), the Latin spelling is often used (ae, oe, c, respectively), and even combinations of the two are encountered. For the previous example, the following variants are present, among others: Caecum, caecum, Cäcum, Cäkum, Zaecum. We normalized the usage of these letters by replacing ä with ae and ö with oe unconditionally, while treating c differently based on its right context: before a front vowel it was replaced by z, before h and k it was kept as c, and in all other cases (including word-final position) we replaced it with k. The complexity of this rule is due to the fact that this normalization is applied to all words, i. e. including originally German words like Kinn/Zinn ‘chin’/‘tin’, which would be confused by an unconditional conflation of c, k, z. As a side effect, the normalization of German terms occasionally captured closely spelled English terms (which were not systematically gathered), such as Enzephalitis/encephalitis.

Subsequently, we removed inflectional suffixes using the NLTK [1] implementation of the “Snowball” stemmer for German (Porter, 1980). Stemming is the process of removing inflectional and (partially) derivational affixes, thus truncating words to their stems. For example, minimally and minimize are both reduced to minim in Porter’s English stemmer, which is not a proper word, but nevertheless a useful key for lumping together etymologically related words.

[1] Natural Language Toolkit: www.nltk.org

Stemming is based on orthographical regularities and uses only a minimal amount of lexical information. Although the method is not flawless – it may be prone to errors with very short and irregularly inflected words – it generally works well for languages with alphabetic script and has been successfully applied to many European languages. Using a stemmer, we were able to considerably reduce the number of inflectional/derivational variants. However, a number of inflectional forms were still missed by the stemmer – especially plural forms with Latin inflection, such as Ulkus/Ulzera, or Enteritis/Enteritiden, which are not covered by the stemming rules for general German grammar. The stemmer also failed to capture most of the spelling errors. Table 2 illustrates the conflation with examples.

variants | normalized form | explanation
Zäkumtorsion, Caecumtorsion | zaekumtorsion | ä/ö/c/k/z normalization
Kokzidiose, Coccidiose | kokzidios | ä/ö/c/k/z normalization
Aborts, Abort, Abortes, Aborte, Aborten | abort | stemming
perforierter Ulcus, perforierten Ulkus | perforiert ulkus | both
Kardiomyopathie, Cardiomyopathie, Kardiomyopathien | kardiomyopathi | both
Table 2: Normalization examples.

2.3 Focus Terms

For the syndromic classification of the veterinary reports, we manually created a list of focus terms which served as indicators for the clinical syndromes and diagnoses. Starting from a frequency-ranked list of the words found in all of the reports (already grouped by their normalized form), we manually selected terms that were likely to indicate (positive) diagnoses in the reports. The list was refined by inspecting the reports that produced hits for the focus terms.

The focus terms typically consist of a single token, but we also allowed multi-word expressions. The terms are grouped by diagnosis. Thus, each diagnosis refers to a set of terms which either constitute a common name of the diagnosis or describe some of its aspects. For example, concerning injuries caused by foreign bodies, we consider Draht ‘wire’ and Nagel ‘nail’ as focus terms, even though these words only refer to the cause, but not to the injuries themselves.

As each focus term is represented by its normalized form, a number of variant forms are already matched, as described above. We aimed to additionally cover variants produced by misspellings as well as inflected forms not recognized by the stemmer. Using approximate string matching, we searched the reports for similar terms for each of the focus terms. We used the simstring tool (Okazaki and Tsujii, 2010) for retrieving similarly spelled terms among the entire text collection. Approximate matching is a difficult task, as it is hard in general to formally define similarity among (the orthographical representations of) words in a way consistent with human judgement. simstring measures similarity as a function of the number of shared n-grams (runs of n characters) in two words, which is only a rough approximation of the task. However, compared to other similarity measures – e. g. Levenshtein’s edit distance [2] – it is considerably more efficient for retrieval in large amounts of text. In the inevitable trade-off between good precision and high recall, we strove for recall by choosing a low similarity threshold for retrieval. As expected, this resulted in a high number of hits, including many false positives, i. e. words with a high n-gram similarity score that are not actually similar to the input term (e. g. arthritis and arteritis). Due to the limited number of focus terms it was feasible to manually clean the list of similar words.

[2] For a study of agreement between human judgement and different similarity measures, see e. g. Efremova et al. (2014); for a general overview of similarity measures cf. Navarro (2001) and Christen (2006).

Figure 1 illustrates how term variants were gathered around the concept of a diagnosis. A number of synonymous and hyponymous terms were added to a specific diagnosis by a human expert. These terms were used as seeds to automatically find more variants, such as inflectional and spelling variants as well as misspellings. Please note that the labeled edges are only added for illustration purposes – the relations between term
forms (such as synonym, misspelling) were not captured during this phase, as they were not needed for syndrome/diagnosis classification. However, we examined ways to partly recover this underlying structure in an automated way, as is described in the following sections.

[Figure 1: Term variants for the diagnosis volvulus. The diagram arranges the term forms Volvulus, volvulus, Volvolus, Vovulus, Voluvlus, Dünndarmvolvulus, Darmtorsion, Darmtorsionen, Dünndarmtorsion, Dünndarmdrehungen, Caecumtorsion, CaecumTorsion, Zäkumtorsion and Zäkumstorsion around the diagnosis, with edges labeled canonical term, misspellings, inflection, spelling variants, synonym and hyponym.]

2.4 Further Term Conflation

The UMLS Metathesaurus [3] is a large collection of various medical terminology resources. One of its key features is the assignment of unique concept identifiers to entries from different vocabularies in many languages, thus establishing equivalence relations across them. By creating links to Metathesaurus concepts, we can enrich our own terminology resource with information contained in the Metathesaurus, as well as making it more valuable when sharing it with others.

[3] www.nlm.nih.gov/pubs/factsheets/umlsmeta.html

We used the 2014AA release of the Metathesaurus for this work. For each concept that was represented in a German vocabulary, we normalized its lemma and tried to match it against an entry among our focus terms. With this approach, we were able to establish a link to one or more UMLS concepts for 80.6 % of the diagnoses.

Since our data were organized by diagnosis, each covering a number of terms with sometimes quite disparate meanings, the connection to the Metathesaurus produced a high number of one-to-many mappings (cf. Figure 2). This difference in granularity hinders the exploitation of the linked information, as the meaning of many diagnoses appears highly ambiguous in terms of the Metathesaurus. In order to better match the semantic range of the UMLS concepts, we moved on to performing the mapping at the level of terms rather than diagnoses. This required us to add a hierarchical layer to our data structure: We needed to distinguish term variants (spelling and inflectional alternations, such as Caecumtorsion vs. Zäkumstorsion) from separate terms (e. g. Zäkumstorsion vs. Darmtorsion). Please note that synonyms such as Darmtorsion and Darmdrehung are considered separate terms, even though they have the same meaning.

[Figure 2: Ontology matching before (a) and after (b) term conflation. In both graphics, the left-hand side represents a diagnosis as a set of terms, some of which are linked to a UMLS concept (connected bullets) on the right-hand side.]

For each diagnosis, we organized all term forms into groups of term variants. The arrangement was performed automatically, based on string similarity. While string similarity is only an unreliable approximation of human similarity judgement, and while there are a number of competing ways of computing it, it is also difficult to determine a
threshold that clearly separates similar from dissimilar pairs of words. We therefore chose to perform supervised machine learning, i. e. automatic learning by example. We compiled a training set of positive instances of inflectional/spelling alternation as well as negative instances, i. e. pairs of unrelated words. For each pair, we computed two different string similarity measures (cf. Figure 3): cosine similarity of character trigram vectors, and Levenshtein ratio. These two measures cover different aspects of similarity, and thus their combination might capture more information than just one of them. We trained a Support Vector Machine on the two-dimensional space of the similarity measures, using a polynomial kernel function.

[Figure 3: Two different similarity measures for pairs of similar and dissimilar words. Plot with x-axis: character trigram cosine similarity (0 to 1); y-axis: Levenshtein ratio (0 to 1); legend: variant pairs, nonvariant pairs.]

The automatic term grouping yielded very satisfactory results. We manually evaluated the resulting groups, requiring that all members be orthographical or inflectional variations of each other. We also allowed derivational variants (e. g. Weissmuskelkrankheit/…erkrankung ‘white muscle disease’) to be in the same group, although the separation of derivatives (e. g. Ulkus/Ulzeration) was not counted as a false negative. We found that less than 6.7 % of the groups contained unequal terms (false positives), and only 1.9 % of the groups were erroneously isolated instead of being merged with the correct equivalents (false negatives). Many false positive judgements were caused by terms with only small differences in meaning, such as Muskeldegeneration ‘muscle degeneration’ and Muskelfaserdegeneration ‘muscle fiber degeneration’, which might even be regarded as equal in a less strict evaluation. As for the false negatives, the number of misses could be reduced by extending the stemmer with Latin-inflection endings like Ulkus – Ulzera.

2.5 Connecting to UMLS

Each group of term variants was then linked to a UMLS concept if at least one member of the group (i. e. a term variant) matched at least one of the concept's German descriptions. Only exact agreement of the normalized forms was counted as a match, as preliminary experiments had shown that fuzzy matching introduced a great number of false positives (connections between similarly spelled, but otherwise unrelated words) while adding only very few desired links. However, we were able to improve the linkage with simple heuristics, such as the removal of boilerplate expressions like nicht näher bezeichnet ‘not otherwise specified’.

For 42.1 % of the terms, we could find a match with a UMLS concept. Only 6.7 % of the matching terms point to more than one concept, which means that 93.3 % of the terms with a match can be mapped to the Metathesaurus unambiguously. However, for more than half of the terms no corresponding UMLS concept could be found at all, which is mainly due to the different domains of our veterinary texts and the predominantly human-medicine-based UMLS. Table 3 shows some examples of the mapping.

The connections to the Metathesaurus allowed us to further enrich our data. For example, every UMLS concept has a semantic type assigned to it, such as “Disease or Syndrome” or “Pathologic Function”. Additionally, we used the concept descriptions in the Metathesaurus to find more focus terms. By matching the descriptions of connected concepts against our text collection, we were able to enlarge the set of focus terms by almost 10 %.

As next steps, we plan to create links to other widely-used terminology resources, such as the Central key for health data recording by the International Committee for Animal Recording (ICAR). [4]

[4] See www.icar.org
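As a rough illustration of the two similarity measures used for grouping term variants in Section 2.4, the following Python sketch computes a character trigram cosine similarity and a Levenshtein ratio for a pair of words. The padding of words with spaces, the normalization of the edit distance by the length of the longer word, and all function names are assumptions made for this example; the paper does not specify these details, and the Support Vector Machine trained on the two values is left out.

```python
from collections import Counter
from math import sqrt


def trigrams(word):
    """Count character trigrams; the two-space padding is an assumption."""
    padded = f"  {word}  "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def trigram_cosine(a, b):
    """Cosine similarity of the character trigram count vectors of a and b."""
    va, vb = trigrams(a), trigrams(b)
    dot = sum(count * vb[gram] for gram, count in va.items())
    norm = sqrt(sum(c * c for c in va.values())) * sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def levenshtein(a, b):
    """Edit distance with unit costs for insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def levenshtein_ratio(a, b):
    """Assumed definition: 1 minus the distance divided by the longer length."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def features(a, b):
    """Two-dimensional feature vector for a word pair (case-insensitive)."""
    a, b = a.lower(), b.lower()
    return trigram_cosine(a, b), levenshtein_ratio(a, b)


# Example pair taken from the text: a spelling variant of the same term.
print(features("Caecumtorsion", "Zäkumstorsion"))
```

Pairs labeled as variants or non-variants could then be passed, as such two-dimensional vectors, to any classifier; which library the authors used for the Support Vector Machine is not stated.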
diagnosis/terms | UMLS
stenosis
Darmstenose | C0267465 Darmstenose/Darmstriktur/Stenose des…
Dünndarmstenose | C0151924 Dünndarmstenose/Stenose des Dünndarms
Rectumstenose, Rektumstenose | –
myodegeneration
Belastungsmyopathie | –
Muskelfaserdegeneration, Muskelfaserndegeneration | C0234958 Muskeldegeneration/Degeneration des …
Muskelfasernekrose | –
Muskelläsionen | –
Muskelnekrose, Muskelnekrosen | C0235957 Muskelnekrose/Myonekrose
Myodegeneration, myodegeneration | –
Myonekrose, myonecrosis | C0235957 Muskelnekrose/Myonekrose
Rhabdomyolyse | C0035410 Rhabdomyolyse
Weiss-Muskel-Krankheit, Weiss-Muskelkrankheit, Weissmuskelerkrankung, Weissmuskelkrankeit, Weissmuskelkrankheit | C0043153 Muskeldystrophie, nutritive/Weißmuskelkrankheit
Table 3: Mapping to the UMLS Metathesaurus.
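The exact-match linking behind Table 3 can be sketched in a few lines of Python, under stated assumptions. The letter rules follow the description in Section 2.2, but the set of front-vowel contexts (e, i, y, ae, oe) is a guess, stemming is omitted, and the function names and the small in-memory concept dictionary are hypothetical stand-ins for the real Metathesaurus data. Only non-elided entries from Table 3 are used as examples.

```python
FRONT_VOWELS = ("e", "i", "y", "ae", "oe")  # assumption: not listed in the paper
BOILERPLATE = ("nicht näher bezeichnet",)   # the one expression quoted in Section 2.5


def normalize(term):
    """Lowercase, rewrite ä/ö, and resolve c by its right context (no stemming)."""
    word = term.lower().replace("ä", "ae").replace("ö", "oe")
    out = []
    for i, ch in enumerate(word):
        rest = word[i + 1:]
        if ch != "c" or rest.startswith(("h", "k")):
            out.append(ch)   # other letters, and c before h or k, stay as they are
        elif rest.startswith(FRONT_VOWELS):
            out.append("z")  # c before a front vowel
        else:
            out.append("k")  # all other cases, including word-final c
    return "".join(out)


def clean_description(description):
    """Drop boilerplate phrases from a concept description, then normalize it."""
    text = description.lower()
    for phrase in BOILERPLATE:
        text = text.replace(phrase, "")
    return normalize(text.strip(" ,;"))


def link_groups(groups, concepts):
    """Link each variant group to every concept with an exactly matching form."""
    index = {}
    for concept_id, descriptions in concepts.items():
        for description in descriptions:
            index.setdefault(clean_description(description), set()).add(concept_id)
    return {
        label: set().union(*(index.get(normalize(v), set()) for v in variants))
        for label, variants in groups.items()
    }


# Variant groups and German concept descriptions as they appear in Table 3.
groups = {
    "Darmstenose": ["Darmstenose"],
    "Rektumstenose": ["Rectumstenose", "Rektumstenose"],
    "Myonekrose": ["Myonekrose", "myonecrosis"],
}
concepts = {
    "C0267465": ["Darmstenose", "Darmstriktur"],
    "C0235957": ["Muskelnekrose", "Myonekrose"],
}
links = link_groups(groups, concepts)
print(links)
```

A group is linked as soon as one of its members matches, which is why a single well-spelled variant is enough to connect a whole group of misspellings to a concept.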
3 Annotation Tool

The terminology resource described above is a key component in our efforts to create a veterinary surveillance system. We wrote a pipeline of Python scripts that assists our semi-automatic annotation of the pathology reports. The tool performs automatic annotation of syndromes and diagnoses based on the term resource, while also keeping track of manual verifications and rejections. Through a web interface, it accepts a Microsoft Excel workbook as input and produces a modified version in the same format, which allows a veterinary domain expert to inspect and modify the automatic annotations. All relevant information – such as the term resource and the assigned categories, negations (see below), and the previous manual annotations – is contained within this file.

3.1 Negation Detection

In a keyword-based system for detecting evidence, negative expressions can play a crucial role. Occasionally, negative outcomes of an analysis are reported in the texts, and suspected diagnoses are rejected quite frequently, such as keine Hinweise auf eine Pneumonie ‘no evidence of a pneumonia’. Therefore, we aimed at identifying occurrences of focus terms that are mentioned in a negated context.

Besides the identification of negated expressions, negation detection heavily depends on the correct determination of their scope. Tanushi et al. (2013) compare different approaches to negation scope detection in Swedish clinical reports. According to them, “[e]mploying a simple, rule-based approach with a small amount of negation triggers and a fixed context window for determining scope is very efficient and useful, if results around 80 % F-score are sufficient for a given purpose” (Tanushi et al., 2013, p. 393). We included a simple negation-detection module in our pipeline, which looks for a set of negative expressions in a context window of 5 tokens to either side of the focus term. The context can be restricted for each expression (e. g. only to the right of or only immediately preceding a focus term). The context window is shortened at sentence boundaries and other indicators of a break. However, as the results of the negation detection are not yet satisfactory, we plan to integrate an existing library for this task, e. g. the Python package pyConTextNLP (Chapman et al., 2011).

3.2 Inter-Annotator Agreement

In order to validate the quality of our annotations, we organized a multi-annotator evaluation. We performed an experiment with six experts in veterinary pathology, who were asked to classify a number of reports with respect to the syndromic categories described in Section 2.1. For this purpose, we created a web interface which displayed the report text together with some metadata, one report at a time, and allowed the annotator to mark each of the syndromes as present or absent. The reports were randomly sampled, keeping the distribution
syndrome reports Do De α in the next revision of the syndromic categoriza-
gastro-int. 52 (13) 0.059 0.251 0.764 tion.
respiratory 28 (10) 0.045 0.207 0.781
urinary 9 (3) 0.014 0.083 0.836 4 Outlook
cardio-vasc. 15 (9) 0.041 0.115 0.644
lymphatic 3 (3) 0.014 0.018 0.240 We will assess the performance of the text-mining
musc.-skel. 13 (3) 0.014 0.125 0.891 tool based on a small number of diseases which
reproductive 9 (1) 0.005 0.094 0.952 have been relevant in Switzerland in the last 10
neural 5 (2) 0.009 0.052 0.825 years:
other 38 (21) 0.095 0.226 0.577
avg. 0.723 1. Bovine Viral Diarrhoea in cattle (an eradica-
tion campaign for the disease was introduced
Table 4: Inter-annotator agreement of the syndromic
in 2008)
categories, measured with Krippendorff’s Alpha. The
second column gives the number of reports where 2. Porcine Circovirus type 2 infection in pigs
at least one annotator marked the corresponding syn-
drome as present; following in parentheses is the num- 3. Gastro-intestinal syndromes in pigs (for which
ber of reports with disagreement. Do and De are the we observe an increasing amount of pathology
observed and expected disagreement, respectively.
submissions)
Time-series analyses will be performed to quantify
of species and year of creation as close to the entire trends, seasonality and other effects (day of week,
collection as possible (approaching stratified sam- day of month etc.) on the number of submissions
pling). Each annotator was provided with a sample for syndromes potentially related to these diseases.
of 20 reports, which was extended to twice or three For each disease, “in-control” data (data collected
times the size when an annotator asked for more. In order to increase sample size, the same report was given to only two or three annotators, rather than all of them. In total, 81 distinct reports were annotated.

We evaluated the inter-annotator agreement with Krippendorff’s Alpha (Krippendorff, 2013, pp. 267–309), as shown in Table 4. To compute the agreement, we regarded each syndrome as an independent, binary variable (each syndrome is either present or absent in a report). The agreement value α ranges from 1 (perfect agreement) to 0 (chance-level agreement) or even below (systematic disagreement). High agreement means that identifying syndromes is a clear task, while low agreement indicates that the decisions cannot be easily made. Most of the syndromes have a good (>0.8) or acceptable (>0.6) α score,⁵ whereas some are clearly identified as problematic. For the lymphatic system, the sparse representation (only 3 reports) does not allow for valid conclusions; further investigation is required in this case. The “catch-all” class “other”, however, most likely suffers from having an unclear scope. As a consequence of this evaluation, we decided to reduce the ambiguity of “other” by including additional classes

⁵ For a discussion of the interpretation of absolute agreement scores, see Artstein and Poesio (2008, p. 591).

in the absence of an outbreak) will be used to establish a baseline model describing the amount of normal “noise” in the data (expected number of submissions in the absence of disease outbreaks). Retrospective analyses of the time series will be done to see whether alerts (signals) were produced when the number of submissions for syndromes potentially linked to the disease was higher than expected from our baseline model (event detection). This will allow us to evaluate whether the system would have worked as an early-warning system.

The tools developed in this project will be adapted to reports from different pathology institutes throughout Switzerland, thus contributing to a nation-wide syndromic surveillance system. Similarly, the methodology developed may be applicable to the analysis of text-based disease information recorded in other contexts. For example, there is great potential for using such a system to systematically analyse health data recorded by veterinary practitioners in their practice management software, in slaughter data, or by animal health services in their central database.

Acknowledgements

This work was funded by the Swiss Federal Food Safety and Veterinary Office (Bundesamt für Lebensmittelsicherheit und Veterinärwesen).
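
As an illustration of the agreement computation described in the evaluation (one binary present/absent variable per syndrome, with each report seen by only some of the annotators), the following is a minimal sketch of Krippendorff’s Alpha for nominal data. It is not the implementation used in this work; the function name and the toy annotations are hypothetical, and the sketch assumes the standard coincidence-matrix formulation, in which reports with fewer than two annotations are ignored.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    `units` is a list of lists: each inner list holds the labels that the
    annotators assigned to one report for ONE syndrome (1 = present,
    0 = absent). Reports may have different numbers of annotators.
    """
    # Coincidence matrix: every ordered pair of labels within a report
    # contributes 1 / (m - 1), where m is the number of labels in it.
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # a single annotation cannot be paired
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals per label and overall number of pairable values.
    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    # Observed and expected disagreement (up to a common factor 1/n).
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        return float("nan")  # only one label occurs; alpha is undefined
    return 1.0 - observed / expected


# Hypothetical annotations for one syndrome: four reports, two annotators each.
toy_units = [[1, 1], [0, 0], [1, 0], [0, 0]]
alpha = krippendorff_alpha_nominal(toy_units)
```

In this sketch, one alpha value would be computed per syndrome by passing the present/absent decisions for that syndrome only, which mirrors the treatment of each syndrome as an independent binary variable.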
References

R. Michele Anholt, John Berezowski, Iqbal Jamal, Carl Ribble, and Craig Stephen. 2014. Mining free-text medical records for companion animal enteric syndrome surveillance. Preventive Veterinary Medicine, 113(4):417–422.

Ron Artstein and Massimo Poesio. 2008. Inter-coder agreement for Computational Linguistics. Computational Linguistics, 34(4):555–596.

Taxiarchis Botsis, Michael D. Nguyen, Emily Jane Woo, Marianthi Markatou, and Robert Ball. 2011. Text mining for the Vaccine Adverse Event Reporting System: medical text classification using informative feature selection. Journal of the American Medical Informatics Association, 18(5):631–638.

John S. Brownstein, Clark C. Freifeld, Ben Y. Reis, and Kenneth D. Mandl. 2008. Surveillance Sans Frontieres: Internet-based emerging infectious disease intelligence and the HealthMap project. PLoS Med, 5(7):e151.

Wendy W. Chapman, John N. Dowling, and Michael M. Wagner. 2004. Fever detection from free-text clinical records for biosurveillance. Journal of Biomedical Informatics, 37(2):120–127.

Brian E. Chapman, Sean Lee, Hyunseok Peter Kang, and Wendy W. Chapman. 2011. Document-level classification of CT pulmonary angiography reports based on an extension of the ConText algorithm. Journal of Biomedical Informatics, 44(5):728–737.

Peter Christen. 2006. A comparison of personal name matching: Techniques and practical issues. Technical Report TR-CS-06-02, The Australian National University, Dec.

Nigel Collier, Ai Kawazoe, Lihua Jin, Mika Shigematsu, Dinh Dien, Roberto A. Barrero, Koichi Takeuchi, and Asanee Kawtrakul. 2006. A multilingual ontology for infectious disease surveillance: rationale, design and challenges. Language Resources and Evaluation, 40(3-4):405–413.

Nigel Collier, Son Doan, Ai Kawazoe, Reiko Matsuda Goodwin, Mike Conway, Yoshio Tateno, Quoc-Hung Ngo, Dinh Dien, Asanee Kawtrakul, Koichi Takeuchi, et al. 2008. BioCaster: detecting public health rumors with a Web-based text mining system. Bioinformatics, 24(24):2940–2941.

Fernanda C. Dórea, C. Anne Muckle, David Kelton, J. T. McClure, Beverly J. McEwen, W. Bruce McNab, Javier Sanchez, and Crawford W. Revie. 2013. Exploratory analysis of methods for automated classification of laboratory test orders into syndromic groups in veterinary medicine. PLOS one, 8(3):e57334.

Julia Efremova, Bijan Ranjbar-Sahraei, and Toon Calders. 2014. A hybrid disambiguation measure for inaccurate cultural heritage data. In Proceedings of the 8th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH), pages 47–55, Gothenburg, Sweden, April. Association for Computational Linguistics.

Jeff Friedlin, Shaun Grannis, and J. Marc Overhage. 2008. Using natural language processing to improve accuracy of automated notifiable disease reporting. AMIA Annual Symposium Proceedings, 2008:207–211.

David Hartley, Noele Nelson, Ronald Walters, Ray Arthur, Roman Yangarber, Larry Madoff, Jens Linge, Abla Mawudeku, Nigel Collier, John Brownstein, Germain Thinus, and Nigel Lightfoot. 2010. Landscape of international event-based biosurveillance. Emerging Health Threats Journal, 3(e3).

Klaus Krippendorff. 2013. Content Analysis: An Introduction to Its Methodology. Sage Publications, Thousand Oaks, CA, 3rd edition.

Gonzalo Navarro. 2001. A guided tour to approximate string matching. ACM Computing Surveys, 33(1):31–88, March.

Naoaki Okazaki and Jun’ichi Tsujii. 2010. Simple and efficient algorithm for approximate dictionary matching. In Proceedings of the 23rd International Conference on Computational Linguistics, COLING ’10, pages 851–859, Beijing, China, August.

Martin F. Porter. 1980. An algorithm for suffix stripping. Program, 14(3):130–137.

Fabio Rinaldi, James Dowdall, Michael Hess, Kaarel Kaljurand, Mare Koit, Kadri Vider, and Neeme Kahusk. 2002. Terminology as knowledge in answer extraction. In TKE-2002: 6th International Conference on Terminology and Knowledge Engineering, Nancy, France, August.

Suzanne L. Santamaria and Kurt L. Zimmerman. 2011. Uses of informatics to solve real world problems in veterinary medicine. Journal of veterinary medical education, 38(2):103–109.

Kimberly A. Smith-Akin, Charles F. Bearden, Stephen T. Pittenger, and Elmer V. Bernstam. 2007. Toward a veterinary informatics research agenda: An analysis of the PubMed-indexed literature. International Journal of Medical Informatics, 76(4):306–312.

Ralf Steinberger, Flavio Fuart, Erik van der Goot, Clive Best, Peter von Etter, and Roman Yangarber. 2008. Text mining from the web for medical intelligence. In Françoise Fogelman-Soulié, Domenico Perrotta, Jakub Piskorski, and Ralf Steinberger, editors, Mining Massive Data Sets for Security, volume 19 of NATO Science for Peace and Security Series – D: Information and Communication Security, pages 295–310. IOS Press.

Hideyuki Tanushi, Hercules Dalianis, Martin Duneld, Maria Kvist, Maria Skeppstedt, and Sumithra Velupillai. 2013. Negation scope delimitation in clinical text using three approaches: NegEx, PyConTextNLP and SynNeg. In Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013), pages 387–397, Oslo, Norway.

Michael M. Wagner, J. Espino, F-C. Tsui, P. Gesteland, W. Chapman, O. Ivanov, A. Moore, W. Wong, J. Dowling, and J. Hutman. 2004. Syndrome and outbreak detection using chief-complaint data – experience of the Real-Time Outbreak and Disease Surveillance project. Morbidity and Mortality Weekly Report, 53:28–31.

Eva Warns-Petit, Eric Morignat, Marc Artois, and Didier Calavas. 2010. Unsupervised clustering of wildlife necropsy data for syndromic surveillance. BMC Veterinary Research, 6(56):1–11.