Proceedings of the conference Terminology and Artificial Intelligence 2015 (Granada, Spain) 61

Constructing a Syndromic Terminology Resource for Veterinary Text Mining

Lenz Furrer
Institute of Computational Linguistics, University of Zurich
lenz.furrer@uzh.ch

Susanne Küker
Veterinary Public Health Institute, University of Bern
susanne.kueker@vetsuisse.unibe.ch

John Berezowski
Department of Clinical Research and Veterinary Public Health, University of Bern
john.berezowski@vetsuisse.unibe.ch

Horst Posthaus
Institute of Animal Pathology, University of Bern
horst.posthaus@vetsuisse.unibe.ch

Flavie Vial
Veterinary Public Health Institute, University of Bern
flavie.vial@vetsuisse.unibe.ch

Fabio Rinaldi
Institute of Computational Linguistics, University of Zurich
fabio.rinaldi@uzh.ch

Abstract

Public health surveillance systems rely on the automated monitoring of large amounts of text. While building a text mining system for veterinary syndromic surveillance, we exploit automatic and semi-automatic methods for terminology construction at different stages. Our approaches include term extraction from free text, grouping of term variants based on string similarity, and linking to an existing medical ontology.

1 Introduction

In the project Veterinary Pathology Text Mining, we are developing tools to exploit veterinary post-mortem data for epidemiological surveillance and early detection of animal diseases. This paper describes the work in progress on the construction of a veterinary terminology resource as a basis for a text mining tool to classify, with minimal human intervention, free-text veterinary reports with respect to multiple clinical syndromes that can be monitored.

In human medicine, text mining has been successfully applied to clinical records in many public health surveillance systems (Botsis et al., 2011; Steinberger et al., 2008; Brownstein et al., 2008; Wagner et al., 2004). The approaches range from hand-written rule-based systems to fully automated methods using machine learning. For example, Chapman et al. (2004) use heuristic keyword-driven as well as supervised machine learning techniques (Naïve-Bayes classifier) for detecting mentions of fever in free-text clinical records. Similarly, the BioCaster system (Collier et al., 2006; Collier et al., 2008) relies on a carefully constructed medical ontology combined with a Naïve-Bayes classifier as an input filter. Friedlin et al. (2008) use a regular-expression based term-extraction system to find positive and negative mentions of methicillin-resistant Staphylococcus aureus in culture reports. Hartley et al. (2010) give an overview of surveillance systems that mainly focus on world-wide monitoring of web sources, including news feeds and informal medical networks.

The text mining of veterinary reports faces additional challenges such as multiple species and a less controlled vocabulary (Smith-Akin et al., 2007; Santamaria and Zimmerman, 2011). Up to this point, approaches for classifying veterinary diagnostic data into syndromes for surveillance have been restricted to the use of rule-based classifiers (Dórea et al., 2013; Anholt et al., 2014). To build these classifiers, a group of experts manually creates a large set of rules. The rules are then used to classify veterinary diagnostic submissions into syndromes based on the presence or absence of specific words within various fields in the diagnostic submission data.

We propose to develop a process for using text mining methodologies (natural language processing) to efficiently extract relevant health information from veterinary diagnostic submission data with minimal human intervention. Given a sufficient amount of data (i.e. at least a few hundred manually classified reports), a machine learning approach will allow us to directly classify these data into syndromes that can be monitored for surveillance.

As recognized in the Swiss Animal Health Strategy 2010+, methods for early disease detection, based on the increasing abundance of data on animal health stored in national databases, can contribute to valuable and highly efficient surveillance activities. Post-mortem data, available from pathology services, are often under-exploited. The main purpose of post-mortem investigations of food production animals is to provide information about the cause of disease or death with regard to treatment and prevention options for the affected herd. Besides these major diagnoses, all additional pathological findings are also recorded as text and electronically archived as necropsy reports. In addition to the value of this information for veterinarians and farmers, systematic evaluation of necropsy data may be of value for the early detection of spatio-temporal clusters of syndromes which may result from a new disease emerging into a population or from changing patterns of endemic diseases. As such, it has the potential to be of value for both nation-wide and international (veterinary) public health early-warning systems.

The rest of this paper is organized as follows: We present our efforts in constructing and exploiting a veterinary terminology resource in Section 2. Section 3 describes our work towards report classification in the context of building a surveillance tool. The next steps and further application scenarios are given in Section 4.

2 Terminology Construction

In the process of report classification, we have put considerable effort into the construction of a terminology resource that suited our needs. The resulting term inventory is tailored to a very specific task. Still, the methods, insights and even the resource itself can be of use for other applications. Similar to the work by Rinaldi et al. (2002), we extracted a set of terms from a collection of raw text and used automatic methods to organize them into a hierarchical structure. Section 2.1 introduces the categories we used for classification. In Sections 2.2 and 2.3, we describe the steps that led to the construction of the term inventory. Sections 2.4 and 2.5 show how this resource can be automatically enhanced for a more general usage.

2.1 Syndrome and Diagnosis Classification

The work described here is based on post-mortem reports that were compiled by the Institute of Animal Pathology (ITPA) of the Vetsuisse faculty at the University of Bern. The data were entered into a database by veterinary pathologists between 2000 and 2011. We used a subset of approximately 9 000 report entries regarding pigs and cattle. The reports are written in German, with a small fraction (less than 3 %) in English and French.

For subsequent quantitative analysis, we classified all reports using two categorization levels. As a coarse-grained categorization, we annotated each report with the syndromic groups that were affected by a medical issue. Each report was assigned zero, one or more of 9 syndrome categories (gastro-intestinal, respiratory, urinary, cardio-vascular, lymphatic, musculo-skeletal, reproductive, neural, other). This categorization approximately meets the level of granularity found in other work (Dórea et al., 2013; Warns-Petit et al., 2010). For a finer-grained categorization of the reports, we additionally annotated post-mortem diagnoses mentioned (directly or implicitly) in the reports, such as enteritis, lipidosis, or injuries from foreign bodies. The set of diagnoses was not defined a priori, but continuously updated in the classification process. The final set comprised some 100 classes and is shown in Table 1. The diagnoses are modeled as subcategories of the syndromes. While some category names occur in more than one syndromic category, this does not mean that they are ambiguous, as they are triggered by different terms. For example, atresia is classified as a congenital abnormality of the gastro-intestinal system, whereas the ventricular septal defect is a congenital abnormality of the cardio-vascular system.

2.2 Term Normalization

The medical reports have a high number of surface variants per term. The variation is caused by inflection, inconsistent spelling and typographical errors. On a higher level, variation is increased by synonymy, i.e. the use of different terms for the same concept (e.g. Lipidose/Verfettung ‘lipidosis’). From the perspective of the given text mining task, certain derivative forms can be considered synonymous variants as well (e.g. Ulzeration besides Ulkus).
We split the report texts into tokens, which we defined as consecutive runs of alphanumeric characters or hyphens. We then performed a series of normalization steps in order to reduce the number of term variants when compiling an index.

Table 1: The diagnoses used for classification, grouped by syndrome, with number of occurrences.

gastro-intestinal: abomasal ulcer 178, abomasitis 120, acidosis 41, cheilitis 6, cholangitis 84, colitis 368, congenital abnormality 21, dilatation 159, displaced abomasum 55, duodenitis 12, enteritis 2458, esophagitis 46, gastric ulcer 206, gastritis 121, glossitis 16, hepatitis 200, HIS 361, Hoflund syndrome 10, icterus 39, ileitis 60, invagination 59, jejunitis 30, lipidosis 93, obstipation 13, omasitis 18, pancreatitis 2, parasites 13, perforation 35, pharyngitis 9, proctitis 13, reticulitis 98, rumenitic ulcer 4, rumenitis 92, sialoadenitis 2, steatorrhea 105, stenosis 10, stomatitis 57, trauma 25, typhlitis 37, volvulus 479

respiratory: bronchiolitis 256, bronchitis 466, bronchopneumonia 1040, laryngitis 23, pharyngitis 1, pleuritis 40, pneumonia 769, rhinitis 11, rhinitis atrophicans 196, sinusitis 8, tracheitis 28

urinary: congenital abnormality 5, cystitis 56, hydronephrosis 30, nephritis 416, renal degeneration 116, trauma 8, urolithiasis 75

cardio-vascular: cardiomyopathia 46, congenital abnormality 84, endocarditis 179, epicarditis 62, heart degeneration 50, hydropericard 319, myocarditis 77, pericarditis 427, pleuritis 41

lymphatic: lymphadenopathy 245, splenitis 77, tonsillitis 88

musculo-skeletal: arthritis 231, arthrosis 31, bone degeneration 17, callus 14, congenital abnormality 30, fracture 94, luxation 9, myodegeneration 70, myopathy 56, osteochondrosis 30, osteomyelitis 136, polyarthritis 343, synovitis 45, tendinitis 25, tendovaginitis 18

reproductive: abortion 642, congenital abnormality 8, dystocia 9, metritis 70, perforation 4, placentitis 126, retained placenta 3, uterine perforation 4, uterine torsion 3, vaginitis 8

neural: congenital abnormality 1, encephalitis 116, hydrocephalus 18, intoxication 5, meningitis 226, myelitis 13, neural degeneration 54, neuropathy 78

other: crushed 81, dermatitis 184, enterotoxemia 285, eye related 22, foreign body 118, hernia 73, hydrothorax 104, inanition 74, intoxication 95, iron deficiency 65, mastitis 66, neoplasia 68, otitis 20, perforation 257, peritonitis 866, pleuritis 643, pododermatitis 19, polyserositis 297, rumen drinker 33, sepsis 647, splenic torsion 18, umbilicus related 117

The bulk of the spelling variation stems from Latin/Greek-originated terms, such as Zäkum ‘cecum’. Besides the German spelling (using the letters ä, ö, z/k), the Latin spelling is often used (ae, oe, c, respectively), and even combinations of the two are encountered. For the previous example, the following variants are present, among others: Caecum, caecum, Cäcum, Cäkum, Zaecum. We normalized the usage of these letters by replacing ä with ae and ö with oe unconditionally, while treating c differently based on its right context: before a front vowel it was replaced by z, before h and k it was kept as c, and in all other cases (including word-final position) we replaced it with k. The complexity of this rule is due to the fact that this normalization is applied to all words, i.e. including originally German words like Kinn/Zinn ‘chin’/‘tin’, which would be confused by an unconditional conflation of c, k, z. As a side effect, the normalization of German terms occasionally captured closely spelled English terms (which were not systematically gathered), such as Enzephalitis/encephalitis.

Subsequently, we removed inflectional suffixes using the NLTK¹ implementation of the “Snowball” stemmer for German (Porter, 1980). Stemming is the process of removing inflectional and (partially) derivational affixes, thus truncating words to their stems. For example, minimally and minimize are both reduced to minim in Porter’s English stemmer, which is not a proper word, but nevertheless a useful key for lumping together etymologically related words.

¹ Natural Language Toolkit: www.nltk.org

Table 2: Normalization examples.

variants                                             normalized form    explanation
Zäkumtorsion, Caecumtorsion                          zaekumtorsion      ä/ö/c/k/z normalization
Kokzidiose, Coccidiose                               kokzidios          ä/ö/c/k/z normalization
Aborts, Abort, Abortes, Aborte, Aborten              abort              stemming
perforierter Ulcus, perforierten Ulkus               perforiert ulkus   both
Kardiomyopathie, Cardiomyopathie, Kardiomyopathien   kardiomyopathi     both

Stemming is based on orthographical regularities and uses only a minimal amount of lexical information. Although the method is not flawless (it may be prone to errors with very short and irregularly inflected words), it generally works well for languages with alphabetic script and has been successfully applied to many European languages. Using a stemmer, we were able to considerably reduce the number of inflectional/derivational variants. However, a number of inflectional forms were still missed by the stemmer, especially plural forms with Latin inflection, such as Ulkus/Ulzera or Enteritis/Enteritiden, which are not covered by the stemming rules for general German grammar. The stemmer also failed to capture most of the spelling errors. Table 2 illustrates the conflation with examples.

2.3 Focus Terms

For the syndromic classification of the veterinary reports, we manually created a list of focus terms which served as indicators for the clinical syndromes and diagnoses. Starting from a frequency-ranked list of the words found in all of the reports (already grouped by their normalized form), we manually selected terms that were likely to indicate (positive) diagnoses in the reports. The list was refined by inspecting the reports that produced hits for the focus terms.

The focus terms typically consist of a single token, but we also allowed multi-word expressions. The terms are grouped by diagnosis. Thus, each diagnosis refers to a set of terms which either constitute a common name of the diagnosis or describe some of its aspects. For example, concerning injuries caused by foreign bodies, we consider Draht ‘wire’ and Nagel ‘nail’ as focus terms, even though these words only refer to the cause, but not to the injuries themselves.

As each focus term is represented by its normalized form, a number of variant forms are already matched, as described above. We aimed to additionally cover variants produced by misspellings as well as inflected forms not recognized by the stemmer. Using approximate string matching, we searched the reports for similar terms for each of the focus terms. We used the simstring tool (Okazaki and Tsujii, 2010) for retrieving similarly spelled terms among the entire text collection. Approximate matching is a difficult task, as it is hard in general to formally define similarity among (the orthographical representations of) words in a way consistent with human judgement. simstring measures similarity as a function of the number of shared n-grams (runs of n characters) in two words, which is only a rough approximation of the task. However, compared to other similarity measures (e.g. Levenshtein’s edit distance²), it is considerably more efficient for retrieval in large amounts of text. In the inevitable trade-off between good precision and high recall, we strove for recall by choosing a low similarity threshold for retrieval. As expected, this resulted in a high number of hits, including many false positives, i.e. words with a high n-gram similarity score that are not actually similar to the input term (e.g. arthritis and arteritis). Due to the limited number of focus terms it was feasible to manually clean the list of similar words.

² For a study of agreement between human judgement and different similarity measures, see e.g. Efremova et al. (2014); for a general overview of similarity measures cf. Navarro (2001) and Christen (2006).

Figure 1 illustrates how term variants were gathered around the concept of a diagnosis. A number of synonymous and hyponymous terms were added to a specific diagnosis by a human expert. These terms were used as seeds to automatically find more variants, such as inflectional and spelling variants as well as misspellings. Please note that the labeled edges are only added for illustration purposes: the relations between term forms (such as synonym, misspelling) were not captured during this phase, as they were not needed for syndrome/diagnosis classification. However, we examined ways to partly recover this underlying structure in an automated way, as is described in the following sections.

Figure 1: Term variants for the diagnosis volvulus. (Node labels: Volvulus, Volvolus, Vovulus, Voluvlus, Darmtorsion, Darmtorsionen, Caecumtorsion, Caecum-Torsion, Zäkumtorsion, Zäkumstorsion, Dünndarmtorsion, Dünndarmdrehungen, Dünndarmvolvulus. Edge labels: canonical term, misspellings, synonym, synonyms, inflection, hyponym, spelling variants.)

Figure 2: Ontology matching before (a) and after (b) term conflation. In both graphics, the left-hand side represents a diagnosis as a set of terms, some of which are linked to a UMLS concept (connected bullets) on the right-hand side.

2.4 Further Term Conflation

The UMLS Metathesaurus³ is a large collection of various medical terminology resources.
One of its key features is the assignment of unique concept identifiers to entries from different vocabularies in many languages, thus establishing equivalence relations across them. By creating links to Metathesaurus concepts, we can enrich our own terminology resource with information contained in the Metathesaurus, as well as making it more valuable when sharing it with others.

³ www.nlm.nih.gov/pubs/factsheets/umlsmeta.html

We used the 2014AA release of the Metathesaurus for this work. For each concept that was represented in a German vocabulary, we normalized its lemma and tried to match it against an entry among our focus terms. With this approach, we were able to establish a link to one or more UMLS concepts for 80.6 % of the diagnoses.

Since our data were organized by diagnosis, each covering a number of terms with sometimes quite disparate meanings, the connection to the Metathesaurus produced a high number of one-to-many mappings (cf. Figure 2). This difference in granularity hinders the exploitation of the linked information, as the meaning of many diagnoses appears highly ambiguous in terms of the Metathesaurus. In order to better match the semantic range of the UMLS concepts, we moved on to performing the mapping at the level of terms rather than diagnoses. This required us to add a hierarchical layer to our data structure: We needed to distinguish term variants (spelling and inflectional alternations, such as Caecumtorsion vs. Zäkumstorsion) from separate terms (e.g. Zäkumstorsion vs. Darmtorsion). Please note that synonyms such as Darmtorsion and Darmdrehung are considered separate terms, even though they have the same meaning.

For each diagnosis, we organized all term forms into groups of term variants. The arrangement was performed automatically, based on string similarity. While string similarity is only an unreliable approximation of human similarity judgement, and while there are a number of competing ways of computing it, it is also difficult to determine a threshold that clearly separates similar from dissimilar pairs of words. We therefore chose to perform supervised machine learning, i.e. automatic learning by example. We compiled a training set of positive instances of inflectional/spelling alternation as well as negative instances, i.e. pairs of unrelated words. For each pair, we computed two different string similarity measures (cf. Figure 3): cosine similarity of character trigram vectors, and Levenshtein ratio. These two measures cover different aspects of similarity, and thus their combination might capture more information than just one of them. We trained a Support Vector Machine on the two-dimensional space of the similarity measures, using a polynomial kernel function.

Figure 3: Two different similarity measures for pairs of similar and dissimilar words. (Scatter plot; x-axis: character-trigram cosine similarity, 0 to 1; y-axis: Levenshtein ratio, 0 to 1; legend: variant pairs, non-variant pairs.)

The automatic term grouping yielded very satisfactory results. We manually evaluated the resulting groups, requiring that all members be orthographical or inflectional variations of each other. We also allowed derivational variants (e.g. Weissmuskelkrankheit/…erkrankung ‘white muscle disease’) to be in the same group, although the separation of derivatives (e.g. Ulkus/Ulzeration) was not counted as a false negative. We found that less than 6.7 % of the groups contained unequal terms (false positives), and only 1.9 % of the groups were erroneously isolated instead of being merged with the correct equivalents (false negatives). Many false positive judgements were caused by terms with only small differences in meaning, such as Muskeldegeneration ‘muscle degeneration’ and Muskelfaserdegeneration ‘muscle fiber degeneration’, which might even be regarded as equal in a less strict evaluation. As for the false negatives, the number of misses could be reduced by extending the stemmer with Latin-inflection endings like Ulkus/Ulzera.

2.5 Connecting to UMLS

Each group of term variants was then linked to a UMLS concept if at least one member of the group (i.e. a term variant) matched at least one of the German descriptions of that concept. Only exact agreement of the normalized forms was counted as a match, as preliminary experiments had shown that fuzzy matching introduced a large number of false positives (connections between similarly spelled, but otherwise unrelated words) while adding only very few desired links. However, we were able to improve the linkage with simple heuristics, such as the removal of boilerplate expressions like nicht näher bezeichnet ‘not otherwise specified’.

For 42.1 % of the terms, we could find a match with a UMLS concept. Only 6.7 % of the matching terms point to more than one concept, which means that 93.3 % of the terms with a match can be mapped to the Metathesaurus unambiguously. However, for more than half of the terms no corresponding UMLS concept could be found at all, which is mainly due to the different domains of our veterinary texts and the predominantly human-medicine-based UMLS. Table 3 shows some examples of the mapping.

Table 3: Mapping to the UMLS Metathesaurus.

diagnosis/terms                                        UMLS
stenosis
  Darmstenose                                          C0267465 Darmstenose/Darmstriktur/Stenose des…
  Dünndarmstenose                                      C0151924 Dünndarmstenose/Stenose des Dünndarms
  Rectumstenose, Rektumstenose                         –
myodegeneration
  Belastungsmyopathie                                  –
  Muskelfaserdegeneration, Muskelfaserndegeneration    C0234958 Muskeldegeneration/Degeneration des …
  Muskelfasernekrose                                   –
  Muskelläsionen                                       –
  Muskelnekrose, Muskelnekrosen                        C0235957 Muskelnekrose/Myonekrose
  Myodegeneration, myodegeneration                     –
  Myonekrose, myonecrosis                              C0235957 Muskelnekrose/Myonekrose
  Rhabdomyolyse                                        C0035410 Rhabdomyolyse
  Weiss-Muskel-Krankheit, Weiss-Muskelkrankheit,       C0043153 Muskeldystrophie, nutritive/Weißmuskelkrankheit
  Weissmuskelerkrankung, Weissmuskelkrankeit,
  Weissmuskelkrankheit

The connections to the Metathesaurus allowed us to further enrich our data. For example, every UMLS concept has a semantic type assigned to it, such as “Disease or Syndrome” or “Pathologic Function”. Additionally, we used the concept descriptions in the Metathesaurus to find more focus terms. By matching the descriptions of connected concepts against our text collection, we were able to enlarge the set of focus terms by almost 10 %.

As next steps, we plan to create links to other widely-used terminology resources, such as the Central key for health data recording by the International Committee for Animal Recording (ICAR).⁴

⁴ See www.icar.org

3 Annotation Tool

The terminology resource described above is a key component in our efforts to create a veterinary surveillance system. We wrote a pipeline of Python scripts that assists our semi-automatic annotation of the pathology reports. The tool performs automatic annotation of syndromes and diagnoses based on the term resource, while also keeping track of manual verifications and rejections. Through a web interface, it accepts a Microsoft Excel workbook as input and produces a modified version in the same format, which allows a veterinary domain expert to inspect and modify the automatic annotations. All relevant information (such as the term resource and the assigned categories, negations (see below), and the previous manual annotations) is contained within this file.

3.1 Negation Detection

In a keyword-based system for detecting evidence, negative expressions can play a crucial role. Occasionally, negative outcomes of an analysis are reported in the texts, and suspected diagnoses are rejected quite frequently, such as keine Hinweise auf eine Pneumonie ‘no evidence of a pneumonia’. Therefore, we aimed at identifying occurrences of focus terms that are mentioned in a negated context.

Besides the identification of negated expressions, negation detection heavily depends on the correct determination of their scope. Tanushi et al. (2013) compare different approaches to negation scope detection in Swedish clinical reports. According to them, “[e]mploying a simple, rule-based approach with a small amount of negation triggers and a fixed context window for determining scope is very efficient and useful, if results around 80 % F-score are sufficient for a given purpose” (Tanushi et al., 2013, p. 393). We included a simple negation-detection module in our pipeline, which looks for a set of negative expressions in a context window of 5 tokens to either side of the focus term. The context can be restricted for each expression (e.g. only to the right of or only immediately preceding a focus term). The context window is shortened at sentence boundaries and other indicators of a break. However, as the results of the negation detection are not yet satisfactory, we plan to integrate an existing library for this task, e.g. the Python package pyConTextNLP (Chapman et al., 2011).

3.2 Inter-Annotator Agreement

In order to validate the quality of our annotations, we organized a multi-annotator evaluation. We performed an experiment with six experts in veterinary pathology, who were asked to classify a number of reports with respect to the syndromic categories described in Section 2.1. For this purpose, we created a web interface which displayed the report text together with some metadata, one report at a time, and allowed the annotator to mark each of the syndromes as present or absent. The reports were randomly sampled, keeping the distribution of species and year of creation as close to the entire collection as possible (approaching stratified sampling). Each annotator was provided with a sample of 20 reports, which was extended to twice or three times the size when an annotator asked for more. In order to increase sample size, the same report was given to only two or three annotators, rather than all of them. In total, 81 distinct reports were annotated.

Table 4: Inter-annotator agreement of the syndromic categories, measured with Krippendorff’s Alpha. The second column gives the number of reports where at least one annotator marked the corresponding syndrome as present; following in parentheses is the number of reports with disagreement. Do and De are the observed and expected disagreement, respectively.

syndrome        reports    Do      De      α
gastro-int.     52 (13)    0.059   0.251   0.764
respiratory     28 (10)    0.045   0.207   0.781
urinary          9 (3)     0.014   0.083   0.836
cardio-vasc.    15 (9)     0.041   0.115   0.644
lymphatic        3 (3)     0.014   0.018   0.240
musc.-skel.     13 (3)     0.014   0.125   0.891
reproductive     9 (1)     0.005   0.094   0.952
neural           5 (2)     0.009   0.052   0.825
other           38 (21)    0.095   0.226   0.577
avg.                                       0.723

We evaluated the inter-annotator agreement with Krippendorff’s Alpha (Krippendorff, 2013, pp. 267–309), as is shown in Table 4. For computing the agreement, we regarded each syndrome as an independent, binary variable (each syndrome is either present or absent in a report). The agreement value α ranges from 1 (perfect agreement) to 0 (agreement as by chance) or even below (systematic disagreement). A high agreement means that identifying syndromes is a clear task, while a low agreement indicates that the decisions cannot be easily made. Most of the syndromes have a good (>0.8) or acceptable (>0.6) α score,⁵ whereas some are clearly identified as problematic. For the lymphatic system, the sparse representation (only 3 reports) does not allow for valid conclusions; further investigation is required in this case. The “catch-all” class other, however, most likely suffers from having an unclear scope. As a consequence of this evaluation, we decided to reduce the ambiguity of other by including additional classes in the next revision of the syndromic categorization.

⁵ For a discussion of the interpretation of absolute agreement scores see Artstein and Poesio (2008, p. 591).

4 Outlook

We will assess the performance of the text-mining tool based on a small number of diseases which have been relevant in Switzerland in the last 10 years:

1. Bovine Viral Diarrhoea in cattle (an eradication campaign for the disease was introduced in 2008)
2. Porcine Circovirus type 2 infection in pigs
3. Gastro-intestinal syndromes in pigs (for which we observe an increasing amount of pathology submissions)

Time-series analyses will be performed to quantify trends, seasonality and other effects (day of week, day of month etc.) on the number of submissions for syndromes potentially related to these diseases. For each disease, “in-control” data (data collected in the absence of an outbreak) will be used to establish a baseline model describing the amount of normal “noise” in the data (expected number of submissions in the absence of disease outbreaks). Retrospective analyses of the time-series will be done to see whether alerts (signals) were produced when the number of submissions for syndromes potentially linked to the disease was higher than expected from our baseline model (event detection). This will allow us to evaluate whether the system would have worked as an early-warning system.

The tools developed in this project will be adapted to reports from different pathology institutes throughout Switzerland, thus contributing to a nation-wide syndromic surveillance system. Similarly, the methodology developed may be applicable to the analysis of text-based disease information which is recorded in other contexts. For example, there is great potential in using such a system to systematically analyse health data which are recorded by veterinary practitioners in their practice management software, slaughter data or by animal health services in their central database.

Acknowledgements

This work was funded by the Swiss Federal Food Safety and Veterinary Office (Bundesamt für Lebensmittelsicherheit und Veterinärwesen).

References

R. Michele Anholt, John Berezowski, Iqbal Jamal, Carl Ribble, and Craig Stephen. 2014. Mining free-text medical records for companion animal enteric syndrome surveillance. Preventive Veterinary Medicine, 113(4):417–422.

Ron Artstein and Massimo Poesio. 2008. Inter-coder agreement for Computational Linguistics. Computational Linguistics, 34(4):555–596.

Taxiarchis Botsis, Michael D. Nguyen, Emily Jane Woo, Marianthi Markatou, and Robert Ball. 2011. Text mining for the Vaccine Adverse Event Reporting System: medical text classification using informative feature selection. Journal of the American Medical Informatics Association, 18(5):631–638.

John S. Brownstein, Clark C. Freifeld, Ben Y. Reis, and Kenneth D. Mandl. 2008. Surveillance Sans Frontieres: Internet-based emerging infectious disease intelligence and the HealthMap project. PLoS Med, 5(7):e151.

Wendy W. Chapman, John N. Dowling, and Michael M. Wagner. 2004. Fever detection from free-text clinical records for biosurveillance. Journal of Biomedical Informatics, 37(2):120–127.

Brian E. Chapman, Sean Lee, Hyunseok Peter Kang, and Wendy W. Chapman. 2011. Document-level classification of CT pulmonary angiography reports based on an extension of the ConText algorithm. Journal of Biomedical Informatics, 44(5):728–737.

Peter Christen. 2006. A comparison of personal name matching: Techniques and practical issues. Technical Report TR-CS-06-02, The Australian National University, Dec.

Nigel Collier, Ai Kawazoe, Lihua Jin, Mika Shigematsu, Dinh Dien, Roberto A. Barrero, Koichi Takeuchi, and Asanee Kawtrakul. 2006. A multilingual ontology for infectious disease surveillance: rationale, design and challenges. Language Resources and Evaluation, 40(3-4):405–413.

Nigel Collier, Son Doan, Ai Kawazoe, Reiko Matsuda Goodwin, Mike Conway, Yoshio Tateno, Quoc-Hung Ngo, Dinh Dien, Asanee Kawtrakul, Koichi Takeuchi, et al. 2008. BioCaster: detecting public health rumors with a Web-based text mining system. Bioinformatics, 24(24):2940–2941.

Fernanda C. Dórea, C. Anne Muckle, David Kelton, J. T. McClure, Beverly J. McEwen, W. Bruce McNab, Javier Sanchez, and Crawford W. Revie. 2013. Exploratory analysis of methods for automated classification of laboratory test orders into syndromic groups in veterinary medicine. PLOS one, 8(3):e57334.

Julia Efremova, Bijan Ranjbar-Sahraei, and Toon Calders. 2014. A hybrid disambiguation measure for inaccurate cultural heritage data. … for Cultural Heritage, Social Sciences, and Humanities (LaTeCH), pages 47–55, Gothenburg, Sweden, April. Association for Computational Linguistics.

Jeff Friedlin, Shaun Grannis, and J. Marc Overhage. 2008. Using natural language processing to improve accuracy of automated notifiable disease reporting. AMIA Annual Symposium Proceedings, 2008:207–211.

David Hartley, Noele Nelson, Ronald Walters, Ray Arthur, Roman Yangarber, Larry Madoff, Jens Linge, Abla Mawudeku, Nigel Collier, John Brownstein, Germain Thinus, and Nigel Lightfoot. 2010. Landscape of international event-based biosurveillance. Emerging Health Threats Journal, 3(e3).

Klaus Krippendorff. 2013. Content Analysis: An Introduction to Its Methodology. Sage Publications, Thousand Oaks, CA, 3rd edition.

Gonzalo Navarro. 2001. A guided tour to approximate string matching. ACM Computing Surveys, 33(1):31–88, March.

Naoaki Okazaki and Jun’ichi Tsujii. 2010. Simple and efficient algorithm for approximate dictionary matching. In Proceedings of the 23rd International Conference on Computational Linguistics, COLING ’10, pages 851–859, Beijing, China, August.

Martin F. Porter. 1980. An algorithm for suffix stripping. Program, 14(3):130–137.

Fabio Rinaldi, James Dowdall, Michael Hess, Kaarel Kaljurand, Mare Koitand, Kadri Vider, and Neeme Kahusk. 2002. Terminology as knowledge in answer extraction. In TKE-2002: 6th International Conference on Terminology and Knowledge Engineering, Nancy, France, August.

Suzanne L. Santamaria and Kurt L. Zimmerman. 2011. Uses of informatics to solve real world problems in veterinary medicine. Journal of veterinary medical education, 38(2):103–109.

Kimberly A. Smith-Akin, Charles F. Bearden, Stephen T. Pittenger, and Elmer V. Bernstam. 2007. Toward a veterinary informatics research agenda: An analysis of the PubMed-indexed literature. International Journal of Medical Informatics, 76(4):306–312.

Ralf Steinberger, Flavio Fuart, Erik van der Goot, Clive Best, Peter von Etter, and Roman Yangarber. 2008. Text mining from the web for medical intelligence. In Françoise Fogelman-Soulié, Domenico Perrotta, Jakub Piskorski, and Ralf Steinberger, editors, Mining Massive Data Sets for Security, volume 19 of NATO Science for Peace and Security Series – D: Information and Communication Security, pages 295–310. IOS Press.

Hideyuki Tanushi, Hercules Dalianis, Martin Duneld, Maria Kvist, Maria Skeppstedt, and Sumithra Velupillai. 2013. Negation scope delimitation in
In Proceed- clinical text using three approaches: NegEx, Py- ings of the 8th Workshop on Language Technology ConTextNLP and SynNeg. In Proceedings of the Proceedings of the conference Terminology and Artificial Intelligence 2015 (Granada, Spain) 70 19th Nordic Conference of Computational Linguis- tics (NODALIDA 2013), page 387–397, Oslo, Nor- way. Michael M. Wagner, J. Espino, F-C. Tsui, P. Geste- land, W. Chapman, O. Ivanov, A. Moore, W. Wong, J. Dowling, and J. Hutman. 2004. Syndrome and outbreak detection using chief-complaint data – experience of the Real-Time Outbreak and Dis- ease Surveillance project. Morbidity and Mortality Weekly Report, 53:28–31. Eva Warns-Petit, Eric Morignat, Marc Artois, and Di- dier Calavas. 2010. Unsupervised clustering of wildlife necropsy data for syndromic surveillance. BMC Veterinary Research, 6(56):1–11.
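As an illustration of the agreement computation described in the evaluation, the following minimal Python sketch computes Krippendorff’s α for nominal data via the coincidence-matrix formulation, treating each syndrome as an independent binary variable and allowing a different number of annotators per report. It is not the implementation used in this work: the function names, the input layout (one set of syndrome labels per annotator and report) and the example reports are assumptions made purely for illustration.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.

    `units` holds one list of labels per report; annotators who did not
    see a report are simply left out of that report's list.
    """
    coincidences = Counter()
    for values in units:
        m = len(values)
        if m < 2:
            continue  # a report seen by a single annotator cannot be paired
        # Every ordered pair of labels from two different annotators
        # contributes 1 / (m - 1) to the coincidence matrix.
        for a, b in permutations(values, 2):
            coincidences[(a, b)] += 1 / (m - 1)

    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n <= 1:
        return float("nan")  # not enough pairable values

    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        return float("nan")  # only one category occurs; alpha is undefined
    return 1 - observed / expected


def alpha_per_syndrome(annotations, syndromes):
    """Compute one alpha per syndrome, each treated as present/absent.

    `annotations` maps a report id to a list of label sets, one set per
    annotator who saw that report (hypothetical layout).
    """
    result = {}
    for syndrome in syndromes:
        units = [
            [int(syndrome in labels) for labels in label_sets]
            for label_sets in annotations.values()
        ]
        result[syndrome] = krippendorff_alpha_nominal(units)
    return result


if __name__ == "__main__":
    # Invented toy data: three reports, two or three annotators each.
    example = {
        "report-1": [{"respiratory"}, {"respiratory"}],
        "report-2": [{"digestive"}, {"digestive", "other"}, {"digestive"}],
        "report-3": [set(), {"other"}],
    }
    for name, alpha in alpha_per_syndrome(
        example, ["respiratory", "digestive", "other"]
    ).items():
        print(name, round(alpha, 3))
```

Weighting each pair by 1 / (m − 1) is what lets reports with two and with three annotators contribute to the same statistic. A syndrome that occurs in very few reports yields an unstable estimate, which is consistent with the caution expressed above for the lymphatic system.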