=Paper=
{{Paper
|id=Vol-1179/CLEF2013wn-CLEFER-HellrichEt2013
|storemode=property
|title=The JULIE LAB MANTRA System for the CLEF-ER 2013 Challenge
|pdfUrl=https://ceur-ws.org/Vol-1179/CLEF2013wn-CLEFER-HellrichEt2013.pdf
|volume=Vol-1179
|dblpUrl=https://dblp.org/rec/conf/clef/HellrichH13
}}
==The JULIE LAB MANTRA System for the CLEF-ER 2013 Challenge==
Johannes Hellrich and Udo Hahn

Jena University Language & Information Engineering (JULIE) Lab, Friedrich-Schiller-Universität Jena, Jena, Germany

johannes.hellrich@uni-jena.de, udo.hahn@uni-jena.de

Abstract. We describe the set-up of the system from the Jena University Language & Information Engineering (JULIE) Lab which participated in the CLEF-ER 2013 Challenge. The task of this challenge was to identify hitherto unknown translation equivalents for biomedical terms in several parallel text corpora. The languages covered are English, German, French, Spanish and Dutch. In a realistic scenario, our translation system enhanced the French, German, Spanish and Dutch parts of a UMLS-derived terminology with 4k to 15k new entries each. Based on expert assessment of the new German translations, about 76% were judged to be plausible term enhancements.

1 Introduction

The task underlying the CLEF-ER 2013 Challenge is to identify translation equivalents for biomedical terms in several parallel text corpora. Terms from this sublanguage are either single-word terms (e.g. ‘appendicitis’) or multi-word terms (e.g. ‘blood cell’). The task not only encompasses the recognition of literal mentions of these terms but also includes a grounding task, namely to determine unique identifiers for the recognized terms in an authoritative biomedical terminology, the Unified Medical Language System (UMLS; http://www.nlm.nih.gov/research/umls/).

The UMLS is an umbrella system for a huge collection of specialized domain terminologies covering human anatomy, diseases, drugs, clinical care, etc. It was originally developed for the English language but has subsequently been complemented by translations into a large variety of other languages. Whereas the English UMLS is a rather complete collection of biomedical terms, the non-English versions lack this property to varying degrees.
Hence, properly enhancing the lexical and conceptual coverage of the non-English UMLSes constitutes a rewarding goal, which was picked up by the CLEF-ER 2013 Challenge organizers (https://sites.google.com/site/mantraeu/clef-er-challenge) for the following languages: English, German, French, Spanish and Dutch.

From the perspective of biomedical natural language processing, the task combines aspects of both named entity recognition (NER) and machine translation (MT). Our system leans more towards the MT side, since we try to solve the challenge by enriching a UMLS-derived terminology with new translations extracted from the parallel texts. This enriched terminology is thereafter used to annotate the corpora with a gazetteer. This eases both linking new entities to the identifiers used in the terminology and a human review of the system’s output, since we produce some thousands of new terms to review instead of hundreds of thousands of annotations.

Our terminology translation system combines phrase-based statistical machine translation (SMT) with named entity recognition. For this task, we exploit the MEDLINE and EMEA biomedical parallel corpora (see details below) for all relevant language pairs. In a realistic scenario, our translation system enhanced the French, German, Spanish and Dutch parts of the UMLS-derived terminology with 4k to 15k new entries each. Based on expert assessment of the new German translations, about 76% were judged to be plausible term enhancements.

2 JULIE Lab’s MANTRA System

This section describes the translation part of JULIE Lab’s MANTRA system. We distinguish preparatory steps (Section 2.1) from the candidate generation (Section 2.2) and candidate filtering steps (Section 2.3) and, finally, turn to the results of applying our system to the challenge data (Section 2.4).
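The gazetteer-based annotation step can be pictured as a greedy longest-match dictionary lookup over tokenized sentences. The following sketch is purely illustrative (it is not the LingPipe implementation, and the dictionary entries and concept identifiers are invented):

```python
def annotate(tokens, term_dict, max_term_len=5):
    """Greedy longest-match lookup of dictionary terms in a token sequence.

    term_dict maps tuples of lower-cased tokens (single- or multi-word
    terms) to concept identifiers; max_term_len bounds the n-gram window.
    Returns a list of (start, end, concept_id) spans.
    """
    spans = []
    i = 0
    while i < len(tokens):
        # Try the longest possible term first, then shrink the window.
        for n in range(min(max_term_len, len(tokens) - i), 0, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in term_dict:
                spans.append((i, i + n, term_dict[key]))
                i += n  # skip past the matched term
                break
        else:
            i += 1  # no term starts at this token

    return spans

# Toy dictionary with invented identifiers:
terms = {("appendicitis",): "ID-1", ("blood", "cell"): "ID-2"}
spans = annotate("Acute appendicitis alters blood cell counts".split(), terms)
# spans == [(1, 2, "ID-1"), (3, 5, "ID-2")]
```

The longest-match strategy ensures that a multi-word term like ‘blood cell’ is preferred over its single-word constituents.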
A major design requirement during system development was to reuse existing and widely used software, often developed in other contexts, and to keep the system as domain- and language-independent as possible. Accordingly, we equipped JULIE Lab’s MANTRA with the LingPipe gazetteer (http://alias-i.com/lingpipe/), GIZA++ and Moses (http://www.statmt.org/moses/) [1, 2] for phrase-based SMT, JCoRe for biomedical NER [3] and, finally, WEKA (http://www.cs.waikato.ac.nz/~ml/weka/) [4] for learning a maximum-entropy model to combine NER and SMT information.

2.1 Preparatory Steps

To generate training data for the translation part of our system, we merged the MEDLINE (http://mbr.nlm.nih.gov/Download/) and EMEA [5] parallel corpora provided for the CLEF-ER 2013 Challenge, resulting in one file per language pair. These files were then annotated for those biomedical entities already contained in the UMLS-derived CLEF-ER terminology, using a LingPipe-based gazetteer. (For technical and data consistency purposes, the original UMLS terminology was slightly curated and reformatted; this specific UMLS version is available at https://sites.google.com/site/mantraeu/terminology.) 10% of each corpus were set aside and used to train the JCoRe NER engine. The remaining 90% of the corpora were used to train a phrase-based SMT model with GIZA++ and Moses.

2.2 Candidate Generation

The model created by GIZA++ contains phrase pairs, such as ‘an indication of tubal cancer’ → ‘als Hinweis auf ein Tubenkarzinom’, together with several translation probabilities for each pair, namely the inverse phrase translation probability φ(f|e), the inverse lexical weighting lex(f|e), the direct phrase translation probability φ(e|f) and the direct lexical weighting lex(e|f). These are conditional probabilities for a phrase in the target language e (either German, French, Spanish or Dutch) being a translation of a phrase in the source language f (English).
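These four scores are stored per phrase pair in the Moses phrase table, one entry per line with fields separated by `|||` and the scores in the order listed above. A minimal parser for this text format might look as follows (the example phrase pair and score values are invented for illustration):

```python
def parse_phrase_table_line(line):
    """Parse one entry of a Moses phrase table.

    The four scores appear in the order: phi(f|e), lex(f|e),
    phi(e|f), lex(e|f).  Additional fields (word alignments,
    counts) may follow in some Moses versions and are ignored here.
    """
    fields = [f.strip() for f in line.split("|||")]
    source, target = fields[0], fields[1]
    names = ("phi_f_given_e", "lex_f_given_e", "phi_e_given_f", "lex_e_given_f")
    scores = dict(zip(names, (float(s) for s in fields[2].split())))
    return source, target, scores

# Invented entry for illustration:
entry = "tubal cancer ||| Tubenkarzinom ||| 0.50 0.42 0.60 0.38"
src, tgt, probs = parse_phrase_table_line(entry)
# src == "tubal cancer", probs["phi_e_given_f"] == 0.60
```

In a set-up like ours, these per-pair scores would then serve directly as features for the candidate filter described in Section 2.3.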
Translation candidates for enriching the UMLS were produced by filtering the phrase pairs such that only those were retained which translated a known English synonym of a biomedical concept into one of the target languages.

2.3 Candidate Filtering

We filtered the candidates produced in the previous step by using WEKA to train a maximum-entropy model as a selector that keeps or discards tentative translations. As features we used the phrase probabilities from the SMT model, the NER system’s judgment of each candidate being a recognized named entity (thus removing biomedical non-terms) and, for our final submission only, the ratio between the respective character lengths of the translated synonym and its translation equivalent. This ratio was logarithmized to keep the features on a similar scale. The NER system’s judgment was normalized over all sentences containing the phrase in question by summing the probabilities of it being a named entity and dividing by the total number of such sentences, taking 0 if no match was found:

p_NE = (Σ_{i=1}^{n} EntityProbability_i) / n

where n is the number of sentences containing the phrase. The annotations for the already known translations, generated in step 2.1, were used as training material.

2.4 Results & Evaluation

Those translation candidates accepted by the filter and not yet contained in the UMLS were added to the dictionary used by our gazetteer system. We generated two new dictionaries, one with and one without the length ratio as a feature, and used these to annotate our submissions.

To evaluate our translation subsystem prior to submission, we used both expert judgment and an automatic benchmark, dealing with new and already known terminology, respectively. In the first setting, translations not yet contained in the UMLS were judged by a biomedical expert. In the second setting, we measured the system’s performance in recreating portions of the already available versions of the UMLS, i.e.
we matched suggested translations against entries already contained in the terminology. This evaluation was done concept-wise, i.e. a word which is a synonym for multiple concepts, or a translation thereof, was examined multiple times. (The biomedical expert mentioned above was a bioinformatics graduate student utilizing online medical dictionaries.) To measure the effects of different system configurations we used precision, recall and F1-score, calculated as follows:

precision = correct translations / proposed translations
recall = correct translations / traceable translations
F1-score = 2 · (precision · recall) / (precision + recall)

A translation was counted as correct if it was contained in the set of synonyms in the respective language for the UMLS concept. A translation was considered traceable if the corresponding concept was annotated in two parallel sentences by the gazetteer system. We evaluated the translation of each synonym of a UMLS entry independently. These criteria aim to measure how much of the UMLS that could have been reconstructed from the parallel corpus was actually reconstructed by our system.

Expert judgment was collected on a sample of 100 English-German term translations created with all features except the length ratio. 76% of these new translations were judged as correct and reasonable from a domain knowledge perspective. Automatic analysis was performed for all languages; the results are listed in the following table, with a baseline system performing no filtering of term candidates listed for comparison.
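These metric definitions translate directly into code; a small sketch, with invented counts for illustration:

```python
def evaluate(correct, proposed, traceable):
    """Precision, recall and F1-score as defined above.

    correct   -- suggested translations found among the UMLS synonyms
    proposed  -- all translations suggested by the system
    traceable -- synonyms whose concept the gazetteer annotated
                 in two parallel sentences
    """
    precision = correct / proposed
    recall = correct / traceable
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Invented counts for illustration:
precision, recall, f1 = evaluate(correct=60, proposed=100, traceable=80)
# precision == 0.6, recall == 0.75, f1 ≈ 0.667
```

Note that recall is computed against the traceable translations only, not against the full UMLS, which is why the baseline in Table 1 can reach near-perfect recall.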
Table 1. Evaluation by partial UMLS recreation. We list results comparing our submissions (the two MANTRA configurations, without and with the length ratio feature) with a baseline system without candidate filtering.

Language  Measure    Baseline  MANTRA w/o length ratio  MANTRA with length ratio
French    F1-score   0.13      0.61                     0.57
French    Precision  0.07      0.58                     0.61
French    Recall     0.98      0.64                     0.53
German    F1-score   0.14      0.66                     0.65
German    Precision  0.07      0.61                     0.69
German    Recall     0.98      0.73                     0.60
Spanish   F1-score   0.19      0.72                     0.73
Spanish   Precision  0.11      0.62                     0.63
Spanish   Recall     0.97      0.85                     0.86
Dutch     F1-score   0.15      0.79                     0.81
Dutch     Precision  0.08      0.70                     0.76
Dutch     Recall     0.98      0.89                     0.87

Both configurations of JULIE Lab’s MANTRA system are clearly superior to the baseline in all regards except recall. The system with the length ratio provides F1-scores similar to the one without, yet we suppose it to be more adequate for the CLEF-ER challenge due to its higher precision.

3 Related Work

Prior efforts to use parallel corpora for terminology translation were undertaken by Déjean et al. [6] and Deléger et al. [7, 8], with German and French as the target languages, respectively. Both studies report precision values of about 80%. However, due to the inconsistent evaluation strategies used in the literature, the influence of the resources used, and language-specific system design and tuning decisions, it is hard to generalize from these results.

Terminology extraction in the biomedical field is tricky, as most terms are not merely single words (e.g. ‘appendicitis’) but multi-word expressions (MWEs), like ‘Alzheimer’s disease’ or ‘acquired immunodeficiency syndrome’. Approaches to finding MWEs can be classified as either pattern-based, using e.g. manually created part-of-speech (POS) patterns, or statistically motivated, utilizing e.g. phrase alignment techniques. The former (used e.g. by Déjean et al. [6] and Bouamor et al. [9]) suffers from the need to supply POS patterns, which are often hand-crafted and may become cumbersome to read and write as the pattern set grows.
Statistical approaches circumvent this dilemma; they can use, e.g., the translation probabilities of the single words of an MWE (treated as a bag of words) [10] or phrases of some kind. These can either be linguistically motivated, i.e. use POS information [11], or be purely statistical and derived from the translation model produced by a phrase-based SMT system [12], as in our system.

4 Conclusion

We described a system using SMT and NER to generate new entries for multilingual biomedical terminologies. A terminology enriched this way can be used to improve the annotation of raw language data corpora.

A direct evaluation of terminology enrichment systems like ours is complicated for two reasons: first, the lack of a standard metric (some report only precision based on the number of correct translations produced by their system [7], while others report F-scores based on the system’s ability to reproduce a (sample) terminology [6]); and, second, the rather strong influence of the chosen terminology and corpora on a system’s performance. The CLEF-ER 2013 Challenge will allow us to overcome this problem by enabling extrinsic comparison based on the annotations provided by the different systems.

Acknowledgments

This work is funded by the European Commission’s 7th Framework Programme for small or medium-scale focused research actions (STREP) from the Information Content Technologies Call FP7-ICT-2011-4.1, Challenge 4: Technologies for Digital Content and Languages, Grant No. 296410.

References

1. Och, F.J., Ney, H.: A systematic comparison of various statistical alignment models. Computational Linguistics 29(1) (2003) 19–51
2. Koehn, P., Hoang, H., Birch, A., Callison-Burch, C., Federico, M., Bertoldi, N., Cowan, B., Shen, W., Moran, C., Zens, R., Dyer, C., Bojar, O., Constantin, A., Herbst, E.: Moses: Open source toolkit for statistical machine translation.
In: Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, Companion Volume: Proceedings of the Interactive Poster and Demonstration Sessions, Prague, Czech Republic, June 25-27, 2007. Association for Computational Linguistics (2007) 177–180
3. Hahn, U., Buyko, E., Landefeld, R., Mühlhausen, M., Poprat, M., Tomanek, K., Wermter, J.: An overview of JCoRe, the JULIE Lab UIMA component repository. In: Proceedings of the LREC’08 Workshop ‘Towards Enhanced Interoperability for Large HLT Systems: UIMA for NLP’, Marrakech, Morocco, 31 May 2008. Paris: European Language Resources Association (ELRA) (2008) 1–7
4. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA data mining software: An update. ACM SIGKDD Explorations 11(1) (2009) 10–18
5. Tiedemann, J.: News from OPUS: A collection of multilingual parallel corpora with tools and interfaces. In: Recent Advances in Natural Language Processing. Volume V. Amsterdam, Philadelphia: John Benjamins (2009) 237–248
6. Déjean, H., Gaussier, E., Renders, J.M., Sadat, F.: Automatic processing of multilingual medical terminology: Applications to thesaurus enrichment and cross-language information retrieval. Artificial Intelligence in Medicine 33(2) (2005) 111–124
7. Deléger, L., Merkel, M., Zweigenbaum, P.: Enriching medical terminologies: An approach based on aligned corpora. In Hasman, A., Haux, R., van der Lei, J., De Clercq, E., Roger France, F.H., eds.: MIE 2006 – Proceedings of the 20th International Congress of the European Federation for Medical Informatics. Volume 124 of Studies in Health Technology and Informatics. Maastricht, The Netherlands, August 27-30, 2006. Amsterdam: IOS Press (2006) 747–752
8. Deléger, L., Merkel, M., Zweigenbaum, P.: Translating medical terminologies through word alignment in parallel text corpora. Journal of Biomedical Informatics 42(4) (2009) 692–701
9. Bouamor, D., Semmar, N., Zweigenbaum, P.: Identifying bilingual multi-word expressions for statistical machine translation. In: LREC 2012 – Proceedings of the 8th International Conference on Language Resources and Evaluation, Istanbul, Turkey, 21-27 May 2012. European Language Resources Association (ELRA) (2012) 674–679
10. Vintar, Špela: Bilingual term recognition revisited: The bag-of-equivalents term alignment approach and its evaluation. Terminology 16(2) (2010) 141–158
11. Lefever, E., Macken, L., Hoste, V.: Language-independent bilingual terminology extraction from a multilingual parallel corpus. In: EACL 2009 – Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics, Athens, Greece, 30 March - 3 April 2009. Association for Computational Linguistics (2009) 496–504
12. Koehn, P., Och, F.J., Marcu, D.: Statistical phrase-based translation. In: NAACL ’03 – Proceedings of the 2003 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics. Volume 1. Edmonton, Alberta, Canada, May 27 - June 1, 2003. Association for Computational Linguistics (2003) 48–54