=Paper=
{{Paper
|id=Vol-1515/regular4
|storemode=property
|title=An ontology-based approach for SNOMED CT translation
|pdfUrl=https://ceur-ws.org/Vol-1515/regular4.pdf
|volume=Vol-1515
|dblpUrl=https://dblp.org/rec/conf/icbo/SilvaCS15
}}
==An ontology-based approach for SNOMED CT translation==
Mário J. Silva, Tiago Chaves and Bárbara Simões

Instituto Superior Técnico, Universidade de Lisboa and INESC-ID, Portugal

Copyright © 2015 for this paper by its authors. Copying permitted for private and academic purposes.

===Abstract===
SNOMED CT is a comprehensive multilingual class hierarchy of medical terms used in clinical records. Few translations are available, but, as new concepts and revisions are continuously being added, the manual translation and revision of the terms will remain a major endeavour. We propose a new approach for translating SNOMED CT terms (or named entities) using ontology mapping methods and various existing multilingual resources with translated concepts. Our purpose is generating initial candidate translations, already close to those proposed by medical experts, to be later used in a curated translation process. Our method for automatically translating SNOMED CT is being developed for Portuguese, using DBPedia, ICD-9 and Google Translate as sources of candidate translations of the clinical terms, which could later be verified. Initial results, using a manually translated Portuguese catalog of allergies and adverse reactions (CPARA) to SNOMED CT as ground truth, show that it has high potential.

===1 Introduction===
SNOMED Clinical Terms [1], or SNOMED CT, is a comprehensive multilingual class hierarchy of terms used in clinical records, with extensive overlapping and synonymous descriptions. The primary purpose of SNOMED CT is to encode the meanings of the terminology used in health information, supporting the effective clinical recording of data with the aim of improving patient care. SNOMED CT provides the core general terminology for electronic health records. With about 300,000 active terms, SNOMED CT spans clinical findings, symptoms, diagnoses, procedures, body structures, organisms and other etiologies, substances, pharmaceuticals, devices and specimens.

The need to interchange medical records across states is demanding the development of faster methods to obtain approved, standards-based translations of medical records, in particular SNOMED CT. The standardisation of clinical terms and their translations to other languages is very important for the unification of electronic health records worldwide. However, the manual translation and revision of the terms, synonyms and definitions to a new language is a major endeavour. SNOMED CT is presently available in US and UK English, Spanish, Danish and Swedish. It is also being translated to several other languages, but there is no translation of SNOMED CT to Portuguese or an official initiative to develop and maintain that translation. Hence, a tool to automatically translate SNOMED CT to Portuguese would assist in the production of a release to be validated and improved in a subsequent step, at a much lower cost than conducting the process manually. As new translations, concepts and revisions are continuously being added, the manual translation and revision of the terms will remain a major endeavour.

This paper describes our work on the development of an automatic translator of SNOMED CT to Portuguese as an assisting tool that could be used for the production of a future standard translation of SNOMED CT. We take the approach of using available classifications and automatic translation services as ontologies that can be aligned and later navigated to provide the translations of such technical terms. In our method, we start by identifying existing alignments between SNOMED CT and other selected ontologies, including the releases of SNOMED CT in different languages. For the Portuguese translation, given its proximity to Spanish, many terms in the Spanish release of SNOMED CT have almost identical spelling. There are other medical terminologies for which multiple translations have been published, such as ICD (International Classification of Diseases) [2]. Another major resource is DBPedia, an ontology derived from Wikipedia, which is very rich in medical terms (Lehmann et al., 2015). After the collection of these multilingual ontologies and published mappings between their terms, we derive additional alignments using the ontology mapping algorithms implemented in AgreementMakerLight, a scalable automated ontology matching system developed primarily for the life sciences domain (Faria et al., 2013). To obtain correspondences between terms in two distinct languages, we can also explore online translation services, such as the Google Translate Service [3] or Microsoft Translator [4], to generate additional mappings.

To show the feasibility of the approach outlined above for automatically generating translations of SNOMED CT terms to Portuguese, we evaluated the translations obtained with the alignments against the translations of a set of SNOMED CT terms that have been mapped by medical experts to terms of the Portuguese catalog of allergies and adverse reactions (SPMS, 2015). The latest release of CPARA includes curated translations of SNOMED CT terms. The evaluation shows promising results. The ontology-mapping translation method achieved an accuracy of 89% and coverage of 37% for the set of 191 terms on the translation of the CPARA vocabulary terms previously hand-mapped to SNOMED CT (using case-insensitive string comparison).

===2 Resources and Related Work===
In our work, we used the 01/2014 International (English) distribution of SNOMED CT and the Spanish version dated from April 2014, both provided by the NLM (National Library of Medicine) institutional site [5]. The distribution also includes a mapping between ICD-9, a WHO classification of diseases, and SNOMED CT. This mapping can be used to link SNOMED CT codes to the ICD-9 Portuguese terms in a translation provided by the Portuguese Ministry of Health [6].

There are no comprehensive medical terminologies for European Portuguese. In addition to ICD-9 in European Portuguese, ICD-10 has been manually translated to Brazilian Portuguese [7]. There is also an English to Brazilian Portuguese dictionary of medical terms (Stedman, 2003). However, there are a number of terminological differences between these two variants of the language. Other terminologies, such as ICPC [8], have been translated [9], but they have a much narrower scope than ICD.

In computing, the translation of a terminology, such as the set of SNOMED CT terms, is an instance of a common task in Natural Language Processing (NLP), designated as Named Entity Translation (Ling et al., 2011). The task is formulated as the problem of, given a set of labels (named entities) in a source language, obtaining the translations of these entities in a target language. Langlais et al. (2008) researched the translation of medical terms using a bilingual lexicon. Recently, Abdoune et al. (2013) performed an automatic translation of the CORE subset of SNOMED CT to French by mapping this subset to four French-translated terminologies integrated in the UMLS Metathesaurus: SNOMED International, ICD10, MedDRA and MeSH. They were able to map 89% of the preferred terms of the CORE subset of SNOMED CT to at least one preferred term in one of the four terminologies.

Other approaches for generating translations have been attempted. Algorithms based on linguistic rules are particularly useful for languages which are poor in language resources, like a recently proposed Basque semi-automatic translation of SNOMED CT (Perez-de Viñaspre and Oronoz, 2014). The algorithm takes an incremental approach: first a lexical translation is attempted; then, if a translation is not found, generation/transcription rules for terms, or chunk-level generation to translate a term token by token, are used; finally, a rule-based automatic translation system is used to find a translation.

In this work, we explore DBPedia, an ontology derived from Wikipedia, as an alternative source of term translations (Lehmann et al., 2015). We apply ontology matching methods to align DBPedia and SNOMED CT, along with other web-based services, like Google Translate. DBPedia is a potentially rich resource for mappings of medical terms, given that the English and Portuguese Wikipedias are among the largest. To map these ontologies we used AgreementMakerLight, an ontology matching system developed to tackle large ontology matching problems, and focused in particular on the biomedical domain (Faria et al., 2013). This system can handle the mapping of very large ontologies, as is the case with SNOMED CT and DBPedia. AgreementMakerLight is derived from the AgreementMaker ontology matching system (Cruz et al., 2009). The alignments produced by AgreementMaker combine multiple matching algorithms in three layers: the first layer uses string matching methods to identify similar labels, the second matches ontology structures, and the third layer combines the results from the matchers in the first two layers. The initial experiments reported in this paper only used the first-layer algorithms to perform the alignments.

Medical terms, like named entities in general, can be matched using similarity metrics like the Jaro distance, initially proposed for record linkage systems (Porter and Winkler, 1997). The Jaro distance has been used for the evaluation of automatic translations of named entities. It accounts for the number of transpositions between two input strings and also the number of different characters, resulting in a numeric distance in the [0, 1] range.

===3 SNOMED CT Translation===
Given that SNOMED CT is mostly used to provide terminology for electronic health records, the risks of using an automatically generated translation of such a large collection of terms without expert validation are unacceptable. In fact, the SNOMED publisher provides detailed guidelines for validating the translations made by medical experts for the official translations available (IHTSDO, 2012). However, we believe that, if the initial quality of the automatically generated translation is high, we could later validate such candidate translations through a crowdsourcing activity, as experimented by Schulz et al. (2013).

Fig. 1. The translation of SNOMED CT is preceded by a data staging phase. Once the data is prepared, translation is carried out using the implemented methods. We select the best translation candidate using a trained ensemble model that selects the best method for each class of SNOMED CT terms, based on known translations.

Our approach for generating the translations of SNOMED CT terms into Portuguese is illustrated in Figure 1. We start by organising two mappings:

# SNOMED CT to ICD-9: a correspondence between the codes of SNOMED CT and codes and descriptions of ICD-9.
# SNOMED CT to DBPedia: a correspondence between SNOMED CT codes and DBPedia (English and Portuguese) page URIs, and associated page titles.

The first mapping is derived from the SNOMED CT to ICD-9 mapping included in the UMLS distribution, which includes the correspondence between SNOMED CT and ICD-9 codes. For the second mapping, the matching algorithms implemented in AgreementMakerLight can generate an alignment between SNOMED CT terms and English DBPedia labels. Once this alignment is generated, we can map SNOMED CT codes to DBPedia URIs and then obtain the corresponding label for the Portuguese term by a simple lookup.

To obtain candidate translations for SNOMED CT terms, we implemented four translation methods:

# Google Translate EN: the candidate translation into Portuguese of each English term in SNOMED CT is provided by the Google Translate API service.
# Google Translate ES: identical to the above, but the translation service uses the Spanish term as input.
# ICD-9 Mapping: for a given SNOMED CT term in English, we take the corresponding code and look up the SNOMED CT to ICD-9 mapping in the UMLS distribution to obtain the ICD-9 code and next the term description in the Portuguese version of ICD-9. This description becomes the candidate translation of the SNOMED CT term to Portuguese.
# DBPedia Mapping: starting with a SNOMED CT term in English, we look up the code in the SNOMED CT to DBPedia mapping and, from there, obtain the available candidate translation in the Portuguese DBPedia.

DBPedia is too big to be fully mapped in one batch with limited computing power, given the size of the ontologies involved. This would make the time required by AgreementMakerLight to align SNOMED CT with the full DBPedia prohibitive. However, using the full DBPedia is unnecessary, given that most of it is irrelevant to the clinical domain. We expect that our users, domain experts in clinical specialisations, will select a batch of SNOMED CT terms of their interest at a time and create or revise the translations of the terms in that smaller set. For instance, to identify a set of allergy-related DBPedia pages to be aligned with a set of SNOMED CT terms, we used the UNIX grep tool to filter out of the DBPedia ontology every page with a label not containing any of the words of the SNOMED CT terms. This resulted in a size reduction from 2 GB to 12 MB. To obtain the alignment with DBPedia, we parameterized AgreementMakerLight to consider as aligned all pairs of terms with a Jaro Distance ≥ 0.5.

The last step in our method involves the application of an ensemble learning algorithm (Dietterich, 2000). Each SNOMED CT term has a class label, provided as “qualifier” in the term description. For instance, the SNOMED CT term with code 158965000 has the term “Medical practitioner (occupation)”, from which we can separate the description “Medical practitioner” and the class Occupation. Instead of choosing the best overall translation method, we identify the best translation method for each class, based on the validated translations. As the number of validated translations will increase over time, we expect that ensemble learning will in the end improve the automatic translation process. However, given the small number of validated and translated terms in Portuguese that we have at this time, we still lack reliable data to evaluate this step.

===4 Evaluation===
CPARA, Catálogo Português de Alergias e Reacções Adversas, is a list of terms related to allergies and adverse reactions in use in the Portuguese National Health Service (SPMS, 2015). It was developed with the goal of unifying the classification for allergies and adverse reactions in Portugal. Given the high levels of patient mobility, physicians frequently need to know precisely which substances are known to affect an international patient. To address this need, CPARA terms have been mapped to SNOMED CT terms by a group of experts. These experts also created European Portuguese translations of the SNOMED CT Common Terms and Fully Specified Names (FSN) in the CPARA catalog. This mapping is critical to making the medical information exchanged about patients who travel internationally more accurate. In our evaluation, we used the Common Terms translations as gold standard to assess the accuracy of our translation approach. CPARA includes 191 codes and common terms of the US English distribution of SNOMED CT, and the corresponding CPARA codes and terms. In the Spanish SNOMED CT distribution there are 192 terms mapped from these 191 codes (one code is mapped to two terms).

Evaluation of the translated SNOMED CT terms started with candidate translations for the allergy-related SNOMED CT codes in CPARA generated by application of our method. We then evaluated the resulting set of translations against the ground truth composed of the corresponding CPARA terms as defined by the medical committee that defined the mapping. To assess the accuracy of the evaluated translation methods, we scored each term translation by the Jaro distance between the automatically translated term and the CPARA translation. The Jaro distance (JD) between two strings is 1 if the strings are identical, that is, they have exactly the same characters and no transpositions [10].

Prior to computing Jaro distances, all the translation candidates and CPARA translations were normalised: we removed any qualifiers from the SNOMED CT candidates, deleted quotes from the CPARA translations, and converted all the named entities to lowercase (e.g., “Moderate (severity modifier) (qualifier value)” became “moderate” and “Contact metal agent (substance)” became “contact metal agent”). These preparatory steps are necessary to obtain meaningful similarity metrics, because these qualifiers are common to many terms and can be translated independently. In addition, the Jaro Distance considers the same letter in lowercase and uppercase forms as two distinct characters. The statistics of the translations obtained with each of the four implemented methods described in the previous section are given in Table 1. In these statistics, we considered as valid the translations with JD = 1.

{| class="wikitable"
|+ Table 1. Global results for all translation methods with the respective average Jaro Distance (AVG JD) and standard deviation of the Jaro Distance (STDEV JD). The implemented methods are Google Translate (GT), both from English (EN) and Spanish (ES) to Portuguese, ICD-9 Mapping (ICD 9), and DBPedia Mapping (DBPedia). All translations were attempted with two source languages, English (EN) and Spanish (ES). The number of terms translated by each method is given in the #Method column.
! Method !! Source Language !! Coverage !! #Method !! AVG JD !! STDEV JD
|-
| GT || EN || 100% || 191 || 0.78 || 0.22
|-
| GT || ES || 114% || 218 || 0.58 || 0.15
|-
| ICD 9 || EN || 10% || 20 || 0.61 || 0.12
|-
| DBPedia || EN || 37% || 70 || 0.89 || 0.03
|}

We observe that the SNOMED-DBPedia alignment obtains, for a coverage of 37%, both the highest similarity (0.89) and lowest standard deviation (0.03). This shows that we have been able to accurately translate a set of SNOMED CT terms to Portuguese, using basic alignment techniques, through the SNOMED CT to DBPedia alignment. However, the generation of translations based on ontology alignments as proposed in this paper also has limitations. In particular, only a fraction of the translations can be obtained by this method, while Google Translate always proposed a translation. Our success with Portuguese may not be guaranteed when aligning SNOMED CT with DBPedia in other languages with smaller Wikipedias.

Google Translate EN showed better accuracy than Google Translate ES. This result was not initially expected, because Spanish and Portuguese are close languages. This may result from the CPARA terms being originally derived from the English terminology. The number of translations obtained with Google Translate ES is higher than the number of terms in the CPARA dataset (yielding the 114% coverage). This is the result of how we have obtained the Spanish SNOMED CT candidate terms for translation. We started from the same initial SNOMED CT codes that we used for the English translation and obtained the Spanish codes matching the concept id and type id of the initial English terms. This generated a higher number of ES candidate terms to translate (218) than the initial EN terms (191).

To evaluate which translation method works best for each class of SNOMED CT terms, we measure which translation method performs best in each class. This measurement is necessary to later model an ensemble learning stage that could pick the best method for each class. To obtain the results, we divided CPARA in classes for translation purposes. These classes were extracted from the qualifiers defined for the SNOMED Fully Specified Name terms. We were interested in observing translation performance differences across classes. To measure the differences, we calculated the similarity and standard deviation, as above, of all the translation candidates in each class. The results are summarised in Table 2.

{| class="wikitable"
|+ Table 2. Scores for the different classes of SNOMED CT terms. The AVG JD and STDEV JD columns represent the average and standard deviation of the Jaro Distance; NA indicates that STDEV cannot be obtained because there is only one translation for the class.
! Translation Technique !! Source Lang. !! Class !! AVG JD !! STDEV JD
|-
| Google Translate || EN || Substance || 0.82 || 0.19
|-
| Google Translate || EN || Observable Entity || 0.74 || NA
|-
| Google Translate || EN || Product || 0.96 || 0.01
|-
| Google Translate || EN || Disorder || 0.78 || 0.25
|-
| Google Translate || EN || Occupation || 1.00 || 0.00
|-
| Google Translate || EN || Person || 0.55 || 0.18
|-
| Google Translate || EN || Qualifier Value || 0.79 || 0.17
|-
| Google Translate || EN || Finding || 0.74 || 0.30
|-
| Google Translate || EN || Event || 1.00 || NA
|-
| Google Translate || EN || Situation || 0.71 || NA
|-
| Google Translate || EN || Organism || 0.63 || 0.37
|-
| Google Translate || EN || Severity Modifier || 0.79 || 0.30
|-
| Google Translate || EN || Contextual Qualifier || 0.83 || 0.15
|-
| Google Translate || EN || No Qualifier || 0.63 || 0.28
|-
| Google Translate || ES || Disorder || 0.66 || 0.11
|-
| Google Translate || ES || Substance || 0.59 || 0.14
|-
| Google Translate || ES || Qualifier Value || 0.58 || 0.11
|-
| Google Translate || ES || Contextual Qualifier || 0.56 || 0.08
|-
| Google Translate || ES || Organism || 0.62 || 0.10
|-
| Google Translate || ES || Person || 0.57 || 0.09
|-
| Google Translate || ES || Occupation || 0.67 || 0.04
|-
| Google Translate || ES || Finding || 0.67 || 0.12
|-
| Google Translate || ES || Situation || 0.60 || 0.00
|-
| Google Translate || ES || Observable Entity || 0.64 || 0.00
|-
| Google Translate || ES || Product || 0.57 || 0.03
|-
| Google Translate || ES || Severity Modifier || 0.53 || 0.13
|-
| Google Translate || ES || Event || 0.35 || NA
|-
| Google Translate || ES || No Qualifier || 0.50 || 0.28
|-
| ICD-9 || EN || Disorder || 0.60 || 0.12
|-
| ICD-9 || EN || Finding || 0.68 || 0.15
|-
| DBPedia Matching || EN || Disorder || 0.92 || 0.04
|-
| DBPedia Matching || EN || Substance || 0.90 || 0.03
|-
| DBPedia Matching || EN || Qualifier Value || 0.72 || 0.05
|-
| DBPedia Matching || EN || Event || 1.00 || NA
|-
| DBPedia Matching || EN || Finding || 0.99 || 0.00
|-
| DBPedia Matching || EN || Organism || 0.82 || 0.09
|-
| DBPedia Matching || EN || Person || 0.48 || NA
|-
| DBPedia Matching || EN || No Qualifier || 0.83 || 0.09
|}

The SNOMED-DBPedia alignment generates better translations for all classes, except Person and Qualifier Value. The poorer performance could, however, reflect that only a small number of related terms in the allergy domain have been identified for both classes.

Google Translate ES has better average similarity for the Person class than Google Translate EN. This suggests that the SNOMED CT translation could benefit from using the Spanish language distribution for some CPARA translations.

The translations obtained with the ICD-9 mapping translator are worse than those obtained by Google Translate (for both languages). This results from ICD-9 being less comprehensive than SNOMED CT. ICD codes mostly diseases, symptoms or causes of death. Therefore, many of the CPARA terms in SNOMED CT were absent in the ICD-9 to SNOMED CT mapping. The results also indicate that, as expected and observed with ICD-9, terminologies of narrower scope are not useful for translating clinical terms through ontology alignment. The ICD-9 mapping is much less successful than other resources, such as DBPedia and Google Translate, which can provide much higher coverage of candidate translations, in many cases while retaining equal or better accuracy. The ICD-9 mapping method generates a high number of 1-to-many matchings. However, ICD-9 could still be useful in cases where it generates only one matching description, which is usually very accurate and reliable, given that matchings between ICD and SNOMED CT and the resulting translations are validated by medical experts.

===5 Conclusions and Future Work===
SNOMED CT is increasingly prevalent in the health care sector, resulting from the increasing need to exchange medical records in mobile societies. There is also a growing general interest in accessing standardised machine-readable medical records for improving managed health care and biomedical research.

We introduced a new methodology for translating SNOMED CT terms, which relies primarily on aligning large ontologies, complementing language-based methods that have been proposed before. We prototyped an initial implementation of this methodology, which obtained high coverage and good accuracy, despite only using string matchers for SNOMED CT and DBPedia alignment along with the domain-independent Google Translate. A translation was considered valid when the expert mapping of an allergy-related SNOMED CT term to Portuguese is identical to the one obtained using the SNOMED CT to DBPedia alignment. The accuracy under these settings was 37%. This shows that both the English and Portuguese versions of DBPedia are rich and accurately interlink with medical terms. However, the results for Portuguese may not be indicative of how this method would perform on other languages. Portuguese is one of the top-10 Wikipedia languages in terms of the total number of entries. The coverage of the obtained translations depends on how rich the Wikipedia for a target language is in covering clinical concepts and the extent to which these concepts are mapped to Wikipedia pages in languages for which a SNOMED CT translation exists. In addition, our validation experiment was confined to testing about 200 SNOMED CT Common Terms in the allergies and adverse reactions domain in European Portuguese. It is still unknown how comprehensive and accurate the English and Portuguese Wikipedias are across the full clinical domain, and how this factor affects the accuracy of the SNOMED CT translations.

Some improvements can still be added to the software implementing the presented translation method. For instance, the SNOMED CT to DBPedia alignment should explore the defined semantic relationships between classes and terms in both SNOMED CT and DBPedia. On the other hand, these relationships could be explored to generate accurate translations for terms left untranslated, using lexical methods yet to be provided. For this purpose, language resources, such as WordNet, and parallel corpora of named entities, such as previously validated SNOMED CT translations, could be used to learn how words and multi-word expressions should be properly translated.

The measured accuracy of our translation method could still be significantly increased without sacrificing the quality of translations, by relaxing the similarity threshold. The negative impacts of such relaxation are negligible, given that the generated translations will always need to be validated by experts before being used in a clinical context. The expert-validation step presently relies on the review of generated translations presented on spreadsheets. A crowdsourcing platform could speed up the process of creating and maintaining a validated translation of SNOMED CT. Moreover, active learning could also be incorporated in the crowdsourcing platform, leading to fast improvement of the proposed translations, as the validated translations can also be used as input to generate good candidates (Ambati et al., 2010).

A complementary assessment of the alignment approach proposed here could be obtained by applying it to the automatic translation with one of the existing released translations, e.g. Spanish. However, given that we rely on alignments between lexical resources, we are not certain if the Wikipedia correspondences between clinical term pages in Spanish and English have been created based on SNOMED CT.

===Acknowledgements===
We thank Daniel Faria and the other members of the SOMER project for help with running AgreementMakerLight and their feedback. We also thank Dr. Anabela Santos for the help with the CPARA translation of SNOMED CT to validate our tool, and Bruno Martins for the pointers to previous works. This work was partially supported by Fundação para a Ciência e a Tecnologia (FCT), grants PTDC/EIA-EIA/119119/2010 (SOMER), UID/CEC/50021/2013 and EXCL/EEI-ESS/0257/2012 (DataStorm).

===Footnotes===
* [1] http://www.ihtsdo.org/snomed-ct
* [2] http://www.who.int/classifications/icd/en/
* [3] https://translate.google.com/
* [4] http://www.microsoft.com/translator/translator-api.aspx
* [5] http://www.nlm.nih.gov/research/umls/Snomed/snomed_main.html
* [6] http://www.acss.min-saude.pt/Portals/0/ICD9CMOut2013.xlsx
* [7] http://www.datasus.gov.br/cid10/V2008/cid10.htm
* [8] http://goo.gl/IX9mqT
* [9] http://icpc2.danielpinto.net/
* [10] The computation of the Jaro distances was made with the Python Jellyfish library, https://pypi.python.org/pypi/jellyfish

===References===
* Abdoune, H., Merabti, T., Darmoni, S. J., and Joubert, M. (2013). Assisting the translation of the CORE subset of SNOMED CT into French. Studies in Health Technology and Informatics, 169, 819–823. DOI:10.3233/978-1-60750-806-9-819.
* Ambati, V., Vogel, S., and Carbonell, J. (2010). Active learning and crowd-sourcing for machine translation. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC’10).
* Cruz, I. F., Antonelli, F. P., and Stroe, C. (2009). AgreementMaker: Efficient matching for large real-world schemas and ontologies. PVLDB, 2(2), 1586–1589.
* Dietterich, T. (2000). Ensemble methods in machine learning. In Multiple Classifier Systems, volume 1857 of Lecture Notes in Computer Science, pages 1–15. Springer Berlin Heidelberg.
* Faria, D., Pesquita, C., Santos, E., Palmonari, M., Cruz, I., and Couto, F. (2013). The AgreementMakerLight ontology matching system. In On the Move to Meaningful Internet Systems: OTM 2013 Conferences - Confederated International Conferences, number 8185 in Lecture Notes in Computer Science, pages 527–541. Springer.
* IHTSDO (2012). Guidelines for Management of Translation of SNOMED CT. IHTSDO - International Health Terminology Standards Development Organisation.
* Langlais, P., Yvon, F., and Zweigenbaum, P. (2008). Analogical translation of medical words in different languages. In B. Nordström and A. Ranta, editors, Advances in Natural Language Processing, volume 5221 of Lecture Notes in Computer Science, pages 284–295. Springer Berlin Heidelberg.
* Lehmann, J., Isele, R., Jakob, M., Jentzsch, A., Kontokostas, D., Mendes, P. N., Hellmann, S., Morsey, M., van Kleef, P., Auer, S., and Bizer, C. (2015). DBpedia - a large-scale, multilingual knowledge base extracted from Wikipedia. Semantic Web Journal, 6(2), 167–195.
* Ling, W., Calado, P., Martins, B., Trancoso, I., Black, A., and Coheur, L. (2011). Named entity translation using anchor texts. In The International Workshop on Spoken Language Translation (IWSLT).
* Perez-de Viñaspre, O. and Oronoz, M. (2014). Translating SNOMED CT terminology into a minor language. In Proceedings of the 5th International Workshop on Health Text Mining and Information Analysis (Louhi), pages 38–45, Gothenburg, Sweden. Association for Computational Linguistics.
* Porter, E. H. and Winkler, W. E. (1997). Approximate string comparison and its effect on an advanced record linkage system. In Advanced Record Linkage System. U.S. Bureau of the Census, Research Report, pages 190–199.
* Schulz, S., Bernhardt-Melischnig, J., Kreuzthaler, M., Daumkea, P., and Boeker, M. (2013). Machine vs. human translation of SNOMED CT terms. In MEDINFO 2013.
* SPMS (2015). CPARA – catálogo português de alergias e outras reações adversas / Portuguese catalogue of allergies and other adverse reactions. Technical Report V3.0, 09-03-2015, SPMS – Serviços Partilhados do Ministério da Saúde. http://tinyurl.com/me5jhq7, http://tinyurl.com/lehlhaa.
* Stedman, T. L. (2003). Stedman’s English to Portuguese and Portuguese to English Medical Dictionary. French & European Pubns. ISBN 13: 9780785975281.
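The scoring step of the evaluation (normalise both strings, then compute the Jaro measure, where 1 means identical strings) can be sketched as follows. This is a minimal illustration, not the authors' code: the paper reports using the Jellyfish library, whereas the pure-Python `jaro` function below is a stand-in written so the example runs without dependencies, and the names `normalise`, `jaro` and `score` are hypothetical.

```python
import re


def normalise(term: str) -> str:
    """Drop quotes and trailing parenthesised qualifiers, then lowercase."""
    for quote in ('"', "\u201c", "\u201d"):
        term = term.replace(quote, "")
    # Remove every trailing "(...)" group, e.g. "(severity modifier) (qualifier value)".
    term = re.sub(r"(\s*\([^()]*\))+\s*$", "", term)
    return term.strip().lower()


def jaro(s1: str, s2: str) -> float:
    """Jaro measure in [0, 1]; 1.0 means the two strings are identical."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they are equal and close enough in position.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order // 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def score(candidate: str, reference: str) -> float:
    """Score a candidate translation against a reference translation."""
    return jaro(normalise(candidate), normalise(reference))


print(normalise("Moderate (severity modifier) (qualifier value)"))  # moderate
print(normalise("Contact metal agent (substance)"))  # contact metal agent
```

Under the paper's validity criterion, a candidate would count as valid only when `score(candidate, reference) == 1.0`.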
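The DBPedia pre-filtering described in Section 3 was done with the UNIX grep tool; the sketch below expresses the same idea in Python under stated assumptions. It keeps only labels that share at least one word with the selected SNOMED CT terms. Whole-word matching and the `min_len` cut-off for very short words are assumptions of this sketch (grep would match substrings), and the function names and sample labels are hypothetical.

```python
import re


def words(text: str, min_len: int = 3) -> set:
    """Lowercased words of a label, ignoring words shorter than min_len."""
    return {w for w in re.findall(r"\w+", text.lower()) if len(w) >= min_len}


def filter_labels(labels, snomed_terms, min_len: int = 3):
    """Keep the labels that contain at least one word of the SNOMED CT terms."""
    vocabulary = set()
    for term in snomed_terms:
        vocabulary |= words(term, min_len)
    return [label for label in labels if words(label, min_len) & vocabulary]


# Hypothetical sample data, for illustration only.
labels = ["Penicillin", "Lisbon", "Latex allergy"]
terms = ["Allergy to penicillin (disorder)"]
print(filter_labels(labels, terms))
```

Note that qualifier words such as "disorder" stay in the vocabulary here; stripping qualifiers before building the vocabulary would give a tighter filter.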
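The per-class selection described at the end of Section 3 (split the class qualifier from the term, then prefer the method with the best average score in each class) can be sketched as below. This is a simple selection rule standing in for the ensemble learning stage, which the paper does not specify in detail; the function names, the fallback class label and the sample scores are all hypothetical.

```python
import re
from collections import defaultdict
from statistics import mean


def split_fsn(fsn: str):
    """Split 'Medical practitioner (occupation)' into description and class."""
    match = re.match(r"^(.*\S)\s*\(([^()]*)\)\s*$", fsn)
    if match is None:
        # Assumption: terms without a qualifier fall into a "no qualifier" class.
        return fsn.strip(), "no qualifier"
    return match.group(1), match.group(2)


def best_method_per_class(records):
    """records: iterable of (term_class, method, jd) from validated translations."""
    scores = defaultdict(list)
    for term_class, method, jd in records:
        scores[(term_class, method)].append(jd)
    best = {}
    for (term_class, method), values in scores.items():
        average = mean(values)
        # Keep the method with the highest average score; first seen wins ties.
        if term_class not in best or average > best[term_class][1]:
            best[term_class] = (method, average)
    return {term_class: method for term_class, (method, _) in best.items()}


# Hypothetical validated scores, for illustration only.
records = [
    ("disorder", "DBPedia", 0.9),
    ("disorder", "DBPedia", 1.0),
    ("disorder", "GT EN", 0.8),
    ("person", "GT ES", 0.6),
    ("person", "DBPedia", 0.5),
]
print(split_fsn("Medical practitioner (occupation)"))
print(best_method_per_class(records))
```

A learned model could replace the plain average once more validated translations are available, which is the situation the paper anticipates.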