=Paper= {{Paper |id=Vol-1515/regular4 |storemode=property |title=An ontology-based approach for SNOMED CT translation |pdfUrl=https://ceur-ws.org/Vol-1515/regular4.pdf |volume=Vol-1515 |dblpUrl=https://dblp.org/rec/conf/icbo/SilvaCS15 }} ==An ontology-based approach for SNOMED CT translation== https://ceur-ws.org/Vol-1515/regular4.pdf
           An ontology-based approach for SNOMED CT translation
                                Mário J. Silva, Tiago Chaves and Bárbara Simões
                         Instituto Superior Técnico, Universidade de Lisboa and INESC-ID, Portugal




ABSTRACT
   SNOMED CT is a comprehensive multilingual class hierarchy of medical
terms used in clinical records. Few translations are available, but, as
new concepts and revisions are continuously being added, the manual
translation and revision of the terms will remain a major endeavour. We
propose a new approach for translating SNOMED CT terms (or named
entities) using ontology mapping methods and various existing
multilingual resources with translated concepts. Our purpose is to
generate initial candidate translations, already close to those proposed
by medical experts, to be later used in a curated translation process.
Our method for automatically translating SNOMED CT is being developed
for Portuguese, using DBPedia, ICD-9 and Google Translate as sources of
candidate translations of the clinical terms, which could later be
verified. Initial results, using as ground truth a Portuguese catalog of
allergies and adverse reactions (CPARA) whose terms were manually mapped
to SNOMED CT, show that the approach has high potential.

1 INTRODUCTION
SNOMED Clinical Terms¹, or SNOMED CT, is a comprehensive multilingual
class hierarchy of terms used in clinical records, with extensive
overlapping and synonymous descriptions. The primary purpose of SNOMED
CT is to encode the meanings of the terminology used in health
information, supporting the effective clinical recording of data with
the aim of improving patient care. SNOMED CT provides the core general
terminology for electronic health records. With about 300,000 active
terms, SNOMED CT spans clinical findings, symptoms, diagnoses,
procedures, body structures, organisms and other etiologies, substances,
pharmaceuticals, devices and specimens.
   The need to interchange medical records across states is driving the
development of faster methods to obtain approved, standards-based
translations of medical records, in particular of SNOMED CT. The
standardisation of clinical terms and their translations to other
languages is very important for the unification of electronic health
records worldwide. However, the manual translation and revision of the
terms, synonyms and definitions to a new language is a major endeavour.
SNOMED CT is presently available in US and UK English, Spanish, Danish
and Swedish. It is also being translated to several other languages, but
there is no translation of SNOMED CT to Portuguese, nor an official
initiative to develop and maintain that translation. Hence, a tool to
automatically translate SNOMED CT to Portuguese would assist in the
production of a release to be validated and improved in a subsequent
step, at a much lower cost than conducting the process manually.
   As new translations, concepts and revisions are continuously being
added, the manual translation and revision of the terms will remain a
major endeavour. This paper describes our work on the development of an
automatic translator of SNOMED CT to Portuguese as an assisting tool that
could be used for the production of a future standard translation of
SNOMED CT. We take the approach of using available classifications and
automatic translation services as ontologies that can be aligned and
later navigated to provide the translations of such technical terms. In
our method, we start by identifying existing alignments between SNOMED
CT and other selected ontologies, including the releases of SNOMED CT in
different languages. Given the proximity between Portuguese and Spanish,
many terms in the Spanish release of SNOMED CT have an almost identical
spelling in Portuguese. There are other medical terminologies for which
multiple translations have been published, such as ICD (International
Classification of Diseases)². Another major resource is DBPedia, an
ontology derived from Wikipedia, which is very rich in medical terms
(Lehmann et al., 2015). After collecting these multilingual ontologies
and the published mappings between their terms, we derive additional
alignments using the ontology mapping algorithms implemented in
AgreementMakerLight, a scalable automated ontology matching system
developed primarily for the life sciences domain (Faria et al., 2013).
To obtain correspondences between terms in two distinct languages, we
can also use online translation services, such as the Google Translate
service³ or Microsoft Translator⁴, to generate additional mappings.
   To show the feasibility of this approach for automatically generating
translations of SNOMED CT terms to Portuguese, we evaluated the
translations obtained with the alignments against the translations of a
set of SNOMED CT terms that have been mapped by medical experts to terms
of the Portuguese catalog of allergies and adverse reactions, CPARA
(SPMS, 2015). The latest release of CPARA includes curated translations
of SNOMED CT terms. The evaluation shows promising results: on the set
of 191 CPARA terms previously hand-mapped to SNOMED CT, the
ontology-mapping translation method achieved an accuracy of 89% and a
coverage of 37% (using case-insensitive string comparison).

2 RESOURCES AND RELATED WORK
In our work, we used the 01/2014 International (English) distribution
of SNOMED CT and the Spanish version dated April 2014, both provided by
the NLM (National Library of Medicine) institutional site⁵. The
distribution also includes a mapping between ICD-9, a WHO classification
of diseases, and SNOMED CT. This mapping can be used to link SNOMED CT

1 http://www.ihtsdo.org/snomed-ct
2 http://www.who.int/classifications/icd/en/
3 https://translate.google.com/
4 http://www.microsoft.com/translator/translator-api.aspx
5 http://www.nlm.nih.gov/research/umls/Snomed/snomed_main.html



    Copyright © 2015 for this paper by its authors. Copying permitted for private and academic purposes.
Silva, Chaves and Simões



codes to the ICD-9 Portuguese terms in a translation provided by the
Portuguese Ministry of Health⁶.
   There are no comprehensive medical terminologies for European
Portuguese. In addition to ICD-9 in European Portuguese, ICD-10 has been
manually translated to Brazilian Portuguese⁷. There is also an English
to Brazilian Portuguese dictionary of medical terms (Stedman, 2003).
However, there are a number of terminological differences between these
two variants of the language. Other terminologies, such as ICPC⁸, have
been translated⁹, but they have a much narrower scope than ICD.

6 http://www.acss.min-saude.pt/Portals/0/ICD9CMOut2013.xlsx
7 http://www.datasus.gov.br/cid10/V2008/cid10.htm

   In computing, the translation of a terminology, such as the set of
SNOMED CT terms, is an instance of a common task in Natural Language
Processing (NLP), designated as Named Entity Translation (Ling et al.,
2011). Given a set of labels (named entities) in a source language, the
task is to obtain the translations of these entities in a target
language. Langlais et al. (2008) researched the translation of medical
terms using a bilingual lexicon. Recently, Abdoune et al. (2013)
performed an automatic translation of the CORE subset of SNOMED CT to
French by mapping this subset to four French-translated terminologies
integrated in the UMLS Metathesaurus: SNOMED International, ICD10,
MedDRA and MeSH. They were able to map 89% of the preferred terms of the
CORE Subset of SNOMED CT to at least one preferred term in one of the
four terminologies.
   Other approaches for generating translations have been attempted.
Algorithms based on linguistic rules are particularly useful for
languages that are poor in language resources, as in a recently proposed
semi-automatic translation of SNOMED CT into Basque (Perez-de Viñaspre
and Oronoz, 2014). The algorithm takes an incremental approach: first, a
lexical translation is attempted; then, if a translation is not found,
generation/transcription rules for terms, or chunk-level generation to
translate a term token by token, are used; finally, a rule-based
automatic translation system is used to find a translation.
   In this work, we explore DBPedia, an ontology derived from Wikipedia,
as an alternative source of term translations (Lehmann et al., 2015).
We apply ontology matching methods to align DBPedia and SNOMED CT, along
with other web-based services, like Google Translate. DBPedia is a
potentially rich resource for medical term mappings, given that the
English and Portuguese Wikipedias are among the largest. To map these
ontologies, we used AgreementMakerLight, an ontology matching system
developed to tackle large ontology matching problems, with a particular
focus on the biomedical domain (Faria et al., 2013). This system can
handle the mapping of very large ontologies, as is the case with SNOMED
CT and DBPedia. AgreementMakerLight is derived from the AgreementMaker
ontology matching system (Cruz et al., 2009). The alignments produced by
AgreementMaker combine multiple matching algorithms in three layers: the
first layer uses string matching methods to identify similar labels, the
second matches ontology structures, and the third combines the results
from the matchers in the first two layers. The initial experiments
reported in this paper only used the first-layer algorithms to perform
the alignments.
   Medical terms, like named entities in general, can be matched using
similarity metrics such as the Jaro distance, initially proposed for
record linkage systems (Porter and Winkler, 1997). The Jaro distance has
been used for the evaluation of automatic translations of named
entities. It accounts for the number of matching characters between two
input strings and the number of transpositions among them, resulting in
a score in the [0, 1] range; despite its name, higher values indicate
more similar strings, and identical strings score 1.

3 SNOMED CT TRANSLATION
Given that SNOMED CT is mostly used to provide terminology for
electronic health records, the risks of using an automatically generated
translation of such a large collection of terms without expert
validation are unacceptable. In fact, the SNOMED publisher provides
detailed guidelines for validating the translations made by medical
experts for the available official translations (IHTSDO, 2012). However,
we believe that, if the initial quality of the automatically generated
translation is high, we could later validate such candidate translations
through a crowdsourcing activity, as explored by Schulz et al. (2013).

Fig. 1. The translation of SNOMED CT is preceded by a data staging
phase. Once the data is prepared, translation is carried out using the
implemented methods. We select the best translation candidate using an
ensemble model, trained on known translations, that selects the best
method for each class of SNOMED CT terms.

   Our approach for generating the translations of SNOMED CT terms into
Portuguese is illustrated in Figure 1. We start by organising two
mappings:
   1. SNOMED CT to ICD-9: a correspondence between the codes of SNOMED
      CT and the codes and descriptions of ICD-9.
   2. SNOMED CT to DBPedia: a correspondence between SNOMED CT codes and
      DBPedia (English and Portuguese) page URIs, and the associated page
      titles.
The first mapping is derived from the SNOMED CT to ICD-9 mapping
included in the UMLS distribution, which provides the correspondence
between SNOMED CT and ICD-9 codes. For the second mapping, the matching
algorithms implemented in AgreementMakerLight can generate an alignment
between SNOMED CT terms and English DBPedia labels. Once this alignment
is generated, we can map SNOMED CT codes to DBPedia URIs and then obtain
the corresponding label for the Portuguese term by a simple lookup.
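The two lookup chains just described can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the UMLS SNOMED CT to ICD-9 mapping, the Portuguese ICD-9 descriptions, the AgreementMakerLight alignment and the English-to-Portuguese DBPedia page links are assumed to be already loaded into plain dictionaries, and every code, URI and label in it (`S1`, `I1`, `Example_A`, and so on) is a hypothetical placeholder. Keeping an ICD-9 candidate only when the chain yields a single description is also an assumption of this sketch, motivated by the remark in Section 4 that single ICD-9 matches are usually accurate and reliable.

```python
from typing import Dict, List, Optional

# Hypothetical, pre-loaded lookup tables. All codes, URIs and labels are
# placeholders for illustration, not real SNOMED CT, ICD-9 or DBPedia data.

# SNOMED CT code -> ICD-9 codes (stand-in for the UMLS SNOMED CT to ICD-9 mapping).
SNOMED_TO_ICD9: Dict[str, List[str]] = {
    "S1": ["I1"],
    "S2": ["I2", "I3"],  # a one-to-many match
}

# ICD-9 code -> description in the Portuguese ICD-9 release.
ICD9_PT: Dict[str, str] = {"I1": "descrição A", "I2": "descrição B", "I3": "descrição C"}

# SNOMED CT code -> English DBPedia page URI (stand-in for an AgreementMakerLight alignment).
SNOMED_TO_DBPEDIA_EN: Dict[str, str] = {
    "S1": "http://dbpedia.org/resource/Example_A",
    "S3": "http://dbpedia.org/resource/Example_C",
}

# English DBPedia page URI -> title of the linked Portuguese page, when one exists.
DBPEDIA_EN_TO_PT_TITLE: Dict[str, str] = {
    "http://dbpedia.org/resource/Example_A": "Exemplo A",
}


def icd9_candidate(snomed_code: str) -> Optional[str]:
    """SNOMED CT code -> ICD-9 code(s) -> Portuguese ICD-9 description."""
    descriptions = {
        ICD9_PT[icd] for icd in SNOMED_TO_ICD9.get(snomed_code, []) if icd in ICD9_PT
    }
    # Assumed design choice: keep the candidate only when the chain yields
    # exactly one distinct description; one-to-many results are discarded.
    return next(iter(descriptions)) if len(descriptions) == 1 else None


def dbpedia_candidate(snomed_code: str) -> Optional[str]:
    """SNOMED CT code -> English DBPedia URI -> Portuguese page title."""
    uri = SNOMED_TO_DBPEDIA_EN.get(snomed_code)
    if uri is None:
        return None  # the term was not aligned to any DBPedia page
    return DBPEDIA_EN_TO_PT_TITLE.get(uri)  # None if no Portuguese page is linked


def lookup_candidates(snomed_code: str) -> Dict[str, str]:
    """Collect the non-empty candidates produced by both lookup chains."""
    results = {
        "ICD-9 Mapping": icd9_candidate(snomed_code),
        "DBPedia Mapping": dbpedia_candidate(snomed_code),
    }
    return {method: text for method, text in results.items() if text is not None}


if __name__ == "__main__":
    for code in ("S1", "S2", "S3"):
        print(code, lookup_candidates(code))
```

With the placeholder data, `S1` yields a candidate from both chains; `S2` yields none because its ICD-9 chain is one-to-many and it has no DBPedia alignment; and `S3` is aligned to an English page that has no linked Portuguese page, so it also yields none.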
   To obtain candidate translations for SNOMED CT terms, we implemented
four translation methods:

8 http://goo.gl/IX9mqT
9 http://icpc2.danielpinto.net/


                                                                                                        SNOMED CT Translation to Portuguese



   1. Google Translate EN: the candidate translation into Portuguese of
      each English term in SNOMED CT is provided by the Google Translate
      API service.
   2. Google Translate ES: identical to the above, but the translation
      service uses the Spanish term as input.
   3. ICD-9 Mapping: for a given SNOMED CT term in English, we take the
      corresponding code and look up the SNOMED CT to ICD-9 mapping in
      the UMLS distribution to obtain the ICD-9 code, and then the term
      description in the Portuguese version of ICD-9. This description
      becomes the candidate translation of the SNOMED CT term to
      Portuguese.
   4. DBPedia Mapping: starting with a SNOMED CT term in English, we look
      up its code in the SNOMED CT to DBPedia mapping and, from there,
      obtain the available candidate translation in the Portuguese
      DBPedia.

   DBPedia is too large to be fully mapped in one batch with limited
computing power: the time required by AgreementMakerLight to align
SNOMED CT with the full DBPedia would be prohibitive. However, using the
full DBPedia is unnecessary, given that most of it is irrelevant to the
clinical domain. We expect that our users, domain experts in clinical
specialisations, will select one batch of SNOMED CT terms of interest at
a time and create or revise the translations of the terms in that
smaller set. For instance, to identify a set of allergy-related DBPedia
pages to be aligned with a set of SNOMED CT terms, we used the UNIX grep
tool to filter out of the DBPedia ontology every page with a label not
containing any of the words of the SNOMED CT terms. This reduced the
size of the data from 2 GB to 12 MB. To obtain the alignment with
DBPedia, we parameterised AgreementMakerLight to consider as aligned all
pairs of terms with a Jaro distance ≥ 0.5.
   The last step in our method involves the application of an ensemble
learning algorithm (Dietterich, 2000). Each SNOMED CT term has a class
label, provided as a “qualifier” in the term description. For instance,
the SNOMED CT term with code 158965000 has the term “Medical
practitioner (occupation)”, from which we can separate the description
“Medical practitioner” and the class Occupation. Instead of choosing the
best overall translation method, we identify the best translation method
for each class, based on the validated translations. As the number of
validated translations will increase over time, we expect that ensemble
learning will eventually improve the automatic translation process.
However, given the small number of validated Portuguese translations
available at this time, we still lack reliable data to evaluate this
step.

4 EVALUATION
CPARA, Catálogo Português de Alergias e Reacções Adversas (Portuguese
Catalog of Allergies and Adverse Reactions), is a list of terms related
to allergies and adverse reactions in use in the Portuguese National
Health Service (SPMS, 2015). It was developed with the goal of unifying
the classification of allergies and adverse reactions in Portugal. Given
the high levels of patient mobility, physicians frequently need to know
precisely which substances are known to affect an international patient.
To address this need, CPARA terms have been mapped to SNOMED CT terms by
a group of experts. These experts also created European Portuguese
translations of the SNOMED CT Common Terms and Fully Specified Names
(FSN) in the CPARA catalog. This mapping is critical to making the
medical information exchanged about patients who travel internationally
more accurate. In our evaluation, we used the Common Terms translations
as the gold standard to assess the accuracy of our translation approach.
CPARA includes 191 codes and common terms of the US English distribution
of SNOMED CT, and the corresponding CPARA codes and terms. In the Spanish
SNOMED CT distribution, there are 192 terms mapped from these 191 codes
(one code is mapped to two terms).
   The evaluation started by applying our method to generate candidate
translations for the allergy-related SNOMED CT codes in CPARA. We then
evaluated the resulting set of translations against the ground truth
composed of the corresponding CPARA terms, as defined by the medical
committee that created the mapping. To assess the accuracy of the
evaluated translation methods, we scored each term translation by the
Jaro distance between the automatically translated term and the CPARA
translation. The Jaro distance (JD) between two strings is 1 if and only
if the strings are identical, that is, all their characters match and
there are no transpositions¹⁰.

10 The computation of the Jaro distances was made with the Python
Jellyfish library https://pypi.python.org/pypi/jellyfish

   Prior to computing Jaro distances, all the translation candidates and
CPARA translations were normalised: we removed any qualifiers from the
SNOMED CT candidates, deleted quotes from the CPARA translations, and
converted all the named entities to lowercase (e.g., “Moderate (severity
modifier) (qualifier value)” became “moderate” and “Contact metal agent
(substance)” became “contact metal agent”). These preparatory steps are
necessary to obtain meaningful similarity metrics, because the
qualifiers are common to many terms and can be translated independently,
and because the Jaro distance treats the lowercase and uppercase forms
of the same letter as two distinct characters. The statistics of the
translations obtained with each of the four methods described in the
previous section are given in Table 1. In these statistics, we
considered as valid the translations with JD = 1.

   Method   Source Lang.  Coverage  #Method  AVG JD  STDEV JD
   GT       EN            100%      191      0.78    0.22
   GT       ES            114%      218      0.58    0.15
   ICD 9    EN             10%       20      0.61    0.12
   DBPedia  EN             37%       70      0.89    0.03
Table 1. Global results for all translation methods, with the respective
average Jaro distance (AVG JD) and standard deviation of the Jaro
distance (STDEV JD). The implemented methods are Google Translate (GT),
both from English (EN) and Spanish (ES) to Portuguese, ICD-9 Mapping
(ICD 9), and DBPedia Mapping (DBPedia). All translations were attempted
with two source languages, English (EN) and Spanish (ES). The number of
terms translated by each method is given in the #Method column.

   We observe that the SNOMED-DBPedia alignment obtains, for a coverage
of 37%, both the highest similarity (0.89) and the lowest standard
deviation (0.03). This shows that we have been able to accurately
translate a set of SNOMED CT terms to Portuguese, using basic alignment
techniques, through the SNOMED CT to DBPedia alignment. However, the
generation of translations
based on ontology alignments as proposed in this paper also has          5   CONCLUSIONS AND FUTURE WORK
limitations. In particular, only a fraction of the translations can be   SNOMED CT is increasingly prevalent in the heath care sector,
obtained by this method, while Google Translate always proposed          resulting from the increasing need to exchange medical records
a translation. Our success with Portuguese may not be granted            in mobile societies. There is also a growing general interest
when aligning SNOMED CT with DBPedia in other languages with             in accessing standardised machine-readable medical records for
smaller Wikipedias.                                                      improving managed heath care and biomedical research.
   Google Translate EN showed better accuracy than Google                   We introduced a new methodology for translating SNOMED CT
Translate ES. This result was not initially expected, because            terms, which relies primarily on aligning large ontologies,
Spanish and Portuguese are close languages. This may result              complementing language-based methods that have been proposed
from the CPARA terms being originally derived from the English           before. We prototyped an initial implementation of this methodology,
terminology. The number of translations obtained with Google             which obtained high coverage and good accuracy, despite only using
Translate ES is higher than the number of terms in the CPARA             string matchers for SNOMED CT and DBPedia alignment along
dataset (yielding the 114% coverage). This is the result of how          with the domain-independent Google Translator. A translation was
we have obtained the Spanish SNOMED CT candidate terms for               considered valid when the expert mapping of an allergy-related
translation. We started from the same initial SNOMED CT codes            SNOMED CT term to Portuguese is identical to the obtained using
that we used for the English translation and obtained the Spanish        the SNOMED CT to DBPedia alignment. The accuracy under these
codes matching the concept id and type id of the initial English         settings was 37%. This shows that both the English and Portuguese
terms. This generated a higher number of ES candidate terms to           versions of DBPedia are rich and accurately interlink with medical
translate (218) then the initial EN terms (191).                         terms. However, the results for Portuguese may not be indicative of
   To evaluate which translation method works best for each class        how this method would perform on other languages. Portuguese is
of SNOMED CT terms, we measure which translation method                  one of the top-10 Wikipedia languages in terms of the total number
performs best in each class. This method is necessary to later model     of entries. The coverage of the obtained translations depend on
an ensemble learning stage that could pick the best method for           how rich the Wikipedia for a target language is in covering clinical
each class. To obtain the results, we divided CPARA in classes           concepts and the extent to which these concepts are mapped to
for translation purposes. These classes were extracted from the          Wikipedia pages in languages for which a SNOMED CT translation
qualifiers defined for the SNOMED full specified name terms. We          exists. In addition, our validation experiment was confined to testing
were interested in observing translation performance differences         about 200 SNOMED CT Common Terms in the alergies and adverse
across classes. To measure the differences, we calculated the            reactions domain in European Portuguese. It is still unknown how
similarity and standard deviation as above of all the translation        comprehensive and accurate the English and Portuguese Wikipedias
candidates in each class. The results are summarised in Table 2.         are across the full clinical domain, and how this factor affects the
   The SNOMED-DBPedia alignment generates better translations            accuracy of the SNOMED CT translations.
for all classes, except Person and Qualifier Value. The poorer              Some improvements can still be added to the software
performance could, however, reflect that only a small number of          implementing the presented translation method. For instance,
related identified terms in the allergy domain have been identified      the SNOMED CT to DBPedia alignment should explore the
for both classes.                                                        defined semantic relationships between classes and terms in both
   Google TranslateES has better average similarity for the Person       SNOMED CT and DBPedia. On the other hand, these relationships
class than Google Translate EN. This shows that the SNOMED CT            could be explored to generate accurate translations for untranslated
translation from Spanish could benefit from using the Spanish            terms in lexical methods to be provided. For this purpose, language
language distribution for some CPARA translations.                       resources, such as WordNet, and parallel corpora of named entities,
   The translations obtained with the ICD-9 mapping translator are       such as previously validated SNOMED CT translations, could be
worse than obtained by Google Translate (for both languages). This       used to learn how words and multi-word expressions should be
results from ICD-9 being less comprehensive than SNOMED CT.              properly translated.
ICD codes mostly diseases, symptoms or causes of death. Therefore,          The measured accuracy of our translation method could still be
many of the CPARA terms in SNOMED CT were absent in the
ICD-9 to SNOMED CT mapping. The results also indicate that,
as expected and as observed with ICD-9, terminologies of narrower
scope are of limited use for translating clinical terms through
ontology alignment. The ICD-9 mapping is much less successful than
other resources, such as DBPedia and Google Translate, which can
provide much higher coverage of candidate translations, in many
cases while retaining equal or better accuracy. The ICD-9 mapping
method also generates a large number of 1-to-many matches. However,
ICD-9 could still be useful in cases where it yields only one
matching description, which is usually very accurate and reliable,
given that the matches between ICD-9 and SNOMED CT and the
resulting translations are validated by medical experts.

significantly increased without sacrificing the quality of translations,
by relaxing the similarity threshold. The negative impact of such
relaxation is negligible, given that the generated translations
will always need to be validated by experts before being used in a
clinical context. The expert-validation step currently relies on
reviewing the generated translations in spreadsheets. A
crowdsourcing platform could speed up the process of creating and
maintaining a validated translation of SNOMED CT. Moreover,
active learning could be incorporated into the crowdsourcing
platform, leading to faster improvement of the proposed translations,
as the validated translations can in turn be used as input to generate
good candidates (Ambati et al., 2010).
   A complementary assessment of the alignment approach
proposed here could be obtained by comparing the automatic
translations it produces with one of the existing released translations, e.g.
Spanish. However, given that we rely on alignments between lexical
resources, we are not certain whether the Wikipedia correspondences
between clinical term pages in Spanish and English were created
based on SNOMED CT.

ACKNOWLEDGEMENTS

We thank Daniel Faria and the other members of the SOMER
project for their help with running AgreementMakerLight and for their
feedback. We also thank Dr. Anabela Santos for the help with the
CPARA translation of SNOMED CT used to validate our tool, and Bruno
Martins for the pointers to previous work. This work was partially
supported by Fundação para a Ciência e a Tecnologia (FCT), grants
PTDC/EIA-EIA/119119/2010 (SOMER), UID/CEC/50021/2013
and EXCL/EEI-ESS/0257/2012 (DataStorm).

REFERENCES

Abdoune, H., Merabti, T., Darmoni, S. J., and Joubert, M. (2013). Assisting the
   translation of the core subset of SNOMED CT into French. Studies in Health
   Technology and Informatics, 169, 819–823. DOI: 10.3233/978-1-60750-806-9-819.
Ambati, V., Vogel, S., and Carbonell, J. (2010). Active learning and crowd-sourcing
   for machine translation. In Proceedings of the Seventh International Conference on
   Language Resources and Evaluation (LREC'10).
Cruz, I. F., Antonelli, F. P., and Stroe, C. (2009). AgreementMaker: Efficient matching
   for large real-world schemas and ontologies. PVLDB, 2(2), 1586–1589.
Dietterich, T. (2000). Ensemble methods in machine learning. In Multiple Classifier
   Systems, volume 1857 of Lecture Notes in Computer Science, pages 1–15. Springer
   Berlin Heidelberg.
Faria, D., Pesquita, C., Santos, E., Palmonari, M., Cruz, I., and Couto, F. (2013).
   The AgreementMakerLight ontology matching system. In On the Move to Meaningful
   Internet Systems: OTM 2013 Conferences, Confederated International Conferences,
   volume 8185 of Lecture Notes in Computer Science, pages 527–541. Springer.
IHTSDO (2012). Guidelines for Management of Translation of SNOMED CT. International
   Health Terminology Standards Development Organisation (IHTSDO).
Langlais, P., Yvon, F., and Zweigenbaum, P. (2008). Analogical translation of medical
   words in different languages. In B. Nordström and A. Ranta, editors, Advances in
   Natural Language Processing, volume 5221 of Lecture Notes in Computer Science,
   pages 284–295. Springer Berlin Heidelberg.
Lehmann, J., Isele, R., Jakob, M., Jentzsch, A., Kontokostas, D., Mendes, P. N.,
   Hellmann, S., Morsey, M., van Kleef, P., Auer, S., and Bizer, C. (2015). DBpedia:
   A large-scale, multilingual knowledge base extracted from Wikipedia. Semantic Web
   Journal, 6(2), 167–195.
Ling, W., Calado, P., Martins, B., Trancoso, I., Black, A., and Coheur, L. (2011).
   Named entity translation using anchor texts. In The International Workshop on
   Spoken Language Translation (IWSLT).
Perez-de Viñaspre, O. and Oronoz, M. (2014). Translating SNOMED CT terminology into
   a minor language. In Proceedings of the 5th International Workshop on Health Text
   Mining and Information Analysis (Louhi), pages 38–45, Gothenburg, Sweden.
   Association for Computational Linguistics.
Porter, E. H. and Winkler, W. E. (1997). Approximate string comparison and its effect
   on an advanced record linkage system. In Advanced Record Linkage System. U.S.
   Bureau of the Census, Research Report, pages 190–199.
Schulz, S., Bernhardt-Melischnig, J., Kreuzthaler, M., Daumke, P., and Boeker, M.
   (2013). Machine vs. human translation of SNOMED CT terms. In MEDINFO 2013.
SPMS (2015). CPARA – catálogo português de alergias e outras reações adversas /
   Portuguese catalogue of allergies and other adverse reactions. Technical Report
   V3.0, 09-03-2015, SPMS – Serviços Partilhados do Ministério da Saúde.
   http://tinyurl.com/me5jhq7, http://tinyurl.com/lehlhaa.
Stedman, T. L. (2003). Stedman's English to Portuguese and Portuguese to English
   Medical Dictionary. French & European Pubns. ISBN-13: 9780785975281.

Translation technique  Source lang.  Class                 AVG JD  STDEV JD
Google Translate       EN            Substance             0.82    0.19
Google Translate       EN            Observable Entity     0.74    NA
Google Translate       EN            Product               0.96    0.01
Google Translate       EN            Disorder              0.78    0.25
Google Translate       EN            Occupation            1.00    0.00
Google Translate       EN            Person                0.55    0.18
Google Translate       EN            Qualifier Value       0.79    0.17
Google Translate       EN            Finding               0.74    0.30
Google Translate       EN            Event                 1.00    NA
Google Translate       EN            Situation             0.71    NA
Google Translate       EN            Organism              0.63    0.37
Google Translate       EN            Severity Modifier     0.79    0.30
Google Translate       EN            Contextual Qualifier  0.83    0.15
Google Translate       EN            No Qualifier          0.63    0.28
Google Translate       ES            Disorder              0.66    0.11
Google Translate       ES            Substance             0.59    0.14
Google Translate       ES            Qualifier Value       0.58    0.11
Google Translate       ES            Contextual Qualifier  0.56    0.08
Google Translate       ES            Organism              0.62    0.10
Google Translate       ES            Person                0.57    0.09
Google Translate       ES            Occupation            0.67    0.04
Google Translate       ES            Finding               0.67    0.12
Google Translate       ES            Situation             0.60    0.00
Google Translate       ES            Observable Entity     0.64    0.00
Google Translate       ES            Product               0.57    0.03
Google Translate       ES            Severity Modifier     0.53    0.13
Google Translate       ES            Event                 0.35    NA
Google Translate       ES            No Qualifier          0.50    0.28
ICD-9                  EN            Disorder              0.60    0.12
ICD-9                  EN            Finding               0.68    0.15
DBPedia Matching       EN            Disorder              0.92    0.04
DBPedia Matching       EN            Substance             0.90    0.03
DBPedia Matching       EN            Qualifier Value       0.72    0.05
DBPedia Matching       EN            Event                 1.00    NA
DBPedia Matching       EN            Finding               0.99    0.00
DBPedia Matching       EN            Organism              0.82    0.09
DBPedia Matching       EN            Person                0.48    NA
DBPedia Matching       EN            No Qualifier          0.83    0.09

Table 2. Scores for the different classes of SNOMED CT terms. The AVG JD and
STDEV JD columns give the average and standard deviation of the Jaro distance;
NA indicates that the standard deviation cannot be computed because there is
only one translation for the class.
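Table 2 summarises agreement per SNOMED CT class with the Jaro measure (Porter and Winkler, 1997). The Python sketch below shows one way per-class AVG and STDEV figures of this kind could be computed. It is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that each generated translation is compared with a single reference translation. It uses the plain Jaro similarity, where 1.0 means identical strings (consistent with the 1.00 entries in Table 2), rather than the Jaro-Winkler variant. The comparison is case-insensitive. STDEV is taken to be the sample standard deviation, which is undefined for a class with a single translation and is returned as None (NA). The function names `jaro` and `class_scores` and the example rows are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev
from typing import Iterable, Optional


def jaro(s1: str, s2: str) -> float:
    """Plain Jaro similarity in [0, 1]; 1.0 means the strings are identical."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Two equal characters only count as a match within this distance.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that appear in a different order; each
    # transposition is counted twice, hence the division by 2 below.
    half_transpositions = 0
    k = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                half_transpositions += 1
            k += 1
    t = half_transpositions // 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def class_scores(
    rows: Iterable[tuple[str, str, str]],
) -> dict[str, tuple[float, Optional[float]]]:
    """Group hypothetical (snomed_class, candidate, reference) rows by class.

    Returns (mean, sample standard deviation) of the Jaro scores per class;
    None stands for NA when a class has a single translation.
    """
    by_class: dict[str, list[float]] = defaultdict(list)
    for snomed_class, candidate, reference in rows:
        # Assumption: case differences are ignored.
        by_class[snomed_class].append(jaro(candidate.lower(), reference.lower()))
    return {
        cls: (mean(scores), stdev(scores) if len(scores) > 1 else None)
        for cls, scores in by_class.items()
    }


if __name__ == "__main__":
    # Hypothetical example rows, not data from the paper.
    example = [
        ("Disorder", "asma", "asma"),
        ("Disorder", "rinite", "rinite alérgica"),
        ("Event", "queda", "queda"),
    ]
    print(class_scores(example))
```

The paper does not say which standard-deviation estimator was used. The sample estimator is chosen here only because it is undefined for a single value, which matches the NA entries in the caption. The population estimator would instead report 0.00 for such classes.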



