<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>An ontology-based approach for SNOMED CT translation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mário J. Silva</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tiago Chaves</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bárbara Simões</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Instituto Superior Técnico, Universidade de Lisboa and INESC-ID</institution>
          ,
          <country country="PT">Portugal</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2015</year>
      </pub-date>
      <abstract>
        <p>SNOMED CT is a comprehensive multilingual class hierarchy of medical terms used in clinical records. Few translations are available, but, as new concepts and revisions are continuously being added, the manual translation and revision of the terms will remain a major endeavour. We propose a new approach for translating SNOMED CT terms (or named entities) using ontology mapping methods and various existing multilingual resources with translated concepts. Our purpose is generating initial candidate translations, already close to those proposed by medical experts, to be later used in a curated translation process. Our method for automatically translating SNOMED CT is being developed for Portuguese, using DBPedia, ICD-9 and Google Translate as sources of candidate translations of the clinical terms, which could later be verified. Initial results, using a manually translated Portuguese catalog of allergies and adverse reactions (CPARA) to SNOMED CT as ground truth, show that it has high potential.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 INTRODUCTION</title>
      <p>SNOMED Clinical Terms<sup>1</sup>, or SNOMED CT, is a comprehensive
multilingual class hierarchy of terms used in clinical records,
with extensive overlapping and synonymous descriptions. The
primary purpose of SNOMED CT is to encode the meanings
of the terminology used in health information, supporting the
effective clinical recording of data with the aim of improving
patient care. SNOMED CT provides the core general terminology
for electronic health records. With about 300,000 active terms,
SNOMED CT spans clinical findings, symptoms, diagnoses,
procedures, body structures, organisms and other etiologies,
substances, pharmaceuticals, devices and specimens.</p>
      <p>The need to interchange medical records across states is
demanding the development of faster methods to obtain approved,
standards-based, translations of medical records, in particular
SNOMED CT. The standardisation of clinical terms and their
translations to other languages is very important for the unification
of the electronic health records worldwide. However, the manual
translation and revision of the terms, synonyms and definitions to
a new language is a major endeavour. SNOMED CT is presently
available in US and UK English, Spanish, Danish and Swedish. It
is also being translated to several other languages, but there is no
translation of SNOMED CT to Portuguese or an official initiative to
develop and maintain that translation. Hence, a tool to automatically
translate SNOMED CT to Portuguese would assist in the production
of a release to be validated and improved in a subsequent step at a
much lower cost than conducting the process manually.</p>
      <p>As new translations, concepts and revisions are continuously
being added, the manual translation and revision of the terms
will remain a major endeavour. This paper describes our work on
the development of an automatic translator of SNOMED CT to</p>
      <sec id="sec-1-1">
        <title>1 http://www.ihtsdo.org/snomed-ct</title>
        <p>
          Portuguese as an assisting tool that could be used for the production
of a future standard translation of SNOMED CT. We take the
approach of using available classifications and automatic translation
services as ontologies that can be aligned and later navigated to
provide the translations of such technical terms. In our method, we
start by identifying existing alignments between SNOMED CT and
other selected ontologies, including the releases of SNOMED CT
in different languages. For the Portuguese translation, given its
proximity to Spanish, many terms in the Spanish release of
SNOMED CT have almost identical spelling. There are other
medical terminologies for which multiple translations have been
published, such as ICD (International Classification of Diseases)<sup>2</sup>.
Another major resource is DBPedia, an ontology derived from
Wikipedia, which is very rich in medical terms
          <xref ref-type="bibr" rid="ref8">(Lehmann et al.,
2015)</xref>
          . After the collection of these multilingual ontologies and
published mappings between their terms, we derive additional
alignments using the ontology mapping algorithms implemented
in AgreementMakerLight, a scalable automated ontology matching
system developed primarily for the life sciences domain
          <xref ref-type="bibr" rid="ref5">(Faria et al., 2013)</xref>
          . To obtain correspondences between terms in two distinct
languages, we can also explore online translation services, such as
the Google Translate Service<sup>3</sup> or Microsoft Translator<sup>4</sup>, to generate
additional mappings.
        </p>
        <p>
          To show the feasibility of the above outlined approach for
automatically generating translations of SNOMED CT terms to
Portuguese, we evaluated the translations obtained with the
alignments against the translations of a set of SNOMED CT
terms that have been mapped by medical experts to terms of
the Portuguese catalog of allergies and adverse reactions, CPARA
          <xref ref-type="bibr" rid="ref13">(SPMS,
2015)</xref>
          . The latest release of CPARA includes curated translations
of SNOMED CT terms. The evaluation shows promising results.
The ontology-mapping translation method achieved an accuracy of
89% and coverage of 37% for the set of 191 terms on the translation
of the CPARA vocabulary terms previously hand-mapped to
SNOMED CT (using case-insensitive string comparison).
        </p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2 RESOURCES AND RELATED WORK</title>
      <p>In our work, we used the 01/2014 International (English)
distribution of SNOMED CT and the Spanish version dated from
April 2014, both provided by the NLM (National Library of
Medicine) institutional site<sup>5</sup>. The distribution also includes a
mapping between ICD-9, a WHO classification of diseases, and
SNOMED CT. This mapping can be used to link SNOMED CT</p>
      <sec id="sec-2-1">
        <title>2 http://www.who.int/classifications/icd/en/</title>
      </sec>
      <sec id="sec-2-2">
        <title>3 https://translate.google.com/</title>
      </sec>
      <sec id="sec-2-3">
        <title>4 http://www.microsoft.com/translator/translator-api.aspx</title>
      </sec>
      <sec id="sec-2-4">
        <title>5 http://www.nlm.nih.gov/research/umls/Snomed/snomed_main.html</title>
        <p>codes to the ICD-9 Portuguese terms in a translation provided by
the Portuguese Ministry of Health<sup>6</sup>.</p>
        <p>
          There are no comprehensive medical terminologies for European
Portuguese. In addition to ICD-9 in European Portuguese,
ICD-10 has been manually translated to Brazilian Portuguese<sup>7</sup>.
There is also an English to Brazilian Portuguese dictionary of
medical terms
          <xref ref-type="bibr" rid="ref14">(Stedman, 2003)</xref>
          . However, there are a number
of terminological differences between these two variants of the
language. Other terminologies, such as ICPC<sup>8</sup>, have been
translated<sup>9</sup>, but they have a much narrower scope than ICD.
        </p>
        <p>
          In computing, the translation of a terminology, such as the
set of SNOMED CT terms, is an instance of a common task
in Natural Language Processing (NLP), known as Named
Entity Translation
          <xref ref-type="bibr" rid="ref9">(Ling et al., 2011)</xref>
          . The task is formulated as
the problem of, given a set of labels (named entities) in a source
language, obtaining the translations of these entities in a target
language.
          <xref ref-type="bibr" rid="ref7">Langlais et al. (2008)</xref>
          researched the translation of
medical terms using a bilingual lexicon. Recently,
          <xref ref-type="bibr" rid="ref1">Abdoune et al. (2013)</xref>
          performed an automatic translation of the CORE subset of
SNOMED CT to French by mapping this subset to four
French-translated terminologies integrated in the UMLS Metathesaurus:
SNOMED International, ICD-10, MedDRA and MeSH. They were
able to map 89% of the preferred terms of the CORE Subset of
SNOMED CT with at least one preferred term in one of the four
terminologies.
        </p>
        <p>
          Other approaches for generating translations have been attempted.
Algorithms based on linguistic rules are particularly useful for
languages with few language resources, as in a recently
proposed semi-automatic translation of SNOMED CT into Basque
          <xref ref-type="bibr" rid="ref10">(Perez-de-Viñaspre and Oronoz, 2014)</xref>
          . The algorithm takes an incremental
approach: first a lexical translation is attempted; then if a translation
is not found, generation/transcription-rules for terms, or chunk-level
generation to translate a term token by token are used; finally, a
rule-based automatic translation system is used to find a translation.
        </p>
        <p>
          In this work, we explore DBPedia, an ontology derived from
Wikipedia, as an alternative source of term translations
          <xref ref-type="bibr" rid="ref8">(Lehmann
et al., 2015)</xref>
          . We apply ontology matching methods to align
DBPedia and SNOMED CT, along with other web-based services,
like Google Translate. DBPedia is a potentially rich resource
for medical term mappings, given that the English and Portuguese
Wikipedias are among the largest. To map these ontologies we used
AgreementMakerLight, an ontology matching system developed to
tackle large ontology matching problems, and focused in particular
on the biomedical domain
          <xref ref-type="bibr" rid="ref5">(Faria et al., 2013)</xref>
          . This system can
handle the mapping of very large ontologies, as is the case
with SNOMED CT and DBPedia. AgreementMakerLight is derived
from the AgreementMaker ontology matching system
          <xref ref-type="bibr" rid="ref3">(Cruz et al.,
2009)</xref>
          . The alignments produced by AgreementMaker combine
multiple matching algorithms, in three layers: the first layer uses
string matching methods to identify similar labels, the second
matches ontology structures, and the third layer combines the results
from the matchers in the first two layers. The initial experiments
        </p>
      </sec>
      <sec id="sec-2-5">
        <title>6 http://www.acss.min-saude.pt/Portals/0/</title>
        <p>ICD9CMOut2013.xlsx</p>
      </sec>
      <sec id="sec-2-6">
        <title>7 http://www.datasus.gov.br/cid10/V2008/cid10.htm</title>
      </sec>
      <sec id="sec-2-7">
        <title>8 http://goo.gl/IX9mqT</title>
      </sec>
      <sec id="sec-2-8">
        <title>9 http://icpc2.danielpinto.net/</title>
        <p>reported in this paper only used the first layer algorithms to perform
the alignments.</p>
        <p>
          Medical terms, like named entities in general, can be matched using
similarity metrics like the Jaro distance, initially proposed for record
linkage systems
          <xref ref-type="bibr" rid="ref11">(Porter and Winkler, 1997)</xref>
          . The Jaro distance has
been used for the evaluation of automatic translations of named
entities. It accounts for the number of transpositions between two input
strings and also the number of different characters, resulting in a
numeric distance in the [0, 1] range.
        </p>
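        <p>A minimal pure-Python sketch of the Jaro measure may help make the description above concrete. The function name jaro_similarity is illustrative and does not come from the paper, whose experiments used the Jellyfish library. Note that a value of 1 means identical strings, so the measure behaves as a similarity even though it is called a distance here.</p>
        <preformat preformat-type="code">
```python
def jaro_similarity(s1: str, s2: str) -> float:
    """Return the Jaro similarity of two strings, in the range [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters count as matching only within this sliding window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Matched characters that appear in a different order; each
    # transposition involves two of them, hence the halving below.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2

    return (
        matches / len1
        + matches / len2
        + (matches - transpositions) / matches
    ) / 3


print(round(jaro_similarity("MARTHA", "MARHTA"), 4))
```
</preformat>
        <p>The comparison is case-sensitive, so "Moderate" and "moderate" do not score 1 unless they are lowercased first.</p>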
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3 SNOMED CT TRANSLATION</title>
      <p>
        Given that SNOMED CT is mostly used to provide terminology
for electronic health records, the risks of using an automatically
generated translation of such a large collection of terms without
expert validation are unacceptable. In fact, the SNOMED publisher
provides detailed guidelines for validating the translations made
by medical experts for the official translations available
        <xref ref-type="bibr" rid="ref6">(IHTSDO,
2012)</xref>
        . However, we believe that, if the initial quality of the
automatically generated translation is high, we could later validate
such candidate translations through a crowdsourcing activity, as
explored by
        <xref ref-type="bibr" rid="ref12">Schulz et al. (2013)</xref>
        .
      </p>
      <p>Our approach for generating the translations of SNOMED CT
terms into Portuguese is illustrated in Figure 1. We start by
organising two mappings:
        <list list-type="order">
          <list-item>
            <p>SNOMED CT to ICD-9: a correspondence between the codes
of SNOMED CT and codes and descriptions of ICD-9.</p>
          </list-item>
          <list-item>
            <p>SNOMED CT to DBPedia: a correspondence between
SNOMED CT codes and DBPedia (English and Portuguese)
page URIs, and associated page titles.</p>
          </list-item>
        </list>
      </p>
      <p>The first mapping is derived from the SNOMED CT to ICD-9
mapping included in the UMLS distribution, which
provides the correspondence between SNOMED CT and
ICD-9 codes. For the second mapping, the matching algorithms
implemented in AgreementMakerLight can generate an alignment
between SNOMED CT terms and English DBPedia labels. Once
this alignment is generated, we can map SNOMED CT codes to
DBPedia URIs and then obtain the corresponding label for the
Portuguese term by a simple lookup.</p>
      <p>To obtain candidate translations for SNOMED CT terms, we
implemented four translation methods:
        <list list-type="order">
          <list-item>
            <p>Google Translate EN: the candidate translation into Portuguese
of each English term in SNOMED CT is provided by the
Google Translate API service.</p>
          </list-item>
          <list-item>
            <p>Google Translate ES: identical to the above, but the translation
service uses the Spanish term as input.</p>
          </list-item>
          <list-item>
            <p>ICD-9 Mapping: for a given SNOMED CT term in English, we
take the corresponding code and look up the SNOMED CT to
ICD-9 mapping in the UMLS distribution to obtain the ICD-9
code and next the term description in the Portuguese version of
ICD-9. This description becomes the candidate translation of
the SNOMED CT term to Portuguese.</p>
          </list-item>
          <list-item>
            <p>DBPedia Mapping: starting with a SNOMED CT term in
English, we look up the code in the SNOMED CT to DBPedia
mapping and, from there, obtain the available candidate
translation in the Portuguese DBPedia.</p>
          </list-item>
        </list>
      </p>
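      <p>The two mapping-based methods amount to chained lookups, which the following sketch illustrates under simplifying assumptions: the mappings are held in plain dictionaries, and every function name, identifier and label below is a hypothetical placeholder rather than real SNOMED CT, ICD-9 or DBPedia data.</p>
      <preformat preformat-type="code">
```python
from typing import Dict, List, Optional


def translate_via_dbpedia(
    snomed_code: str,
    snomed_to_dbpedia: Dict[str, str],
    dbpedia_pt_labels: Dict[str, str],
) -> Optional[str]:
    """Follow SNOMED CT code -> DBPedia URI -> Portuguese label."""
    uri = snomed_to_dbpedia.get(snomed_code)
    if uri is None:
        return None  # the alignment has no entry for this code
    return dbpedia_pt_labels.get(uri)  # None when no Portuguese label exists


def translate_via_icd9(
    snomed_code: str,
    snomed_to_icd9: Dict[str, List[str]],
    icd9_pt_descriptions: Dict[str, str],
) -> List[str]:
    """Follow SNOMED CT code -> ICD-9 code(s) -> Portuguese description(s)."""
    icd9_codes = snomed_to_icd9.get(snomed_code, [])
    return [icd9_pt_descriptions[c] for c in icd9_codes if c in icd9_pt_descriptions]


# Placeholder data: none of these identifiers or labels are real.
snomed_to_dbpedia = {"SCT-1": "dbpedia:Page_A"}
dbpedia_pt_labels = {"dbpedia:Page_A": "rotulo a"}
snomed_to_icd9 = {"SCT-1": ["ICD-X", "ICD-Y"]}
icd9_pt_descriptions = {"ICD-X": "descricao x", "ICD-Y": "descricao y"}

print(translate_via_dbpedia("SCT-1", snomed_to_dbpedia, dbpedia_pt_labels))
print(translate_via_icd9("SCT-1", snomed_to_icd9, icd9_pt_descriptions))
```
</preformat>
      <p>The ICD-9 lookup returns a list because one SNOMED CT code may map to several ICD-9 codes, whereas the DBPedia lookup yields at most one label per aligned page.</p>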
      <p>DBPedia is too big to be fully mapped in one batch with limited
computing power, given the size of the ontologies involved. This
would make the time required by AgreementMakerLight to align
SNOMED CT with the full DBPedia prohibitive. However, using the
full DBPedia is unnecessary, given that most of it is irrelevant to the
clinical domain. We expect that our users,
domain experts in clinical specialisations, will select a batch of
SNOMED CT terms of their interest at a time and create/revise
the translations of the terms in that smaller set. For instance, to
identify a set of allergy-related DBPedia pages to be aligned with
a set of SNOMED CT terms, we used the UNIX grep tool to filter
out of the DBPedia ontology every page with a label not containing
any of the words of the SNOMED CT terms. This resulted in a
size reduction from 2 GB to 12 MB. To obtain the alignment with
DBPedia, we parameterized AgreementMakerLight to consider as
aligned all pairs of terms with a Jaro distance of at least 0.5.</p>
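      <p>The label filtering step can be approximated in a few lines of Python. This is only a sketch of the idea, not the grep command actually used: it matches whole lowercased words rather than substrings, and the function names and sample labels are invented for illustration.</p>
      <preformat preformat-type="code">
```python
from typing import Iterable, List, Set


def vocabulary(terms: Iterable[str]) -> Set[str]:
    """Collect every lowercased word that occurs in the selected terms."""
    words: Set[str] = set()
    for term in terms:
        words.update(term.lower().split())
    return words


def filter_labels(labels: Iterable[str], terms: Iterable[str]) -> List[str]:
    """Keep only the labels that share at least one word with the terms."""
    vocab = vocabulary(terms)
    return [label for label in labels if vocab.intersection(label.lower().split())]


# Invented sample data, for illustration only.
selected_terms = ["Allergy to peanuts", "Latex allergy"]
candidate_labels = ["Peanut allergy", "Lisbon", "Latex"]

print(filter_labels(candidate_labels, selected_terms))
```
</preformat>
      <p>Very common words such as "to" are part of the vocabulary in this sketch, so a stop-word list would probably be needed to keep the filtered set small on real data.</p>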
      <p>
        The last step in our method involves the application of an
ensemble learning algorithm
        <xref ref-type="bibr" rid="ref4">(Dietterich, 2000)</xref>
        . Each SNOMED CT
term has a class label, provided as “qualifier” in the term description.
For instance, the SNOMED CT term with code 158965000
has the term “Medical practitioner (occupation)”, from which
we can separate the description “Medical practitioner” and class
Occupation. Instead of choosing the best overall translation method,
we identify the best translation method for each class, based on the
validated translations. As this number will increase over time, we
expect that ensemble learning will in the end improve the automatic
translation process. However, given the small number of validated
and translated terms in Portuguese that we have at this time, we still
lack reliable data to evaluate this step.
      </p>
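      <p>As a rough illustration of the per-class selection described above, the sketch below splits the class qualifier from a term and picks, for each class, the method with the highest mean score. It is a simple selection rule under our own assumptions, not necessarily the ensemble algorithm that will eventually be used; the function names and the scores are made up for the example.</p>
      <preformat preformat-type="code">
```python
import re
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Optional, Tuple

# Captures the last parenthesised qualifier at the end of a term.
QUALIFIER = re.compile(r"^(.*\S)\s*\(([^()]*)\)\s*$")


def split_qualifier(term: str) -> Tuple[str, Optional[str]]:
    """Split 'Description (class)' into its description and class."""
    match = QUALIFIER.match(term)
    if match is None:
        return term.strip(), None
    return match.group(1), match.group(2)


def best_method_per_class(
    scored: Iterable[Tuple[str, str, float]]
) -> Dict[str, str]:
    """Map each class to the method with the highest mean score.

    `scored` holds (class, method, score) triples for validated terms.
    """
    by_key: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for term_class, method, score in scored:
        by_key[(term_class, method)].append(score)

    best: Dict[str, Tuple[str, float]] = {}
    for (term_class, method), values in by_key.items():
        average = mean(values)
        if term_class not in best or average > best[term_class][1]:
            best[term_class] = (method, average)
    return {term_class: method for term_class, (method, _) in best.items()}


# Made-up scores, for illustration only.
example_scores = [
    ("substance", "DBPedia", 0.9),
    ("substance", "GT EN", 0.7),
    ("person", "GT ES", 0.8),
    ("person", "GT EN", 0.6),
]

print(split_qualifier("Medical practitioner (occupation)"))
print(best_method_per_class(example_scores))
```
</preformat>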
    </sec>
    <sec id="sec-4">
      <title>4 EVALUATION</title>
      <p>
        CPARA, Catálogo Português de Alergias e Reacções Adversas, is
a list of terms related to Allergies and Adverse Reactions in use
in the Portuguese National Health Service
        <xref ref-type="bibr" rid="ref13">(SPMS, 2015)</xref>
        . It was
developed with the goal of unifying the classification for allergies
and adverse reactions in Portugal. Given the high levels of patient
mobility, physicians frequently need to know precisely which
substances are known to affect an international patient. To address
this need, CPARA terms have been mapped to SNOMED CT
terms by a group of experts. These experts also created European
Portuguese translations of the SNOMED CT Common Terms and
Fully Specified Names (FSN) in the CPARA catalog. This mapping
is critical to making the medical information exchanged about
patients who travel internationally more accurate. In our evaluation,
we used the Common Terms translations as gold standard to
assess the accuracy of our translation approach. CPARA includes
191 codes and common terms of the US English distribution of
SNOMED CT, and the corresponding CPARA codes and terms. In
the Spanish SNOMED CT distribution there are 192 terms mapped
from these 191 codes (one code is mapped to two terms).
      </p>
      <p>Evaluation of the translated SNOMED CT terms started with
candidate translations for the allergy-related SNOMED CT codes in
CPARA generated by application of our method. We then evaluated
the resulting set of translations against the ground truth composed
by the corresponding CPARA terms as defined by the medical
committee that defined the mapping. To assess the accuracy of the
evaluated translation methods, we scored each term translation by
the Jaro distance between the automatically translated term and the
CPARA translation. The Jaro distance (JD) between two strings is 1
if the strings are identical, that is, if all their characters match with no
transpositions<sup>10</sup>.</p>
      <p>Prior to computing Jaro distances all the translation candidates
and CPARA translations were normalised: we removed any
qualifiers from the SNOMED CT candidates, deleted quotes from
the CPARA translations, and converted all the named entities to
lowercase (e.g., “Moderate (severity modifier) (qualifier value)”
became “moderate” and “Contact metal agent (substance)” became
“contact metal agent”). These preparatory steps are necessary to
obtain meaningful similarity metrics, because these qualifiers are
common to many terms and can be translated independently. In
addition, the Jaro Distance considers the same letter in lowercase
and uppercase forms as two distinct characters. The statistics of the
translations obtained with each of the four implemented methods
described in the previous section are given in Table 1. In these
statistics, we considered as valid the translations with JD = 1.</p>
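      <p>The normalisation steps can be sketched as a single helper. This is an assumed implementation, not the code used in the experiments: the name normalise is ours, and for simplicity it applies all three steps (quote removal, qualifier removal, lowercasing) to any input string.</p>
      <preformat preformat-type="code">
```python
import re

# A parenthesised qualifier at the very end of the term.
TRAILING_QUALIFIER = re.compile(r"\s*\([^()]*\)\s*$")


def normalise(term: str) -> str:
    """Delete double quotes, strip trailing qualifiers and lowercase."""
    text = term
    for quote in ('"', "\u201c", "\u201d"):
        text = text.replace(quote, "")
    # A term may carry several qualifiers, so strip until none is left.
    previous = None
    while previous != text:
        previous = text
        text = TRAILING_QUALIFIER.sub("", text)
    return text.strip().lower()


print(normalise("Moderate (severity modifier) (qualifier value)"))
print(normalise("Contact metal agent (substance)"))
```
</preformat>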
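      <table-wrap id="tab-1">
        <label>Table 1</label>
        <caption>
          <p>Global results for all translation methods with the respective
average Jaro Distance (AVG JD) and Standard Deviation Jaro Distance
(STDEV JD). The implemented methods are Google Translate (GT), both
from English (EN) and Spanish (ES) to Portuguese, ICD-9 Mapping (ICD
9), and DBPedia Mapping (DBPedia). All translations were attempted with
two source languages, English (EN) and Spanish (ES). The number of
terms translated by each method is given in the # Method column.</p>
        </caption>
        <table>
          <thead>
            <tr><th>Method</th><th>Source Language</th><th>Coverage</th><th># Method</th><th>AVG JD</th><th>STDEV JD</th></tr>
          </thead>
          <tbody>
            <tr><td>GT</td><td>EN</td><td>100%</td><td>191</td><td>0.78</td><td>0.22</td></tr>
            <tr><td>GT</td><td>ES</td><td>114%</td><td>218</td><td>0.58</td><td>0.15</td></tr>
            <tr><td>ICD 9</td><td>EN</td><td>10%</td><td>20</td><td>0.61</td><td>0.12</td></tr>
            <tr><td>DBPedia</td><td>EN</td><td>37%</td><td>70</td><td>0.89</td><td>0.03</td></tr>
          </tbody>
        </table>
      </table-wrap>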
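      <p>The coverage figures in Table 1 appear to be the number of terms translated by each method divided by the 191 CPARA reference terms. The short check below reproduces them under that assumption; the variable and function names are ours.</p>
      <preformat preformat-type="code">
```python
REFERENCE_TERMS = 191  # SNOMED CT codes in the CPARA ground truth

# Values of the "# Method" column of Table 1.
translated = {
    "GT EN": 191,
    "GT ES": 218,
    "ICD 9 EN": 20,
    "DBPedia EN": 70,
}


def coverage_percent(n_translated: int, n_reference: int = REFERENCE_TERMS) -> int:
    """Coverage as a whole percentage of the reference terms."""
    return round(100 * n_translated / n_reference)


for method, count in translated.items():
    print(method, coverage_percent(count))
```
</preformat>
      <p>Under this reading, a coverage above 100% simply means that a method produced more candidate translations than there are reference codes.</p>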
      <p>We observe that the SNOMED-DBPedia alignment obtains, for
a coverage of 37%, both the highest similarity (0.89) and lowest
standard deviation (0.03). This shows that we have been able to
accurately translate a set of SNOMED CT terms to Portuguese,
using basic alignment techniques, through the SNOMED CT to
DBPedia alignment.<fn id="fn10"><label>10</label><p>The computation of the Jaro distances was made with the Python Jellyfish
library https://pypi.python.org/pypi/jellyfish</p></fn> However, the generation of translations
based on ontology alignments as proposed in this paper also has
limitations. In particular, only a fraction of the translations can be
obtained by this method, while Google Translate always proposed
a translation. Our success with Portuguese may not be guaranteed
when aligning SNOMED CT with DBPedia in other languages with
smaller Wikipedias.</p>
      <p>Google Translate EN showed better accuracy than Google
Translate ES. This result was not initially expected, because
Spanish and Portuguese are close languages. This may result
from the CPARA terms being originally derived from the English
terminology. The number of translations obtained with Google
Translate ES is higher than the number of terms in the CPARA
dataset (yielding the 114% coverage). This is the result of how
we have obtained the Spanish SNOMED CT candidate terms for
translation. We started from the same initial SNOMED CT codes
that we used for the English translation and obtained the Spanish
codes matching the concept id and type id of the initial English
terms. This generated a higher number of ES candidate terms to
translate (218) than the initial EN terms (191).</p>
      <p>To determine which translation method works best for each class
of SNOMED CT terms, we measured the performance of every method
in each class. This measurement is necessary to later model
an ensemble learning stage that could pick the best method for
each class. To obtain the results, we divided CPARA in classes
for translation purposes. These classes were extracted from the
qualifiers defined for the SNOMED CT Fully Specified Name terms. We
were interested in observing translation performance differences
across classes. To measure the differences, we calculated the
similarity and standard deviation as above of all the translation
candidates in each class. The results are summarised in Table 2.</p>
      <p>The SNOMED-DBPedia alignment generates better translations
for all classes, except Person and Qualifier Value. The poorer
performance could, however, reflect that only a small number of
terms in the allergy domain have been identified
for both classes.</p>
      <p>Google Translate ES has better average similarity for the Person
class than Google Translate EN. This shows that the SNOMED CT
translation from Spanish could benefit from using the Spanish
language distribution for some CPARA translations.</p>
      <p>The translations obtained with the ICD-9 mapping translator are
worse than those obtained by Google Translate (for both languages). This
results from ICD-9 being less comprehensive than SNOMED CT.
ICD codes mostly diseases, symptoms or causes of death. Therefore,
many of the CPARA terms in SNOMED CT were absent in the
ICD-9 to SNOMED CT mapping. The results also indicate that,
as expected and observed with ICD-9, terminologies of narrower
scope are not useful for translating clinical terms through ontology
alignment. The ICD-9 mapping is much less successful than other
resources, such as DBPedia and Google Translate, which can
provide much higher coverage of candidate translations, in many
cases while retaining equal or better accuracy. The ICD-9 mapping
method generates a high number of one-to-many matchings. However,
ICD-9 could still be useful in cases where it generates only one
matching description, which is usually very accurate and reliable,
given that matchings between ICD and SNOMED CT and the
resulting translations are validated by medical experts.</p>
    </sec>
    <sec id="sec-5">
      <title>5 CONCLUSIONS AND FUTURE WORK</title>
      <p>SNOMED CT is increasingly prevalent in the health care sector,
resulting from the increasing need to exchange medical records
in mobile societies. There is also a growing general interest
in accessing standardised machine-readable medical records for
improving managed health care and biomedical research.</p>
      <p>We introduced a new methodology for translating SNOMED CT
terms, which relies primarily on aligning large ontologies,
complementing language-based methods that have been proposed
before. We prototyped an initial implementation of this methodology,
which obtained high coverage and good accuracy, despite only using
string matchers for SNOMED CT and DBPedia alignment along
with the domain-independent Google Translate. A translation was
considered valid when the expert mapping of an allergy-related
SNOMED CT term to Portuguese is identical to the one obtained using
the SNOMED CT to DBPedia alignment. The accuracy under these
settings was 37%. This shows that both the English and Portuguese
versions of DBPedia are rich and accurately interlink with medical
terms. However, the results for Portuguese may not be indicative of
how this method would perform on other languages. Portuguese is
one of the top-10 Wikipedia languages in terms of the total number
of entries. The coverage of the obtained translations depends on
how rich the Wikipedia for a target language is in covering clinical
concepts and the extent to which these concepts are mapped to
Wikipedia pages in languages for which a SNOMED CT translation
exists. In addition, our validation experiment was confined to testing
about 200 SNOMED CT Common Terms in the allergies and adverse
reactions domain in European Portuguese. It is still unknown how
comprehensive and accurate the English and Portuguese Wikipedias
are across the full clinical domain, and how this factor affects the
accuracy of the SNOMED CT translations.</p>
      <p>Some improvements can still be added to the software
implementing the presented translation method. For instance,
the SNOMED CT to DBPedia alignment should explore the
defined semantic relationships between classes and terms in both
SNOMED CT and DBPedia. On the other hand, these relationships
could be explored to generate accurate translations for untranslated
terms in lexical methods to be provided. For this purpose, language
resources, such as WordNet, and parallel corpora of named entities,
such as previously validated SNOMED CT translations, could be
used to learn how words and multi-word expressions should be
properly translated.</p>
      <p>
        The measured accuracy of our translation method could still be
significantly increased without sacrificing the quality of translations,
by relaxing the similarity threshold. The negative impacts of such
relaxation are negligible, given that the generated translations
will always need to be validated by experts before being used in a
clinical context. The expert-validation step presently relies on
the review of generated translations presented on spreadsheets. A
crowdsourcing platform could speed up the process of creating and
maintaining a validated translation of SNOMED CT. Moreover,
active learning could also be incorporated in the crowdsourcing
platform, leading to fast improvement of the proposed translations
as the validated translation can also be used as input to generate
good candidates
        <xref ref-type="bibr" rid="ref2">(Ambati et al., 2010)</xref>
        .
      </p>
      <p>A complementary assessment of the alignment approach
proposed here could be obtained by applying it to the automatic
translation into a language with an existing released translation, e.g.
Spanish. However, given that we rely on alignments between lexical
resources, we are not certain if the Wikipedia correspondences
between clinical term pages in Spanish and English have been
created based on SNOMED CT.</p>
    </sec>
    <sec id="sec-6">
      <title>ACKNOWLEDGEMENTS</title>
      <p>We thank Daniel Faria and the other members of the SOMER
project for help with running AgreementMakerLight and their
feedback. We also thank Dr. Anabela Santos for the help with the
CPARA translation of SNOMED CT to validate our tool, and Bruno
Martins for the pointers to previous works. This work was partially
supported by Fundação para a Ciência e a Tecnologia (FCT), grants
PTDC/EIA-EIA/119119/2010 (SOMER), UID/CEC/50021/2013
and EXCL/EEI-ESS/0257/2012 (DataStorm).</p>
      <p>[Table 2: columns Class, AVG JD, STDEV JD, Translation Technique, Source Lang.; table body not recovered]</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Abdoune</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Merabti</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Darmoni</surname>
            ,
            <given-names>S. J.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Joubert</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2013</year>
          ).
          <article-title>Assisting the translation of the CORE subset of SNOMED CT into French</article-title>
          .
          <source>Studies in health technology and informatics</source>
          ,
          <volume>169</volume>
          ,
          <fpage>819</fpage>
          -
          <lpage>823</lpage>
          . DOI: 10.3233/978-1-60750-806-9-819.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Ambati</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vogel</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Carbonell</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>2010</year>
          ).
          <article-title>Active learning and crowd-sourcing for machine translation</article-title>
          .
          <source>In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10).</source>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Cruz</surname>
            ,
            <given-names>I. F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Antonelli</surname>
            ,
            <given-names>F. P.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Stroe</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          (
          <year>2009</year>
          ).
          <article-title>AgreementMaker: Efficient matching for large real-world schemas and ontologies</article-title>
          .
          <source>PVLDB</source>
          ,
          <volume>2</volume>
          (
          <issue>2</issue>
          ),
          <fpage>1586</fpage>
          -
          <lpage>1589</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Dietterich</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          (
          <year>2000</year>
          ).
          <article-title>Ensemble methods in machine learning</article-title>
          .
          <source>In Multiple Classifier Systems</source>
          , volume
          <volume>1857</volume>
          of Lecture Notes in Computer Science
          , pages
          <fpage>1</fpage>
          -
          <lpage>15</lpage>
          . Springer Berlin Heidelberg.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Faria</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pesquita</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Santos</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palmonari</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cruz</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Couto</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          (
          <year>2013</year>
          ).
          <article-title>The AgreementMakerLight ontology matching system</article-title>
          .
          <source>In On the Move to Meaningful Internet Systems: OTM 2013 Conferences-Confederated International Conferences, number 8185 in Lecture Notes in Computer Science</source>
          , pages
          <fpage>527</fpage>
          -
          <lpage>541</lpage>
          . Springer.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>IHTSDO</surname>
          </string-name>
          (
          <year>2012</year>
          ).
          <article-title>Guidelines for Management of Translation of SNOMED CT</article-title>
          .
          <source>IHTSDO - International Health Terminology Standards Development Organisation.</source>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Langlais</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yvon</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Zweigenbaum</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          (
          <year>2008</year>
          ).
          <article-title>Analogical translation of medical words in different languages</article-title>
          . In B. Nordström and A. Ranta, editors,
          <source>Advances in Natural Language Processing</source>
          , volume
          <volume>5221</volume>
          of Lecture Notes in Computer Science, pages
          <fpage>284</fpage>
          -
          <lpage>295</lpage>
          . Springer Berlin Heidelberg.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Lehmann</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Isele</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jakob</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jentzsch</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kontokostas</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mendes</surname>
            ,
            <given-names>P. N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hellmann</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Morsey</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van Kleef</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Auer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Bizer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>DBpedia - a large-scale, multilingual knowledge base extracted from Wikipedia</article-title>
          .
          <source>Semantic Web Journal</source>
          ,
          <volume>6</volume>
          (
          <issue>2</issue>
          ),
          <fpage>167</fpage>
          -
          <lpage>195</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>Ling</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Calado</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martins</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Trancoso</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Black</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Coheur</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          (
          <year>2011</year>
          ).
          <article-title>Named entity translation using anchor texts</article-title>
          .
          <source>In The International Workshop on Spoken Language Translation (IWSLT).</source>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>Perez-de Viñaspre</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Oronoz</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>Translating SNOMED CT terminology into a minor language</article-title>
          .
          <source>In Proceedings of the 5th International Workshop on Health Text Mining and Information Analysis (Louhi)</source>
          , pages
          <fpage>38</fpage>
          -
          <lpage>45</lpage>
          , Gothenburg, Sweden. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <surname>Porter</surname>
            ,
            <given-names>E. H.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Winkler</surname>
            ,
            <given-names>W. E.</given-names>
          </string-name>
          (
          <year>1997</year>
          ).
          <article-title>Approximate string comparison and its effect on an advanced record linkage system</article-title>
          .
          <source>In Advanced Record Linkage System. U.S. Bureau of the Census, Research Report</source>
          , pages
          <fpage>190</fpage>
          -
          <lpage>199</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <surname>Schulz</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bernhardt-Melischnig</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kreuzthaler</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Daumke</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Boeker</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2013</year>
          ).
          <article-title>Machine vs. human translation of SNOMED CT terms</article-title>
          .
          <source>In MEDINFO 2013</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <surname>SPMS</surname>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>CPARA - catálogo português de alergias e outras reações adversas / Portuguese catalogue of allergies and other adverse reactions</article-title>
          .
          <source>Technical Report V3.0</source>
          , 09-03-2015, SPMS - Serviços Partilhados do Ministério da Saúde. http://tinyurl.com/me5jhq7, http://tinyurl.com/lehlhaa.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <surname>Stedman</surname>
            ,
            <given-names>T. L.</given-names>
          </string-name>
          (
          <year>2003</year>
          ).
          <article-title>Stedman's English to Portuguese and Portuguese to English Medical Dictionary</article-title>
          .
          <source>French &amp; European Pubns.</source>
          ISBN-13: 9780785975281
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>