<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards a Logic-based Assessment of the compatibility of UMLS sources</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>E. Jiménez-Ruiz</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>B. Cuenca Grau</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>R. Berlanga</string-name>
          <email>berlanga@uji.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>I. Horrocks</string-name>
          <email>ian.horrocks@comlab.ox.ac.uk</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Universitat Jaume I</institution>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Oxford</institution>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The UMLS Metathesaurus (UMLS-Meta) is currently the most comprehensive effort for integrating independently-developed medical thesauri and ontologies. The techniques used in the construction of UMLS-Meta are mostly based on lexical matching and often disregard the semantics of the sources being integrated. In this paper we aim at developing logic-based techniques to automatically detect and fix potential errors in UMLS-Meta. Our research is currently at an early stage, so we only present here our preliminary ideas and experimental results.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Motivation</title>
      <p>
        In its 2009AA version, UMLS-Meta [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] integrates more than one hundred
thesauri and ontologies. The main content of UMLS-Meta is a list of more than
two million unique identifiers (CUIs). Each CUI is associated with a set of
term names coming from different sources. Pairs of terms with the same CUI
are synonyms and hence can be represented as an equivalence mapping.
      </p>
      <p>
        Currently, the integration of new sources in UMLS-Meta combines automatic
techniques with expert assessment [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Automatic techniques are mainly
based on lexical matching algorithms (e.g., [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]). Other techniques used to
improve the design process involve, for example, exploiting synonymy relations
from external knowledge sources such as WordNet (e.g., [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]).
      </p>
      <p>The main limitation of these techniques is that they do not take into account
the logic-based semantics of the sources, which can be rich ontologies rather
than simple taxonomies (e.g., FMA, NCI, and SNOMED). Our ultimate goal is
to develop logic-based techniques to detect both potential errors and missing
information in UMLS-Meta and in such rich ontologies. Our preliminary results,
obtained using heuristics inspired by logic-based reasoning and module extraction, suggest,
on the one hand, that UMLS-Meta might be incomplete and, on the other hand,
that it contains a fair number of conflicting mappings, which reveal potential
design errors in UMLS-Meta, in the integrated ontologies, or in both. We also
propose novel techniques for automating the conflict disambiguation process.</p>
    </sec>
    <sec id="sec-2">
      <title>Proposed Principles</title>
      <p>The logic-based techniques we aim at developing are based on the three general
principles that we propose next.</p>
      <p>Conservativity Principle: The mappings alone should not introduce
new semantic relationships between concepts from one of the sources.</p>
      <p>For example, UMLS-Meta contains two mappings establishing the
equivalence between the concept Cardiac Muscle Tissue from FMA and the NCI
concepts Myocardium and Heart Muscle respectively. As a consequence,
UMLS-Meta implies that Myocardium is also equivalent to Heart Muscle. However, in
NCI Myocardium neither subsumes nor is subsumed by Heart Muscle. The
conservativity principle suggests that the obtained mappings are in conflict and
that (at least) one of them may be incorrect.</p>
      <p>Consistency Principle: The integration of well-established ontologies
should not introduce unintended logical consequences.</p>
      <p>For example, UMLS-Meta maps the FMA concept Protein to the NCI
concept Protein, and the FMA concept Lymphokine to the NCI concept
Therapeutic Lymphokine. In FMA, Lymphokine is a type of Protein, whereas in NCI
Therapeutic Lymphokine is a type of Drug. Furthermore, Drug and Protein are
disjoint in NCI and hence the union of NCI, FMA and UMLS-Meta would imply
that Lymphokine and Therapeutic Lymphokine are unsatisfiable.</p>
      <p>Inconsistencies and other unintended logical consequences may be due
either to erroneous mappings or to inherent incompatibilities between the sources.
In any case, if the integrated sources are to be successfully used in an application,
these errors should be fixed by modifying either the sources or the mappings.</p>
      <p>Locality Principle: If two concepts C and C′ from ontologies O and
O′ are correctly mapped, then the concepts semantically related to C in
O are likely to be mapped to those semantically related to C′ in O′.</p>
      <p>
        If the locality principle does not hold, then UMLS-Meta may be incomplete
and new mappings should be discovered, or the definitions of both concepts
in their respective ontologies may be different or incompatible, or the
mapping between C and C′ may be erroneous. As an example of the latter,
UMLS-Meta maps the concepts Upper Extremity from NCI and Arm from FMA. The
mapping violates the locality principle because none of the entities in their
respective logic-based modules [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] have been mapped. After closer inspection of
the ontologies, the mapping can be clearly identified as erroneous.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Implemented Heuristics</title>
      <p>
        To implement these principles, we propose a preliminary collection of heuristics.
The first two heuristics given next are related to similar ones used by [
        <xref ref-type="bibr" rid="ref5 ref6 ref7">5, 6, 7</xref>
        ] in
a different setting. The third one is, to the best of our knowledge, entirely novel.
      </p>
      <p>Injectivity of mappings. If concepts C1 and C2 from O are mapped via
UMLS-Meta to the same concept D from O′, then UMLS-Meta alone implies that C1
and C2 are logically equivalent. However, if O does not imply the equivalence of
C1 and C2 then the conservativity principle is violated (see previous example).
In that case, we say that these mappings are in conflict.
      </p>
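      <p>The injectivity heuristic above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in the experiments: the function name injectivity_conflicts, the representation of mappings as (source concept, target concept) pairs, and the equivalent callback (which stands in for a call to a reasoner over O) are all assumptions made for the example.</p>

```python
from collections import defaultdict
from itertools import combinations


def injectivity_conflicts(mappings, equivalent):
    """Return pairs of mappings that send non-equivalent concepts of O
    to the same concept of the other ontology.

    mappings   -- iterable of (source_concept, target_concept) pairs
    equivalent -- hypothetical callback: does O entail c1 == c2?
    """
    # Group the source concepts by the target concept they are mapped to.
    by_target = defaultdict(list)
    for source, target in mappings:
        by_target[target].append(source)

    conflicts = []
    for target, sources in by_target.items():
        # Every pair of distinct sources sharing a target is a candidate.
        for c1, c2 in combinations(sorted(set(sources)), 2):
            if not equivalent(c1, c2):
                conflicts.append(((c1, target), (c2, target)))
    return conflicts


# Toy data modelled on the Myocardium / Heart Muscle example (O = NCI).
mappings = [
    ("Myocardium", "Cardiac Muscle Tissue"),
    ("Heart Muscle", "Cardiac Muscle Tissue"),
    ("Protein", "Protein"),
]


def no_equivalences(c1, c2):
    # Assumption for the toy example: O entails no equivalences at all.
    return False


conflicts = injectivity_conflicts(mappings, no_equivalences)
```

      <p>On this toy input the only reported conflict is the pair of mappings that send Heart Muscle and Myocardium to Cardiac Muscle Tissue. In a realistic setting the callback would be answered by a description logic reasoner rather than by a constant function.</p>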
      <p>Disjointness-based inconsistency. If C1 and C2 from O are mapped to D1 and
D2 from O′ respectively and O implies that C1 is subsumed by C2, but O′ implies that
D1 and D2 are disjoint, then the consistency principle is violated (see previous
example). A variant of this heuristic, which we call assumption of disjointness,
is obtained by recording a conflict whenever no subsumption relationship holds
between D1 and D2 (and not only if they are disjoint). This reflects the fact that
ontologies are typically underspecified w.r.t. disjointness.</p>
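      <p>A minimal sketch of the disjointness-based check follows. The function name disjointness_conflicts and the two callbacks are hypothetical; they stand in for entailment queries that a reasoner would answer over O and O′. The toy data encodes only the Lymphokine example and assumes that the disjointness of Therapeutic Lymphokine and Protein has already been derived from the disjointness of Drug and Protein.</p>

```python
from itertools import permutations


def disjointness_conflicts(mappings, subsumed_in_source, disjoint_in_target):
    """Return ordered pairs of mappings (C1 to D1, C2 to D2) such that
    C1 is subsumed by C2 in the source while D1 and D2 are disjoint
    in the target.

    subsumed_in_source -- hypothetical callback: does O entail C1 is-a C2?
    disjoint_in_target -- hypothetical callback: does O' entail D1, D2 disjoint?
    """
    conflicts = []
    for (c1, d1), (c2, d2) in permutations(mappings, 2):
        if subsumed_in_source(c1, c2) and disjoint_in_target(d1, d2):
            conflicts.append(((c1, d1), (c2, d2)))
    return conflicts


# Toy data for the Lymphokine example (source = FMA, target = NCI).
mappings = [
    ("Lymphokine", "Therapeutic Lymphokine"),
    ("Protein", "Protein"),
]
fma_subsumptions = {("Lymphokine", "Protein")}
# Entailed disjointness, stored as unordered pairs.
nci_disjoint = {frozenset(("Therapeutic Lymphokine", "Protein"))}

conflicts = disjointness_conflicts(
    mappings,
    lambda c1, c2: (c1, c2) in fma_subsumptions,
    lambda d1, d2: frozenset((d1, d2)) in nci_disjoint,
)
```

      <p>The assumption of disjointness variant can be obtained from the same function by passing, as the second callback, a predicate that holds whenever neither D1 nor D2 subsumes the other in O′.</p>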
      <p>
        Similarity of logic-based modules. To formalize the notion of a concept being
“semantically related” to another concept in an ontology, we use the well-known
modularization framework from [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. If C from O is mapped via UMLS-Meta to
D from O′, and most of the concepts in the module M<sub>C</sub><sup>O</sup> for C in O are not
mapped to those in the module M<sub>D</sub><sup>O′</sup> for D in O′, then the locality principle
is violated (see previous example). In this case the mapping between C and D
is recorded as “suspicious”. To implement this idea, we measure the similarity
between the corresponding modules by computing the relationship between the
number of concepts in the modules which are mapped via UMLS-Meta and those
which are not, using an adaptation of the well-known Dice’s coefficient:
        <disp-formula id="eq1">
          <label>(1)</label>
          <tex-math><![CDATA[\mathrm{sim}(M_C^{O}, M_D^{O'}) = 2 \times \frac{|\,\text{Mappings between } \mathrm{sig}(M_C^{O}) \text{ and } \mathrm{sig}(M_D^{O'})\,|}{|\,\mathrm{sig}(M_C^{O})\,| + |\,\mathrm{sig}(M_D^{O'})\,|}]]></tex-math>
        </disp-formula>
where sig(·) denotes the set of concepts and relationships in the corresponding
module. If the similarity between the modules of the mapped entities is lower
than a given threshold, we assume that the mapping is “suspicious”.
      </p>
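      <p>The similarity measure in (1) is straightforward to compute once the module signatures are available. The sketch below assumes that each signature is given as a Python set of entity names and that the mappings are (source, target) pairs; the function name module_similarity and the entity names in the toy data are hypothetical, and module extraction itself is not shown.</p>

```python
def module_similarity(sig_c, sig_d, mappings):
    """Adapted Dice coefficient in the spirit of equation (1).

    sig_c, sig_d -- signatures of the two modules, as sets of names
    mappings     -- iterable of (source, target) pairs
    """
    # Count the mappings whose two ends both lie in the module signatures.
    mapped = sum(1 for c, d in set(mappings) if c in sig_c and d in sig_d)
    total = len(sig_c) + len(sig_d)
    # Assumption: two empty signatures are given similarity 0.
    return 2 * mapped / total if total else 0.0


# Toy data: two modules of four entities each, two of which are mapped.
sig_c = {"C", "C_a", "C_b", "C_c"}
sig_d = {"D", "D_a", "D_b", "D_c"}
mappings = {("C", "D"), ("C_a", "D_a"), ("Other", "Elsewhere")}

similarity = module_similarity(sig_c, sig_d, mappings)

# Flag the mapping between C and D if it falls below a chosen threshold.
threshold = 0.01
suspicious = threshold > similarity
```

      <p>Here two mappings connect the signatures, so the similarity is 2 × 2 / 8 = 0.5 and the mapping is not flagged. Note that in this sketch the mapping between C and D itself is counted, since C and D belong to their own module signatures.</p>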
      <p>The first two heuristics allow us to identify pairs of mappings in
UMLS-Meta that are (potentially) in mutual conflict. However, it is not clear how
to automatically disambiguate these conflicts. To this end, we again exploit the
locality principle. Assume that C1 and C2 from O are mapped via UMLS-Meta to
D1 and D2 from O′ respectively and that these mappings are in conflict. We then
compute the similarities sim(M<sub>C1</sub><sup>O</sup>, M<sub>D1</sub><sup>O′</sup>) and sim(M<sub>C2</sub><sup>O</sup>, M<sub>D2</sub><sup>O′</sup>) as in
(1) and select the mapping with the highest associated similarity.</p>
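      <p>The disambiguation step can be sketched as follows. The names dice and disambiguate, the dictionaries holding precomputed module signatures, and the decision to return None when both similarities coincide are assumptions made for this example; they are not taken from the actual implementation.</p>

```python
def dice(sig_c, sig_d, mappings):
    """Adapted Dice coefficient between two module signatures."""
    mapped = sum(1 for c, d in mappings if c in sig_c and d in sig_d)
    total = len(sig_c) + len(sig_d)
    return 2 * mapped / total if total else 0.0


def disambiguate(m1, m2, source_modules, target_modules, mappings):
    """Pick the conflicting mapping whose modules are more similar.

    Returns None when the two similarities are equal (assumption:
    such conflicts are left to a human expert).
    """
    scores = [
        dice(source_modules[c], target_modules[d], mappings)
        for c, d in (m1, m2)
    ]
    if scores[0] == scores[1]:
        return None
    return m1 if scores[0] > scores[1] else m2


# Toy data: hypothetical precomputed module signatures.
source_modules = {"C1": {"C1", "A"}, "C2": {"C2", "B"}}
target_modules = {"D1": {"D1", "X"}, "D2": {"D2", "Y"}}
mappings = {("C1", "D1"), ("C2", "D2"), ("A", "X")}

preferred = disambiguate(
    ("C1", "D1"), ("C2", "D2"), source_modules, target_modules, mappings
)
```

      <p>In the toy data the modules of C1 and D1 share one additional mapping, so their similarity is 1.0 against 0.5 for C2 and D2, and the mapping between C1 and D1 is preferred.</p>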
    </sec>
    <sec id="sec-4">
      <title>Preliminary Experiments and Future Work</title>
      <p>We have evaluated our heuristics using UMLS-Meta version 2009AA and the
corresponding versions of FMA, SNOMED and NCI. FMA, NCI and SNOMED
contain 78989, 66724 and 304802 concepts respectively. UMLS-Meta 2009AA
contains 2271 mappings between FMA and NCI, 8376 mappings between FMA
and SNOMED and 18384 mappings between SNOMED and NCI.</p>
      <p>Using the principle of conservativity, we have found 513 conflicting pairs of
mappings between FMA and NCI, 1367 between FMA and SNOMED and 4290
between SNOMED and NCI. Using logic-based modules as explained at the
end of Section 3, we found that 239 mapping pairs between FMA and NCI
(resp. 65 between FMA and SNOMED, and 1158 between SNOMED and NCI)
could not be disambiguated since no other concepts in the relevant modules
were mapped by UMLS-Meta. For the remaining pairs we could produce a
recommendation.</p>
      <p>To evaluate the principle of consistency, we concentrate on the mappings
between NCI and FMA:
        <list list-type="bullet">
          <list-item>
            <p>Using the disjointness-based inconsistency heuristic we found 307 conflicting
mapping pairs between FMA and NCI. Using logic-based modules, we failed
to disambiguate only 36 conflicting pairs. Each of these conflicts will certainly
lead to the unsatisfiability of a concept in the union of the source ontologies
and UMLS-Meta. Thus, semantically, the integration of these ontologies via
UMLS-Meta is far from error-free.</p>
          </list-item>
          <list-item>
            <p>Using the assumption of disjointness heuristic we found 1707 conflicts
between FMA and NCI. We failed to disambiguate only 202 conflicting pairs.</p>
          </list-item>
        </list>
      </p>
      <p>Finally, using the principle of locality and a similarity threshold of 1% (resp.
2%) we could identify 12 (resp. 110) “suspicious” mappings between FMA and
NCI, 10 (resp. 689) between FMA and SNOMED and 1420 (resp. 2336) between
SNOMED and NCI. This implies that there is a significant number of mappings
whose “semantic neighborhood” is not mapped accordingly.</p>
      <p>These results suggest the benefits of the implemented heuristics in the
design of normative mapping sets such as UMLS-Meta. For future work, we plan
to design new heuristics using the general principles from Section 2 and to seek
feedback from domain experts on the conflict disambiguation process.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Bodenreider</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>The Unified Medical Language System (UMLS): integrating biomedical terminology</article-title>
          .
          <source>Nucleic acids research</source>
          32(Database issue) (
          <year>January 2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Aronson</surname>
            ,
            <given-names>A.R.</given-names>
          </string-name>
          :
          <article-title>Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program</article-title>
          .
          <source>Proc AMIA Symp</source>
          (
          <year>2001</year>
          )
          <fpage>17</fpage>
          -
          <lpage>21</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>K.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Geller</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Halper</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Perl</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Using WordNet synonym substitution to enhance UMLS source integration</article-title>
          .
          <source>Artif. Intell. Med</source>
          .
          <volume>46</volume>
          (
          <issue>2</issue>
          ) (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Cuenca Grau</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horrocks</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kazakov</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sattler</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          :
          <article-title>Just the right amount: Extracting modules from ontologies</article-title>
          .
          <source>In: Proc. of WWW 2007</source>
          . (
          <year>2007</year>
          )
          <fpage>717</fpage>
          -
          <lpage>727</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Meilicke</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stuckenschmidt</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tamilin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Reasoning Support for Mapping Revision</article-title>
          .
          <source>Journal of Logic and Computation</source>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Jean-Mary</surname>
            ,
            <given-names>Y.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shironoshita</surname>
            ,
            <given-names>E.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kabuka</surname>
            ,
            <given-names>M.R.</given-names>
          </string-name>
          :
          <article-title>Ontology matching with semantic verification</article-title>
          .
          <source>Journal of Web Semantics</source>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Jimenez-Ruiz</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cuenca Grau</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horrocks</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Berlanga</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Ontology integration using mappings: Towards getting the right logical consequences</article-title>
          .
          <source>In: Proc. of ESWC</source>
          . (
          <year>2009</year>
          )
          <fpage>173</fpage>
          -
          <lpage>187</lpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>