=Paper= {{Paper |id=None |storemode=property |title=Towards a Logic-based Assessment of the compatibility of UMLS sources |pdfUrl=https://ceur-ws.org/Vol-559/HighlightPoster2.pdf |volume=Vol-559 |dblpUrl=https://dblp.org/rec/conf/swat4ls/Jimenez-RuizGLH09 }} ==Towards a Logic-based Assessment of the compatibility of UMLS sources== https://ceur-ws.org/Vol-559/HighlightPoster2.pdf
         Towards a Logic-based Assessment of the
             compatibility of UMLS sources

       E. Jiménez-Ruiz¹*, B. Cuenca Grau²**, R. Berlanga¹, and I. Horrocks²
               ¹ Universitat Jaume I, Spain, {ejimenez,berlanga}@uji.es
               ² University of Oxford, UK, {berg,ian.horrocks}@comlab.ox.ac.uk




         Abstract. The UMLS Metathesaurus (UMLS-Meta) is currently the most
         comprehensive effort for integrating independently developed medical
         thesauri and ontologies. The techniques used in the construction of
         UMLS-Meta are mostly based on lexical matching and often disregard
         the semantics of the sources being integrated. In this paper we aim to
         develop logic-based techniques to automatically detect and fix potential
         errors in UMLS-Meta. Our research is currently at an early stage, so
         here we present only our preliminary ideas and experimental results.



1      Motivation

In its 2009AA version, UMLS-Meta [1] integrates more than one hundred thesauri
and ontologies. The main content of UMLS-Meta is a list of more than two million
concept unique identifiers (CUIs). Associated with each CUI is a set of term
names coming from different sources. Pairs of terms with the same CUI are
synonyms and hence can be represented as an equivalence mapping.
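To make the CUI-to-mapping reading concrete, here is a minimal Python sketch. It assumes a toy, made-up excerpt in which each hypothetical CUI is associated with a set of (source, term) pairs; real CUIs and the real UMLS distribution format differ, and the function name `equivalence_mappings` is our own.

```python
from itertools import product

# Toy, made-up excerpt of UMLS-Meta: each hypothetical CUI is associated with
# a set of (source, term) pairs. Real CUIs and file layouts differ.
cui_terms = {
    "CUI-A": {("FMA", "Cardiac Muscle Tissue"), ("NCI", "Myocardium"), ("NCI", "Heart Muscle")},
    "CUI-B": {("FMA", "Protein"), ("NCI", "Protein")},
}


def equivalence_mappings(cui_terms, src1, src2):
    """Return every pair (t1, t2) with t1 from src1 and t2 from src2 that share
    a CUI; each pair is read as the equivalence mapping t1 ≡ t2."""
    mappings = set()
    for terms in cui_terms.values():
        left = [t for s, t in terms if s == src1]
        right = [t for s, t in terms if s == src2]
        mappings.update(product(left, right))
    return mappings


print(sorted(equivalence_mappings(cui_terms, "FMA", "NCI")))
```

With this toy data the call yields the two FMA-NCI mappings for Cardiac Muscle Tissue plus the Protein-Protein mapping.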
    Currently, the integration of new sources in UMLS-Meta combines automatic
techniques with expert assessment [1]. Automatic techniques are mainly based on
lexical matching algorithms (e.g., [2]). Other techniques used to improve the
design process include, for example, exploiting synonymy relations from external
knowledge sources such as WordNet (e.g., [3]).
    The main limitation of these techniques is that they do not take into account
the logic-based semantics of the sources, which can be rich ontologies rather
than simple taxonomies (e.g., FMA, NCI, and SNOMED). Our ultimate goal is to
develop logic-based techniques to detect both potential errors and missing
information in UMLS-Meta and in such rich ontologies. Our preliminary results,
obtained with heuristics inspired by logic-based reasoning and module extraction,
suggest on the one hand that UMLS-Meta might be incomplete and, on the other
hand, that it contains a fair number of conflicting mappings, which reveal
potential design errors in UMLS-Meta, in the integrated ontologies, or in both.
We also propose novel techniques for automating the conflict disambiguation process.
*  Ernesto Jimenez was supported by the Valencian Government (BFPI06/372).
** Bernardo Cuenca is supported by a Royal Society University Research Fellowship.

2   Proposed Principles
The logic-based techniques we aim to develop are based on three general
principles, which we propose next.
    Conservativity Principle: The mappings alone should not introduce
    new semantic relationships between concepts from one of the sources.
    For example, UMLS-Meta contains two mappings establishing the equivalence
between the FMA concept Cardiac Muscle Tissue and the NCI concepts Myocardium
and Heart Muscle, respectively. As a consequence, UMLS-Meta implies that
Myocardium is also equivalent to Heart Muscle. However, in NCI, Myocardium
neither subsumes nor is subsumed by Heart Muscle. The conservativity principle
therefore suggests that the two mappings are in conflict and that (at least)
one of them may be incorrect.
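Written out in description-logic notation, the entailment in this example follows from the symmetry and transitivity of equivalence. The source prefixes FMA: and NCI: are our own convention for keeping the two vocabularies apart:

```latex
\begin{align*}
\text{FMA:CardiacMuscleTissue} &\equiv \text{NCI:Myocardium} && \text{(mapping 1)}\\
\text{FMA:CardiacMuscleTissue} &\equiv \text{NCI:HeartMuscle} && \text{(mapping 2)}\\
\Rightarrow\quad \text{NCI:Myocardium} &\equiv \text{NCI:HeartMuscle} && \text{(symmetry and transitivity)}
\end{align*}
```

Since NCI alone entails neither NCI:Myocardium ⊑ NCI:HeartMuscle nor the converse, the last line is a new relationship between NCI concepts, which is what the conservativity principle rules out.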
    Consistency Principle: The integration of well-established ontologies
    should not introduce unintended logical consequences.
    For example, UMLS-Meta maps the FMA concept Protein to the NCI concept
Protein, and the FMA concept Lymphokine to the NCI concept Therapeutic
Lymphokine. In FMA, Lymphokine is a type of Protein, whereas in NCI Therapeutic
Lymphokine is a type of Drug. Furthermore, Drug and Protein are disjoint in NCI,
and hence the union of NCI, FMA and UMLS-Meta would imply that Lymphokine and
Therapeutic Lymphokine are unsatisfiable.
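The chain of entailments behind this example can be sketched in the same description-logic notation (again, the FMA: and NCI: prefixes are our own convention, and "type of" is read as subsumption):

```latex
\begin{align*}
\text{FMA:Lymphokine} &\sqsubseteq \text{FMA:Protein} && \text{(FMA)}\\
\text{FMA:Protein} &\equiv \text{NCI:Protein} && \text{(mapping)}\\
\text{FMA:Lymphokine} &\equiv \text{NCI:TherapeuticLymphokine} && \text{(mapping)}\\
\text{NCI:TherapeuticLymphokine} &\sqsubseteq \text{NCI:Drug} && \text{(NCI)}\\
\text{NCI:Drug} \sqcap \text{NCI:Protein} &\sqsubseteq \bot && \text{(NCI disjointness)}\\
\Rightarrow\quad \text{NCI:TherapeuticLymphokine} &\sqsubseteq \text{NCI:Drug} \sqcap \text{NCI:Protein} \sqsubseteq \bot
\end{align*}
```

Because FMA:Lymphokine is mapped as equivalent to NCI:TherapeuticLymphokine, it is unsatisfiable as well.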
    Inconsistencies and other unintended logical consequences may be due either
to erroneous mappings or to inherent incompatibilities between the sources. In
either case, if the integrated sources are to be used successfully in an
application, these errors should be fixed by modifying either the sources or
the mappings.
    Locality Principle: If two concepts C and C′ from ontologies O and O′
    are correctly mapped, then the concepts semantically related to C in
    O are likely to be mapped to those semantically related to C′ in O′.
   If the locality principle does not hold, then UMLS-Meta may be incomplete
and new mappings should be discovered, the definitions of the two concepts in
their respective ontologies may be different or incompatible, or the mapping
between C and C′ may be erroneous. As an example of the latter, UMLS-Meta maps
the NCI concept Upper Extremity to the FMA concept Arm. This mapping violates
the locality principle because none of the entities in their respective
logic-based modules [4] have been mapped. After closer inspection of the
ontologies, the mapping can clearly be identified as erroneous.


3   Implemented Heuristics
To implement these principles, we propose a preliminary collection of heuristics.
The first two heuristics given below are similar to ones used by [5, 6, 7] in
a different setting. The third is, to the best of our knowledge, entirely novel.
Injectivity of mappings. If concepts C1 and C2 from O are mapped via UMLS-Meta
to the same concept D from O′, then UMLS-Meta alone implies that C1 and C2 are
logically equivalent. However, if O does not imply the equivalence of C1 and C2,
then the conservativity principle is violated (see the previous example).
In that case, we say that these mappings are in conflict.
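A minimal Python sketch of the injectivity heuristic follows. It assumes mappings are given as (concept in O, concept in O′) pairs and that the caller supplies an oracle for "O entails c1 ≡ c2"; in practice that would be a DL reasoner, and the toy oracle `nci_entails_equivalence` and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def injectivity_conflicts(mappings, equivalent_in_o):
    """Return pairs of mappings ((c1, d), (c2, d)) that send two distinct
    concepts of O to the same concept d of O' although O does not entail
    c1 ≡ c2 (a violation of the conservativity principle).

    mappings: iterable of (concept in O, concept in O') pairs.
    equivalent_in_o(c1, c2): caller-supplied oracle for O ⊨ c1 ≡ c2.
    """
    by_target = defaultdict(set)  # concept of O' -> concepts of O mapped to it
    for c, d in mappings:
        by_target[d].add(c)
    conflicts = []
    for d, sources in by_target.items():
        for c1, c2 in combinations(sorted(sources), 2):
            if not equivalent_in_o(c1, c2):
                conflicts.append(((c1, d), (c2, d)))
    return conflicts


def nci_entails_equivalence(a, b):
    # Toy oracle: NCI entails no equivalence between distinct concepts here.
    return a == b


# Toy data modelled on the Myocardium example, with O = NCI and O' = FMA.
mappings = [
    ("Myocardium", "Cardiac Muscle Tissue"),
    ("Heart Muscle", "Cardiac Muscle Tissue"),
    ("Protein", "Protein"),
]

print(injectivity_conflicts(mappings, nci_entails_equivalence))
```

On the toy data the only conflict reported is the pair of mappings that send Heart Muscle and Myocardium to Cardiac Muscle Tissue.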
Disjointness-based inconsistency. If C1 and C2 from O are mapped to D1 and D2
from O′, respectively, and O implies that C1 is subsumed by C2, but O′ implies
that D1 and D2 are disjoint, then the consistency principle is violated (see
the previous example). A variant of this heuristic, which we call assumption of
disjointness, records a conflict whenever no subsumption relationship holds
between D1 and D2 (and not only when they are disjoint). This reflects the fact
that ontologies are typically underspecified w.r.t. disjointness.
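The disjointness-based heuristic and its assumption-of-disjointness variant can be sketched in Python as below. Real subsumption and disjointness checks would come from a DL reasoner over O and O′. The hypothetical `ToyOntology` class only closes told subsumptions transitively and propagates told disjointness to subclasses, which suffices for the Lymphokine example but is not complete reasoning. We read "no subsumption relationship" as "none in either direction"; that reading is our assumption.

```python
from itertools import permutations


class ToyOntology:
    """Hypothetical stand-in for a DL reasoner: told subsumptions (closed
    reflexively and transitively) plus told pairwise disjointness axioms."""

    def __init__(self, subclass_of, disjoint_pairs=()):
        self.parents = {}
        for sub, sup in subclass_of:
            self.parents.setdefault(sub, set()).add(sup)
        self.disjoint_pairs = {frozenset(p) for p in disjoint_pairs}

    def subsumers(self, c):
        """All told (direct or indirect) superclasses of c, including c."""
        seen, stack = {c}, [c]
        while stack:
            for p in self.parents.get(stack.pop(), ()):
                if p not in seen:
                    seen.add(p)
                    stack.append(p)
        return seen

    def subsumed(self, a, b):
        """Toy check for: the ontology entails a ⊑ b."""
        return b in self.subsumers(a)

    def disjoint(self, a, b):
        """Toy check for: the ontology entails a ⊓ b ⊑ ⊥ (via disjoint subsumers)."""
        return any(frozenset((x, y)) in self.disjoint_pairs
                   for x in self.subsumers(a) for y in self.subsumers(b))


def disjointness_conflicts(mappings, o, o2, assume_disjointness=False):
    """Ordered pairs of mappings ((c1, d1), (c2, d2)) with O ⊨ c1 ⊑ c2 such that
    O' entails that d1 and d2 are disjoint or, in the assumption-of-disjointness
    variant, O' entails no subsumption between d1 and d2 in either direction."""
    conflicts = []
    for (c1, d1), (c2, d2) in permutations(mappings, 2):
        if c1 == c2 or not o.subsumed(c1, c2):
            continue
        if assume_disjointness:
            clash = not (o2.subsumed(d1, d2) or o2.subsumed(d2, d1))
        else:
            clash = o2.disjoint(d1, d2)
        if clash:
            conflicts.append(((c1, d1), (c2, d2)))
    return conflicts


# Toy data modelled on the Lymphokine example, with O = FMA and O' = NCI.
fma = ToyOntology([("Lymphokine", "Protein")])
nci = ToyOntology([("Therapeutic Lymphokine", "Drug")], [("Drug", "Protein")])
mappings = [("Protein", "Protein"), ("Lymphokine", "Therapeutic Lymphokine")]

print(disjointness_conflicts(mappings, fma, nci))
```

Dropping the Drug/Protein disjointness axiom from the toy NCI makes the basic heuristic silent, while the assumption-of-disjointness variant still reports the same pair.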
Similarity of logic-based modules. To formalize the notion of a concept being
“semantically related” to another concept in an ontology, we use the well-known
modularization framework from [4]. If C from O is mapped via UMLS-Meta to
D from O′, and most of the concepts in the module M_C^O for C in O are not
mapped to those in the module M_D^{O′} for D in O′, then the locality principle
is violated (see the previous example). In this case the mapping between C and D
is recorded as “suspicious”. To implement this idea, we measure the similarity
between the two modules by relating the number of their entities that are
mapped via UMLS-Meta to the total size of both modules, using an adaptation of
the well-known Dice coefficient:

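    $$\mathrm{sim}(M_C^{O}, M_D^{O'}) = 2 \times \frac{\left|\,\text{Mappings between } \mathrm{sig}(M_C^{O}) \text{ and } \mathrm{sig}(M_D^{O'})\,\right|}{\left|\mathrm{sig}(M_C^{O})\right| + \left|\mathrm{sig}(M_D^{O'})\right|} \qquad (1)$$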

where sig(·) denotes the set of concepts and relationships in the corresponding
module. If the similarity between the modules of the mapped entities is lower
than a given threshold, we assume that the mapping is “suspicious”.
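Eq. (1) and the threshold test can be sketched in Python as follows. The sketch assumes the module signatures have already been computed (extracting locality-based modules as in [4] is not shown) and that mappings are (entity in O, entity in O′) pairs. The function names and the abstract toy signatures are hypothetical.

```python
def module_similarity(sig_c, sig_d, mappings):
    """Adapted Dice coefficient of Eq. (1).

    sig_c: signature of the module M_C^O (set of entity names in O)
    sig_d: signature of the module M_D^O' (set of entity names in O')
    mappings: iterable of (entity in O, entity in O') pairs from UMLS-Meta
    """
    if not sig_c and not sig_d:
        return 0.0  # guard against division by zero for empty modules
    # Count the mappings whose two ends fall in the two module signatures.
    shared = sum(1 for x, y in mappings if x in sig_c and y in sig_d)
    return 2 * shared / (len(sig_c) + len(sig_d))


def is_suspicious(sig_c, sig_d, mappings, threshold):
    """A mapping is 'suspicious' if its modules' similarity is below threshold."""
    return module_similarity(sig_c, sig_d, mappings) < threshold


# Made-up signatures with abstract names: 4 entities in M_C^O, 6 in M_D^O'.
sig_c = {"C", "a1", "a2", "a3"}
sig_d = {"D", "b1", "b2", "b3", "b4", "b5"}
mappings = {("C", "D"), ("a1", "b1")}

print(module_similarity(sig_c, sig_d, mappings))  # 2 * 2 / (4 + 6) = 0.4
```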
    The first two heuristics allow us to identify pairs of mappings in UMLS-Meta
that are (potentially) in mutual conflict. However, it is not obvious how to
disambiguate these conflicts automatically. To this end, we again exploit the
locality principle. Assume that C1 and C2 from O are mapped via UMLS-Meta to
D1 and D2 from O′, respectively, and that these mappings are in conflict. We
then compute the similarities sim(M_{C1}^O, M_{D1}^{O′}) and
sim(M_{C2}^O, M_{D2}^{O′}) as in (1) and select the mapping with the higher
associated similarity.
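The disambiguation step can be sketched in Python as follows, again assuming precomputed module signatures. Following the description in Section 4, the hypothetical `disambiguate` function makes no recommendation when neither mapping is supported by any other mapping between its modules. Returning None on an exact tie of the similarities is our own choice.

```python
def module_similarity(sig_c, sig_d, mappings):
    """Adapted Dice coefficient of Eq. (1) over precomputed module signatures."""
    if not sig_c and not sig_d:
        return 0.0
    shared = sum(1 for x, y in mappings if x in sig_c and y in sig_d)
    return 2 * shared / (len(sig_c) + len(sig_d))


def has_support(m, modules_o, modules_o2, mappings):
    """True if some mapping other than m links the modules of m's two concepts."""
    sig_c, sig_d = modules_o[m[0]], modules_o2[m[1]]
    return any((x, y) != m and x in sig_c and y in sig_d for x, y in mappings)


def disambiguate(m1, m2, modules_o, modules_o2, mappings):
    """Return whichever conflicting mapping, m1 = (c1, d1) or m2 = (c2, d2),
    has the more similar modules, or None if no recommendation can be made."""
    if not (has_support(m1, modules_o, modules_o2, mappings)
            or has_support(m2, modules_o, modules_o2, mappings)):
        return None  # no other mapped concepts in the relevant modules
    s1 = module_similarity(modules_o[m1[0]], modules_o2[m1[1]], mappings)
    s2 = module_similarity(modules_o[m2[0]], modules_o2[m2[1]], mappings)
    if s1 == s2:
        return None
    return m1 if s1 > s2 else m2


# Made-up module signatures with abstract names (not from any real ontology).
modules_o = {"C1": {"C1", "x1", "x2"}, "C2": {"C2", "x3"}}
modules_o2 = {"D": {"D", "y1", "y2"}}
m1, m2 = ("C1", "D"), ("C2", "D")
mappings = {m1, m2, ("x1", "y1")}

print(disambiguate(m1, m2, modules_o, modules_o2, mappings))  # ('C1', 'D')
```

Here m1 scores 2 × 2 / 6 and m2 scores 2 × 1 / 5, so m1 is recommended; with only m1 and m2 present, no recommendation is made.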


4   Preliminary Experiments and Future Work
We have evaluated our heuristics using UMLS-Meta version 2009AA and the
corresponding versions of FMA, SNOMED and NCI. FMA, NCI and SNOMED
contain 78989, 66724 and 304802 concepts respectively. UMLS-Meta 2009AA
contains 2271 mappings between FMA and NCI, 8376 mappings between FMA
and SNOMED and 18384 mappings between SNOMED and NCI.
    Using the principle of conservativity, we found 513 conflicting pairs of
mappings between FMA and NCI, 1367 between FMA and SNOMED, and 4290 between
SNOMED and NCI. Using logic-based modules as explained at the end of Section 3,
we found that 239 mapping pairs between FMA and NCI (resp. 65 between FMA and
SNOMED, and 1158 between SNOMED and NCI) could not be disambiguated, since no
other concepts in the relevant modules were mapped by UMLS-Meta. For the
remaining pairs we could produce a recommendation.
    To evaluate the principle of consistency, we concentrate on the mappings
between NCI and FMA:
 – Using the disjointness-based inconsistency heuristic we found 307 conflicting
   mapping pairs between FMA and NCI. Using logic-based modules, we failed
   to disambiguate only 36 conflicting pairs. Each of these conflicts will certainly
   lead to the unsatisfiability of a concept in the union of the source ontologies
   and UMLS-Meta. Thus, semantically, the integration of these ontologies via
   UMLS-Meta is far from error-free.
 – Using the assumption of disjointness heuristic we found 1707 conflicts be-
   tween FMA and NCI. We failed to disambiguate only 202 conflicting pairs.
    Finally, using the principle of locality and a similarity threshold of 1%
(resp. 2%), we identified 12 (resp. 110) “suspicious” mappings between FMA and
NCI, 10 (resp. 689) between FMA and SNOMED, and 1420 (resp. 2336) between
SNOMED and NCI. This indicates that a significant number of mappings have a
“semantic neighborhood” that is not mapped accordingly.
    These results suggest that the implemented heuristics can benefit the design
of normative mapping sets such as UMLS-Meta. In future work, we plan to design
new heuristics based on the general principles from Section 2 and to seek
feedback from domain experts in the conflict disambiguation process.


References
[1] Bodenreider, O.: The Unified Medical Language System (UMLS): integrating
    biomedical terminology. Nucleic Acids Research 32(Database issue) (January 2004)
[2] Aronson, A.R.: Effective mapping of biomedical text to the UMLS Metathesaurus:
    the MetaMap program. Proc. AMIA Symp. (2001) 17–21
[3] Huang, K.C., Geller, J., Halper, M., Perl, Y., Xu, J.: Using WordNet synonym
    substitution to enhance UMLS source integration. Artif. Intell. Med. 46(2) (2009)
[4] Cuenca Grau, B., Horrocks, I., Kazakov, Y., Sattler, U.: Just the right amount:
    Extracting modules from ontologies. In: Proc. of WWW 2007. (2007) 717–727
[5] Meilicke, C., Stuckenschmidt, H., Tamilin, A.: Reasoning Support for Mapping
    Revision. Journal of Logic and Computation (2008)
[6] Jean-Mary, Y.R., Shironoshita, E.P., Kabuka, M.R.: Ontology matching with
    semantic verification. Journal of Web Semantics (2009)
[7] Jimenez-Ruiz, E., Cuenca Grau, B., Horrocks, I., Berlanga, R.: Ontology
    integration using mappings: Towards getting the right logical consequences.
    In: Proc. of ESWC. (2009) 173–187