<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Towards a Logic-based Assessment of the compatibility of UMLS sources</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">E</forename><surname>Jiménez-Ruiz</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Universitat Jaume I</orgName>
								<address>
									<country key="ES">Spain</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">B</forename><surname>Cuenca Grau</surname></persName>
							<affiliation key="aff1">
								<orgName type="institution">University of Oxford</orgName>
								<address>
									<country key="GB">UK</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">R</forename><surname>Berlanga</surname></persName>
							<email>berlanga@uji.es</email>
							<affiliation key="aff0">
								<orgName type="institution">Universitat Jaume I</orgName>
								<address>
									<country key="ES">Spain</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">I</forename><surname>Horrocks</surname></persName>
							<email>ian.horrocks@comlab.ox.ac.uk</email>
							<affiliation key="aff1">
								<orgName type="institution">University of Oxford</orgName>
								<address>
									<country key="GB">UK</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Towards a Logic-based Assessment of the compatibility of UMLS sources</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">8CC45FA02FD2FE5798E07DFEB203B240</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T02:23+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The UMLS Metathesaurus (UMLS-Meta) is currently the most comprehensive effort for integrating independently-developed medical thesauri and ontologies. The techniques used in the construction of UMLS-Meta are mostly based on lexical matching and often disregard the semantics of the sources being integrated. In this paper we aim at developing logic-based techniques to automatically detect and fix potential errors in UMLS-Meta. Our research is currently at an early stage, so we only present here our preliminary ideas and experimental results.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Motivation</head><p>In its 2009AA version, UMLS-Meta <ref type="bibr" target="#b0">[1]</ref> integrates more than one hundred thesauri and ontologies. Its main content is a list of more than two million concept unique identifiers (CUIs). Each CUI is associated with a set of term names coming from different sources. Terms that share a CUI are synonyms, so each such pair can be represented as an equivalence mapping.</p><p>Currently, the integration of new sources into UMLS-Meta combines automatic techniques with expert assessment <ref type="bibr" target="#b0">[1]</ref>. The automatic techniques are mainly based on lexical matching algorithms (e.g., <ref type="bibr" target="#b1">[2]</ref>). Other techniques used to improve the design process involve, for example, exploiting synonymy relations from external knowledge sources such as WordNet (e.g., <ref type="bibr" target="#b2">[3]</ref>).</p><p>The main limitation of these techniques is that they do not take into account the logic-based semantics of the sources, which can be rich ontologies rather than simple taxonomies (e.g., FMA, NCI, and SNOMED). Our ultimate goal is to develop logic-based techniques to detect potential errors and missing information both in UMLS-Meta and in such rich ontologies. Our preliminary results, obtained with heuristics inspired by logic-based reasoning and module extraction, suggest on the one hand that UMLS-Meta might be incomplete and, on the other hand, that it contains a fair number of conflicting mappings, which reveal potential design errors in UMLS-Meta, in the integrated ontologies, or in both. We also propose novel techniques for automating the conflict disambiguation process.</p><p>The logic-based techniques we aim to develop are based on three general principles, which we propose next.</p><p>Conservativity Principle: The mappings alone should not introduce new semantic relationships between concepts from one of the sources.</p><p>For example, UMLS-Meta contains two mappings establishing the equivalence between the concept Cardiac Muscle Tissue from FMA and the NCI concepts Myocardium and Heart Muscle, respectively. As a consequence, UMLS-Meta implies that Myocardium is also equivalent to Heart Muscle. However, in NCI, Myocardium neither subsumes nor is subsumed by Heart Muscle. The conservativity principle therefore suggests that these two mappings are in conflict and that at least one of them may be incorrect.</p></div>
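<div xmlns="http://www.tei-c.org/ns/1.0"><p>The conservativity example above can be sketched as a simple check over a mapping set. The following is a minimal illustration, not the authors' implementation: the function name injectivity_conflicts and the callback equivalent_in_source are hypothetical, and the callback stands in for a call to a description logic reasoner over the source ontology. The toy data only restates the Myocardium and Heart Muscle example.</p>
<p><![CDATA[
```python
from collections import defaultdict
from itertools import combinations
from typing import Callable, Iterable, List, Tuple

Mapping = Tuple[str, str]  # (concept in one source, concept in the other source)


def injectivity_conflicts(
    mappings: Iterable[Mapping],
    equivalent_in_source: Callable[[str, str], bool],
) -> List[Tuple[Mapping, Mapping]]:
    """Return pairs of mappings that send two concepts of the same source,
    not known to be equivalent there, to one and the same target concept."""
    sources_by_target = defaultdict(list)
    for source, target in sorted(set(mappings)):
        sources_by_target[target].append(source)

    conflicts = []
    for target, sources in sources_by_target.items():
        for s1, s2 in combinations(sources, 2):
            if not equivalent_in_source(s1, s2):
                conflicts.append(((s1, target), (s2, target)))
    return conflicts


# Toy data restating the example in the text (NCI concepts mapped to FMA).
umls_mappings = [
    ("Myocardium", "Cardiac Muscle Tissue"),
    ("Heart Muscle", "Cardiac Muscle Tissue"),
]


def nci_equivalent(c1: str, c2: str) -> bool:
    # Hypothetical stand-in for a reasoner query: in this toy setting NCI
    # entails no equivalence between distinct concepts.
    return c1 == c2


conflicts = injectivity_conflicts(umls_mappings, nci_equivalent)
print(conflicts)
```
]]></p>
<p>In a real setting the equivalence test would be answered by a reasoner over the full ontology; the grouping by target concept is the only part of the sketch that does not depend on that assumption.</p></div>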
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Consistency Principle:</head><p>The integration of well-established ontologies should not introduce unintended logical consequences.</p><p>For example, UMLS-Meta maps the FMA concept Protein to the NCI concept Protein, and the FMA concept Lymphokine to the NCI concept Therapeutic Lymphokine. In FMA, Lymphokine is a type of Protein, whereas in NCI Therapeutic Lymphokine is a type of Drug. Furthermore, Drug and Protein are disjoint in NCI, and hence the union of NCI, FMA and UMLS-Meta would imply that Lymphokine and Therapeutic Lymphokine are unsatisfiable.</p><p>Inconsistencies and other unintended logical consequences may be due either to erroneous mappings or to inherent incompatibilities between the sources. In any case, if the integrated sources are to be used successfully in an application, these errors should be fixed by modifying either the sources or the mappings.</p><p>Locality Principle: If two concepts $C$ and $C'$ from ontologies $O$ and $O'$ are correctly mapped, then the concepts semantically related to $C$ in $O$ are likely to be mapped to those semantically related to $C'$ in $O'$.</p><p>If the locality principle does not hold, then UMLS-Meta may be incomplete and new mappings should be discovered, or the definitions of the two concepts in their respective ontologies may be different or incompatible, or the mapping between $C$ and $C'$ may be erroneous. As an example of the latter, UMLS-Meta maps the concept Upper Extremity from NCI to the concept Arm from FMA. This mapping violates the locality principle because none of the entities in their respective logic-based modules <ref type="bibr" target="#b3">[4]</ref> have been mapped. On closer inspection of the ontologies, the mapping can be clearly identified as erroneous.</p></div>
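<div xmlns="http://www.tei-c.org/ns/1.0"><p>The consistency example in this section (Lymphokine and Therapeutic Lymphokine) can also be checked mechanically. The sketch below is illustrative only and rests on strong simplifying assumptions: each ontology is reduced to a hypothetical dictionary of asserted parents plus a set of asserted disjoint pairs, and subsumption and disjointness are computed by walking that hierarchy instead of calling a reasoner. The names ancestors and disjointness_conflicts are not from the paper.</p>
<p><![CDATA[
```python
from itertools import permutations
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Mapping = Tuple[str, str]  # (concept in O, concept in O')


def ancestors(concept: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return the concept together with all of its asserted superclasses."""
    seen: Set[str] = set()
    stack = [concept]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, []))
    return seen


def disjointness_conflicts(
    mappings: Iterable[Mapping],
    parents_o: Dict[str, List[str]],
    parents_o2: Dict[str, List[str]],
    disjoint_o2: Set[FrozenSet[str]],
) -> List[Tuple[Mapping, Mapping]]:
    """Return ordered pairs of mappings (C1, D1), (C2, D2) such that C1 is
    subsumed by C2 in O while D1 and D2 are disjoint in O'."""
    conflicts = []
    for (c1, d1), (c2, d2) in permutations(sorted(set(mappings)), 2):
        c1_below_c2 = c1 != c2 and c2 in ancestors(c1, parents_o)
        # Disjointness is inherited: D1 and D2 are disjoint as soon as any
        # pair of their superclasses is declared disjoint.
        d1_disjoint_d2 = any(
            frozenset((x, y)) in disjoint_o2
            for x in ancestors(d1, parents_o2)
            for y in ancestors(d2, parents_o2)
        )
        if c1_below_c2 and d1_disjoint_d2:
            conflicts.append(((c1, d1), (c2, d2)))
    return conflicts


# Toy fragments restating the example in the text (O = FMA, O' = NCI).
fma_parents = {"Lymphokine": ["Protein"]}
nci_parents = {"Therapeutic Lymphokine": ["Drug"]}
nci_disjoint = {frozenset(("Drug", "Protein"))}
umls_mappings = [
    ("Protein", "Protein"),
    ("Lymphokine", "Therapeutic Lymphokine"),
]

consistency_conflicts = disjointness_conflicts(
    umls_mappings, fma_parents, nci_parents, nci_disjoint
)
print(consistency_conflicts)
```
]]></p>
<p>The hierarchy walk only sees asserted axioms, so it can miss conflicts that a complete reasoner would derive; it is meant to show the shape of the check, not to replace logical entailment.</p></div>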
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Implemented Heuristics</head><p>To implement these principles, we propose a preliminary collection of heuristics. The first two heuristics given next are related to similar ones used by <ref type="bibr" target="#b4">[5,</ref><ref type="bibr" target="#b5">6,</ref><ref type="bibr" target="#b6">7]</ref> in a different setting. The third one is, to the best of our knowledge, entirely novel.</p><p>Injectivity of mappings. If concepts $C_1$ and $C_2$ from $O$ are mapped via UMLS-Meta to the same concept $D$ from $O'$, then UMLS-Meta alone implies that $C_1$ and $C_2$ are logically equivalent. If $O$ does not imply the equivalence of $C_1$ and $C_2$, then the conservativity principle is violated (see the Myocardium and Heart Muscle example). In that case, we say that the two mappings are in conflict.</p><p>Disjointness-based inconsistency. If $C_1$ and $C_2$ from $O$ are mapped to $D_1$ and $D_2$ from $O'$, and $O$ implies that $C_1$ is subsumed by $C_2$ but $O'$ implies that $D_1$ and $D_2$ are disjoint, then the consistency principle is violated (see the Lymphokine example). A variant of this heuristic, which we call assumption of disjointness, records a conflict whenever no subsumption relationship holds between $D_1$ and $D_2$ (and not only when they are disjoint). This reflects the fact that ontologies are typically underspecified with respect to disjointness.</p><p>Similarity of logic-based modules. To formalize the notion of a concept being "semantically related" to another concept in an ontology, we use the well-known modularization framework from <ref type="bibr" target="#b3">[4]</ref>. If $C$ from $O$ is mapped via UMLS-Meta to $D$ from $O'$, and most of the concepts in the module $M^{O}_{C}$ for $C$ in $O$ are not mapped to those in the module $M^{O'}_{D}$ for $D$ in $O'$, then the locality principle is violated (see the Upper Extremity and Arm example). In this case the mapping between $C$ and $D$ is recorded as "suspicious". To implement this idea, we measure the similarity between the two modules by comparing the number of concepts in the modules that are mapped via UMLS-Meta with the number that are not, using an adaptation of the well-known Dice coefficient:</p><formula xml:id="formula_0">\mathrm{sim}(M^{O}_{C}, M^{O'}_{D}) = \frac{2 \times |\,\text{Mappings between } \mathrm{sig}(M^{O}_{C}) \text{ and } \mathrm{sig}(M^{O'}_{D})\,|}{|\mathrm{sig}(M^{O}_{C})| + |\mathrm{sig}(M^{O'}_{D})|}<label>(1)</label></formula><p>where $\mathrm{sig}(\cdot)$ denotes the set of concepts and relationships in the corresponding module. If the similarity between the modules of the mapped entities is lower than a given threshold, we regard the mapping as "suspicious".</p><p>The first two heuristics allow us to identify pairs of mappings in UMLS-Meta that are (potentially) in mutual conflict. However, it is not clear how to disambiguate these conflicts automatically. To this end, we again exploit the locality principle. Assume that $C_1$ and $C_2$ from $O$ are mapped via UMLS-Meta to $D_1$ and $D_2$ from $O'$, respectively, and that these mappings are in conflict. We then compute the similarities $\mathrm{sim}(M^{O}_{C_1}, M^{O'}_{D_1})$ and $\mathrm{sim}(M^{O}_{C_2}, M^{O'}_{D_2})$ as in (1) and select the mapping with the higher similarity.</p></div>
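<div xmlns="http://www.tei-c.org/ns/1.0"><p>The similarity measure of Equation (1) and the disambiguation step can be sketched as follows. This is a hedged illustration under stated assumptions, not the authors' code: module signatures are assumed to be already extracted and are passed in as plain sets, the mapping under evaluation is counted like any other mapping between the two signatures, and returning None on a tie is our own convention for "could not be disambiguated". All function and variable names are hypothetical, and the toy data is invented for illustration.</p>
<p><![CDATA[
```python
from typing import Dict, Optional, Set, Tuple

Mapping = Tuple[str, str]  # (entity in O, entity in O')


def module_similarity(
    sig_c: Set[str], sig_d: Set[str], mappings: Set[Mapping]
) -> float:
    """Dice-style similarity of Eq. (1) between two module signatures."""
    # Numerator: mappings that link an entity of sig_c to an entity of sig_d.
    linked = {(a, b) for (a, b) in mappings if a in sig_c and b in sig_d}
    total = len(sig_c) + len(sig_d)
    return 2 * len(linked) / total if total else 0.0


def is_suspicious(similarity: float, threshold: float) -> bool:
    """A mapping is flagged when its module similarity is below the threshold."""
    return similarity < threshold


def disambiguate(
    conflict: Tuple[Mapping, Mapping],
    modules_o: Dict[str, Set[str]],
    modules_o2: Dict[str, Set[str]],
    mappings: Set[Mapping],
) -> Optional[Mapping]:
    """Keep the conflicting mapping whose modules are more similar.
    Returns None on a tie (assumption: no recommendation is possible)."""
    m1, m2 = conflict
    s1 = module_similarity(modules_o[m1[0]], modules_o2[m1[1]], mappings)
    s2 = module_similarity(modules_o[m2[0]], modules_o2[m2[1]], mappings)
    if s1 == s2:
        return None
    return m1 if s1 > s2 else m2


# Invented toy signatures, keyed by the concept each module was extracted for.
modules_o = {"C1": {"C1", "X", "Y"}, "C2": {"C2", "Z"}}
modules_o2 = {"D1": {"D1", "X2", "Y2"}, "D2": {"D2", "U", "V", "W"}}
toy_mappings = {("C1", "D1"), ("C2", "D2"), ("X", "X2"), ("Y", "Y2")}

sim_1 = module_similarity(modules_o["C1"], modules_o2["D1"], toy_mappings)
sim_2 = module_similarity(modules_o["C2"], modules_o2["D2"], toy_mappings)
kept = disambiguate((("C1", "D1"), ("C2", "D2")), modules_o, modules_o2, toy_mappings)
print(sim_1, sim_2, kept)
```
]]></p>
<p>In the toy data every entity in the modules of C1 and D1 is linked by a mapping, whereas only the mapping under evaluation links the modules of C2 and D2, so the first mapping is the one retained.</p></div>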
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Preliminary Experiments and Future Work</head><p>We have evaluated our heuristics using UMLS-Meta version 2009AA and the corresponding versions of FMA, SNOMED and NCI. FMA, NCI and SNOMED contain 78989, 66724 and 304802 concepts, respectively. UMLS-Meta 2009AA contains 2271 mappings between FMA and NCI, 8376 mappings between FMA and SNOMED, and 18384 mappings between SNOMED and NCI.</p><p>Using the principle of conservativity, we have found 513 conflicting pairs of mappings between FMA and NCI, 1367 between FMA and SNOMED, and 4290 between SNOMED and NCI. Using logic-based modules as explained at the end of Section 3, we found that 239 mapping pairs between FMA and NCI (resp. 65 between FMA and SNOMED, and 1158 between SNOMED and NCI) could not be disambiguated because no other concepts in the relevant modules were mapped by UMLS-Meta. For the remaining pairs we could produce a recommendation.</p><p>To evaluate the principle of consistency, we concentrate on the mappings between NCI and FMA:</p><p>- Using the disjointness-based inconsistency heuristic, we found 307 conflicting mapping pairs between FMA and NCI. Using logic-based modules, we failed to disambiguate only 36 conflicting pairs. Each of these conflicts will certainly lead to the unsatisfiability of a concept in the union of the source ontologies and UMLS-Meta. Thus, semantically, the integration of these ontologies via UMLS-Meta is far from error-free.</p><p>- Using the assumption of disjointness heuristic, we found 1707 conflicts between FMA and NCI. We failed to disambiguate only 202 conflicting pairs.</p><p>Finally, using the principle of locality and a similarity threshold of 1% (resp. 2%), we could identify 12 (resp. 110) "suspicious" mappings between FMA and NCI, 10 (resp. 689) between FMA and SNOMED, and 1420 (resp. 2336) between SNOMED and NCI. This implies that there is a significant number of mappings whose "semantic neighborhood" is not mapped accordingly.</p><p>These results suggest the benefits of the implemented heuristics for the design of normative mapping sets such as UMLS-Meta. In future work, we plan to design new heuristics using the general principles from Section 2 and to seek feedback from domain experts in the conflict disambiguation process.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head></head><label></label><figDesc>Locality Principle: If two concepts $C$ and $C'$ from ontologies $O$ and $O'$ are correctly mapped, then the concepts semantically related to $C$ in $O$ are likely to be mapped to those semantically related to $C'$ in $O'$.</figDesc></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_0">Ernesto Jimenez was supported by the Valencian Government (BFPI06/372). Bernardo Cuenca is supported by a Royal Society University Research Fellowship.</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">The Unified Medical Language System (UMLS): integrating biomedical terminology</title>
		<author>
			<persName><forename type="first">O</forename><surname>Bodenreider</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Nucleic acids research</title>
		<imprint>
			<biblScope unit="volume">32</biblScope>
			<date type="published" when="2004-01">January 2004</date>
		</imprint>
	</monogr>
	<note>Database issue</note>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">R</forename><surname>Aronson</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Proc AMIA Symp</title>
		<imprint>
			<biblScope unit="page" from="17" to="21" />
			<date type="published" when="2001">2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Using WordNet synonym substitution to enhance UMLS source integration</title>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">C</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Geller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Halper</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Perl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Xu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Artif. Intell. Med</title>
		<imprint>
			<biblScope unit="volume">46</biblScope>
			<biblScope unit="issue">2</biblScope>
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Just the right amount: Extracting modules from ontologies</title>
		<author>
			<persName><forename type="first">B</forename><surname>Cuenca Grau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Horrocks</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Kazakov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">U</forename><surname>Sattler</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of WWW</title>
		<meeting>WWW</meeting>
		<imprint>
			<date type="published" when="2007">2007</date>
			<biblScope unit="page" from="717" to="727" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Reasoning Support for Mapping Revision</title>
		<author>
			<persName><forename type="first">C</forename><surname>Meilicke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Stuckenschmidt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Tamilin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Logic and Computation</title>
		<imprint>
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Ontology matching with semantic verification</title>
		<author>
			<persName><forename type="first">Y</forename><forename type="middle">R</forename><surname>Jean-Mary</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">P</forename><surname>Shironoshita</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">R</forename><surname>Kabuka</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Web Semantics</title>
		<imprint>
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Ontology integration using mappings: Towards getting the right logical consequences</title>
		<author>
			<persName><forename type="first">E</forename><surname>Jimenez-Ruiz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Cuenca Grau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Horrocks</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Berlanga</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of ESWC</title>
		<meeting>ESWC</meeting>
		<imprint>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page" from="173" to="187" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
