<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Partitioning and Matching Tuning of Large Biomedical Ontologies</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Amir</forename><surname>Laadhar</surname></persName>
							<affiliation key="aff0">
								<orgName type="laboratory">IRIT (CNRS/UMR 5505)</orgName>
								<orgName type="institution" key="instit1">Toulouse University</orgName>
								<address>
									<settlement>Toulouse</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Faiza</forename><surname>Ghozzi</surname></persName>
							<email>faiza.ghozzi@isims.usf.tn</email>
							<affiliation key="aff1">
								<orgName type="institution" key="instit1">University of Sfax</orgName>
								<orgName type="institution" key="instit2">MIRACL</orgName>
								<address>
									<settlement>Sfax</settlement>
									<country key="TN">Tunisia</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Ryutaro</forename><surname>Ichise</surname></persName>
							<email>ichise@nii.ac.jp</email>
							<affiliation key="aff2">
								<orgName type="institution">National Institute of Informatics</orgName>
								<address>
									<settlement>Tokyo</settlement>
									<country key="JP">Japan</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Imen</forename><surname>Megdiche</surname></persName>
							<affiliation key="aff0">
								<orgName type="laboratory">IRIT (CNRS/UMR 5505)</orgName>
								<orgName type="institution" key="instit1">Toulouse University</orgName>
								<address>
									<settlement>Toulouse</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Franck</forename><surname>Ravat</surname></persName>
							<affiliation key="aff0">
								<orgName type="laboratory">IRIT (CNRS/UMR 5505)</orgName>
								<orgName type="institution" key="instit1">Toulouse University</orgName>
								<address>
									<settlement>Toulouse</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Olivier</forename><surname>Teste</surname></persName>
							<affiliation key="aff0">
								<orgName type="laboratory">IRIT (CNRS/UMR 5505)</orgName>
								<orgName type="institution" key="instit1">Toulouse University</orgName>
								<address>
									<settlement>Toulouse</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Partitioning and Matching Tuning of Large Biomedical Ontologies</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">87FB7B5A6F5076063C92247B5C7D2234</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T02:25+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract/>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Large biomedical ontologies such as SNOMED CT, NCI, and FMA are widely used in the biomedical domain. These complex ontologies are built on diverse modelling views and vocabularies. We define an approach that breaks up a large ontology alignment problem into a set of smaller matching tasks. We couple this approach with an automated tuning process that derives adequate thresholds for the available similarity measure for any biomedical matching task. Experiments on the OAEI 2017 Anatomy track show that coupling ontology partitioning with threshold tuning achieves a higher F-measure and recall than existing partitioning approaches.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Partitioning and Matching Tuning of Biomedical Ontologies</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Architecture overview</head><p>Figure <ref type="figure" target="#fig_0">1</ref> depicts the stages of ontology partitioning and threshold tuning. These stages are detailed in the following sections.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Ontologies Partitioning</head><p>We employ hierarchical agglomerative clustering to divide an ontology into a set of partitions. This method uses Equation 1 to compute the structural similarity between the entities of each input ontology. The equation is inspired by the Wu and Palmer <ref type="bibr" target="#b3">[4]</ref> similarity measure. Partitioning an ontology yields a dendrogram, which we cut automatically to obtain a set of partitions. We examine the output of all possible cuts until we find the first cut that does not produce any isolated partition, that is, a partition containing only one entity. We identify similar partition pairs through the set of exact matchings between the input ontologies.</p><formula>\mathrm{StrcSim}(e_{i,m}, e_{i,n}) = \frac{\mathrm{Dist}(r_i, lca) \times 2}{\mathrm{Dist}(e_{i,m}, lca) + \mathrm{Dist}(e_{i,n}, lca) + \mathrm{Dist}(r_i, lca) \times 2} \quad (1)</formula><p>Here, r_i denotes the root of ontology i, lca the lowest common ancestor of the entities e_{i,m} and e_{i,n}, and Dist the distance between two entities in the hierarchy.</p></div>
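<div xmlns="http://www.tei-c.org/ns/1.0"><p>As an illustration of Equation 1, the following minimal Python sketch computes StrcSim on a toy is-a hierarchy. It assumes that Dist counts edges, that lca is the lowest common ancestor of the two entities, and that every entity has a single parent. The hierarchy, the PARENT mapping, and the function names are hypothetical and not part of the described system.</p>

```python
# Toy is-a hierarchy (hypothetical data): maps each child to its parent.
# "root" plays the role of r_i in Equation 1.
PARENT = {
    "organ": "root",
    "tissue": "root",
    "heart": "organ",
    "lung": "organ",
    "left_ventricle": "heart",
}


def path_to_root(entity, parent):
    """Return the list [entity, parent(entity), ..., root]."""
    path = [entity]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


def strc_sim(e_m, e_n, parent):
    """Structural similarity of Equation 1, with Dist counted in edges."""
    path_m = path_to_root(e_m, parent)
    path_n = path_to_root(e_n, parent)
    ancestors_n = set(path_n)
    # The first node on e_m's upward path that also lies above e_n is the lca.
    lca = next(node for node in path_m if node in ancestors_n)
    dist_m = path_m.index(lca)              # Dist(e_m, lca)
    dist_n = path_n.index(lca)              # Dist(e_n, lca)
    dist_root = len(path_m) - 1 - dist_m    # Dist(r_i, lca)
    denominator = dist_m + dist_n + 2 * dist_root
    # Assumption: comparing the root with itself (zero denominator) yields 1.
    if denominator == 0:
        return 1.0
    return 2 * dist_root / denominator
```

<p>In this toy hierarchy, heart and lung share the parent organ, which lies one edge below the root, so StrcSim = 2 / (1 + 1 + 2) = 0.5. Two entities whose only common ancestor is the root, such as heart and tissue, obtain a similarity of 0.</p></div>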
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3">Threshold tuning</head><p>The available external knowledge sources serve as mediating biomedical ontologies between the two input ontologies. We cross-search the input ontologies and the mediating ontology to find synthetic reference alignments. We then compute the similarity score Sim between the annotations of each generated alignment. These scores form the set simScore = {sim_1, ..., sim_n}. The threshold Th is derived from simScore using Equation <ref type="formula">2</ref>:</p><formula xml:id="formula_0">Th = \frac{\sum_{i=1}^{n} sim_i}{|simScore|} \quad (2)</formula></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Experiments</head><p>In Table <ref type="table" target="#tab_0">1</ref>, we compare our proposed partitioning approach with the currently available partitioning strategies using two OAEI 2017 biomedical data sets: the Anatomy task and the LargeBio small segments tasks. We use UBERON as the external biomedical knowledge source for deriving synthetic reference alignments, and the ISUB similarity measure to compute the similarity scores between the derived mappings. In Table <ref type="table" target="#tab_1">2</ref>, we report the accuracy of the partitioning approach with the derived thresholds.</p></div>
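<div xmlns="http://www.tei-c.org/ns/1.0"><p>Read as the arithmetic mean of the similarity scores, the threshold derivation of Equation 2 can be sketched in a few lines of Python. The function names and the example scores are hypothetical. The filtering step is an assumption about how a derived threshold would typically be applied to candidate mappings, not a description of the authors' implementation.</p>

```python
def derive_threshold(sim_scores):
    """Equation 2: mean of the similarity scores sim_1 .. sim_n."""
    if not sim_scores:
        raise ValueError("at least one synthetic reference alignment is needed")
    return sum(sim_scores) / len(sim_scores)


def filter_candidates(candidates, threshold):
    """Assumed usage: keep candidate mappings scoring at least the threshold.

    `candidates` is a list of ((source_entity, target_entity), score) pairs.
    """
    return [pair for pair, score in candidates if score >= threshold]


# Hypothetical similarity scores of three synthetic reference alignments.
example_scores = [1.0, 0.75, 0.5]
example_threshold = derive_threshold(example_scores)
```

<p>With the three example scores, the derived threshold is 2.25 / 3 = 0.75, so a candidate mapping scored 0.8 would be kept and one scored 0.7 would be discarded.</p></div>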
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Conclusion and Future Work</head><p>As future work, we intend to automate the entire matching tuning process while addressing the different types of heterogeneity across the partition pairs.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1.</head><label>1</label><figDesc>Architecture overview</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1.</head><label>1</label><figDesc>Anatomy track partitioning results</figDesc><table><row><cell></cell><cell>Precision</cell><cell>F-Measure</cell><cell>Recall</cell><cell>Number of partitions</cell></row><row><cell>Proposed approach</cell><cell>0.945</cell><cell>0.883</cell><cell>0.829</cell><cell>57/57</cell></row><row><cell>SeeCOnt [3]</cell><cell>0.951</cell><cell>0.863</cell><cell>0.789</cell><cell>ND</cell></row><row><cell>Falcon [2]</cell><cell>0.964</cell><cell>0.730</cell><cell>0.591</cell><cell>139/119</cell></row><row><cell>Algergawy et al. [1]</cell><cell>0.975</cell><cell>0.753</cell><cell>0.613</cell><cell>84/80</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2.</head><label>2</label><figDesc>Accuracy and derived thresholds for Anatomy and LargeBio tracks</figDesc><table><row><cell></cell><cell>Precision</cell><cell>F-Measure</cell><cell>Recall</cell><cell>Derived Threshold</cell></row><row><cell>Anatomy</cell><cell>0.945</cell><cell>0.883</cell><cell>0.829</cell><cell>0.91</cell></row><row><cell>FMA-NCI</cell><cell>0.957</cell><cell>0.870</cell><cell>0.789</cell><cell>0.69</cell></row><row><cell>FMA-SNOMED</cell><cell>0.860</cell><cell>0.674</cell><cell>0.554</cell><cell>0.75</cell></row><row><cell>SNOMED-NCI</cell><cell>0.911</cell><cell>0.697</cell><cell>0.564</cell><cell>0.85</cell></row></table></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">A clustering-based approach for large-scale ontology matching</title>
		<author>
			<persName><forename type="first">Alsayed</forename><surname>Algergawy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sabine</forename><surname>Massmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Erhard</forename><surname>Rahm</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">East European Conference on Advances in Databases and Information Systems</title>
				<meeting><address><addrLine>Berlin, Heidelberg</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Matching large ontologies: A divide-and-conquer approach</title>
		<author>
			<persName><forename type="first">Wei</forename><surname>Hu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yuzhong</forename><surname>Qu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Gong</forename><surname>Cheng</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Data &amp; Knowledge Engineering</title>
		<imprint>
			<biblScope unit="volume">67</biblScope>
			<biblScope unit="issue">1</biblScope>
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">SeeCOnt: A new seeding-based clustering approach for ontology matching</title>
		<author>
			<persName><forename type="first">Alsayed</forename><surname>Algergawy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Samira</forename><surname>Babalou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Mohammad</forename><forename type="middle">J</forename><surname>Kargar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">Hashem</forename><surname>Davarpanah</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">East European Conference on Advances in Databases and Information Systems</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Verbs semantics and lexical selection</title>
		<author>
			<persName><forename type="first">Zhibiao</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Martha</forename><surname>Palmer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 32nd annual meeting on Association for Computational Linguistics</title>
				<meeting>the 32nd annual meeting on Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="1994">1994</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
