<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Using Word Semantics on Entity Names for Correspondence Set Generation</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Rafael</forename><surname>Vieira</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Federal University of the State of Rio de Janeiro (UNIRIO)</orgName>
								<address>
									<country key="BR">Brazil</country>
								</address>
							</affiliation>
						</author>
						<author role="corresp">
							<persName><forename type="first">Kate</forename><surname>Revoredo</surname></persName>
							<email>katerevoredo@uniriotec.br</email>
							<affiliation key="aff0">
								<orgName type="institution">Federal University of the State of Rio de Janeiro (UNIRIO)</orgName>
								<address>
									<country key="BR">Brazil</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Using Word Semantics on Entity Names for Correspondence Set Generation</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">E60FFA48116825587293A887CBC1E2D3</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T09:46+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract/>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>In ontology matching, many works use word semantics to align ontologies. One commonly used resource is WordNet <ref type="bibr" target="#b3">[4]</ref> <ref type="bibr" target="#b4">[5]</ref>, which groups together words that share the same meaning. Thesauri and lexicons like WordNet provide rich semantic information but require large amounts of human effort to create and maintain.</p><p>Vector space representations of word semantics are a family of language models that associate words with vectors in a semantic space, where each dimension represents a component of the meaning of words <ref type="bibr" target="#b1">[2]</ref> <ref type="bibr" target="#b0">[1]</ref> <ref type="bibr" target="#b2">[3]</ref>. These methods exploit the semantic similarity of words: words that are close in meaning are assigned vectors that are close in space. The vectors are usually computed by a learning algorithm over large corpora such as Wikipedia and then used to evaluate the similarity between two words.</p><p>In this work, we exploit the word-word similarities of the GloVe model as an external resource for ontology matching. The hypothesis is that two entities can be matched based on the words in their names, using the word-word similarity provided by the model. We built a prototype and evaluated its performance against the baselines from OAEI.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Prototype</head><p>To build the simplest prototype, we used pre-trained vectors<ref type="foot" target="#foot_0">1</ref> from GloVe and two ontologies $O_1$ and $O_2$. Each entity $e$ defined in $O_1$ or $O_2$ is associated with one vector $\vec{v}_e = (a_1, \ldots, a_n)$, based on its name, where each component $a_i$ represents a semantic dimension shared by words with related meanings. If entity $e$ has a compound name, we average the vectors of the words in its name and set the resulting vector as $\vec{v}_e$. To generate a correspondence between two entities $e_1$ and $e_2$, from $O_1$ and $O_2$ respectively, we calculate the cosine similarity between their associated vectors $\vec{v}_1$ and $\vec{v}_2$. If the cosine similarity is above a lower bound, we keep the correspondence; otherwise, it is discarded. This lower bound was empirically set to 0.7, as this value gave the best results.</p><p>After repeating this procedure for all entity pairs, we have the complete alignment. Finally, we compare this alignment with the baseline alignments edna (edit distance based) and StringEquiv (string equivalence based) from OAEI 2016 on the conference and benchmark data sets. The results are presented in Table <ref type="table" target="#tab_0">1</ref>. The prototype obtained low recall on both data sets. The majority of errors on the benchmark data set were on tests with random entity names, resulting in the low recall. This is expected, since our method uses entity names as its only source of information to gather the entity semantics and generate correspondences.</p><p>On the conference data set, the prototype's F1-measure (0.54) fell between those of the two baselines (0.53 for StringEquiv and 0.56 for edna). Many words from entity names were not in the vocabulary of the vectors and were assigned the zero vector $\vec{0}$, which helps explain the moderate recall.</p></div>
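<div xmlns="http://www.tei-c.org/ns/1.0"><p>The matching procedure described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the prototype's actual code: the three-dimensional vectors in TOY_VECTORS are made up and stand in for the pre-trained GloVe vectors, the function names are hypothetical, and the rule used to split compound names into words (camel case and non-alphanumeric separators) is an assumption, since the text does not specify it. Out-of-vocabulary words are mapped to the zero vector and the 0.7 lower bound is used, as described above.</p>

```python
import math
import re

# Made-up 3-dimensional vectors standing in for pre-trained GloVe vectors.
TOY_VECTORS = {
    "conference": [0.9, 0.1, 0.0],
    "meeting": [0.8, 0.2, 0.1],
    "paper": [0.1, 0.9, 0.0],
    "article": [0.2, 0.8, 0.1],
    "author": [0.0, 0.2, 0.9],
}


def split_name(name):
    """Split a name like 'ConferencePaper' or 'conference_paper' into lowercase words.

    The splitting rule is an assumption made for this sketch.
    """
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name)
    return [w.lower() for w in re.split(r"[^A-Za-z0-9]+", spaced) if w]


def entity_vector(name, vectors, dim):
    """Average the word vectors of a name; unknown words count as the zero vector."""
    words = split_name(name)
    total = [0.0] * dim
    for word in words:
        vec = vectors.get(word, [0.0] * dim)
        total = [t + v for t, v in zip(total, vec)]
    count = max(len(words), 1)  # avoid dividing by zero for an empty name
    return [t / count for t in total]


def cosine(u, v):
    """Cosine similarity, defined here as 0.0 when either vector is all zeros."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)


def match(names_1, names_2, vectors, dim, threshold=0.7):
    """Keep every entity pair whose cosine similarity is above the lower bound."""
    alignment = []
    for e1 in names_1:
        v1 = entity_vector(e1, vectors, dim)
        for e2 in names_2:
            sim = cosine(v1, entity_vector(e2, vectors, dim))
            if sim > threshold:
                alignment.append((e1, e2, sim))
    return alignment


alignment = match(
    ["ConferencePaper", "Author"],
    ["meeting_article", "Writer"],
    TOY_VECTORS,
    dim=3,
)
for e1, e2, sim in alignment:
    print(e1, e2, round(sim, 2))
```

<p>With these toy vectors, only the pair ConferencePaper and meeting_article passes the lower bound. The name Writer is not in the toy vocabulary, so it receives the zero vector and cannot be matched with anything, which mirrors the effect of out-of-vocabulary words on recall discussed above.</p></div>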
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Conclusion</head><p>These results are not ground-breaking, but they are promising. Furthermore, given the simplicity of the prototype, there are many places where it can be improved. For example, in a future experiment, we could train our own vectors and fine-tune the hyperparameters of the model. We believe that these improvements may increase performance and lead to further research in the area.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1.</head><label>1</label><figDesc>Comparison between the prototype and the baselines on each data set</figDesc><table><row><cell>Dataset (method)</cell><cell>Precision</cell><cell>Recall</cell><cell>F1-measure</cell></row><row><cell>Conference (edna)</cell><cell>0.74</cell><cell>0.45</cell><cell>0.56</cell></row><row><cell>Conference (StringEquiv)</cell><cell>0.76</cell><cell>0.41</cell><cell>0.53</cell></row><row><cell>Conference (Prototype)</cell><cell>0.71</cell><cell>0.45</cell><cell>0.54</cell></row><row><cell>Benchmark (edna)</cell><cell>0.35</cell><cell>0.51</cell><cell>0.41</cell></row><row><cell>Benchmark (Prototype)</cell><cell>0.72</cell><cell>0.26</cell><cell>0.34</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">Obtained at http://nlp.stanford.edu/data/glove.6B.zip</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">GloVe: Global Vectors for Word Representation</title>
		<author>
			<persName><forename type="first">J</forename><surname>Pennington</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Socher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Empirical Methods in Natural Language Processing (EMNLP)</title>
				<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="1532" to="1543" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Efficient Estimation of Word Representations in Vector Space</title>
		<author>
			<persName><forename type="first">T</forename><surname>Mikolov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Corrado</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Dean</surname></persName>
		</author>
		<idno type="arXiv">abs/1301.3781</idno>
	</analytic>
	<monogr>
		<title level="j">Computing Research Repository (CoRR)</title>
		<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Wikipedia-based Semantic Interpretation for Natural Language Processing</title>
		<author>
			<persName><forename type="first">E</forename><surname>Gabrilovich</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Markovitch</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">J. Artif. Intell. Res</title>
		<imprint>
			<biblScope unit="volume">34</biblScope>
			<biblScope unit="page" from="443" to="498" />
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">A Hybrid Approach for Measuring Semantic Similarity between Ontologies Based on WordNet</title>
		<author>
			<persName><forename type="first">W</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Huang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Knowledge Science, Engineering and Management - 5th International Conference</title>
				<imprint>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="68" to="78" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">A Survey of Exploiting WordNet in Ontology Matching</title>
		<author>
			<persName><forename type="first">F</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Sandkuhl</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Artificial Intelligence in Theory and Practice II</title>
		<imprint>
			<biblScope unit="volume">43</biblScope>
			<biblScope unit="page" from="341" to="350" />
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
