<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Coupling of WordNet Entries for Ontology Mapping using Virtual Documents</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Frederik C. Schadd</string-name>
          <email>frederik.schadd@maastrichtuniversity.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nico Roos</string-name>
          <email>roos@maastrichtuniversity.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Knowledge Engineering, Maastricht University</institution>
          ,
          <addr-line>Maastricht</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Facilitating information exchange is a crucial service for ontology-based knowledge systems. This can be achieved by the mapping of two heterogeneous ontologies. Many mapping frameworks utilize language-based knowledge resources such as WordNet. By coupling all ontology concepts to a corresponding entry in WordNet, one can quantify the lexical relatedness of any two ontology concepts. However, coupling the correct entry is a difficult task due to the ambiguous nature of names. Coupling the wrong entries hence yields similarity values that do not correctly express the relatedness of two given concepts, resulting in a poor performance of the overall mapping framework. This paper proposes an approach for the more accurate coupling of ontology concepts with their corresponding WordNet entries. The basis of the proposed approach is the creation of separate virtual documents representing the different ontology concepts and WordNet entries and coupling these according to their document similarities. The extent of the improvement achieved by this approach is evaluated using a data set originating from the Ontology Alignment Evaluation Initiative (OAEI). Furthermore, the performance of a framework using our approach is demonstrated using the results of the OAEI 2011 competition.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Sharing and reusing knowledge is an important aspect in modern information
systems. For several decades, researchers have been investigating methods that
facilitate knowledge sharing in the corporate domain, allowing for instance the integration
of external data into a company’s own knowledge system. Ontologies are at the
center of this research, allowing the explicit definition of a knowledge domain. With the
steady development of ontology languages, such as the current OWL language [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ],
knowledge domains can be modelled with an increasing amount of detail. Due to the
Semantic Web vision [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], information sources on the future World Wide Web will store
machine readable information, allowing autonomous agents to collect and interpret
information automatically. Just as in current knowledge systems, each information source
on the World Wide Web will store its structured content with a publicly available
ontology describing the semantics of stored information. Such ontologies are generally
developed independently, resulting in many different ontologies describing the same
domain. Thus, agents roaming the Semantic Web need to be able to integrate
knowledge of heterogeneous sources into their own representation of a specific domain.
      </p>
      <p>
        Commonly, ontology mapping tools combine a variety of similarity measures using
advanced aggregation techniques. The application of an extraction technique on the
aggregated similarities can then be used to produce an alignment. The focus of this article
lies on similarity measures that utilize lexical ontologies. More specifically, we
investigate the automatic identification of corresponding entries in these ontologies through
the use of virtual documents and information retrieval techniques, such that the
semantic relatedness of any two ontology entities can be accurately specified. This article
expands on previous research [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] by applying a formal virtual document model and
evaluating the system against state-of-the-art frameworks in the OAEI 2011 campaign.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        Matching heterogeneous ontologies has traditionally been done either manually or using
semi-automatic tools. However, many research groups have focused their attention on
automatic mapping approaches. This has led to the development of ontology mapping
frameworks [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] which all utilize different techniques and resources. Many of these
include lexical ontologies such as WordNet in their matching procedure. Falcon-AO [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]
was the first framework to successfully apply the concept of virtual documents in the
ontology mapping process. Here, virtual documents are created where each document
represents a different ontology concept, such that a similarity matrix can be computed
by applying a document similarity measure on the virtual documents.
      </p>
      <p>
        Budanitsky et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] evaluated five different measures of expressing the semantic
relatedness between WordNet concepts, which subsequently can be applied to approaches
that use different lexical ontologies. Buitelaar et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] proposed a linguistic model as
a labelling system, such that natural language can be generated using the ontology
concepts. Such a model would be useful for situations when the name of a concept has no
matching entry in a lexical ontology, allowing the linguistic decomposition of a name
such that an appropriate lexical entry might still be mapped.
      </p>
      <p>
        The techniques applied in this research are related to the field of Word Sense
Disambiguation, which can be approached using numerous different techniques [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. The
most closely related technique is the Lesk [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] method. The key differences from the
proposed approach are that Lesk is limited to the glossary of the concept, omitting other
information such as labels and the data of related concepts, and that it does not allow for
a weighting of the terms according to a specified document model. The Extended-Lesk
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] method does incorporate glossaries of related concepts, but still lacks the
inclusion of non-glossary information and the weighting of terms according to their
origin within the ontology, both of which the proposed approach provides.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Motivation</title>
      <p>
        Lexical ontologies are useful assets for ontology mapping systems. Established research
primarily focused on developing frameworks or theoretical models which allow
sophisticated reasoning functionalities, provided the ontology concepts are annotated, or
’coupled’, using the framework constructs. However, in order to utilize a lexical
ontology or appropriate framework it is necessary that the given ontologies actually contain
these couplings. Unfortunately, this is rarely the case, meaning that the ontology
concepts need to be coupled during the mapping procedure. This is not a straightforward
task since words can have different meanings, such that looking up the name of a
concept in a lexical ontology can return multiple entries for that name.
Figure 1 indicates the extent of such situations occurring within WordNet [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], by
displaying how frequently each number of candidate concepts occurs for a given
word, using all unique concept labels that occur in WordNet as queries.
      </p>
      <p>From Figure 1 one can see that, while there is a large collection of words that only
have one entry in WordNet, a significant proportion of the data leads to multiple entries.
This issue becomes increasingly prevalent when the concept names do not directly
occur in a lexical ontology, due to the names being composite words or technical terms.
Research is required into methods that can automatically couple ontology concepts to
entries in lexical ontologies for the situation when such couplings are not specified.</p>
    </sec>
    <sec id="sec-4">
      <title>Virtual Documents</title>
      <p>
        We will provide a brief introduction to virtual documents and provide a detailed
description of the creation of a virtual document representing the meaning of an ontology
concept or WordNet entry. The general definition of a virtual document [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] is any
document for which no persistent state exists, such that some or all instances of the given
document are generated at run-time. A simple example would be creating a template
for a document and completing the document using values stored in a database.
      </p>
      <p>
        In this domain the basic data structure used for the creation of a virtual document is a
linked-data model. It consists of different types of binary relations that relate concepts
in order to create an exploitable structure, i.e. a graph. RDF [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] is an example of a
linked-data model, which can be used to denote an ontology according to the OWL
specification [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. The inherent data model of WordNet has similar capacities, however
it stores its data using a database. A key feature of a linked-data model is that it not only
allows the extraction of literal data for a given concept, but also enables the exploration
of concepts that are related to that particular concept, such that the information of these
neighboring concepts can then be included in the virtual document.
      </p>
      <p>
        We will provide a generalized description of the creation of a virtual document
based on established research [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. The generalization has the purpose of providing a
description that is not only applicable to an OWL/RDF ontology like the description
given in Qu et al. [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], but also to the WordNet model. To provide the functions that are
used to create a virtual document, the following terminology is used:
      </p>
      <p>Synset: Basic element within WordNet, used to denote a specific meaning using a list
of synonyms. Synsets are related to other synsets by different semantic relations,
such as hyponymy and holonymy.</p>
      <p>Concept: A named entity in the linked-data model. A concept denotes a named class
or property given an ontology and a synset when referring to WordNet.</p>
      <p>Link: A basic component of a linked-data model for relating elements. A link is
directed, originating from a source and pointing towards a target, such that the type
of the link indicates what relation holds between the two elements. An example of
a link is a triplet in an RDF graph.</p>
      <p>sou(s), type(s), tar(s): The source element, type and target element of a link s,
respectively. Within the RDF model, these three elements of a link are also known as the
subject, predicate and object of a triplet.</p>
      <p>Collection of words: A list of unique words where each word has a corresponding
weight in the form of a rational number.</p>
      <p>+: Operator denoting the merging of two collections of words.</p>
      <p>A concept definition within a linked-data model contains different types of literal
data, such as a name, different labels, annotations and comments. The RDF model
expresses some of these values using the rdfs:label and rdfs:comment relations. Concept
descriptions in WordNet have similar capacities, but the labels of a concept are referred
to as its synonyms and the comments of a concept are linked via the glossary relation.</p>
      <p>Definition 1. Let ω be a concept of a linked-data model. The description of ω is a
collection of words defined by (1):</p>
      <disp-formula>
        <label>(1)</label>
        <tex-math><![CDATA[
\begin{aligned}
Des(\omega) ={} & \alpha_1 \cdot \text{collection of words in the name of } \omega \\
 & + \alpha_2 \cdot \text{collection of words in the labels of } \omega \\
 & + \alpha_3 \cdot \text{collection of words in the comments of } \omega \\
 & + \alpha_4 \cdot \text{collection of words in the annotations of } \omega
\end{aligned}
]]></tex-math>
      </disp-formula>
      <p>Here, α1, α2, α3 and α4 are each rational numbers in [0, 1], such that words can be weighted
according to their origin.</p>
      <p>
        Next to accumulating information that is directly related to a specific concept, one
can also include the descriptions of neighboring concepts that are associated with that
concept via a link. Such a link can be a standard relation that is defined in the linked-data
model, for instance the specialization relation. However, it can also be a relation that is
defined specifically for this ontology, such as an object property in the OWL language.
The OWL language supports the inclusion of blank-node concepts which allow complex
logical expressions to be included in concept definitions. However, since not all
linked-data models support the blank-node functionality, WordNet among them, these are
omitted in our generalization. For more information on how to include blank nodes in
the description, consult the work by Qu et al. [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
      </p>
      <p>To explore neighboring concepts, three neighbor operations are defined. SON(ω)
denotes the set of concepts that occur in any link for which ω is the source of that link.
Likewise TYN(ω) denotes the set of concepts that occur in any link for which ω is
the type of that link and TAN(ω) denotes the set of concepts that occur in any link for
which ω is the target. WordNet contains inverse relations, such as hypernym being the
inverse of the hyponym relation. When faced with two relations which are the inverse
of each other, only one of the two should be used such that descriptions of neighbors
are not included twice in the virtual document. The formal definition of the neighbor
operators is given below.</p>
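      <p>Definition 2. Let ω be a named concept and s be a variable representing an arbitrary
link. The set of source neighbors SON(ω) is defined by (2), the set of type neighbors
TYN(ω) is defined by (3) and the set of target neighbors TAN(ω) is defined by (4).</p>
      <disp-formula>
        <label>(2)</label>
        <tex-math><![CDATA[SON(\omega) = \bigcup_{sou(s) = \omega} \{ type(s), tar(s) \}]]></tex-math>
      </disp-formula>
      <disp-formula>
        <label>(3)</label>
        <tex-math><![CDATA[TYN(\omega) = \bigcup_{type(s) = \omega} \{ sou(s), tar(s) \}]]></tex-math>
      </disp-formula>
      <disp-formula>
        <label>(4)</label>
        <tex-math><![CDATA[TAN(\omega) = \bigcup_{tar(s) = \omega} \{ sou(s), type(s) \}]]></tex-math>
      </disp-formula>
      <p>Given the previous definitions, the definition of a virtual document of a specific concept
can be formulated as follows.</p>
      <p>Definition 3. Let ω be a concept of a linked-data model. The virtual document of ω,
denoted as VD(ω), is defined by (5):</p>
      <disp-formula>
        <label>(5)</label>
        <tex-math><![CDATA[
VD(\omega) = Des(\omega)
 + \beta_1 \cdot \sum_{\omega' \in SON(\omega)} Des(\omega')
 + \beta_2 \cdot \sum_{\omega' \in TYN(\omega)} Des(\omega')
 + \beta_3 \cdot \sum_{\omega' \in TAN(\omega)} Des(\omega')
]]></tex-math>
      </disp-formula>
      <p>Here, β1, β2 and β3 are rational numbers in [0, 1]. This makes it possible to allocate
a different weight to the descriptions of neighboring concepts of ω compared to the
description of the concept ω itself.</p>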
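      <p>To make Definitions 1 to 3 more concrete, the following is a minimal Python sketch of how a virtual document could be assembled. It is an illustration under stated assumptions, not the implementation used in this work: concepts are assumed to be plain dictionaries with optional name, labels, comments and annotations fields, links are assumed to be (source, type, target) triples of concept identifiers, repeated words are assumed to accumulate weight, and all function names are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
# Illustrative sketch of Definitions 1-3. All names and the data layout
# (dict-based concepts, triple-based links) are assumptions for this example.

def word_collection(text, weight=1.0):
    """Turn a string into a collection of words (word -> weight)."""
    collection = {}
    for word in text.lower().split():
        collection[word] = collection.get(word, 0.0) + weight
    return collection


def merge(*collections):
    """The '+' operator: merge collections of words by summing weights."""
    merged = {}
    for collection in collections:
        for word, weight in collection.items():
            merged[word] = merged.get(word, 0.0) + weight
    return merged


def description(concept, alpha=(1.0, 1.0, 1.0, 1.0)):
    """Des(omega): words from name, labels, comments and annotations,
    weighted by alpha_1..alpha_4 respectively."""
    fields = ("name", "labels", "comments", "annotations")
    parts = []
    for field, weight in zip(fields, alpha):
        values = concept.get(field, [])
        if isinstance(values, str):
            values = [values]
        parts.extend(word_collection(value, weight) for value in values)
    return merge(*parts)


def neighbours(omega, links):
    """Return (SON, TYN, TAN) for omega, following equations (2)-(4)."""
    son, tyn, tan = set(), set(), set()
    for sou, typ, tar in links:
        if sou == omega:
            son |= {typ, tar}
        if typ == omega:
            tyn |= {sou, tar}
        if tar == omega:
            tan |= {sou, typ}
    return son, tyn, tan


def virtual_document(omega, concepts, links,
                     alpha=(1.0, 1.0, 1.0, 1.0), beta=(1.0, 1.0, 1.0)):
    """VD(omega): own description plus beta-weighted neighbour descriptions."""
    document = description(concepts[omega], alpha)
    for neighbour_set, weight in zip(neighbours(omega, links), beta):
        for other in neighbour_set:
            if other not in concepts:
                continue  # assumption: neighbours without a description add nothing
            scaled = {word: weight * value
                      for word, value in description(concepts[other], alpha).items()}
            document = merge(document, scaled)
    return document
```
]]></preformat>
      <p>In this sketch a neighbor that has no entry in the concept dictionary, for example a built-in relation type, simply contributes an empty description. Whether such elements should contribute words is a modelling decision that the definitions above leave open.</p>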
    </sec>
    <sec id="sec-5">
      <title>Coupling Synsets</title>
      <p>Our proposed approach aims at improving matchers applying lexical ontologies, in this
case WordNet. When applying WordNet for ontology mapping, one is presented with
the problem of identifying the correct meaning, or synset, for each entity in both
ontologies that are to be matched. The goal of our approach is to automatically identify
the correct synsets for each entity of an ontology using information retrieval techniques.
Given two ontologies O1 and O2 that are to be matched, O1 contains the sets of entities
<inline-formula><tex-math><![CDATA[E_x^1 = \{ e_1^1, e_2^1, \ldots, e_m^1 \}]]></tex-math></inline-formula>, where x distinguishes between the set of classes, properties or
instances, O2 contains the sets of entities <inline-formula><tex-math><![CDATA[E_x^2 = \{ e_1^2, e_2^2, \ldots, e_n^2 \}]]></tex-math></inline-formula>, and C(e) denotes a
collection of synsets representing entity e. The main steps of our approach, performed
separately for classes, properties and instances, can be described as follows:</p>
      <p>1. For every entity e in <inline-formula><tex-math><![CDATA[E_x^i]]></tex-math></inline-formula>, compute its corresponding set C(e) by performing the
following procedure:</p>
      <p>(a) Assemble the set C(e) with synsets that might denote the meaning of entity e.</p>
      <p>(b) Create a virtual document of e, and a virtual document for every synset in C(e).</p>
      <p>(c) Calculate the document similarities between the virtual document denoting e
and the different virtual documents originating from C(e).</p>
      <p>(d) Discard all synsets from C(e) that resulted in a low similarity score with the
virtual document of e, using some selection procedure.</p>
      <p>2. Compute the WordNet similarity for all combinations of <inline-formula><tex-math><![CDATA[e_1 \in E_x^1]]></tex-math></inline-formula> and <inline-formula><tex-math><![CDATA[e_2 \in E_x^2]]></tex-math></inline-formula>
using the processed collections C(e1) and C(e2).</p>
      <p>The essential operation of the approach is the exclusion of synsets from the WordNet
similarity calculation. This exclusion is determined using the document similarities between the
virtual documents originating from the synsets and the virtual document originating
from the ontology concept. Figure 2 illustrates steps 1.b - 2 of our approach for two
arbitrary ontology entities e1 and e2. Once the similarity matrix, meaning all pairwise
similarities between the entities of both ontologies, is computed, the final alignment
of the mapping process can be extracted or the matrix can be combined with other
similarity matrices.</p>
      <sec id="sec-5-1">
        <title>Synset Selection and Virtual Document Similarity</title>
        <p>The initial step of the approach entails the allocation of synsets that might denote the
meaning of a concept. The name of the concept, meaning the fragment of its URI, and
alternate labels, when provided, are used for this purpose. While ideally one would
prefer synsets which contain an exact match of the concept name or label, precautions must
be taken for the eventuality that no exact match can be found. For this research, several
pre-processing methods have been applied such as the removal of special characters,
stop-word removal and tokenization. It is possible to enhance these precautions further
by for instance the application of advanced natural language techniques, however the
investigation of such techniques in this context is beyond the scope of this research.
When faced with ontologies that do not contain concept names using natural language,
for instance by using numeric identifiers instead, and containing no labels, it is unlikely
that any pre-processing technique will be able to reliably identify possible synsets, in
which case a lexical similarity is ill-suited for that particular matching problem.</p>
        <p>
          In the second step, the virtual document model as described in section 4 is applied
to each ontology concept and to each synset that has been gathered in the previous
step. The resulting virtual documents are represented using the well-known vector-space
model [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. In order to compute the similarities between the synset documents and the
concept documents, the established cosine-similarity is applied [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ].
        </p>
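        <p>As an illustration of this step, the sketch below computes the cosine similarity between two virtual documents represented as sparse word vectors. It assumes that each document is a dictionary mapping words to weights; the function name is hypothetical, and any additional term weighting scheme that may be applied on top of the document model is not shown.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import math


def cosine_similarity(doc_a, doc_b):
    """Cosine of the angle between two sparse word-weight vectors.

    doc_a and doc_b are assumed to be dicts mapping word -> weight.
    Returns 0.0 when either document is empty or has zero length.
    """
    shared_words = set(doc_a) & set(doc_b)
    dot_product = sum(doc_a[word] * doc_b[word] for word in shared_words)
    norm_a = math.sqrt(sum(weight * weight for weight in doc_a.values()))
    norm_b = math.sqrt(sum(weight * weight for weight in doc_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot_product / (norm_a * norm_b)
```
]]></preformat>
        <p>Because the measure is normalised by the vector lengths, a long synset glossary and a short concept description can still obtain a high similarity as long as their word distributions point in a similar direction.</p>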
      </sec>
      <sec id="sec-5-2">
        <title>Synset Selection</title>
        <p>Once the similarities between the entity document and the different synset documents
are known, a selection method is applied in order to only couple synsets that resulted in
a high similarity value, while discarding the remaining synsets. It is possible to tackle
this problem from various angles, ranging from very lenient methods, discarding only
the very worst synsets, to strict methods, coupling only the highest scoring synsets.
Several selection methods have been investigated for this research, such that both strict
and lenient methods are tested. To test lenient selection methods, two methods using the
arithmetic mean (A-MEAN) and the geometric mean (G-MEAN) of the similarities as a
threshold have been investigated. Two other methods have been tested in order to
investigate whether a stricter approach is more suitable. The first method, denoted as
M-STD, consists of subtracting the standard deviation of the similarities from the maximum
obtained similarity, and using the resulting value as a threshold. This method has the
interesting property that it is stricter when there is a subset of documents that is
significantly more similar than the remaining documents, and more lenient when it is not
as easy to identify the correct correspondences. The second investigated strict method
(MAX) consists of only coupling the synset whose corresponding virtual document resulted
in the highest similarity value.</p>
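        <p>The four selection methods can be sketched in a few lines of Python. The sketch below is an interpretation of the description above rather than the exact procedure used in the experiments: the function name is hypothetical, synsets scoring at or above the threshold are assumed to be kept, the population standard deviation is assumed for M-STD, and the geometric mean is assumed to be zero when any similarity is zero.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import math
import statistics


def select_synsets(similarities, method="MAX"):
    """Return the ids of the synsets that remain coupled.

    similarities: dict mapping synset id -> document similarity in [0, 1].
    method: one of "NONE", "A-MEAN", "G-MEAN", "M-STD", "MAX".
    """
    if not similarities:
        return set()
    values = list(similarities.values())

    if method == "NONE":
        return set(similarities)  # keep everything (plain WordNet baseline)
    if method == "MAX":
        # Couple only the single best synset (first one wins on ties).
        return {max(similarities, key=similarities.get)}

    if method == "A-MEAN":
        threshold = statistics.mean(values)
    elif method == "G-MEAN":
        # Assumption: a zero similarity forces the geometric mean to zero.
        if min(values) <= 0.0:
            threshold = 0.0
        else:
            threshold = math.exp(statistics.mean(math.log(v) for v in values))
    elif method == "M-STD":
        # Assumption: population standard deviation.
        threshold = max(values) - statistics.pstdev(values)
    else:
        raise ValueError("unknown selection method: " + method)

    return {synset for synset, value in similarities.items() if value >= threshold}
```
]]></preformat>
        <p>Since the geometric mean never exceeds the arithmetic mean, G-MEAN is at least as lenient as A-MEAN in this sketch, which matches the ordering from lenient to strict used in the evaluation.</p>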
      </sec>
      <sec id="sec-5-3">
        <title>WordNet Distance</title>
        <p>After selecting the most appropriate synsets using the document similarities, the
similarity between two entities can now be computed using their assigned synsets. This
presents the problem of determining the similarity between two sets of synsets, where
one can assume that within each of these sets resides one synset that represents the true
meaning of its corresponding entity. Thus, if one were to compare two sets of synsets,
assuming that the originating entities are semantically related, then one can assume that
the resulting similarity between the two synsets that both represent the true meaning of
their corresponding entities, should be a high value. Inspecting all pairwise similarities
between all combinations of synsets between both sets should yield at least one high
similarity value. When comparing two sets originating from semantically unrelated
entities, one can assume that there should be no pairwise similarity of high value present.
A reasonable way of computing the similarity of two sets of synsets is to compute the
maximum similarity over all pairwise combination between the two sets.</p>
        <p>
          There exist several ways to compute the semantic similarity within WordNet [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] that
can be applied, however finding the optimal measure is beyond the scope of this paper.
Here, a similarity measure with similar properties as the Leacock-Chodorow similarity
[
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] has been applied. The similarity sim(s1, s2) of two synsets s1 and s2 is computed
using the distance function dist(s1, s2), which determines the distance of two synsets
inside the taxonomy, and the overall depth D of the taxonomy:
        </p>
        <disp-formula>
          <label>(6)</label>
          <tex-math><![CDATA[
sim(s_1, s_2) =
\begin{cases}
\dfrac{D - dist(s_1, s_2)}{D} & \text{if } dist(s_1, s_2) \leq D \\
0 & \text{otherwise}
\end{cases}
]]></tex-math>
        </disp-formula>
        <p>This measure is similar to the Leacock-Chodorow similarity in that it relates the
taxonomic distance of two synsets to the depth of the taxonomy. In order to ensure that
the resulting similarity values fall within the interval of [0, 1] and thus can be integrated
into larger mapping systems, the log-scaling has been omitted in favor of a linear scale.</p>
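        <p>The linear similarity of equation (6) and the maximum over all pairwise synset combinations can be sketched as follows. The function names are hypothetical, and the taxonomic distance is passed in as a callable because the way distances are obtained from WordNet is not specified here.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def synset_similarity(distance, depth):
    """Equation (6): linear similarity from taxonomic distance and depth D."""
    if distance <= depth:
        return (depth - distance) / depth
    return 0.0


def entity_similarity(synsets_a, synsets_b, dist, depth):
    """Similarity of two entities as the maximum pairwise synset similarity.

    synsets_a, synsets_b: the synsets coupled to each entity.
    dist: assumed callable returning the taxonomic distance of two synsets.
    depth: overall depth D of the taxonomy.
    Returns 0.0 if either entity has no coupled synsets.
    """
    return max(
        (synset_similarity(dist(a, b), depth)
         for a in synsets_a for b in synsets_b),
        default=0.0,
    )
```
]]></preformat>
        <p>Taking the maximum reflects the assumption stated above that each set contains one synset expressing the true meaning, so a single strong pairwise match is enough to indicate that two entities are related.</p>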
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Experiments</title>
      <p>
        In this section, the experiments that have been performed to test the effectiveness of our
approach will be presented. Subsection 6.1 details an evaluation on the conference data
set, originating from the Ontology Alignment Evaluation Initiative 2010 (OAEI 2010)
competition [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], which demonstrates to what extent our synset coupling method can improve
a framework using WordNet. Subsections 6.2 and 6.3 compare our matcher, referred
to as MaasMatch (MM), against existing frameworks using the results from the OAEI
2011 campaign [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] in which MaasMatch participated. For this research, the weighting
parameters for the virtual documents were all given the equal value of 1 such that the
vectors resemble document vectors originating from human-written documents, since a
sensitivity analysis of these parameters is beyond the scope of this article and will be
addressed in future research. The WordNet similarity matrix is combined with the
similarity matrix stemming from the Jaro [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] string similarity using the average similarity
of each pairwise combination, upon which the Naive descending extraction algorithm
[
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] is applied to generate a temporary mapping. For the experiment in subsection 6.1
a threshold of 0.7 is used, whereas for the OAEI 2011 evaluation a threshold of 0.95 has
been applied. MaasMatch can be downloaded from the SEALS-platform, which can be
accessed at http://www.seals-project.eu/ .
      </p>
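      <p>The combination and extraction step described above can be illustrated with a short sketch. It assumes that similarity matrices are dictionaries keyed by entity pairs, and it implements a greedy one-to-one extraction in the spirit of naive descending extraction: the highest-scoring pair is selected repeatedly and all pairs sharing an entity with it are dropped. The exact behaviour of the algorithm in [12] may differ in details, and the function names are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def average_matrices(matrix_a, matrix_b):
    """Combine two similarity matrices by averaging each pairwise value.

    Both matrices are assumed to be dicts keyed by (entity1, entity2)
    and to cover the same entity pairs.
    """
    return {pair: (matrix_a[pair] + matrix_b[pair]) / 2.0 for pair in matrix_a}


def greedy_descending_extraction(matrix, threshold):
    """Greedily extract a one-to-one alignment from a similarity matrix.

    Pairs are visited from highest to lowest similarity; a pair is accepted
    if it reaches the threshold and neither entity is already matched.
    """
    alignment = []
    used_left, used_right = set(), set()
    ranked = sorted(matrix.items(), key=lambda item: item[1], reverse=True)
    for (left, right), similarity in ranked:
        if similarity < threshold:
            break  # remaining pairs are below the threshold
        if left in used_left or right in used_right:
            continue
        alignment.append((left, right, similarity))
        used_left.add(left)
        used_right.add(right)
    return alignment
```
]]></preformat>
      <p>With a high threshold such as 0.95, a sketch like this returns few but confident correspondences, which is consistent with a precision-oriented configuration.</p>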
      <p>
        When evaluating the performance of an ontology mapping procedure, the most
common practice is to compare a generated alignment with a reference alignment of the
same data set. Measures such as precision and recall [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] can then be computed to
express the correctness and completeness of the computed alignment. Given a generated
alignment A and reference alignment R, the precision P(A, R) and recall R(A, R) of
the generated alignment A are defined as:
      </p>
      <disp-formula>
        <label>(7)</label>
        <tex-math><![CDATA[P(A, R) = \frac{|R \cap A|}{|A|}]]></tex-math>
      </disp-formula>
      <disp-formula>
        <label>(8)</label>
        <tex-math><![CDATA[R(A, R) = \frac{|R \cap A|}{|R|}]]></tex-math>
      </disp-formula>
      <p>
        Given the precision and recall of an alignment, a common measure to express the
overall quality of the alignment is the F-measure [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Given a generated alignment A
and a reference alignment R, the F-measure can be computed as follows:
      </p>
      <disp-formula>
        <label>(9)</label>
        <tex-math><![CDATA[\text{F-measure} = \frac{2 \cdot P(A, R) \cdot R(A, R)}{P(A, R) + R(A, R)}]]></tex-math>
      </disp-formula>
      <p>The F-measure is the harmonic mean of precision and recall. Given that these
measurements require a reference alignment, they are often inconvenient for large-scale
evaluations, since reference alignments require a considerable amount of effort to create.
The data sets used here, however, do feature reference alignments, such that the performance
of a mapping approach can easily be computed and compared.</p>
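      <p>As a small worked illustration of equations (7) to (9), the following sketch computes precision, recall and F-measure for alignments represented as sets of entity pairs. The set-based representation and the function names are assumptions made for this example; empty alignments are assumed to score zero.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def precision(alignment, reference):
    """Equation (7): fraction of generated correspondences that are correct."""
    if not alignment:
        return 0.0
    return len(alignment & reference) / len(alignment)


def recall(alignment, reference):
    """Equation (8): fraction of reference correspondences that were found."""
    if not reference:
        return 0.0
    return len(alignment & reference) / len(reference)


def f_measure(alignment, reference):
    """Equation (9): harmonic mean of precision and recall."""
    p = precision(alignment, reference)
    r = recall(alignment, reference)
    if p + r == 0.0:
        return 0.0
    return 2.0 * p * r / (p + r)
```
]]></preformat>
      <p>For example, an alignment with three correspondences of which two appear in a reference alignment of four correspondences has a precision of 2/3, a recall of 1/2 and an F-measure of 4/7.</p>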
      <sec id="sec-6-1">
        <title>Synset Coupling</title>
        <p>To investigate to what extent our approach improves a framework using a WordNet
similarity, we evaluated our framework using different variations of our approach on the
conference data set of the OAEI 2010 competition. This data set consists of real-world
ontologies describing the conference domain and contains a reference alignment for
each possible combination of ontologies from this data set. Figure 3 displays the results
of our approach on the conference data set. Each entry in Figure 3 denotes a different
synset selection procedure, which are arranged according to their strictness, such that
the most lenient method is located on the far left side and the most strict method is
located on the far right. Note that the most lenient method, denoted as ’none’, does not
discard any synsets based on their document similarities, resulting in the equivalent of
a conventional WordNet similarity, which can be used as a basis for comparison. From
Figure 3 we can see two notable trends. First and foremost is the observation that the
more strict the synset selection procedure is, the higher the overall performance of the
matcher is, as indicated by the F-measure. This is solely due to a steady increase of
the precision of the alignments. Secondly, it is notable that the recall of the alignments
decreases slightly upon increasing the strictness of the selection procedure. This can be
explained by the possibility that during the selection synsets are discarded that better
denote the meaning of a given concept than its similarity value indicates.</p>
        <p>Overall, the highest performing variation of our coupling technique achieved an
F-measure of 0.44, which is an increase of 0.11 when compared to our framework without a
selective coupling method. These results indicate that our coupling technique improves
the computed WordNet similarities to such an extent that the computed alignments
exhibit a significant increase in quality, mostly with regard to their precision.</p>
      </sec>
      <sec id="sec-6-2">
        <title>OAEI 2011: Conference Dataset</title>
        <p>Using the best performing synset selection method, as determined in subsection 6.1,
our framework has been evaluated in the OAEI 2011 competition. The results of the
evaluation on the conference data set can be seen in Figure 4. From Figure 4 one can
see that MaasMatch achieved a high precision and moderate recall over the conference
data set, resulting in the fifth-highest F-measure among the participants, which is above
average. A noteworthy aspect of this result is that it has been achieved by only
applying lexical similarities, which are better suited to resolving naming conflicts than
other types of conflicts. This in turn also explains the moderate recall value, since it
would require a larger, and more importantly a more varied set of similarity values, to
deal with the remaining types of heterogeneities as well. Hence, it is encouraging to see
these good results when taking into account the moderate complexity of the framework.</p>
      </sec>
      <sec id="sec-6-3">
        <title>OAEI 2011: Benchmark Dataset</title>
        <p>The benchmark data set is a synthetic data set, where a reference ontology is matched
with many systematic variations of itself. These variations include many aspects, such
as introducing errors or randomizing names, omitting certain types of information or
altering the structure of the ontology. Since a base ontology is compared to variations
of itself, this data set does not contain a large quantity of naming conflicts, which our
approach is targeted at. However, it is interesting to see how our framework performs
when faced with every kind of heterogeneity. Figure 5 displays the results of the OAEI
2011 evaluation on the benchmark data set.</p>
        <p>From Figure 5 we can see that the overall performance MaasMatch resulted in a
high precision score and relatively low recall score when compared to the competitors.
The low recall score can be explained by the fact that the WordNet similarity of our
approach relies on collecting synsets using information stored in the names of the
ontology concepts. The data set regularly contains ontologies with altered or scrambled
names, making it extremely difficult to couple synsets that might denote the meaning of
an entity. These alterations also have a negative impact on the quality of the constructed
virtual documents, especially if names or annotations are scrambled or completely left
out, resulting in MaasMatch performing poorly in benchmarks that contain such
alterations. Despite these drawbacks, it was possible to achieve results similar to established
matchers that address all types of heterogeneities. Given these results, the performance
can be improved if measures are added which tackle other types of heterogeneities,
especially if such measures increase the recall without impacting the precision.
</p>
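        <p>The dependence on concept names can be illustrated with a minimal sketch. It is not the MaasMatch implementation: the toy lexicon, the synset labels, the helper functions and the example names are all hypothetical and merely stand in for WordNet and the benchmark ontologies. Under these assumptions, the sketch shows that no candidate synsets can be collected once names are scrambled, so a name-based WordNet similarity has nothing to contribute for those concepts.</p>
        <preformat><![CDATA[
```python
import re

# Hypothetical miniature lexicon standing in for WordNet: lemma -> synset labels.
TOY_LEXICON = {
    "paper": ["paper.n.01", "paper.n.05"],
    "author": ["author.n.01"],
    "review": ["review.n.01", "review.n.02"],
}

# Illustrative concept names: originals and a scrambled variant of each.
ORIGINAL_NAMES = ["Paper", "PaperAuthor", "Review"]
SCRAMBLED_NAMES = ["xkqzv", "PbnmTrwq", "hjgfd"]


def tokenize(name):
    """Split a name such as 'PaperAuthor' or 'paper_author' into lower-case tokens."""
    parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", name)
    return [part.lower() for part in parts]


def candidate_synsets(name, lexicon=TOY_LEXICON):
    """Collect every synset whose lemma equals one of the tokens of the name."""
    found = []
    for token in tokenize(name):
        found.extend(lexicon.get(token, []))
    return found


def coverage(names):
    """Share of concept names for which at least one candidate synset exists."""
    coupled = [name for name in names if candidate_synsets(name)]
    return len(coupled) / len(names)


print(coverage(ORIGINAL_NAMES))   # 1.0
print(coverage(SCRAMBLED_NAMES))  # 0.0
```
]]></preformat>
        <p>In this toy setting every original name yields candidates, whereas none of the scrambled names does. Concepts without candidates cannot be coupled, which is consistent with a loss in recall while the precision of the remaining correspondences is unaffected.</p>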
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Conclusion</title>
      <p>In this paper, we proposed a method to improve the coupling of ontology concepts with
their corresponding WordNet entries. The experiment on the OAEI 2010 data set shows
that our approach increases the quality of the computed alignments, mainly with regard
to their precision. Furthermore, the experiment shows that strict coupling methods produce
better results than lenient coupling methods. The results of the OAEI 2011 evaluation
show that a framework using the proposed technique can compete with established
frameworks, especially on the conference data set. However, the results on
the benchmark data set indicate a reliance on the presence of adequate concept names
and descriptions. Future research could focus on improving the robustness of our
approach when given distorted names and descriptions.</p>
      <p>The recall of the alignments decreases slightly when our approach is applied, indicating
that the correct meaning of an entity is occasionally not established. A possible solution
is to improve the representative strength of the virtual documents. This
can be achieved by refining the current virtual document model, for instance by assigning
different weights to descriptions that originate from different types of OWL relations.</p>
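      <p>One possible reading of such a refinement is sketched below. It is an illustration under stated assumptions rather than the proposed model: the description sources, their weights, the candidate labels and the example texts are hypothetical. Each description contributes its terms to a term vector scaled by the weight of its source, and a concept is coupled to the candidate whose vector has the highest cosine similarity.</p>
      <preformat><![CDATA[
```python
import math
from collections import Counter

# Hypothetical weights per description source; the values are illustrative only.
SOURCE_WEIGHTS = {"name": 1.0, "comment": 0.5, "parent": 0.3}


def virtual_document(descriptions, weights=SOURCE_WEIGHTS):
    """Turn {source: text} descriptions into a weighted term vector."""
    vector = Counter()
    for source, text in descriptions.items():
        weight = weights.get(source, 0.0)  # unknown sources contribute nothing
        for term in text.lower().split():
            vector[term] += weight
    return vector


def cosine(a, b):
    """Cosine similarity of two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def couple(concept, candidates):
    """Return the label of the candidate most similar to the concept."""
    concept_doc = virtual_document(concept)
    return max(
        candidates,
        key=lambda label: cosine(concept_doc, virtual_document(candidates[label])),
    )


# Hypothetical ontology concept and two hypothetical candidate entries.
concept = {
    "name": "paper",
    "comment": "scientific article submitted to conference",
    "parent": "document",
}
candidates = {
    "paper#material": {"name": "paper", "comment": "material made of cellulose pulp"},
    "paper#article": {"name": "paper", "comment": "scholarly article describing research results"},
}

best = couple(concept, candidates)
print(best)  # paper#article
```
]]></preformat>
      <p>In this example both candidates share the name term, so the decision rests on the weighted comment terms. How the weights should be chosen for real ontologies is left open here and would have to be determined experimentally.</p>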
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Banerjee</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Pedersen</surname>
          </string-name>
          .
          <article-title>Extended gloss overlaps as a measure of semantic relatedness</article-title>
          .
          <source>In Proceedings of the 18th international joint conference on Artificial intelligence, IJCAI'03</source>
          , pages
          <fpage>805</fpage>
          -
          <lpage>810</lpage>
          , San Francisco, CA, USA,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>T.</given-names>
            <surname>Berners-Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hendler</surname>
          </string-name>
          , and
          <string-name>
            <given-names>O.</given-names>
            <surname>Lassila</surname>
          </string-name>
          .
          <article-title>The semantic web</article-title>
          .
          <source>Scientific American</source>
          ,
          <volume>284</volume>
          (
          <issue>5</issue>
          ):
          <fpage>34</fpage>
          -
          <lpage>43</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Budanitsky</surname>
          </string-name>
          and
          <string-name>
            <given-names>G.</given-names>
            <surname>Hirst</surname>
          </string-name>
          .
          <article-title>Semantic distance in WordNet: An experimental, application-oriented evaluation of five measures</article-title>
          .
          <source>In Workshop on WordNet and other lexical resources, second meeting of the NAACL</source>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>P.</given-names>
            <surname>Buitelaar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Cimiano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Haase</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Sintek</surname>
          </string-name>
          .
          <article-title>Towards linguistically grounded ontologies</article-title>
          .
          <source>In The Semantic Web: Research and Applications</source>
          , volume
          <volume>5554</volume>
          of Lecture Notes in Computer Science, pages
          <fpage>111</fpage>
          -
          <lpage>125</lpage>
          . Springer Berlin / Heidelberg,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Euzenat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ferrara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Meilicke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Pane</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Scharffe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Shvaiko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Stuckenschmidt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Svab-Zamazal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Svatek</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Trojahn</surname>
          </string-name>
          .
          <article-title>First results of the ontology alignment evaluation initiative 2010</article-title>
          .
          <source>In Proceedings of ISWC Workshop on OM</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Euzenat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ferrara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. R.</given-names>
            <surname>van Hage</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Hollink</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Meilicke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nikolov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Scharffe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Shvaiko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Stuckenschmidt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Svab-Zamazal</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Trojahn dos Santos</surname>
          </string-name>
          .
          <article-title>Results of the ontology alignment evaluation initiative 2011</article-title>
          .
          <source>In Proc. 6th ISWC workshop on ontology matching (OM)</source>
          ,
          Bonn (DE),
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>F.</given-names>
            <surname>Giunchiglia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Yatskevich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Avesani</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Shvaiko</surname>
          </string-name>
          .
          <article-title>A large dataset for the evaluation of ontology matching</article-title>
          .
          <source>Knowl. Eng. Rev.</source>
          ,
          <volume>24</volume>
          :
          <fpage>137</fpage>
          -
          <lpage>157</lpage>
          ,
          June
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Jaro</surname>
          </string-name>
          .
          <article-title>Advances in record-linkage methodology as applied to matching the 1985 census of Tampa, Florida</article-title>
          .
          <source>J. of the American Statistical Association</source>
          ,
          <volume>84</volume>
          (
          <issue>406</issue>
          ):
          <fpage>414</fpage>
          -
          <lpage>420</lpage>
          ,
          <year>1989</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>O.</given-names>
            <surname>Lassila</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. R.</given-names>
            <surname>Swick</surname>
          </string-name>
          , and W3C.
          <article-title>Resource description framework (RDF) model and syntax specification</article-title>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Lesk</surname>
          </string-name>
          .
          <article-title>Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone</article-title>
          .
          <source>In Proceedings of the 5th annual international conference on Systems documentation, SIGDOC '86</source>
          , pages
          <fpage>24</fpage>
          -
          <lpage>26</lpage>
          ,
          <year>1986</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>D. L.</given-names>
            <surname>McGuinness</surname>
          </string-name>
          and
          <string-name>
            <given-names>F.</given-names>
            <surname>van Harmelen</surname>
          </string-name>
          .
          <article-title>OWL web ontology language overview</article-title>
          .
          <source>W3C recommendation, W3C</source>
          ,
          February
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>C.</given-names>
            <surname>Meilicke</surname>
          </string-name>
          and
          <string-name>
            <given-names>H.</given-names>
            <surname>Stuckenschmidt</surname>
          </string-name>
          .
          <article-title>Analyzing mapping extraction approaches</article-title>
          .
          <source>The Second International Workshop on Ontology Matching</source>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>G. A.</given-names>
            <surname>Miller</surname>
          </string-name>
          .
          <article-title>WordNet: a lexical database for English</article-title>
          .
          <source>Communications of the ACM</source>
          ,
          <volume>38</volume>
          :
          <fpage>39</fpage>
          -
          <lpage>41</lpage>
          ,
          November
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>R.</given-names>
            <surname>Navigli</surname>
          </string-name>
          .
          <article-title>Word sense disambiguation: A survey</article-title>
          .
          <source>ACM Comput. Surv.</source>
          ,
          <volume>41</volume>
          (
          <issue>2</issue>
          ):
          <fpage>10:1</fpage>
          -
          <lpage>10:69</lpage>
          ,
          February
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Qu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Hu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Cheng</surname>
          </string-name>
          .
          <article-title>Constructing virtual documents for ontology matching</article-title>
          .
          <source>In Proceedings of the 15th international conference on World Wide Web, WWW '06</source>
          , pages
          <fpage>23</fpage>
          -
          <lpage>31</lpage>
          , New York, NY, USA,
          <year>2006</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>G.</given-names>
            <surname>Salton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Wong</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.S.</given-names>
            <surname>Yang</surname>
          </string-name>
          .
          <article-title>A vector space model for automatic indexing</article-title>
          .
          <source>Commun. ACM</source>
          ,
          <volume>18</volume>
          :
          <fpage>613</fpage>
          -
          <lpage>620</lpage>
          ,
          November
          <year>1975</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>F. C.</given-names>
            <surname>Schadd</surname>
          </string-name>
          and
          <string-name>
            <given-names>N.</given-names>
            <surname>Roos</surname>
          </string-name>
          .
          <article-title>Improving ontology matchers utilizing linguistic ontologies: an information retrieval approach</article-title>
          .
          <source>In Proceedings of the BNAIC 2011</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>P.</given-names>
            <surname>Shvaiko</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Euzenat</surname>
          </string-name>
          .
          <article-title>A survey of schema-based matching approaches</article-title>
          .
          <source>In Journal on Data Semantics IV</source>
          , volume
          <volume>3730</volume>
          , pages
          <fpage>146</fpage>
          -
          <lpage>171</lpage>
          .
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>P.-N.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Steinbach</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V.</given-names>
            <surname>Kumar</surname>
          </string-name>
          .
          <source>Introduction to Data Mining</source>
          .
          <publisher-name>Addison Wesley</publisher-name>
          ,
          <edition>1</edition>
          edition
          , May
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>C.</given-names>
            <surname>Watters</surname>
          </string-name>
          .
          <article-title>Information retrieval and the virtual document</article-title>
          .
          <source>J. Am. Soc. Inf. Sci.</source>
          ,
          <volume>50</volume>
          :
          <fpage>1028</fpage>
          -
          <lpage>1029</lpage>
          ,
          September
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>