<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards Constructing Linguistic Ontologies:</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Milano-Bicocca</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Cross-language ontology matching methods leverage a translation-based approach to map ontological resources (e.g., concepts) lexicalized in different languages. Such methods depend largely on the structural information encoded in the matched ontologies. This paper presents an approach for large-scale cross-language linguistic ontology matching. In particular, it shows how synsets belonging to two distinct language wordnets can be mapped by means of machine translation and frequency analysis on word senses. A preliminary experimental analysis of the problem is presented, in which we conduct experiments on existing gold-standard datasets. The results are encouraging: they show the feasibility of the approach and demonstrate that the sense-selection task is a crucial step towards high-quality mappings. We conclude with our observations and outline potential future directions.</p>
      </abstract>
      <kwd-group>
        <kwd>ontology matching</kwd>
        <kwd>linguistic ontology</kwd>
        <kwd>wordnet</kwd>
        <kwd>translation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The last decades witnessed remarkable efforts in the development of
ontologies that capture relationships between words and concepts, aiming
to represent commonsense knowledge in natural languages and hence
to make the semantic connections between words in different languages explicit. Such
ontologies are often called wordnets or linguistic ontologies [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ][
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. One such
commonly used linguistic ontology is the English WordNet [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
        The success of WordNet motivated the construction of similarly structured
lexicons for individual and multiple languages (multi-language lexicons). These
include, among others, EuroWordNet [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ], MultiWordNet [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. Recently, wordnets
for many languages have been constructed under the guidelines of the Global
WordNet Association. However, the manual method of constructing these ontologies
is expensive and time-consuming. Automatic construction of wordnets is another
method for building and linking wordnets. De Melo and Weikum [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] used
bilingual dictionaries to automatically provide equivalents in various languages for
the English WordNet synsets. However, translation tools might remove the
language barrier but not necessarily the socio-cultural one [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The main challenge
is to find the appropriate word sense of the translated word [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ].
To enable semantic interoperability across different languages, ontology-based
cross-language matching has been explored in recent years [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]. However, the
cultural-linguistic barriers [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ][
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] still need to be overcome in terms of the mapping process
and techniques, as well as in formally defining the semantics of the mappings that align
concepts lexicalized across different natural languages [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. In addition, languages
do not cover exactly the same part of the lexicon and, even where they have
common concepts, these concepts are lexicalized differently [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. One of the problems
in creating linguistic ontologies via a cross-language matching approach is that one
needs to map an unstructured, or weakly structured, lexicon to a structured
lexicon [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. This introduces an extremely difficult and challenging matching
problem for several reasons: (i) the lack of structural information, which is often used
in matching techniques [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ], (ii) the large mapping space (e.g., WordNet (3.0) has
117659 concepts), (iii) the quality (uncertainty) and coverage of the translation
resources (we must assume that all sense distinctions given by the translations
are correct and available in a translation resource), and (iv) word ambiguity,
that is, the need to select the most common and accepted meaning (sense) of a
word [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. The difficulties of this task arise from polysemy (a word
can convey more than one meaning) and synonymy (two or more words can
convey the same meaning).
      </p>
      <p>
        The research presented here aims to contribute to the construction of large
linguistic ontologies [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. The idea is to semi-automate this process by first
matching source-language concepts (synsets) to the concepts of an established wordnet in
a target language, and then deriving the semantic relations among the source
concepts from the relations among concepts in the target wordnet. We argue that
the resulting relations can provide an initial set of relations that can be
manually validated and corrected [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Before selecting and/or extending the most
appropriate existing cross-language mapping techniques, we need to be able to
compare alternative methods and to assess the quality of their output.
Contribution.
      </p>
      <p>
        In this paper I introduce a semi-automatic mapping framework that tries to
map synsets in different languages by combining translation tools and word
sense disambiguation (WSD) into a hybrid task. I define a mapping algorithm
for constructing linguistic ontologies, through mapping unstructured concepts
to structured ones, as a maximization problem [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ] that retrieves the top k mappings
from a set of sorted candidate mappings.
      </p>
      <p>
        Since the creation of wordnets uses mappings among concepts expressed
(lexicalized) in different languages, one of the research areas most relevant to the
ontology-creation problem is cross-language ontology matching (CLOM). CLOM
techniques [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] can play a crucial role in bootstrapping the creation of large
linguistic ontologies and, for analogous reasons, in enriching existing ontologies.
We also remark that the above considerations are general and can be reused for
different languages. We demonstrate this by considering benchmark datasets for
different pairs of languages. In particular, we discuss the results of investigating
and assessing the mapping of an unstructured set of synsets in one language to
an existing wordnet in a different language. The proposed algorithm leverages
translation tools and tries to map synsets lexicalized in one language (e.g.,
Arabic) to their corresponding synsets in another language (e.g., English). Different
translation settings were considered to identify the appropriate translation
methods for obtaining the correct translation. The algorithm ranks the
translated synsets in order to select the most appropriate senses. This is followed by
an experimental analysis and discussion. With this experiment we aim to answer
the following questions: (i) what is the best translation tool in terms of
coverage and correctness? (ii) what is the impact of the correct translation on the
sense-disambiguation task? (iii) what is the impact of providing
(partial/semi-)structural knowledge among the source synsets? This paper focuses on the
first two questions, leaving the third for further investigation in
future work.
      </p>
      <p>The rest of the paper is structured as follows. Section 2 overviews the related
work. Section 3 illustrates the mapping algorithm. Section 4 presents the experiment,
the evaluation settings, and a discussion of the obtained results.
Section 5 concludes and outlines the future steps.</p>
    </sec>
    <sec id="sec-2">
      <title>Background and State of the Art</title>
      <p>
        The last decade witnessed a wide range of ontology matching methods that
have been successfully developed and evaluated in the OAEI campaigns [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]. The majority of
the techniques proposed in these systems have mainly focused on mapping
between ontological resources that are lexicalized in the same natural language
(so-called mono-language ontology matching, MOM). However, methods developed
for MOM systems cannot directly access semantic information when ontologies
are lexicalized in different natural languages [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. There is a need for a method
that automatically reconciles information when ontologies are lexicalized in
different natural languages [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>
        Manual mapping (by experts) was used to generate and review the quality of
the mappings [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. The mappings generated by such approaches are likely to be
accurate and reliable. However, this can be a resource-consuming process, especially
for maintaining large and complex ontologies. An unsupervised method was
suggested based on (non-parallel) bilingual corpora [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. This approach, as happens
with most unsupervised learning methods, relies heavily on corpus statistics. De
Melo and Weikum [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] constructed a binary classification learning problem to
automatically determine the appropriate senses among the translated candidates.
To create their classifier, they used several scores that take into account
structural properties as well as semantic relatedness and corpus-frequency information.
The authors noted that this technique is imperfect in terms of
quality and coverage of language-specific phenomena [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        In general, to resolve the cross-lingual issue, a translation-based approach [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] is
considered in order to transform the CLOM problem into a MOM one. These
systems deeply leverage the structural information derived from the mapped
ontologies. Furthermore, the approach of Spohr et al. [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ], like all supervised learning
methods, requires a significant number of labeled training samples and
well-designed features to achieve good performance.
Another interesting line of work for resolving the cross-lingual issue exploits Wikipedia,
a collaborative and multilingual resource of world and linguistic knowledge.
Hassan and Mihalcea [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] developed a cross-lingual version of Explicit Semantic
Analysis (ESA, [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]), called CL-ESA. Similar works that use ESA to link concepts across
different languages are also presented in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. WikiMatch, a matching
system presented in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], searches Wikipedia page (article) titles for a given term
(e.g., the ontology labels and comments) and retrieves all language links
describing the term, making use of the inter-lingual links between Wikipedia pages.
However, such approaches are limited and highly dependent on the lexical
coverage provided by Wikipedia inter-lingual links.
      </p>
      <p>
        A notable approach for disambiguating and linking cross-lingual senses was
presented in BabelNet [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. Since Wikipedia inter-lingual links do not exist for all
Wikipedia pages, Navigli and Ponzetto [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] proposed a context-translation
approach. They automatically translate (using Google Translate) a set of English
sense-tagged sentences. After applying the automatic translation, the most
frequent translation is detected and included as a variant for the mapped senses
in the given language. However, it is not clear whether they employed any specific NLP
techniques in this process, or whether they aligned the translated words with the words in
the original (English) sentence (cf. the word aligner KNOWA [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]). Moreover,
such frequency counts do not necessarily preserve the part of speech of the
translated words. They only translated Wikipedia entries whose lemmas (page titles)
do not refer to named entities 1. For lemmas in WordNet that
are monosemous (i.e., words that have only one meaning), they performed a contextless
translation and simply included the translations returned by Google Translate. They [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] reported that
the majority of the translated senses are monosemous.
      </p>
      <p>
        BabelNet [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] mappings were evaluated against a manually mapped random
sample and gold-standard datasets. However, it is important to understand whether the
obtained mappings were achieved through the context- or the contextless-translation
approach. More importantly, the monosemous and polysemous translated senses
were not quantified in their evaluation, noting that monosemous senses form a
substantially large portion of the evaluated wordnets. It is important to measure
both the contribution and the quality of the context-translation approach. For
instance, the Italian wordnet contains about 66% monosemous senses, while
BabelNet covered less than 53% and 74% of the Italian WordNet's [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]
senses (words) and synsets, respectively. They determined coverage as the
percentage of gold-standard synsets that share a term (a one-synonym overlap)
with the corresponding synset obtained from the mapping algorithm. This does
not necessarily imply high-quality mappings. It is worth noting that the BabelNet
context-translation approach covers only about 25% of the Arabic WordNet's [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]
words (which was used as a benchmark).
1 About 90% of Wikipedia pages are named entities, which were included directly in
BabelNet without translation [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ].
      </p>
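      <p>The one-synonym-overlap notion of coverage described above can be made concrete with a small sketch. This is an illustration only; the dictionary-of-sets representation of synsets is an assumption, not BabelNet's actual data model:</p>

```python
def overlap_coverage(gold_mappings, predicted):
    """Fraction of gold-standard synsets whose predicted counterpart
    shares at least one term (one-synonym overlap).

    gold_mappings: source synset id -> gold target synset (set of words)
    predicted:     source synset id -> predicted target synset (set of words)
    """
    covered = 0
    for syn_id, gold_synset in gold_mappings.items():
        pred_synset = predicted.get(syn_id, set())
        if gold_synset & pred_synset:  # at least one shared synonym
            covered += 1
    return covered / len(gold_mappings)
```

      <p>As the text notes, a single shared synonym is a weak criterion: a mapping can count as covered while still being a poor-quality correspondence.</p>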
    </sec>
    <sec id="sec-3">
      <title>Mapping Algorithm</title>
      <p>Given a pair of wordnets wnL1 and wnL2 in different languages (L1 ≠ L2),
respectively called the source and target wordnet (we call it the source wordnet
although it has no relations among its synsets), the mapping algorithm tries to
find for each synset synL1 ∈ wnL1 an equivalent synset from the target wordnet,
synL2 ∈ wnL2, such that it maximizes the probability of providing an
appropriate correspondence to synL1. In order to map the synsets, we make use of the
mapping algorithm whose pseudocode is presented in Algorithm 1. The
following steps are performed for each synset synL1 ∈ wnL1:
line 3: the translation function trans looks up all the possible translations in the
target language for each word wiL1 ∈ synL1 in the source synset.
line 4: the sense function (sense(wL) = {syn1L, .., synnL}) looks up all the
candidate senses, candSenses, from the target wordnet for each translated word.
line 5: the rank function accepts candSenses and returns ranked senses
(rSenses), ordered by the most appropriate sense. This is performed by counting
the frequency of each sense in the candSenses set; the rank function also gives a
higher priority (weight) to the synsets obtained from translating different
words in the source synset (i.e., synonym words that give the same translation).
line 6: the select function selects the top k mappings (a set of mappings) from
rSenses. For a single WSD mapping task, k = 1. If a tie occurs, the senses
are selected based on the higher ratio between the number of translated
words and the number of synonym words in the candidate senses (i.e., the ratio
between the sizes of the mapped synsets); otherwise, a sense is selected at random.
As a result of executing the algorithm, a set of ranked mappings is returned.</p>
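      <p>The steps above can be sketched as follows. This is a minimal illustration of lines 3-6, not the exact Algorithm 1: the trans and sense functions are hypothetical stand-ins for a translation resource and the target wordnet's sense lookup, and candidate synsets are represented as tuples of words:</p>

```python
from collections import Counter

def map_synset(source_synset, trans, sense, k=1):
    """Map one source-language synset to the top-k candidate synsets
    of the target wordnet (sketch of Algorithm 1, lines 3-6)."""
    freq = Counter()   # line 5: frequency of each candidate sense
    support = {}       # distinct source words that reached a sense
    for word in source_synset:                 # line 3: translate each word
        for translation in trans(word):
            for cand in sense(translation):    # line 4: candidate senses
                freq[cand] += 1
                support.setdefault(cand, set()).add(word)
    # line 5: rank by frequency, giving extra weight to senses reached
    # from several distinct source words (synonyms sharing a translation);
    # line 6: break remaining ties by the ratio between the number of
    # translated words and the size of the candidate synset
    ranked = sorted(
        freq,
        key=lambda c: (freq[c], len(support[c]), len(source_synset) / len(c)),
        reverse=True,
    )
    return ranked[:k]   # top-k mappings
```

      <p>For a single WSD mapping task one would call map_synset(synset, trans, sense, k=1); larger k returns a ranked candidate list for later validation.</p>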
    </sec>
    <sec id="sec-4">
      <title>Experiment</title>
      <p>
        Wordnets. Three wordnets were used in the experiment: the Arabic
WordNet (2.0) [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ], the Italian component of the MultiWordNet [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], and the English
WordNet (3.0) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The wordnets have 11214, 34728 and 117659
synsets, respectively; they contain 15964, 40178 and 155287 words, which form
23481, 61558 and 206941 word senses in total, respectively.
The Arabic WordNet and the Italian WordNet were used to benchmark the
mapping algorithm. Details on each wordnet are provided below.
- The Arabic WordNet (ArWN) consists of 11214 Arabic synsets, which were
constructed from 15964 vocalized Arabic words (13867 unvocalized words, i.e.,
without diacritics). The ArWN authors [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] mapped the Arabic synsets to their
equivalent synsets in the English WordNet (2.0). As a result, 10413 synsets have
been manually mapped to the corresponding English WordNet (EnWN) synsets.
Out of the 10413 Arabic-English mapped synsets, 54 mappings do not have an
Arabic lexicalization (Arabic concepts without lexicalization; such concepts were
considered lexical gaps [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ]), and 10 synsets have no corresponding English synsets
in EnWN (3.0) due to part-of-speech mismatches. Overall, the resulting
mappings comprise 10349 Arabic-English equivalent-synset mappings.
- The Italian WordNet (ItWN, http://multiwordnet.fbk.eu/) authors [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] have manually mapped the Italian
synsets to their equivalent synsets in EnWN (1.6). Later, under the Open
Multilingual Wordnet initiative, the ItWN was mapped to EnWN (3.0). As a
result, we have 34728 Italian-English equivalent-synset mappings, including 997
lexical gaps (mappings that have no corresponding Italian lexicalization). Overall,
the resulting mappings comprise 33731 Italian-English equivalent-synset mappings.
Evaluation. The goal is to disambiguate the senses and to find the appropriate
mappings between synsets lexicalized in different languages. The disambiguation
task was evaluated using evaluation measures borrowed from the information
retrieval field [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. The coverage is defined as the percentage of mappings in the
test set for which the mapping method provides a sense mapping. The precision
of the mapping method is computed as the percentage of correct mappings among
those given by the method; this reflects how good the mappings obtained
by the assessed method are. The recall is defined as the ratio between the
number of correct mappings provided by the method being assessed and the
total number of mappings to be provided (the number of mappings in the test set,
i.e., the benchmark dataset). The F-measure, defined
as the weighted harmonic mean of precision and recall, is also used.
      </p>
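      <p>Under the definitions above, the four measures can be computed as in this sketch (the id-based representation of mappings is an assumption for illustration; the F-measure uses the balanced harmonic mean):</p>

```python
def evaluate(test_set, system_output, gold):
    """Coverage, precision, recall and F-measure as defined above.

    test_set:      ids of the mappings to be provided (the benchmark)
    system_output: id -> predicted sense mapping (covered ids only)
    gold:          id -> correct sense mapping
    """
    covered = [i for i in test_set if i in system_output]
    correct = [i for i in covered if system_output[i] == gold[i]]
    coverage = len(covered) / len(test_set)
    precision = len(correct) / len(covered) if covered else 0.0
    recall = len(correct) / len(test_set)
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return coverage, precision, recall, f_measure
```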
      <p>
        In addition, a baseline (lower-bound) setting was used: the first-sense heuristic
was used to measure the lower bound of the experiment. Note that, in general,
most WSD algorithms can hardly beat these bounds [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. The upper
bound (having the correct translations; oracle translation) was also computed
in order to specify the highest expected performance of the proposed approach.
Translation Methods. In the experiment we used different resources for
translation. (i) Machine translation tools: Google Translate was used to obtain the
English translations for all the Arabic and Italian words in the benchmark dataset.
(ii) Machine-readable dictionaries: the Sina dictionary was used for Arabic-to-English
translations. The Sina dictionary is a result of the ongoing Arabic Ontology project
[
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]; the dictionary was constructed by integrating several specialized and
general-domain dictionaries. An Italian-to-English translation dictionary should be
considered in future work. (iii) Oracle translation: an oracle is a hypothetical
system that is always supposed to know the correct answer (i.e., the correct
translation). We used the translations provided in the benchmark wordnets as
an oracle (correct translation). The oracle translation was used to demonstrate
the upper bounds. Moreover, (iv) an extended-oracle translation was obtained
for the Arabic-English translations by considering all the synonyms of the
translated word, not only the translations provided in the ArWN. For the Italian
words one can only obtain the extended-oracle translation from the ItWN dataset.
Finally, (v) all dictionaries: the translations above were combined to investigate
the accuracy of the different translation resources.
      </p>
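      <p>The first-sense lower bound can be sketched as follows, assuming (as in WordNet-style resources) that each word's sense list is ordered by frequency; the trans and sense functions are hypothetical stand-ins for a translation resource and the target wordnet's sense lookup:</p>

```python
def first_sense_baseline(source_synset, trans, sense):
    """Lower-bound heuristic: return the first-listed (most frequent)
    sense of the first available translation of the source synset."""
    for word in source_synset:
        for translation in trans(word):
            senses = sense(translation)
            if senses:
                return senses[0]  # first sense = most frequent
    return None                   # no translation was covered
```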
      <p>Results. The results are reported in Figure 1; the "Experiment" column
specifies the translation method. The reported
measures evaluate (as %) whether the equivalent mappings are among the top k
mappings, k ∈ [1, 100]. The lower-bound (baseline) experiments are also reported.
In Figure 2, four variants that exploit the structural information of the
target wordnet were considered to select the equivalent mappings: (1) isEquivalent
(isCorrect): the correct equivalent mapping appears among the top k candidate
synsets. (2) isHypernym: the candidate synset is a hyponym of the correct
mapping. (3) hasHypernym (or isHyponym): the hypernym of the candidate synset
is an equivalent mapping. (4) isSister: the candidate synset is a sister node of an
equivalent mapping (it has the same hypernym synset). Figure 2 also shows the
upper and lower bounds, and the precision of the mappings obtained using
Google Translate for the ArWN synsets (Figure 1, experiment No. 1).
Discussion. In experiment No. 2 (Sina &amp; Google translation), although the coverage is
similar to that of experiment No. 1, about 100 more mappings were covered
due to the correct translations provided by the Sina dictionary. From experiment No. 3
(oracle translation) we can notice that the missed mappings are only the lexical-gap
synsets. From experiment No. 4 (extended-oracle translation) the
important observation is that the more accurate the translations, the better we can
rank and select the candidate senses; that is, the difference between using the
oracle translation and adding the synonym words is that the equivalent
mappings are obtained with a lower value of k.</p>
      <p>
        Experiment No. 5 (all dictionaries) highlights the fact that even when we have the
correct translations, the existence of inaccurate translations (introduced by combining
Google translations and the domain-specific translations in the Sina dictionary)
adds noise and raises the ambiguity. This ranks the correct sense in lower
positions and increases the value of k needed to find the correct mappings. This
means it is important to first filter out and discriminate the incorrect translations,
and then perform the ranking and selection steps. In the baseline experiments,
all the performed experiments outperformed the lower-bound settings.
Having the correct translation is an important step, but selecting the correct
senses remains a crucial task. One can notice from the results that although we have
high (about 100%) coverage, we still need to consider a high value of k to
obtain the correct mappings. In fact, the mark of a good
mapping algorithm is that it provides the correct mappings while
minimizing the value of k, and this depends on the rank function. Moreover,
considering the structural information in the target wordnet (EnWN) improves the
results. For instance, in experiment No. 1 (Google dictionary), considering the
neighbor nodes (e.g., isSister) improves the results by around 10% for small k
values, k ∈ [5, 20]. For this reason, we believe that exploiting the structural information
(e.g., graph-based and similarity-based approaches [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]) is expected to help in
ranking the correct senses in higher positions. Considering the candidate synsets'
hypernyms (hasHypernym plot, Figure 2) is also expected to improve the results.
Error Analysis. The main reasons for missing some mappings can be divided
into the following errors:
(a) Incorrect translation: a number of mappings were missed due to incorrect
translation. This can be divided into four error categories: (i) compound words, (ii)
named entities, (iii) the translated word does not exist in WordNet, and (iv) bad
translation, by which we mean that Google Translate only re-spells
the source word in the English alphabet. For instance, regarding the ArWN-EnWN
mappings, out of the 988 missing mappings (about 10% of the mappings), 760
are compound (one-word) synsets, and 33 mappings are named-entity synsets.
Moreover, 799 mappings do not have an equivalent EnWN lemma, and 189
mappings were considered bad translations.
(b) Monosemous words: the ArWN has 4197 one-word synsets, and it has 4370
monosemous synsets (all words in the synset are monosemous); 2111 synsets are
compound (one-word) synsets. The majority of the compound one-word synsets (about
94%) used in mapping the Arabic and the English synsets are
monosemous. Due to this fact, about 10% of the ArWN-EnWN mappings have
only one candidate sense, and about 24% have up to 10 candidate senses.
Regarding the ItWN-EnWN mappings, about 35% of the mappings have only one
candidate sense; this high value is due to the fact that most of the ItWN synsets
(about 66%) are monosemous. About 74% of the synsets have up to 10
candidate senses. By studying the distribution of the rank values for the Google
translation of both the ArWN and the ItWN, about 85% of the candidate senses
were equally ranked with a score of one (the sense appears once in candSenses);
thus, incrementing k retrieves more correct equivalent mappings.
(c) Polysemous words: the ArWN has 3641 polysemous words, such that about
30% and 16% of the polysemous synsets are, respectively, two- and three-word
synsets. The ItWN has 10366 polysemous words, such that about 64% and 20%
of the polysemous synsets are, respectively, two- and three-word synsets. This
highlights the fact that most of the benchmark polysemous synsets are small in
size (number of words), which makes it more difficult to distinguish the correct
sense. For instance, in experiment No. 4, when the synonym translations were
considered (we increased the size of the translated synsets), the ranking function
performed better than in experiment No. 3.
    </sec>
    <sec id="sec-5">
      <title>Conclusions &amp; Future Works</title>
      <p>This paper investigated how synsets from different languages can be mapped,
focusing on the impact of translation tools and the selection of candidate synsets
for mapping. A cross-language mapping algorithm was presented; the algorithm
tries to maximize the probability that mappings with a higher rank are
considered correct by users, based on the frequency of translated synsets
and a majority-voting approach. The experiments demonstrated several
outcomes that can be summarized as follows:
1. the approach was successfully tested over two different pairs of languages,
which demonstrates its applicability across different languages.
2. using structural information encoded in the target wordnet improves the
sense-selection task.
3. combining the translations of machine translation (MT) tools with a
bilingual-dictionary translation improves the results (Figure 1, experiment No. 2).
4. the proposed approach outperforms the baseline settings. The upper bounds
indicate that there is room for further improvement in terms of obtaining
the correct translations and better ranking the candidate senses.
Moreover, features obtained from the MT tools (Google translation), such as the
translation score and the synset translations, need to be explored in order to
filter the correct translations and to better rank the candidate senses. In
addition, NLP techniques (e.g., stemming, headword extraction, etc.) are expected
to improve the MT coverage and to obtain more candidate senses (instead of using
pure translation lookup and exact word-sense matching).</p>
      <p>
        Currently I am planning to consider datasets in other languages, and to
investigate the construction of partially structured source synsets and their impact on the
mapping algorithm, inspired by the work presented in [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]. Another interesting
direction is to crowdsource the construction process by providing the workers
with the top k mappings [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ], which at the same time should simulate the
agreement of the majority of speakers [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>Acknowledgments. This work was funded by EU FP7 SIERA project.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>M.</given-names>
            <surname>Abu Helou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Palmonari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jarrar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ch.</given-names>
            <surname>Fellbaum</surname>
          </string-name>
          .
          <article-title>Towards Building Linguistic Ontology via Cross-Language Matching</article-title>
          .
          <source>GWN 2014</source>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>P.</given-names>
            <surname>Cimiano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Montiel-Ponsoda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Buitelaar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Espinoza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gómez-Pérez</surname>
          </string-name>
          .
          <article-title>A note on ontology localization</article-title>
          .
          <source>Applied Ontology</source>
          .
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>P.</given-names>
            <surname>Cimiano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Schultz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sizov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Sorg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Staab</surname>
          </string-name>
          ,
          <article-title>Explicit vs. latent concept models for cross-language information retrieval</article-title>
          ,
          <source>in: Proc. of the 21st IJCAI</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Ch.</given-names>
            <surname>Fellbaum</surname>
          </string-name>
          .
          <source>WordNet: An electronic lexical database</source>
          . MIT Press,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>G.</given-names>
            <surname>De Melo</surname>
          </string-name>
          and
          <string-name>
            <given-names>G.</given-names>
            <surname>Weikum</surname>
          </string-name>
          .
          <article-title>Constructing and utilizing wordnets using statistical methods</article-title>
          .
          <source>Language Resources and Evaluation</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>B.</given-names>
            <surname>Fu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Brennan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>O'Sullivan</surname>
          </string-name>
          .
          <article-title>A configurable translation-based cross-lingual ontology mapping system to adjust mapping outcomes</article-title>
          .
          <source>J. Web Sem</source>
          .,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>E.</given-names>
            <surname>Gabrilovich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Markovitch</surname>
          </string-name>
          ,
          <article-title>Computing semantic relatedness using Wikipedia-based explicit semantic analysis</article-title>
          ,
          <source>in: Proceedings of the 20th IJCAI, India</source>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>J.</given-names>
            <surname>Gracia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Montiel-Ponsoda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Cimiano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gómez-Pérez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Buitelaar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>McCrae</surname>
          </string-name>
          ,
          <article-title>Challenges for the multilingual web of data</article-title>
          ,
          <source>JWS</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>S.</given-names>
            <surname>Hassan</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Mihalcea</surname>
          </string-name>
          ,
          <article-title>Cross-lingual Relatedness using Encyclopedic Knowledge</article-title>
          ,
          <source>in Proc. EMNLP</source>
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>S.</given-names>
            <surname>Hertling</surname>
          </string-name>
          and
          <string-name>
            <given-names>H.</given-names>
            <surname>Paulheim</surname>
          </string-name>
          .
          <article-title>WikiMatch - Using Wikipedia for Ontology Matching</article-title>
          .
          <source>In Proceedings OM</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>G.</given-names>
            <surname>Hirst</surname>
          </string-name>
          ,
          <article-title>Ontology and the Lexicon</article-title>
          ,
          <source>in Handbook on Ontologies and Information Systems</source>
          . eds. S. Staab and
          <string-name>
            <given-names>R.</given-names>
            <surname>Studer</surname>
          </string-name>
          . Heidelberg: Springer,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>M.</given-names>
            <surname>Jarrar</surname>
          </string-name>
          ,
          <article-title>Building A Formal Arabic Ontology (invited paper)</article-title>
          ,
          <source>Proc. of the Experts Meeting On Arabic Ontologies And Semantic Networks. Alecso, Arab League. Tunis, July 26-28</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>M.</given-names>
            <surname>Jarrar</surname>
          </string-name>
          ,
          <source>Lexical Semantics and Multilingualism. Lecture Notes</source>
          , Sina Institute, Birzeit University,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>A.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sini</surname>
          </string-name>
          ,
          <article-title>Mapping AGROVOC &amp; the Chinese Agricultural Thesaurus: Definitions, Tools, Procedures</article-title>
          .
          <source>New Review of Hypermedia &amp; Multimedia</source>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>F.</given-names>
            <surname>Narducci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Palmonari</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Semeraro</surname>
          </string-name>
          .
          <article-title>Cross-language semantic matching for discovering links to e-gov services in the LOD cloud</article-title>
          .
          <source>Know@LOD,ESWC</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>G.</given-names>
            <surname>Ngai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Carpuat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Fung</surname>
          </string-name>
          ,
          <article-title>Identifying Concepts Across Languages: A First Step towards A Corpus-based Approach to Automatic Ontology Alignment</article-title>
          .
          <source>In: Proceedings of the 19th COLING</source>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <given-names>R.</given-names>
            <surname>Navigli</surname>
          </string-name>
          .
          <article-title>Word Sense Disambiguation: a Survey</article-title>
          .
          <source>ACM Computing Surveys</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <given-names>R.</given-names>
            <surname>Navigli</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Ponzetto</surname>
          </string-name>
          .
          <article-title>BabelNet: The Automatic Construction, Evaluation and Application of a Wide-Coverage Multilingual Semantic Network</article-title>
          .
          <source>AI</source>
          .
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <given-names>E.</given-names>
            <surname>Pianta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Bentivogli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Girardi</surname>
          </string-name>
          ,
          <article-title>MultiWordNet: Developing an aligned multilingual database</article-title>
          ,
          <source>in: Proceedings of the 1st International GWC</source>
          .
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <given-names>E.</given-names>
            <surname>Pianta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Bentivogli</surname>
          </string-name>
          ,
          <article-title>Knowledge Intensive Word Alignment with KNOWA</article-title>
          .
          <source>In Proceedings of COLING</source>
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <given-names>M. T.</given-names>
            <surname>Pilehvar</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Navigli</surname>
          </string-name>
          .
          <article-title>A Robust Approach to Aligning Heterogeneous Lexical Resources</article-title>
          .
          <source>Proc. of the ACL 2014</source>
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <given-names>H.</given-names>
            <surname>Rodríguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Farwell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Farreres</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bertran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Alkhalifa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. Antonia</given-names>
            <surname>Martí</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Black</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Elkateb</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kirk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pease</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Vossen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ch.</given-names>
            <surname>Fellbaum</surname>
          </string-name>
          .
          <article-title>Arabic WordNet: Current State and Future Extensions</article-title>
          ,
          <source>in: Proc. of the GWC</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <given-names>P.</given-names>
            <surname>Shvaiko</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Euzenat</surname>
          </string-name>
          .
          <article-title>Ontology matching: State of the art and future challenges</article-title>
          .
          <source>IEEE Trans. Knowl. Data Eng</source>
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <given-names>C.</given-names>
            <surname>Sarasua</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Simperl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.F.</given-names>
            <surname>Noy</surname>
          </string-name>
          .
          <article-title>CrowdMAP: Crowdsourcing Ontology Alignment with Microtasks</article-title>
          . In: P. Cudré-Mauroux et al. (eds.)
          <source>ISWC 2012, Part I. LNCS</source>
          . Springer, Heidelberg,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <given-names>D.</given-names>
            <surname>Spohr</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Hollink</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Ph.</given-names>
            <surname>Cimiano</surname>
          </string-name>
          .
          <article-title>A machine learning approach to multilingual and cross-lingual ontology matching</article-title>
          .
          <source>ISWC</source>
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <given-names>P.</given-names>
            <surname>Venetis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Garcia-Molina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Polyzotis</surname>
          </string-name>
          .
          <article-title>Max Algorithms in Crowdsourcing Environments</article-title>
          .
          <source>Proc. WWW</source>
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <given-names>P.</given-names>
            <surname>Vossen</surname>
          </string-name>
          .
          <article-title>EuroWordNet: a multilingual database of autonomous and language-specific wordnets connected via an Inter-Lingual-Index</article-title>
          .
          <source>IJL</source>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>