<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Cross-lingual Linking on the Multilingual Web of Data (position statement)</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jorge Gracia</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Elena Montiel-Ponsoda</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Asuncion Gomez-Perez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Ontology Engineering Group, Universidad Politecnica de Madrid</institution>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Recently, the Semantic Web has experienced significant advancements in standards and techniques, as well as in the amount of semantic information available online. Even so, mechanisms are still needed to automatically reconcile semantic information when it is expressed in different natural languages, so that access to Web information across language barriers can be improved. That requires developing techniques for discovering and representing cross-lingual links on the Web of Data. In this paper we explore the different dimensions of such a problem and reflect on possible avenues of research on that topic.</p>
      </abstract>
      <kwd-group>
        <kwd>multilingualism</kwd>
        <kwd>ontology matching</kwd>
        <kwd>multilingual linked data</kwd>
        <kwd>multilingual mappings</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Motivation</title>
      <p>
        The large and growing amount of semantic data available on the Web, mainly
in the form of Linked Data [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], online ontologies, and annotated Web pages, has
resulted in the emergence of the so-called Web of Data. This development has been
accompanied by significant advancements in standards and techniques, contributing to
the realization of the Semantic Web vision [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Some issues, however, still need to be solved before the Semantic Web
can be fully realised, language barriers being one of them. Mechanisms are still
needed to automatically reconcile semantic data (ontologies and the data underlying
them) when they are expressed in different natural languages on the Web, in order to
enable access to semantic information across language barriers. In this respect,
several challenges arise [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], specifically: (i) ontology translation/localization, (ii)
cross-lingual ontology linking, (iii) representation of multilingual lexical
information, and (iv) cross-lingual access to and querying of linked data.
      </p>
      <p>In this paper we focus on the second challenge, namely, the need to
establish, represent, and store cross-lingual links among semantic information
on the Web. In the multilingual Web of Data that we envision, semantic
data with lexical representations in one natural language would be mapped to
equivalent or related information in other languages, thus enabling software agents
to navigate across multilingual information. In the following, we
use "cross-lingual ontology linking" in a broad sense, including (semi-)automatic
ontology and instance matching methods and techniques applied to
the linking of semantic data documented in several natural languages.</p>
      <p>
        The problem of cross-lingual linking is a fundamental one, since more and
more legacy data sources available in different natural languages are being
transformed into linked data, and these sources have to be linked to be exploited to their full potential.
Moreover, establishing links among multilingual data sources
would also contribute to solving the localization issue, since the links alone would turn
monolingual, isolated data resources into "multilingual resources". However, linking
resources documented in different languages is not straightforward. Several issues that
arise in the localization of Semantic Web resources [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] would also be involved in the linking task, namely: a)
conceptualization mismatches due to language and cultural discrepancies; b) conceptualization
mismatches due to the perspectives from which the same domain is approached;
or even c) different levels of granularity in the conceptualization.
      </p>
      <p>The main purpose of this position paper is to provide insight into the problem
of cross-lingual linking on the Web of Data and to identify research topics that
will allow us to advance towards a truly multilingual Web of Data. In the rest
of the paper, Section 2 first describes the different knowledge representation levels
at which cross-lingual links can be established. It then explores the problem
and identifies possible research lines grouped into three aspects: cross-lingual link
discovery, representation, and reuse. Finally, the main conclusions of the paper
are summed up in Section 3.</p>
    </sec>
    <sec id="sec-2">
      <title>Dimensions of the problem and research lines</title>
      <p>Cross-lingual links between ontologies and data sources can be established at
different knowledge representation levels:</p>
      <list list-type="order">
        <list-item>
          <p>Conceptual level: links between ontology entities at the schema level.</p>
        </list-item>
        <list-item>
          <p>Instance level: links between data underlying ontologies.</p>
        </list-item>
        <list-item>
          <p>Linguistic level: links between lexical representations associated with
ontology concepts and/or instances.</p>
        </list-item>
      </list>
      <p>The last one is particularly important if certain lexical relations have to be
represented across ontologies (e.g., translations or term variations). Each of these
levels will require its own link discovery/representation methods and techniques.</p>
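      <p>To make the three levels concrete, the following Python sketch records a cross-lingual link together with the level at which it is established. It is an illustration under stated assumptions rather than a proposed model: the class and field names are hypothetical, and the prefixes ex_en: and ex_es: stand for invented English and Spanish ontologies.</p>

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    """Knowledge representation levels at which a cross-lingual link is set."""
    CONCEPTUAL = "conceptual"  # ontology entities at the schema level
    INSTANCE = "instance"      # data underlying the ontologies
    LINGUISTIC = "linguistic"  # lexical representations of concepts/instances


@dataclass(frozen=True)
class CrossLingualLink:
    """Hypothetical record for one link between resources in two languages."""
    source: str       # prefixed name of the source resource
    source_lang: str  # language tag of the source resource's labels
    target: str
    target_lang: str
    relation: str     # e.g. "owl:equivalentClass", "owl:sameAs", "translation"
    level: Level


links = [
    CrossLingualLink("ex_en:Person", "en", "ex_es:Persona", "es",
                     "owl:equivalentClass", Level.CONCEPTUAL),
    CrossLingualLink("ex_en:Madrid", "en", "ex_es:Madrid", "es",
                     "owl:sameAs", Level.INSTANCE),
    CrossLingualLink("ex_en:person_n", "en", "ex_es:persona_n", "es",
                     "translation", Level.LINGUISTIC),
]

# Group links by level, since each level calls for its own techniques.
by_level = {level: [link for link in links if link.level is level]
            for level in Level}
for level, group in by_level.items():
    print(level.value, [link.relation for link in group])
```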
      <p>In the following we propose some enhancements to available methods and
techniques and suggest new avenues of research that could help overcome the
problem.</p>
      <sec id="sec-2-1">
        <title>Cross-lingual Link Discovery</title>
        <p>Current ontology matching techniques have to be extended with multilingual
capabilities, and novel techniques need to be investigated as well. Cross-lingual
links can be discovered by means of techniques such as the following:</p>
        <list list-type="order">
          <list-item>
            <p>Projecting the lexical content of the mapped ontologies into a common
language (either one of the languages of the aligned ontologies or a pivot
language), e.g., using machine translation.</p>
          </list-item>
          <list-item>
            <p>Comparing the ontology entities directly by means of cross-lingual semantic
measures, that is, measures capable of evaluating similarity or relatedness
between (ontology) entities documented in different natural languages (e.g.,
cross-lingual explicit semantic analysis [<xref ref-type="bibr" rid="ref9">9</xref>]).</p>
          </list-item>
        </list>
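        <p>The first technique can be sketched in a few lines of Python. In this minimal sketch, a tiny Spanish-English dictionary stands in for a machine translation service (an assumption made to keep the example self-contained), and Python's difflib string similarity plays the role of a monolingual matcher applied once both label sets share a language; the threshold of 0.8 is arbitrary.</p>

```python
from difflib import SequenceMatcher

# Stand-in for a machine translation service (assumption): a tiny
# Spanish-to-English dictionary used to project Spanish labels.
ES_TO_EN = {"autor": "author", "articulo": "article", "revisor": "reviewer"}


def translate(label: str) -> str:
    """Project a Spanish label into English, word by word (toy MT)."""
    return " ".join(ES_TO_EN.get(w, w) for w in label.lower().split())


def similarity(a: str, b: str) -> float:
    """Monolingual string similarity in [0, 1], used after projection."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match(source_labels, target_labels, threshold=0.8):
    """Align each Spanish label with its most similar English label."""
    alignment = {}
    for es_label in source_labels:
        projected = translate(es_label)
        best = max(target_labels, key=lambda en: similarity(projected, en))
        score = similarity(projected, best)
        if score >= threshold:
            alignment[es_label] = (best, round(score, 2))
    return alignment


print(match(["Autor", "Articulo", "Revisor"], ["Author", "Paper", "Reviewer"]))
```

        <p>Here "Articulo" is left unaligned: its projection "article" is lexically far from "Paper", which illustrates that the quality of the translation step bounds the quality of the resulting links.</p>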
        <p>Both avenues have to be further explored, compared, and possibly combined.
There are a number of early cross-lingual ontology alignment tools that already
implement the first technique<sup>1</sup>, while the second one remains unexplored.
Note that these preliminary systems are intended to discover cross-lingual links
at the conceptual level; cross-lingual alignment systems operating at
the instance and linguistic levels are still to come.</p>
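        <p>For the second technique, the following toy sketch follows the general idea of cross-lingual explicit semantic analysis: texts in different languages are mapped into a shared space of concepts (in CL-ESA, Wikipedia articles connected through interlanguage links) and compared there with cosine similarity. The three-concept index and its texts are invented for illustration, and raw term counts replace the TF-IDF weights used in ESA.</p>

```python
import math
from collections import Counter

# Toy stand-in for Wikipedia (assumption): concepts connected across languages
# (as interlanguage links would do), each with a short text per language.
CONCEPTS = {
    "Vehicle": {"en": "car vehicle wheels engine road",
                "es": "coche vehiculo ruedas motor carretera"},
    "Book": {"en": "book pages author reading",
             "es": "libro paginas autor lectura"},
    "River": {"en": "river water flow bank",
              "es": "rio agua corriente orilla"},
}


def concept_vector(text: str, lang: str) -> dict:
    """Map a text to a vector over the shared, language-independent concepts
    (raw term counts here; ESA proper uses TF-IDF weights)."""
    words = text.lower().split()
    vector = {}
    for concept, articles in CONCEPTS.items():
        counts = Counter(articles[lang].split())
        vector[concept] = sum(counts[w] for w in words)
    return vector


def cosine(u: dict, v: dict) -> float:
    dot = sum(u[c] * v[c] for c in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cross_lingual_similarity(text_a, lang_a, text_b, lang_b) -> float:
    """Compare texts in different languages in the shared concept space."""
    return cosine(concept_vector(text_a, lang_a), concept_vector(text_b, lang_b))


print(cross_lingual_similarity("car engine", "en", "coche", "es"))  # 1.0
print(cross_lingual_similarity("car", "en", "libro", "es"))         # 0.0
```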
        <p>
          An alternative way to discover cross-lingual links is to use the Web of Data
as a source of background knowledge. The idea is to infer links from other links
that already exist among online ontology entities (entities similar to the ones
we intend to link). Such an approach was explored in a monolingual context by
the Scarlet system [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] and could be extrapolated to a multilingual landscape.
        </p>
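        <p>The intuition behind reusing existing links can be illustrated with a small graph search. The sketch below is not the Scarlet algorithm; it assumes a hypothetical pool of published links (with es:, de: and en: as invented prefixes for Spanish, German and English resources), follows a path between two entities, and composes the relations found along the way. Note how a Spanish and an English entity get linked through a German one.</p>

```python
from collections import deque

# Hypothetical links already available on the Web of Data (assumption).
# "equiv" stands for equivalence (e.g. owl:sameAs, owl:equivalentClass),
# "sub" for subsumption (e.g. rdfs:subClassOf).
KNOWN_LINKS = [
    ("es:Coche", "equiv", "de:Auto"),
    ("de:Auto", "equiv", "en:Car"),
    ("en:Car", "sub", "en:Vehicle"),
]

# How two consecutive relations on a path compose into one.
COMPOSE = {
    ("equiv", "equiv"): "equiv",
    ("equiv", "sub"): "sub",
    ("sub", "equiv"): "sub",
    ("sub", "sub"): "sub",
}


def neighbours(node):
    """Yield (relation, next_node); equivalence is followed both ways."""
    for s, rel, o in KNOWN_LINKS:
        if s == node:
            yield rel, o
        elif o == node and rel == "equiv":
            yield rel, s


def infer_link(source, target):
    """Breadth-first search for a path, composing relations along it."""
    queue = deque([(source, "equiv")])  # a resource is equivalent to itself
    seen = {source}
    while queue:
        node, rel = queue.popleft()
        if node == target:
            return rel
        for step_rel, nxt in neighbours(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, COMPOSE[(rel, step_rel)]))
    return None


print(infer_link("es:Coche", "en:Car"))      # equiv (via the German entity)
print(infer_link("es:Coche", "en:Vehicle"))  # sub
```

        <p>The search keeps only the first path it finds; a fuller approach would have to compare alternative paths and handle conflicting or inverse relations.</p>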
      </sec>
      <sec id="sec-2-2">
        <title>Cross-lingual Link Representation</title>
        <p>In principle, existing constructs of ontology languages can be utilised for
representing cross-lingual mappings at the conceptual and instance levels (e.g.,
owl:sameAs or owl:equivalentClass), whenever the two concepts or instances can
be considered cross-lingual equivalents.</p>
        <p>Properties from other commonly used vocabularies (e.g., rdfs:subClassOf, skos:narrower, or
skos:broader) could also be reused in case of granularity discrepancies, i.e.,
when one conceptualization models a certain concept at a level of granularity
different from that of the other conceptualization. In this case, we would suggest adapting
or enhancing such relations for a multilingual scenario, so that
finer language distinctions are captured.</p>
        <p>When no equivalence exists (i.e., one language does not conceptualize
a certain phenomenon of the world, whereas the other has a concept for it), we
could still provide a lexical description of the "non-existent concept" in the target
language, provide a link to its closest concept, and mark it as a specific
cross-lingual case. We believe this kind of link should also be accounted for in the
Web of Data.</p>
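        <p>The representation options discussed above can be summarised as a small decision procedure. In the following Python sketch, which assumes the kind of correspondence has already been determined by a matcher, triples are written with prefixed names; the properties ex:closestConcept and ex:crossLingualCase are hypothetical placeholders for the kind of dedicated cross-lingual relations suggested here, not existing vocabulary.</p>

```python
def represent(source, target, kind, level, description=None):
    """Return (subject, predicate, object) triples, as prefixed names, for a
    cross-lingual correspondence of the given kind found at the given level."""
    if kind == "equivalent":
        predicate = "owl:sameAs" if level == "instance" else "owl:equivalentClass"
        return [(source, predicate, target)]
    if kind == "narrower":  # source concept is finer-grained than target
        return [(source, "rdfs:subClassOf", target)]
    if kind == "broader":   # source concept is coarser-grained than target
        return [(target, "rdfs:subClassOf", source)]
    if kind == "no_equivalent":
        # No counterpart concept: link to the closest concept, flag the link
        # as a specific cross-lingual case, and describe the concept lexically.
        triples = [(source, "ex:closestConcept", target),
                   (source, "ex:crossLingualCase", '"noEquivalent"')]
        if description:
            triples.append((source, "rdfs:comment", description))
        return triples
    raise ValueError(f"unknown correspondence kind: {kind}")


for triple in represent("ex_es:Sobremesa", "ex_en:Conversation", "no_equivalent",
                        "conceptual", '"conversation at the table after a meal"@en'):
    print(" ".join(triple), ".")
```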
        <p>
          Regarding cross-lingual mappings at the linguistic level, mappings could be
established between the natural language descriptions of concepts. At this
level, lexical-semantic relations could be used (hypernym-hyponym, synonym,
antonym, translation, etc.). In the simplest cross-lingual case, a
property labelled "translation" or "cultural equivalent" (for instance) might be
established between the lexical realizations of the concepts [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. Novel representation models for ontology lexica [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] have to be explored for this task.
        </p>
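        <p>At the linguistic level, the link connects lexical realizations rather than concepts. The sketch below assumes a deliberately simplified lexical model of our own (hypothetical class and field names, not a published vocabulary) in which a Spanish term is linked both to a literal English translation and to a cultural equivalent that lexicalizes a different concept.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexicalEntry:
    """Simplified lexical entry: a written form in some language and the
    ontology concept it lexicalizes."""
    written_form: str
    lang: str
    concept: str


@dataclass(frozen=True)
class TranslationLink:
    """Hypothetical linguistic-level link between two lexical realizations."""
    source: LexicalEntry
    target: LexicalEntry
    category: str  # e.g. "translation" or "cultural equivalent"


def translations(links, entry, lang):
    """Target-language forms linked to an entry, with their link categories."""
    return [(link.target.written_form, link.category)
            for link in links
            if link.source == entry and link.target.lang == lang]


hacienda = LexicalEntry("Ministerio de Hacienda", "es", "ex_es:MinisterioDeHacienda")
finance = LexicalEntry("Ministry of Finance", "en", "ex_es:MinisterioDeHacienda")
treasury = LexicalEntry("Treasury", "en", "ex_en:Treasury")
links = [
    TranslationLink(hacienda, finance, "translation"),
    TranslationLink(hacienda, treasury, "cultural equivalent"),
]
print(translations(links, hacienda, "en"))
```

        <p>The literal translation lexicalizes the same (Spanish) concept in English, while the cultural equivalent points to a concept from an English-language ontology; keeping both as typed links between lexical entries, rather than collapsing them into concept equivalence, preserves that distinction.</p>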
        <p>We argue that specific representation models have to be able to define
specific relations between natural language descriptions in different languages, which
we term translation relations or cross-lingual relations. Closely related to this
issue is the representation of term variation at a monolingual or multilingual
level. A term variant has been defined as "an utterance which is semantically
and conceptually related to an original term" [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. Put simply, term variants can be regarded as synonymous terms that refer to the
same concept but highlight a different aspect of it. We believe that accounting for and
representing term variants would also contribute to the automatic linking of the lexical
descriptions associated with concepts (within or across languages).</p>
        <p>Further, to facilitate the processing and interchange of alignments, specific
formats have been proposed in the literature, such as the Alignment Format<sup>2</sup> or
the EDOAL language<sup>3</sup>. They should be explored and, if needed, extended to
accommodate the representation of cross-lingual and multilingual alignments.</p>
        <fn-group>
          <fn id="fn1">
            <label>1</label>
            <p>See, for instance, the systems that participated in the OAEI 2011.5 MultiFarm track: <ext-link ext-link-type="uri" xlink:href="http://oaei.ontologymatching.org/2011.5/multifarm/index.html">http://oaei.ontologymatching.org/2011.5/multifarm/index.html</ext-link></p>
          </fn>
        </fn-group>
      </sec>
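      <p>As a starting point for such an extension, the following sketch models a correspondence on the four core fields of an Alignment Format cell (entity1, entity2, relation and measure) and adds two hypothetical fields recording the language of each entity. It is an in-memory illustration only, not the Alignment API or its RDF/XML serialization.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """A correspondence modelled on the core fields of an Alignment Format
    cell, plus two hypothetical fields for the entities' languages."""
    entity1: str
    entity2: str
    relation: str      # "=" denotes equivalence, as in the Alignment Format
    measure: float     # confidence degree between 0 and 1
    lang1: str = ""    # hypothetical extension: language of entity1's labels
    lang2: str = ""    # hypothetical extension: language of entity2's labels


def cells_for_pair(cells, lang1, lang2, min_measure=0.0):
    """Select the correspondences of one language pair above a confidence."""
    return [c for c in cells
            if c.lang1 == lang1 and c.lang2 == lang2
            and c.measure >= min_measure]


alignment = [
    Cell("ex_es:Autor", "ex_en:Author", "=", 0.95, "es", "en"),
    Cell("ex_de:Autor", "ex_en:Author", "=", 0.90, "de", "en"),
    Cell("ex_es:Articulo", "ex_en:Paper", "=", 0.40, "es", "en"),
]
for cell in cells_for_pair(alignment, "es", "en", min_measure=0.5):
    print(cell.entity1, cell.relation, cell.entity2, cell.measure)
```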
      <sec id="sec-2-3">
        <title>Cross-lingual Link Storage and Reuse</title>
        <p>Cross-lingual links can be discovered either at runtime or offline. However, owing to the
growing size and dynamic nature of the Web, it is unrealistic to conceive of a
Semantic Web in which all possible cross-lingual links are established beforehand.
Thus, scalable techniques to dynamically discover cross-lingual links at the request
of semantic applications have to be investigated. Although the scalability
requirement is not specific to the multilingual dimension of ontology matching,
multilingualism exacerbates the problem by introducing a higher degree of
heterogeneity and a potential explosion in the number of language pairs to be compared.</p>
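        <p>A back-of-the-envelope calculation shows how quickly the number of language pairs grows. With n languages, matching every language directly against every other requires n(n-1)/2 unordered language pairs, whereas projecting all of them into a single pivot language (as in the first discovery technique described above) requires only n-1 pairs, at the price of depending on the quality of translation into the pivot.</p>

```python
from math import comb


def direct_pairs(n_languages: int) -> int:
    """Unordered language pairs if every language is matched with every other."""
    return comb(n_languages, 2)


def pivot_pairs(n_languages: int) -> int:
    """Pairs needed if every non-pivot language is matched only to the pivot."""
    return n_languages - 1


for n in (2, 5, 10, 24):
    print(n, direct_pairs(n), pivot_pairs(n))
```

        <p>For 10 languages this gives 45 direct pairs against 9 pivot pairs; for 24 languages, 276 against 23.</p>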
        <p>On the other hand, one can imagine some application scenarios (in restricted
domains and for a restricted number of languages) in which computing and storing
links for later reuse is a viable option. In that case, suitable ways of storing
and representing cross-lingual links become crucial. Links computed at runtime
could also be stored and made available online, thus building up a pool of
cross-lingual links that grows over time. Such online links should follow the Linked
Data principles to facilitate their later access and reuse by other applications.</p>
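        <p>A pool of cross-lingual links that grows as links are computed can be sketched as a cache placed in front of an on-demand matcher. In this minimal Python sketch, the matcher is a hypothetical function passed in by the caller, links are keyed by resource and target language, and stored links are exported as owl:sameAs triples ready to be published as Linked Data; negative results are cached too, so the expensive matcher runs at most once per key.</p>

```python
from typing import Callable, Optional


class LinkPool:
    """Pool of cross-lingual links: discovered on demand, stored for reuse."""

    def __init__(self, discover: Callable[[str, str], Optional[str]]):
        self._discover = discover  # expensive matcher, called only on a miss
        self._links = {}           # (resource, target_lang) -> linked resource
        self.discovery_calls = 0

    def link(self, resource: str, target_lang: str) -> Optional[str]:
        key = (resource, target_lang)
        if key not in self._links:  # failures (None) are cached as well
            self.discovery_calls += 1
            self._links[key] = self._discover(resource, target_lang)
        return self._links[key]

    def triples(self):
        """Stored links as owl:sameAs triples, e.g. for publication."""
        return [(src, "owl:sameAs", tgt)
                for (src, _lang), tgt in self._links.items() if tgt is not None]


def toy_discover(resource, target_lang):
    """Hypothetical matcher: a fixed lookup table stands in for a real one."""
    return {("ex_es:Madrid", "en"): "ex_en:Madrid"}.get((resource, target_lang))


pool = LinkPool(toy_discover)
pool.link("ex_es:Madrid", "en")
pool.link("ex_es:Madrid", "en")     # served from the pool, no new discovery
pool.link("ex_es:Sobremesa", "en")  # no link found; the failure is cached
print(pool.discovery_calls)  # 2
print(pool.triples())  # [('ex_es:Madrid', 'owl:sameAs', 'ex_en:Madrid')]
```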
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Conclusions</title>
      <p>In this paper we have motivated the study of cross-lingual ontology links as one
of the fundamental challenges to be solved in order to attain the goals of a truly
multilingual Web of Data. In particular, there are three subproblems to address,
namely cross-lingual link discovery, representation, and reuse. We have given an
overview of the characteristics of each of them and identified some relevant
research topics that have to be further explored as part of the solution, for
instance, the representation of cross-lingual links at the linguistic level and the
study of cross-lingual semantic measures and cross-lingual ontology alignment
techniques. In our view, such topics require more attention from the community and
will be crucial to enable multilingual capabilities on the Web of Data.</p>
      <p><bold>Acknowledgments.</bold> This work is supported by the EU project Monnet
(FP7-248458), the Spanish national project BabeLData (TIN2010-17550), and the
Spanish Ministry of Economy and Competitiveness within the Juan de la Cierva
program.</p>
      <fn-group>
        <fn id="fn2">
          <label>2</label>
          <p><ext-link ext-link-type="uri" xlink:href="http://alignapi.gforge.inria.fr/format.html">http://alignapi.gforge.inria.fr/format.html</ext-link></p>
        </fn>
        <fn id="fn3">
          <label>3</label>
          <p><ext-link ext-link-type="uri" xlink:href="http://alignapi.gforge.inria.fr/edoal.html">http://alignapi.gforge.inria.fr/edoal.html</ext-link></p>
        </fn>
      </fn-group>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>T.</given-names>
            <surname>Berners-Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hendler</surname>
          </string-name>
          , and
          <string-name>
            <given-names>O.</given-names>
            <surname>Lassila</surname>
          </string-name>
          .
          <article-title>The semantic web</article-title>
          .
          <source>Scientific American</source>
          ,
          <volume>284</volume>
          (
          <issue>5</issue>
          ):
          <fpage>34</fpage>
          -
          <lpage>43</lpage>
          , May
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Heath</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Berners-Lee</surname>
          </string-name>
          .
          <article-title>Linked data - the story so far</article-title>
          .
          <source>International Journal on Semantic Web and Information Systems (IJSWIS)</source>
          ,
          <volume>5</volume>
          (
          <issue>3</issue>
          ):
          <fpage>1</fpage>
          -
          <lpage>22</lpage>
          , Mar.
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>B.</given-names>
            <surname>Daille</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Habert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Jacquemin</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Royaute</surname>
          </string-name>
          .
          <article-title>Empirical observation of term variations and principles for their description</article-title>
          .
          <source>Terminology</source>
          ,
          <volume>3</volume>
          (
          <issue>2</issue>
          ):
          <fpage>197</fpage>
          -
          <lpage>258</lpage>
          ,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>M.</given-names>
            <surname>Espinoza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Montiel-Ponsoda</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Gómez-Pérez</surname>
          </string-name>
          .
          <article-title>Ontology Localization</article-title>
          .
          <source>In Proceedings of the 5th International Conference on Knowledge Capture (K-CAP'09)</source>
          , pages
          <fpage>33</fpage>
          -
          <lpage>40</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>J.</given-names>
            <surname>Gracia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Montiel-Ponsoda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Cimiano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gómez-Pérez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Buitelaar</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>McCrae</surname>
          </string-name>
          .
          <article-title>Challenges for the multilingual web of data</article-title>
          .
          <source>Journal of Web Semantics</source>
          ,
          <volume>11</volume>
          :
          <fpage>63</fpage>
          -
          <lpage>71</lpage>
          , Mar.
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>J.</given-names>
            <surname>McCrae</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. A.</given-names>
            <surname>de Cea</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Buitelaar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Cimiano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Declerck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gómez-Pérez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gracia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Hollink</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Montiel-Ponsoda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Spohr</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Wunner</surname>
          </string-name>
          .
          <article-title>Interchanging lexical resources on the semantic web</article-title>
          .
          <source>Language Resources and Evaluation</source>
          ,
          <volume>46</volume>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>E.</given-names>
            <surname>Montiel-Ponsoda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gracia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. A.</given-names>
            <surname>de Cea</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Gómez-Pérez</surname>
          </string-name>
          .
          <article-title>Representing translations on the semantic web</article-title>
          .
          <source>In Proc. of the 2nd Workshop on the Multilingual Semantic Web, at ISWC'11, Bonn, Germany</source>
          , volume
          <volume>775</volume>
          of CEUR-WS, ISSN 1613-0073, pages
          <fpage>25</fpage>
          -
          <lpage>37</lpage>
          , Oct.
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>M.</given-names>
            <surname>Sabou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>d'Aquin</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Motta</surname>
          </string-name>
          .
          <article-title>Exploring the semantic web as background knowledge for ontology matching</article-title>
          .
          <source>J. Data Semantics</source>
          ,
          <volume>11</volume>
          :
          <fpage>156</fpage>
          -
          <lpage>190</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>P.</given-names>
            <surname>Sorg</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Cimiano</surname>
          </string-name>
          .
          <article-title>Exploiting Wikipedia for cross-lingual and multilingual information retrieval</article-title>
          .
          <source>Data &amp; Knowledge Engineering</source>
          ,
          <volume>74</volume>
          :
          <fpage>26</fpage>
          -
          <lpage>45</lpage>
          , Apr.
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>