<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Direct and Indirect Linking of Lexical Objects for Evolving Lexical Linked Data</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Graduate School of Language and Culture, Osaka University 1-8 Machikaneyama</institution>
          ,
          <addr-line>5600043 Toyonaka</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
      </contrib-group>
      <fpage>62</fpage>
      <lpage>67</lpage>
      <abstract>
        <p>Servicization of language resources in a Web-based environment has opened up the potential for dynamically combined virtual lexical resources. Evolving lexical linked data could be realized, provided that the links recovered or discovered among lexical resources are properly organized and maintained. This position paper examines a scenario in which lexical semantic resources are cross-linguistically enriched, and sketches how this scenario could come about while discussing the necessary ingredients. The discussion naturally covers how the existing lexicon modeling framework could be applied and how it should be extended.</p>
      </abstract>
      <kwd-group>
        <kwd>lexical linked data</kwd>
        <kwd>lexicon models</kwd>
        <kwd>multilingual lexical resources</kwd>
        <kwd>cross-lingual semantic similarity</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Servicization of language resources provides the potential of a dynamic lexical
resource [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], which realizes a virtual yet composite lexical resource by combining
servicized resources with a service workflow. Furthermore, it is expected that
the recovered/discovered relationships among lexical objects in existing language
resources can be organized as a secondary language resource, and hence can be
effectively reused [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. This direction could harmonize with the recent trend of
Linked Data, as the derived relationships are overlaid as links on top of
the primary lexical resources. We call such a lexical space, taken as a whole,
evolving lexical linked data.
      </p>
      <p>This position paper argues that, by opportunistically associating different
lexical resources across a language barrier, the relevant portions of those resources
can be gradually enriched and made public through the Linked
Data mechanism. It also argues that more relationships can be acquired
when a lexical semantic disparity exists.</p>
      <p>The presented work concentrates on WordNet-type semantic lexicons. Their
fundamental information structures are represented by the following lexical class
objects.</p>
      <list list-type="bullet">
        <list-item><p>A Lexical Entry comprises Forms and Senses.</p></list-item>
        <list-item><p>A Form can be a Lemma or a Phrase; the latter comprises more than one Lemma.</p></list-item>
        <list-item><p>A Sense denotes a Synset.</p></list-item>
        <list-item><p>A Synset is denoted by one or more Senses.</p></list-item>
        <list-item><p>Synsets are linked by one of the predefined Conceptual Relations.</p></list-item>
      </list>
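      <p>As an illustration only, the class objects listed above can be written down as a few Python data classes. This is a minimal sketch, not the model of any particular lexicon: the class and field names, the helper senses_denoting, and the toy entries for gadget and gizmo are assumptions made for this example.</p>
      <preformat preformat-type="code">
```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Synset:
    """A concept node; it is denoted by one or more Senses."""
    synset_id: str
    # Conceptual Relations as (relation_name, target_synset_id) pairs.
    relations: List[Tuple[str, str]] = field(default_factory=list)


@dataclass
class Sense:
    """A Sense denotes exactly one Synset."""
    sense_id: str
    synset: Synset


@dataclass
class Form:
    """A Form is a Lemma (one item) or a Phrase (more than one Lemma)."""
    lemmas: List[str]

    @property
    def is_phrase(self) -> bool:
        return len(self.lemmas) > 1


@dataclass
class LexicalEntry:
    """A Lexical Entry comprises Forms and Senses."""
    forms: List[Form]
    senses: List[Sense]


def senses_denoting(synset: Synset, entries: List[LexicalEntry]) -> List[Sense]:
    """Collect every Sense, across entries, that denotes the given Synset."""
    return [s for e in entries for s in e.senses if s.synset is synset]


# Hypothetical toy data: two English entries whose senses share one synset.
device = Synset("en-device")
gadget = Synset("en-gadget", relations=[("hypernym", device.synset_id)])
entries = [
    LexicalEntry([Form(["gadget"])], [Sense("gadget#1", gadget)]),
    LexicalEntry([Form(["gizmo"])], [Sense("gizmo#1", gadget)]),
]
print([s.sense_id for s in senses_denoting(gadget, entries)])
```
      </preformat>
      <p>Keeping the Sense as a separate object between a Lexical Entry and a Synset is what lets one Synset be denoted by several Senses, as the list requires.</p>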
    </sec>
    <sec id="sec-3">
      <title>Conceptual Framework of Evolving Lexical Linked Data</title>
      <p>Below we introduce a motivating example, in which the English query term gadget is
issued to search for a set of corresponding Japanese translations, each hopefully
grounded in a Japanese conceptual system. Suppose that, by using an appropriate translation
resource, we get two translations for gadget under the same sense division:
t1: “ガジェット” (gajetto), which is the transliteration of gadget, and
t2: “有用な機器” (yuuyounakiki), which is actually a two-word phrase.</p>
      <sec id="sec-3-1">
        <title>Direct Linking of Lexical Objects</title>
        <p>[Figure residue: link diagram with the recoverable labels t2 “yuuyounakiki”, a1, a2, and c2.]</p>
        <p>[...] into [有用な (yuuyouna)/Adj, 機器 (kiki)/Noun], a Phrase node is introduced
to associate this two-word phrase with its constituents through the c1 and c2 links.</p>
        <p>These successive operations are invoked directly while handling the query;
we thus call them direct linking of lexical objects. Note that, at this point, the ad-hoc Synset
node has yet to be grounded in the Japanese conceptual system.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Indirect Linking of Lexical Objects</title>
        <p>While the structure around t1 has been settled in the current configuration,
the structure around the ad-hoc Synset node for t2 can be further enriched, again by
seeking cross-lingual correspondences. Figure 2 summarizes the outcomes.</p>
        <p>Two cross-lingual synset-to-synset links (d1 and d2) are first introduced by
associating a sense of “機器” (kiki) with a sense of device, and a sense of “有用な”
(yuuyouna) with a sense of useful, respectively. By establishing d1, the semantic
head of the ad-hoc synset for t2 is identified and represented by the link
e1. The same holds for the semantic modifier of t2, for which the link e2 is
introduced. These operations also enable
the introduction of the link f, which, in a sense, shows that “ガジェット” (gajetto)
can be rephrased as “有用な機器” (yuuyounakiki).</p>
        <p>The story so far suggests the possibility of lexical knowledge
enrichment that takes advantage of the opportunity to interrelate lexical objects
across a language barrier. Note that a semantic gap brought about by
differences in lexicalization provides a further opportunity to enrich
the relevant range of the existing lexical structures.</p>
        <p>We could acquire more correspondences, as illustrated in Figure 3, by further
pursuing this strategy. In the figure, another ad-hoc Synset node is introduced in the
English lexical space, together with two semantic links (g1 and g2) that label the semantic
head and modifier of this ad-hoc synset. In addition, the ad-hoc Synset
node is linked to that of gadget by the link h, in parallel with the link f
in the Japanese lexical space. Notice again that the almost immediate introduction of
these links originates from the cross-lingual synset-to-synset matching that is
invoked to establish the correspondences represented by d1 and d2.</p>
        <p>[Figure residue: link diagram with the recoverable labels t1 “gajetto”, a1, a2, c1, d2, and e1.]</p>
        <p>We call these secondary operations, which are initiated after direct linking,
indirect linking. The lexical objects introduced in this motivating example are
examined in more detail in the next section to identify the elements needed to
realize the scenario.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Enabling Direct and Indirect Linking</title>
      <sec id="sec-4-1">
        <title>Modeling lexical information structure</title>
        <p>The basic lexicon model described in section 2 has to be extended in some ways.</p>
        <p>First, in the motivating example, two ad-hoc Synset nodes were introduced
to accommodate the two-word translation phrase t2 and the corresponding
virtual phrase in English (which could be verbalized as useful device). By
their nature, these nodes are ad hoc and represent a kind of complex concept that may be
lexicalized as a phrase rather than as a single word in a given language. Therefore, an
instance of the ad-hoc Synset class should have an attribute indicating that the
instance is of type complex, and could have Morpho-syntactic Head/Modifier
links (like c1 and c2) as well as Semantic Head/Modifier links (like e1, e2, g1, and g2).</p>
        <p>
          Second, some of the introduced links should be typed differently from those in the
existing lexicon model. Table 1 classifies the links introduced in the motivating
example. The link type #1 is of intrinsic importance in the presented
framework. As the correspondence between synsets in different languages
is rarely one of strict equivalence [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], it is necessary to label the relation type for each
cross-lingual synset-to-synset link instance. We could develop a proper label inventory,
presumably based on the one developed by EuroWordNet [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], while
considering more bilingual characteristics. The link type #5 is, in a sense, a variant
of the link type #1; the difference is whether or not the correspondence is cross-lingual.
We can therefore assume an upper class that subsumes these two link types.
        </p>
        <p>The link type #3 represents morpho-syntactic head/modifier relationships,
whereas the link type #4 represents semantic head/modifier relationships. As long as
semantic compositionality holds, these two link types exhibit a kind of parallel
structure, as illustrated in the example: the semantic links (e1 and e2; typed
#4) were eventually introduced in correspondence with the already existing
morpho-syntactic links (c1 and c2; typed #3).</p>
        <p>On the other hand, in cases where semantic compositionality does not
hold, we should refrain from introducing these semantic links, even if each of
the Japanese synsets could find its mate in the English lexical space. In
such a case, we have to devise an independent method to check semantic
compositionality, or we should seek more semantic constraints to apply, probably
from the English lexical space; this issue remains largely open for future work.</p>
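        <p>The gating logic discussed in the last two paragraphs might look like the following sketch. It is only an illustration under stated assumptions: the function propose_semantic_links, its parameters, and the synset identifiers are hypothetical, and the compositionality check is passed in as a callback precisely because the text leaves that method open.</p>
        <preformat preformat-type="code">
```python
from typing import Callable, Dict, Optional


def propose_semantic_links(
    constituents: Dict[str, str],
    mates: Dict[str, Optional[str]],
    is_compositional: Callable[[Dict[str, str]], bool],
) -> Dict[str, str]:
    """Mirror type #3 links as type #4 links only when it is safe to do so.

    constituents: role ("head" or "modifier") mapped to a source-language synset id.
    mates: source-language synset id mapped to its matched synset id, or None.
    is_compositional: caller-supplied check (hypothetical; no method is fixed here).
    """
    if not is_compositional(constituents):
        return {}  # refrain, even if every constituent has found a mate
    if any(mates.get(synset) is None for synset in constituents.values()):
        return {}  # at least one constituent has no cross-lingual mate yet
    return dict(constituents)  # role mapped to the target of the semantic link


# Toy data following the motivating example; identifiers are invented.
phrase = {"head": "ja:kiki", "modifier": "ja:yuuyouna"}
mates = {"ja:kiki": "en:device", "ja:yuuyouna": "en:useful"}

print(propose_semantic_links(phrase, mates, lambda _: True))
print(propose_semantic_links(phrase, mates, lambda _: False))
print(propose_semantic_links(phrase, {"ja:kiki": "en:device"}, lambda _: True))
```
        </preformat>
        <p>Only the first call proposes links; the second is blocked by the compositionality check and the third by a missing mate.</p>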
        <p>
          As for the actual modeling and representation of lexical resources, we can
rely on existing frameworks, including the ISO standard Lexical Markup
Framework (LMF) [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], and Lemon [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ].
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>Matching synsets across languages</title>
        <p>One of the most important elements is obviously a computational process for
finding a synset mate in another language. We are now studying a method to
calculate semantic similarity between synsets across languages, simply by
employing bilingual translation resources and probability distributions acquired
from a sense-tagged corpus in the target language.</p>
        <p>
          We can also apply and/or combine previously proposed methods. For
example, the method reported to be highly accurate in [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] may be applicable with
modifications, even though it computes similarity between words rather than between synsets;
the gloss-overlap-based method presented in [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] could also be readily applied
if we could translate the gloss in one language into another with reasonable
accuracy. However, even with a highly promising method at hand, any
synset-to-synset relation has to be established by choosing among computationally
proposed candidates. The underlying process thus has to incorporate human
intervention, where a collaborative operational environment plays a role.
        </p>
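        <p>To make the candidate-proposal step concrete, the sketch below ranks candidate synsets by plain Jaccard word overlap between a translated gloss and each candidate gloss. This is a deliberately simple stand-in and not the extended gloss overlap measure cited above, nor the method under study; the stopword list, the function names, the synset identifiers, and the glosses are all invented for illustration.</p>
        <preformat preformat-type="code">
```python
from typing import Dict, List, Set, Tuple

# Tiny illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = frozenset({"a", "an", "the", "of", "or", "for", "that", "to", "is"})


def content_words(gloss: str) -> Set[str]:
    """Lower-case, whitespace-tokenize, and drop stopwords."""
    return {w for w in gloss.lower().split() if w not in STOPWORDS}


def rank_candidates(
    translated_gloss: str, candidate_glosses: Dict[str, str]
) -> List[Tuple[str, float]]:
    """Return (synset_id, score) pairs, best first, using Jaccard overlap."""
    source = content_words(translated_gloss)
    scored = []
    for synset_id, gloss in candidate_glosses.items():
        target = content_words(gloss)
        union = source | target
        score = len(source.intersection(target)) / len(union) if union else 0.0
        scored.append((synset_id, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Invented glosses: a translated Japanese gloss and two English candidates.
translated = "a machine or tool made for a particular purpose"
candidates = {
    "en:crisis": "an unstable situation of extreme danger",
    "en:device": "an instrument or tool invented for a particular purpose",
}
ranked = rank_candidates(translated, candidates)
for synset_id, score in ranked:
    print(synset_id, round(score, 3))
```
        </preformat>
        <p>The ranked list is only a set of proposals; in line with the paragraph above, a human editor would still choose which candidate, if any, becomes a link.</p>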
      </sec>
      <sec id="sec-4-3">
        <title>Further issues</title>
        <p>The following issues have to be considered in implementing an effective
operating environment. First, we need a global mechanism to control the
indirect linking operations. As shown in the example, indirect links can be
introduced upon the establishment of a direct link. However, it is unclear who or what should decide
to initiate the indirect linking process. Moreover, the extent to which
indirect linking should be propagated remains uncertain. Second, we need
a proper vocabulary to annotate the lexical objects that participate
in direct/indirect linking operations. For example, we would need to know when
and how a particular link was established. We thus need a sort of
ontology for describing linking events, which naturally includes references to the
linguistic processes that were actually applied, as well as to the human approvals.</p>
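        <p>Both issues can be made tangible with a small provenance record. The following is a minimal sketch under loudly stated assumptions: the LinkingEvent fields, the helper functions, the process names, the approver id, and the trigger chain from c1 to d1 to e1 are hypothetical and do not represent a proposed ontology.</p>
        <preformat preformat-type="code">
```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional


def _now() -> str:
    """Current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


@dataclass
class LinkingEvent:
    """Provenance of one link: when and how it was established."""
    link_label: str
    processes: List[str]                # linguistic processes that were applied
    triggered_by: Optional[str] = None  # None marks a link made during the query
    approved_by: Optional[str] = None   # id of the human approver, if any
    created_at: str = field(default_factory=_now)


def propagation_depth(label: str, events: Dict[str, LinkingEvent]) -> int:
    """Count the triggering steps between this link and a direct link."""
    steps = 0
    current = events[label]
    while current.triggered_by is not None:
        current = events[current.triggered_by]
        steps += 1
    return steps


def may_propagate(label: str, events: Dict[str, LinkingEvent], max_depth: int) -> bool:
    """Allow a further indirect link only while the depth limit is not reached."""
    return max_depth > propagation_depth(label, events)


def pending_approval(events: Dict[str, LinkingEvent]) -> List[str]:
    """Labels of links that no human has approved yet."""
    return [e.link_label for e in events.values() if e.approved_by is None]


# Illustrative chain only; the trigger relations are assumed for this example.
events = {
    "c1": LinkingEvent("c1", ["morphological analysis"], approved_by="editor-1"),
    "d1": LinkingEvent("d1", ["cross-lingual synset matching"], "c1", "editor-1"),
    "e1": LinkingEvent("e1", ["semantic head identification"], "d1"),
}
print(propagation_depth("e1", events), may_propagate("e1", events, 2))
print(pending_approval(events))
```
        </preformat>
        <p>A depth limit of this kind is one possible answer to how far indirect linking should propagate; who sets the limit is left open here, as it is in the text.</p>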
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Concluding Remarks</title>
      <p>This position paper presented the notion of evolving lexical linked data, in which
recovered/discovered relationships among lexical objects are published as links.
It also argued that the associated lexical resources can be enriched further,
particularly in cases where a lexical semantic disparity exists.</p>
      <p>Acknowledgments. The presented work was supported by KAKENHI (21520401)
provided by MEXT, Japan, and the Strategic Information and Communications
R&amp;D Promotion Programme (SCOPE) of the Ministry of Internal Affairs and
Communications of Japan.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Agirre</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alfonseca</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          , et al.:
          <article-title>A Study on Similarity and Relatedness Using Distributional and WordNet-based Approaches</article-title>
          .
          <source>In: NAACL-HLT2009</source>
          , pp.
          <fpage>19</fpage>
          –
          <lpage>27</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Banerjee</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Pedersen</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Extended Gloss Overlaps as a Measure of Semantic Relatedness</article-title>
          .
          <source>In: IJCAI</source>
          <year>2003</year>
          , pp.
          <fpage>805</fpage>
          –
          <lpage>810</lpage>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Buitelaar</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cimiano</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , et al.:
          <article-title>Towards Linguistically Grounded Ontologies</article-title>
          .
          <source>In: ESWC</source>
          <year>2009</year>
          , pp.
          <fpage>111</fpage>
          –
          <lpage>125</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Calzolari</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Approaches towards a 'Lexical Web': the Role of Interoperability</article-title>
          .
          <source>In: ICGL</source>
          <year>2008</year>
          , pp.
          <fpage>34</fpage>
          –
          <lpage>42</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Francopoulo</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bel</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          et al.:
          <article-title>Multilingual Resources for NLP in the Lexical Markup Framework (LMF)</article-title>
          .
          <source>Language Resources and Evaluation</source>
          , Vol.
          <volume>43</volume>
          , No.
          <issue>1</issue>
          , pp.
          <fpage>57</fpage>
          –
          <lpage>70</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Hayashi</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>A Representation Framework for Cross-lingual/Interlingual Lexical Semantic Correspondences</article-title>
          .
          <source>In: IWCS</source>
          <year>2011</year>
          , pp.
          <fpage>155</fpage>
          –
          <lpage>164</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Hirst</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Ontology and the Lexicon</article-title>
          . In:
          <string-name>
            <surname>Staab</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Studer</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          (eds.):
          <source>Handbook on Ontologies, Second Edition</source>
          . Springer, pp.
          <fpage>269</fpage>
          –
          <lpage>292</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Isahara</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bond</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          , et al.:
          <article-title>Development of the Japanese WordNet</article-title>
          .
          <source>In: LREC</source>
          <year>2008</year>
          , pp.
          <fpage>2420</fpage>
          –
          <lpage>2423</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Vossen</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>EuroWordNet: A Multilingual Database of Autonomous and Language-Specific Wordnets Connected via an Inter-lingual Index</article-title>
          .
          <source>International Journal of Lexicography</source>
          , Vol.
          <volume>17</volume>
          , No.
          <issue>2</issue>
          , pp.
          <fpage>161</fpage>
          –
          <lpage>173</lpage>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>