<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>English-Russian WordNet for Multilingual Mappings</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sergey Yablonsky</string-name>
          <email>serge_yablonsky@hotmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>St. Petersburg State University</institution>
          ,
          <addr-line>Volkhovsky Per. 3, St. Petersburg, 199004</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper reports the current results of the development of the English-Russian WordNet. It describes the use of English-Russian lexical language resources and software to process the English-Russian WordNet, and the design of an XML/RDF/OWL markup of the English-Russian WordNet resources. Relevant aspects of the DTD/XML/RDF/OWL formats and related technologies are surveyed.</p>
      </abstract>
      <kwd-group>
        <kwd>WordNet</kwd>
        <kwd>English-Russian WordNet</kwd>
        <kwd>Grid</kwd>
        <kwd>Semantic Web</kwd>
        <kwd>RDF</kwd>
        <kwd>OWL</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>The Semantic Web, a Web with meaning, is often associated with specific
XML-based standards for semantics, such as RDF and OWL [http://www.w3.org/RDF/,
http://www.w3.org/TR/owl-features/]. If HTML and the Web made all the online
documents look like one huge book, RDF, schema, and inference languages will make
all the data in the world look like one huge database. One of the key promises of the
Semantic Web is that it will provide the necessary infrastructure for enabling services
and applications on the Web to automatically aggregate and integrate information into
a sum which is greater than the individual parts. So the Semantic Web should enable
users to locate, select, employ, compose, and monitor Web-based services
automatically. To make use of a Web service, a software agent needs a
computer-interpretable description of the service and the means by which it is accessed. An
important goal for Semantic Web markup languages is to establish a framework
within which these descriptions are made and shared. Web sites should be able to
employ a standard ontology, consisting of a set of basic classes and properties, for
declaring and describing services, while the cross-lingual ontology structuring
mechanisms of OWL provide an appropriate, Web-compatible representation
language framework within which to do this.</p>
      <p>
        Today, Web-compatible representation language frameworks are usually based on
lexical ontologies. Wordnets are cross-lingual lexical ontologies, including
information on hypernyms, synonyms, polysemous terms, relations between terms,
and sometimes multilingual equivalents. Wordnets are valuable resources as sources
of ontological distinctions. WordNets provide a conceptual framework for
multilingual mappings in ontologies. Linking concepts across many cross-lingual
lexicons belonging to the WordNet-family started by using the Interlingual Index
(ILI) [
        <xref ref-type="bibr" rid="ref3">2</xref>
        ]. Unfortunately, no version of the ILI can be considered a standard, and
the various lexicons often use different versions of WordNet as the ILI.
      </p>
      <p>At the 3rd GWA Conference in Korea, the idea was launched to start building
a WordNet grid around a set of Common Base Concepts expressed in terms of WordNet
synsets and SUMO definitions (http://www.globalwordnet.org/gwa/gwa_grid.htm).
This first version of the Grid was planned to be built around the set of 4689 Common
Base Concepts. Since then, only three languages, with substantially different numbers of
synsets and different WordNet versions, have been placed in the Grid mappings (English:
4689 synsets with WN 2.0 mapping; Spanish: 15556 synsets with WN 1.6 mapping;
Catalan: 12942 synsets with WN 1.6 mapping). There is still no official format
for the Global WordNet Grid, and so far only 3 files exist in the specified format.</p>
      <p>
        This paper reports the current results of the English-Russian WordNet
development [
        <xref ref-type="bibr" rid="ref3 ref4 ref5">2, 3, 4</xref>
        ]. It describes the use of Russian and English-Russian lexical
language resources and software to process the English-Russian WordNet and the
English-Russian WordNet Grid (4600 synsets with WN 3.0 mapping), and the design of an
XML/RDF/OWL markup of the English-Russian WordNet resources. Relevant
aspects of the DTD/XML/RDF/OWL formats and related technologies are surveyed.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2 Lexical Resources</title>
      <sec id="sec-2-1">
        <title>2.1 Lexical Resources for English-Russian WordNet</title>
        <p>In December 2003 our research group obtained a license from OUP to explore and
use the following language resources for research purposes:
– Oxford Russian Dictionary;
– New Oxford Dictionary of English, 2nd Edition;
– New Oxford Thesaurus of English.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2 Language Software</title>
        <p>For many linguistic tasks of WordNet development we use the Russicon language
processor, which includes the following main blocks:</p>
      </sec>
      <sec id="sec-2-3">
        <title>System for construction and support of machine dictionaries</title>
        <p>The system retrieves morphological information for a word, builds its
normal form, shows its paradigm, and constructs a lexicon of new words
and a frequency lexicon (Fig. 1).</p>
      </sec>
      <sec id="sec-2-4">
        <title>Morphological analyzer and normalizer</title>
        <p>
          The theoretical foundation of the morphological analyzer and normalizer
program is a language-independent model of morphological analysis [
          <xref ref-type="bibr" rid="ref7 ref8 ref9">6-8</xref>
          ].
The morphological analyzer and normalizer a) determines the following
grammatical characteristics of a word: part of speech, case, gender, number,
tense, person, degree of comparison, voice, aspect, mood, form, type,
transitivity, reflexivity, and animacy; and b) reduces a given word to its normal
grammatical form(s), i.e. its lemma(s).
        </p>
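        <p>The two functions described above (analysis and normalization) can be illustrated with a minimal Python sketch. It is not the Russicon implementation: the toy lexicon, the class name Analysis, and the functions analyze and normalize are all hypothetical and only show how one word form may yield several grammatical readings and several lemmas.</p>
        <preformat><![CDATA[
```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Analysis:
    """One morphological reading of a word form (illustrative only)."""
    lemma: str
    part_of_speech: str
    features: Dict[str, str] = field(default_factory=dict)


# Hypothetical toy lexicon: word form -> all of its possible readings.
TOY_LEXICON: Dict[str, List[Analysis]] = {
    "banks": [
        Analysis("bank", "noun", {"number": "plural"}),
        Analysis("bank", "verb", {"tense": "present", "person": "3"}),
    ],
    "банки": [
        Analysis("банк", "noun", {"case": "nominative", "number": "plural"}),
        Analysis("банка", "noun", {"case": "genitive", "number": "singular"}),
    ],
}


def analyze(word: str) -> List[Analysis]:
    """Return every reading of the word form; empty list if unknown."""
    return TOY_LEXICON.get(word.lower(), [])


def normalize(word: str) -> List[str]:
    """Return the distinct lemmas of the word form, in lexicon order."""
    lemmas: List[str] = []
    for reading in analyze(word):
        if reading.lemma not in lemmas:
            lemmas.append(reading.lemma)
    return lemmas
```
]]></preformat>
        <p>In this sketch, normalize("банки") returns two lemmas because the form is ambiguous between two nouns, which is the "lemma(s)" case mentioned above.</p>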
      </sec>
      <sec id="sec-2-5">
        <title>WordNet Editor</title>
        <p>The WordNet editor TenDrow was developed to support joint production of the Russian
WordNet from the above-mentioned linguistic resources. It allows users to merge synsets
from the thesaurus, explanatory, and other dictionaries, and to process relations between
synsets and between the words of synsets. The editor is not only a viewer but also a working
tool for constructing and editing multilingual and monolingual WordNets. It is a
database management system in which users (linguists or knowledge engineers)
can create, edit, and view the English and Russian WordNet data (Fig. 2).</p>
        <p>Fig. 2. TenDrow: I - main form; II - word search panel; III - synset search panel; IV - synset editor.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3 English-Russian WordNet translation</title>
      <p>There are usually several standard variants (Fig. 3, a, b, c, d) of translation equivalents
in the English-Russian WordNet and the English-Russian WordNet Grid.</p>
      <p>The simplest is variant a. Approximately 24000 English-Russian synsets could
be translated in this way. The hardest is variant d, because this kind of translation
destroys the normal mapping and forms additional sub-mappings. More than 15000
English synsets have no direct word-to-word translation into Russian.</p>
    </sec>
    <sec id="sec-4">
      <title>4 English-Russian WordNet [Grid] construction</title>
      <p>The English-Russian WordNet was ported to XML using the DTD
for the XML structure from http://www.globalwordnet.org/gwa/gwa_grid.htm and the
DTD from the Arabic WordNet (http://www.globalwordnet.org/AWN/DataSpec.html).
We could use it unchanged for the English and Russian languages.</p>
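      <p>As an illustration of what such a port involves, the following Python sketch serializes one bilingual synset record to XML. The element and attribute names (SYNSET, LITERAL, id, pos, lang) and the synset identifier are hypothetical placeholders; they are not taken from the Global WordNet Grid or Arabic WordNet DTDs, which should be consulted for the actual structure.</p>
      <preformat><![CDATA[
```python
import xml.etree.ElementTree as ET


def synset_to_xml(synset_id, pos, english_words, russian_words):
    """Build one bilingual synset element (hypothetical element names)."""
    synset = ET.Element("SYNSET", {"id": synset_id, "pos": pos})
    for lang, words in (("en", english_words), ("ru", russian_words)):
        for word in words:
            # One LITERAL child per word, tagged with its language.
            literal = ET.SubElement(synset, "LITERAL", {"lang": lang})
            literal.text = word
    return synset


# Placeholder identifier, not a real WordNet offset.
record = synset_to_xml("example-0001-n", "n", ["bank"], ["банк"])
xml_text = ET.tostring(record, encoding="unicode")
```
]]></preformat>
      <p>Keeping the language as an attribute of each literal, rather than using separate element names per language, is one way the same structure could serve both English and Russian.</p>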
      <p>
        The English-Russian DTD and XML format for the English-Russian WordNet and
the English-Russian WordNet Grid are shown in Fig. 4 and Fig. 5. The WordNet Task Force [
        <xref ref-type="bibr" rid="ref10">9</xref>
        ]
developed a new approach to WordNet RDF conversion. The W3C WordNet project
is still being completed at the level of both schema and data
(http://www.w3.org/2001/sw/BestPractices/WNET/wn-conversion.html). This approach was used
to port the English-Russian WordNet and the English-Russian WordNet Grid into
RDF and OWL.
      </p>
      <p>Several issues remain open: how to support different versions of WordNet in
XML/RDF/OWL, how to define the relationships between them, and how to
integrate WordNet with sources in other languages.</p>
    </sec>
    <sec id="sec-7">
      <title>5 Framework architecture for English-Russian WordNet Grid Improvement</title>
      <p>XMLSpy 2007 and Oracle 11g were used to manage the WordNet Semantic Web
models; they provided important XML/RDF/OWL support for data modeling and
editing of the XML/RDF/OWL WordNet models. The W3C SPARQL specification defines the syntax
and semantics of the SPARQL query language for RDF. SPARQL can be used to
express queries across diverse data sources, whether the data is stored natively as
RDF or viewed as RDF via middleware. SPARQL contains capabilities for querying
required and optional graph patterns along with their conjunctions and disjunctions.
SPARQL also supports extensible value testing and constraining queries by source
RDF graph. The results of SPARQL queries can be result sets or RDF graphs
(http://www.w3.org/TR/rdf-sparql-query/).</p>
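      <p>To make the notion of a conjunction of required graph patterns concrete, the following Python sketch matches triple patterns against a small in-memory list of triples. It is a simplified illustration, not a SPARQL engine: the node names in TRIPLES are invented, language tags and optional patterns are ignored, and only the property names follow the W3C WordNet 2.0 RDF schema namespace.</p>
      <preformat><![CDATA[
```python
WN = "http://www.w3.org/2006/03/wn/wn20/schema/"

# Hypothetical mini-graph as (subject, predicate, object) triples.
TRIPLES = [
    ("synset-bank-1", WN + "containsWordSense", "wordsense-bank-1"),
    ("wordsense-bank-1", WN + "word", "word-bank"),
    ("synset-bank-2", WN + "containsWordSense", "wordsense-bank-2"),
    ("wordsense-bank-2", WN + "word", "word-bank"),
    ("word-bank", WN + "lexicalForm", "bank"),
    ("synset-river-1", WN + "containsWordSense", "wordsense-river-1"),
    ("wordsense-river-1", WN + "word", "word-river"),
    ("word-river", WN + "lexicalForm", "river"),
]


def match_pattern(pattern, triple, binding):
    """Extend binding so that pattern matches triple, or return None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def query(patterns, triples):
    """Return all variable bindings that satisfy every pattern."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


PATTERNS = [
    ("?aSynset", WN + "containsWordSense", "?aWordSense"),
    ("?aWordSense", WN + "word", "?aWord"),
    ("?aWord", WN + "lexicalForm", "bank"),
]
synsets = sorted({b["?aSynset"] for b in query(PATTERNS, TRIPLES)})
```
]]></preformat>
      <p>Each pattern narrows the set of candidate bindings produced by the previous one, which is the join behaviour that a conjunction of triple patterns expresses.</p>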
      <p>Example. The following query retrieves all Synsets that contain a Word with the lexical
form "bank" (http://www.w3.org/TR/wordnet-rdf/):</p>
      <preformat>PREFIX wn20schema: &lt;http://www.w3.org/2006/03/wn/wn20/schema/&gt;

SELECT ?aSynset
WHERE { ?aSynset wn20schema:containsWordSense ?aWordSense .
        ?aWordSense wn20schema:word ?aWord .
        ?aWord wn20schema:lexicalForm "bank"@en-US }</preformat>
      <p>
        The proposed semantic framework [
        <xref ref-type="bibr" rid="ref9">8</xref>
        ] for grid improvement is based on the following main
components (Fig. 6): an RDF/OWL store; tools for information extraction; tools for the
Ontology Engineering Modeling Process; and knowledge mining, SPARQL/SQL search,
and analysis tools.
      </p>
      <p>Today, of the 117659 English WordNet synsets, more than 50000 have been
translated from English to Russian and evaluated. We are currently designing a Web 2.0
wiki system for translation, similar to http://www.asianwordnet.org/. At the same
time we have enriched the English-Russian WordNet with 30000 English-Russian
translations from the DBpedia (http://dbpedia.org/) and LOD (http://linkeddata.org/) RDF
stores.</p>
      <p>Wordnets have been created in more than 50 other languages
(http://www.globalwordnet.org/gwa/wordnet_table.htm).</p>
      <p>This work was partly funded by The Russian Foundation for Basic Research grant
10-07-90005.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Fellbaum</surname>
          </string-name>
          , C. (ed.):
          <article-title>WordNet: An Electronic Lexical Database</article-title>
          . Bradford Books
          (
          <year>1998</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          1.
          <string-name>
            <surname>Vossen</surname>
            ,
            <given-names>P.:</given-names>
          </string-name>
          <article-title>EuroWordNet: A Multilingual Database with Lexical Semantic Network</article-title>
          . Dordrecht (
          <year>1998</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          2.
          <string-name>
            <surname>Balkova</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suhonogov</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yablonsky</surname>
            ,
            <given-names>S. A.</given-names>
          </string-name>
          :
          <article-title>Russian WordNet. From UML-notation to Internet/Intranet Database Implementation</article-title>
          . In: Proceedings of the Second International WordNet Conference (GWC
          <year>2004</year>
          ), pp.
          <fpage>31</fpage>
          -
          <lpage>38</lpage>
          .
          Brno
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          3.
          <string-name>
            <surname>Balkova</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suhonogov</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yablonsky</surname>
            ,
            <given-names>S. A.</given-names>
          </string-name>
          :
          <article-title>Some Issues in the Construction of a Russian WordNet Grid</article-title>
          .
          <source>In: Proceedings of the Fourth International WordNet Conference (GWC</source>
          <year>2008</year>
          ), pp.
          <fpage>44</fpage>
          -
          <lpage>55</lpage>
          , Szeged, Hungary, January 22-25
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          4.
          <string-name>
            <surname>Yablonsky</surname>
            <given-names>S. A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suhonogov</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Semi-Automated English-Russian WordNet Construction: Initial Resources, Software and Methods of Translation</article-title>
          .
          <source>In: Proceedings of the Third International WordNet Conference (GWC</source>
          <year>2006</year>
          ), South Jeju Island, Korea, January 22-26
          (
          <year>2006</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          5.
          <string-name>
            <surname>Yablonsky</surname>
            <given-names>S. A.</given-names>
          </string-name>
          :
          <article-title>Russicon Slavonic Language Resources and Software</article-title>
          . In: A.
          <string-name>
            <surname>Rubio</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <string-name>
            <surname>Gallardo</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Castro</surname>
          </string-name>
          &amp; A. Tejada (eds.)
          <source>Proceedings First International Conference on Language Resources &amp; Evaluation</source>
          , Granada, Spain (
          <year>1998</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          6.
          <string-name>
            <surname>Yablonsky</surname>
            <given-names>S. A.</given-names>
          </string-name>
          :
          <article-title>Russian Morphological Analyses</article-title>
          .
          <source>In: Proceedings of the International Conference VEXTAL, November 22-24</source>
          , pp.
          <fpage>83</fpage>
          -
          <lpage>90</lpage>
          , Venezia, Italia (
          <year>1999</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          7.
          <string-name>
            <surname>Yablonsky</surname>
            <given-names>S. A.</given-names>
          </string-name>
          :
          <article-title>Russian Morphology: Resources and Java Software Applications</article-title>
          .
          <source>In: Proceedings EACL03 Workshop Morphological Processing of Slavic Languages</source>
          , Budapest, Hungary (
          <year>2003</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          8.
          <string-name>
            <surname>Yablonsky</surname>
            <given-names>S. A.</given-names>
          </string-name>
          :
          <article-title>Semantic Web Framework for Development of Very Large Ontologies</article-title>
          ,
          <source>POLIBITS</source>
          , Issue
          <volume>39</volume>
          ,
          <fpage>19</fpage>
          -
          <lpage>26</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>9. WordNet OWL Ontology, http://www2.unine.ch/imi/page11291_en.html </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>