<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Multilingual Terminologies with OntoLex-Lemon</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Patricia Martín-Chozas</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Thierry Declerck</string-name>
          <email>declerck@dfki.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>Terminologies, Multilingualism, Formal Representation, OntoLex-Lemon</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>German Research Center for Artificial Intelligence GmbH (DFKI), Multilinguality and Language Technology Lab</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Ontology Engineering Group, Universidad Politécnica de Madrid</institution>
          ,
          <addr-line>Avda. Montepríncipe, s/n, Boadilla del Monte, 28660</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Saarland Informatics Campus D3 2</institution>
          ,
          <addr-line>Stuhlsatzenhausweg 3, 66123 Saarbrücken</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper is framed within a project to make multilingual terminologies available in a native graph representation format. We explore the use of the OntoLex-Lemon model, and suggest some extensions to it, to achieve a declarative encoding of the relations between multilingual expressions contained in terminologies. The model is used to encode not only terms but also their associated definitions, contexts and notes. With this effort, we aim at supporting the publication of multilingual terminologies. In the context of work on converting multilingual terminologies to an RDF model, we faced modelling decisions that also concern the additional language data included in such resources. While the original purpose of the porting exercise is not to change anything at the level of the content of the considered terminologies, modelling them in a graph-based representation offers possibilities for interlinking and merging them with other resources, whether other terminologies or other types of data, such as detailed lexicographic resources. Thus, the focus of our work is a possibly improved formal representation of the language data used in multilingual terminologies. In this short paper we discuss a few decision points concerning our modelling strategy, and compare our work with a directly related earlier approach.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>https://github.com/pmchozas/ (P. Martín-Chozas); https://www.dfki.de/~declerck/ (T. Declerck)</p>
      <p>© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings.</p>
    </sec>
    <sec id="sec-2">
      <title>2. The Data Basis: Two Terminological Resources</title>
      <p>Currently, we consider two terminological resources as the input for our transformation work:
the multilingual terminology of the Deutsche Bahn (German Railways), which is encoded
in the TBX<sup>2</sup> standard and can be accessed online;<sup>3</sup> and IATE (Interactive Terminology for
Europe),<sup>4</sup> one of the most representative terminological databases in Europe. We consider
the latter because a previous exercise focused on converting the data
contained in IATE, structured in TBX, into RDF. That effort is a useful baseline against which to compare
our approach.</p>
    </sec>
    <sec id="sec-3">
      <title>3. The TBX2RDF Guidelines</title>
      <p>
        The earlier LIDER project<sup>5</sup> was already concerned with mapping TBX to RDF, with the goal of
transforming and publishing terminologies as Linked Data [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. LIDER developed guidelines
for this task<sup>6</sup> in which TBX elements are converted into OWL<sup>7</sup> and associated with other RDF
vocabularies, while the basic vocabularies chosen as the backbone of the conversion were SKOS<sup>8</sup>
and the lemon model [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], a predecessor of the OntoLex-Lemon framework [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] we are using. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]
describes the TBX2RDF approach<sup>9</sup> and [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] presents recent developments related to this initiative,
which rely on a virtualization approach based on containerization technologies.
      </p>
      <p>The LIDER TBX2RDF approach represents the TBX terminological concepts as
skos:Concept and the TIG/NTIG elements of TBX as ontolex:LexicalEntry. Most of the other TBX
elements are mapped straightforwardly onto RDF, meaning that they are encoded as URIs
representing a resource that can be associated with RDF predicates and objects. We also note
that TBX2RDF does not represent the TBX langSet data as such, but instead creates
language-specific lexicons in which all the data included in the original langSet element are encoded.</p>
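      <p>The mapping described above can be illustrated with a minimal sketch. It is not the TBX2RDF converter itself: it assumes a toy dictionary in place of a parsed TBX term entry, writes triples as plain tuples of prefixed names, and uses a made-up ex: namespace. The use of lime:Lexicon, lime:language and lime:entry for the language-specific lexicons is an assumption, and ex:linkedConcept is a hypothetical placeholder for the converter's custom linking property, which is not named here.</p>
      <preformat>
```python
# Toy stand-in for one TBX term entry (identifier and layout are illustrative).
term_entry = {
    "id": "entry1",
    "langSets": {
        "es": ["surco ferroviario", "franja ferroviaria"],
        "en": ["train path", "train slot"],
    },
}


def tbx_to_triples(entry):
    """Map one term entry to (subject, predicate, object) triples.

    Prefixed names are plain strings; ex: is a made-up namespace.
    """
    concept = "ex:" + entry["id"]
    triples = [(concept, "rdf:type", "skos:Concept")]
    for lang, terms in entry["langSets"].items():
        # One lexicon per language stands in for the langSet element.
        lexicon = "ex:lexicon_" + lang
        triples.append((lexicon, "rdf:type", "lime:Lexicon"))
        triples.append((lexicon, "lime:language", '"' + lang + '"'))
        for term in terms:
            lex_entry = "ex:" + term.replace(" ", "_") + "_" + lang
            triples.append((lex_entry, "rdf:type", "ontolex:LexicalEntry"))
            triples.append((lex_entry, "rdfs:label", '"' + term + '"@' + lang))
            triples.append((lexicon, "lime:entry", lex_entry))
            # Hypothetical placeholder for the custom linking property.
            triples.append((lex_entry, "ex:linkedConcept", concept))
    return triples


triples = tbx_to_triples(term_entry)
for s, p, o in triples:
    print(s, p, o, ".")
```
      </preformat>
      <p>In this sketch no resource corresponds to the langSet element: its content is distributed over the two language-specific lexicons, which mirrors the behaviour noted above.</p>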
    </sec>
    <sec id="sec-4">
      <title>4. Our Approach</title>
      <p>
        We make use of the most recent version of OntoLex-Lemon,<sup>10</sup> which effectively integrates
the SKOS vocabulary for representing conceptual units and their associated language data. This
was not the case with its predecessor, lemon, which was used in the LIDER project. We can
now use properties defined in OntoLex-Lemon to link the conceptually oriented
terms directly to lexical entries, whereas the LIDER TBX2RDF converter used a custom property
for this purpose. We introduce a skos:ConceptScheme to encode the whole conceptual
organisation of the original terminology, and within this scheme we allow for the definition
of specific domain subsets, a feature not supported in TBX.<sup>11</sup> OntoLex-Lemon provides
the class ontolex:LexicalConcept, a subclass of skos:Concept, for linking lexical entries to the
conceptual part described in the SKOS vocabulary. We encode all terms as instances of this
class, and no longer as instances of the class ontolex:LexicalEntry, as was done in
TBX2RDF. Another, more significant, departure from the LIDER TBX2RDF model is
that we model definitions and contexts as instances of classes, and no longer as literal values.
In doing so, we can describe specific relations between definitions within one language
or across different languages. In the latter case, we can specify whether the definitions given for
terms in two different languages are translations of each other, multilingual equivalents, or just
monolingual definitions included in the multilingual terminology. Suggested additions to the
OntoLex-Lemon model are marked with the prefix “termlex”.</p>
      <p><sup>2</sup> TBX stands for “TermBase eXchange”. See https://www.tbxinfo.net/ [accessed 2022-02-14], or [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] for more details.
      </p>
      <p><sup>3</sup> www.deutschebahn.com/dblanguageportal [accessed 2021-10-02]</p>
      <p><sup>4</sup> See https://iate.europa.eu/ [accessed 2022-02-14]</p>
      <p><sup>5</sup> http://lider-project.eu/lider-project.eu/index.html [accessed 2021-10-02]</p>
      <p><sup>6</sup> The latest version of those guidelines is available at https://github.com/bpmlod/report/blob/gh-pages/multilingual-terminologies/index.html [accessed 2022-02-14]</p>
      <p><sup>7</sup> OWL stands for “Web Ontology Language”. See https://www.w3.org/TR/owl2-primer/ [accessed 2022-02-14]</p>
      <p><sup>8</sup> SKOS stands for “Simple Knowledge Organization System”. See also https://www.w3.org/2009/08/skos-reference/skos.html [accessed 2022-02-14]</p>
      <p><sup>9</sup> The corresponding W3C Community Group Report is available at https://www.w3.org/2015/09/bpmlod-reports/multilingual-terminologies/ [accessed 2022-02-14]</p>
      <p><sup>10</sup> See https://www.w3.org/2016/05/ontolex/ [accessed 2022-02-14] for technical details.</p>
      <p>Figure 1 shows how an IATE term entry is currently represented following our approach, while
also representing the synonymy of two Spanish terms. Figure 2 displays the relations between
the terms and their definitions, which, as instances of a class, can link to further information,
such as the provenance or the definitions for the same original term entry in another language. The
English equivalents for the Spanish terms “surco ferroviario” and “franja ferroviaria” (displayed
in Figures 1 and 2), namely “train path” and “train slot”, as well as the English definitions and their
contexts of use, are linked to the Spanish terms and entries via the properties defined in the
Vartrans module of OntoLex-Lemon,<sup>12</sup> which supports a declarative description of the different types
of relations that can exist between those different types of language data (terms, definitions and
contexts of use).</p>
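      <p>The consequence of modelling definitions as class instances can be shown with a small sketch. An RDF literal cannot be the subject of a triple, so a definition stored as a literal cannot itself carry a relation to another definition, whereas a definition that is a resource can. The sketch below again writes triples as tuples of prefixed names and is only an approximation of the model: termlex:Definition, termlex:definition, termlex:translation and termlex:equivalent are hypothetical names chosen for illustration, the ex: namespace is made up, the use of skos:prefLabel for the term label is an assumption, and the definition texts are placeholders. The Vartrans links between the terms themselves are left out.</p>
      <preformat>
```python
# Relation kinds named in the text; None means "no relation asserted".
RELATION_KINDS = ("translation", "equivalent")


def add_term(graph, scheme, term_id, label, lang):
    """Encode a term as an ontolex:LexicalConcept inside a concept scheme."""
    term = "ex:" + term_id
    graph.add((term, "rdf:type", "ontolex:LexicalConcept"))
    graph.add((term, "skos:inScheme", scheme))
    graph.add((term, "skos:prefLabel", '"' + label + '"@' + lang))
    return term


def add_definition(graph, term, def_id, text, lang):
    """Encode a definition as a resource of its own, not as a literal."""
    definition = "ex:" + def_id
    # termlex:Definition and termlex:definition are hypothetical names.
    graph.add((definition, "rdf:type", "termlex:Definition"))
    graph.add((definition, "rdf:value", '"' + text + '"@' + lang))
    graph.add((term, "termlex:definition", definition))
    return definition


def relate_definitions(graph, source, target, kind=None):
    """Optionally state how two definitions relate to each other."""
    if kind is None:
        return  # independent monolingual definitions: nothing is asserted
    if kind not in RELATION_KINDS:
        raise ValueError("unknown relation kind: " + kind)
    # termlex:translation and termlex:equivalent are hypothetical names.
    graph.add((source, "termlex:" + kind, target))


graph = set()
scheme = "ex:scheme"
graph.add((scheme, "rdf:type", "skos:ConceptScheme"))
term_es = add_term(graph, scheme, "surco_ferroviario", "surco ferroviario", "es")
term_en = add_term(graph, scheme, "train_path", "train path", "en")
def_es = add_definition(graph, term_es, "def_es", "(Spanish definition text)", "es")
def_en = add_definition(graph, term_en, "def_en", "(English definition text)", "en")
relate_definitions(graph, def_es, def_en, kind="translation")

for triple in sorted(graph):
    print(*triple, ".")
```
      </preformat>
      <p>Calling relate_definitions without a kind leaves the graph unchanged, which corresponds to the third case discussed in this section: two monolingual definitions that merely coexist in the multilingual terminology.</p>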
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions and Future Work</title>
      <p>We described ongoing work on porting multilingual terminology resources to a Linked
Data compliant representation language. This work led us to ask whether it would be
appropriate to extend the modelling of TBX terminologies in RDF already proposed by the LIDER
TBX2RDF converter. One aspect consists in treating definitions, contexts and notes as
full ontological elements that can thus be put explicitly in relation to each other. This way,
definitions in different languages can be declaratively interlinked and marked as translations,
as equivalents, or as having neither of those relations.</p>
      <p>As an outcome of our work, we are currently proposing an extension module for
OntoLex-Lemon<sup>13</sup> that deals with the representation of terminological data not covered by the
core module, since the main motivation for developing the OntoLex-Lemon vocabulary was to
represent language data with references to ontologies.</p>
      <p>
        <sup>11</sup> See [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] for a discussion of the difference between the “subjectField” in TBX and the conceptual hierarchy in SKOS.
      </p>
      <p><sup>12</sup> https://www.w3.org/2016/05/ontolex/#variation-translation-vartrans</p>
      <p><sup>13</sup> https://www.w3.org/community/ontolex/wiki/Terminology</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This short paper is based upon work from COST Action NexusLinguarum – European network for Web-centered linguistic data science (CA18209), supported by COST (European Cooperation in Science and Technology). The article is also supported by the Horizon 2020 research and innovation programme with the project Prêt-à-LLOD (grant agreement no. 825182).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Lommel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Melby</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Glenn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hayes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Snow</surname>
          </string-name>
          ,
          <article-title>TBX-Min: A Simplified TBX-Based Approach to Representing Bilingual Glossaries</article-title>
          ,
          <source>in: Terminology and Knowledge Engineering 2014</source>
          , Berlin, Germany,
          <year>2014</year>
          , 10 p. URL: https://hal.archives-ouvertes.fr/hal-01005851.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Heath</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Berners-Lee</surname>
          </string-name>
          ,
          <article-title>Linked data: The story so far</article-title>
          ,
          <source>in: Semantic services, interoperability and web applications: emerging concepts</source>
          , IGI Global
          ,
          <year>2011</year>
          , pp.
          <fpage>205</fpage>
          -
          <lpage>227</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J. P.</given-names>
            <surname>McCrae</surname>
          </string-name>
          , G. Aguado de Cea, P. Buitelaar,
          <string-name>
            <given-names>P.</given-names>
            <surname>Cimiano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Declerck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gómez-Pérez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gracia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Hollink</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Montiel-Ponsoda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Spohr</surname>
          </string-name>
          , T. Wunner,
          <article-title>Interchanging lexical resources on the semantic web</article-title>
          ,
          <source>Lang. Resour. Evaluation</source>
          <volume>46</volume>
          (
          <year>2012</year>
          )
          <fpage>701</fpage>
          -
          <lpage>719</lpage>
          . URL: https://doi.org/10.1007/s10579-012-9182-3. doi:10.1007/s10579-012-9182-3.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J. P.</given-names>
            <surname>McCrae</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Buitelaar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Cimiano</surname>
          </string-name>
          ,
          <article-title>The OntoLex-Lemon Model: development and applications</article-title>
          ,
          <source>in: Proc. of the 5th Biennial Conference on Electronic Lexicography (eLex)</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>P.</given-names>
            <surname>Cimiano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>McCrae</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Rodríguez-Doncel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Gornostay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gómez-Pérez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Siemoneit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lagzdins</surname>
          </string-name>
          ,
          <article-title>Linked terminologies: applying linked data principles to terminological resources</article-title>
          ,
          <source>in: Proceedings of the eLex 2015 Conference</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M. P.</given-names>
            <surname>di Buono</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Cimiano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. F.</given-names>
            <surname>Elahi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Grimm</surname>
          </string-name>
          ,
          <article-title>Terme-à-LLOD: Simplifying the conversion and hosting of terminological resources as linked data</article-title>
          ,
          <source>in: Proceedings of the 7th Workshop on Linked Data in Linguistics (LDL-2020)</source>
          , European Language Resources Association, Marseille, France,
          <year>2020</year>
          , pp.
          <fpage>28</fpage>
          -
          <lpage>35</lpage>
          . URL: https://aclanthology.org/2020.ldl-1.5.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>D.</given-names>
            <surname>Reineke</surname>
          </string-name>
          , L. Romary,
          <article-title>Bridging the gap between SKOS and TBX</article-title>
          ,
          <source>edition - Die Fachzeitschrift für Terminologie</source>
          <volume>19</volume>
          (
          <year>2019</year>
          ). URL: https://hal.inria.fr/hal-02398820.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>