=Paper=
{{Paper
|id=Vol-3161/short1
|storemode=property
|title=Representing Multilingual Terminologies with OntoLex-Lemon (short paper)
|pdfUrl=https://ceur-ws.org/Vol-3161/short1.pdf
|volume=Vol-3161
|authors=Patricia Martin Chozas,Thierry Declerck
|dblpUrl=https://dblp.org/rec/conf/mdtt/Martin-ChozasD22
}}
==Representing Multilingual Terminologies with OntoLex-Lemon (short paper)==
Representing Multilingual Terminologies with OntoLex-Lemon
Patricia Martín-Chozas¹, Thierry Declerck²
¹ Ontology Engineering Group, Universidad Politécnica de Madrid, Avda. Montepríncipe, s/n, Boadilla del Monte, 28660, Spain
² German Research Center for Artificial Intelligence GmbH (DFKI), Multilinguality and Language Technology Lab, Saarland Informatics Campus D3 2, Stuhlsatzenhausweg 3, 66123 Saarbrücken, Germany
Abstract
This paper is framed within a project to make multilingual terminologies available in a native graph representation format. We explore the use of the OntoLex-Lemon model, and suggest some extensions to it, for achieving a declarative encoding of the relations between multilingual expressions contained in terminologies. The model is used to encode not only terms but also their associated definitions, contexts and notes. With this effort, we aim to support the publication of multilingual terminologies in the Linked Open Data cloud.
Keywords
Terminologies, Multilingualism, Formal Representation, OntoLex-Lemon
1. Introduction
In the context of work on converting multilingual terminologies into an RDF¹ model, we faced modelling decisions that also concern the additional language data included in such resources. While the original purpose of the porting exercise is not to change anything at the level of the content of the considered terminologies, modelling them in a graph-based representation makes it possible to interlink and merge them with other resources, whether other terminologies or other types of data, such as detailed lexicographic resources. Thus, the focus of our work is a possibly improved formal representation of the language data used in multilingual terminologies. In this short paper we discuss a few decision points concerning our modelling strategy, and we compare our work with a directly related earlier approach.
1st International Conference on “Multilingual digital terminology today. Design, representation formats and management systems”, June 16–17, Padova, Italy
Email: pmchozas@fi.upm.es (P. Martín-Chozas); declerck@dfki.de (T. Declerck)
URL: https://github.com/pmchozas/ (P. Martín-Chozas); https://www.dfki.de/~declerck/ (T. Declerck)
ORCID: 000-0002-8922-7521 (P. Martín-Chozas); 0000-0002-9450-6648 (T. Declerck)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073
¹ RDF stands for “Resource Description Framework”. See https://www.w3.org/TR/rdf-primer/ for more details.
2. The Data Basis: Two Terminological Resources
Currently, we consider two terminological resources as the input for our transformation work: the multilingual terminology of the Deutsche Bahn (German Railways), which is encoded in the TBX² standard and can be accessed online;³ and IATE (Interactive Terminology for Europe),⁴ one of the most representative terminological databases in Europe. The choice of the latter was motivated by a previous exercise that focused on converting the data contained in IATE, structured in TBX, into RDF. That effort provides a useful baseline against which to compare our approach.
3. The TBX2RDF Guidelines
The earlier LIDER project⁵ already addressed the mapping of TBX to RDF, with the goal of transforming and publishing terminologies as Linked Data [2]. LIDER developed guidelines for this task⁶ in which TBX elements are converted into OWL⁷ and associated with other RDF vocabularies; the basic vocabularies chosen as the backbone of the conversion were SKOS⁸ and the lemon model [3], a predecessor of the OntoLex-Lemon framework [4] that we use. [5] describes the TBX2RDF approach,⁹ and [6] presents recent developments related to this initiative, relying on a virtualization approach that makes use of containerization technologies.
The LIDER TBX2RDF approach represents the TBX terminological concepts as skos:Concept and the TIG/NTIG elements of TBX as ontolex:LexicalEntry. Most of the other TBX elements are mapped straightforwardly onto RDF, meaning that they are encoded as URIs representing resources that can be associated with RDF predicates and objects. We also note that TBX2RDF does not represent the TBX langSet data as such, but instead creates language-specific lexicons in which all the data included in the original langSet element are encoded.
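The mapping described above can be sketched roughly as follows. This is a minimal illustration, not the LIDER converter: the dictionary layout standing in for a parsed TBX entry, all identifiers beginning with ":" and the linking property `:hypotheticalConceptLink` are assumptions made for the example, and the class name `lime:Lexicon` is borrowed from the figures of this paper rather than from the TBX2RDF guidelines.

```python
# Hypothetical, simplified stand-in for a parsed TBX term entry.
# Real TBX is XML; this dict layout and all ":"-prefixed names are assumptions.
term_entry = {
    "id": "1443648",
    "langSets": {
        "es": ["surco ferroviario", "franja ferroviaria"],
        "en": ["train path", "train slot"],
    },
}


def tbx_to_triples(entry):
    """Map one term entry to (subject, predicate, object) triples.

    Mirrors the mapping described in the text: the concept becomes a
    skos:Concept, each term (TIG/NTIG) becomes an ontolex:LexicalEntry,
    and each langSet is folded into a language-specific lexicon.
    """
    concept = f":concept_{entry['id']}"
    triples = [(concept, "rdf:type", "skos:Concept")]
    for lang, terms in entry["langSets"].items():
        lexicon = f":lexicon_{lang}"
        triples.append((lexicon, "rdf:type", "lime:Lexicon"))
        triples.append((lexicon, "lime:language", f'"{lang}"'))
        for index, term in enumerate(terms, start=1):
            lexical_entry = f":entry_{entry['id']}_{lang}_{index}"
            triples.append((lexical_entry, "rdf:type", "ontolex:LexicalEntry"))
            triples.append((lexical_entry, "rdfs:label", f'"{term}"@{lang}'))
            triples.append((lexicon, "lime:entry", lexical_entry))
            # Placeholder for the custom entry-to-concept property (assumed name).
            triples.append((lexical_entry, ":hypotheticalConceptLink", concept))
    return triples


triples = tbx_to_triples(term_entry)
for subject, predicate, obj in triples:
    print(subject, predicate, obj, ".")
```

Note that no resource is created for the langSet itself: its content is distributed over the per-language lexicon and its entries, which is the design choice the text attributes to TBX2RDF.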
² TBX stands for “TermBase eXchange”. See https://www.tbxinfo.net/ [accessed 2022-02-14], or [1] for more details.
³ www.deutschebahn.com/dblanguageportal [accessed 2021-10-02]
⁴ See https://iate.europa.eu/ [accessed 2022-02-14]
⁵ http://lider-project.eu/lider-project.eu/index.html [accessed 2021-10-02]
⁶ The latest version of those guidelines is available at https://github.com/bpmlod/report/blob/gh-pages/multilingual-terminologies/index.html [accessed 2022-02-14]
⁷ OWL stands for “Web Ontology Language”. See https://www.w3.org/TR/owl2-primer/ [accessed 2022-02-14]
⁸ SKOS stands for “Simple Knowledge Organization System”. See also https://www.w3.org/2009/08/skos-reference/skos.html [accessed 2022-02-14]
⁹ The corresponding W3C Community Group Report is available at https://www.w3.org/2015/09/bpmlod-reports/multilingual-terminologies/ [accessed 2022-02-14]
¹⁰ See https://www.w3.org/2016/05/ontolex/ [accessed 2022-02-14] for technical details.
4. Our Approach
We make use of the most recent version of OntoLex-Lemon,¹⁰ which effectively integrates the SKOS vocabulary for representing conceptual units and their associated language data. This was not the case with its predecessor, lemon, which was used in the LIDER project. We can now use properties defined in OntoLex-Lemon to link the conceptually oriented terms directly to lexical entries, whereas the LIDER TBX2RDF converter used a custom property for this purpose. We introduce a skos:ConceptScheme for encoding the whole conceptual organisation of the original terminology, and within this scheme we allow for the definition of specific domain subsets, a feature not supported in TBX.¹¹ OntoLex-Lemon provides the class ontolex:LexicalConcept, a subclass of skos:Concept, for linking lexical entries to the conceptual part described in the SKOS vocabulary. We encode all the terms as instances of this class, and no longer as instances of the class ontolex:LexicalEntry, as was done in TBX2RDF. Another, more significant, departure from the LIDER TBX2RDF model is that we model definitions and contexts as instances of classes, and no longer as literal values. In doing so, we can describe specific relations between the definitions within one language or across different languages. In the latter case, we can specify whether the definitions given for terms in two different languages are translations of each other, multilingual equivalents, or just monolingual definitions included in the multilingual terminology. Suggested additions to the OntoLex-Lemon model are marked with the prefix “termlex”.
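To make the difference between literal-valued and class-valued definitions more concrete, the following sketch treats each definition as a resource of its own, so that a relation between two definitions can be stated as a triple. It is only an illustration under assumptions: the property names `ex:translationOf` and `ex:equivalentTo`, the ":"-prefixed identifiers and the English placeholder text are hypothetical and are not part of the proposed termlex vocabulary; the relation kind chosen at the end is arbitrary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Definition:
    iri: str   # hypothetical identifier for the definition resource
    text: str  # the definition text, kept as the rdf:value of the resource
    lang: str  # language tag of the text


# Hypothetical property names for the relation kinds named in the text.
RELATION_PROPERTIES = {
    "translation": "ex:translationOf",
    "equivalent": "ex:equivalentTo",
}


def definition_triples(concept_iri, definition):
    """Describe a definition as an instance of a class rather than a literal."""
    return [
        (definition.iri, "rdf:type", "termlex:Definition"),
        (definition.iri, "rdf:value", f'"{definition.text}"@{definition.lang}'),
        (concept_iri, "skos:definition", definition.iri),
    ]


def relate(first, second, kind):
    """Link two definitions; purely monolingual definitions get no link."""
    if kind == "monolingual":
        return []
    return [(first.iri, RELATION_PROPERTIES[kind], second.iri)]


spanish = Definition(
    ":def_1443648_es",
    "capacidad de infraestructura necesaria para que un tren circule "
    "entre dos puntos en un momento dado.",
    "es",
)
# Placeholder text: the English definition is not quoted in the paper.
english = Definition(":def_1443648_en", "<English definition text>", "en")

graph = (
    definition_triples("<1443648_LC1>", spanish)
    + definition_triples(":concept_en", english)
    + relate(spanish, english, "translation")  # kind chosen only for illustration
)
for triple in graph:
    print(*triple, ".")
```

Because each definition has its own identifier, further statements such as provenance can be attached to it, which would not be possible if the definition were a plain literal.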
Figure 1 shows how an IATE term entry is currently represented following our approach, including the synonymy of two Spanish terms. Figure 2 displays the relations between the terms and their definitions, which, as instances of a class, can link to further information, such as their provenance or the definitions given for the same original term entry in another language. The English equivalents of the Spanish terms “surco ferroviario” and “franja ferroviaria” (displayed in Figures 1 and 2), namely “train path” and “train slot”, as well as the English definitions and their contexts of use, are linked to the Spanish terms and entries via the properties defined in the Vartrans module of OntoLex-Lemon,¹² which supports a declarative description of the different types of relations that can exist between those different types of language data (terms, definitions and contexts of use).
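The structure described for Figure 1 can be approximated with a handful of triples and a small traversal. The sketch below is a reading of the figure, not the authors' data: the edge directions, the `:form1` and `:form2` node names, the use of `ontolex:writtenRep` for the written representation, and the attachment of `lexinfo:synonym` to the senses are all assumptions.

```python
# Triples loosely following Figure 1. Edge directions, the ":form" node names
# and the attachment point of lexinfo:synonym are assumptions.
TRIPLES = [
    ("<1443648_LC1>", "termlex:lexicalizedSense", "<1443648_LS1>"),
    ("<1443648_LS1>", "ontolex:isSenseOf", "<1443648_LC1_LEN>"),
    ("<1443648_LC1_LEN>", "ontolex:lexicalForm", ":form1"),
    (":form1", "ontolex:writtenRep", "surco ferroviario"),
    ("<1443648_LC2>", "termlex:lexicalizedSense", "<1443648_LS2>"),
    ("<1443648_LS2>", "ontolex:isSenseOf", "<1443648_LC2_LEN>"),
    ("<1443648_LC2_LEN>", "ontolex:lexicalForm", ":form2"),
    (":form2", "ontolex:writtenRep", "franja ferroviaria"),
    ("<1443648_LS1>", "lexinfo:synonym", "<1443648_LS2>"),
]


def objects(subject, predicate):
    """Return all objects of triples matching the subject and predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def forms_of_sense(sense):
    """Follow sense -> lexical entry -> form -> written representation."""
    forms = []
    for entry in objects(sense, "ontolex:isSenseOf"):
        for form in objects(entry, "ontolex:lexicalForm"):
            forms.extend(objects(form, "ontolex:writtenRep"))
    return forms


def written_forms(concept):
    """Written representations of the term attached to a lexical concept."""
    forms = []
    for sense in objects(concept, "termlex:lexicalizedSense"):
        forms.extend(forms_of_sense(sense))
    return forms


def synonym_forms(concept):
    """Written representations reachable through lexinfo:synonym."""
    forms = []
    for sense in objects(concept, "termlex:lexicalizedSense"):
        for other_sense in objects(sense, "lexinfo:synonym"):
            forms.extend(forms_of_sense(other_sense))
    return forms


print(written_forms("<1443648_LC1>"))  # ['surco ferroviario']
print(synonym_forms("<1443648_LC1>"))  # ['franja ferroviaria']
```

The synonymy is stored in one direction only in this sketch, so a lookup starting from the second concept returns nothing; a real dataset would either state the relation in both directions or rely on a reasoner that treats it as symmetric.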
5. Conclusions and Future Work
We described ongoing work on porting multilingual terminology resources to a Linked Data compliant representation language. This work led us to ask whether it would be appropriate to extend the modelling of TBX terminologies in RDF already proposed by the LIDER TBX2RDF converter. One aspect of such an extension is to consider definitions, contexts and notes as full ontological elements that can thus be put explicitly in relation to each other. This way, definitions in different languages can be declaratively interlinked and marked as translations, as equivalents, or as having neither of those relations.
As an outcome of our work, we are currently proposing an extension module for OntoLex-Lemon¹³ that deals with the representation of terminological data not covered by the core module, since the main motivation for developing the OntoLex-Lemon vocabulary was to represent language data with references to ontologies.
¹¹ See [7] for a discussion on the difference between the “subjectField” in TBX and the conceptual hierarchy in SKOS.
¹² https://www.w3.org/2016/05/ontolex/#variation-translation-vartrans
¹³ https://www.w3.org/community/ontolex/wiki/Terminology
[Figure 1 diagram, not reproduced. Labels recoverable from the diagram:
Classes: skos:ConceptScheme, skos:Concept, lime:Lexicon (lime:language "es"), ontolex:LexicalConcept, ontolex:LexicalSense, ontolex:LexicalEntry, ontolex:Form, lexinfo:NormativeAuthorization.
Properties: skos:inScheme, lime:entry, termlex:lexicalizedConcept, termlex:lexicalizedSense, termlex:isEvokedBy, termlex:normativeAuthorization, ontolex:isSenseOf, ontolex:lexicalForm, writtenRep, lexinfo:synonym.
First term: <1443648_LC1>, <1443648_LS1>, <1443648_LC1_LEN>, writtenRep "surco ferroviario", http://lexinfo#preferredTerm.
Second term: <1443648_LC2>, <1443648_LS2>, <1443648_LC2_LEN>, writtenRep "franja ferroviaria", http://lexinfo#deprecatedTerm.]
Figure 1: Representing an IATE term entry in OntoLex-Lemon, showing two Spanish terms used for a term entry. One term is marked as “preferred” while the other is marked as “deprecated”. Our suggested extensions (“termlex”) to OntoLex-Lemon are displayed in blue colour.
[Figure 2 diagram, not reproduced. Labels recoverable from the diagram:
Classes: skos:ConceptScheme, skos:Concept, lime:Lexicon (lime:language "es"), ontolex:LexicalConcept, ontolex:LexicalSense, ontolex:LexicalEntry, ontolex:Form, termlex:Note, termlex:Definition, termlex:Source.
Properties: skos:inScheme, skos:note, skos:definition, dc:source, dc:identifier, rdf:value, lime:entry, termlex:lexicalizedConcept, termlex:lexicalizedSense, termlex:isEvokedBy, ontolex:isSenseOf, ontolex:lexicalForm, writtenRep, lexinfo:synonym.
termlex:Note, rdf:value: "Se trata de unidades cuya disponibilidad depende de factores como el número de vías disponibles, el sistema de señalización o la diferencia de velocidad entre trenes. El Administrador de Infraestructuras Ferroviarias (Adif) concede a los operadores el derecho de explotar un tramo de vía en un día, una hora y un sentido determinado."
termlex:Definition, rdf:value: "capacidad de infraestructura necesaria para que un tren circule entre dos puntos en un momento dado."
termlex:Source, rdf:value: "Directiva 2001/14/CE relativa a la adjudicación de la capacidad de infraestructura ferroviaria y la aplicación de cánones por su utilización"; dc:identifier "CELEX:32001L0014/ES".
First term: <1443648_LC1>, <1443648_LS1>, <1443648_LC1_LEN>, writtenRep "surco ferroviario".
Second term: <1443648_LC2>, <1443648_LS2>, <1443648_LC2_LEN>, writtenRep "franja ferroviaria".]
Figure 2: Representing the links between terms and their definitions, which are now instances of a
specific class. Our suggested extensions (“termlex”) to OntoLex-Lemon are displayed in blue colour.
Acknowledgments
This short paper is based upon work from COST Action NexusLinguarum – European network
for Web-centered linguistic data science (CA18209), supported by COST (European Cooperation
in Science and Technology). The article is also supported by the Horizon 2020 research and
innovation programme with the project Prêt-à-LLOD (grant agreement no. 825182).
References
[1] A. Lommel, A. K. Melby, N. Glenn, J. Hayes, T. Snow, TBX-Min: A Simplified TBX-Based Approach to Representing Bilingual Glossaries, in: Terminology and Knowledge Engineering 2014, Berlin, Germany, 2014, 10 p. URL: https://hal.archives-ouvertes.fr/hal-01005851.
[2] C. Bizer, T. Heath, T. Berners-Lee, Linked data: The story so far, in: Semantic services, interoperability and web applications: emerging concepts, IGI Global, 2011, pp. 205–227.
[3] J. P. McCrae, G. Aguado de Cea, P. Buitelaar, P. Cimiano, T. Declerck, A. Gómez-Pérez, J. Gracia, L. Hollink, E. Montiel-Ponsoda, D. Spohr, T. Wunner, Interchanging lexical resources on the semantic web, Lang. Resour. Evaluation 46 (2012) 701–719. URL: https://doi.org/10.1007/s10579-012-9182-3. doi:10.1007/s10579-012-9182-3.
[4] J. P. McCrae, P. Buitelaar, P. Cimiano, The OntoLex-Lemon Model: development and applications, in: Proc. of the 5th Biennial Conference on Electronic Lexicography (eLex), 2017.
[5] P. Cimiano, J. P. McCrae, V. Rodríguez-Doncel, T. Gornostay, A. Gómez-Pérez, B. Siemoneit, A. Lagzdins, Linked terminologies: applying linked data principles to terminological resources, in: Proceedings of the eLex 2015 Conference, 2015.
[6] M. P. di Buono, P. Cimiano, M. F. Elahi, F. Grimm, Terme-à-LLOD: Simplifying the conversion and hosting of terminological resources as linked data, in: Proceedings of the 7th Workshop on Linked Data in Linguistics (LDL-2020), European Language Resources Association, Marseille, France, 2020, pp. 28–35. URL: https://aclanthology.org/2020.ldl-1.5.
[7] D. Reineke, L. Romary, Bridging the gap between SKOS and TBX, edition - Die Fachzeitschrift für Terminologie 19 (2019). URL: https://hal.inria.fr/hal-02398820.