=Paper=
{{Paper
|id=Vol-1495/paper_31
|storemode=property
|title=Towards the Integration of Multilingual Terminologies: an Example of a Linked Data Prototype
|pdfUrl=https://ceur-ws.org/Vol-1495/paper_31.pdf
|volume=Vol-1495
|dblpUrl=https://dblp.org/rec/conf/tia/Montiel-Ponsoda15
}}
==Towards the Integration of Multilingual Terminologies: an Example of a Linked Data Prototype==
Proceedings of the conference Terminology and Artificial Intelligence 2015 (Granada, Spain)
Elena Montiel-Ponsoda, Julia Bosque-Gil, Jorge Gracia,
Guadalupe Aguado-de-Cea, Daniel Vila-Suero
Ontology Engineering Group, Universidad Politécnica de Madrid, Spain
Campus de Montegancedo sn, Boadilla del Monte 28660 Madrid (Spain)
{emontiel, jbosque, jgracia, lupe, dvila}@fi.upm.es
Abstract
Many language resources are now available in machine-readable formats, but remain
confined to isolated silos. Current Semantic Web-based techniques enable the
transformation and linking of those resources into a navigable graph of linked
language resources, which can be directly consumed by third-party applications. The
prototype we have developed builds on a web user interface and SPARQL endpoint
initially developed to query a single terminological database (Terminesp), now
extended to navigate a set of multilingual terminologies. The vocabulary used to
represent these terminologies in linked data format is lemon-ontolex, a de facto
standard for representing lexical information relative to ontologies and for linking
lexicons and machine-readable dictionaries to the Semantic Web.

1. Introduction
The Linguistic Linked Open Data (LLOD) cloud (http://linguistic-lod.org/) is a
sub-cloud of linguistic resources provided in an interoperable way (using the
Resource Description Framework or RDF data model), freely accessible and linked
with each other. In its current state, the LLOD cloud contains monolingual and
multilingual dictionaries, lexicons, thesauri and even corpora. English is the best
represented language, and some languages are underrepresented or not present at all.

With Terminesp (a multilingual terminological database created by the Spanish
Association for Terminology, AETER), we aimed to validate the lemon-ontolex model
as a representation scheme for lexical resources, specifically the so-called vartrans
module, a dedicated module that accounts for terminological variation and
translation relations among entries (Bosque-Gil et al., 2015). Building on that
experience, we have now transformed additional multilingual terminological
resources, namely a set of freely available terminology databases
(http://www.termcat.cat/es/terminologiaoberta/) from the Catalan Terminological
Centre, TERMCAT, into linked data (LD) using lemon-ontolex as the underlying data
format, and aim to showcase the benefits of integrating terminological resources.

In this paper, we focus on the design decisions taken in the transformation and
linking steps, and on the impact they have on the search and navigation of the
resulting linked terminological data.

In Section 2, we introduce lemon-ontolex and the vartrans module. In Section 3, we
describe the design decisions taken in the transformation process. In Section 4, we
refer to the benefits of browsing and navigating linked multilingual terminologies.

2. lemon-ontolex
The lemon-ontolex model is the result of the efforts made by the W3C Ontology
Lexica Community Group since 2011 to build a rich model to represent the
lexicon-ontology interface. It is largely based on the lemon model (McCrae et al.,
2012) and consists of a core set of classes and several modules (see the
lemon-ontolex final model specification at
http://www.w3.org/community/ontolex/wiki/Final_Model_Specification). The vartrans
module has been developed to record lexico-semantic relations across entries in the
same or different languages (Fig. 1): those among senses and those among lexical
entries and/or forms. Lexico-semantic relations among senses are of a semantic
nature and include
terminological relations (dialectal, register, chronological, discursive, and
dimensional variation) and translation relations. In contrast, relations among
lexical entries and/or forms concern the surface form of a term and encode
morphological and orthographical variation, among other aspects.

Fig. 1. Classes and properties in vartrans

3. Migration and linking of the resources
For the transformation of TERMCAT terminology repertoires to the LD format and
linking to Terminesp we followed these steps: data exploration, URI naming
strategy, data modeling, RDF generation and linking (Vila-Suero et al., 2014).

Data exploration. TERMCAT terminology repertoires are divided by domain. Each
database consists of a list of entries in Catalan and their translations into
Spanish, English, French, etc., along with the term type (full form or
abbreviation), references to associated terms, synonyms, and, sometimes,
definitions. Data on part of speech, gender and number in nouns, and
subcategorization of verbs are also available.

URI naming strategy. Inspired by the work on the Apertium dictionaries
(http://linguistic.linkeddata.es/apertium/), the term itself, its part of speech and
the language of the term are part of the URI of the lexical entry. For lexical
senses, the domain is included in the URI.

Modeling. For the modeling process, we regard each term in a set of translations as
a specific sense of a lexical entry, a sense that is mapped to a concept in a
particular domain. This allows us to have a unique lexical entry red (network), for
instance, which occurs both in the lexicon Internet i societat de la informació and
in the lexicon Indústria electrònica i dels materials elèctrics, with different
senses that we extract from each domain lexicon. This results in a number of RDF
lexica that matches the number of languages available in TERMCAT data, and each
lexical entry will have a different number of senses depending on its use across
domains. In this way, the lexical entry :red-n-es will be mapped to a sense
:red-n-es-Internet-sense, as well as to a sense :red-n-es-Industria-sense, etc. Each
of these senses refers to a skos:Concept with a particular definition and domain.
Regarding translations, the vartrans module represents them as relations across
lexical senses of the entries of each lexicon. Parts of speech, subcategorization,
gender and number are accounted for as well.

Generation and linking. For the transformation we used the data cleaning and
transformation tool OpenRefine (http://openrefine.org/index.html) with its extension
for LD. We linked to lexinfo (http://lexinfo.net/) to cover morphosyntactic
information, and to Terminesp at the lexical entry level. Linking to DBpedia is also
planned as a next step.

4. Browsing multilingual terminologies
We reuse the Terminesp web user interface (see Fig. 2) and SPARQL endpoint to
browse and query this set of integrated terminologies
(http://linguistic.linkeddata.es/terminesp/). Benefits are related to easy access
and reuse of linguistic data by end users (translators, terminologists) and
semantic-aware software agents.

Fig. 2. Web user interface

Acknowledgements. This work is supported by the FP7 EU project LIDER (610782), and
the Spanish 4V project (TIN2013-46238-C4-2-R).

References
J. Bosque-Gil et al. (2015). Applying the OntoLex Model to a Multilingual
Terminological Resource. In Proc. of ESWC 2015. Springer.

J. McCrae et al. (2012). Interchanging lexical resources on the semantic web.
Language Resources and Evaluation, vol. 46.

D. Vila-Suero et al. (2014). Publishing Linked Data on the Web: the Multilingual
Dimension. In P. Cimiano & P. Buitelaar (Eds.) Towards the Multilingual Semantic
Web. Springer.
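
Illustrative sketch. The URI naming strategy and the sense modeling described in Section 3 can be pictured with a short Python sketch. It is not the authors' OpenRefine workflow; it only reproduces the naming pattern visible in the paper's examples (:red-n-es, :red-n-es-Internet-sense). The function names, the part-of-speech table (only "n" appears in the paper), the short domain keys passed in by the caller, and the underscore used for multi-word terms are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical part-of-speech abbreviations; only "n" is attested in the paper.
POS_CODES = {"noun": "n", "verb": "v", "adjective": "adj"}


def entry_name(term, pos, lang):
    """Local name of a lexical entry: term, part of speech and language."""
    # Assumption: multi-word terms are joined with underscores.
    slug = term.strip().lower().replace(" ", "_")
    return f":{slug}-{POS_CODES[pos]}-{lang}"


def sense_name(entry, domain_key):
    """Local name of a lexical sense: the entry name plus a domain key."""
    return f"{entry}-{domain_key}-sense"


def group_senses(rows):
    """Collect one sense per domain under a single shared lexical entry.

    rows: (term, pos, lang, domain_key) tuples, one per occurrence of a
    term in a domain lexicon.
    """
    senses = defaultdict(list)
    for term, pos, lang, domain_key in rows:
        entry = entry_name(term, pos, lang)
        name = sense_name(entry, domain_key)
        if name not in senses[entry]:
            senses[entry].append(name)
    return dict(senses)


# The Spanish noun "red" occurs in two domain lexicons.
rows = [
    ("red", "noun", "es", "Internet"),
    ("red", "noun", "es", "Industria"),
]
lexicon = group_senses(rows)
print(lexicon)
```

Under these assumptions, both occurrences of red end up under one lexical entry with two domain-specific senses, which mirrors the design decision of keeping a unique entry per term, part of speech and language.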