=Paper= {{Paper |id=Vol-1495/paper_31 |storemode=property |title=Towards the Integration of Multilingual Terminologies: an Example of a Linked Data Prototype |pdfUrl=https://ceur-ws.org/Vol-1495/paper_31.pdf |volume=Vol-1495 |dblpUrl=https://dblp.org/rec/conf/tia/Montiel-Ponsoda15 }} ==Towards the Integration of Multilingual Terminologies: an Example of a Linked Data Prototype== https://ceur-ws.org/Vol-1495/paper_31.pdf
                        Proceedings of the conference Terminology and Artificial Intelligence 2015 (Granada, Spain)





                     Towards the Integration of Multilingual Terminologies:
                          an Example of a Linked Data Prototype

                                Elena Montiel-Ponsoda, Julia Bosque-Gil, Jorge Gracia,
                                   Guadalupe Aguado-de-Cea, Daniel Vila-Suero


                      Ontology Engineering Group, Universidad Politécnica de Madrid, Spain
                      Campus de Montegancedo sn, Boadilla del Monte 28660 Madrid (Spain)
                               {emontiel, jbosque, jgracia, lupe, dvila}@fi.upm.es



Abstract

Many language resources are nowadays available in machine-readable formats, but they are still contained in isolated silos. Current Semantic Web-based techniques make it possible to transform these resources and link them into a navigable graph of linked language resources that third-party applications can consume directly. The prototype we have developed builds on a web user interface and a SPARQL endpoint that were initially developed to query a single terminological database (Terminesp) and have now been extended to navigate a set of multilingual terminologies. The vocabulary used to represent these terminologies in the linked data format is lemon-ontolex, a de facto standard for representing lexical information relative to ontologies and for linking lexicons and machine-readable dictionaries to the Semantic Web.

1. Introduction

The Linguistic Linked Open Data (LLOD) cloud¹ is a sub-cloud of linguistic resources that are provided in an interoperable way (using the Resource Description Framework, or RDF, data model), freely accessible, and linked to each other. In its current state, the LLOD cloud contains monolingual and multilingual dictionaries, lexicons, thesauri, and even corpora. English is the best-represented language, while some languages are underrepresented or not present at all.

With Terminesp (a multilingual terminological database created by the Spanish Association for Terminology, AETER), we aimed to validate the lemon-ontolex model as a representation scheme for lexical resources, and specifically its so-called vartrans module, a dedicated module that accounts for terminological variation and translation relations among entries (Bosque-Gil et al., 2015). Building on that experience, we have now transformed further multilingual terminological resources, namely a set of freely available terminology databases² from the Catalan Terminological Centre (TERMCAT), into linked data (LD), using lemon-ontolex as the underlying data format, and we aim to showcase the benefits of integrating terminological resources.

In this paper, we focus on the design decisions taken in the transformation and linking steps, and on their impact on the search and navigation of the resulting linked terminological data.

In Section 2, we introduce lemon-ontolex and the vartrans module. In Section 3, we describe the design decisions taken in the transformation process. In Section 4, we discuss the benefits of browsing and navigating linked multilingual terminologies.

¹ http://linguistic-lod.org/
² http://www.termcat.cat/es/terminologiaoberta/
³ See the lemon-ontolex final model specification at http://www.w3.org/community/ontolex/wiki/Final_Model_Specification

2. lemon-ontolex

The lemon-ontolex model is the result of the work carried out by the W3C Ontology Lexica Community Group since 2011 to build a rich model for representing the lexicon-ontology interface. It is largely based on the lemon model (McCrae et al., 2012) and consists of a core set of classes and several modules³. The vartrans module was developed to record lexico-semantic relations across entries in the same language or in different languages (Fig. 1). These are relations among senses and relations among lexical entries and/or forms. Relations among senses are semantic in nature and include




terminological relations (dialectal, register, chronological, discursive, and dimensional variation) and translation relations. In contrast, relations among lexical entries and/or forms concern the surface form of a term and encode morphological and orthographical variation, among other aspects.

Fig. 1. Classes and properties in vartrans

3. Migration and linking of the resources

To transform the TERMCAT terminology repertoires into the LD format and link them to Terminesp, we followed these steps: data exploration, URI naming strategy, data modeling, RDF generation, and linking (Vila-Suero et al., 2014).

Data exploration. The TERMCAT terminology repertoires are divided by domain. Each database consists of a list of entries in Catalan and their translations into Spanish, English, French, etc. Each entry also records the term type (full form or abbreviation), references to associated terms, synonyms, and, sometimes, definitions. Part-of-speech data, gender and number for nouns, and subcategorization for verbs are also available.

URI naming strategy. Following the approach taken for the Apertium dictionaries⁴, the URI of a lexical entry includes the term itself, its part of speech, and its language (e.g., :red-n-es). The URI of a lexical sense additionally includes the domain (e.g., :red-n-es-Internet-sense).

Modeling. We regard each term in a set of translations as a specific sense of a lexical entry. Each such sense is mapped to a concept in a particular domain. For instance, this allows us to have a single lexical entry red (network), even though it occurs both in the lexicon Internet i societat de la informació and in the lexicon Indústria electrònica i dels materials elèctrics. The entry simply receives a different sense extracted from each domain lexicon. The result is one RDF lexicon per language available in the TERMCAT data. The number of senses of each lexical entry depends on the number of domains in which it is used. In this way, the lexical entry :red-n-es is mapped to a sense :red-n-es-Internet-sense, as well as to a sense :red-n-es-Industria-sense, etc. Each of these senses refers to a skos:Concept with a particular definition and domain. Translations are represented with the vartrans module as relations between lexical senses of the entries in each lexicon. Part of speech, subcategorization, gender, and number are also represented.

Generation and linking. For the transformation, we used the data cleaning and transformation tool OpenRefine⁵ with its extension for LD. We linked to lexinfo⁶ to cover morphosyntactic information, and to Terminesp at the lexical entry level. Linking to DBpedia is planned as a next step.

Fig. 2. Web user interface

4. Browsing multilingual terminologies

We reuse the Terminesp web user interface (see Fig. 2) and SPARQL endpoint to browse and query this set of integrated terminologies⁷. The main benefits are easier access to, and reuse of, linguistic data. These benefits apply both to end users (translators, terminologists) and to semantic-aware software agents.

Acknowledgements. This work is supported by the FP7 EU project LIDER (610782) and the Spanish 4V project (TIN2013-46238-C4-2-R).

References

J. Bosque-Gil et al. (2015). Applying the OntoLex Model to a Multilingual Terminological Resource. In Proc. of ESWC 2015. Springer.
J. McCrae et al. (2012). Interchanging lexical resources on the Semantic Web. Language Resources and Evaluation, vol. 46.
D. Vila-Suero et al. (2014). Publishing Linked Data on the Web: the Multilingual Dimension. In P. Cimiano & P. Buitelaar (Eds.), Towards the Multilingual Semantic Web. Springer.

⁴ http://linguistic.linkeddata.es/apertium/
⁵ http://openrefine.org/index.html
⁶ http://lexinfo.net/
⁷ http://linguistic.linkeddata.es/terminesp/
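The minimal Python sketch below illustrates the URI naming strategy and the modeling decisions described in Section 3. It uses only the standard library. It is not the OpenRefine-based pipeline used for the prototype.

Several parts are hypothetical:
- the base namespace `http://example.org/termcat/`
- the input record shape
- the concept, form and lexicon URIs
- the `Term`, `build_triples` and `to_ntriples` helpers

The sketch also makes two modeling assumptions. First, each sense points to its skos:Concept through `ontolex:reference`. Second, each translation is a reified `vartrans:Translation` with `vartrans:source` and `vartrans:target`. This is one way the vartrans module allows translations between senses to be encoded.

```python
from collections import defaultdict
from dataclasses import dataclass
from urllib.parse import quote

# Namespace IRIs from the OntoLex-Lemon (ontolex, vartrans, lime) and SKOS specs.
ONTOLEX = "http://www.w3.org/ns/lemon/ontolex#"
VARTRANS = "http://www.w3.org/ns/lemon/vartrans#"
LIME = "http://www.w3.org/ns/lemon/lime#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/termcat/"  # hypothetical data namespace


@dataclass(frozen=True)
class Term:
    text: str  # written form, e.g. "red"
    pos: str   # part-of-speech code, e.g. "n"
    lang: str  # language code, e.g. "es"

    def entry(self) -> str:
        # Entry URI = term + POS + language, shared by every domain: red-n-es
        slug = quote(self.text.replace(" ", "_"), safe="")
        return f"{BASE}{slug}-{self.pos}-{self.lang}"

    def sense(self, domain: str) -> str:
        # Sense URI adds the domain: red-n-es-Internet-sense
        return f"{self.entry()}-{quote(domain, safe='')}-sense"


def build_triples(records):
    """records: (domain, concept_id, [Term, ...]) tuples; the terms of one
    record are translations of each other (hypothetical input shape)."""
    triples, senses = set(), defaultdict(set)
    for domain, concept_id, terms in records:
        concept = f"{BASE}concept/{domain}/{concept_id}"  # hypothetical concept URI
        triples.add((concept, RDF_TYPE, SKOS + "Concept"))
        for t in terms:
            entry, sense, form = t.entry(), t.sense(domain), t.entry() + "-form"
            senses[entry].add(sense)  # same entry, one extra sense per domain
            triples |= {
                (f"{BASE}lexicon/{t.lang}", LIME + "entry", entry),  # one lexicon per language
                (entry, RDF_TYPE, ONTOLEX + "LexicalEntry"),
                (entry, ONTOLEX + "canonicalForm", form),
                (form, ONTOLEX + "writtenRep", f'"{t.text}"@{t.lang}'),
                (entry, ONTOLEX + "sense", sense),
                (sense, ONTOLEX + "reference", concept),  # assumed sense-concept link
            }
        # Translations: reified vartrans:Translation nodes linking senses.
        for i, src in enumerate(terms):
            for tgt in terms[i + 1:]:
                if src.lang == tgt.lang:
                    continue  # same-language pairs are variants, not translations
                s, o = src.sense(domain), tgt.sense(domain)
                rel = s + "--" + o.rsplit("/", 1)[-1]
                triples |= {
                    (rel, RDF_TYPE, VARTRANS + "Translation"),
                    (rel, VARTRANS + "source", s),
                    (rel, VARTRANS + "target", o),
                }
    return triples, senses


def to_ntriples(triples):
    """Serialise to N-Triples; objects starting with '"' are treated as literals."""
    def obj(x):
        return x if x.startswith('"') else f"<{x}>"
    return "\n".join(f"<{s}> <{p}> {obj(o)} ." for s, p, o in sorted(triples))


# Usage: "red" (es) occurs under two domains, so its single entry gets two senses.
red = Term("red", "n", "es")
records = [
    ("Internet", "c1", [Term("xarxa", "n", "ca"), red, Term("network", "n", "en")]),
    ("Industria", "c2", [Term("xarxa", "n", "ca"), red]),
]
triples, senses = build_triples(records)
print(sorted(senses[red.entry()]))
```

Keeping the domain out of the entry URI lets one entry such as :red-n-es collect a separate sense for each domain lexicon in which the term appears. The domain-bearing sense URIs keep those senses distinct. This naive scheme has one limitation: a hyphen inside a term would be indistinguishable from the separator.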
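Section 4 mentions querying the integrated terminologies through a SPARQL endpoint. The Python sketch below illustrates this in two steps. First, it builds a SPARQL 1.1 query that retrieves the translations of a term together with the concept of each sense. Second, it prepares, but does not send, an HTTP GET request following the SPARQL 1.1 Protocol.

Two parts of the sketch are assumptions:
- The endpoint URL is a placeholder.
- The graph shape the query expects is a guess at how the data is modeled. It relies on `ontolex:canonicalForm/ontolex:writtenRep`, on `ontolex:reference` from senses to concepts, and on reified `vartrans:Translation` nodes.

The prototype's actual endpoint address and schema may differ.

```python
import re
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder: the prototype's actual SPARQL endpoint URL is not given here.
ENDPOINT = "http://example.org/sparql"

PREFIXES = (
    "PREFIX ontolex: <http://www.w3.org/ns/lemon/ontolex#>\n"
    "PREFIX vartrans: <http://www.w3.org/ns/lemon/vartrans#>\n"
)


def check_lang(tag: str) -> str:
    """Reject anything that is not a simple language tag (guards the query)."""
    if not re.fullmatch(r"[A-Za-z]+(-[A-Za-z0-9]+)*", tag):
        raise ValueError(f"invalid language tag: {tag!r}")
    return tag


def literal(text: str, lang: str) -> str:
    """Language-tagged SPARQL literal with backslashes and double quotes escaped."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{check_lang(lang)}'


def translations_query(term: str, src_lang: str, tgt_lang: str) -> str:
    """Translations of `term` into `tgt_lang`, with the concept of each sense.
    The Translation relation is matched in both directions."""
    return PREFIXES + f"""SELECT DISTINCT ?translation ?concept WHERE {{
  ?entry ontolex:canonicalForm/ontolex:writtenRep {literal(term, src_lang)} ;
         ontolex:sense ?sense .
  ?sense ontolex:reference ?concept .
  {{ ?t vartrans:source ?sense ; vartrans:target ?other }}
  UNION
  {{ ?t vartrans:target ?sense ; vartrans:source ?other }}
  ?t a vartrans:Translation .
  ?otherEntry ontolex:sense ?other ;
              ontolex:canonicalForm/ontolex:writtenRep ?translation .
  FILTER(lang(?translation) = "{check_lang(tgt_lang)}")
}}"""


def build_request(query: str) -> Request:
    """SPARQL 1.1 Protocol query via GET; the request is built but not sent."""
    url = ENDPOINT + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


# Usage: English translations of the Spanish entry "red".
query = translations_query("red", "es", "en")
req = build_request(query)
print(query)
```

The query matches the translation relation in both directions, so it does not assume which side of a `vartrans:Translation` the queried sense was stored on. Passing `req` to `urllib.request.urlopen` against a real endpoint that supports the JSON results format would return the variable bindings for `?translation` and `?concept`.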