The ESSOT System Goes Wild: an Easy Way
                For Translating Ontologies

                   Mihael Arcan1 , Mauro Dragoni2 , and Paul Buitelaar1
          1
            Insight Centre for Data Analytics, National University of Ireland, Galway
                  [firstname.lastname]@insight-centre.org
          2
            FBK- Fondazione Bruno Kessler, Via Sommarive 18, 38123 Trento, Italy
                                     dragoni@fbk.eu

        Abstract. To enable knowledge access across languages, ontologies that are of-
        ten represented only in English need to be translated into different languages.
        Since manual multilingual enhancement of domain-specific ontologies is very
        time consuming and expensive, smart solutions are required to facilitate the trans-
        lation task for the language and domain experts. For this reason, we present ES-
        SOT, an Expert Supporting System for Ontology Translation, which support ex-
        perts in accomplishing the multilingual ontology management task.3 Differently
        than the classic document translation, ontology label translation faces highly spe-
        cific vocabulary and lack contextual information. Therefore, ESSOT takes ad-
        vantage of the semantic information of the ontology for translation improvement
        of the ontology labels.


1    Introduction
Currently, most of the semantically structured data, i.e. ontologies or taxonomies, have
labels stored in English only. Although, the increasing amount of ontologies offers an
excellent opportunity to link this knowledge together, non-English users may encounter
difficulties when using the ontological knowledge represented in English only [1]. Fur-
thermore, applications in information retrieval or knowledge management, using mono-
lingual ontologies are limited to the language in which the ontology labels are stored.
Therefore, to make the ontological knowledge accessible beyond the language borders,
these monolingual resources need to be enhanced with multilingual information [2]. For
this reason, we engage a statistical machine translation (SMT) system, which takes into
consideration the domain of the ontology to be translated. As ontologies may change
over time, having in place an SMT system adaptable to an ontology can therefore be
very beneficial. One of the main challenges in ontology translation are labels built out
of only a few words, which do not often express enough semantic information to guide
the SMT system to translate them into the targeted domain. This can be observed in
domain-unadapted SMT systems, e.g. Google Translate,4 where an ambiguous expres-
sion, like vessel stored in a medical ontology, is translated into a generic domain as
Schiff 5 (en. ship) in German, but not into the targeted medical domain as Gefäß.
    In this demo, we present our proposal for addressing the ontology translation task
in a real-world settings. We will show the software modules composing the system,
3
   This demo paper is submitted as support of the accepted In-Use paper at ISWC 2016, in
order to give the opportunity of showing more details on how the system works and how it
has been used in different real-world settings. A read-only version, but with all functionali-
ties, of the instance described in this paper is available at https://dkmtools.fbk.eu/moki/3 5/essot/
4
   https://translate.google.com/ 5 Translation performed on 06.07.2016
their functionalities and how they can be exploited as web services. Finally, we will
provide information on how to engage the different components and how to prepare a
local instance of the ESSOT platform.

2   System Implementation
Based on the lexical and semantic overlap with the ontology labels our proposed system
identifies from a large English monolingual corpus the most relevant sentences contain-
ing the labels to be translated. The goal is to translate the ontology labels within the
textual context of the targeted domain, rather than in isolation. For instance, with this
selection approach, we aim to retain relevant sentences, where the English word vessel
or injection belongs to the medical domain, but not to the technical domain.
Statistical Machine Translation For the translation approach, ESSOT engages the
widely used Moses toolkit [3]. For a broader domain coverage of the SMT system
we merged several parallel corpora, e.g. DGT (translation memories generated by the
Directorate-General for Translation) [4], Europarl [5] and MultiUN corpus [6] among
others, into one parallel data set necessary to train an SMT system. Due to the increasing
amount of parallel data, ESSOT supports translations of English ontology labels into
all (24) official languages of the European Union [7].
Query Expansion for Sentence Selection In order to improve the translation of ontol-
ogy labels, we select from the concatenated corpus only those source sentences, which
are most relevant to the labels to be translated [8]. The first criterion for relevance is the
n-gram overlap between a label and a source sentence coming from the generic corpus.
Once we obtain sentences with the targeted labels, we follow the idea of extending the
semantic information of the labels using Word2Vec for computing distributed repre-
sentations of words [9]. The technique is based on a neural network that analyses the
textual data provided as input, in our experiment ontology labels and source sentences,
and outputs a list of semantically related words. Each input string is vectorized and
compared to other vectorized sets of words in a multi-dimensional vector space, which
was trained with Word2Vec on Wikipedia articles.
    To further improve the disambiguation of relevant sentences, the related words of
the label are concatenated with the related words of its direct parent in the ontology
hierarchy. Given a label and a source sentence from the generic corpus, related words
are extracted from both of them, and used as entries of the vectors to calculate the cosine
similarity. Finally, we translate the most similar source sentence with the targeted label
and extract its translation once the translation approach is done.
User Facilities The ESSOT system integrates facilities supporting a collaborative
translation of domain-specific ontologies in order to satisfy the requirements of the
multilingual ontology enhancement from a user perspective. The system focuses on
supporting two distinct experts groups: domain experts and language experts. Domain
experts are in charge of the modelling aspect of ontologies (i.e. creation of concepts, in-
dividuals, properties, and the relationships between them). On the other hand, language
experts are responsible for managing the labels associated with each entities by evalu-
ating their correctness and, eventually, by providing a more fine-grained adaptation of
the ontology with respect to the domain it represents.
    The full set of facilities in ESSOT include: (i) Experts Views, which are in charge
of presenting all information to experts in an effective manner; (ii) Approval and Dis-
cussion components, which are managing the collaborative workflow of entity editing
                            Fig. 1. ESSOT General Architecture.

by informing and providing experts with information necessary for understanding the
status of each entity within the ontology; and (iii) the Translator Connector that is re-
sponsible of invoking the machine translation service, called OTTO [10], for providing
a list of suggestions for translating the entity labels.

3 ESSOT in Action: What we Will Show During The Demo
The main part of the demo will be related to (i) the presentation of the general features of
the platform, (ii) how a platform instance can be obtained and installed on local servers,
and (iii) which are the mandatory parameters that have to be set for making the platform
working on own servers. Furthermore, among the full set of features implemented into
the ESSOT platform, our demo will focus on the ones described below.
Usability of The Tool. We will show how the process for translating an ontology works
and how the user facilities can be used by the different type of experts in a collaborative
way. In particular, we will focus on the “Approval Workflow” and how all the actors
involved in the process of translating ontologies are notified about the multilingual en-
hancement of each entity. In addition to that, we will demonstrate how the underlying
machine translation components suggests candidate translations to the experts and how
such suggestions can be selected for their inclusion the ontology.
Plug-and-Play of Translation Models. A first more technical demonstration is related
to the plug-and-play facility of the platform for creating and connecting different ma-
chine translation models and/or services. We will show how developers can configure
the platform in simple steps by connecting it to machine translation models stored lo-
cally or to external translation services (i.e. Microsoft Bing).
Usage of ESSOT as Web Service. Finally, the machine translation service integrated
into the ESSOT platform can be queried from third party applications by exploiting
the available RESTful interface.6 We will show how the service works, which are the
expected inputs and the structure of the output.
6
    http://server1.nlp.insight-centre.org/otto/rest_service.html
4    Conclusion
This paper is aimed at showing ESSOT for multilingual management of semantically
structured data, i.e. ontologies or taxonomies. The system is based on an approach to
identify the most relevant source sentences from a large generic parallel corpus, giv-
ing the possibility to automatically translate highly specific ontology labels in context
without particular in-domain parallel data. The demonstrated approach reduces the am-
biguity of expressions in the selected sentences, which consequently generates better
translations of ontology labels. As an ongoing work, we further focus on improving the
extraction of the lexical knowledge stored in ontologies. Additionally, we plan to enable
knowledge enrichment for existing multilingual ontologies.

Acknowledgement
This publication has emanated from research conducted with the financial support of
Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/2289.

References
 1. Gómez-Pérez, A., Vila-Suero, D., Montiel-Ponsoda, E., Gracia, J., Aguado-de Cea, G.:
    Guidelines for multilingual linked data. In: Proceedings of the 3rd International Conference
    on Web Intelligence, Mining and Semantics, ACM (2013)
 2. Gracia, J., Montiel-Ponsoda, E., Cimiano, P., Gómez-Pérez, A., Buitelaar, P., McCrae, J.:
    Challenges for the multilingual web of data. Web Semantics: Science, Services and Agents
    on the World Wide Web 11 (2012)
 3. Koehn, P., Hoang, H., Birch, A., Callison-Burch, C., Federico, M., Bertoldi, N., Cowan, B.,
    Shen, W., Moran, C., Zens, R., Dyer, C., Bojar, O., Constantin, A., Herbst, E.: Moses: Open
    source toolkit for statistical machine translation. In: Proceedings of the 45th Annual Meeting
    of the ACL on Interactive Poster and Demonstration Sessions, Stroudsburg, PA, USA (2007)
 4. Steinberger, R., Ebrahim, M., Poulis, A., Carrasco-Benitez, M., Schlüter, P., Przybyszewski,
    M., Gilbro, S.: An overview of the european union’s highly multilingual parallel corpora.
    Language Resources and Evaluation 48(4) (2014) 679–707
 5. Koehn, P.: Europarl: A Parallel Corpus for Statistical Machine Translation. In: Conference
    Proceedings: the tenth Machine Translation Summit, AAMT (2005)
 6. Eisele, A., Chen, Y.: Multiun: A multilingual corpus from united nation documents. In
    Tapias, D., Rosner, M., Piperidis, S., Odjik, J., Mariani, J., Maegaard, B., Choukri, K., Chair),
    N.C.C., eds.: Proceedings of the Seventh conference on International Language Resources
    and Evaluation, European Language Resources Association (ELRA) (5 2010) 2868–2872
 7. Arcan, M., Dragoni, M., Buitelaar, P.: ESSOT: an expert supporting system for ontology
    translation. In Métais, E., Meziane, F., Saraee, M., Sugumaran, V., Vadera, S., eds.: Natural
    Language Processing and Information Systems - 21st International Conference on Applica-
    tions of Natural Language to Information Systems, NLDB 2016, Salford, UK, June 22-24,
    2016, Proceedings. Volume 9612 of Lecture Notes in Computer Science., Springer (2016)
    60–73
 8. Arcan, M., Turchi, M., Buitelaar, P.: Knowledge portability with semantic expansion of
    ontology labels. In: Proceedings of the 53rd Annual Meeting of the Association for Compu-
    tational Linguistics, Beijing, China (July 2015)
 9. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in
    vector space. ICLR Workshop (2013)
10. Arcan, M., Asooja, K., Ziad, H., Buitelaar, P.: Otto – ontology translation system. In: ISWC
    2015 Posters & Demonstrations Track. Volume 1486., Bethlehem (PA), USA (2015)