<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards the Generation of Semantically Enriched Multilingual Components of Ontology Labels</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Thierry Declerck</string-name>
          <email>declerck@dfki.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dagmar Gromann</string-name>
          <email>dgromann@wu.ac.at</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>DFKI GmbH, Language Technology Department</institution>
          ,
          <addr-line>Stuhlsatzenhausweg 3, D-66123 Saarbruecken</addr-line>
          ,
          <country country="DE">Germany</country>
          ;
          <institution>Vienna University of Economics and Business</institution>
          ,
          <addr-line>Nordbergstrasse 15, 1090 Vienna</addr-line>
          ,
          <country country="AT">Austria</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Ontologies often contain multilingual textual information in annotation properties, such as rdfs:label and rdfs:comment. While the motivation for using such annotation properties is to provide a human readable description of abstract conceptualization of the domain, we notice that the importance of appropriate natural language use and representation is often neglected. The same can be observed with resources on the Web, such as multilingual taxonomies. Terms often lack consistency and completeness, hampering also an accurate automated natural language processing of such text. We propose a pattern-based transformation of terms in labels, thereby also supporting a multilingual alignment of (sub)components of labels. The source data for our approach is an ontology we derived from an industry classification taxonomy, which we improve as regards consistency and completeness and apply to the process of lexicalization.</p>
      </abstract>
      <kwd-group>
        <kwd>Ontology Labels</kwd>
        <kwd>Multilingualism</kwd>
        <kwd>Terms and Sub-Terms</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>It is increasingly recognized that the process of ontology
construction is inevitably linked to natural language and, related to this
development, multilingualism is progressively gaining center stage in ontology
engineering. There are various ways to add natural language strings to ontologies.
These strings can be part of RDF URI references, identifying ontological
resources (e.g. natural language string used in rdf:ID), a fragment (e.g. natural
language string in rdf:about statements) or marking empty property elements
(kind of leaf nodes in a graph, using the rdf:resource statement). Natural
language strings equally represent the content of the RDF annotation
properties rdfs:label and rdfs:comment, which provide information on ontological
resources in a human-readable format.</p>
      <p>Herein, we focus on the content of annotation properties. This choice has been
partially motivated by the fact that these properties qualify for the inclusion of
terminological information, which can be realized in the form of longer natural
language strings. Additionally, labels and comments locally support multilinguality
by means of language tags of RDF literals, i.e., xml:lang, whereas this is not
the case for RDF URI references.</p>
      <p>Analyzing the content of annotation properties in multilingual ontologies,
we observed that their realization frequently hampers accurate automatic
linguistic and semantic processing. This type of processing is vital to a large
number of ontology-based tasks, such as machine translation, information
extraction, and cross-lingual ontology mapping. Thus, we investigate whether and how
cross-lingual preprocessing and linguistic harmonization of terms in ontology labels
can benefit such processing. At the same time, these initial steps support
a multilingual alignment of subcomponents of labels, leading to more fine-grained
multilingual resources associated with ontology elements.</p>
      <p>Our experimental results are based on the analysis of labels and comments of
an ontology we derived from the Global Industry Classification Standard (GICS)
taxonomy<sup>1</sup> in English, German, and Spanish. The GICS taxonomy consists of
four meta-levels, namely sector, industry group, industry, and sub-industry. These
four categories represent the top nodes of the ontology. Each leaf node, i.e., each
sub-industry, contains a detailed definition. All classes are indexed by integers,
which also indicate the hierarchical structure of the taxonomy: the descending
line "10" (Energy), "1010" (Energy), "101010" (Energy Equipment &amp; Services)
and "10101010" (Oil &amp; Gas Drilling) represents the first complete branch of the
hierarchical tree of the classification scheme<sup>2</sup>.</p>
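      <p>The way the integer codes encode the hierarchy can be illustrated with a minimal sketch. It assumes, based only on the branch quoted above, that every meta-level appends two digits to the code of its parent; the function names are hypothetical and not part of GICS or of our tooling.</p>
      <preformat preformat-type="code"><![CDATA[
```python
# Level names follow the four GICS meta-levels named in the text.
LEVELS = {2: "sector", 4: "industry group", 6: "industry", 8: "sub-industry"}


def gics_level(code):
    """Return the meta-level of a GICS-style class code."""
    if not code.isdigit() or len(code) not in LEVELS:
        raise ValueError("not a GICS-style code: %r" % code)
    return LEVELS[len(code)]


def gics_branch(code):
    """Return the codes from the sector down to `code`, one per level.

    Assumption: every level appends two digits to its parent's code.
    """
    gics_level(code)  # validates the code, raises ValueError otherwise
    return [code[:n] for n in range(2, len(code) + 1, 2)]


print(gics_branch("10101010"))
print(gics_level("101010"))
```
]]></preformat>
      <p>Under this assumption the super-classes of any class can be read off its code without consulting the taxonomy, which is convenient when labels of a class are compared with those of its ancestors.</p>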
      <p>The investigation was triggered by our observation that applying baseline
Machine Translation (MT) tools, such as Google Translate, to terms used in
GICS produces terms in the target languages that differ substantially from those provided
by the corresponding language versions of GICS. For example, only a partial Spanish
translation was obtained for the German compound ellipsis "Eigentums- und
Unfallversicherungen", resulting in "Propiedad y accidente", whereas the correct
translation would be "Seguro de Propiedad y Accidente" (Property and casualty
insurance).</p>
      <p>
        The remainder of this paper is structured as follows. Related work is presented in section 2.
Preprocessing steps and corrective patterns for the purpose at hand are discussed in
section 3. Deriving subcomponents of ontology labels for multilingual alignment
is the focus of section 4. Finally, the resulting ontology is lexicalized
by means of lemon [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] prior to concluding remarks.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        Research in various areas such as multilingual ontology acquisition [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ],
cross-lingual ontology mapping [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], ontology lexicalization [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], linguistic enrichment
of labels [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], ontology engineering from text [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], and ontology localization [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]
        <fn id="fn1"><label>1</label><p>http://www.standardandpoors.com/indices/gics/en/us</p></fn>
        <fn id="fn2"><label>2</label><p>The definition associated with the leaf concept ID 10101010 is "Drilling contractors or owners of drilling rigs that contract their services for drilling wells."</p></fn>
can be observed. All of these approaches highlight the importance of ontologies
labeled in different languages and techniques of acquiring them. While [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] seems
to be the closest approach to our investigation, the major difference lies in the
fact that [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] (and in fact also [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]) addresses only the language data included
in RDF URI reference statements. Consequently, they are not concerned with
natural language processing of (possibly lengthy) multilingual natural language
strings, but only with finding equivalents of reference expressions in various
lexical resources.
      </p>
      <p>
        Current and future results of our work might best be compared to
state-of-the-art research in the field of lexico-syntactic patterns, which are part of
ontology design patterns<sup>3</sup> and mostly used for learning ontologies from natural
language text (e.g. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]). For this purpose, and for the approach we apply to the
analysis of the content of ontology labels, many different linguistic processes, such
as tokenization, lemmatization, and shallow parsing, are used, often combined
with statistical machine learning techniques to learn ontologies from large sets
of documents, e.g. Text2Onto [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The major problem of such patterns is low
precision and over-generalization, which [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] try to overcome by restricting their
main approach to three sets of patterns.
      </p>
      <p>
        The creation of ontologies from text (e.g. [
        <xref ref-type="bibr" rid="ref12 ref2">12, 2</xref>
        ]) or other resources such as
thesauri (e.g. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]) and taxonomies has been a thriving research topic as of late.
However, the use of multilingual information as a means of checking the coherence and
consistency of ontology labels calls for further investigation. Our work seems
to open the possibility of proposing more consistent
terminology for labels associated with ontology elements in a cross-lingual setting.
      </p>
    </sec>
    <sec id="sec-4">
      <title>Initial Processing Steps and Cross-Lingual Corrective Patterns</title>
      <p>In this experiment we concentrate on multilingual aspects of the GICS ontology
we derived from the original taxonomy, having in mind the potential for an
improved translation base for terms in this domain and for Information Extraction
from documents describing, among other things, the activities of companies. Initially, we
focused on labels in the three languages English, German, and Spanish, but we have
already experimented with Russian labels.</p>
      <p>To remedy the deficient translatability of GICS labels, we investigated the
transformation of the surface realization of the contained terms. In order to
improve the readability of the ontology for engineers and users and to better
prepare labels for automatic processing, we transform non-lexical symbols to
lexical correspondents, apply lexico-syntactic patterns to resolve compound ellipses,
and complement labels based on constituency discrepancies across languages,
i.e., missing constituents in one or more languages.</p>
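      <p>The first of these transformations, replacing the ampersand by a coordinating word, can be sketched as follows. This is only an illustration and not the implementation used in our experiments; the function name and the language-to-word mapping are assumptions made for the example, covering the two languages in which we replaced ampersands.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import re

# Coordinating word per language tag (assumed mapping for this sketch).
COORDINATOR = {"en": "and", "de": "und"}


def replace_ampersand(label, lang):
    """Replace every ampersand in `label` by the coordinating word of `lang`."""
    word = COORDINATOR[lang]
    # Absorb surrounding whitespace so that "A&B" and "A & B" give the same result.
    return re.sub(r"\s*&\s*", " " + word + " ", label).strip()


print(replace_ampersand("Energy Equipment & Services", "en"))
print(replace_ampersand("Metalle & Bergbau: Diverse", "de"))
```
]]></preformat>
      <p>The result still contains the ellipsis ("Energy Equipment and Services"); resolving it is a separate step.</p>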
      <p>Replacing non-lexical items by their lexical correspondents concerns
punctuation and ampersands. Duplicate occurrences of punctuation such as ",." are
corrected.</p>
      <sec id="sec-4-1">
        <p><fn id="fn3"><label>3</label><p>http://ontologydesignpatterns.org</p></fn></p>
        <p>Ampersands occur 159 times in the English taxonomy, whereas the
coordinating word "and" is not used at all; the German version features
117 occurrences, and the Spanish version uses only the coordination marker "y". The
ampersand character serves to represent coordination, but automated linguistic
decomposition of terms containing ampersands is not supported by off-the-shelf
NLP tools. As a rather straightforward step, the ampersand was replaced by
"and" (EN) and "und" (DE).</p>
        <p>At a more complex level, we transform so-called compound ellipses in GICS
labels into fully lexicalized strings. Elliptical compounds are the outcome of
a deletion process of identical constituents in either the right or the left part
of the coordination. For instance, the hyphenated German compound
"Erdöl- und Erdgasförderung" (Oil and Gas Drilling) is transformed to "Erdölförderung
und Erdgasförderung" (Oil Drilling and Gas Drilling). This transformation is
not trivial, as it requires both the analysis of the compounds and the resolution
of the ellipsis, attaching the constituent "Förderung" to "Erdöl" in the
example above. This process necessitated the use and adaptation of a morphological
analysis component and the generation of ellipsis grammars, both of which are
implemented in the NooJ<sup>4</sup> finite state framework. Examples of the lexico-syntactic
patterns implemented in NooJ are provided below.</p>
        <p>[Examples of Resolution Patterns of Elliptical Coordinations]</p>
        <preformat>DE: &lt;NN1&gt;hyphen und &lt;NN2+NN3&gt; resolved to &lt;NN1+NN3&gt; und &lt;NN2+NN3&gt;
EN: &lt;NN1&gt; and &lt;NN2&gt; &lt;NN3&gt; resolved to &lt;NN1&gt; &lt;NN3&gt; and &lt;NN2&gt; &lt;NN3&gt;
ES: &lt;NN1&gt; &lt;ADJA1&gt; y &lt;ADJA2&gt; resolved to &lt;NN1&gt; &lt;ADJA1&gt; y &lt;NN1&gt; &lt;ADJA2&gt;

DE: &lt;NN1+NN2&gt; und hyphen&lt;NN3&gt; resolved to &lt;NN1+NN2&gt; und &lt;NN1+NN3&gt;
EN: &lt;NN1&gt; &lt;NN2&gt; and &lt;NN3&gt; resolved to &lt;NN1&gt; &lt;NN2&gt; and &lt;NN1&gt; &lt;NN3&gt;
ES: &lt;NN1&gt; y &lt;NN2&gt; de &lt;NN3&gt; resolved to &lt;NN1&gt; de &lt;NN3&gt; y &lt;NN2&gt; de &lt;NN3&gt;</preformat>
        <p>The presence of the German hyphen compound triggers the resolution of
ellipses into coordinated structures in labels for other languages attached to the
same concept. For instance, the German example above triggers the
transformation of the English label "Oil and Gas Drilling" to "Oil Drilling and Gas Drilling"
and of the Spanish label "Perforación de Pozos Petrolíferos y Gasíferos" to
"Perforación de Pozos Petrolíferos y Perforación de Pozos Gasíferos". The resolution
concerns not only single nouns, but also nominal phrases, e.g. "Perforación de
Pozos", and adjectival phrases. As our algorithm requires the presence of a
German hyphen, terms such as "Commercial Services and Supplies" (related to the
German "Gewerbliche Dienste und Betriebsstoffe") are not resolved and are also
not supposed to be resolved. All definitions attached to GICS terms confirm our
approach to ellipsis resolution. Further examples of resolution in all three
languages are as follows.</p>
        <p>[Annotation Results of NooJ Processing applied to German, English and Spanish]</p>
      </sec>
      <sec id="sec-4-2">
        <p><fn id="fn4"><label>4</label><p>http://www.nooj4nlp.net/pages/nooj.html</p></fn></p>
        <preformat>&lt;EL TYPE="Energiezubehör#und#Energiedienst"&gt;Energiezubehör und -dienste&lt;/EL&gt;
&lt;ELLLL TYPE="Grosshandel#und#Einzelhandel"&gt;Gross- und Einzelhandel&lt;/ELLLL&gt;
&lt;EL TYPE="Energy#Equipment#and#Energy#Services"&gt;Energy Equipment and Services&lt;/EL&gt;
&lt;EFOURD TYPE="Oil#:#Exploration#and#Oil#:#Production#and#Gas#:#Exploration#and#Gas#:#Production"&gt;Oil and Gas Exploration and Production&lt;/EFOURD&gt;
&lt;EL TYPE="Equipos#de#Energía#y#Servicios#de#Energía"&gt;Equipos y Servicios de Energía&lt;/EL&gt;
&lt;ELLL TYPE="Productos#Madereros#y#Productos#Papeleros"&gt;Productos Madereros y Papeleros&lt;/ELLL&gt;</preformat>
        <sec id="sec-4-2-3">
          <p>At times the authors of the industry classification apply a colon to structure
terms, such as Metalle &amp; Bergbau: Diverse (Diversified Metals and Mining).
Frequently, these constructs can only be resolved using prepositions instead of
compounding, because terms such as Heimwerkerausrüstungseinzelhandel (Home
Improvement Retail) do not exist. Structures using colons could only be observed
in German labels of GICS.</p>
          <p>As a final preprocessing step, we evaluated complementing labels on the basis
of a cross-lingual comparison. The constituent "betrieb" (company) of the German
"Integrierte Erdöl- und Erdgasbetriebe" has no equivalent in the English or Spanish
version. Despite the fact that the taxonomy is about business activities, the
word company virtually never occurs in the English or Spanish designations of
concepts, only in definitions. For the sake of completeness, we decided to
complement the English and Spanish labels with the equivalent of the missing term,
taken from sibling concepts in the same sector or from definitions. In this case, we add
"companies" and "empresas" on the basis of the assumption that multilingual
labels associated with concepts should, where feasible, have the same amount
and quality of information.</p>
          <p>The presented algorithm ports all terms to a shared surface realization and
depicts the different but aligned language-specific realizations. While the
patterns for resolving general ellipsis can be applied to other sources, such as the
Industry Classification Benchmark (ICB)<sup>5</sup>, the second case of terms separated
by a colon seems to be specific to GICS. Currently the algorithm has been
implemented for the indicated languages; however, we have experimented
with applying the patterns to other, not closely related languages, such as Russian.
Many lexico-syntactic patterns can be applied directly to the Russian
designations. For example, the compound "Хранение и транспортировка нефти и газа"
(Storage and Transportation of Oil and Gas) can be resolved to "Хранение
нефти и транспортировка нефти и Хранение газа и транспортировка газа"
(Oil Storage and Oil Transportation and Gas Storage and Gas Transportation).</p>
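          <p>The interplay between the German hyphen trigger and the two English resolution patterns can be sketched as follows. This is a simplified token-level illustration, not the NooJ grammars we used: it assumes four-token English labels made of single-word constituents, performs no part-of-speech tagging or compound analysis, and all function names are hypothetical.</p>
          <preformat preformat-type="code"><![CDATA[
```python
def german_trigger(label_de):
    """Tell from the hyphen position which constituent the ellipsis shares.

    "Erdöl- und Erdgasförderung"  -> "head"     (hyphen closes the first conjunct)
    "Energiezubehör und -dienste" -> "modifier" (hyphen opens the second conjunct)
    Returns None when there is no hyphenated coordination.
    """
    tokens = label_de.split()
    if "und" not in tokens:
        return None
    i = tokens.index("und")
    if i > 0 and tokens[i - 1].endswith("-"):
        return "head"
    if i + 1 < len(tokens) and tokens[i + 1].startswith("-"):
        return "modifier"
    return None


def resolve_en_ellipsis(label_en, shared):
    """Apply one of the two English patterns to a four-token label."""
    tokens = label_en.split()
    if len(tokens) != 4:
        return label_en  # outside the scope of this sketch
    if shared == "head" and tokens[1] == "and":
        nn1, _, nn2, nn3 = tokens  # NN1 and NN2 NN3
        return f"{nn1} {nn3} and {nn2} {nn3}"
    if shared == "modifier" and tokens[2] == "and":
        nn1, nn2, _, nn3 = tokens  # NN1 NN2 and NN3
        return f"{nn1} {nn2} and {nn1} {nn3}"
    return label_en


pairs = [
    ("Erdöl- und Erdgasförderung", "Oil and Gas Drilling"),
    ("Energiezubehör und -dienste", "Energy Equipment and Services"),
    ("Gewerbliche Dienste und Betriebsstoffe", "Commercial Services and Supplies"),
]
for de, en in pairs:
    print(resolve_en_ellipsis(en, german_trigger(de)))
```
]]></preformat>
          <p>The third pair is left untouched because the German label contains no hyphen, which mirrors the behaviour described for "Commercial Services and Supplies".</p>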
          <p>How to represent the fact that we modified the original terms (or labels)
remains an open issue. Indicating the modification is important to the authors
of the taxonomy as well as to people analyzing data. As a tentative step, we have
introduced the annotation property "preprocessed" for this purpose, to clarify
that we have adapted the original content of labels and definitions.</p>
        </sec>
      </sec>
      <sec id="sec-4-3">
        <p><fn id="fn5"><label>5</label><p>http://www.icbenchmark.com/</p></fn></p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Multilingual Alignment and Sub-Term Structures</title>
      <p>Performing initial preprocessing steps facilitates the multilingual alignment of
terms and components of terms. For the purpose of multilingual alignment, we
have extensively analyzed and utilized existing hierarchical relations and
definitions. In a second step we create relations to indicate sub-term relations in the
actual ontology. By creating an additional terminological resource, we derive a
second subsumption hierarchy focusing on sub-term relations, which is supposed
to facilitate Information Extraction based on the ontology we created.</p>
      <p><bold>Term Alignment</bold></p>
      <p>Within the taxonomic structure of GICS we are able to establish relations
between (sub)terms along the line of class hierarchies. GICS is structured along
four major meta-categories: sector, industry group, industry, and sub-industry.
Terms used in a super-class can thus be used for comparing a term in one
language with the terms of other languages. Not only the direct line of the hierarchy is of
interest to us; siblings in the hierarchy equally provide vital information.</p>
      <p>Lexica and lexical resources created in the initial processing are now utilized
to create multilingual alignments of the terminology contained in the taxonomy.
We utilize lemmas of the normalized labels to facilitate the multilingual
alignment as represented by the NooJ output illustrated below.</p>
      <p>[Example of NooJ Annotation Result]</p>
      <preformat>&lt;TYPE="Integrierte#Erdoelbetriebe#und#Integrierte#Erdgasbetriebe"&gt;Integrierte Erdoel- und Erdgasbetriebe&lt;/&gt;</preformat>
      <sec id="sec-5-1">
        <p>The associated lexical information in NooJ tells us in this case that
"Integrierte" is the adjectival form derived from the verb "integrieren" (to integrate)
and that the lemma of the head of the compound noun, "betriebe", is "Betrieb"
(company). Thereby, we are able to establish term relations on the basis of the
hierarchy, such as depicted below for the GICS class "101020".</p>
        <p>[Example of Term Alignment]</p>
        <preformat>"de" =&gt; "Erdoel, Erdgas und nicht erneuerbare Brennstoffe",
"en" =&gt; "Oil, Gas and Consumable Fuels",
"es" =&gt; "Petroleo, Gas y Combustibles",</preformat>
        <p>Term pairs may vary strongly across different sectors within one
classification. For instance, "Leisure Products" equals "Freizeitartikel" in German, while
"Agricultural Products" corresponds to "Landwirtschaftliche Produkte". In one
sector "product" is thus aligned with "Artikel", whereas in a different sector it maps to "Produkte".
Nevertheless, this fact does not hamper automating the alignment process, which
has been carried out with a Java tool that ports the preprocessed labels to the
subsumption hierarchy of the ontology. At times, this initial alignment can lead
to multiple mappings of terms, as depicted in Fig. 1.</p>
        <p>The interesting point about the example in Fig. 1 is the different
conceptualization across languages. The German multi-word term corresponds to the
single word expression "Material" in English and Spanish, which constitutes a
challenge for cross-lingual alignment as there seems to be no equivalent for the
three German expressions in the other languages.</p>
        <p>In such cases other term pairs within the same sector are analyzed as regards
re-occurrence of terms. If no equivalence can be detected, the definition has to
be searched. Should the terms be only contained in one label, then additional
resources, such as bilingual dictionaries or other multilingual industry
classifications, might be consulted. However, in other cases clear misalignments occur,
such as "Betriebsstoffe" in German being aligned to "Professional Services" (en)
and "Servicios Profesionales" (es). As the same designation is part of another
sub-industry in the sector, the incorrect alignment can be corrected on the basis
of the existing correct alignment to "Professionelle Dienste". The definition in
each language further confirms this alignment.</p>
        <p>Our special focus is on terminal nodes in the original taxonomy as they
contain detailed definitions, which further facilitate the cross-lingual alignment
and the validation of alignment correctness. As a tentative approach, we again use
lexico-syntactic patterns to extract some basic information contained in
definitions, exemplified by one pattern in German below. The extracted information
as well as manually derived alignments from definitions are both used to validate
the previously described alignments of designations of taxonomic concepts.</p>
        <p>[Pattern for Extraction of Information in Definitions]</p>
      </sec>
      <sec id="sec-5-2">
        <title>German:</title>
        <preformat>&lt;NP1&gt;, die sich mit &lt;NP2&gt;, &lt;NP3&gt; und OR oder &lt;NP4&gt;
von &lt;NP5&gt; tätig sind OR beschäftigen.</preformat>
        <p>Definition of "Pharmazeutika": "Unternehmen, die in der Erforschung, Entwicklung oder Herstellung von Pharmazeutika tätig sind."</p>
      </sec>
      <sec id="sec-5-4">
        <p>Definitions provide further information to facilitate the construction of proper
terms and term alignments. &lt;NP1&gt; represents a synonym of the word company,
e.g. manufacturer, producer, provider, whereas the other noun phrases relate to
business activities. One Spanish example is the industry of Transportes, which
has the industry group Transporto Aereo and sub-industry Lineas Aereas
referring to the former term explicitly in its definition.</p>
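        <p>The German definition pattern can be approximated by a regular expression, as in the following sketch. It is not the pattern grammar we implemented: the expression, the group names and the function name are assumptions made for illustration, noun phrases are matched as plain strings without any syntactic analysis, and the only example checked is the definition of "Pharmazeutika" quoted in this section.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import re

# Rough string-level counterpart of the German definition pattern:
# an agent (NP1), a list of activities, "von" plus an object, and a closing verb.
DEFINITION = re.compile(
    r"^(?P<agent>[^,]+), die (?:in der|sich mit(?: der)?) "
    r"(?P<activities>.+?) von (?P<object>.+?) "
    r"(?:tätig sind|beschäftigen)\.?$"
)


def parse_definition(text):
    """Return agent, activities and object of a definition, or None if no match."""
    match = DEFINITION.match(text.strip())
    if match is None:
        return None
    # Activities are separated by commas or by the conjunctions "und" / "oder".
    parts = re.split(r",\s*|\s+(?:und|oder)\s+", match.group("activities"))
    return {
        "agent": match.group("agent"),
        "activities": [part for part in parts if part],
        "object": match.group("object"),
    }


definition = ("Unternehmen, die in der Erforschung, Entwicklung oder "
              "Herstellung von Pharmazeutika tätig sind.")
print(parse_definition(definition))
```
]]></preformat>
        <p>The extracted agent corresponds to &lt;NP1&gt;, i.e. a synonym of the word company, and the remaining parts correspond to the business activities and their object.</p>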
        <p>Analyzing siblings creates relations that would otherwise not be evident. For
instance, "Building Products" might not be related to "Aerospace and Defense"
in any other domain. Within this sector, however, they are related with regard to
the manufacturing of aerospace and defense equipment. The extracted term pairs
of the definitions allow us to add this additional information to the label in order to
strive towards completeness of information.</p>
        <p>Terms aligned in this section are represented in the GICS ontology as
annotation properties with the respective xml:lang property. Initial preprocessing and
the correct alignment of terms serve to improve the overall quality of the natural
language representation of the ontology. The alignment of terms equally helps
to reveal inconsistencies or in other words improve the consistency of ontology
labels.</p>
        <p><bold>Sub-Term Relations</bold></p>
        <p>At this point our ontology consists of five main classes according to the
taxonomic structure: the four meta-levels and an additional class "Company". The
latter features a "hasBusinessActivitiy" object property to the main class
"SubIndustry" so that various activities can be added upon instantiating a company.
In addition, all taxonomic categories have a subClass relationship to the
respective meta-category.</p>
        <p>Creating sub-term relations introduces an additional structure not originally
part of the GICS taxonomy, which is why we have decided to create an
additional OWL-DL resource dedicated to terminology and terminological relations.
For "isSubTermOf" relations it might be worth considering a transitive
characteristic, that is, "P(x,y) and P(y,z) implies P(x,z)"<sup>6</sup>: if x isSubTermOf
y and y isSubTermOf z, then x isSubTermOf z. This allows us to state
that "Trucks" is a sub-term of "Heavy Trucks" and at the same time of "Farm
Machinery and Heavy Trucks". This type of decomposition abides by the
terminological principles presented in ISO 704:2009.
<fn id="fn6"><label>6</label><p>http://www.w3.org/TR/2004/REC-owl-guide-20040210/#PropertyCharacteristics</p></fn></p>
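        <p>What a transitive "isSubTermOf" property would add can be illustrated by computing the transitive closure over a few asserted pairs. The sketch below is plain Python rather than an OWL reasoner, and the function name is hypothetical; a pair (x, y) is read as "x isSubTermOf y".</p>
        <preformat preformat-type="code"><![CDATA[
```python
def transitive_closure(pairs):
    """Return all pairs implied by P(x,y) and P(y,z) => P(x,z)."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                # Chain (a, b) with (b, d) into (a, d) if it is new.
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


asserted = {
    ("Trucks", "Heavy Trucks"),
    ("Heavy Trucks", "Farm Machinery and Heavy Trucks"),
}
inferred = transitive_closure(asserted) - asserted
print(inferred)
```
]]></preformat>
        <p>Only the two direct relations need to be asserted; the relation between "Trucks" and "Farm Machinery and Heavy Trucks" follows from transitivity.</p>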
        <p>In order to account for the terminological relations and levels, pseudo-categories,
i.e., categories not originally part of the taxonomy and generated for
terminological reasons, have to be introduced to the original hierarchy. This is due to
the fact that terminological relations focus on hypernymic and meronymic relations.
For example, the subcategories of Energy all refer to either Energy, Oil, Gas,
or Consumable Fuels, all of which have to be introduced to the terminological
structure.</p>
        <p>The decomposition of, e.g., "Oil Equipment and Gas Equipment and Oil
Services and Gas Services" centers around the constituent and divides the term at
the second "and". Accordingly, the definition of sub-industries has to be adapted
to the changed concept and added to the terminological entry. Information
extracted from definitions in the previous step is added to the terminology in
order to enlarge the contained vocabulary.</p>
        <p>A terminological representation of these natural language labels of an
ontology provides a highly beneficial overview of the contained terms, their sub-terms
and the relations between them. This facilitates checking labels for duplicates and
for consistency. In combination with part-of-speech, morphological, and syntactic
information represented in lemon, there are various application scenarios, from
facilitating the creation of new labels to machine translation.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Lexicalizing Ontology Labels</title>
      <p>Several approaches and models seek to provide a lexicon-ontology interface to
reduce the complexity of the ontology, while at the same time providing full
lexical information on the natural language representation of ontologies.</p>
      <p>
        The lemon model [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] was developed within the Monnet project<sup>7</sup> and
represents textual and linguistic information contained in ontologies as an external RDF
resource and establishes semantics by means of relating entries to the ontology,
i.e., the relation represents a means to disambiguate words. It adapts the main
principles of the Lexical Markup Framework (LMF) standardized in ISO 24613
and unites it with the core features of LexInfo in order to elaborate a specific
ontology-lexicon model. Lexicon objects describe syntactic and morpho-syntactic
properties, which are related to entities of the ontology via sense objects.
Subsequent to applying state labels to the entry, i.e., preferred, alternative, hidden
reference, the lexical sense links to the lexical entry, which might be decomposed
to its individual elements.
      </p>
      <p>Lexicons based on lemon can be created automatically by means of the lemon
generator<sup>8</sup>. The following lexicon was created on the basis of the seed ontology,
without any preprocessing and term alignment. As can be seen, decomposition
of the term "Energy Equipment &amp; Services" fails due to the ampersand and the
ellipsis.</p>
      <sec id="sec-6-2">
        <p><fn id="fn7"><label>7</label><p>http://www.monnet-project.eu</p></fn><fn id="fn8"><label>8</label><p>http://monnetproject.deri.ie/lemonsource/</p></fn></p>
        <p>[lemon decomposition of "Energy Equipment &amp; Services"]</p>
        <preformat>&lt;lemon:decomposition xmlns:ns0="http://www.w3.org/1999/02/22-rdf-syntax-ns#" ns0:parseType="Collection"&gt;
  &lt;lemon:Component rdf:about="unknown:/GICS__en/Energy%2BEquipment%2B%26%2BServices#comp"&gt;
    &lt;lemon:element rdf:resource="unknown:/GICS__en/Energy"/&gt;
  &lt;/lemon:Component&gt;
  &lt;lemon:Component rdf:about="unknown:/GICS__en/Energy%2BEquipment%2B%26%2BServices#comp2"&gt;
    &lt;lemon:element rdf:resource="unknown:/GICS__en/Equipment"/&gt;
  &lt;/lemon:Component&gt;
  &lt;lemon:Component rdf:about="unknown:/GICS__en/Energy%2BEquipment%2B%26%2BServices#comp3"&gt;
    &lt;lemon:element rdf:resource="unknown:/GICS__en/Services"/&gt;
  &lt;/lemon:Component&gt;
&lt;/lemon:decomposition&gt;</preformat>
        <p>The application of off-the-shelf NLP tools to labels in fact negatively
influences the efficiency of an automated lemon-based lexicalization process of
labels, as most commonly used tools are not able to handle such
types of (mainly nominal) ellipsis. Considering the fact that ontology labels to a
large extent consist only of nouns and noun compounds, the issue is a vital one.
We apply the process of lexicalization to the annotation property rdfs:label
available in all languages covered in the GICS ontology, namely German,
English, and Spanish. For this purpose we use lemon for the representation of linguistic
information added to these labels and for linking to the original ontology elements.</p>
        <p>Lexicalization supports the decomposition of terms into sub-terms, that is,
it facilitates the application of patterns to detect cross-lingual alignments at
the level of components of terms/labels. The linguistic information in the lemon
representation is used for consolidation. However, we consider the
decomposition of terms to be part of the terminological level, which is why we introduced the
terminological resource for GICS in section 4. The example below shows the
encoding of constituency and part-of-speech information subsequent to our initial
preprocessing and term alignment process.</p>
        <p>[Constituency and Part-Of-Speech Information of "Energy Equipment and Energy Services" in lemon]</p>
        <preformat>&lt;lemon:entry&gt;
&lt;lemon:LexicalEntry rdf:about="unknown:/lexicon__en/Energy+Equipment+and+Energy+Services"&gt;
&lt;lemon:sense&gt;
&lt;lemon:LexicalSense rdf:about="unknown:/lexicon__en/Energy%2BEquipment%2Band%2BEnergy%2BServices#sense"&gt;
&lt;lemon:reference rdf:resource="http://www.semanticweb.org/ontologies/2012/8/GICS.owl#GICS101010"/&gt;
&lt;/lemon:LexicalSense&gt;
&lt;/lemon:sense&gt;
&lt;lemon:canonicalForm&gt;
&lt;lemon:Form rdf:about="unknown:/lexicon__en/Energy+Equipment+and+Energy+Services#form"&gt;
&lt;lemon:writtenRep xml:lang="en"&gt;Energy Equipment and Energy Services&lt;/lemon:writtenRep&gt;
&lt;/lemon:Form&gt;
&lt;/lemon:canonicalForm&gt;
&lt;lemon:phraseRoot&gt;
...
&lt;lemon:constituent rdf:resource="http://monnetproject.deri.ie/tags/penn/node/NN"/&gt;
...
&lt;lemon:constituent rdf:resource="http://monnetproject.deri.ie/tags/penn/node/NNS"/&gt;
...
&lt;lemon:constituent rdf:resource="http://monnetproject.deri.ie/tags/penn/node/NP"/&gt;
...
&lt;lemon:constituent rdf:resource="http://monnetproject.deri.ie/tags/penn/node/CC"/&gt;
...
&lt;lemon:constituent rdf:resource="http://monnetproject.deri.ie/tags/penn/node/NN"/&gt;
...
&lt;lemon:constituent rdf:resource="http://monnetproject.deri.ie/tags/penn/node/NP"/&gt;
...
&lt;lemon:constituent rdf:resource="http://monnetproject.deri.ie/tags/penn/node/NP"/&gt;
...
&lt;/lemon:entry&gt;</preformat>
        <p>Due to space constraints, the example only provides an English version;
however, the same improved results can be observed in German and Spanish. The
above example shows that lemon was able to decompose the term
and provide part-of-speech information, using the Penn Treebank notation. The
lexical sense contains the link to the ontology, and the canonical form holds the original label as
"writtenRep", followed by information on individual elements of the term. This use case
is intended to show that this type of preprocessing and term alignment
has beneficial effects on ontology labels.</p>
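        <p>The decomposition of a label into sub-terms can be illustrated with a short Python sketch. It is not the lemon implementation: the token-to-tag assignment is an assumption that follows standard Penn Treebank conventions (the listing elides the per-token detail), and the function name split_at_conjunction is hypothetical.</p>

```python
# Token/tag pairs for the label "Energy Equipment and Energy Services".
# The tag assignment is an assumption based on standard Penn Treebank
# conventions; the lemon listing elides the per-token detail.
TAGGED_LABEL = [
    ("Energy", "NN"),
    ("Equipment", "NN"),
    ("and", "CC"),
    ("Energy", "NN"),
    ("Services", "NNS"),
]


def split_at_conjunction(tagged):
    """Split a tagged label into sub-terms at each coordinating conjunction (CC)."""
    sub_terms, current = [], []
    for token, tag in tagged:
        if tag == "CC":
            # A conjunction closes the sub-term collected so far.
            if current:
                sub_terms.append(" ".join(current))
            current = []
        else:
            current.append(token)
    if current:
        sub_terms.append(" ".join(current))
    return sub_terms


sub_terms = split_at_conjunction(TAGGED_LABEL)
print(sub_terms)
```

        <p>Under these assumptions, each resulting sub-term is a candidate for alignment with the corresponding component of the German or Spanish label.</p>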
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Concluding Remarks and Future Work</title>
      <p>We have preprocessed the labels of an ontology that we derived from the GICS
taxonomy, for the time being in English, German, and Spanish. We presented a
pattern-based approach to resolving compound ellipses, which can be generalized across
resources, such as the Industry Classification Benchmark (ICB). In doing so, we
created terms that were initially not contained in the resource and thus inaccessible to
ontology-based tasks, such as Information Extraction. We aligned the terms
across all three languages. Terms contained in definitions were extracted and
additionally aligned to increase the overall quality and validate existing alignments.
Furthermore, the normalized and aligned terms were included in a
terminological resource in OWL-DL to provide explicit sub-term relations and decompose
complex, long labels. Lexicalizing the derived ontology with its processed labels,
as opposed to the initial ontology, served to exemplify the usefulness of such
(pre)processing of labels.</p>
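      <p>As a reminder of what resolving a compound ellipsis involves, the following Python sketch expands a single, simplified pattern. It is an illustration under stated assumptions, not the pattern set used in this work: the regular expression, the function name expand_ellipsis, and the input label "Energy Equipment and Services" are hypothetical, and only the expanded form "Energy Equipment and Energy Services" is taken from the lemon example.</p>

```python
import re

# Hypothetical pattern: "MODIFIER HEAD1 and HEAD2", where the modifier is
# elided before HEAD2. Real labels require more patterns than this one.
ELLIPSIS = re.compile(r"^(\w+) (\w+) and (\w+)$")


def expand_ellipsis(label):
    """Repeat the shared modifier before the second head if the pattern matches."""
    match = ELLIPSIS.match(label)
    if match is None:
        # Labels that do not fit the pattern are returned unchanged.
        return label
    modifier, head1, head2 = match.groups()
    return f"{modifier} {head1} and {modifier} {head2}"


print(expand_ellipsis("Energy Equipment and Services"))
```

      <p>A purely string-based rule of this kind over-generates on three-item enumerations and similar labels, which is one reason why part-of-speech information and alignment across languages are useful for validation.</p>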
      <p>As regards future work, we are currently investigating the applicability of
our pattern-based approach to language families other than the Romance languages.
A further direction that might be of interest is automating the creation
of a terminological resource for the ontology, similar to the idea of the lemon
generator.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Kless</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jansen</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lindenthal</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wiebensohn</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>A Method of Re-Engineering a Thesaurus into an Ontology</article-title>
          . In:
          <string-name>
            <surname>Donnelly</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guizzardi</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          (eds.):
          <source>Formal Ontology in Information Systems - Proceedings of the Seventh International Conference (FOIS 2012)</source>
          , pp.
          <fpage>133</fpage>
          -
          <lpage>146</lpage>
          . IOS Press, Amsterdam (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Serra</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Girardi</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>A Process for Extracting Non-Taxonomic Relationships of Ontologies from Text</article-title>
          .
          <source>Intelligent Information Management</source>
          <volume>3</volume>
          ,
          <fpage>119</fpage>
          -
          <lpage>124</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Maynard</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Funk</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peters</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          :
          <article-title>Using Lexico-Syntactic Ontology Design Patterns for Ontology Creation and Population</article-title>
          .
          <source>In: Proceedings of the Workshop on Ontology Patterns</source>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Cimiano</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Voelker</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Text2Onto - A Framework for Ontology Learning and Data-driven Change Discovery</article-title>
          .
          <source>In: Proceedings of the 10th International Conference on Applications of Natural Language to Information Systems (NLDB)</source>
          , Alicante, Spain (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Klaussner</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhekova</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Lexico-Syntactic Patterns for Automatic Ontology Building</article-title>
          .
          <source>In: Proceedings of the International Conference Recent Advances in Natural Language Processing</source>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Nichols</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bond</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tanaka</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fujita</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Flickinger</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Multilingual Ontology Acquisition from Multiple MRDs</article-title>
          .
          <source>In: Proceedings of the 2nd Workshop on Ontology Learning and Population</source>
          , pp.
          <fpage>10</fpage>
          -
          <lpage>17</lpage>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Buitelaar</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cimiano</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McCrae</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Montiel-Ponsoda</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Declerck</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Ontology Lexicalization: The lemon Perspective</article-title>
          . In:
          <string-name>
            <surname>Slodzian</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Valette</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aussenac-Gilles</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Condamines</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hernandez</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rothenburger</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          (eds.):
          <source>Workshop Proceedings of the 9th International Conference on Terminology and Artificial Intelligence</source>
          , pp.
          <fpage>33</fpage>
          -
          <lpage>36</lpage>
          , INALCO, Paris, France (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Declerck</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lendvai</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Towards a standardized linguistic annotation of the textual content of labels in knowledge representation systems</article-title>
          .
          <source>In: LREC 2010 - The Seventh International Conference on Language Resources and Evaluation (LREC-10)</source>
          , Malta (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Aussenac-Gilles</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Szulman</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Despres</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>The Terminae Method and Platform for Ontology Engineering from Texts</article-title>
          .
          <source>In: Proceedings of the 2008 Conference on Ontology Learning and Population: Bridging the Gap between Text and Knowledge</source>
          . IOS Press, pp.
          <fpage>199</fpage>
          -
          <lpage>223</lpage>
          , (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Mejía</surname>
            ,
            <given-names>M.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Montiel-Ponsoda</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aguado de Cea</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gómez-Pérez</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Ontology Localization</article-title>
          . In:
          <string-name>
            <surname>Suárez-Figueroa</surname>
            ,
            <given-names>M.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gómez-Pérez</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Motta</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gangemi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (eds.):
          <source>Ontology Engineering in a Networked World</source>
          , pp.
          <fpage>171</fpage>
          -
          <lpage>191</lpage>
          , Springer Berlin Heidelberg (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Fu</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brennan</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>O'Sullivan</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Using Pseudo Feedback to Improve Cross-Lingual Ontology Mapping</article-title>
          .
          <source>In: Proceedings of the 8th Extended Semantic Web Conference (ESWC 2011), LNCS 6643</source>
          , pp.
          <fpage>336</fpage>
          -
          <lpage>351</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>de Cea</surname>
            ,
            <given-names>G.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gómez-Pérez</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ponsoda</surname>
            ,
            <given-names>E.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suárez-Figueroa</surname>
            ,
            <given-names>M.C.</given-names>
          </string-name>
          :
          <article-title>Natural Language-Based Approach for Helping in the Reuse of Ontology Design Patterns</article-title>
          .
          <source>In: Proceedings of the 16th International Conference on Knowledge Engineering and Knowledge Management Knowledge Patterns (EKAW 2008)</source>
          , Acitrezza, Italy (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Suárez-Figueroa</surname>
            ,
            <given-names>M.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gómez-Pérez</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fernández-López</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>The NeOn Methodology for Ontology Engineering</article-title>
          . In:
          <string-name>
            <surname>Suárez-Figueroa</surname>
            ,
            <given-names>M.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gómez-Pérez</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Motta</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gangemi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (eds.):
          <source>Ontology Engineering in a Networked World</source>
          , pp.
          <fpage>9</fpage>
          -
          <lpage>34</lpage>
          , Springer Berlin Heidelberg (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Cimiano</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Buitelaar</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McCrae</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sintek</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>LexInfo: A declarative model for the lexicon-ontology interface</article-title>
          .
          <source>Journal of Web Semantics</source>
          , Vol.
          <volume>9</volume>
          , No.
          <issue>1</issue>
          , pp.
          <fpage>29</fpage>
          -
          <lpage>51</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Vertan</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>v. Hahn</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          :
          <article-title>Challenges for the Multilingual Semantic Web</article-title>
          .
          <source>In: Proceedings of the International Workshop on Semantic Web Technologies for Machine Translation</source>
          , in conjunction with MT-Summit X, Phuket, Thailand (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>