<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Combining Three Ways of Conveying Knowledge: Modularization of Domain, Terminological, and Linguistic Knowledge in Ontologies</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Thierry Declerck</string-name>
          <email>declerck@dfki.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dagmar Gromann</string-name>
          <email>dgromann@wu.ac.at</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>DFKI GmbH, Language Technology Department</institution>
          ,
          <addr-line>Stuhlsatzenhausweg 3, D-66123 Saarbruecken</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Vienna University of Economics and Business</institution>
          ,
          <addr-line>Nordbergstrasse 15, 1090 Vienna</addr-line>
          ,
          <country country="AT">Austria</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Recently, a general trend towards increasingly complex ontologies can be observed. This concerns not only domain modeling, where the complexity should correspond to the information to be modeled, but also the addition of further information that could instead be modeled as external resources and linked to the relevant elements of the domain model. In particular, terminological and linguistic information is being added to the description of classes and properties of ontologies. To respond to this development, we propose a functional approach to the modularization of ontologies, based on the terminological, linguistic, and conceptual functions each module fulfills. Only the conceptual elements and their structural properties should remain in the domain model, whereas the formalized terminology and linguistics are described in independent modules referencing the domain models. We provide examples of such complexity in Knowledge Representation systems, discuss related work, and present our approach to modularization in detail.</p>
      </abstract>
      <kwd-group>
        <kwd>ontology</kwd>
        <kwd>terminology</kwd>
        <kwd>linguistics</kwd>
        <kwd>lexicon</kwd>
        <kwd>LabelNet</kwd>
        <kwd>SKOS</kwd>
        <kwd>TBX</kwd>
        <kwd>TMF</kwd>
        <kwd>lemon</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Nowadays, ontologies generally contain not only domain knowledge but also
further information central to various tasks of ontology-based systems.
Terminological and linguistic details, for instance, are substantially different
in nature from domain knowledge and are usually encoded in labels attached to the
IDs of classes and properties.</p>
      <p>A growing number of researchers recognize that encapsulating such
information within the description of classes and properties of domain ontologies
may not be the best practice. Proposals have already been made for separating
terminology and lexicon from domain ontologies and for linking this information
to the elements of the domain model in a more principled way [1-6]. Our approach
to modularization can be considered functional, as it is based on the functions
that the terminological and linguistic elements fulfill in the context of domain
models. Several tasks, such as supporting Information Systems (IS), semantic
annotation, lexicographic applications, translation, and localization, among many
others, benefit from encapsulated and reusable functions as presented herein.</p>
      <p>The need to cull the content of labels in ontologies has increased as more
possibilities to process labels linguistically have emerged: linguistic
annotations are added to the textual content of labels, which adds complexity to
the ontology. As a result, reusing and sharing the accumulated information is
considerably impeded, since the entire ontology has to be navigated in order to
find the linguistically annotated terms that are relevant to ontology-driven
applications.</p>
      <p>Therefore, following a series of similar proposals [1-3] and extending and
specifying some of the points made there, we suggest a strict modularization of
domain ontologies into a class hierarchy, a terminology, and a linguistic
component, all represented in RDF/OWL and related to each other by means of the
Simple Knowledge Organization System (SKOS) of the W3C and similar linking
mechanisms. Thus, a lexical entry can be used by several terminologies, the terms
of which are in turn employed in different specific ontologies.</p>
      <p>The proposed model facilitates the detection of interrelations among
ontologies and makes forming new ontologies on the basis of existing,
independently built ones faster and less complicated, because the model strips
ontologies down to their core elements. It equally aims at more compact
terminologies and lexicons used in relation with domain modeling, since variants
can be more easily detected and collapsed into harmonized sets. Thus, our
three-module system represents a mechanism for increasing flexibility in reusing
ontologies as well as domain-specific lexicons and terminologies.</p>
    </sec>
    <sec id="sec-2">
      <title>Steadily Growing Complexity of Ontologies</title>
      <p>A class defined in the RadLex ontology<sup>3</sup> serves to exemplify the
growing complexity in ontologies. As can be seen in the example below, the class
RID13218 contains all information about its superordinate class and the related
properties. Furthermore, information on natural language expressions associated
with the class (Synonym, Non-English_name, Preferred_name, ORIG_Preferred_Name,
Definition) as well as a reference to another knowledge source, i.e., FMAID 67112,
has been accumulated in one single ontology class. The knowledge source referred
to is the Foundational Model of Anatomy (FMA)<sup>4</sup>. A look at the
corresponding entry in the FMA ontology quickly shows that elements have simply
been duplicated, such as the definition, the synonym, the (German) non-English
name, and the label (preferred name).</p>
      <p><sup>3</sup> Version 3, http://bioportal.bioontology.org/ontologies/2027?p=terms</p>
      <p><sup>4</sup> The URL for the indicated ID is http://bioportal.bioontology.org/ontologies/44507/?p=terms&amp;conceptid=fma%3AImmaterial_anatomical_entity</p>
      <p>&lt;class&gt;
&lt;name&gt;RID13218&lt;/name&gt;
&lt;type&gt;anatomy_metaclass&lt;/type&gt;
&lt;own_slot_value&gt;
&lt;slot_reference&gt;FMAID&lt;/slot_reference&gt;
&lt;value value_type="string"&gt;67112&lt;/value&gt;
&lt;/own_slot_value&gt;
&lt;own_slot_value&gt;
&lt;slot_reference&gt;Synonym&lt;/slot_reference&gt;
&lt;value value_type="string"&gt;immaterial physical anatomical
entity&lt;/value&gt;
&lt;/own_slot_value&gt;
&lt;own_slot_value&gt;
&lt;slot_reference&gt;Non-English_name&lt;/slot_reference&gt;
&lt;value value_type="string"&gt;immaterielles körperliches
anatomisches Wesen&lt;/value&gt;
&lt;/own_slot_value&gt;
&lt;own_slot_value&gt;
&lt;slot_reference&gt;Preferred_name&lt;/slot_reference&gt;
&lt;value value_type="string"&gt;immaterial anatomical entity&lt;/value&gt;
&lt;/own_slot_value&gt;
&lt;own_slot_value&gt;
&lt;slot_reference&gt;ORIG_Preferred_Name&lt;/slot_reference&gt;
&lt;value value_type="string"&gt;immaterial anatomical entity&lt;/value&gt;
&lt;/own_slot_value&gt;
&lt;own_slot_value&gt;
&lt;slot_reference&gt;Definition&lt;/slot_reference&gt;
&lt;value value_type="string"&gt;Physical anatomical entity which is a
three-dimensional space, surface, line or point associated with a
material anatomical entity. Examples: body space, surface of heart,
costal margin, apex of right lung, anterior compartment of
right arm.&lt;/value&gt;
&lt;/own_slot_value&gt;
&lt;own_slot_value&gt;</p>
      <p>&lt;slot_reference&gt;Is_A&lt;/slot_reference&gt;
&lt;value value_type="class"&gt;RID13441&lt;/value&gt;
&lt;/own_slot_value&gt;
&lt;own_slot_value&gt;
&lt;slot_reference&gt;Has_Subtype&lt;/slot_reference&gt;
&lt;value value_type="class"&gt;RID13221&lt;/value&gt;
&lt;value value_type="class"&gt;RID13250&lt;/value&gt;
&lt;value value_type="class"&gt;RID13291&lt;/value&gt;
&lt;value value_type="class"&gt;RID13307&lt;/value&gt;
&lt;value value_type="class"&gt;RID15845&lt;/value&gt;
&lt;value value_type="class"&gt;RID13217&lt;/value&gt;
&lt;/own_slot_value&gt;
&lt;own_slot_value&gt;
&lt;slot_reference&gt;:ROLE&lt;/slot_reference&gt;
&lt;value value_type="string"&gt;Concrete&lt;/value&gt;
&lt;/own_slot_value&gt;
&lt;superclass&gt;RID13441&lt;/superclass&gt;
&lt;/class&gt;
[Example of growing complexity in ontologies by means of a RadLex class.]</p>
      <p>In this particular case, the RadLex ontology appears to reuse many
elements of FMA, as the focus of RadLex is on phenomena that can be observed in
correlation with specific organs rather than on the organs themselves. While
integrating established terminological and linguistic knowledge from the field of
anatomy is clearly a sensible move, it would be more beneficial to provide this
pool of information independently of the ontologies modeling the domain. Clear
links between the original ontology, the terms used, and the linguistic data
substantially improve the re-usability and readability of semi-structured or
definitional natural language expressions across a large number of ontologies
(or taxonomies).</p>
    </sec>
    <sec id="sec-3">
      <title>Related Models</title>
      <p>
        Several approaches and models emphasize the importance of separating
conceptual, terminological, and lexical information. Some concentrate on the
terminological aspect [
        <xref ref-type="bibr" rid="ref6 ref9">6, 9</xref>
        ], while others focus on the lexical aspect [
        <xref ref-type="bibr" rid="ref10 ref4">10, 4</xref>
        ]. Buitelaar
et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] propose a model called LexInfo and suggest adding lexical,
morphosyntactic, and chunking information to the labels of ontology classes. The authors
design an OWL representation scheme for this set of linguistic information and
its linking to ontology classes. Among other things, LexInfo thereby supports the
ontology-based semantic annotation of text.
      </p>
      <p>
        The Terminae [
        <xref ref-type="bibr" rid="ref5">5</xref>
] model suggests having two distinct but interlinked high
levels of classes within ontologies: one for the hierarchy of concepts (and
associated relations), and one for (a list of) terms that point to the concepts they
denote. Thus, the concept level becomes cleaner and, for example, the very
cumbersome manner of encoding synonyms and other related terms found
in RadLex (see the RadLex example above) can be avoided, since synonyms are
encoded on the terminological level of the ontology. One major advantage of this
approach is that a subset of a terminology can more easily be identified and
reused in other (domain) ontologies. Reymont et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] provide an example of the
application of Terminae in the automotive domain. We note that in Terminae
the lemma and part-of-speech information is encoded within the term classes.
      </p>
      <p>
        A third approach, which suggests merging LexInfo and Terminae, is CTL
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. CTL applies the full model of LexInfo to each word in a term and thereby
takes lexical information completely out of the descriptions of both domain and
term classes. This leads to three layers of description within the ontology, where
a meta-class has three main subclasses describing the domain-class, terminology,
and linguistic hierarchies. The linguistic layer is based on and extends LexInfo.
However, the CTL proposal provides neither a formalization nor an implementation,
but only a general description of such an approach. Both Terminae and CTL
accumulate the different modules (meta-classes) in one ontology, which supports an
internal view on the interaction between them but renders the linking of terms to
other ontologies more difficult.
      </p>
      <p>
        Some approaches emphasize the added benefit of combining all three
modules for specific tasks (e.g. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]). Bodenreider [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] makes use of existing
terminologies, ontologies, and lexicons for text mining in biomedicine. The emphasis
here is on already existing, not perfectly compatible resources and on the specific
task of text mining.
      </p>
      <p>
        All approaches above agree that natural language processing and subsequent
linguistic annotation of the terms used in labels are necessary. In order to ensure
interoperability and re-usability, we use standardized models. The
Terminological Markup Framework (TMF), defined in ISO 16642, ensures the re-usability
of terminological data across applications, and the TermBase eXchange (TBX)
format of ISO 30042 represents a best practice for the practical exchange of
terminology. In line with ISO 704, we take a concept-oriented approach towards
terminology, defining terminology as concepts and their designations in a specific
domain. Consequently, a term is a verbal designation denoting a general concept
in a specific domain. The lemon model [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], which we discuss below, proposes a way to
represent the results of natural language processing and annotation in a modular
RDF format.
      </p>
    </sec>
    <sec id="sec-4">
      <title>Modularization of Ontology Labels</title>
      <p>We propose LabelNet, a model that modularizes each lexical, linguistic, and
terminological function related to ontology labels, establishing a net of interlinked
terms with highly detailed information at each level. Term entries in a separate
OWL-DL encoded, TBX- and TMF-compliant terminology relate semantically to the
corresponding ontology classes or other conceptual elements and represent the
terminological information in detail. Each token<sup>5</sup> of every term entry
links to a lexical entry, i.e., to a lemma<sup>6</sup>, syntactic information, and
possibly additional resources such as further ontologies. Fig. 1 exemplifies the
structure of LabelNet and shows how its modules can be interlinked using SKOS. The
example data has been taken from an ontology based on the Belgian National
Bank (BNB) taxonomy. Time concepts are linked to the W3C time ontology,
e.g., "more than one year" is an interval.</p>
      <p>
        The lexical entries are partially represented using the lemon model [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ],
which is described in the next section. The semantics of the list of tokens
contained in a term is established by referring to the ontology elements on the basis
of the term ID in the terminological entries.
      </p>
      <p>By separating the individual layers into modules, we achieve a more complete
and highly detailed perspective on ontology labels. The separation of lexical entries
and terms into lexicons and terminologies provides a higher degree of re-usability.
In addition, it facilitates a number of computations over these labels, such as
determining the usage of a certain lemma in terms pointing to concept/role IDs.</p>
      <p>lemon provides a model that can encode lexical information, using among others
RDF, URIs, and linking mechanisms, so that language data can be exchanged,
for example, in the Linguistic Linked Open Data cloud<sup>7</sup>. The model aims at
a strict separation of 'world knowledge' (describing domain objects that are
referenced by lexical objects) from 'word knowledge' (describing lexical objects).
It is itself modular, having a core component that can be supplemented with a set
of modules to be used, extended, or ignored as required, as illustrated in Fig. 2.
For example, a morpho-syntactic module can be attached to the core, specifying
values for the words used in the term, such as gender (feminine, masculine,
neuter), number (singular, plural, dual), and case (nominative, accusative, etc.).
As this model in essence enables the creation of a lexicon for a given ontology, it
is called an ontology lexicon model. lemon as such does not provide an explicit
terminological level and refers directly from the lexical entry (in lemon a lexical
entry represents the whole content of a label) to an ontology element. In contrast,
LabelNet stresses the need for and the practicability of a terminological level; we
therefore re-use only the non-referential part of the lemon model.</p>
      <p><sup>5</sup> Tokens can be defined as all meaningful elements in a text that result
from the process of tokenization, i.e., breaking up text into words, phrases, symbols,
or other meaningful elements. The ordered collection box in Fig. 1 contains lists of
tokens as they appear in the terms used in the exemplified labels.</p>
      <p><sup>6</sup> A lemma is the canonical form of a lexeme, i.e., of the set of inflected
forms of a word. For example, accrue is the lemma of accrued, accruing, accrues, etc.</p>
      <p><sup>7</sup> http://linguistics.okfn.org/resources/llod/</p>
      <sec id="sec-4-1">
        <title>Lexicon Module</title>
        <p>While lemon offers a highly interesting perspective, we see
some shortcomings and possible improvements. A first issue is that lemon
supports the tokenization of terms included in labels, but not the establishment of
a relation between a token represented as a standalone piece of lexical information
and the terms in which it can occur. Consequently, we propose an extension
that allows a single lemma to carry the information that it is part of a
term, in the position specified by the tokenization process. Thereby, the word
"Verbindlichkeiten" (German for amounts payable or liabilities), for example,
will be linked to a (possibly) substantial number of terms used in various domains
(see Fig. 1). In doing so, we can generate a new kind of WordNet that takes into
account the inclusion of relevant words in a category of terms. Adopting the
idea of lemon, we model only lexical and linguistic information in this separate
module, linking to semantic values on the basis of the term ID, which itself links
to an ontology element.</p>
        <p>Furthermore, lemon entries allow only one semantic reference: the
lemon model represents the content of the labels of one ontology at a time.
Frequently, however, one and the same term is used in different (even related)
ontologies/taxonomies. In this case, two or more lemon entries would be required,
leading to redundant lexical/linguistic information that differs only in the entry
point to elements of different ontologies. One entry pointing to many ontologies
represents a more efficient approach and would also ease generalization over
the semantics of such terms.</p>
        <p>If different terms are used for concepts of different ontologies, but a
skos:exactMatch can be established between these concepts, lemon does not
provide the means to express the lexical semantic relationship between these
terms. As a result, SKOS has to be used to link those
concepts, thereby indirectly establishing the lexical semantic relationship, such
as synonymy, between the different terms.</p>
        <p>Apart from linking different entries or elements of individual modules, certain
constraints need to be reflected. For example, in German and English only the
plural of "Verbindlichkeit/liability" might be used within the context of financial
reporting. One possibility in lemon would be to select only terms in which the
word "Verbindlichkeit" appears in its plural form "Verbindlichkeiten". Another
possibility, which we prefer, is to associate a feature structure
with the lemma extracted from the tokens of the ontology labels, in
which additional linguistic information can be encoded. This keeps the basic
lexicon small, i.e., containing mainly lemmas, and uses well-defined feature
structures as labels for the edges going from a lemma to a more complex term
containing the lemma. We suggest expressing the constraints in SKOS,
linking a lemma to a term (see Fig. 1):
lemma:Verbindlichkeit -&gt; [plural, feminine, nominative case] -&gt; t1(T3)</p>
        <p>The above line expresses that only the plural and nominative form of
"Verbindlichkeit", which is feminine, can be used in combination with a term (at least
the term "T3") related to a business reporting ontology.</p>
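        <p>The feature-labelled edges between a lemma and the terms containing it can be
sketched in a few lines of Python. This is only an illustration under stated
assumptions: lemmas, terms, and feature structures are held in plain in-memory
containers, and the names LemmaEdge, link, and terms_for are hypothetical and belong
neither to lemon nor to SKOS. Token position 1 of term T3 follows the example line
above.</p>
        <preformat>
```python
from collections import defaultdict, namedtuple

# Hypothetical edge type: one occurrence of a lemma inside one term.
LemmaEdge = namedtuple("LemmaEdge", ["term_id", "position", "features"])

# Illustrative in-memory lexicon: lemma mapped to its edges into the terminology.
lexicon = defaultdict(list)


def link(lemma, term_id, position, features):
    """Record that `lemma` fills token `position` of `term_id` with `features`."""
    lexicon[lemma].append(LemmaEdge(term_id, position, frozenset(features)))


def terms_for(lemma, required=()):
    """Return the term IDs whose edge from `lemma` carries all `required` features."""
    wanted = frozenset(required)
    return sorted(e.term_id for e in lexicon[lemma] if wanted.issubset(e.features))


# The example from the text: Verbindlichkeit, [plural, feminine, nominative], t1(T3)
link("Verbindlichkeit", "T3", 1, ["plural", "feminine", "nominative"])

print(terms_for("Verbindlichkeit", ["plural"]))    # ['T3']
print(terms_for("Verbindlichkeit", ["singular"]))  # []
```
        </preformat>
        <p>Keeping the features on the edge rather than on the lemma leaves the lexicon
itself small, which is the design preference stated above.</p>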
      </sec>
      <sec id="sec-4-2">
        <title>Terminology in OWL-DL</title>
        <p>
          Terminologies as such consist of terms denominating concepts, their definitions,
and concept relations. In the case of SKOS, these elements are utilized for
building controlled vocabularies, whereas the TermBase eXchange (TBX) format of
ISO TC 37 can be described as a discourse-oriented terminology format [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. In controlled
vocabularies, terms have to be classified as preferred, with synonyms being mapped to
preferred terms for retrieval purposes. In discourse-oriented resources,
many synonyms are permitted, and the attribute "preferred" can be assigned
for prescriptive usage. Wright et al. [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] state that terminologies always relate
to special language, "designating multiple preferred terms subject to multiple
pragmatic constraints". Thus, controlled vocabularies differ from discourse-oriented
terminologies in that they represent varying conceptual information and semantics
with a focus on information
retrieval, whereas discourse-oriented terminological resources are more adequate
for the purpose at hand.
        </p>
        <p>
          In our model, the terminology is supposed to be reusable for other tasks
such as translation, ontology population, ontology building, and ontology evolution,
to name but a few. Instead of using status attributes such as preferred,
alternative, and hidden, TBX allows for the use of subset information such as project,
application, or customer to clarify the difference between synonyms [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ].
        </p>
        <p>Terminologies allow for greater multiplicity than rdfs:label values alone. Terms
and natural language information acquired for and during the process of ontology
building are often lost in the final representation because each label is required
to be univocal. Constructing a net of ontology labels and their synonyms acquired
during the building of ontologies and the extraction of information results in a
domain-specific, formalized, and reusable resource for ontologies.</p>
        <p>Another reason for transferring natural language information from the
ontology to terminologies is that a terminology can represent conceptual relations
that differ from ontological relations and can thus enhance the representation of
information with linguistic details. For example, a financial reporting ontology
classifies liquid assets as a sibling of key balance sheet figures, the latter
being the parent of assets. In contrast, hypernymic relations in the terminology
place assets at the top node, with liquid assets as one of its children.</p>
        <p>
          TBX is an XML-encoded markup language for the interchange of
terminological information. For reasons of cardinality and variation, its transformation
to RDF, i.e., SKOS, turns out to be difficult, as described in detail in [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. Instead
of mapping TBX to RDF (SKOS), a member of the OWL family of languages is more
adequate to the task. Cardinality values in OWL-Lite, however, are restricted to 0 and
1, which might constitute a problem for term entries with many values; this can be
solved with OWL-DL, which allows arbitrary values for cardinality. All core
elements of the terminology are children of the top node owl:datcat to signify
that all subclasses are data categories, and they are interlinked by means of
properties such as owl:unionOf and owl:equivalentClass. A detailed description of
rendering TBX in OWL-DL would go beyond the scope of this paper; a representation
of terminology in OWL-DL can be found in [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ].
        </p>
      </sec>
      <sec id="sec-4-3">
        <title>Step by Step to Modularized Ontologies</title>
        <p>Our architectural decisions and selections have been described above, but the
process of obtaining each resource and achieving modularization
has yet to be specified. The main input for building the initial ontology
is financial information, such as annual reports of companies, reporting
standards (e.g. IFRS, GAAP, XBRL), and stock exchange websites. We extract
details from these sources and build an initial ontology. Furthermore, the
extracted information represents the input for the terminology, in which all
synonyms are recorded. On the basis of the ontology and the terminology, the lexicon
is established. At the core of the following steps thus lies the formalization of the
extracted knowledge in a domain ontology representing our input.</p>
        <list list-type="order">
          <list-item><p>Extract labels/terms and analyze the terms linguistically (tokenization,
lemmatization, morphological analysis, tagging, parsing, etc.).</p></list-item>
          <list-item><p>Extract all lemmas and create a new lemma or map to an existing one in a
(multilingual) lexicon, in order to collect all lemmas that are used in all possible
labels of all possible ontologies.</p></list-item>
          <list-item><p>Encode lemmas in lemon. Add a data structure on top of each lemma that
lists all the tokens in all labels in which the lemma is reproduced. This
linking also reflects the morpho-syntactic features of the token according to
its analysis.</p></list-item>
          <list-item><p>Record all morpho-syntactic and lexico-syntactic information and patterns
in the corresponding addition to the linguistic module.</p></list-item>
          <list-item><p>Store all identical labels as a unique element in a terminology container.
Specify term entries as to their conceptual relations and establish proper
definitions or adapt definitions existing in the ontology.</p></list-item>
          <list-item><p>Associate each lemon-represented term with a data structure, i.e., the
terminology, that points to the various ontology elements in which the term
has been introduced.</p></list-item>
          <list-item><p>Eliminate all the labels and other linguistic information from the ontology,
flattening class entries to domain-specific details.</p></list-item>
        </list>
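        <p>The steps above can be sketched as a toy pipeline in Python. It is a minimal
illustration under loud assumptions, not the implementation used for LabelNet:
whitespace splitting stands in for tokenization, the LEMMAS table stands in for a
real lemmatizer, and the class IDs, the labels, and the function name modularize are
all hypothetical. The sketch covers the tokenization and lemmatization of labels, the
linking of lemmas to token positions, the storage of identical labels as one term,
and the pointers from a term to the ontology elements that use it.</p>
        <preformat>
```python
from collections import defaultdict

# Toy lemma table standing in for a real lemmatizer (assumption).
LEMMAS = {"assets": "asset", "liabilities": "liability"}


def modularize(labels_by_class):
    """Split a {class_id: label} mapping into a terminology and a lexicon.

    Returns (terminology, lexicon) where
      terminology maps term_id to {"label": str, "classes": [class_id, ...]}
      lexicon     maps lemma   to [(term_id, token_position), ...]
    """
    term_ids = {}          # label mapped to term_id: identical labels share one term
    terminology = {}
    lexicon = defaultdict(list)
    for class_id, label in sorted(labels_by_class.items()):
        if label not in term_ids:
            term_id = "T%d" % (len(term_ids) + 1)
            term_ids[label] = term_id
            terminology[term_id] = {"label": label, "classes": []}
            # Tokenize, lemmatize, and link each lemma to its token position.
            for position, token in enumerate(label.lower().split(), start=1):
                lemma = LEMMAS.get(token, token)
                lexicon[lemma].append((term_id, position))
        # The term points to every ontology element in which it is used.
        terminology[term_ids[label]]["classes"].append(class_id)
    return terminology, dict(lexicon)


# Hypothetical input: two ontologies sharing the label "Liquid assets".
labels = {
    "ont1:C1": "Liquid assets",
    "ont2:C9": "Liquid assets",
    "ont1:C2": "Assets",
}
terminology, lexicon = modularize(labels)
print(terminology["T1"]["classes"])  # ['ont1:C1', 'ont2:C9']
print(lexicon["asset"])              # [('T1', 2), ('T2', 1)]
```
        </preformat>
        <p>Once the terminology and the lexicon exist, the labels can be removed from the
ontologies, which then retain only their class IDs and structural properties.</p>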
        <p>As a result, we have two interlinked ontologies of lemmas and terms as used
in ontologies/taxonomies. Thereby, we obtain the subset of language data that
is used in domain ontologies. This subset can be used to analyze textual
documents and annotate them semantically, to populate ontologies, or to support
translation with semantics, to name but a few applications. In addition, we obtain
a means for testing ontology mapping or merging.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Linking all Modules</title>
      <p>The main linking device between ontologies is SKOS, as in the linking between
the financial reporting ontology and the time ontology in the example provided
in Fig. 1. Especially with multilingual ontologies, the individual concepts and
their matching by means of SKOS are important. Oftentimes, the pivotal role of
English as a source language leads to translations of labels instead of proper
localizations. In the case of financial reporting standards, it is indispensable to
take local legal and political regulations affecting the standard into consideration,
as the Belgian reporting standard in French might differ substantially from the
reporting standard used in France, especially in the use and interpretation of
the French terms applied.</p>
      <p>By conceptualizing the knowledge in each language individually, the ontology
is actually created in each language and not simply translated. Thereby, we are in
a position to link, for example, the English concept
pfs_AmountsPayableMoreOneYear to the corresponding Italian concept
itcc-ci_DebitiEsigibiliOltreEsercizioSuccessivo by employing skos:exactMatch, which
implicitly links the term "Debiti Esigibili Oltre l'Esercizio Successivo" to the
English term. For existing monolingual ontologies, this proposal might serve as a
method for merging them by establishing links.</p>
      <p>The domain ontology represents the starting point for the linking and contains
the initial SKOS links to the terminology, since the terminology can be treated
as an ontology represented in OWL-DL. From the terminology, references to the
lexicon holding all individual lemmas can be established. At the same time, the
terminology represents the interface to lexico- and morpho-syntactic patterns,
to syntactic information as such, and to all tokens resulting from the process
of tokenization.</p>
      <p>One part of the linking process is the representation of lexico- and
morpho-syntactic patterns and information to support the evolution and extension of
existing domain ontologies. Thereby, the construction of new labels on the basis
of the structure of existing labels is greatly facilitated.</p>
      <p>
        Syntactic information is represented by combining tokens and dependency
information of individual terms. Syntactic categories are determined
on the basis of part-of-speech tagging, and phrasal categories are used as syntactic
labels. For example, N-NP = (length=1, token[1]=N, head=token[1]) represents
the term "Verbindlichkeiten", which has the syntactic category "Noun" and the
phrasal category "Noun Phrase" with a length of one and token 1 as its head. For the
purpose of standardization, these categories are mainly taken from the ISOcat
database<sup>8</sup>.
      </p>
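      <p>A pattern such as N-NP can be derived mechanically from a tagged term. The
following Python sketch is illustrative only: the function name phrase_pattern is
hypothetical, and the naming scheme (the tag sequence followed by the head tag plus
"P") is an assumption generalized from the single N-NP example, not a rule stated
by ISOcat or by the model.</p>
      <preformat>
```python
def phrase_pattern(tagged_tokens, head_index):
    """Build a pattern like the N-NP example from (token, pos_tag) pairs.

    `head_index` is 1-based, matching the token[1] notation used in the text.
    """
    pattern = {"length": len(tagged_tokens), "head": "token[%d]" % head_index}
    for i, (_token, tag) in enumerate(tagged_tokens, start=1):
        pattern["token[%d]" % i] = tag
    # Assumed naming scheme: tag sequence, then the phrasal category of the head.
    head_tag = tagged_tokens[head_index - 1][1]
    name = "-".join(tag for _token, tag in tagged_tokens) + "-" + head_tag + "P"
    return name, pattern


name, pattern = phrase_pattern([("Verbindlichkeiten", "N")], head_index=1)
print(name)     # N-NP
print(pattern)  # {'length': 1, 'head': 'token[1]', 'token[1]': 'N'}
```
      </preformat>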
      <p>The representation of lexico-syntactic patterns, such as lexico-syntactic
ontology design patterns<sup>9</sup> and the well-known Hearst patterns, is essential
especially for information extraction in combination with ontology evolution. One
example of their use is the recognition of relations among entities during
information extraction. The following sentence has been taken from the
International Financial Reporting Standards (IFRS): "The statement of financial
position (sometimes called the balance sheet) includes an entity's assets, liabilities
and equity as of the end of the reporting period"<sup>10</sup>. The lexico-syntactic
equivalence pattern &lt;NP class&gt; call in passive &lt;NP class&gt; between "statement of
financial position" and "balance sheet" enables us to recognize that both terms
point to the same ontology concept as synonyms; a description of their difference
is nevertheless included in the definition in the terminology. The Hearst pattern
[NP0] [VBG include] [NP1] [NP2]... indicates that "assets, liabilities and
equity" can be modeled as subClassOf "statement of financial position".</p>
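      <p>How such patterns fire on the IFRS sentence can be approximated with regular
expressions in Python. This is a deliberately naive sketch: it matches raw strings,
whereas the patterns described above operate on noun-phrase chunks produced by a
parser. The two expressions and the function names synonyms and included_items are
assumptions made for illustration.</p>
      <preformat>
```python
import re

SENTENCE = (
    "The statement of financial position (sometimes called the balance "
    "sheet) includes an entity's assets, liabilities and equity as of "
    "the end of the reporting period"
)

# "the X (sometimes called the Y)" is read as a synonymy cue.
CALLED = re.compile(r"[Tt]he ([a-z ]+?) \((?:sometimes )?called (?:the )?([a-z ]+?)\)")
# "includes A, B and C" is read as an enumeration cue; it stops at " as of".
INCLUDES = re.compile(r"\bincludes? (.+?)(?: as of\b|$)")


def synonyms(sentence):
    """Return the pair of terms joined by the 'called' pattern, or None."""
    match = CALLED.search(sentence)
    return (match.group(1), match.group(2)) if match else None


def included_items(sentence):
    """Return the items enumerated after 'include(s)', split on commas and 'and'."""
    match = INCLUDES.search(sentence)
    if not match:
        return []
    parts = re.split(r",| and ", match.group(1))
    return [part.strip() for part in parts if part.strip()]


print(synonyms(SENTENCE))
# ('statement of financial position', 'balance sheet')
print(included_items(SENTENCE))
# ["an entity's assets", 'liabilities', 'equity']
```
      </preformat>
      <p>The first result suggests that two terms point to the same concept, while the
second yields candidates for the relation indicated by the Hearst pattern. Determiners
and possessives such as "an entity's" would still have to be stripped by proper
noun-phrase analysis.</p>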
    </sec>
    <sec id="sec-6">
      <title>Conclusion and Future Directions</title>
      <p>Modular and encapsulated domain, linguistic, and lexical functions for
knowledge modeling support several IS-related as well as Natural
Language Processing (NLP)-driven tasks. Each modularized resource, i.e., ontology,
terminology, or lexical information, can be used either as part of the interlinked
model we presented or as an individual resource for other purposes. One aspect for
further improvement is certainly the linking device between the modules, which
could be optimized towards enhanced interoperability with other systems and
among the resources themselves.</p>
      <p>Acknowledgements. The DFKI part of this work has been supported by the
Monnet project (Multilingual ONtologies for NETworked knowledge), co-funded
by the European Commission with Grant No. 248458, and by the TrendMiner
project, co-funded by the European Commission with Grant No. 287863.</p>
      <p>8 http://www.isocat.org/</p>
      <p>9 http://ontologydesignpatterns.org/wiki/Submissions:LexicoSyntacticODPs</p>
      <p>10 http://www.ifrs.org/Home.htm</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name><surname>Aggarwal</surname>, <given-names>N.</given-names></string-name>,
          <string-name><surname>Wunner</surname>, <given-names>T.</given-names></string-name>,
          <string-name><surname>Arcan</surname>, <given-names>M.</given-names></string-name>,
          <string-name><surname>Buitelaar</surname>, <given-names>P.</given-names></string-name>,
          <string-name><surname>O'Riain</surname>, <given-names>S.</given-names></string-name>:
          <article-title>A Similarity Measure based on Semantic, Terminological and Linguistic Information</article-title>.
          In:
          <string-name><surname>Shvaiko</surname>, <given-names>P.</given-names></string-name>,
          <string-name><surname>Euzenat</surname>, <given-names>J.</given-names></string-name>,
          <string-name><surname>Heath</surname>, <given-names>T.</given-names></string-name>,
          <string-name><surname>Quix</surname>, <given-names>C.</given-names></string-name>,
          <string-name><surname>Mao</surname>, <given-names>M.</given-names></string-name>,
          <string-name><surname>Cruz</surname>, <given-names>I.F.</given-names></string-name>
          (eds.)
          <source>Proceedings of the 6th International Workshop on Ontology Matching</source>.
          Bonn, Germany (<year>2011</year>)
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name><surname>Declerck</surname>, <given-names>T.</given-names></string-name>,
          <string-name><surname>Lendvai</surname>, <given-names>P.</given-names></string-name>:
          <article-title>Towards a standardized linguistic annotation of the textual content of labels in knowledge representation systems</article-title>.
          In:
          <source>Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC '10)</source>,
          pp. <fpage>3836</fpage>–<lpage>3839</lpage>,
          <publisher-name>ELRA</publisher-name>, Valletta, Malta (<year>2010</year>)
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name><surname>Roche</surname>, <given-names>C.</given-names></string-name>,
          <string-name><surname>Calberg-Challot</surname>, <given-names>M.</given-names></string-name>,
          <string-name><surname>Damas</surname>, <given-names>L.</given-names></string-name>,
          <string-name><surname>Rouard</surname>, <given-names>P.</given-names></string-name>:
          <article-title>Ontoterminology: A new paradigm for terminology</article-title>.
          In:
          <string-name><surname>Dietz</surname>, <given-names>J.L.G.</given-names></string-name>
          (ed.)
          <source>International Conference on Knowledge Engineering and Ontology Development</source>,
          pp. <fpage>321</fpage>–<lpage>326</lpage>,
          Funchal - Madeira, Portugal (<year>2009</year>)
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name><surname>McCrae</surname>, <given-names>J.</given-names></string-name>,
          <string-name><surname>Spohr</surname>, <given-names>D.</given-names></string-name>,
          <string-name><surname>Cimiano</surname>, <given-names>P.</given-names></string-name>:
          <article-title>Linking Lexical Resources and Ontologies on the Semantic Web with Lemon</article-title>.
          In:
          <source>The Semantic Web: Research and Applications</source>.
          LNCS, vol. <volume>6643</volume>,
          pp. <fpage>245</fpage>–<lpage>259</lpage>.
          <publisher-name>Springer</publisher-name>, Berlin, Germany (<year>2011</year>)
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name><surname>Aussenac-Gilles</surname>, <given-names>N.</given-names></string-name>,
          <string-name><surname>Szulman</surname>, <given-names>S.</given-names></string-name>,
          <string-name><surname>Despres</surname>, <given-names>S.</given-names></string-name>:
          <article-title>The Terminae Method and Platform for Ontology Engineering from Texts</article-title>.
          In:
          <source>Proceedings of the 2008 conference on Ontology Learning and Population: Bridging the Gap between Text and Knowledge</source>,
          pp. <fpage>199</fpage>–<lpage>223</lpage>.
          <publisher-name>IOS Press</publisher-name> (<year>2008</year>)
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name><surname>Reymonet</surname>, <given-names>A.</given-names></string-name>,
          <string-name><surname>Thomas</surname>, <given-names>J.</given-names></string-name>,
          <string-name><surname>Aussenac-Gilles</surname>, <given-names>N.</given-names></string-name>:
          <article-title>Modelling ontological and terminological resources in OWL-DL</article-title>.
          In:
          <string-name><surname>Buitelaar</surname>, <given-names>P.</given-names></string-name>,
          <string-name><surname>Choi</surname>, <given-names>K.S.</given-names></string-name>,
          <string-name><surname>Gangemi</surname>, <given-names>A.</given-names></string-name>,
          <string-name><surname>Huang</surname>, <given-names>C.R.</given-names></string-name>
          (eds.)
          <source>OntoLex 2007, ISWC Workshop</source>.
          Busan, South Korea (<year>2007</year>)
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name><surname>Bodenreider</surname>, <given-names>O.</given-names></string-name>:
          <article-title>Lexical, terminological and ontological resources for biological text mining</article-title>.
          In:
          <string-name><surname>Ananiadou</surname>, <given-names>S.</given-names></string-name>,
          <string-name><surname>McNaught</surname>, <given-names>J.</given-names></string-name>
          (eds.)
          <source>Text mining for biology and biomedicine</source>,
          pp. <fpage>43</fpage>–<lpage>66</lpage>,
          <publisher-name>Artech House</publisher-name>, London, England (<year>2006</year>)
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name><surname>Wright</surname>, <given-names>S.E.</given-names></string-name>,
          <string-name><surname>Summers</surname>, <given-names>D.</given-names></string-name>:
          <article-title>Crosswalking from Terminology to Terminology: Leveraging Semantic Information across Communities of Practice</article-title>.
          In:
          <string-name><surname>Witt</surname>, <given-names>A.</given-names></string-name>,
          <string-name><surname>Sasaki</surname>, <given-names>F.</given-names></string-name>,
          <string-name><surname>Teich</surname>, <given-names>E.</given-names></string-name>,
          <string-name><surname>Calzolari</surname>, <given-names>N.</given-names></string-name>,
          <string-name><surname>Wittenburg</surname>, <given-names>P.</given-names></string-name>
          (eds.)
          <source>Uses and usage of language resource-related standards</source>,
          LREC, Marrakech, Morocco (<year>2008</year>)
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name><surname>Reymonet</surname>, <given-names>A.</given-names></string-name>,
          <string-name><surname>Thomas</surname>, <given-names>J.</given-names></string-name>,
          <string-name><surname>Aussenac-Gilles</surname>, <given-names>N.</given-names></string-name>:
          <article-title>Ontology based information retrieval: an application to automotive diagnosis</article-title>.
          In:
          <string-name><surname>Nyberg</surname>, <given-names>M.</given-names></string-name>,
          <string-name><surname>Frisk</surname>, <given-names>E.</given-names></string-name>,
          <string-name><surname>Krisander</surname>, <given-names>M.</given-names></string-name>,
          <string-name><surname>Aslund</surname>, <given-names>J.</given-names></string-name>
          (eds.)
          <source>International Workshop on Principles of Diagnosis</source>,
          pp. <fpage>9</fpage>–<lpage>14</lpage>,
          Stockholm, Sweden (<year>2009</year>)
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name><surname>Buitelaar</surname>, <given-names>P.</given-names></string-name>,
          <string-name><surname>Cimiano</surname>, <given-names>P.</given-names></string-name>,
          <string-name><surname>Haase</surname>, <given-names>P.</given-names></string-name>,
          <string-name><surname>Sintek</surname>, <given-names>M.</given-names></string-name>:
          <article-title>Towards linguistically grounded ontologies</article-title>.
          In:
          <string-name><surname>Aroyo</surname>, <given-names>L.</given-names></string-name>,
          <string-name><surname>Traverso</surname>, <given-names>P.</given-names></string-name>,
          <string-name><surname>Ciravegna</surname>, <given-names>F.</given-names></string-name>,
          <string-name><surname>Cimiano</surname>, <given-names>P.</given-names></string-name>,
          <string-name><surname>Heath</surname>, <given-names>T.</given-names></string-name>,
          <string-name><surname>Hyvonen</surname>, <given-names>E.</given-names></string-name>,
          <string-name><surname>Mizoguchi</surname>, <given-names>R.</given-names></string-name>,
          <string-name><surname>Oren</surname>, <given-names>E.</given-names></string-name>,
          <string-name><surname>Sabou</surname>, <given-names>M.</given-names></string-name>,
          <string-name><surname>Bontas Simperl</surname>, <given-names>E.P.</given-names></string-name>
          (eds.)
          <source>ESWC 2009</source>,
          pp. <fpage>111</fpage>–<lpage>125</lpage>,
          <publisher-name>Springer</publisher-name> Berlin/Heidelberg, Heraklion, Crete, Greece (<year>2009</year>)
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>