<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Pierluigi D'Amadio, Paola Velardi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dipartimento di Informatica</institution>
          ,
          <addr-line>via Salaria 113, Roma, Italy; {velardi,damadio}@di.uniroma1.it</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper describes a methodology to detect the emergence (or the disappearance) of concepts through the observation of natural language communications (NLC). NLC are the documents, e-mails, and written communications of any kind that the members of a web community produce, access, and exchange for their purposes. The emergence of a new concept is suggested by the repetitive and consistent use of certain terms, while its intended meaning and appropriate conceptualization are obtained through a combination of text mining and algebraic methods.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Building a glossary of terms is often the first step to model emerging knowledge
domains and to favor interoperability between widely distributed communities of
interest, who upload, exchange and share relevant information through the web. Modeling
web communities in the IT society is significant for several reasons (Flake et al.
2002), spanning from socio-cultural aims, like the discovery of interdisciplinary
connections, to more practical applications, like the development of focused search
engines, information filtering and information integration tools.</p>
      <p>However, glossaries capture a static portion of a reality that can instead be highly
dynamic, especially when modeling emerging domains. They are conceived and built
as an “a priori” agreement on common terms, a “frozen” picture of the knowledge and
competences of a community, which might suffer from a shortage of up-to-date
descriptions (Staab, 2002) (Heflin and Hendler 2000). On the other hand, glossary building is
a time-consuming task, involving human effort to identify the relevant terms, agree on
their meaning, and (in thesauri) structure terms according to some taxonomic
ordering. In other words, glossary creation is a consensus-building process, often painful
and tedious. There is an inherent risk in re-opening the process again and again.</p>
      <p>The idea that we propose in this paper is that glossaries should be, as much as
possible, self-evolving, continuously capturing the emergence of new concepts in dynamic
web communities. The key to obtaining this is to simulate the process of consensus
building in humans, through constant monitoring of natural language
communications (NLC). NLC are the documents, e-mails, and written communications of any kind
that the members of a web community produce, access, and exchange for their
purposes. The emergence of a new concept is suggested by the repetitive and consistent use
of certain terms in NLC. The simulation of consensus can be achieved through
statistical indicators, aimed at selecting terms with certain distributional properties across
the set of observed NLC.</p>
      <p>This paper describes a methodology aimed at implementing the view of a
self-evolving Glossary, detecting the emergence (or the disappearance) of concepts
through the observation of natural language communications. Experiments have been
made in several domains (art, tourism, web-learning, economy and finance), but in this
paper we concentrate on an experiment related to the modeling of a web community
organized through a Network of Excellence, INTEROP1, on enterprise
interoperability. Partners in INTEROP are academic and industrial institutions belonging to
different research areas, grouped in three domains of expertise: Ontology, Enterprise
Modeling, and Architecture and Platforms. One of the main objectives of INTEROP is to
model partners' competences in a Knowledge Map, indexed through a structured
taxonomy of interoperability concepts. The KMap2 aims at drawing a picture of the
status of research in interoperability and at keeping this picture up-to-date in the future. This
provided us with an ideal test-bed for our methodology.</p>
      <p>Collecting Evidence
The first step of the procedure is to collect a large number of documents in written
form, which should represent at best what is communicated and exchanged among the
members of a community. This is a partly manual, partly automated step, and its
complexity and the effort involved strongly depend upon the community under consideration.
For the purpose of the self-evolving Glossary, documents must be stored with
attached information about the source, authority and date of the acquired document. We
have not developed a specific document warehouse architecture, since this depends
upon the community's document collection strategy and organization methods. In
INTEROP, a collaborative platform in Zope/Plone has been adopted by the network
partners (accessible from the INTEROP web site), which is also used to store
documents and related metadata.</p>
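The storage requirement above (source, authority and date attached to each document) can be sketched as a minimal record type; the field names are illustrative, not the schema actually used on the INTEROP platform:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class StoredDocument:
    """A collected NLC document with the provenance metadata
    the self-evolving Glossary requires (illustrative fields)."""
    text: str        # plain-text content extracted from html/doc/pdf
    source: str      # where the document was acquired, e.g. a URL
    authority: str   # author or contributing partner institution
    acquired: date   # date the document entered the warehouse

doc = StoredDocument("Schema integration is the process by which ...",
                     "http://interop-noe.org/", "Partner A", date(2005, 3, 1))
```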
    </sec>
    <sec id="sec-2">
      <title>1 http://interop-noe.org/</title>
      <p>2 details on the K-map can be found on the INTEROP web platform</p>
      <p>Extraction of a Domain Lexicon
A domain lexicon L is a list of terms t commonly used within a given community of
interest. The purpose of this phase is to automatically extract simple and multi-word
expressions from the documentation collected in phase 1. Terminological candidates
are multi-word strings with a precise syntactic structure (e.g. compounds,
adjective+compound, etc.) and certain distributional properties across the domain documents.
Examples in various fields are the following: in enterprise interoperability: enterprise</p>
      <sec id="sec-2-1">
        <title>intra-organizational integration; in tourism: gourmet restaurant; in computer ne</title>
        <p>tworks: packet switching protocol; in art techniques: chiaroscuro. Statistical and
natural language processing (NLP) tools are used for the automatic extraction of terms (details
are in (Navigli and Velardi, 2004)).</p>
        <p>Statistical techniques are specifically aimed at simulating human consensus
in accepting new domain terms. Only terms uniquely and consistently3 found in
domain-related documents, and not found in other domains used for contrast, are
selected as candidates for the domain lexicon.</p>
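Footnote 3's domain consensus can be illustrated as an entropy over a term's frequency distribution across the domain documents: terms used consistently in many documents score high, terms confined to a few score low. This is a sketch of the idea, not the exact formulation, which is given in (Navigli and Velardi, 2004):

```python
import math

def domain_consensus(doc_freqs):
    """Entropy of a term's frequency distribution across domain documents
    (a sketch of the entropy-based domain consensus measure).
    doc_freqs[i] is the term's frequency in document i."""
    total = sum(doc_freqs)
    probs = [f / total for f in doc_freqs if f]
    return -sum(p * math.log2(p) for p in probs)

print(domain_consensus([5, 5, 5, 5]))  # uniform use across 4 docs -> 2.0
print(domain_consensus([20]))          # confined to one doc -> 0.0
```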
        <p>Extraction of Definitions
Once an initial lexicon is extracted, the subsequent phase is to obtain a list of (one or
more) definitions for each term.</p>
        <p>Extraction of definitions, as well as the subsequent step, which is glossary parsing,
relies on a model of well-formed “definitory” sentences, which we describe through a set
of regular expressions. Regular expressions, discussed later in a dedicated section,
have several purposes:
• To select definitory sentences from non-definitory ones. For example, many
definitory sentences follow the pattern “t is a Y”, but using this pattern alone
causes the extraction of a huge amount of non-definitory sentences, for
example: “Knowledge management is a contradiction in terms, being a hangover
from an industrial era when control modes of thinking”. Regular
expressions, along with statistical indicators, are used to prune this noise.
• To prefer definitory sentences with a precise structure often used by
professional lexicographers, i.e. one that describes the meaning of a term by
means of its kind (the so-called genus or hypernym4) followed by a
modifier (what differentiates the concept from its kind, the differentia). For
example: “Knowledge management is the systematic management of vital
knowledge and its associated processes of creating, gathering, organizing,
diffusion”, where the kind is “systematic management”. A non-well
3 Consistency of use across documents is measured through an entropy-based measure called
domain consensus.
4 In this paper kind_of, genus and hypernym will be used interchangeably to indicate the
category to which a concept belongs.</p>
        <p>formed definition, where no kind is provided, is: “The core issue of kno</p>
      </sec>
      <sec id="sec-2-2">
        <title>wledge management is to place knowledge under management remit to</title>
        <p>get value from it”.
• To parse definitory sentences in order to extract the kind information, and
possibly more.</p>
        <p>Extracting Definitions from Glossaries
Google recently provided a new search feature, called “define:”, which can be used to
search for definitions of terms in web glossaries. However, using this search facility in an
unconstrained way may cause the retrieval of a large number of often noisy (not
pertinent to the domain) definitions. We defined the following algorithm to select pertinent
definitions:</p>
        <p>1) From the set of word components forming the extracted lexicon L of a domain
D, learn a probabilistic model of the domain, i.e. assign a probability of occurrence to
each word component. More precisely, let L be the lexicon of extracted terms and LT the
set of word components appearing in L, and let</p>
        <p>E(P(w)) = freq(w) / Σ_{w'∈LT} freq(w')
be the estimated probability of w in D, where w ∈ LT and the frequencies are
computed in L. For example, if L = [distributed system, integration, integration method]
then LT = [distributed, system, integration, method] and E(P(integration)) = 2/5.
2) Search the terms in L using the Google “define:” feature. Select only those
definitions def(t), t ∈ L, with the following features:</p>
        <p>a) Domain pertinence: let Wt be the set of words in def(t), and let W’t ⊆ Wt be
the subset of words in def(t) belonging to LT. Compute:</p>
        <sec id="sec-2-2-1">
          <title>weight(def(t)) =</title>
          <p>Σ_{w ∈ W’t} E(P(w)) · log(Nt / ntw)
where Nt is the number of
definitions extracted for the term t, and ntw is the number of such definitions
including the word w. The log factor, called inverse document frequency in the information
retrieval literature, reduces the weight of words that have a very high probability of
occurrence in any definition (e.g. system).</p>
          <p>Definitions are ordered according to their weight. The first k definitions are selected,
according to a threshold on weight(def(t)) computed for each t5.</p>
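Steps 1 and 2a above can be sketched as follows, assuming whitespace tokenization and plain dictionaries for the counts (the function names are ours, not the paper's):

```python
import math
from collections import Counter

def word_probabilities(lexicon):
    """E(P(w)): relative frequency of the word component w over all
    word components of the terms in the lexicon L."""
    counts = Counter(w for term in lexicon for w in term.split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def definition_weight(definition, lt_probs, n_defs, n_defs_with_word):
    """weight(def(t)): sum of E(P(w)) * log(Nt / n_tw) over the words w
    of the candidate definition that belong to LT; n_defs is Nt and
    n_defs_with_word[w] is n_tw."""
    score = 0.0
    for w in set(definition.lower().split()):
        if w in lt_probs and n_defs_with_word.get(w, 0) > 0:
            score += lt_probs[w] * math.log(n_defs / n_defs_with_word[w])
    return score

# The paper's example lexicon: E(P(integration)) = 2/5
L = ["distributed system", "integration", "integration method"]
probs = word_probabilities(L)
print(probs["integration"])  # -> 0.4
```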
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>5 We omit the details for sake of brevity</title>
      <p>b) Well-formedness: apply a final filter to select those def(t) matching the “genus
et differentia” style, expressed through a set of regular expressions described in detail in
section 2.3.</p>
      <p>To compute the performance of this method in the worst ambiguity conditions, we
selected 10 very ambiguous single-word terms in the INTEROP single-word lexicon
LT (which includes over 1000 words). Three evaluators marked the relevant and
non-relevant definitions (with respect to the domain, i.e. enterprise interoperability). The inter-annotator
agreement was 84%, since the task is inherently complex and subjective. We
considered only the definitions marked in the same way by at least two annotators.</p>
      <p>Table. Evaluation of the definition selection algorithm.</p>
      <p>Extracting Definitions from NLC
As remarked in the introduction, the Dynamic Glossary needs continuous updates, as
new terms and new fields emerge and are accepted within communities of interest.
Definitions of new terms in well-established communities, and of the new terminology of an
emerging community, are not found in glossaries, simply because of their novelty. But
it is often the case that the inventors of these terms, or their initial users, provide a
definition in their communications to the reference community. For example, the term
“federated ontology” appeared only in 2001 in the scientific literature (Stumme and
Maedche 2001), but the first explicit definition is in a paper6 dated 2004, which rephrases
the concept of federated ontology proposed in a less explicit way in (Stumme and
Maedche 2001): “Federated ontologies are distributed, connected ontologies somewhat</p>
      <sec id="sec-3-1">
        <title>analogous to federated databases”.</title>
        <p>Identifying definitions in texts is much more complicated than choosing “good”
definitions in glossaries. Definitions are buried in texts, and they cannot be recognized
by means of simple regular expressions like “X is a Y”, since, as remarked at the
beginning of this section, these would produce an unacceptable amount of noise. We
devised the following procedure:</p>
        <p>Let L’ be the list of terms in L for which no definition was found in the previous
glossary search. For each t in L’, do the following:
1) Extract a set of sentences including t, from the community-provided documents first, and from the web
afterwards (only in case of unsuccessful search). This
implies some amount of pre-processing, like the treatment of various formats, like
html, doc and pdf. In case of web search, it is also necessary to handle the
limitations imposed by most search engines on multiple queries.</p>
        <p>A first filtering is applied, using regular expressions that match patterns like “t</p>
      </sec>
      <sec id="sec-3-2">
        <title>is”, “t defines”, “t refers”, etc.</title>
        <p>2) A second filter selects sentences which include, besides t, some of the words in
LT (the set of word components appearing in L). The same probabilistic filter
as in step 2a) of the previous section is applied, with a small variation:</p>
        <p>weight(def(t)) = Σ_{w ∈ Wt, w ∈ LT} E(P(w)) · log(Nt / ntw) + α · Σ_{w ∈ t, w ∈ LT} E(P(w))</p>
        <p>The additional sum in this formula assigns a higher weight to those sentences
including some of the components of the term t to be defined, e.g. “Schema
integration is [the process by which schemata from heterogeneous databases are
conceptually integrated into a single cohesive schema.]”
3) Finally, the well-formedness criterion of section 2b above is applied.</p>
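The modified weight, with its additional term-component sum, can be sketched as below. The mixing factor alpha and its value are illustrative, since the paper does not report the one actually used:

```python
import math

ALPHA = 0.5  # weight of the term-component bonus (illustrative value)

def nlc_weight(sentence, term, lt_probs, n_defs, n_defs_with_word):
    """Glossary-style score plus an extra sum, scaled by ALPHA, over the
    components of the term t itself that occur in the sentence -- favouring
    sentences such as 'Schema integration is [the process ...]'."""
    words = set(sentence.lower().split())
    base = sum(lt_probs[w] * math.log(n_defs / n_defs_with_word[w])
               for w in words
               if w in lt_probs and n_defs_with_word.get(w, 0) > 0)
    bonus = sum(lt_probs.get(w, 0.0) for w in term.lower().split() if w in words)
    return base + ALPHA * bonus
```

A sentence that repeats the components of t (here, both "schema" and "integration") collects the full bonus, which pushes it above generic sentences of similar domain pertinence.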
        <p>Terms are again selected according to a varying threshold, but, in this case, the
threshold must be tuned for high recall rather than high precision. In fact, for some
terms there might be very few definitions in the literature, and it is important to capture
the majority of them.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>6 http://www.meteck.org/AspectsOntologyIntegration.pdf</title>
      <p>This section adds further details on the definition and use of regular expressions. We
use regular expressions8 to select well-formed sentences and to extract kind-of
relations from natural language definitions. The components of a regular expression are
fixed words or word sequences, parts of speech, and syntactic chunks.</p>
      <p>At first, sentence chunks (e.g. noun phrases NP, prepositional phrases PP, etc.) are
identified using an available syntactic parser, the TreeTagger9. For example, the
following regular expression is used to verify the well-formedness criterion:
7 In INTEROP an initial glossary relative to educational objectives has been acquired and evaluated. The
interested reader may access deliverable 10.1 on the web site to learn the details of this process. A
second, large-scale (1800 terms) interoperability glossary has been acquired and will be fully evaluated by
the end of year 2 of the project.
8 http://www.oreilly.com/catalog/regex/chapter/ch04.html
9 TreeTagger is available at</p>
      <p>r1 = "^(PP)?(NP)+"</p>
      <p>This regular expression (see subsequent examples) prescribes a sentence structure
at the chunk level: a definitory sentence is formed by an optional prepositional phrase
((PP)?) followed by one or more noun phrases ((NP)+), the first being the main noun phrase.</p>
      <p>When a sentence matches the well formedness and probabilistic criteria described
in previous section, other regular expressions are applied to extract additional
information.</p>
      <p>For example, the following regular expression at the word level is applied (with
others) to the main NP to separate candidate definitions from non-definitions in step 1
of section 2.3.2:</p>
      <p>r2 = "^(Refers|Referring)\sto\s(((a|the)\s)?(type|kind)\sof\s)?(.*)"
If a sentence
is selected as being a definition, additional regular expressions are used to extract
from the main NP the kind_of (hypernym) information.</p>
      <p>For example, consider the regular expression
r3 = "^(A|D)?((V|C|,|J|N|R)*)(N)".</p>
      <p>Symbols in this expression are part-of-speech tags (POS), e.g. article (A), verb (V), adjective
(J), etc.</p>
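The word-level patterns, as reconstructed above from this partly corrupted copy, can be exercised directly with Python's re module (TreeTagger chunking is omitted; this only shows the regex mechanics):

```python
import re

# Reconstruction of the word-level filter: accepts
# "Refers/Referring to [a/the type/kind of] X" and captures X.
r2 = re.compile(r"^(Refers|Referring)\sto\s(((a|the)\s)?(type|kind)\sof\s)?(.*)")

m = r2.match("Refers to a kind of distributed connected ontology")
print(m.group(6))  # -> distributed connected ontology

# POS-level pattern over a string of one-letter tags: an optional article,
# any run of verb/conjunction/comma/adjective/noun/adverb tags, then the
# noun (N) whose position marks the hypernym head.
r3 = re.compile(r"^(A|D)?((V|C|,|J|N|R)*)(N)")
m3 = r3.match("AJN")   # article + adjective + noun
print(m3.start(4))     # -> 2: the third tag is the hypernym head
```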
      <p>A sentence matching both the chunk-level and the POS-level expressions is:</p>
      <sec id="sec-4-1">
        <title>domain model: “In the traditional software engineering perspective, a precise rep</title>
      </sec>
      <sec id="sec-4-2">
        <title>resentation of specification and implementation concepts that define a class of exist</title>
      </sec>
      <sec id="sec-4-3">
        <title>ing systems.”</title>
        <p>When parsing with the TreeTagger we obtain:
Syntactic chunks: (PP NP PP CNP RVP NP PP)
POS: (PAJNNN AJN PNCNNWVANPJN)
Applying the POS-level expression returns:
hypernym: representation
The bold POS tag (N) marks the fragment selected as the hypernym.
We then learn that:
domain model →kind_of representation
Appendix I highlights in bold the hypernym extracted from selected definitions.
Table 3 shows the performances in three domains.</p>
        <p>Table 3. Precision and recall of the hypernymy extraction task in three domains.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Precision and Recall</title>
      <p>TreeTagger is available at
http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/DecisionTreeTagger.html. We
augmented TreeTagger with regular expressions that capture named entities of locations,
organizations, products, persons, and time expressions. This allows us to capture other relations besides
hypernymy, but this research is still in progress.</p>
      <p>Creation of a Taxonomy
Parsing definitions makes it possible to structure the terms in T in taxonomic order. However,
ordering terms according to the hypernyms extracted from definitions has well-known
drawbacks. An interesting paper (Ide and Véronis, 1993) provides an analysis of
typical problems found when attempting to extract (manually or automatically) hypernymy
relations from natural language definitions, e.g. attachments too high in the hierarchy,
unclear choices for more general terms, or-conjoined hypernyms, absence of a
hypernym, circularity, etc. These problems are more or less evident, especially
over-generality, when analysing the forest of term trees generated on the basis of glossary
parsing.</p>
      <p>To reduce these problems, we proceeded as follows:
1) First, we arrange the terms in T taxonomically according to simple string
inclusion. String inclusion is a very reliable indicator of a taxonomic relation, though
it does not capture all possible relations. This step produces a forest of
sub-trees.
2) Then, we use hypernymy information extracted from definitions to capture
additional taxonomic relations between terms at the same level of generality (e.g.
in the example above: representation, model, schema, ontology, knowledge,</p>
      <sec id="sec-5-1">
        <title>data, information).</title>
        <p>3) If terms have more than one selected definition, or have or-conjoined heads in
the main NP, more than one hypernym is extracted by the algorithm of section
2.3. However, we select only hypernyms belonging to the set of domain-relevant
words LT. Hence, for example, knowledge has the following hypernyms:
information, fact, relationship and meaning. Only the first is selected.
4) After step 3, component terms of the sub-trees STi have one or more hypernyms
associated. Given a term t: tltr (where tl and tr are the left and right components of t,
e.g. t = enterprise application integration, tl = enterprise application, tr
= integration), we verify whether there is a multi-word term t’: t’lt’r in the
taxonomy such that tr = t’r and either tl →kind_of t’l or t’l →kind_of tl (e.g.
if t = service</p>
      </sec>
      <sec id="sec-5-2">
        <title>integration and t’ = application integration, it holds that</title>
        <p>VHUYLFH NLQGB RI o DSSOLFDWLRQ ,
and
therefore</p>
        <p>VHUYLFH _ int HJUDWLRQ  _o DSSOLFDWLRQ _ int HJUDWLRQ ).</p>
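Steps 1 and 4 above can be sketched as follows. Reading "string inclusion" as head-preserving suffix inclusion over underscore-joined terms is our assumption; the paper does not spell out the exact matching rule:

```python
def string_inclusion_forest(terms):
    """Step 1 (sketch): attach each multi-word term to the longest proper
    suffix of it that is itself a term, e.g.
    enterprise_application_integration -> application_integration."""
    parents = {}
    for t in terms:
        words = t.split("_")
        for i in range(1, len(words)):
            candidate = "_".join(words[i:])
            if candidate in terms:
                parents[t] = candidate
                break
    return parents

def propagate_hypernymy(terms, kind_of):
    """Step 4 (sketch): for t = tl_tr and t' = t'l_tr sharing the same head
    tr, add t ->kind_of t' whenever tl ->kind_of t'l is already known."""
    new_links = set()
    for t in terms:
        tl, _, tr = t.rpartition("_")
        if not tl:
            continue
        for t2 in terms:
            t2l, _, t2r = t2.rpartition("_")
            if t2 != t and t2l and tr == t2r and (tl, t2l) in kind_of:
                new_links.add((t, t2))
    return new_links

terms = {"integration", "application_integration",
         "enterprise_application_integration", "service_integration"}
print(string_inclusion_forest(terms)["service_integration"])  # -> integration
print(propagate_hypernymy(terms, {("service", "application")}))
# -> {('service_integration', 'application_integration')}
```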
        <p>Appendix II shows a small fragment of the complete INTEROP taxonomy10 (the
sub-trees rooted in integration). At the end of Appendix II we also show an excerpt of
the detected hypernymy relations, used in step 4.</p>
        <p>Ordering terms taxonomically is a highly subjective task, therefore it is not easy to
evaluate the output of this phase. Gold standards are not available, especially in
sub-domains. However, we did a small experiment: given the initial integration,
interoperability and system taxonomies, our method was able to detect 25 hypernymy relations,
e.g.
10 the taxonomy includes 1800 terms belonging to the three main domains of INTEROP, i.e.</p>
        <p>ontology, enterprise modeling, architectures and platforms.
tive. For example, in WordNet there is a direct hypernymy relation between sense #1
of schema and sense #1 of representation.</p>
        <p>The evaluation showed around 33% of matches with respect to a
“gold standard” taxonomy like WordNet11; on the other hand, WordNet is a
general-purpose ontology, and some of the non-corresponding relations detected by our
methodology still seem very reasonable in the interoperability domain, as the reader
may verify by evaluating the detected kind_of links in Appendix II. Notice that, as
expected, the major problem is the over-generality of certain hypernymy links (e.g.
everything is a “system”).</p>
        <p>In any case, our purpose here is not to fully overcome problems that are inherent
in the conceptually complex task of building a domain concept hierarchy. At the
end of this process we obtain a forest of trees where nodes (the concepts) are named
as the corresponding terms in natural language, and the only semantic relation is
hypernymy, even though ongoing research on extracting additional relations is in
progress. Discrepancies and inconsistencies can be corrected by a team of human
specialists, who will verify and rearrange the nodes of the sub-tree forest.
Acknowledgements
This work has been supported by the INTEROP Network of Excellence,
IST-2003508011.
11 http://www.wordnet.princeton.edu WordNet is the most widely used and cited lexicalized
computational ontology.
(Heflin and Hendler, 2000) Heflin, J. and Hendler, J. Dynamic Ontologies on the
Web. In: Proceedings of the Seventeenth National Conference on Artificial
Intelligence (AAAI-2000).</p>
        <p>(Kleinberg 1998) Kleinberg, J. Authoritative sources in a hyperlinked
environment. Proc. 9th ACM-SIAM Symposium on Discrete Algorithms, 1998.</p>
        <p>(Navigli and Velardi, 2004) Navigli, R. &amp; Velardi, P. (2004). Learning Domain</p>
      </sec>
      <sec id="sec-5-3">
        <title>Ontologies from Document Warehouses and Dedicated Web Sites. Computational</title>
        <p>Linguistics, MIT Press, 30(2).</p>
        <p>(Staab, 2002) S. Staab, Emergent Semantics, IEEE Intelligent Systems, v.17 n.1,
p.78-86, January 2002.</p>
        <p>(Stumme and Maedche 2001) Stumme, G. and Maedche, A. Ontology Merging for
Federated Ontologies on the Semantic Web, Workshop on Ontologies and
Information Sharing, IJCAI.
Appendix I. Selection of definitions from web and document warehouses
"$#&% ’)(* #+-,&* +-#. Def: A where the vertical boxes depict the workflow of core processes, and the horizontal
boxes depict business subsystems that control the lifecycles of key business objects.
Weight: 0.1444115
* ./)01 23* . Def: a containing a sequenced set of all groups/segments which relate to a functional business
area (or multi-functional business area) and applying to all messages defined for that area (or areas).
Weight: 0.12572457
(5436 * 7829#. Def: A body of designed for high reuse, with specific plugpoints for the functionality required
for a particular system.
Weight: 0.10959378
Def: A framework is an extensible structure for describing a set of concepts, methods, technologies, and
cultural changes necessary for a complete product design and manufacturing process.
Weight: 0.07710117
Def: We use the term framework to refer to a structured collection of software building blocks that can be
used and customized to develop components, assemble them into an application, and run the application.
Weight: 0.07184533
Def: A logical structure for classifying and organizing complex information.
Weight: 0.059092086
Def: A set of object classes that provide a collection of related functions for a user or piece of software.
Weight: 0.055604726
Def: The software environment tailored to the needs of a specific domain.
Weight: 0.046193704
Def: A component that allows its functionality to be extended by writing plug-in modules ("framework
extensions")
(other definitions follow...)
Example 1: selection of appropriate definitions from glossaries: “framework”
(selected sentences underlined, selected hypernym in bold)
(% * +-23* % 49: Def: ontology alignment refers to the , where both the source and target ontology are known and
29+* 49/)23* .’)#.(5431 Def: Ontology alignment is the</p>
        <p>Def: Ontology alignment is not valuable for its own sake, but is worthwhile only in the service of
some other function that requires it.
Weight: 0.03227434
mappings between the two ontologies are used as source for explanation.
Weight: 0.03170026
tional elements of heterogeneous systems.
Weight: 0.026186492
Def: Ontology alignment is a foundational problem area for semantic interoperability.
Weight: 0.0204144
Def: ontology alignment is extreme: terms from different ontologies are always assumed to mean different
things by default, and all ontology mapping is done by humans (implicitly, by putting them into the same
column of a report).
Weight: 0.020371715
Def: Ontology alignment is also crucial for reusing the existing ontologies and for facilitating their
interoperability.
Weight: 0.01861836
Def: Ontology alignment is also very relevant in a Semantic Web context.
Weight: 0.016911233
(other definitions follow...)
Example 2: selecting definitory from non-definitory sentences in free texts: “ontology
alignment” (selected sentences underlined, selected hypernym in bold)
Appendix II. An excerpt of sub-trees extracted from the INTEROP domain
content_integration</p>
        <p>multilingual_content_integration
enterprise_information_integration</p>
        <p>legacy_enterprise_information_integration</p>
        <p>intelligent_information_integration
ontology_based_integration
business_process_support_integration
database_integration
data_automatic_integration</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>