<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Learning Ontologies for the Semantic Web</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alexander Maedche</string-name>
          <email>ama@aifb.uni-karlsruhe.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Steffen Staab</string-name>
          <email>sst@aifb.uni-karlsruhe.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute AIFB, University of Karlsruhe</institution>
          ,
          <addr-line>76128 Karlsruhe, Germany, Ontoprise GmbH, Haid-und-Neu Strasse 7, 76131 Karlsruhe</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2001</year>
      </pub-date>
      <abstract>
        <p>The Semantic Web relies heavily on the formal ontologies that structure underlying data for the purpose of comprehensive and transportable machine understanding. Therefore, the success of the Semantic Web depends strongly on the proliferation of ontologies, which requires fast and easy engineering of ontologies and avoidance of a knowledge acquisition bottleneck. Ontology Learning greatly facilitates the construction of ontologies by the ontology engineer. The vision of ontology learning that we propose here includes a number of complementary disciplines that feed on different types of unstructured, semi-structured and fully structured data in order to support a semi-automatic, cooperative ontology engineering process. Our ontology learning framework proceeds through ontology import, extraction, pruning, refinement, and evaluation, giving the ontology engineer a wealth of coordinated tools for ontology modeling. Besides the general framework and architecture, we show in this paper some exemplary techniques in the ontology learning cycle that we have implemented in our ontology learning environment, Text-To-Onto, such as ontology learning from free text, from dictionaries, or from legacy ontologies, and refer to some others that need to complement the complete architecture, such as reverse engineering of ontologies from database schemata or learning from XML documents.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. ONTOLOGIES FOR THE SEMANTIC WEB</title>
      <p>Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission by the authors.</p>
      <p>Semantic Web Workshop 2001 Hongkong, China</p>
      <p>Copyright by the authors.</p>
      <p>The cheap and fast construction of domain-specific ontologies is crucial for the success and the proliferation of the Semantic Web.</p>
      <p>Though ontology engineering tools have become mature over the last decade (cf. [9]), the manual acquisition of ontologies still remains a tedious, cumbersome task that easily results in a knowledge acquisition bottleneck. Having developed our ontology engineering workbench, OntoEdit, we had to face exactly this issue; in particular, we were confronted with questions about the time, difficulty, and confidence involved in building an ontology.</p>
      <p>In fact, the problems of time, difficulty, and confidence that we ended up with were similar to what knowledge engineers had dealt with over the last two decades when they elaborated on methodologies for knowledge acquisition or workbenches for defining knowledge bases. A method that proved extremely beneficial for the knowledge acquisition task was the integration of knowledge acquisition with machine learning techniques [33]. The drawback of these approaches, e.g. the work described in [21], however, was their rather strong focus on structured knowledge or data bases, from which they induced their rules.</p>
      <p>In contrast, in the Web environment that we encounter when building Web ontologies, the structured knowledge or data base is the exception rather than the norm. Hence, intelligent support for an ontology engineer takes on a different meaning than the (very seminal) integration architectures for more conventional knowledge acquisition [7].</p>
      <p>Our notion of Ontology Learning aims at the integration of a multitude of disciplines, in particular machine learning, in order to facilitate the construction of ontologies. Because the fully automatic acquisition of knowledge by machines remains in the distant future, we consider the process of ontology learning as semi-automatic with human intervention, adopting the paradigm of balanced cooperative modeling [20] for the construction of ontologies for the Semantic Web. With this objective in mind, we have built an architecture that combines knowledge acquisition with machine learning, feeding on the resources that we nowadays find on the syntactic Web, viz. free text, semi-structured text, schema definitions (DTDs), etc. Thereby, modules in our framework serve different steps in the engineering cycle, which here consists of the following five steps (cf. Figure 1):</p>
      <p>First, existing ontologies are imported and reused by merging existing structures or defining mapping rules between existing structures and the ontology to be established. For instance, Peterson et al. [26] describe how ontological structures contained in Cyc are used in order to facilitate the construction of a domain-specific ontology. Second, in the ontology extraction phase major parts of the target ontology are modeled with learning support feeding from web documents. Third, this rough outline of the target ontology needs to be pruned in order to better adjust the ontology to its prime purpose. Fourth, ontology refinement profits from the given domain ontology, but completes the ontology at a fine granularity (also in contrast to extraction). Fifth, the prime target application serves as a measure for validating the resulting ontology [31]. Finally, one may revolve again in this cycle, e.g. for including new domains into the constructed ontology or for maintaining and updating its scope.</p>
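      <p>As a rough illustration only, the five-step engineering cycle described in this section (import and reuse, extraction, pruning, refinement, validation) can be pictured as a chain of phases applied to an ontology. The following Python sketch is not taken from Text-To-Onto: the Ontology class, the make_phase helper, and all concept names are hypothetical, and each phase is reduced to adding or dropping concept names.</p>
      <preformat>
```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Set


@dataclass
class Ontology:
    """Toy stand-in for an ontology: concept names plus a log of applied phases."""
    concepts: Set[str] = field(default_factory=set)
    history: List[str] = field(default_factory=list)


Phase = Callable[[Ontology], Ontology]


def make_phase(name: str, add: Iterable[str] = (), drop: Iterable[str] = ()) -> Phase:
    """Build a hypothetical phase that adds and/or drops concepts and logs its name."""
    def phase(onto: Ontology) -> Ontology:
        concepts = (onto.concepts | set(add)) - set(drop)
        return Ontology(concepts, onto.history + [name])
    return phase


# The five steps named in the text, in order. All concept names are invented.
CYCLE: List[Phase] = [
    make_phase("import_reuse", add={"Thing", "Event", "Weapon"}),  # reuse existing structures
    make_phase("extract", add={"Hotel", "Area"}),                  # learn from web documents
    make_phase("prune", drop={"Weapon"}),                          # remove out-of-focus parts
    make_phase("refine", add={"DoubleRoom"}),                      # complete at fine granularity
    make_phase("validate"),                                        # measure against the application
]


def run_cycle(onto: Ontology, phases: List[Phase] = CYCLE) -> Ontology:
    """Apply each phase in turn; the result can be fed back in for another round."""
    for phase in phases:
        onto = phase(onto)
    return onto


result = run_cycle(Ontology())
print(sorted(result.concepts))
print(result.history)
```
      </preformat>
      <p>Because run_cycle returns an ontology of the same type it accepts, its result can be passed in again, which mirrors the remark that one may revolve again in the cycle.</p>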
      <p>Though ontologies and their underlying data in the Semantic Web are envisioned to be reusable for a wide range of possibly unforeseen applications, a particular target application remains the touchstone for a given ontology. In our case, we have been dealing with ontology-based knowledge portals that structure Web content and that allow for structured provisioning and accessing of data [29, 30]. Knowledge portals are information intermediaries for knowledge accessing and sharing on the Web. The development of a knowledge portal consists of the tasks of structuring the knowledge, establishing means for providing new knowledge, and accessing the knowledge contained in the portal.</p>
      <p>A considerable part of development and maintenance of the portal lies in integrating legacy information as well as in constructing and maintaining the ontology in vast, often unknown, terrain. For instance, a knowledge portal may focus on the electronics sector, integrating comparative shopping in conjunction with manuals, reports and opinions about current electronic products. The creation of the background ontology for this knowledge portal involves tremendous efforts for engineering the conceptual structures that underlie existing warehouse databases, product catalogues, user manuals, test reports and newsgroup discussions. Correspondingly, ontology structures must be constructed from database schemata, a given product thesaurus (like BMEcat), XML documents and document type definitions (DTDs), and free texts. Still worse, significant parts of these (meta-)data change extremely fast and, hence, require a regular update of the corresponding ontology parts.</p>
      <p>Thus, very different types of (meta-)data might be useful input for the construction of the ontology. However, in practice one needs comprehensive support in order to deal with this wealth. Hence, there comes the need for a range of different techniques: structured data and meta data require reverse engineering approaches, and free text may contribute to ontology learning directly or through information extraction approaches. Semi-structured data may finally require and profit from both.</p>
      <p>In the following we elaborate on our ontology learning framework. Thereby we approach different techniques for different types of data, showing parts of our architecture, its current status, and parts that may complement our current Text-To-Onto environment. A general overview of ontology learning techniques as well as corresponding references may be found in Section 9.</p>
      <p>3. AN ARCHITECTURE FOR ONTOLOGY LEARNING</p>
      <p>As core to our approach we have built a graphical user interface to support the ontology engineering process manually performed by the ontology engineer. Here, we offer sophisticated graphical means for manual modeling and refining the final ontology. Different views are offered to the user, targeting the epistemological level rather than a particular representation language. However, the ontological structures built there may be exported to standard Semantic Web representation languages, such as OIL and DAML-ONT, as well as our own F-Logic based extensions of RDF(S). In addition, executable representations for constraint checking and application debugging can be generated and then accessed via SilRi, our F-Logic inference engine, which is directly connected with OntoEdit.</p>
      <p>The sophisticated ontology engineering tools we knew, e.g. the Protégé modeling environment for knowledge-based systems [9], would offer capabilities roughly comparable to OntoEdit. However, given the task of constructing a knowledge portal, we found that there was a large conceptual gap between the ontology engineering tool and the input (often legacy data), such as Web documents, Web document schemata, databases on the Web, and Web ontologies, which ultimately determined the target ontology. Into this void we have positioned new components of our ontology learning architecture (cf. Figure 2). The new components support the ontology engineer in importing existing ontology primitives, extracting new ones, pruning given ones, or refining with additional ontology primitives. In our case, the ontology primitives comprise, among others, a taxonomy of concepts with multiple inheritance (heterarchy) HC.</p>
      <p>4. COMPONENTS FOR LEARNING ONTOLOGIES</p>
      <p>As described above, an ontology may be described by a number of sets of concepts, relations, lexical entries, and links between these entities. An existing ontology definition (including L, C, HC, R, HR, A, F, G) may be acquired using various algorithms working on this definition and the preprocessed input data. While specific algorithms may vary greatly from one type of input to the next, there is also considerable overlap concerning underlying learning approaches.</p>
      <p>4.1 Management component</p>
      <p>4.2 Resource processing component</p>
      <p>4.3 Algorithm Library</p>
      <p>[Figure: architecture diagram; surviving labels: Lexicon 1 ... Lexicon n, O2, Ontology Engineer]</p>
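      <p>The ontology primitives referred to in this section (L, C, HC, R, HR, A, F, G) can be held in one small data structure. The sketch below is an illustration under stated assumptions, not the OntoEdit data model: the class and field names are invented, relations are stored with an assumed (domain, range) signature, axioms are opaque strings, and F and G are kept as plain sets of pairs because the surviving text does not define them. The concept names are made up for the example.</p>
      <preformat>
```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]


@dataclass
class OntologyPrimitives:
    """Hypothetical container for the primitives L, C, HC, R, HR, A, F, G."""
    lexicon: Set[str] = field(default_factory=set)              # L: lexical entries
    concepts: Set[str] = field(default_factory=set)             # C: concepts
    concept_hierarchy: Set[Pair] = field(default_factory=set)   # HC: (sub, super) pairs
    relations: Dict[str, Pair] = field(default_factory=dict)    # R: name to (domain, range), assumed
    relation_hierarchy: Set[Pair] = field(default_factory=set)  # HR: (sub, super) pairs
    axioms: Set[str] = field(default_factory=set)               # A: axioms as opaque strings
    f_links: Set[Pair] = field(default_factory=set)             # F: pairs, meaning assumed
    g_links: Set[Pair] = field(default_factory=set)             # G: pairs, meaning assumed

    def superconcepts(self, concept: str) -> Set[str]:
        """Collect all direct and indirect superconcepts; HC allows multiple parents."""
        found: Set[str] = set()
        frontier = [concept]
        while frontier:
            current = frontier.pop()
            for sub, sup in self.concept_hierarchy:
                if sub == current and sup not in found:
                    found.add(sup)
                    frontier.append(sup)
        return found


onto = OntologyPrimitives()
onto.concepts |= {"Hotel", "Accommodation", "Business", "Thing"}
# A heterarchy: Hotel has two direct superconcepts.
onto.concept_hierarchy |= {
    ("Hotel", "Accommodation"),
    ("Hotel", "Business"),
    ("Accommodation", "Thing"),
}
# A sub-relation statement such as subPropertyOf(hasDoubleRoom, hasRoom) extends HR.
onto.relation_hierarchy.add(("hasDoubleRoom", "hasRoom"))
print(sorted(onto.superconcepts("Hotel")))
```
      </preformat>
      <p>Storing HC and HR as sets of pairs keeps multiple inheritance trivial to represent; a production tool would more likely index the pairs by child to avoid rescanning the whole set on every lookup.</p>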
      <p>5. IMPORT &amp; REUSE</p>
      <p>6.1 Lexical Entry &amp; Concept Extraction</p>
      <p>For instance, an extracted relation may be named by the ontology engineer as locatedIn, viz. events are located in an area (thus extending L and F). The user may add the extracted relations to the ontology by drag-and-drop. To explore and determine the right aggregation level of adding a relation to the ontology, the user may browse the hierarchy view on extracted properties as given in the left part of Figure 4. This view may also support the ontology engineer in defining appropriate subPropertyOf relations between properties, such as subPropertyOf(hasDoubleRoom,hasRoom) (thereby extending HR).</p>
      <p>7. PRUNING THE ONTOLOGY</p>
      <p>A common theme of modeling in various disciplines is the balance between completeness and scarcity of the domain model. It is a widely held belief that targeting completeness for the domain model on the one hand appears to be practically unmanageable and computationally intractable, and targeting the scarcest model on the other hand is overly limiting with regard to expressiveness. Hence, what we strive for is a balance between these two that works in practice. We aim at a model that captures a rich conceptualization of the target domain, but that excludes parts that are out of its focus. The import &amp; reuse of ontologies as well as the extraction of ontologies considerably pull the lever of the scale into the imbalance where out-of-focus concepts reign. Therefore, we pursue the appropriate diminishing of the ontology in the pruning phase.</p>
      <p>There are at least two dimensions to look at the problem of pruning. First, one needs to clarify how the pruning of particular parts of the ontology (e.g., the removal of a concept or a relation) affects the rest. For instance, Peterson et al. [26] have described strategies that leave the user with a coherent ontology (i.e. no dangling or broken links). Second, one may consider strategies for proposing ontology items that should be either kept or pruned. We have investigated several mechanisms for generating proposals from application data. Given a set of application-specific documents there are several strategies for pruning the ontology. They are based on absolute or relative counts of frequency of terms (cf. reference [15] in survey, Table 1).</p>
      <p>8. REFINING THE ONTOLOGY</p>
      <p>Refining plays a similar role as extracting. Their difference exists on a sliding scale rather than as a clear-cut distinction. While extracting serves mostly for cooperative modeling of the overall ontology (or at least of very significant chunks of it), the refinement phase is about fine tuning the target ontology and the support of its evolving nature. The refinement phase may use data that comes from the concrete Semantic Web application, e.g. log files of user queries or generic user data. Adapting and refining the ontology with respect to user requirements plays a major role for the acceptance of the application and its further development.</p>
      <p>In principle, the same algorithms may be used for extraction as for refinement. However, during refinement one must consider in detail the existing ontology and the existing connections into the ontology, while extraction works more often than not practically from scratch.</p>
      <p>A prototypical approach for refinement (though not for extraction!) has been presented by Hahn &amp; Schnattinger (cf. reference [11] in survey, Table 1). They have introduced a methodology for automating the maintenance of domain-...</p>
      <p>9. RELATED WORK</p>
      <table-wrap>
        <label>Table 1</label>
        <caption>
          <p>Classification of Ontology Learning Approaches</p>
        </caption>
        <table>
          <thead>
            <tr><th>Domain</th><th>Method</th><th>Features used</th><th>Prime purpose</th><th>Papers</th></tr>
          </thead>
          <tbody>
            <tr><td></td><td></td><td>Syntax</td><td>Extract</td><td>Buitelaar [3], Assadi [1] and Faure &amp; Nedellec [6]</td></tr>
            <tr><td></td><td>Inductive Logic Programming</td><td></td><td>Extract</td><td>Esposito et al. [5]</td></tr>
            <tr><td></td><td>Association rules</td><td></td><td></td><td>Maedche &amp; Staab [17]</td></tr>
            <tr><td></td><td>Frequency-based</td><td></td><td></td><td>Kietz et al. [15]</td></tr>
            <tr><td></td><td>Pattern-Matching</td><td></td><td></td><td>Morin [22]</td></tr>
            <tr><td></td><td>Classification</td><td>Syntax, Semantics</td><td></td><td>Schnattinger &amp; Hahn [11]</td></tr>
            <tr><td>Dictionary</td><td>Information extraction</td><td>Syntax</td><td></td><td>Hearst [12], Wilks [35] and Kietz et al. [15]</td></tr>
            <tr><td></td><td>Page rank</td><td>Tokens</td><td></td><td>Jannink &amp; Wiederhold [13]</td></tr>
            <tr><td>Knowledge base</td><td>Concept Induction, A-Box mining</td><td>Relations</td><td>Extract</td><td>Kietz &amp; Morik [16] and Schlobach [28]</td></tr>
            <tr><td>Semi-structured schemata</td><td>Naive Bayes</td><td>Relations</td><td>Reverse engineering</td><td>Doan et al. [4]</td></tr>
            <tr><td>Relational schemata</td><td>Data Correlation</td><td>Relations</td><td>Reverse engineering</td><td>Johannesson [14] and Tari et al. [32]</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>10. CHALLENGES</p>
      <p>Ontology Learning may add significant leverage to the Semantic Web, because it propels the construction of domain ontologies, which are needed quickly and cheaply for the Semantic Web to succeed. We have presented a comprehensive framework for Ontology Learning that crosses the boundaries of single disciplines, touching on a number of challenges. Table 1 gives a survey of what types of techniques should be included in a full-fledged ontology learning and engineering environment. The good news, however, is that one does not need perfect or optimal support for cooperative modeling of ontologies. At least according to our experience, "cheap" methods in an integrated environment may yield tremendous help for the ontology engineer.</p>
      <p>While a number of problems remain with the single disciplines, some more challenges come up regarding the particular problem of Ontology Learning for the Semantic Web. First, with the XML-based namespace mechanisms the notion of an ontology with well-defined boundaries, e.g. only definitions that are in one file, will disappear. Rather, the Semantic Web may yield an "amoeba-like" structure regarding ontology boundaries, because ontologies refer to each other and import each other (cf. e.g. the DAML-ONT primitive import). However, it is not yet clear what the semantics of these structures will look like. In light of these facts the importance of methods like ontology pruning and crawling of ontologies will increase drastically. Second, we have so far restricted our attention in ontology learning to the conceptual structures that are (almost) contained in RDF(S) proper. Additional semantic layers on top of RDF (e.g. future OIL or DAML-ONT with axioms, A) will require new means for improved ontology engineering with axioms, too!</p>
      <p>11. REFERENCES</p>
      <p>[1] H. Assadi. Construction of a regional ontology from text and its use within a documentary system. In Proceedings of the International Conference on Formal Ontology and Information Systems - FOIS'98, Trento, Italy, 1998.</p>
      <p>[2] R. Basili, M. T. Pazienza, and P. Velardi. Acquisition of selectional patterns in a sublanguage. Machine Translation, 8(1):175-201, 1993.</p>
      <p>[3] Paul Buitelaar. CoreLex: Systematic Polysemy and Underspecification. PhD thesis, Brandeis University, Department of Computer Science, 1998.</p>
      <p>[4] A. Doan, P. Domingos, and A. Levy. Learning Source Descriptions for Data Integration. In Proceedings of the International Workshop on The Web and Databases (WebDB-2000), 2000.</p>
      <p>[5] F. Esposito, S. Ferilli, N. Fanizzi, and G. Semeraro. Learning from parsed sentences with INTHELEX. In Proceedings of Learning Language in Logic Workshop (LLL-2000), Lisbon, Portugal, 2000.</p>
      <p>[6] D. Faure and C. Nedellec. A corpus-based conceptual clustering method for verb frames and ontology acquisition. In LREC workshop on adapting lexical and corpus resources to sublanguages and applications, Granada, Spain, 1998.</p>
      <p>[7] B. Gaines and M. Shaw. Integrated knowledge acquisition architectures. Journal of Intelligent Information Systems, 1(1), 1992.</p>
      <p>[8] B. Ganter and R. Wille. Formal Concept Analysis: Mathematical Foundations. Springer, Berlin - Heidelberg - New York, 1999.</p>
      <p>[9] E. Grosso, H. Eriksson, R. Fergerson, S. Tu, and M. Musen. Knowledge modeling at the millennium - the design and evolution of Protege-2000. In Proceedings of KAW-99, Banff, Canada, 1999.</p>
      <p>[10] U. Hahn and M. Romacker. Content management in the SynDiKATe system - how technical documents are automatically transformed to text knowledge bases. Data &amp; Knowledge Engineering, 35:137-159, 2000.</p>
      <p>[11] U. Hahn and K. Schnattinger. Towards text knowledge engineering. In Proc. of AAAI '98, pages 129-144, 1998.</p>
      <p>[12] M.A. Hearst. Automatic acquisition of hyponyms from large text corpora. In Proceedings of the 14th International Conference on Computational Linguistics. Nantes, France, 1992.</p>
      <p>[13] J. Jannink and G. Wiederhold. Thesaurus entry extraction from an on-line dictionary. In Proceedings of Fusion '99, Sunnyvale CA, July 1999. http://www-db.stanford.edu/SKC/publications.html.</p>
      <p>[14] P. Johannesson. A method for transforming relational schemas into conceptual schemas. In M. Rusinkiewicz, editor, 10th International Conference on Data Engineering, pages 115-122, Houston, 1994. IEEE Press.</p>
      <p>[15] J.-U. Kietz, A. Maedche, and R. Volz. Semi-automatic ontology acquisition from a corporate intranet. In International Conference on Grammar Inference (ICGI-2000), to appear: Lecture Notes in Artificial Intelligence, LNAI, 2000.</p>
      <p>[16] J.-U. Kietz and K. Morik. A polynomial approach to the constructive induction of structural knowledge. Machine Learning, 14(2):193-218, 1994.</p>
      <p>[17] A. Maedche and S. Staab. Discovering conceptual relations from text. In Proceedings of ECAI-2000. IOS Press, Amsterdam, 2000.</p>
      <p>[18] A. Mikheev and S. Finch. A workbench for finding structure in text. In Proceedings of the 5th Conference on Applied Natural Language Processing - ANLP'97, March 1997, Washington DC, USA, pages 372-379, 1997.</p>
      <p>[19] G. Miller. WordNet: A lexical database for English. CACM, 38(11):39-41, 1995.</p>
      <p>[20] K. Morik. Balanced cooperative modeling. Machine Learning, 11:217-235, 1993.</p>
      <p>[21] K. Morik, S. Wrobel, J.-U. Kietz, and W. Emde. Knowledge acquisition and machine learning: Theory, methods, and applications. Academic Press, London, 1993.</p>
      <p>[22] E. Morin. Automatic acquisition of semantic relations between terms from technical corpora. In Proc. of the Fifth International Congress on Terminology and Knowledge Engineering - TKE'99, 1999.</p>
      <p>[23] H. A. Mueller, J. H. Jahnke, D. B. Smith, M.-A. Storey, S. R. Tilley, and K. Wong. Reverse Engineering: A Roadmap. In Proceedings of the 22nd International Conference on Software Engineering (ICSE-2000), Limerick, Ireland. Springer, 2000.</p>
      <p>[24] G. Neumann, R. Backofen, J. Baur, M. Becker, and C. Braun. An information extraction core system for real world German text processing. In ANLP'97 - Proceedings of the Conference on Applied Natural Language Processing, pages 208-215, Washington, USA, 1997.</p>
      <p>[25] N. Fridman Noy and M. A. Musen. PROMPT: Algorithm and Tool for Automated Ontology Merging and Alignment. In Proceedings of the 17th National Conf. on Artificial Intelligence (AAAI'2000), Austin, Texas. MIT Press/AAAI Press, 2000.</p>
      <p>[26] B. Peterson, W.A. Andersen, and J. Engel. Knowledge bus: Generating application-focused databases from large ontologies. In Proc. of KRDB 1998, Seattle, Washington, USA, pages 2.1-2.10, 1998.</p>
      <p>[27] P. Resnik. Selection and Information: A Class-based Approach to Lexical Relationships. PhD thesis, University of Pennsylvania, 1993.</p>
      <p>[28] S. Schlobach. Assertional mining in description logics. In Proceedings of the 2000 International Workshop on Description Logics (DL2000), 2000.</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>