=Paper= {{Paper |id=Vol-201/paper-26 |storemode=property |title=Enabling Data Mining Systems to Semantic Web Applications |pdfUrl=https://ceur-ws.org/Vol-201/06.pdf |volume=Vol-201 |dblpUrl=https://dblp.org/rec/conf/swap/Lisi06 }} ==Enabling Data Mining Systems to Semantic Web Applications== https://ceur-ws.org/Vol-201/06.pdf
                       Enabling Data Mining Systems to
                         Semantic Web Applications
                                                        Francesca A. Lisi
                                   Dipartimento di Informatica, Università degli Studi di Bari,
                                             Via E. Orabona 4, I-70125 Bari, Italy
                                                     Email: lisi@di.uniba.it


   Abstract— Semantic Web Mining can be considered as Data Mining (DM) for/from the Semantic
Web. Current DM systems could serve the purpose of Semantic Web Mining if they were more
compliant with, e.g., the standards of representation for ontologies and rules in the Semantic
Web and/or interoperable with well-established tools for Ontological Engineering (OE) that
support these standards. In this paper we present a middleware, SWING, that integrates the DM
system AL-QuIn and the OE tool Protégé-2000 in order to enable Semantic Web applications of
AL-QuIn. This showcase suggests a methodology for building Semantic Web Mining systems.

                       I. INTRODUCTION

   Data Mining (DM) is an application area that arose in the 1990s at the intersection of
several different research fields, notably Statistics, Machine Learning and Databases, as soon
as developments in sensing, communications and storage technologies made it possible to collect
and store large collections of scientific and commercial data [1]. The abilities to analyze such
data sets had not developed as fast. Research in DM can be loosely defined as the study of
methods, techniques and algorithms for finding models or patterns that are interesting or
valuable in large data sets. The space of patterns is often infinite, and the enumeration of
patterns involves some form of search in one such space. Practical computational constraints
place severe limits on the subspace that can be explored by a data mining algorithm. The goal of
DM is either prediction or description. Prediction involves using some variables or fields in
the database to predict unknown or future values of other variables of interest. Description
focuses on finding human-interpretable patterns describing data. Among descriptive tasks, data
summarization aims at the extraction of compact patterns that describe subsets of data. There
are two classes of methods which represent taking horizontal (cases) and vertical (fields)
slices of the data. In the former, one would like to produce summaries of subsets, e.g.
producing sufficient statistics or logical conditions that hold for subsets. In the latter case,
one would like to describe relations between fields. This class of methods is distinguished from
the above in that rather than predicting the value of a specified field (e.g., classification)
or grouping cases together (e.g. clustering) the goal is to find relations between fields. One
common output of this vertical data summarization is called frequent (association) patterns.
These patterns state that certain combinations of values occur in a given database with a
support greater than a user-defined threshold. The system AL-QuIn [2] supports the DM task of
frequent pattern discovery [3]. It implements a framework for learning Semantic Web rules [4]
which adopts AL-log [5] as the Knowledge Representation and Reasoning (KR&R) setting and
Inductive Logic Programming (ILP) [6] as the methodological apparatus.
   Semantic Web Mining [7] is a new application area which aims at combining the two areas of
Semantic Web [8] and Web Mining [9] from a twofold perspective. On one hand, the new semantic
structures in the Web can be exploited to improve the results of Web Mining. On the other hand,
the results of Web Mining can be used for building the Semantic Web. Most work in Semantic Web
Mining simply extends previous work to the new application context. E.g., Maedche and Staab [10]
apply a well-known algorithm for association rule mining to discover conceptual relations from
text. Indeed, we argue that Semantic Web Mining can be considered as DM for/from the Semantic
Web. Current DM systems could serve the purpose of Semantic Web Mining if they were more
compliant with, e.g., the standards of representation for ontologies and rules in the Semantic
Web and/or interoperable with well-established tools for Ontological Engineering (OE) [11], e.g.
Protégé-2000 [12], that support these standards.
   In this paper we present a middleware, SWING, that integrates AL-QuIn and Protégé-2000 in
order to enable Semantic Web applications of AL-QuIn. This solution suggests a methodology for
building Semantic Web Mining systems, i.e. the upgrade of existing DM systems with facilities
provided by interoperable OE tools.
   The paper is structured as follows. Sections II and III briefly introduce AL-QuIn and
Protégé-2000 respectively. Section IV presents the middleware SWING. Section V draws
conclusions and outlines directions of future work.

                 II. THE DM SYSTEM AL-QuIn

   The system AL-QuIn [2] (a previous version is described in [13]) supports a variant of the DM
task of frequent pattern discovery. In DM a pattern is considered as an intensional description
(expressed in a given language L) of a subset of a data set r. The support of a pattern is the
relative frequency of the pattern within r and is computed with the evaluation function supp.
The task of frequent pattern discovery aims at the extraction of all frequent patterns, i.e. all
patterns whose support exceeds a user-defined threshold of minimum support. The blueprint of
most algorithms for frequent pattern discovery
is the levelwise search [3]. It is based on the following assumption: If a generality order ⪰
for the language L of patterns can be found such that ⪰ is monotonic w.r.t. supp, then the
resulting space (L, ⪰) can be searched breadth-first starting from the most general pattern in L
and by alternating candidate generation and candidate evaluation phases. In particular,
candidate generation consists of a refinement step followed by a pruning step. The former
derives candidates for the current search level from patterns found frequent in the previous
search level. The latter allows some infrequent patterns to be detected and discarded prior to
evaluation thanks to the monotonicity of ⪰.
   The variant of the frequent pattern discovery problem which is solved by AL-QuIn takes
concept hierarchies into account during the discovery process [14], thus yielding descriptions
of a data set r at multiple granularity levels up to a maximum level maxG. More formally, given

  • a data set r including a taxonomy T where a reference concept C_ref and task-relevant
    concepts are designated,
  • a multi-grained language {L^l}_{1≤l≤maxG} of patterns,
  • a set {minsup^l}_{1≤l≤maxG} of minimum support thresholds,

the problem of frequent pattern discovery at l levels of description granularity,
1 ≤ l ≤ maxG, is to find the set F of all the patterns P ∈ L^l frequent in r, namely P's with
support s such that (i) s ≥ minsup^l and (ii) all ancestors of P w.r.t. T are frequent. Note
that a pattern Q is considered to be an ancestor of P if it is a coarser-grained version of P.
   In AL-QuIn (AL-log Query Induction) the data set r is represented as an AL-log knowledge base
B and structured as illustrated in Figure 1. The structural subsystem Σ is based on ALC [15] and
allows for the specification of knowledge in terms of classes (concepts), binary relations
between classes (roles), and instances (individuals). In particular, the TBox T contains is-a
relations between concepts (axioms) whereas the ABox M contains instance-of relations between
individuals (resp. couples of individuals) and concepts (resp. roles) (assertions). The
relational subsystem Π is based on an extended form of DATALOG [16] that is obtained by using
ALC concept assertions essentially as type constraints on variables. The portion K of B which
encompasses the whole Σ and the intensional part (IDB) of Π is considered as background
knowledge. The extensional part of Π is partitioned into portions A_i each of which refers to an
individual a_i of C_ref. The link between A_i and a_i is represented with the DATALOG literal
q(a_i). The pair (q(a_i), A_i) is called observation.

 Fig. 1.   Organization of the hybrid knowledge bases used in AL-QuIn.

   The language L = {L^l}_{1≤l≤maxG} of patterns allows for the generation of AL-log unary
conjunctive queries, called O-queries. Given a reference concept C_ref, an O-query Q to an
AL-log knowledge base B is a (linked and connected)¹ constrained DATALOG clause of the form

       Q = q(X) ← α_1, ..., α_m & X : C_ref, γ_1, ..., γ_n

where X is the distinguished variable and the remaining variables occurring in the body of Q
are the existential variables. Note that α_j, 1 ≤ j ≤ m, is a DATALOG literal whereas γ_k,
1 ≤ k ≤ n, is an assertion that constrains a variable already appearing in any of the α_j's to
vary in the range of individuals of a concept defined in B. The O-query

       Q_t = q(X) ← & X : C_ref

is called trivial for L because it only contains the constraint for the distinguished variable
X. Furthermore the language L is multi-grained, i.e. it contains expressions at multiple levels
of description granularity. Indeed it is implicitly defined by a declarative bias specification
which consists of a finite alphabet A of DATALOG predicate names and finite alphabets Γ^l (one
for each level l of description granularity) of ALC concept names. Note that the α_i's are taken
from A and the γ_j's are taken from Γ^l. We impose L to be finite by specifying some bounds,
mainly maxD for the maximum depth of search and maxG for the maximum level of granularity.
   The support of an O-query Q ∈ L^l w.r.t. an AL-log knowledge base B is defined as

       supp(Q, B) = |answerset(Q, B)| / |answerset(Q_t, B)|

where Q_t is the trivial O-query for L. The computation of support relies on query answering in
AL-log. Indeed, an answer to an O-query Q is a ground substitution θ for the distinguished
variable of Q. An answer θ to an O-query Q is a correct (resp. computed) answer w.r.t. an AL-log
knowledge base B if there exists at least one correct (resp. computed) answer to body(Q)θ w.r.t.
B. Therefore proving that an O-query Q covers an observation (q(a_i), A_i) w.r.t. K amounts to
proving that θ_i = {X/a_i} is a correct answer to Q w.r.t. B_i = K ∪ A_i.
   The system AL-QuIn implements the aforementioned levelwise search method for frequent pattern
discovery. In particular, candidate patterns of a certain level k (called k-patterns) are
obtained by refinement of the frequent patterns discovered at level k − 1. In AL-QuIn patterns
are ordered according to B-subsumption (which has been proved to fulfill the abovementioned
condition of monotonicity [13]). The search starts from the most general pattern in L and
iterates through the generation-evaluation cycle for a number of times that is bounded with
respect to both the granularity level l (maxG) and the depth level k (maxD).

  1 For the definition of linkedness and connectedness see [6].
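The generation-evaluation cycle described above can be illustrated with a minimal Python sketch. It is not AL-QuIn's implementation (which is written in Prolog and works on O-queries ordered by B-subsumption): as a simplifying assumption, patterns are plain sets of items ordered by set inclusion, and the names `support`, `levelwise`, `observations` and `alphabet` are hypothetical.

```python
from itertools import combinations

def support(pattern, observations):
    """Relative frequency of the observations that contain every item of the pattern."""
    covered = sum(1 for obs in observations if pattern <= obs)
    return covered / len(observations)

def levelwise(observations, alphabet, minsup, max_depth):
    """Breadth-first search that starts from the most general pattern (the empty set)."""
    frequent = {frozenset(): 1.0}   # the most general pattern covers every observation
    level = [frozenset()]
    for k in range(1, max_depth + 1):
        # Refinement: specialise each frequent (k-1)-pattern with one more item.
        candidates = {p | {item} for p in level for item in alphabet if item not in p}
        # Pruning: by monotonicity, a candidate with an infrequent generalisation
        # cannot be frequent, so it is discarded before evaluation.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        # Evaluation: keep the candidates whose support reaches the threshold.
        level = []
        for c in sorted(candidates, key=sorted):
            s = support(c, observations)
            if s >= minsup:
                frequent[c] = s
                level.append(c)
        if not level:
            break
    return frequent

# Made-up observations, for illustration only.
observations = [frozenset("ab"), frozenset("abc"), frozenset("a"), frozenset("c")]
result = levelwise(observations, alphabet={"a", "b", "c"}, minsup=0.5, max_depth=3)
```

In this toy run the candidate {a, b, c} is pruned without being evaluated, because its generalisations {a, c} and {b, c} were already found infrequent at the previous level.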
   Since AL-QuIn is implemented with Prolog, the internal representation language in AL-QuIn is
a kind of DATALOG^OI [17], i.e. the subset of DATALOG^≠ equipped with an equational theory that
consists of the axioms of Clark's Equality Theory augmented with one rewriting rule that adds
inequality atoms s ≠ t to any P ∈ L for each pair (s, t) of distinct terms occurring in P. Note
that concept assertions are rendered as membership atoms, e.g. a : C becomes c_C(a).
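The two rewritings mentioned in this paragraph can be sketched as follows. This is an illustrative Python sketch, not AL-QuIn's Prolog code: the encoding of atoms as (predicate, arguments) tuples, the predicate name `neq` for the inequality, and the function names are all assumptions made for the example.

```python
from itertools import combinations

def membership_atom(individual, concept):
    """Render the concept assertion `individual : concept` as the atom c_concept(individual)."""
    return ("c_" + concept, (individual,))

def add_inequalities(atoms):
    """Append one inequality atom for each unordered pair of distinct terms in the pattern."""
    terms = []
    for _, args in atoms:
        for t in args:
            if t not in terms:      # keep first-occurrence order, skip repeated terms
                terms.append(t)
    return atoms + [("neq", pair) for pair in combinations(terms, 2)]

# Body of an O-query such as q(X) <- speaks(X,Y) & X:MiddleEastCountry, Y:Language
body = [
    ("speaks", ("X", "Y")),
    membership_atom("X", "MiddleEastCountry"),
    membership_atom("Y", "Language"),
]
rewritten = add_inequalities(body)
```

Here `rewritten` is the original body followed by the single inequality atom between X and Y, since these are the only two distinct terms occurring in the pattern.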
               III. THE OE TOOL PROTÉGÉ-2000

   Protégé-2000² [18] is the latest version of the Protégé line of tools, created by the
Stanford Medical Informatics (SMI) group at Stanford University, USA. It has a community of
thousands of users. Although the development of Protégé has historically been mainly driven by
biomedical applications, the system is domain-independent and has been successfully used for
many other application areas as well. Protégé-2000 is a Java-based standalone application to be
installed and run in a local computer. The core of this application is the ontology editor. Like
most other modeling tools, the architecture of Protégé-2000 is cleanly separated into a model
part and a view part. Protégé-2000's model is the internal representation mechanism for
ontologies and knowledge bases. Protégé-2000's view components provide a Graphical User
Interface (GUI) to display and manipulate the underlying model.
   Protégé-2000's model is based on a simple yet flexible metamodel [12], which is comparable to
object-oriented and frame-based systems. It basically can represent ontologies consisting of
classes, properties (slots), property characteristics (facets and constraints), and instances.
Protégé-2000 provides an open Java API to query and manipulate models. An important strength of
Protégé-2000 is that the Protégé-2000 metamodel itself is a Protégé-2000 ontology, with classes
that represent classes, properties, and so on. For example, the default class in the Protégé
base system is called :STANDARD-CLASS, and has properties such as :NAME and
:DIRECT-SUPERCLASSES. This structure of the metamodel enables easy extension and adaptation to
other representations.
   Using the views of Protégé-2000's GUI, ontology designers basically create classes, assign
properties to the classes, and then restrict the properties' facets at certain classes. Using
the resulting ontologies, Protégé-2000 is able to automatically generate user interfaces that
support the creation of individuals (instances). For each class in the ontology, the system
creates one form with editing components (widgets) for each property of the class. For example,
for properties that can take single string values, the system would by default provide a text
field widget. The generated forms can be further customized with Protégé-2000's form editor,
where users can select alternative user interface widgets for their project. The user interface
consists of panels (tabs) for editing classes, properties, forms and instances.
   Protégé-2000 has an extensible architecture, i.e. an architecture that allows special-purpose
extensions (aka plug-ins) to be easily integrated. These extensions usually perform functions
not provided by the Protégé-2000 standard distribution (other types of visualization, new import
and export formats, etc.), implement applications that use Protégé-2000 ontologies, or allow
configuring the ontology editor. Most of these plug-ins are available in the Protégé-2000
Plug-in Library, where contributions from many different research groups can be found. One of
the most popular in this library is the OWL Plugin [19].

 Fig. 2.   Architecture of the OWL Plugin for Protégé-2000.

   As illustrated in Figure 2, the OWL Plugin extends the Protégé-2000 model and its API with
classes to represent the OWL³ specification. In particular it supports RDF(S), OWL Lite, OWL DL
(except for anonymous global class axioms, which need to be given a name by the user) and
significant parts of OWL Full (including metaclasses). The OWL API basically encapsulates the
internal mapping and thus shields the user from error-prone low-level access. Furthermore the
OWL Plugin provides a comprehensive mapping between its extended API and the standard OWL
parsing library Jena⁴. The presence of a secondary representation of an OWL ontology in terms of
Jena objects means that the user is able to invoke arbitrary Jena-based services such as
interfaces to classifiers, query languages, or visualization tools permanently. Based on the
above mentioned metamodel and API extensions, the OWL Plugin provides several custom-tailored
GUI components for OWL. Also it can directly access DL reasoners such as RACER [20]. Finally it
can be further extended, e.g. to support OWL-based languages like SWRL⁵.

  2 The distribution of interest to this work is 3.0 (February 2005), freely available at
    http://protege.stanford.edu/ under the Mozilla open-source license.
  3 http://www.w3.org/2004/OWL/
  4 http://jena.sourceforge.net
  5 http://www.w3.org/Submission/SWRL/
 Fig. 3.   Architecture and I/O of SWING.

       IV. ENABLING AL-QuIn TO SEMANTIC WEB
           APPLICATIONS WITH PROTÉGÉ-2000

   To enable AL-QuIn to Semantic Web applications we have developed a software component, SWING,
that assists users of AL-QuIn in the design of Semantic Web Mining sessions. As illustrated in
Figure 3, SWING is a middleware because it interoperates via API with the OWL Plugin for
Protégé-2000 to benefit from its facilities for browsing and reasoning on OWL ontologies.

Example IV.1. The screenshots reported in Figures 4, 5, 6 and 7 refer to a Semantic Web Mining
session with SWING for the task of finding frequent patterns in the on-line CIA World Fact
Book⁶ (data set) that describe Middle East countries (reference concept) w.r.t. the religions
believed and the languages spoken (task-relevant concepts) at three levels of granularity
(maxG = 3). To this aim we define L_CIA as the set of O-queries with C_ref = MiddleEastCountry
that can be generated from the alphabet A = {believes/2, speaks/2} of DATALOG binary predicate
names, and the alphabets
Γ^1 = {Language, Religion}
Γ^2 = {IndoEuropeanLanguage, ..., MonotheisticReligion, ...}
Γ^3 = {IndoIranianLanguage, ..., MuslimReligion, ...}
of ALC concept names for 1 ≤ l ≤ 3, up to maxD = 5. Examples of O-queries in L_CIA are:
Q_t = q(X) ← & X:MiddleEastCountry
Q_1 = q(X) ← speaks(X,Y) &
       X:MiddleEastCountry, Y:Language
Q_2 = q(X) ← speaks(X,Y) &
       X:MiddleEastCountry, Y:IndoEuropeanLanguage
Q_3 = q(X) ← believes(X,Y) &
       X:MiddleEastCountry, Y:MuslimReligion
where Q_t is the trivial O-query for L_CIA, Q_1 ∈ L^1_CIA, Q_2 ∈ L^2_CIA, and Q_3 ∈ L^3_CIA.
Note that Q_1 is an ancestor of Q_2.
   Minimum support thresholds are set to the following values: minsup^1 = 20%, minsup^2 = 13%,
and minsup^3 = 10%. After maxD = 5 search stages, AL-QuIn returns 53 frequent patterns out of 99
candidate patterns compliant with the parameter settings. One of these findings is the pattern
Q_2 which turns out to be frequent because it has support supp(Q_2, B_CIA) = 13% (≥ minsup^2).
This has to be read as '13% of Middle East countries speak an Indoeuropean language'.        ♦

 Fig. 4.   SWING: step of concept selection.

 Fig. 5.   SWING: step of relation selection.

   A wizard provides guidance for the selection of the (hybrid) data set to be mined, the
selection of the reference concept and the task-relevant concepts (see Figure 4), the selection
of the relations - among the ones appearing in the relational component of the data set chosen
or derived from them - with which the task-relevant concepts can be linked to the reference
concept in the patterns to be discovered (see Figures 5 and 6), the setting of minimum support
thresholds for each level of description granularity and of several other parameters required by
AL-QuIn. These user preferences are collected in a file (see output file *.lb in Figure 3) that
is shown in preview to the user at the end of the assisted procedure for confirmation (see
Figure 7).

  6 http://www.odci.gov/cia/publications/factbook/
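The way a support value such as the one in Example IV.1 is obtained can be checked on a toy knowledge base. The Python sketch below is an illustration under stated assumptions: the individuals C2, C3 and C4, the speaks facts and the resulting percentages are made up (they are not the CIA World Fact Book data, so the 13% of the example is not reproduced), the subsumption links are assumed, and the helper names are hypothetical. Only queries of the shape q(X) ← speaks(X,Y) & X:MiddleEastCountry, Y:C are covered.

```python
# Assumed direct is-a links between language concepts.
SUBCLASS = {
    "IndoIranianLanguage": "IndoEuropeanLanguage",
    "IndoEuropeanLanguage": "Language",
    "AfroAsiaticLanguage": "Language",
}
# Most specific concept of each language individual (assumed).
MEMBER = {"Persian": "IndoIranianLanguage", "Arabic": "AfroAsiaticLanguage"}
# Individuals of the reference concept; C2, C3 and C4 are made-up placeholders.
MIDDLE_EAST = {"IR", "C2", "C3", "C4"}
# Made-up speaks/2 facts.
SPEAKS = {("IR", "Persian"), ("C2", "Arabic"), ("C3", "Arabic")}

def instance_of(individual, concept):
    """True if the individual belongs to the concept, following is-a links upwards."""
    current = MEMBER.get(individual)
    while current is not None:
        if current == concept:
            return True
        current = SUBCLASS.get(current)
    return False

def answerset(language_concept=None):
    """Answers for X; with no concept given, this is the trivial O-query Q_t."""
    if language_concept is None:
        return set(MIDDLE_EAST)
    return {x for (x, y) in SPEAKS
            if x in MIDDLE_EAST and instance_of(y, language_concept)}

def supp(language_concept):
    """supp(Q, B) = |answerset(Q, B)| / |answerset(Q_t, B)|."""
    return len(answerset(language_concept)) / len(answerset())

supp_q1 = supp("Language")               # coarser-grained pattern (level 1)
supp_q2 = supp("IndoEuropeanLanguage")   # finer-grained pattern (level 2)
```

With these made-up facts the level-1 pattern has support 0.75 and the level-2 pattern 0.25, which illustrates that a finer-grained pattern can never have higher support than its ancestor.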
 Fig. 6.   SWING: editing of derived relations.

 Fig. 7.   SWING: preview of the language bias specification.

A. A closer look at the I/O

   The input to SWING is a hybrid knowledge base that consists of an ontological data source -
expressed as an OWL file - and a relational data source - also available on the Web -
integrated with each other.

Example IV.2. The knowledge base B_CIA for the Semantic Web Mining session of Example IV.1
integrates an OWL ontology (file cia_exp1.owl) with a DATALOG database (file cia_exp1.edb)
containing facts⁷ extracted from the on-line 1996 CIA World Fact Book. The OWL ontology⁸
contains axioms such as
AsianCountry ⊑ Country.
MiddleEastEthnicGroup ⊑ EthnicGroup.
MiddleEastCountry ≡
    AsianCountry ⊓ ∃Hosts.MiddleEastEthnicGroup.
IndoEuropeanLanguage ⊑ Language.
IndoIranianLanguage ⊑ IndoEuropeanLanguage.
MonotheisticReligion ⊑ Religion.
MuslimReligion ⊑ MonotheisticReligion.

and membership assertions such as
'IR':AsianCountry.
'Arab':MiddleEastEthnicGroup.
<'IR','Arab'>:Hosts.
'Persian':IndoIranianLanguage.
'ShiaMuslim':MuslimReligion.
'SunniMuslim':MuslimReligion.

that define taxonomies for the concepts Country, EthnicGroup, Language and Religion. Note that
Middle East countries (concept MiddleEastCountry) have been defined as Asian countries that host
at least one Middle Eastern ethnic group. In particular, Iran ('IR') is classified as Middle
East country.
   Since C_ref = MiddleEastCountry, the DATALOG database is partitioned according to the
individuals of MiddleEastCountry. In particular, the observation (q('IR'), A_IR) contains
DATALOG facts such as
language('IR','Persian',58).
religion('IR','ShiaMuslim',89).
religion('IR','SunniMuslim',10).

concerning the individual 'IR'.                                                              ♦

  7 http://www.dbis.informatik.uni-goettingen.de/Mondial/mondial-rel-facts.flp
  8 In the following we shall use the corresponding DL notation.

   The output file *.db contains the input DATALOG database, possibly enriched with an
intensional part. The editing of derived relations (see Figure 6) is accessible from the step of
relation selection (see Figure 5).

Example IV.3. The output DATALOG database cia_exp1.db for Example IV.1 enriches the input
DATALOG database cia_exp1.edb with the following two clauses:
speaks(Code, Lang) ← language(Code,Lang,Perc),
                     c_Country(Code), c_Language(Lang).
believes(Code, Rel) ← religion(Code,Rel,Perc),
                     c_Country(Code), c_Religion(Rel).

that define views on the relations language and religion respectively. Note that they
correspond to the constrained DATALOG clauses
speaks(Code, Lang) ← language(Code,Lang,Perc) &
                     Code:Country, Lang:Language.
believes(Code, Rel) ← religion(Code,Rel,Perc) &
                     Code:Country, Rel:Religion.

and represent the intensional part of Π_CIA.                                                 ♦

   The output file *.lb contains the declarative bias specification for the language of patterns
and other directives.

Example IV.4. With reference to Example IV.1, the content of cia_exp1.lb (see Figure 7) defines
- among the other things - the language L_CIA of patterns. In particular the first
5 directives define the reference concept, the task-relevant concepts and the relations between
concepts.                                                                                    ♦

   The output files *.abox_n and *.tbox are the side effect of the step of concept selection as
illustrated in the next section. Note that these files together with the intensional part of the
*.db file form the background knowledge K for AL-QuIn.

B. A look inside the step of concept selection

   The step of concept selection deserves further remarks because it actually exploits the
services offered by Protégé-2000. Indeed it also triggers some supplementary computation aimed
at making an OWL background knowledge Σ usable by AL-QuIn. To achieve this goal, it supplies the
following functionalities:

   • levelwise retrieval w.r.t. Σ
   • translation of both (asserted and derived) concept assertions and subsumption axioms of Σ
     to DATALOG^OI facts

The latter relies on the former, meaning that the results of the levelwise retrieval are
exported to DATALOG^OI (see output files *.abox_n and *.tbox in Figure 3). The retrieval problem
is known in DLs literature as the problem of retrieving all the individuals of a concept C [21].
Here, the retrieval is called levelwise because it follows the layering of T: individuals of
concepts belonging to the l-th layer T^l of T are retrieved all together.

Example IV.5. The DATALOG^OI rewriting of the concept assertions derived for T^2 produces facts
like:
c_AfroAsiaticLanguage('Arabic').
...
c_IndoEuropeanLanguage('Persian').

...
hierarchy(c_Language,3,c_IndoEuropeanLanguage,
   [c_IndoIranianLanguage, c_SlavicLanguage]).
hierarchy(c_Language,3,c_UralAltaicLanguage,
   [c_TurkicLanguage]).
hierarchy(c_Religion,3,c_MonotheisticReligion,
   [c_ChristianReligion, c_JewishReligion, c_MuslimReligion]).
for the layer T^3.                                                                           ♦

   Note that the translation from OWL to DATALOG^OI is possible because we assume that all the
concepts are named. This means that an equivalence axiom is required for each complex concept in
the knowledge base. Equivalence axioms help keep concept names (used within constrained DATALOG
clauses) independent from concept definitions.

                         V. CONCLUSION

   The middleware SWING supplies several facilities to AL-QuIn, primarily facilities for
compiling OWL down to DATALOG. Note that DATALOG is the usual KR&R setting for ILP. In this
respect, the pre-processing method proposed by Kietz [22] to enable ILP systems to work within
the framework of the hybrid KR&R system CARIN [23] is related to ours but it lacks an
application. Analogously, the method proposed in [24] for translating OWL to disjunctive DATALOG
is far too general with respect to the specific needs of our application. Rather, the proposal
of interfacing existing reasoners to combine ontologies and rules [25] is more similar to ours
in the spirit. Furthermore, SWING follows engineering principles because it promotes the reuse
of existing systems (AL-QuIn and Protégé-2000) and the adherence to standards (either
...                                                                  normative - see OWL for the Semantic Web - or de facto - see
c UralAltaicLanguage(’Kazak’).                                       DATALOG for ILP). Finally the resulting artifact overcomes the
...                                                                  capabilities of the two systems when considered stand-alone.
c MonotheisticReligion(’ShiaMuslim’).                                In particular, AL-Q U I N was originally conceived to deal with
c MonotheisticReligion(’SunniMuslim’).                               ALC ontologies. Since OWL is equivalent to SHIQ [26] and
...                                                                  ALC is a fragment of SHIQ [21], the middleware SW ING
c PolytheisticReligion(’Druze’).                                     allows AL-Q U I N to deal with more expressive ontologies and
...                                                                  to face Semantic Web applications.
                                                                        For the future we plan to extend SW ING with facilities for
that are stored in the file cia exp1.abox 2.                         extracting information from semantic portals and for present-
   The file cia exp1.tbox contains a DATALOGOI rewriting             ing patterns generated by AL-Q U I N.
of the taxonomic relations of T such as:
hierarchy(c Language,1,null,[c Language]).                                                     R EFERENCES
hierarchy(c Religion,1,null,[c Religion]).                  [1] U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, Eds.,
                1                                               Advances in Knowledge Discovery and Data Mining. AAAI Press/The
for the layer T and                                             MIT Press, 1996.
                                                            [2] F. Lisi and F. Esposito, “ILP Meets Knowledge Engineering: A Case
hierarchy(c Language,2,c Language,                              Study.” in Inductive Logic Programming, ser. Lecture Notes in Artificial
   [c AfroAsiaticLanguage, c IndoEuropeanLanguage, . . .]).     Intelligence, S. Kramer and B. Pfahringer, Eds. Springer, 2005, vol.
hierarchy(c Religion,2,c Religion,                              3625, pp. 209–226.
                                                            [3] H. Mannila and H. Toivonen, “Levelwise search and borders of theories
   [c MonotheisticReligion, c PolytheisticReligion]).           in knowledge discovery,” Data Mining and Knowledge Discovery, vol. 1,
               2                                                no. 3, pp. 241–258, 1997.
for the layer T and                                         [4] F. Lisi and F. Esposito, “An ILP Perspective on the Semantic Web,”
                                                                in Semantic Web Applications and Perspectives 2005, P. Bouquet and
hierarchy(c Language,3,c AfroAsiaticLanguage,                   G. Tumarello, Eds. CEUR Workshop Proceedings, 2005, http://ceur-
   [c AfroAsiaticLanguage]).                                    ws.org/Vol-166/.
 [5] F. Donini, M. Lenzerini, D. Nardi, and A. Schaerf, “AL-log: Integrating
     Datalog and Description Logics,” Journal of Intelligent Information
     Systems, vol. 10, no. 3, pp. 227–252, 1998.
 [6] S. Nienhuys-Cheng and R. de Wolf, Foundations of Inductive Logic
     Programming, ser. Lecture Notes in Artificial Intelligence. Springer,
     1997, vol. 1228.
 [7] G. Stumme, A. Hotho, and B. Berendt, “Semantic Web Mining: State of
     the art and future directions,” Journal of Web Semantics, vol. 4, no. 2,
     pp. 124–143, 2006.
 [8] T. Berners-Lee, J. Hendler, and O. Lassila, “The Semantic Web,”
     Scientific American, May 2001.
 [9] R. Kosala and H. Blockeel, “Web Mining Research: A Survey,”
     ACM SIGKDD Explorations Newsletter, vol. 2, 2000. [Online].
     Available: citeseer.ist.psu.edu/kosala00web.html
[10] A. Maedche and S. Staab, “Discovering Conceptual Relations from
     Text,” in Proceedings of the 14th European Conference on Artificial
     Intelligence, W. Horn, Ed. IOS Press, 2000, pp. 321–325.
[11] A. Gómez-Pérez, M. Fernández-López, and O. Corcho, Ontological
     Engineering. Springer, 2004.
[12] N. F. Noy, R. Fergerson, and M. Musen, “The Knowledge Model of
     Protégé-2000: Combining Interoperability and Flexibility.” in Knowledge
     Acquisition, Modeling and Management, ser. Lecture Notes in Computer
     Science, R. Dieng and O. Corby, Eds. Springer, 2000, vol. 1937, pp.
     17–32.
[13] F. Lisi and D. Malerba, “Inducing Multi-Level Association Rules from
     Multiple Relations,” Machine Learning, vol. 55, pp. 175–210, 2004.
[14] J. Han and Y. Fu, “Mining multiple-level association rules in large
     databases,” IEEE Transactions on Knowledge and Data Engineering,
     vol. 11, no. 5, 1999.
[15] M. Schmidt-Schauss and G. Smolka, “Attributive concept descriptions
     with complements,” Artificial Intelligence, vol. 48, no. 1, pp. 1–26, 1991.
[16] S. Ceri, G. Gottlob, and L. Tanca, Logic Programming and Databases.
     Springer, 1990.
[17] G. Semeraro, F. Esposito, D. Malerba, N. Fanizzi, and S. Ferilli, “A
     logic framework for the incremental inductive synthesis of Datalog
     theories,” in Proceedings of 7th International Workshop on Logic Program
     Synthesis and Transformation, ser. Lecture Notes in Computer Science,
     N. Fuchs, Ed. Springer, 1998, vol. 1463, pp. 300–321.
[18] J. Gennari, M. Musen, R. Fergerson, W. Grosso, M. Crubézy, H. Eriksson,
     N. F. Noy, and S. W. Tu, “The evolution of Protégé: An environment
     for knowledge-based systems development.” International Journal of
     Human-Computer Studies, vol. 58, no. 1, pp. 89–123, 2003.
[19] H. Knublauch, M. Musen, and A. Rector, “Editing Description Logic
     Ontologies with the Protégé OWL Plugin.” in Proceedings of the 2004
     International Workshop on Description Logics (DL2004), ser. CEUR
     Workshop Proceedings, V. Haarslev and R. Möller, Eds., vol. 104, 2004.
[20] V. Haarslev and R. Möller, “Description of the RACER System and
     its Applications.” in Working Notes of the 2001 International
     Description Logics Workshop (DL-2001), ser. CEUR Workshop Proceedings,
     C. Goble, D. McGuinness, R. Möller, and P. Patel-Schneider, Eds.,
     vol. 49, 2001.
[21] F. Baader, D. Calvanese, D. McGuinness, D. Nardi, and P. Patel-Schneider,
     Eds., The Description Logic Handbook: Theory, Implementation
     and Applications. Cambridge University Press, 2003.
[22] J. Kietz, “Learnability of description logic programs,” in Inductive Logic
     Programming, ser. Lecture Notes in Artificial Intelligence, S. Matwin
     and C. Sammut, Eds., vol. 2583. Springer, 2003, pp. 117–132.
[23] A. Levy and M.-C. Rousset, “Combining Horn rules and description
     logics in CARIN,” Artificial Intelligence, vol. 104, pp. 165–209, 1998.
[24] U. Hustadt, B. Motik, and U. Sattler, “Reducing SHIQ-description
     logic to disjunctive datalog programs.” in Principles of Knowledge
     Representation and Reasoning: Proceedings of the Ninth International
     Conference (KR2004), D. Dubois, C. Welty, and M.-A. Williams, Eds.
     AAAI Press, 2004, pp. 152–162.
[25] U. Assmann, J. Henriksson, and J. Maluszynski, “Combining safe rules
     and ontologies by interfacing of reasoners.” in Principles and Practice
     of Semantic Web Reasoning, ser. Lecture Notes in Computer Science,
     J. Alferes, J. Bailey, W. May, and U. Schwertel, Eds. Springer, 2006,
     vol. 4187, pp. 33–47.
[26] I. Horrocks, P. Patel-Schneider, and F. van Harmelen, “From SHIQ
     and RDF to OWL: The making of a web ontology language,” Journal
     of Web Semantics, vol. 1, no. 1, pp. 7–26, 2003.
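
Illustrative note on Example IV.5. The export of taxonomic relations and concept assertions to DATALOG^OI facts described in Section IV-B can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not the SWING implementation: the function names, the dictionary-based input format and the underscore in names such as c_Religion are hypothetical. The rule that a childless concept is repeated as its own child in the next layer is inferred from the fact for c_AfroAsiaticLanguage in the example. The retrieval step itself, which the paper performs through Protégé-2000, is not modelled; its results are simply passed in.

```python
def hierarchy_facts(root, children):
    """Build hierarchy/4 facts layer by layer for the taxonomy rooted at `root`.

    `children` maps a concept name to the list of its direct sub-concepts
    (hypothetical input format, assumed for this sketch).
    """
    facts = [f"hierarchy(c_{root},1,null,[c_{root}])."]
    frontier = [root]
    level = 2
    # Descend one layer at a time while some concept in the layer has children.
    while any(children.get(concept) for concept in frontier):
        next_frontier = []
        for parent in frontier:
            # Assumption: a childless concept is carried over as its own child.
            kids = children.get(parent) or [parent]
            names = ", ".join(f"c_{kid}" for kid in kids)
            facts.append(f"hierarchy(c_{root},{level},c_{parent},[{names}]).")
            next_frontier.extend(kids)
        frontier = next_frontier
        level += 1
    return facts


def abox_facts(retrieved):
    """Rewrite already retrieved individuals as unary facts, one per individual.

    `retrieved` maps a concept name to its individuals; in the paper these
    come from levelwise retrieval, here they are supplied by the caller.
    """
    return [
        f"c_{concept}('{individual}')."
        for concept, individuals in retrieved.items()
        for individual in individuals
    ]


# Fragment of the Religion taxonomy taken from Example IV.5.
RELIGION = {
    "Religion": ["MonotheisticReligion", "PolytheisticReligion"],
    "MonotheisticReligion": ["ChristianReligion", "JewishReligion", "MuslimReligion"],
}

for fact in hierarchy_facts("Religion", RELIGION):
    print(fact)
for fact in abox_facts({"PolytheisticReligion": ["Druze"]}):
    print(fact)
```

Keeping the layer number as an explicit argument of each fact mirrors the layering of T used in the paper, so that facts for a given layer can be selected without traversing the taxonomy again.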