=Paper=
{{Paper
|id=Vol-1665/paper5
|storemode=property
|title=Induction of Terminological Cluster Trees
|pdfUrl=https://ceur-ws.org/Vol-1665/paper5.pdf
|volume=Vol-1665
|authors=Giuseppe Rizzo,Claudia d'Amato,Nicola Fanizzi,Floriana Esposito
|dblpUrl=https://dblp.org/rec/conf/semweb/0001dFE16
}}
==Induction of Terminological Cluster Trees==
Induction of Terminological Cluster Trees: Preliminaries, Model, Method and Perspectives

Giuseppe Rizzo, Claudia d'Amato, Nicola Fanizzi, and Floriana Esposito

LACAM – Dipartimento di Informatica, Università degli Studi di Bari "Aldo Moro", Campus – Via Orabona 4, 70125 Bari, Italy – http://lacam.di.uniba.it

Abstract. In this paper, we tackle the problem of clustering individual resources in the context of the Web of Data, which is characterized by a huge amount of data published in a standard data model with a well-defined semantics based on Web ontologies. In fact, clustering methods offer an effective solution to support a number of complex related activities, such as ontology construction, debugging and evolution, taking into account the inherent incompleteness underlying the representation. Web ontologies already encode a hierarchical organization of the resources by means of the subsumption hierarchy of the classes, which may be expressed explicitly, through proper subsumption axioms, or detected indirectly, by reasoning on the available axioms that define the classes (classification). However, such classes are frequently sparsely populated, as the hierarchy often reflects the view of the knowledge engineer prior to the actual introduction of assertions involving the individual resources. As a result, very general classes are often loosely populated, but this may also happen to specific subclasses, making it more difficult to check the types of a resource (instance checking), even through reasoning services. Among the large number of algorithms proposed in the Machine Learning literature, we propose a clustering method that is able to organize groups of resources hierarchically. Specifically, we introduce a conceptual clustering approach that combines a distance measure between individuals in a knowledge base with a divide-and-conquer strategy, intended to elicit ex post the underlying hierarchy based on the actual distribution of the instances.

1 Introduction and Motivation

With the growth of the Web of Data, along with the Linked Data initiative [12], a large number of datasets/vocabularies are being published on the ground of a standard data model and connected within a uniform semantic space exploiting RDF and the Web infrastructure. However, this huge quantity of data is also known to be inherently uncertain, being often inconsistent/conflicting, vague and, especially, incomplete [14]. Web ontologies already encode a hierarchical organization of the data by means of the subsumption hierarchy of the classes, which may be expressed explicitly, through proper subsumption axioms, or has to be determined indirectly, by reasoning on the available axioms defining the various classes (classification task). However, such classes are frequently sparsely populated, as the hierarchy often reflects the view of the knowledge engineer prior to the actual introduction of assertions involving the individual resources. As a result, very general classes are often loosely populated, but this may also happen to specific subclasses, hindering or, at least, making it difficult to check the types of a resource (instance checking), even employing reasoning services. This may have a strong negative impact on services based on query answering, which often rely only on explicit assertions about the data.
Machine Learning methods can be employed to elicit implicit pieces of knowledge from the datasets, even in the face of such inherent incompleteness, and to provide non-standard inference services [17]. In particular, in this paper we address conceptual clustering [18] as a means to exploit the data to detect further classes that may arise within the underlying hierarchical structure. Specifically, we introduce an approach that combines semantic distance measures over the space of individuals with a divide-and-conquer strategy, intended to elicit, in retrospect, an additional class hierarchy reflecting the actual distribution of the instances.

Clustering is an unsupervised learning task aiming at partitioning a collection of objects into subsets or clusters, so that those within each cluster are more closely related to one another than to the objects assigned to different clusters [1]. In the Web of Data context, clustering can enable the definition of new emerging concepts (concept formation) on the grounds of those employed in a knowledge base; supervised methods can exploit these clusters to induce new concept definitions or to refine existing ones (ontology evolution); intensionally defined groupings may speed up the tasks of search and discovery; clustering may also induce criteria for ranking the retrieved resources [10, 9].

An object is usually described by a fixed set of features (attribute values), and the most common notion of similarity between objects is expressed in terms of a distance function based on this set; for example, for datasets made up of objects described by tuples of numeric features, (extensions of) the Euclidean distance are adopted to determine object and cluster similarity. An important difference among the various clustering techniques concerns the type of membership that is adopted. In the simplest (crisp) case, e.g. k-Means [16], cluster membership is exclusive: each object belongs to exactly one cluster. Extensions, such as fuzzy c-means [3] or EM [7], admit an overlap between the clusters and a degree of membership (responsibility) of each object to a cluster. Further extensions include non-flat clustering structures, such as those induced via hierarchical clustering [1].

In this work, moving on to conceptual clustering, we require that classes with an intensional definition account for and define these clusters. To this purpose, the algorithms should be able to exploit background knowledge that can be expressed using expressive representation languages¹, such as Description Logics. Again, in such methods, clustering requires the definition of a (dis)similarity measure over the set of the objects to be clustered.

¹ In the past, various fragments of First-Order Logic have been adopted, such as Clausal Logics, especially in Inductive Logic Programming.

We propose a generalized solution based on logical trees [4], namely Terminological Cluster Trees, as an extension of terminological decision trees [8]. They adopt a pseudo-metric defined over the space of individuals as a criterion to separate groups of objects, rather than information gain or other notions of purity devised for supervised concept learning and inductive classification methods such as terminological decision trees. The proposed solution provides intensional definitions that can be used for describing the individuals in each cluster and, unlike other methods [13, 11], does not have to resort to complex approximations such as the most specific concept [2] as the representative of individuals on the conceptual level.
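As a toy illustration of the crisp/soft distinction above (unrelated to the method proposed in this paper), the following Python snippet assigns one-dimensional points to two given centers, first exclusively in the style of k-Means and then with fuzzy-c-means-style degrees of membership; the points, centers and fuzzifier m are invented for the example.

    # Illustrative sketch: crisp vs. soft cluster membership on toy data.
    # Points, centers and the fuzzifier m are arbitrary assumptions.

    points = [0.0, 0.2, 0.9, 1.1]
    centers = [0.1, 1.0]

    def crisp(x):
        """k-Means-style exclusive membership: index of the closest center."""
        return min(range(len(centers)), key=lambda j: abs(x - centers[j]))

    def soft(x, m=2.0):
        """Fuzzy-c-means-style degrees of membership, summing to 1."""
        d = [max(abs(x - c), 1e-9) for c in centers]  # avoid division by zero
        return [1.0 / sum((d[j] / d[k]) ** (2.0 / (m - 1.0))
                          for k in range(len(centers)))
                for j in range(len(centers))]

    for x in points:
        print(x, crisp(x), [round(u, 2) for u in soft(x)])

A point near a center gets that cluster exclusively in the crisp case, while the soft case returns a responsibility for each cluster, which is the kind of membership hierarchical and probabilistic extensions build on.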
Moreover, terminological cluster trees can autonomously determine the number of clusters to be generated, which several other methods require as an input parameter, with a strong impact on the quality of the resulting partitions.

The rest of the paper is organized as follows: the next section illustrates the basic notions of the underlying Description Logics foundations for the intended representation and reasoning services; Sect. 3 introduces the problem of clustering the individuals of a knowledge base, while Sect. 4 presents the approach for inducing terminological cluster trees. Finally, Sect. 5 illustrates the conclusions and some possible extensions.

2 Basics

In the following, we borrow notation and terminology from Description Logics (DLs) [2], as the representation and reasoning services for the knowledge bases in the Web of Data ultimately rely on this family of languages. Hence we use the terms concept (description) and role as synonyms of class and property², respectively.

² Datatype properties, i.e. roles ranging on concrete domains, will not be considered in this study.

In these languages, a domain is modeled through a set NC of atomic concepts (classes) and a set NR of roles (relations), which can be used to build complex descriptions regarding individuals (instances, objects) by means of specific operators (complement, conjunction and disjunction of concepts) that depend on the adopted language. A knowledge base is a couple K = ⟨T, A⟩, where the TBox T contains axioms concerning concepts and roles (typically inclusion axioms such as C ⊑ D) and the ABox A contains assertions, i.e. axioms regarding the individuals (C(a), resp. R(a, b)). The set of individuals occurring in A is denoted by Ind(A).

The semantics of individuals, concepts, and roles is defined through interpretations. An interpretation is a couple I = (Δ^I, ·^I), where Δ^I is the domain of the interpretation and ·^I is a mapping such that, for each individual a, a^I ∈ Δ^I; for each concept C, C^I ⊆ Δ^I; and for each role R, R^I ⊆ Δ^I × Δ^I. The semantics of complex descriptions descends from the interpretation of the primitive concepts/roles and of the operators employed, depending on the adopted language. I satisfies an axiom C ⊑ D (C is subsumed by D) when C^I ⊆ D^I, and an assertion C(a) (resp. R(a, b)) when a^I ∈ C^I (resp. (a^I, b^I) ∈ R^I). I is a model for K iff it satisfies each axiom/assertion α in K, denoted I |= α. When α is satisfied w.r.t. all such models, we write K |= α.

We will be interested in the instance-checking inference service: given an individual a and a concept description C, determine whether K |= C(a). Due to the Open World Assumption (OWA), answering a class-membership query is more difficult than in ILP settings, where closed-world reasoning is the standard form. Indeed, one may not be able to prove the truth of either K |= C(a) or K |= ¬C(a), as it may be possible to find different interpretations satisfying either case.

3 Conceptual Clustering for DL Knowledge Bases

As we are targeting conceptual clustering for DL knowledge bases, the problem, in a simple formulation, may be formalized as follows:

Definition 3.1 (conceptual clustering – flat case).
Given:
– a knowledge base K = ⟨T, A⟩;
– a set of training individuals TI ⊆ Ind(A);
Find:
– a partition of TI into n pairwise disjoint clusters {C1, . . . , Cn};
– for each i = 1, . . . , n, a concept description Di that accounts for Ci, i.e. such that, ∀a ∈ Ci:
  1. K |= Di(a);
  2. K |= ¬Dj(a) for all j ∈ {1, . . . , n}, j ≠ i.
Note that in this setting the number of clusters n is not required as a parameter. Condition 2 may be relaxed (e.g. to K ⊭ Dj(a)) to allow some overlap between the clusters/concepts, which may be further extended towards probabilistic clustering methods and models [1].

This problem can be regarded as a recursive one, as each cluster, in its turn, might yield its own internal partitioning, and each sibling sub-cluster would be characterized intensionally by its sub-class. The decision on whether to partition a given cluster recursively generally depends on cohesion metrics, assessing a measure of intra-cluster similarity (within the cluster) against the inter-cluster dissimilarity (w.r.t. the sibling partitions).

4 Terminological Cluster Trees

The notion of terminological cluster tree extends the logical clustering trees introduced in [6] and learned through C0.5, a system derived from Tilde [4]. The induction of the model combines elements of logical decision tree induction (i.e. the approach based on recursive partitioning and the exploitation of a refinement operator for specializing concept descriptions) with elements of instance-based learning (i.e. the employment of a distance measure over the instance space).

Definition 4.1 (terminological cluster tree). Given a knowledge base K, a terminological cluster tree (TCT) is a (binary) logical tree where:
– each leaf node stands for a cluster of individuals C;
– each internal node contains a concept description D (over the signature of K);
– each edge departing from an internal node corresponds to the outcome of the membership test of individuals w.r.t. the concept installed in the node³.
Hence, a tree node can be represented by a quadruple ⟨D, C, Tleft, Tright⟩, indicating the two subtrees connected by either edge.

³ By convention, the left branch is for positive instances and the right one is for negative instances.

[Fig. 1. A simple TCT describing individuals involved in the Semantic Web research community: the root tests Person; its left child tests Person ⊓ ∃hasPublication.⊤, whose left child in turn tests Person ⊓ ∃hasPublication.(SWJ); the leaves hold the clusters C1, C2, C3 and C4.]

Example 4.1 (a TCT). Fig. 1 illustrates a simple example of a TCT describing individuals in the academic domain. The root node of the tree contains the concept Person. Two edges depart from this node: the left branch denotes a positive membership of an individual w.r.t. the concept Person, while the right branch denotes a negative membership w.r.t. the concept. On one hand, the right child of this node is a leaf containing a cluster C4, composed of individuals that are not instances of the concept Person. On the other hand, the left child of the root node contains a further, more complex concept description. Again, two edges depart from this node: the right edge links the node to another leaf, containing the cluster C3 of individuals that have no publications. ⊓⊔

The details of the algorithms for (a) growing a TCT and (b) deriving intensional definitions are reported in the sequel.

4.1 A Method for Inducing Terminological Cluster Trees

A terminological cluster tree T is induced by means of a recursive strategy, which follows the same schema proposed for terminological decision trees (TDTs) [8]. The sketch is reported in Alg. 1.
Algorithm 1 Routines for inducing a TCT

 1  const ν, δ: thresholds
 2
 3  function induceTCT(I, C): TCT
 4  input  I: set of individuals
 5         C: concept description
 6  begin
 7    T ← new TCT
 8    if stopCondition(ν, I) then
 9      T ← ⟨null, I, null, null⟩
10    else
11      S ← ρ(C)  { set of candidate specializations }
12      E* ← selectBestConcept(S, I)
13      ⟨Ileft, Iright⟩ ← split(E*, I)
14      Tleft ← induceTCT(Ileft, E*)
15      Tright ← induceTCT(Iright, ¬E*)
16      T ← ⟨E*, I, Tleft, Tright⟩
17    return T
18  end
19
20  function split(E, I): pair of sets of individuals
21  input  E: concept description
22         I: set of individuals
23  begin
24    ⟨P, N⟩ ← retrievePosNeg(E, I, δ)
25    b ← getPrototype(P)
26    c ← getPrototype(N)
27    Ileft ← ∅
28    Iright ← ∅
29    for each a ∈ I
30      if d(a, b) ≤ d(a, c) then
31        Ileft ← Ileft ∪ {a}
32      else
33        Iright ← Iright ∪ {a}
34
35    return ⟨Ileft, Iright⟩
36  end

The main routine is invoked as induceTCT(TI, C), where C may be ⊤ or any other general concept the individuals in TI are known to belong to by default.

In this recursive algorithm, the base case depends on a test (via stopCondition) on a threshold over the heuristic employed for growing the tree, measuring the cohesion of the cluster of individuals routed to the node in terms of a given metric. If this value exceeds ν, then the branch is marked as completed and the cluster is stored in a leaf node. Further details about the heuristic and the stop condition are reported later on.

In the inductive step, the current (parent) concept description C has to be specialized by means of a refinement operator ρ exploring the search space of specializations of C. A set S of candidate specializations ρ(C) is obtained. For each E ∈ S, the sets of positive and negative individuals, denoted resp. by P and N, are retrieved. A tricky situation may occur (line 24) when either N or P is empty for a given concept (e.g. due to a total lack of disjointness axioms). In such a case, the algorithm can re-assign individuals of I to N (resp. P) based on the distance between them and the prototype of P (resp. N), when this exceeds a given threshold δ. For P and N, a representative element is determined as a prototype, i.e. their medoid: the central element of a cluster, having a minimal average distance w.r.t. the other elements. Then, the function selectBestConcept evaluates the specializations of the concept according to the closeness w.r.t. the medoids, determined by a given distance measure (discussed later). The best concept E* ∈ S is the one for which the distance between the medoids of the positive and negative instance sets is maximized. Then E* is installed in the current node.

After the assessment of the best concept E*, the individuals are partitioned by split, to be routed along the left or right branch. Differently from TDTs, the routine does not decide the branch along which the individuals are sorted according to a concept membership test (instance check): rather, it splits the individuals according to their distance w.r.t. the prototypes of positive and negative membership w.r.t. E*, i.e. the medoids of P and N. This divide-and-conquer strategy is applied recursively until the instances routed to a node satisfy the stop condition. Note that the number of clusters is not required as an input: it depends on the number of paths generated in the growing phase, so the algorithm is able to determine it naturally, following the data distribution.
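To make the control flow of Alg. 1 concrete, the following minimal Python sketch mirrors its structure. It is an illustrative rendering under simplifying assumptions, not the authors' implementation: refine, select_best_concept, retrieve_pos_neg, distance and cohesion are hypothetical stand-ins for the DL-specific machinery (the operator ρ, reasoner-backed retrieval with the δ re-assignment, and the measure and heuristic discussed below).

    # Minimal sketch of Alg. 1 (illustrative, not the authors' code).
    # All helper callables are hypothetical stand-ins for DL machinery.

    from dataclasses import dataclass
    from typing import Any, Optional, Set

    Concept = Any      # placeholder for a DL concept description
    Individual = Any   # placeholder for an ABox individual

    @dataclass
    class TCT:
        concept: Optional[Concept]    # D installed in the node, None in a leaf
        individuals: Set[Individual]  # the cluster of individuals routed here
        left: Optional["TCT"] = None
        right: Optional["TCT"] = None

    def medoid(cluster, distance):
        """Prototype: element with minimal average distance to the others."""
        return min(cluster, key=lambda a: sum(distance(a, b) for b in cluster))

    def split(E, I, retrieve_pos_neg, distance):
        """Route each individual towards the closer of the two medoids."""
        # retrieve_pos_neg abstracts line 24, including the delta re-assignment
        # needed when P or N would otherwise be empty.
        P, N = retrieve_pos_neg(E, I)
        b, c = medoid(P, distance), medoid(N, distance)
        left = {a for a in I if distance(a, b) <= distance(a, c)}
        return left, I - left

    def induce_tct(I, C, refine, select_best_concept, retrieve_pos_neg,
                   distance, cohesion, nu):
        if cohesion(I) > nu:                  # stopCondition: cohesive enough
            return TCT(None, I)               # leaf storing the cluster
        S = refine(C)                          # candidate specializations rho(C)
        E = select_best_concept(S, I)          # maximize d(medoid(P), medoid(N))
        I_left, I_right = split(E, I, retrieve_pos_neg, distance)
        # A real implementation must also guard against non-separating splits.
        return TCT(E, I,
                   induce_tct(I_left, E, refine, select_best_concept,
                              retrieve_pos_neg, distance, cohesion, nu),
                   induce_tct(I_right, ("not", E),  # tagged tuple standing for ¬E
                              refine, select_best_concept,
                              retrieve_pos_neg, distance, cohesion, nu))

Note how the number of clusters emerges from the recursion: each branch keeps splitting until the cohesion test fires, exactly as in the growing phase described above.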
Refinement Operator. The proposed approach relies on a downward refinement operator that generates the concepts to be installed in child nodes by performing a specialization process on the concept, say C, installed in the parent node:

ρ1 adding a concept atom (or its complement) as a conjunct: C′ = C ⊓ (¬)A;
ρ2 adding a general existential restriction (or its complement) as a conjunct: C′ = C ⊓ (¬)∃R.⊤;
ρ3 adding a general universal restriction (or its complement) as a conjunct: C′ = C ⊓ (¬)∀R.⊤;
ρ4 replacing a sub-description Ci in the scope of an existential restriction in C with one of its refinements: ∃R.Ci′ ∈ ρ(∃R.Ci) ∧ Ci′ ∈ ρ(Ci);
ρ5 replacing a sub-description Ci in the scope of a universal restriction with one of its refinements: ∀R.Ci′ ∈ ρ(∀R.Ci) ∧ Ci′ ∈ ρ(Ci).

Note that the cases of ρ4 and ρ5 are recursive.

Prototypes. Despite the common schema of the algorithms employed for growing TCTs and TDTs, the latter are obtained by selecting the best test in terms of information gain maximization [8], while the clustering procedure for TCTs resorts to a distance-based criterion on the individuals in the knowledge base. Specifically, the heuristic adopted for selecting the best concept description to be installed in a new node can be defined as follows:

E* = argmax_{D ∈ ρ(C)} d(p(P), p(N))

where P and N are the sub-clusters obtained by splitting I w.r.t. D, d(·, ·) is a distance measure between individuals, and p(·) is a function mapping a set of individuals to its prototype. As previously mentioned, the algorithm computes the medoids of both the sets of positive and negative instances w.r.t. the test.

Distance Measure. The computation of medoids requires a (possibly) language-independent measure for individuals, whose definition should capture aspects of their semantics in the context of the knowledge base. However, individuals do not have an algebraic structure that can be exploited directly. In the TCT induction algorithm, a language-independent dissimilarity measure [5] has been adopted. Given a knowledge base K, the idea is to exploit the behavior of an individual w.r.t. a set of concepts C = {C1, C2, . . . , Cm}, dubbed a context or committee of features. Once C has been chosen, a projection function πi can be defined for each Ci ∈ C as a mapping πi : Ind(A) → {0, 1/2, 1} such that, for all a ∈ Ind(A):

  πi(a) = 1    if K |= Ci(a)
  πi(a) = 0    if K |= ¬Ci(a)
  πi(a) = 1/2  otherwise

For the value of πi associated to the uncertain-membership case (owing to the OWA), we adopted a uniform prior probability. It can be set more accurately according to the membership to Ci, πi : Ind(A) → [0, 1], with x ∈ Ind(A) ↦ πi(x) = P[K |= Ci(x)], if such a value can be estimated. For example, for densely populated ontologies this may be estimated as |rA(Ci)|/|Ind(A)|, where rA(·) indicates the retrieval of the argument concept w.r.t. A, i.e. the set of individuals that are known to belong to the concept [2]: rA(C) = {a ∈ Ind(A) | K |= C(a)}. For a largely populated concept C, this may be estimated on the ground of the assertions contained in A, avoiding the bottleneck of reasoning.

Through the projection functions, it is possible to define a family of distance measures {d^C_p}_{p ∈ N} for individuals, d^C_p : Ind(A) × Ind(A) → [0, 1], such that:

d^C_p(a, b) = ( Σ_{i=1}^{m} wi [1 − πi(a)πi(b)]^p )^{1/p}

In previous papers, e.g. [9], we also used the form ( Σ_i wi |πi(a) − πi(b)|^p )^{1/p}.
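To make the measure concrete, here is a small illustrative Python sketch; it is an assumption-laden rendering, not the paper's code. The three-valued projection tables stand in for reasoner-backed instance checks K |= Ci(a), the individuals and the two-feature committee are invented toy data, and the uniform weights are one simple choice (entropy-based weights are discussed next).

    # Illustrative sketch of the committee-based dissimilarity d^C_p.
    # Explicit pos/neg tables mimic OWA instance checking; in the paper
    # these memberships come from a DL reasoner over the knowledge base.

    def make_projection(pos, neg):
        """pi_i: 1 for known members, 0 for known non-members, 1/2 otherwise."""
        def pi(a):
            if a in pos:
                return 1.0
            if a in neg:
                return 0.0
            return 0.5  # uncertain membership under the OWA
        return pi

    def committee_distance(projections, weights, p=2):
        """d^C_p(a,b) = (sum_i w_i [1 - pi_i(a) pi_i(b)]^p)^(1/p)."""
        def d(a, b):
            s = sum(w * (1.0 - pi(a) * pi(b)) ** p
                    for pi, w in zip(projections, weights))
            return s ** (1.0 / p)
        return d

    # Toy committee of two features over four individuals.
    pi1 = make_projection(pos={"ann", "bob"}, neg={"cyd"})   # e.g. Person
    pi2 = make_projection(pos={"ann"}, neg={"bob", "dan"})   # e.g. has a publication
    projs = [pi1, pi2]
    w = [1.0 / len(projs)] * len(projs)  # uniform weights summing to 1

    d = committee_distance(projs, w)
    print(round(d("ann", "bob"), 3))  # 0.707: agree on pi1, disagree on pi2
    print(round(d("ann", "cyd"), 3))  # 0.791: disagree on pi1, cyd uncertain on pi2

With weights summing to 1 and each bracketed term in [0, 1], the value stays in [0, 1], matching the declared codomain of the measure.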
The vector of weights w can be set according to the entropy of the considered concepts, computed over the individuals occurring in K [10]. To speed up the algorithm, the projection functions can be pre-computed (once) before the learning phase.

Stop Condition. The growth of a TCT can be stopped by resorting to a heuristic similar to the one employed for selecting the best concept description. This is done by introducing a threshold ν ∈ [0, 1] for the value of d(·, ·): if the value is lower than the threshold, the growth of the branch is stopped.

4.2 Extracting Concepts from TCTs

Alg. 2 reports the sketch of the function for deriving the concept descriptions of the clusters obtained through a TCT.

Algorithm 2 Routine for deriving concept definitions from a TCT

function deriveConcepts(C, T): set of concepts
input  C: concept description
       T: TCT
begin
  let T = ⟨D, I, Tleft, Tright⟩
  if Tleft = Tright = null then  { leaf }
    return {C}
  else
    CSleft ← deriveConcepts(C ⊓ D, Tleft)
    CSright ← deriveConcepts(C ⊓ ¬D, Tright)
    return CSleft ∪ CSright
end

Given a TCT T, the function deriveConcepts essentially traverses the tree structure, accumulating the concept descriptions installed along the paths leading to the leaf nodes. In this phase, it generates a set of concept descriptions CS.

Example 4.2. The set CS obtained from the tree reported in Fig. 1 contains the following concept descriptions (corresponding to the four clusters):

CS = { D1, D2, D3, D4 } = {
  Person ⊓ ∃hasPublication.⊤ ⊓ ∃hasPublication.(SWJ),
  (Person ⊓ ∃hasPublication.⊤) ⊓ ¬(Person ⊓ ∃hasPublication.(SWJ)),
  Person ⊓ ¬(Person ⊓ ∃hasPublication.⊤),
  ¬Person }

Such concepts might also be simplified before being output. ⊓⊔

Note that the internal nodes also contain concept descriptions accounting for the individuals routed to them. Hence it is straightforward to extend the extraction procedure in order to produce a list of subsumption axioms between couples of node concepts, which might be submitted to a knowledge engineer for validation.

5 Conclusions, Ongoing and Future Work

In this work, we have proposed an extension of terminological decision trees [8], originally employed for (supervised) concept learning, to solve (unsupervised) conceptual clustering problems in the context of datasets belonging to the Web of Data. The algorithm essentially adopts a divide-and-conquer strategy that generates the concept descriptions to be installed in the inner nodes via a refinement operator and selects the most promising ones to represent clusters of similar individuals, using a suitable distance measure.

This preliminary work can be extended along various possible directions (some extensions are already being carried out):
– a comparison with other clustering methods, in order to assess the feasibility of the proposed solution;
– new distance measures between individuals: in this work we adopted a language-independent distance measure between the individuals of a knowledge base; it may be interesting to investigate other distance measures (e.g. less computationally expensive functions);
– new refinement operators: we adopted a refinement operator originally used for solving supervised learning problems; further extensions of this work may consider refinement operators borrowed from other machine learning algorithms, e.g. DL-Learner [15], or devised specifically for this method;
– new heuristics: the approach installs concept descriptions so that the overlap between the corresponding sets of positive and negative individuals is minimized; it may be interesting to investigate different heuristics quantifying the degree of overlap between the two sets of individuals;
– discovery of disjointness axioms, in order to enrich a knowledge base through the induction of terminological cluster trees, and an evaluation of this approach;
– strategies to obtain simpler trees: we plan to investigate the effectiveness of post-pruning procedures;
– inducing different kinds of clusters: we focused on crisp clustering in this work; another possible extension concerns the integration of theories for uncertainty management, such as probabilistic models or the Dempster–Shafer theory, allowing overlapping clusters;
– scalability: a further extension is to adopt solutions that make the proposed method scalable, ranging from the implementation of a distributed version of the TCT induction algorithm to the employment of approximate reasoners, in order to cope with the computational costs of the underlying reasoning services adopted by the algorithm.

References

1. Aggarwal, C.C., Reddy, C.K.: Data Clustering: Algorithms and Applications. Chapman & Hall/CRC, 1st edn. (2013)
2. Baader, F., Calvanese, D., McGuinness, D., Nardi, D., Patel-Schneider, P. (eds.): The Description Logic Handbook. Cambridge University Press, 2nd edn. (2007)
3. Bezdek, J.C.: Pattern Recognition with Fuzzy Objective Function Algorithms. Kluwer Academic Publishers (1981)
4. Blockeel, H., De Raedt, L.: Top-down induction of first-order logical decision trees. Artif. Intell. 101(1-2), 285–297 (1998)
5. d'Amato, C., Fanizzi, N., Esposito, F.: Query Answering and Ontology Population: An Inductive Approach. In: Bechhofer, S., et al. (eds.) Proceedings of ESWC 2008. LNCS, vol. 5021, pp. 288–302. Springer (2008)
6. De Raedt, L., Blockeel, H.: Using logical decision trees for clustering. In: Lavrač, N., Džeroski, S. (eds.) Proceedings of ILP 1997. LNAI, vol. 1297, pp. 133–140. Springer (1997)
7. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B 39(1), 1–38 (1977)
8. Fanizzi, N., d'Amato, C., Esposito, F.: Induction of Concepts in Web Ontologies through Terminological Decision Trees. In: Balcázar, J., et al. (eds.) Proceedings of ECML/PKDD 2010. LNAI, vol. 6321, pp. 442–457. Springer (2010)
9. Fanizzi, N., d'Amato, C.: A Hierarchical Clustering Method for Semantic Knowledge Bases. In: Apolloni, B., Howlett, R.J., Jain, L.C. (eds.) Proceedings of KES 2007, Part III. LNCS, vol. 4694, pp. 653–660. Springer (2007)
10. Fanizzi, N., d'Amato, C., Esposito, F.: Evolutionary Conceptual Clustering Based on Induced Pseudo-Metrics. Int. J. Semantic Web Inf. Syst. 4(3), 44–67 (2008)
11. Fanizzi, N., Iannone, L., Palmisano, I., Semeraro, G.: Concept Formation in Expressive Description Logics. In: Boulicaut, J., et al. (eds.) Proceedings of ECML 2004. LNAI, vol. 3201, pp. 99–110. Springer (2004)
12. Heath, T., Bizer, C.: Linked Data: Evolving the Web into a Global Data Space. Synthesis Lectures on the Semantic Web. Morgan & Claypool Publishers (2011)
13. Kietz, J.U., Morik, K.: A Polynomial Approach to the Constructive Induction of Structural Knowledge. Mach. Learn. 14, 193–217 (1994)
14. Laskey, K., Costa, P., Kokar, M., Martin, T., Lukasiewicz, T.: Uncertainty Reasoning for the World Wide Web. Tech. rep., URW3 W3C Incubator Group (2008), http://www.w3.org/2005/Incubator/urw3/XGR-urw3-20080331
15. Lehmann, J.: DL-Learner: Learning Concepts in Description Logics. Journal of Machine Learning Research (JMLR) 10, 2639–2642 (2009)
16. MacQueen, J.B.: Some Methods for Classification and Analysis of Multivariate Observations. In: Le Cam, L.M., Neyman, J. (eds.) Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 281–297. University of California Press (1967)
17. Rettinger, A., Lösch, U., Tresp, V., d'Amato, C., Fanizzi, N.: Mining the Semantic Web – Statistical learning for next generation knowledge bases. Data Min. Knowl. Discov. 24(3), 613–662 (2012)
18. Stepp, R.E., Michalski, R.S.: Conceptual Clustering: Inventing Goal-Oriented Classifications of Structured Objects. In: Machine Learning: An Artificial Intelligence Approach, vol. II. Morgan Kaufmann (1986)