=Paper=
{{Paper
|id=Vol-1665/paper5
|storemode=property
|title=Induction of Terminological Cluster Trees
|pdfUrl=https://ceur-ws.org/Vol-1665/paper5.pdf
|volume=Vol-1665
|authors=Giuseppe Rizzo,Claudia d'Amato,Nicola Fanizzi,Floriana Esposito
|dblpUrl=https://dblp.org/rec/conf/semweb/0001dFE16
}}
==Induction of Terminological Cluster Trees==
Induction of Terminological Cluster Trees: Preliminaries, Model, Method and Perspectives

Giuseppe Rizzo, Claudia d'Amato, Nicola Fanizzi, and Floriana Esposito

LACAM – Dipartimento di Informatica, Università degli Studi di Bari "Aldo Moro", Campus – Via Orabona 4, 70125 Bari, Italy – http://lacam.di.uniba.it

Abstract. In this paper, we tackle the problem of clustering individual resources in the context of the Web of Data, which is characterized by a huge amount of data published in a standard data model with a well-defined semantics based on Web ontologies. In fact, clustering methods offer an effective solution to support a number of complex related activities, such as ontology construction, debugging and evolution, taking into account the inherent incompleteness underlying the representation. Web ontologies already encode a hierarchical organization of the resources by means of the subsumption hierarchy of the classes, which may be expressed explicitly, through proper subsumption axioms, or detected indirectly, by reasoning on the available axioms that define the classes (classification). However, such classes are frequently sparsely populated, as the hierarchy often reflects the view of the knowledge engineer prior to the actual introduction of assertions involving the individual resources. As a result, very general classes are often loosely populated, but this may also happen to specific subclasses, making it more difficult to check the types of a resource (instance checking), even through reasoning services. Among the large number of algorithms proposed in the Machine Learning literature, we propose a clustering method that is able to organize groups of resources hierarchically. Specifically, we introduce a conceptual clustering approach that combines a distance measure between individuals in a knowledge base with a divide-and-conquer strategy, intended to elicit ex post the underlying hierarchy based on the actual distribution of the instances.

1 Introduction and Motivation

With the growth of the Web of Data, along with the Linked Data initiative [12], a large number of datasets/vocabularies are being published on the ground of a standard data model and connected within a uniform semantic space exploiting RDF and the Web infrastructure. However, this huge quantity of data is also known to be inherently uncertain, being often inconsistent/conflicting, vague and, especially, incomplete [14]. Web ontologies already encode a hierarchical organization of the data by means of the subsumption hierarchy of the classes, which may be expressed explicitly, through proper subsumption axioms, or has to be determined indirectly, by reasoning on the available axioms defining the various classes (classification task). However, such classes are frequently sparsely populated, as the hierarchy often reflects the view of the knowledge engineer prior to the actual introduction of assertions involving the individual resources. As a result, very general classes are often loosely populated, but this may also happen to specific subclasses, hindering or, at least, making it difficult to check the types of a resource (instance checking), even employing reasoning services. This may have a strong negative impact on services based on query answering, which often rely only on explicit assertions about the data.
Machine Learning methods can be employed to elicit implicit pieces of knowledge from the datasets, even in the face of such inherent incompleteness, and to provide non-standard inference services [17]. In particular, in this paper we address conceptual clustering [18] as a means to exploit the data to detect further classes that may arise within the underlying hierarchical structure. Specifically, we introduce an approach that combines semantic distance measures over the space of individuals with a divide-and-conquer strategy, intended to elicit, in retrospect, an additional class hierarchy reflecting the actual distribution of the instances.

Clustering is an unsupervised learning task aiming at partitioning a collection of objects into subsets or clusters, so that those within each cluster are more closely related to one another than to the objects assigned to different clusters [1]. In the Web of Data context, clustering can enable the definition of new emerging concepts (concept formation) on the grounds of those employed in a knowledge base; supervised methods can exploit these clusters to induce new concept definitions or to refine existing ones (ontology evolution); intensionally defined groupings may speed up the tasks of search and discovery; clustering may also induce criteria for ranking the retrieved resources [10, 9].

An object is usually described by a fixed set of features (attribute values), and the most common notion of similarity between objects is expressed in terms of a distance function based on this set; for example, for datasets made up of objects described by tuples of numeric features, (extensions of) the Euclidean distance are adopted to determine object and cluster similarity. An important difference among the various clustering techniques concerns the type of membership that is adopted. In the simplest (crisp) case, e.g. k-Means [16], cluster membership is exclusive: each object belongs to exactly one cluster. Extensions, such as fuzzy c-means [3] or EM [7], admit an overlap between the clusters and a degree of membership (responsibility) of each object to a cluster. Further extensions include non-flat clustering structures, such as those induced via hierarchical clustering [1].

In this work, moving on to conceptual clustering, we require that classes with an intensional definition account for and define these clusters. To this purpose, the algorithms should be able to exploit background knowledge that can be expressed using expressive representation languages¹, such as Description Logics. Again, in such methods, clustering requires the definition of a (dis)similarity measure over the set of the objects to be clustered.

¹ In the past, various fragments of First-Order Logic have been adopted, such as Clausal Logics, especially in Inductive Logic Programming.

We propose a generalized solution based on logical trees [4], namely Terminological Cluster Trees, as an extension of terminological decision trees [8]. They adopt a pseudo-metric defined over the space of individuals as a criterion to separate groups of objects, rather than information gain or other notions of purity devised for supervised concept learning and inductive classification methods such as terminological decision trees. The proposed solution provides intensional definitions that can be used for describing the individuals in each cluster and, unlike other methods [13, 11], does not have to resort to complex approximations such as the most specific concept [2] as the representative of individuals on the conceptual level.
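As a toy illustration of the crisp/soft distinction above (unrelated to the method proposed in this paper), the following Python snippet assigns one-dimensional points to two given centers, first exclusively in the style of k-Means and then with fuzzy-c-means-style degrees of membership; the points, centers and fuzzifier m are invented for the example.

    # Illustrative sketch: crisp vs. soft cluster membership on toy data.
    # Points, centers and the fuzzifier m are arbitrary assumptions.

    points = [0.0, 0.2, 0.9, 1.1]
    centers = [0.1, 1.0]

    def crisp(x):
        """k-Means-style exclusive membership: index of the closest center."""
        return min(range(len(centers)), key=lambda j: abs(x - centers[j]))

    def soft(x, m=2.0):
        """Fuzzy-c-means-style degrees of membership, summing to 1."""
        d = [max(abs(x - c), 1e-9) for c in centers]  # avoid division by zero
        return [1.0 / sum((d[j] / d[k]) ** (2.0 / (m - 1.0))
                          for k in range(len(centers)))
                for j in range(len(centers))]

    for x in points:
        print(x, crisp(x), [round(u, 2) for u in soft(x)])

A point near a center gets that cluster exclusively in the crisp case, while the soft case returns a responsibility for each cluster, which is the kind of membership hierarchical and probabilistic extensions build on.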
Moreover, terminological cluster trees can autonomously determine the number of clusters to be generated, which several other methods require as an input parameter, with a strong impact on the quality of the resulting partitions.

The rest of the paper is organized as follows: the next section illustrates the basic notions of the underlying Description Logics foundations for the intended representation and reasoning services; Sect. 3 introduces the problem of clustering the individuals of a knowledge base, while Sect. 4 presents the approach for inducing terminological cluster trees. Finally, Sect. 5 illustrates the conclusions and some possible extensions.

2 Basics

In the following, we borrow notation and terminology from Description Logics (DLs) [2], as the representation and reasoning services for the knowledge bases in the Web of Data ultimately rely on this family of languages. Hence we use the terms concept (description) and role as synonyms of class and property², respectively.

² Datatype properties, i.e. roles ranging on concrete domains, will not be considered in this study.

In these languages, a domain is modeled through a set NC of atomic concepts (classes) and a set NR of roles (relations), which can be used to build complex descriptions regarding individuals (instances, objects) by means of specific operators (complement, conjunction and disjunction of concepts) that depend on the adopted language. A knowledge base is a couple K = ⟨T, A⟩, where the TBox T contains axioms concerning concepts and roles (typically inclusion axioms such as C ⊑ D) and the ABox A contains assertions, i.e. axioms regarding the individuals (C(a), resp. R(a, b)). The set of individuals occurring in A is denoted by Ind(A).

The semantics of individuals, concepts, and roles is defined through interpretations. An interpretation is a couple I = (Δ^I, ·^I), where Δ^I is the domain of the interpretation and ·^I is a mapping such that, for each individual a, a^I ∈ Δ^I; for each concept C, C^I ⊆ Δ^I; and for each role R, R^I ⊆ Δ^I × Δ^I. The semantics of complex descriptions descends from the interpretation of the primitive concepts/roles and of the operators employed, depending on the adopted language. I satisfies an axiom C ⊑ D (C is subsumed by D) when C^I ⊆ D^I, and an assertion C(a) (resp. R(a, b)) when a^I ∈ C^I (resp. (a^I, b^I) ∈ R^I). I is a model for K iff it satisfies each axiom/assertion α in K, denoted I |= α. When α is satisfied w.r.t. all such models, we write K |= α.

We will be interested in the instance-checking inference service: given an individual a and a concept description C, determine whether K |= C(a). Due to the Open World Assumption (OWA), answering a class-membership query is more difficult than in ILP settings, where closed-world reasoning is the standard form. Indeed, one may not be able to prove the truth of either K |= C(a) or K |= ¬C(a), as it may be possible to find different interpretations satisfying either case.

3 Conceptual Clustering for DL Knowledge Bases

As we are targeting conceptual clustering for DL knowledge bases, the problem, in a simple formulation, may be formalized as follows:

Definition 3.1 (conceptual clustering – flat case).
Given:
– a knowledge base K = ⟨T, A⟩;
– a set of training individuals TI ⊆ Ind(A);
Find:
– a partition of TI into n pairwise disjoint clusters {C1, . . . , Cn};
– for each i = 1, . . . , n, a concept description Di that accounts for Ci, i.e. such that, ∀a ∈ Ci:
  1. K |= Di(a);
  2. K |= ¬Dj(a) for all j ∈ {1, . . . , n}, j ≠ i.
Note that in this setting the number of clusters n is not required as a parameter. Condition 2 may be relaxed (e.g. to K ⊭ Dj(a)) to allow some overlap between the clusters/concepts, which may be further extended towards probabilistic clustering methods and models [1].

This problem can be regarded as a recursive one, as each cluster, in its turn, might yield its own internal partitioning, and each sibling sub-cluster would be characterized intensionally by its sub-class. The decision on whether to partition a given cluster recursively generally depends on cohesion metrics, assessing a measure of intra-cluster similarity (within the cluster) against the inter-cluster dissimilarity (w.r.t. the sibling partitions).

4 Terminological Cluster Trees

The notion of terminological cluster tree extends the logical clustering trees introduced in [6] and learned through C0.5, a system derived from Tilde [4]. The induction of the model combines elements of logical decision tree induction (i.e. the approach based on recursive partitioning and the exploitation of a refinement operator for specializing concept descriptions) with elements of instance-based learning (i.e. the employment of a distance measure over the instance space).

Definition 4.1 (terminological cluster tree). Given a knowledge base K, a terminological cluster tree (TCT) is a (binary) logical tree where:
– each leaf node stands for a cluster of individuals C;
– each internal node contains a concept description D (over the signature of K);
– each edge departing from an internal node corresponds to the outcome of the membership test of individuals w.r.t. the concept installed in the node³.
Hence, a tree node can be represented by a quadruple ⟨D, C, Tleft, Tright⟩, indicating the two subtrees connected by either edge.

³ By convention, the left branch is for positive instances and the right one is for negative instances.

[Fig. 1. A simple TCT describing individuals involved in the Semantic Web research community: the root tests Person; its left child tests Person ⊓ ∃hasPublication.⊤, whose left child in turn tests Person ⊓ ∃hasPublication.(SWJ); the leaves hold the clusters C1, C2, C3 and C4.]

Example 4.1 (a TCT). Fig. 1 illustrates a simple example of a TCT describing individuals in the academic domain. The root node of the tree contains the concept Person. Two edges depart from this node: the left branch denotes a positive membership of an individual w.r.t. the concept Person, while the right branch denotes a negative membership w.r.t. the concept. On one hand, the right child of this node is a leaf containing a cluster C4, composed of individuals that are not instances of the concept Person. On the other hand, the left child of the root node contains a further, more complex concept description. Again, two edges depart from this node: the right edge links the node to another leaf, containing the cluster C3 of individuals that have no publications. ⊓⊔

The details of the algorithms for (a) growing a TCT and (b) deriving intensional definitions are reported in the sequel.

4.1 A Method for Inducing Terminological Cluster Trees

A terminological cluster tree T is induced by means of a recursive strategy, which follows the same schema proposed for terminological decision trees (TDTs) [8]. The sketch is reported in Alg. 1.
Algorithm 1 Routines for inducing a TCT

 1  const ν, δ: thresholds
 2
 3  function induceTCT(I, C): TCT
 4  input  I: set of individuals
 5         C: concept description
 6  begin
 7    T ← new TCT
 8    if stopCondition(ν, I) then
 9      T ← ⟨null, I, null, null⟩
10    else
11      S ← ρ(C)  { set of candidate specializations }
12      E* ← selectBestConcept(S, I)
13      ⟨Ileft, Iright⟩ ← split(E*, I)
14      Tleft ← induceTCT(Ileft, E*)
15      Tright ← induceTCT(Iright, ¬E*)
16      T ← ⟨E*, I, Tleft, Tright⟩
17    return T
18  end
19
20  function split(E, I): pair of sets of individuals
21  input  E: concept description
22         I: set of individuals
23  begin
24    ⟨P, N⟩ ← retrievePosNeg(E, I, δ)
25    b ← getPrototype(P)
26    c ← getPrototype(N)
27    Ileft ← ∅
28    Iright ← ∅
29    for each a ∈ I
30      if d(a, b) ≤ d(a, c) then
31        Ileft ← Ileft ∪ {a}
32      else
33        Iright ← Iright ∪ {a}
34
35    return ⟨Ileft, Iright⟩
36  end

The main routine is invoked as induceTCT(TI, C), where C may be ⊤ or any other general concept the individuals in TI are known to belong to by default.

In this recursive algorithm, the base case depends on a test (via stopCondition) on a threshold over the heuristic employed for growing the tree, measuring the cohesion of the cluster of individuals routed to the node in terms of a given metric. If this value exceeds ν, then the branch is marked as completed and the cluster is stored in a leaf node. Further details about the heuristic and the stop condition are reported later on.

In the inductive step, the current (parent) concept description C has to be specialized by means of a refinement operator ρ exploring the search space of specializations of C. A set S of candidate specializations ρ(C) is obtained. For each E ∈ S, the sets of positive and negative individuals, denoted resp. by P and N, are retrieved. A tricky situation may occur (line 24) when either N or P is empty for a given concept (e.g. due to a total lack of disjointness axioms). In such a case, the algorithm can re-assign individuals of I to N (resp. P) based on the distance between them and the prototype of P (resp. N), when this exceeds a given threshold δ. For P and N, a representative element is determined as a prototype, i.e. their medoid: the central element of a cluster, having a minimal average distance w.r.t. the other elements. Then, the function selectBestConcept evaluates the specializations of the concept according to the closeness w.r.t. the medoids, determined by a given distance measure (discussed later). The best concept E* ∈ S is the one for which the distance between the medoids of the positive and negative instance sets is maximized. Then E* is installed in the current node.

After the assessment of the best concept E*, the individuals are partitioned by split, to be routed along the left or right branch. Differently from TDTs, the routine does not decide the branch along which the individuals are sorted according to a concept membership test (instance check): rather, it splits the individuals according to their distance w.r.t. the prototypes of positive and negative membership w.r.t. E*, i.e. the medoids of P and N. This divide-and-conquer strategy is applied recursively until the instances routed to a node satisfy the stop condition. Note that the number of clusters is not required as an input: it depends on the number of paths generated in the growing phase, so the algorithm is able to determine it naturally, following the data distribution.
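To make the control flow of Alg. 1 concrete, the following minimal Python sketch mirrors its structure. It is an illustrative rendering under simplifying assumptions, not the authors' implementation: refine, select_best_concept, retrieve_pos_neg, distance and cohesion are hypothetical stand-ins for the DL-specific machinery (the operator ρ, reasoner-backed retrieval with the δ re-assignment, and the measure and heuristic discussed below).

    # Minimal sketch of Alg. 1 (illustrative, not the authors' code).
    # All helper callables are hypothetical stand-ins for DL machinery.

    from dataclasses import dataclass
    from typing import Any, Optional, Set

    Concept = Any      # placeholder for a DL concept description
    Individual = Any   # placeholder for an ABox individual

    @dataclass
    class TCT:
        concept: Optional[Concept]    # D installed in the node, None in a leaf
        individuals: Set[Individual]  # the cluster of individuals routed here
        left: Optional["TCT"] = None
        right: Optional["TCT"] = None

    def medoid(cluster, distance):
        """Prototype: element with minimal average distance to the others."""
        return min(cluster, key=lambda a: sum(distance(a, b) for b in cluster))

    def split(E, I, retrieve_pos_neg, distance):
        """Route each individual towards the closer of the two medoids."""
        # retrieve_pos_neg abstracts line 24, including the delta re-assignment
        # needed when P or N would otherwise be empty.
        P, N = retrieve_pos_neg(E, I)
        b, c = medoid(P, distance), medoid(N, distance)
        left = {a for a in I if distance(a, b) <= distance(a, c)}
        return left, I - left

    def induce_tct(I, C, refine, select_best_concept, retrieve_pos_neg,
                   distance, cohesion, nu):
        if cohesion(I) > nu:                  # stopCondition: cohesive enough
            return TCT(None, I)               # leaf storing the cluster
        S = refine(C)                          # candidate specializations rho(C)
        E = select_best_concept(S, I)          # maximize d(medoid(P), medoid(N))
        I_left, I_right = split(E, I, retrieve_pos_neg, distance)
        # A real implementation must also guard against non-separating splits.
        return TCT(E, I,
                   induce_tct(I_left, E, refine, select_best_concept,
                              retrieve_pos_neg, distance, cohesion, nu),
                   induce_tct(I_right, ("not", E),  # tagged tuple standing for ¬E
                              refine, select_best_concept,
                              retrieve_pos_neg, distance, cohesion, nu))

Note how the number of clusters emerges from the recursion: each branch keeps splitting until the cohesion test fires, exactly as in the growing phase described above.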
Refinement Operator. The proposed approach relies on a downward refinement operator that generates the concepts to be installed in child nodes by performing a specialization process on the concept, say C, installed in the parent node:

ρ1 adding a concept atom (or its complement) as a conjunct: C′ = C ⊓ (¬)A;
ρ2 adding a general existential restriction (or its complement) as a conjunct: C′ = C ⊓ (¬)∃R.⊤;
ρ3 adding a general universal restriction (or its complement) as a conjunct: C′ = C ⊓ (¬)∀R.⊤;
ρ4 replacing a sub-description Ci in the scope of an existential restriction in C with one of its refinements: ∃R.Ci′ ∈ ρ(∃R.Ci) ∧ Ci′ ∈ ρ(Ci);
ρ5 replacing a sub-description Ci in the scope of a universal restriction with one of its refinements: ∀R.Ci′ ∈ ρ(∀R.Ci) ∧ Ci′ ∈ ρ(Ci).

Note that the cases of ρ4 and ρ5 are recursive.

Prototypes. Despite the common schema of the algorithms employed for growing TCTs and TDTs, the latter are obtained by selecting the best test in terms of information gain maximization [8], while the clustering procedure for TCTs resorts to a distance-based criterion on the individuals in the knowledge base. Specifically, the heuristic adopted for selecting the best concept description to be installed in a new node can be defined as follows:

E* = argmax_{D ∈ ρ(C)} d(p(P), p(N))

where P and N are the sub-clusters obtained by splitting I w.r.t. D, d(·, ·) is a distance measure between individuals, and p(·) is a function mapping a set of individuals to its prototype. As previously mentioned, the algorithm computes the medoids of both the sets of positive and negative instances w.r.t. the test.

Distance Measure. The computation of medoids requires a (possibly) language-independent measure for individuals, whose definition should capture aspects of their semantics in the context of the knowledge base. However, individuals do not have an algebraic structure that can be exploited directly. In the TCT induction algorithm, a language-independent dissimilarity measure [5] has been adopted. Given a knowledge base K, the idea is to exploit the behavior of an individual w.r.t. a set of concepts C = {C1, C2, . . . , Cm}, dubbed a context or committee of features. Once C has been chosen, a projection function πi can be defined for each Ci ∈ C as a mapping πi : Ind(A) → {0, 1/2, 1} such that, for all a ∈ Ind(A):

  πi(a) = 1    if K |= Ci(a)
  πi(a) = 0    if K |= ¬Ci(a)
  πi(a) = 1/2  otherwise

For the value of πi associated to the uncertain-membership case (owing to the OWA), we adopted a uniform prior probability. It can be set more accurately according to the membership to Ci, πi : Ind(A) → [0, 1], with x ∈ Ind(A) ↦ πi(x) = P[K |= Ci(x)], if such a value can be estimated. For example, for densely populated ontologies this may be estimated as |rA(Ci)|/|Ind(A)|, where rA(·) indicates the retrieval of the argument concept w.r.t. A, i.e. the set of individuals that are known to belong to the concept [2]: rA(C) = {a ∈ Ind(A) | K |= C(a)}. For a largely populated concept C, this may be estimated on the ground of the assertions contained in A, avoiding the bottleneck of reasoning.

Through the projection functions, it is possible to define a family of distance measures {d^C_p}_{p ∈ N} for individuals, d^C_p : Ind(A) × Ind(A) → [0, 1], such that:

d^C_p(a, b) = ( Σ_{i=1}^{m} wi [1 − πi(a)πi(b)]^p )^{1/p}

In previous papers, e.g. [9], we also used the form ( Σ_i wi |πi(a) − πi(b)|^p )^{1/p}.
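To make the measure concrete, here is a small illustrative Python sketch; it is an assumption-laden rendering, not the paper's code. The three-valued projection tables stand in for reasoner-backed instance checks K |= Ci(a), the individuals and the two-feature committee are invented toy data, and the uniform weights are one simple choice (entropy-based weights are discussed next).

    # Illustrative sketch of the committee-based dissimilarity d^C_p.
    # Explicit pos/neg tables mimic OWA instance checking; in the paper
    # these memberships come from a DL reasoner over the knowledge base.

    def make_projection(pos, neg):
        """pi_i: 1 for known members, 0 for known non-members, 1/2 otherwise."""
        def pi(a):
            if a in pos:
                return 1.0
            if a in neg:
                return 0.0
            return 0.5  # uncertain membership under the OWA
        return pi

    def committee_distance(projections, weights, p=2):
        """d^C_p(a,b) = (sum_i w_i [1 - pi_i(a) pi_i(b)]^p)^(1/p)."""
        def d(a, b):
            s = sum(w * (1.0 - pi(a) * pi(b)) ** p
                    for pi, w in zip(projections, weights))
            return s ** (1.0 / p)
        return d

    # Toy committee of two features over four individuals.
    pi1 = make_projection(pos={"ann", "bob"}, neg={"cyd"})   # e.g. Person
    pi2 = make_projection(pos={"ann"}, neg={"bob", "dan"})   # e.g. has a publication
    projs = [pi1, pi2]
    w = [1.0 / len(projs)] * len(projs)  # uniform weights summing to 1

    d = committee_distance(projs, w)
    print(round(d("ann", "bob"), 3))  # 0.707: agree on pi1, disagree on pi2
    print(round(d("ann", "cyd"), 3))  # 0.791: disagree on pi1, cyd uncertain on pi2

With weights summing to 1 and each bracketed term in [0, 1], the value stays in [0, 1], matching the declared codomain of the measure.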
The vector of weights w can be set according to the entropy of the considered concepts, computed over the individuals occurring in K [10]. To speed up the algorithm, the projection functions can be pre-computed (once) before the learning phase.

Stop Condition. The growth of a TCT can be stopped by resorting to a heuristic similar to the one employed for selecting the best concept description. This is done by introducing a threshold ν ∈ [0, 1] for the value of d(·, ·): if the value is lower than the threshold, the growth of the branch is stopped.

4.2 Extracting Concepts from TCTs

Alg. 2 reports the sketch of the function for deriving the concept descriptions of the clusters obtained through a TCT.

Algorithm 2 Routine for deriving concept definitions from a TCT

function deriveConcepts(C, T): set of concepts
input  C: concept description
       T: TCT
begin
  let T = ⟨D, I, Tleft, Tright⟩
  if Tleft = Tright = null then  { leaf }
    return {C}
  else
    CSleft ← deriveConcepts(C ⊓ D, Tleft)
    CSright ← deriveConcepts(C ⊓ ¬D, Tright)
    return CSleft ∪ CSright
end

Given a TCT T, the function deriveConcepts essentially traverses the tree structure, accumulating the concept descriptions installed along the paths leading to the leaf nodes. In this phase, it generates a set of concept descriptions CS.

Example 4.2. The set CS obtained from the tree reported in Fig. 1 contains the following concept descriptions (corresponding to the four clusters):

CS = { D1, D2, D3, D4 } = {
  Person ⊓ ∃hasPublication.⊤ ⊓ ∃hasPublication.(SWJ),
  (Person ⊓ ∃hasPublication.⊤) ⊓ ¬(Person ⊓ ∃hasPublication.(SWJ)),
  Person ⊓ ¬(Person ⊓ ∃hasPublication.⊤),
  ¬Person }

Such concepts might also be simplified before being output. ⊓⊔

Note that the internal nodes also contain concept descriptions accounting for the individuals routed to them. Hence it is straightforward to extend the extraction procedure in order to produce a list of subsumption axioms between couples of node concepts, which might be submitted to a knowledge engineer for validation.

5 Conclusions, Ongoing and Future Work

In this work, we have proposed an extension of terminological decision trees [8], originally employed for (supervised) concept learning, to solve (unsupervised) conceptual clustering problems in the context of datasets belonging to the Web of Data. The algorithm essentially adopts a divide-and-conquer strategy that generates the concept descriptions to be installed in the inner nodes via a refinement operator and selects the most promising ones to represent clusters of similar individuals, using a suitable distance measure.

This preliminary work can be extended along various possible directions (some extensions are already being carried out):
– a comparison with other clustering methods, in order to assess the feasibility of the proposed solution;
– new distance measures between individuals: in this work we adopted a language-independent distance measure between the individuals of a knowledge base; it may be interesting to investigate other distance measures (e.g. less computationally expensive functions);
– new refinement operators: we adopted a refinement operator originally used for solving supervised learning problems; further extensions of this work may consider refinement operators borrowed from other machine learning algorithms, e.g. DL-Learner [15], or devised specifically for this method;
– new heuristics: the approach installs concept descriptions so that the overlap between the corresponding sets of positive and negative individuals is minimized; it may be interesting to investigate different heuristics quantifying the degree of overlap between the two sets of individuals;
– discovery of disjointness axioms, in order to enrich a knowledge base through the induction of terminological cluster trees, and an evaluation of this approach;
– strategies to obtain simpler trees: we plan to investigate the effectiveness of post-pruning procedures;
– inducing different kinds of clusters: we focused on crisp clustering in this work; another possible extension concerns the integration of theories for uncertainty management, such as probabilistic models or the Dempster–Shafer theory, allowing overlapping clusters;
– scalability: a further extension is to adopt solutions that make the proposed method scalable, ranging from the implementation of a distributed version of the TCT induction algorithm to the employment of approximate reasoners, in order to cope with the computational costs of the underlying reasoning services adopted by the algorithm.

References

1. Aggarwal, C.C., Reddy, C.K.: Data Clustering: Algorithms and Applications. Chapman & Hall/CRC, 1st edn. (2013)
2. Baader, F., Calvanese, D., McGuinness, D., Nardi, D., Patel-Schneider, P. (eds.): The Description Logic Handbook. Cambridge University Press, 2nd edn. (2007)
3. Bezdek, J.C.: Pattern Recognition with Fuzzy Objective Function Algorithms. Kluwer Academic Publishers (1981)
4. Blockeel, H., De Raedt, L.: Top-down induction of first-order logical decision trees. Artif. Intell. 101(1-2), 285–297 (1998)
5. d'Amato, C., Fanizzi, N., Esposito, F.: Query Answering and Ontology Population: An Inductive Approach. In: Bechhofer, S., et al. (eds.) Proceedings of ESWC 2008. LNCS, vol. 5021, pp. 288–302. Springer (2008)
6. De Raedt, L., Blockeel, H.: Using logical decision trees for clustering. In: Lavrač, N., Džeroski, S. (eds.) Proceedings of ILP 1997. LNAI, vol. 1297, pp. 133–140. Springer (1997)
7. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B 39(1), 1–38 (1977)
8. Fanizzi, N., d'Amato, C., Esposito, F.: Induction of Concepts in Web Ontologies through Terminological Decision Trees. In: Balcázar, J., et al. (eds.) Proceedings of ECML/PKDD 2010. LNAI, vol. 6321, pp. 442–457. Springer (2010)
9. Fanizzi, N., d'Amato, C.: A Hierarchical Clustering Method for Semantic Knowledge Bases. In: Apolloni, B., Howlett, R.J., Jain, L.C. (eds.) Proceedings of KES 2007, Part III. LNCS, vol. 4694, pp. 653–660. Springer (2007)
10. Fanizzi, N., d'Amato, C., Esposito, F.: Evolutionary Conceptual Clustering Based on Induced Pseudo-Metrics. Int. J. Semantic Web Inf. Syst. 4(3), 44–67 (2008)
11. Fanizzi, N., Iannone, L., Palmisano, I., Semeraro, G.: Concept Formation in Expressive Description Logics. In: Boulicaut, J., et al. (eds.) Proceedings of ECML 2004. LNAI, vol. 3201, pp. 99–110. Springer (2004)
12. Heath, T., Bizer, C.: Linked Data: Evolving the Web into a Global Data Space. Synthesis Lectures on the Semantic Web. Morgan & Claypool Publishers (2011)
13. Kietz, J.U., Morik, K.: A Polynomial Approach to the Constructive Induction of Structural Knowledge. Mach. Learn. 14, 193–217 (1994)
14. Laskey, K., Costa, P., Kokar, M., Martin, T., Lukasiewicz, T.: Uncertainty Reasoning for the World Wide Web. Tech. rep., URW3 W3C Incubator Group (2008), http://www.w3.org/2005/Incubator/urw3/XGR-urw3-20080331
15. Lehmann, J.: DL-Learner: Learning Concepts in Description Logics. Journal of Machine Learning Research (JMLR) 10, 2639–2642 (2009)
16. MacQueen, J.B.: Some Methods for Classification and Analysis of Multivariate Observations. In: Le Cam, L.M., Neyman, J. (eds.) Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 281–297. University of California Press (1967)
17. Rettinger, A., Lösch, U., Tresp, V., d'Amato, C., Fanizzi, N.: Mining the Semantic Web – Statistical learning for next generation knowledge bases. Data Min. Knowl. Discov. 24(3), 613–662 (2012)
18. Stepp, R.E., Michalski, R.S.: Conceptual Clustering: Inventing Goal-Oriented Classifications of Structured Objects. In: Machine Learning: An Artificial Intelligence Approach, vol. II. Morgan Kaufmann (1986)