Automatic Construction and Re nement of a Class Hierarchy over Semistructured Data Nathalie Pernelle, Marie-Christine Rousset, Veronique Ventos L.R.I., C.N.R.S. & University of Paris-Sud Building 490, 91405, Orsay Cedex, France Email: fpernelle, mcr,ventosg@lri.fr 1 Introduction The subsumption relation is a preorder relation be- In many applications, it becomes crucial to help users to tween class descriptions, induced by the inclusion rela- access to a huge amount of data by clustering them in a tion between class extensions. small number of classes described at an appropriate level De nition 2 (Subsumption between classes) of abstraction. In this paper, we present an approach Let C and C be two L class descriptions. C is 1 2 1 based on the use of two languages of description of classes subsumed by C , denoted C L C , i for every set I 2 1 2 for the automatic clustering of semistructured data. The of instances, extI (C )  extI (C ). 1 2 rst language of classes has a high power of abstraction In sections 2.3, and 2.4, we will provide a constructive and guides the construction of a lattice of classes cover- characterization of subsumption for the two languages of ing the whole set of the data. The second language of classes that we consider. classes, more expressive and more precise, is the basis for The notion of abstraction of an instance in a language the re nement of a part of the lattice that the user wants of classes L corresponds, when it exists, to the most to focus on. Our approach has been implemented and speci c class description in L which it is an instance of. experimented on real data in the setting of the GAEL project 1 which aims at building exible electronic cata- De nition 3 (Abstraction of an instance) Let i be logs organized as a hierarchy of classes of products. Our an instance, the L class description C is an abstraction experiments have been conducted on real data coming of i in L (for short C = absL (i)) i from the C/Net (http://www.cnet.com) electronic cata- 1. i isaL C, and log of computer products. 2. if D is a class description such that i isaL D, then C L D. 2 Languages of instances and classes The notion of least common subsumer (a.k.a msg) will In this section, we de ne the language of instances, L1, be the basis for gathering classes in our clustering algo- in which we describe semistructured data, and the two rithm. languages of classes L2 and L3 that we use to describe De nition 4 (Least Common Subsumer) Let classes over those data, at two levels of abstraction. C ; : : :; Cn be class descriptions in L. The L class de- First, we provide some notations and preliminaries. 1 scription C is a least common subsumer of C ; : : :; Cn 2.1 Preliminaries and notations 1 in L (for short C = lcsL (C ; : : :; Cn)) i 1 Given a language of instances, a language of classes L 1. Ci L C for all 1  i  n, and de nes the expressions that are allowed as class descrip- 2. if D is a class description satisfying Ci L D for tions. A class description is intended to represent in an all 1  i  n, then C L D abstract and concise way the properties that are com- mon to the set of its instances. A membership relation, 2.2 The language of instances denoted by isaL , establishes the necessary connection The data that serve as instances of the classes that we between a given language of instances and an associated build are semistructured data in the following sense: language of classes L. each data is described by a set of pairs (Attribute, Val- De nition 1 (Extension of a class description) ues) but the attributes used for describing the data may Let I be a set of instances, and C a L class description. vary from an item to another. In addition, each data is The extension of C w.r.t I is the following set: typed (i.e., labelled by the name of a basic type). extI (C) = fi 2 I j i isaL C g De nition 5 (Terms of L ) Let B be a nite set of 1 1 GAEL is funded by the French Ministry of Industry ; it is basic types, A a nite set of attributes, and V a set of a joint project with the Verso database group of INRIA and values. A term of L is of the form: 1 a startup, MatchVision, specialized in Electronic Commerce fc; att = V ; : : :; attn = Vn g 1 1 where c 2 B, 8i 2 [1::n], atti 2 A and Vi  V . De nition 9 (Membership relation for L ) Let i 3 The description of an instance is a term of L1. For be an instance description in L . Let C be a L class 1 3 example, we can nd a product in the C/Net catalog, description. i is an instance of C i every attribute in whose L1 description is: i appears in C and for every term attsuf f : V appearing in C , fRemovableDiskDrive, - when suff = , if there exists V 0 s.t att = V 0 2 i, then CD=DV D=Type=fCDRW g, V0  V, StorageRemovableT ype=fSuperDiskg, - when suff = +, there exists V 0  V s.t att = V 0 2 i, Compatibility=fMAC; P C gg - when suff =?, if there exists V 0 s.t att = V 0 2 i, then In the following, we will consider that the type c of a V 0 is a singleton and V 0  V , L1 description is a boolean attribute. - when suff = , there exists V 0 singleton s.t V 0  V and att = V 0 2 i. 2.3 The language of classes L2 The product described in 2.2 is an instance of the L3 De nition 6 (Class description in L ) A L class 2 2 class description C1 : description (of size n) is a tuple of attributes f RemovableDiskDrive :ftrueg, fatt ; : : :; attng, where 8i 2 [1::n], atti 2 A. 1 CD=DV D=ReadSpeed? :f20x; 32x; 24xg, The connection between the language of instances L 1 CD=DV D=Type :fCDROM; CDRW g, and the language of classes L is based on the following 2 Compatibility+ :fMAC;PC g, de nition of the membership relation. StorageRemovableType :fSuperDisk; ZIP; JAZ gg De nition 7 (Membership relation for L ) Let i 2 be an instance description in L . Let C be a L class 1 2 It represents the set of products that have in their de- description: i is an instance of C i every attribute ap- scription (i) necessarily the monovalued and boolean at- pearing in C also appears in i. tribute RemovableDiskDrive whose value must be true, (ii) possibly the attribute CD=DV D=ReadSpeed, and The following proposition, whose proof is straight- if that is the case, this attribute is monovalued and its forward, characterizes subsumption, least common sub- value belongs to the set f20x; 32x; 24xg, (iii) necessarily sumer and abstraction in L2. the attribute CD=DV D=Type, which is monovalued and Proposition 1 (Properties of L ) . takes its value in the set fCDROM; CDRW g, (iv) nec-  Let C and C be two L class descriptions. C L2 2 essarily the attribute Compatibility, which can be mul- 1 2 2 C i every attribute of C is also an attribute of C . 1 tivalued and takes its value(s) in the set fMAC; P C g, 2 2  Let fatt = V ; : : :; attn = Vng be an instance de- 1 (v) necessarily the attribute StorageRemovableT ype, 1 1 scription in L . Its abstraction in L is unique: it is which is monovalued and takes its value in the set 1 fatt ; ; : : :; attng. 2 fSuperDisk; ZIP; JAZ g. 1  Let C ; : : :; Cn be n L class descriptions. Their The following propositions state the main properties of 1 2 least common subsumer is unique: it is made of the set L3. Their proofs follow from results in tractable descrip- of attributes that are common to all the Ci 's. tion logics where structural subsumption is complete. 2.4 The language of classes L3 Proposition 2 (Characterization of subsumption in L ) 3 Let C and C be two L class descriptions. C L3 C 1 2 3 1 2 L is richer than L on di erent aspects: it makes possi- 3 2 i all the attributes appearing in C1 appear also in C2 ble to restrict the possible values of an attribute ; it en- and for every pair attsuf f : V appearing in C2, ables to distinguish the number of values of an attribute - when suff = , if there exists attsuf f : V 0 2 C1, then 0 through di erent suxes ( ; + ; ? ; ) whose notation is V0  V, inspired by the one used in XML for describing docu- - when suff = +, there exists V 0  V s.t att+ : V 0 2 C1 ment type de nitions (DTDs), and whose formal seman- or att : V 0 2 C1 tics corresponds to standard description logics construc- - when suff = ?, if there exists attsuf f : V 0 2 C1 , then 0 tors. In fact, as it will become clearer in the following, suff 0 = ? or suff 0 = , and V 0  V , L3 is a subset of the C-CLASSIC description logic [7]. - when suff = , there exists V 0 s.t V 0  V and att : V 0 De nition 8 (Class description in L ) A L class 3 3 2 C1. description (of size n) is a tuple The complexity of checking subsumption in L3 is quadratic w.r.t the maximal size of class descriptions. fatt : V ; : : :; attsuf suf f1 f : V ng n Proposition 3 (Characterization of abstraction in L ) 1 1n where 8i 2 [1::n], atti 2 A, Vi  V , and suffi 2 3 Let fatt = V ; : : :; attn = Vn g be an instance de- f; +; ?; g 1 1 scription in L . Its abstraction in L is unique: 1 3 The following de nition formalizes the membership re- absL3 = fattsuf f1 : V ; : : :; attsuf 1 fn : Vn g, where 8i 2 [1::n], if j Vi j 2 then suffi = + else suffi = . 1 n lation between an instance and a class description in L3. Proposition 4 (Characterization of lcs in L ) Let 3 2. In the second step, a lattice of clusters is constructed C ; : : :; Cn be n L class descriptions. Let A be the set 1 3 by gathering basic classes according to similarities of attributes belonging to at least one description Ci . of their L2 descriptions. In this step, clusters are C1; : : :; Cn have a unique least common subsumer in unions of basic classes. The computational com- L3, whose description is characterized as follows: plexity of this step does not depend on the number  for every attribute att 2 A, let V be the union of of initial data but only on the size of the L2 descrip- the sets of values associated with att in the class tions of basic classes. S descriptions Ci's: V = 1 fv 2 Vi j att n suf f : Vi 2 We now detail this second step. A cluster ci1 : : :ci in the lattice if the L2 descriptions of the k Ci g. will appear classes { att:V 2 lcs(C1; : : :; Cn) i att:Vi 2 Ci 8i 2 their instances. c i 1 : : k :ci are judged similar enough to gather [1::n]. The similarity between class descrip- tions is stated by attributes in common. However, we { att :V 2 lcs(C1; : : :; Cn) i ? take into account only attributes that do not occur in (8i 2 [1::n] att :Vi 62 Ci and att+ :Vi 62 Ci ),and too many (or all the) classes. For instance, the attribute  either 9i 2 [1::n] s.t. att? :Vi 2 Ci , price may appear in all the instances of a catalog describ-  or 9i 2 [1::n] s.t. atts :V 0 62 Ci for any s. ing products, and is therefore not useful to discriminate { att :V 2 lcs(C1; : : :; Cn) i product descriptions. Among the set A of attributes,  either 9i 2 [1::n] s.t. att :Vi 2 Ci ,  we select meaningful attributes as being the attributes  or 9i 2 [1::n] s.t. att :Vi 2 Ci , and 9j 2 a 2 A such that jclasses + jclassesj  s where s is a certain (a)j [1::n] s.t. att? :Vj 2 Cj or atts :V 0 62 Cj for threshold (e.g., s = 0:8). Let A0 be the set of mean- 0 any sux s0 . ingful attributes. We redescribe all the basic classes in { att :V 2 lcs(C1; : : :;+ Cn) i + terms of the attributes of A0 only: for a basic class c, we 9i +2 [1::n] s.t. att  :Vi 2 Ci and 8j 2 [1::n], call its short description, denoted shortdesc(c), the L2 att :Vj 2 Cj or att :Vj 2 Cj description of c restricted to the meaningful attributes: shortdesc(c) = desc(c) \ A0. For example, if C2 is the L3 description: Our clustering algorithm, L2-Cluster, is described in fCompatibility :fPC; Unix g , Algorithm 1. It is adapted from a frequent item set StorageRemovableType :fDAT g, algorithm ([2]). It iteratively builds levels of clusters, CompressedCapacity : f8; 24; 32; 70gg starting with building the level of the coarsest clusters lcs(C1 ; C2)= corresponding to unions of classes having atleast one at- fRemovableDiskDrive?? : ftrueg, tribute in common. Each iteration k is guided by at- CD=DV D=ReadSpeed :f20x; 32x; 24xg, tribute sets of increasing size k which, being common to CD=DV D=Type :fCDROM; CDRW g, ? some class descriptions, are the support of the creation of Compatibility :fMAC; P C; Unixg, a potential node gathering those classes. Among those StorageRemovableType :fSuperDisk; ZIP; JAZ; DAT g, potential nodes, we e ectively add to the lattice those CompressedCapacity? :f8; 24; 32;70gg whose L2 short description is equal to their k-support: Computing the lcs of L3 descriptions is linear in the k-itemsetthe k-support of a node generated at iteration k is the number of descriptions and quadratic in their size. supporting the generation of that node. By doing so, we guarantee that the description of the nodes 3 Construction of a lattice of L2 classes added fathers. to the lattice is strictly subsumed by those of their The goal is to structure a set of data described in L1 Notation: We call a k-itemset a set of attributes of size into clusters labelled by L2 descriptions, and organized k. We assume that attributes in itemsets are kept sorted in a lattice providing a browsable semantic interface fa- in their lexicographic order. We use the notation p[i] to cilitating the access to the data for end-users. represent the i-th attribute of the k-itemset p consisting We proceed to a two-step clustering: of the attributes p[1]; : : :; p[k] where p[1] < : : : < p[k]. 1. In the rst step, the data are partitioned according when Figure 1 shows the lattice returned by L2-Cluster to their type: for each type c, we create a basic class fa1; a2it; ais3; aapplied on C = fc1 ; c2; c3; c4; c5g and A = 4 g such that: named c. Its set of instances, denoted inst(c), is the shortdesc(c1 ) = fa1; a2; a3g shortdesc(c2 ) = fa2g set of data of type c. Its L2 description, desc(c), is shortdesc(c3 ) = fa1; a3g shortdesc(c4 ) = fa3; a4g obtained by computing the least common subsumer shortdesc(c5 ) = fa1; a3g of the abstractions of its instances. The result of this step is a set C of basic classes and a set A of the algorithm Lproposition The following 2 -Cluster. summarizes the properties of attributes supporting the L2 descriptions of the classes of C . For each attribute a, the set classes(a) Proposition 5 (Properties of L2 -Cluster) Let H be of basic classes having a in their description is com- the lattice returned by L2-Cluster. puted. This preliminary clustering step has a linear  For each node n 2 H, let shortdesc(n) and data complexity. classes(n) be respectively the description and the set {a3} c1 c3 c4 c5 Require: a set A of meaningful attributes: for each 0 a 2 A , classes(a) is the set of basic classes of C 0 whose L short description contains a. 2 Ensure: return a lattice organized in levels of nodes. {a1 a3} c1 c3 c5 Each node n is characterized by classes(n): the basic classes it gathers, and shortdesc(n): the least com- {a2} c1 c2 mon subsumer of the short description of the basic classes of the cluster. 1: (* Initialization step gathering the biggest unions of {a1 a2 a3} c1 {a3 a 4} c4 classes having atleast one attribute in common:*) 2: A1 A0 , level(1) ;; : : :; level(j C j) ; Figure 1: Example of a lattice constructed by L2-Cluster 3: for every a 2 A1 do 4: let classes(a) = fca1 ; : : :; caj g of basic classes returned by L2 -Cluster: 5: let desc = lcsL2 (desc(ca1 ); : : :; desc(caj )) 6: if desc \ A0 = fag then shortdesc(n) = lcsL2 (abstL2 (i1 ); : : :; abstL2 (ik )) 7: add to level(j) a node n such that: S where fi1; : : :; ik g = c2classes(n) inst(c). classes(n) = fca1 ; : : :; cajg ; shortdesc(n) = desc \ A0;  H is a Galois lattice, i.e. for every node n, the pair node(fag) = n (classes(n); shortdesc(n)) is maximal in the follow- 8: k 1 ing sense: there is no m 2 H such that classes(n)  9: (* Generation of new nodes supported by classes(m) and shortdesc(n) = shortdesc(m), or k + 1-itemsets : *) shortdesc(n)  shortdesc(m) and classes(n) = 10: repeat classes(m). 11: for every pair (p; q) 2 Ak do  The worst time complexity of L -Cluster is expo- if p[1] = q[1]; : : :; p[k ; 1] = q[k ; 1]; p[k] < q[k] 2 12: nential in the maximal size of the basic classes L then descriptions. 2 13: let newp = p [ fq[k]g, and let Sk be the set 14: of k-subsets of newp. if Sk  Ak and classes(node(p)) \ 4 Re nement in L3 classes(q[k]) = 6 ; then The goal of this step is to re ne a part of the lattice H 15: add newp to Ak+1 computed by L2 -Cluster based on the more expressive 16: let fci1 ; : : :; ci g be classes(node(p)) \ language L3. This step is achieved after a user chooses classes(q[k]) j one node Fatn and one of its descendants Sonn in H. 17: let desc = lcsL2 (desc(ci1 ); : : :; desc(ci )) Algorithm 2 describes how new nodes are possibly added 18: if desc = newp then j between Sonn and Fatn. Those new nodes correspond 19: add to level(j) a node n such that: to clusters whose descriptions in L2 did not distinguish classes(n) = fci1 ; : : :; ci g ; from those of Fatn or Sonn, while having distinct de- shortdesc(n) = desc; j scriptions in L3. A closure operation on those nodes is node(newp) = n necessary in order to make their L3 descriptions maxi- 20: k k + 1 mal w.r.t the union of basic classes which they gather. 21: until Ak = ; L3-Cluster applies after the descriptions in L3 (denoted 22: (* Creation of the lattice. For every node n, desc3 in Algorithm 2) have been computed for Sonn and Fathers(n) group the fathers of n among the nodes Fatn. Those computations are least common subsumer of greater levels:*) calculations whose overall time cost is polynomial w.r.t 23: Initialize Fathers(n) and AncNotFathers(n) to ; to the size and the number of the instances of the basic for every generated node n. classes involved in Fatn. 24: for i =j C j ;1 downto 1 do Let us illustrate the application of L3-Cluster on the 25: for every node n 2 level(i) do nodes c1 c3 c5 and c1 of Figure 1, assuming that the L3 26: for j = i + 1 to j C j do descriptions of+the involved +basic classes are: 27: for every node m 2 level(j) do desc(c1 )=fatt1 :fv1; v3g; att2 :fv2; v4g; att3:fv6gg 28: if classes(n)  classes(m) desc(c3 )=fatt1:fv3g; att?2:fv4g; att3:fv7gg and m 62 AncNotFathers(n) then desc(c5 )=fatt1:fv5g; att3:fv7; v8gg. 29: add m to Fathers(n) LRes-Cl and L-Cl are initialized to ffc1; c3; c5g; fc1gg. 30: add F athers(m) [ AncNotFathers(m)  Gathering+ c3 with c1 isconsidered rst: to AncNotFathers(n) desc3=fatt1 :fv1; v3g; att2:fv2; v4g; att3:fv6; v7gg, classes=fc1; c3g Algorithm 1: L2-Cluster Since desc3(c5) is not subsumed by desc3, the node c1c3 is added. LRes-Cl becomes ffc1,c3,c5g,fc1g,fc1,c3gg. Require: Two nodes Fatn and Sonn such that 5 Conclusion and Discussion classes(Sonn)  classes(F atn). This paper has proposed an approach to organize into Ensure: return a lattice between Fatn and Sonn clusters large sets of semistructured data. The scaling 1: L-Cl fclasses(Fatn); classes(Sonn)g up of the approach is made possible because its complex- 2: LRes-Cl L-Cl ity is remained in control in di erent ways: (1) the data 3: Nodes fSonng are aggregated into basic classes and the clustering ap- 4: for every node n 2 Nodes do plies on the set of those basic classes instead of applying 5: for every class c 2 classes(F atn) n classes(n) do on the data set (2) the two-step clustering method rst 6: Change false builds a coarse hierarchy, based on a simple language 7: classes classes(n) [ fcg for describing the clusters, and uses a more elaborate 8: if classes 62 L-Cl then language for re ning a small subpart of the hierarchy 9: L-Cl L-Cl [fclassesg delimited by two nodes. 10: desc3 lcsL3 (desc3(n); desc3(c)) Experimental results: We have evaluated our ap- 11: (* Closure operation: *) proach using a real dataset composed of 2274 computer 12: for every class Cl 2 classes(F atn) n classes products extracted from the C/Net catalog. Each prod- do uct is described using a subset of 234 attributes, possibly 13: if desc3(Cl) L3 desc3 then multi-valued. There are 59 types of products and each 14: add Cl to classes product is labelled by one and only one type. The goal 15: Change true of the experiment was twofold : to assess the eciency 16: if classes 62 LRes-Cl then and the simplicity of the lattice for the rst clustering 17: add a new node p to Nodes such that step and to show the accuracy of the re nement of a part classes(p) = classes and desc3(p) = desc3 of the lattice using the second clustering step. 18: LRes-Cl LRes-Cl [fclasses(p)g In order to make the L2 lattice even simpler, the 19: if Change = true then number of nodes obtained with L2-Cluster may be 20: L-Cl L-Cl [classes(p) parametrized by a threshold n used to restrict the nodes 21: Suppress n from Nodes that appear in the lattice to gather at least n basic Algorithm 2: L3-Cluster classes. Figure 2 shows that, as it is mentionned in [15], this quantitative criteria allows us to signi cantly de- crease the size of the lattice.  Gathering c5 with c1 is now considered: desc3=fatt+1 :fv1; v3; v5g; att2:fv2; v4g; att3:fv6; v7; v8gg, classes = fc1; c5g Threshold 1 2 3 4 5 Since desc3(c3) is subsumed by desc3, c3 is added to Number of nodes 119 60 24 13 9 classes, which is updated to fc1; c3; c5g. The node corresponding to c1 c5 is not added since it is not closed, Maximal depth 4 3 2 1 1 the node corrresponding to its closure c1 c3 c5 is not Running Times (s) 12.3 10.4 2.1 1.1 1 added either because fc1; c3; c5g is already in LRes-Cl. The following proposition summarizes the main prop- erties of the algorithm L3-Cluster. Figure 2: Quantitative results of L2 -Cluster Proposition 6 (Properties of L3-Cluster) Let C1 Figure 3 illustrates the simplicity of the L2 descrip- be the set of basic classes of the father node. Let C2 tions and the signi cance of the nodes. (C2  C1) be the set of basic classes of the son node. Let L3-Cluster allows to distinguish nodes that can- H3 be the lattice returned by L3-Cluster. not be distinguished by L2 -Cluster. For instance,  For each node n 2 H3, let desc3(n) and classes(n) if L3-Cluster is applied when an end-user chooses be respectively the description and the set of basic to re ne the L2 lattice between the node (a) classes returned by L3 -Cluster: and the node (b) in Figure 3, the aggregation of all types of drivers (i.e. RemovableTapeDrive, desc3(n) = lcsL3 (abstL3 (i1 ); : : :; abstL3 (ik )) RemovableDiskDrive and HardDiskDrive) is part S where fi1 ; : : :; ik g = c2classes(n) inst(c). of the L3 lattice. This new cluster appears for the following reasons : no driver is described  H3 is a complete Galois lattice, i.e. for every node using the attributes StorageController=RAIDLevel, n, the pair (classes(n); desc3(n)) is maximal, and Networking=DataLinkProtocol or Networking=T ype H contains every node verifying the maximality (those attributes were optional in L3 description of 3 criteria and whose set of classes includes C and (a)). In addition, the value SCSI for the attribute are included in C . 2 StorageController=T ype is not possible for a driver. 1 Related work: Our work can be compared with exist-  The worst time complexity of L -Cluster is exponen- 3 ing work in machine learning based on more expressive tial w.r.t j C j ; j C j. 1 2 languages than propositional language and/or using a {Stor./Type, {Input Dev/Type} Perspectives: We plan to extend our current work to Stor. Controller/Type} take nested attributes and textual values into account in Hard Disk Drive Key-Entry Device L in order to fully deal with XML data. 3 Pointing Device References Removable Disk Drive Game Controller (a) Network Storage Communic. Device Removable Tape Drive Handheld/PDA [1] R. Agrawal, J. Gehrke, D. Gunopulos, and Laptop P. Raghavan. Automatic subspace clustering of high {Stor./Type, dimensional data for data mining applications. SIG- Stor. Controller/Type, {Input Dev/Type, MOD Record, 27:94{105, 1998. Removable Stor/Type} Form Factor} [2] R. Agrawal and R. Srikant. Fast algorithms for min- Removable Disk Drive Key-Entry Device ing association rules. In VLDB-94, Santiago, Chile. Network Storage Pointing Device [3] F. Baader, R. Kusters and R. Molitor. Comput- Removable Disk Drive Game Controller ing Least Common Subsumers in Description Logics with Existencial Restrictions. In IJCAI-99, Stock- holm, Sweden. {Stor./Type, [4] G. Bisson. Conceptual clustering in a rst order (b) Removable Disk Drive Stor. Controller/Type, Removable Stor/Type, logic representation. In ECAI-92, Vienne, Austria. CD/DVD/Type} [5] H. Blockeel, L. De Raedt, and J. Ramon. Top-down induction of clustering trees. In ICML-98, Morgan Figure 3: A part of the L2 lattice for C/net (n=3) Kaufmann, 1998. [6] P. Brezellec and H. Soldano. Tabata: a learning algorithm performing a bidirectional search in a re- shift of representation. Most work on expressive lan- duced search space using a tabu strategy. In ECAI- guages has been developped in a supervised setting (e.g. 98, Brighton, Angleterre, 1998. Inductive Logic Programming), while little work exists [7] W. W. Cohen and H. Hirsh. Learning the CLAS- in an unsupervised setting. We can cite KBG [4], TIC [5] SIC description logic: Theoretical and experimental and [11] which perform clustering in a relational setting. results. In KR-94, 1994. The main di erence with our approach is that they use [8] W. Van de Velde. Learning through progres- a distance as a numerical estimation of similarity. Al- sive re nement. In Pitman, editor, Proceedings of though the best representation of a cluster is the least the third European Working Session on Learning common subsumer of its instances, they approximate it (EWSL'98),London, 1983. numerically by the cluster centroid (i.e., the point that [9] J. Ganascia. Tdis: an algebraic formalization. In minimizes the sum of squared distances). The reason is IJCAI-93,Vancouver, Canada, 1993. that, in contrast with our setting where the lcs compu- tation in L3 is polynomial, lcs computing in their rst- [10] J. U. Kietz and K. Morik. A polynomial approach to order language may be exponential.[3] presents lcs com- the constructive induction of structural knowledge. puting for a more expressive language than L3 but for Machine Learning, 14(2):193{217, 1994. this language subsumption is exponential. KLUSTER [11] Mathias Kirsten and Stephan Wrobel. Extending [10] re nes a basic taxonomy of concepts in the setting k-means clustering to rst-order representations. In of a description logic for which computing lcs is poly- J. Cussens and A. Frish, editors, Proc. of the 10th nomial. In KLUSTER, the clusters are not unions but International Conference on Inductive Logic Pro- subconcepts of primitive concepts, and the re nement gramming, Springer Verlag, 2000. aims at learning discriminating de nitions of mutually [12] R.S. Michalski and R.E. Stepp. Learning from ob- disjoint subconcepts of a same concept. As for the use servation: Conceptual clustering. In Morgan Kauf- of a shift of representation, it is used in supervised learn- mann, editor, Machine learning : An arti cial in- ing in order to increase accuracy (i.e. the proportion of telligence approach, 1983. correctly predicted concepts in a set of test examples) [8; 14] or to search eciently a reduced space of concepts [9; [13] G. Stumme. Local Scaling in Conceptual data Sys- 6]. In unsupervised learning, shift of representational tems. In Springer Verlag, editor, ICCS, LNAI 1115, bias may be used to change the point of view about the 1996. data [12; 13]. For instance, Cluster/2 [12] provides a [14] P. E. Utgo . Shift of bias for inductive concept user with a set of parameters about his preferences on learning. In Morgan Kaufmann,ed., Machine learn- the concepts to be created. Finally, the two-step clus- ing : An arti cial intelligence approach Vol. 2, 1986. tering approach proposed in [1] is similar in spirit with [15] R. Wille. Restructuring lattice theory. In Sym- our clustering in L2 since it rst identi es basic clusters posium on Ordered Sets, University of Calgary, (as high density clusters) before building more general Boston, 1982. clusters that are unions of those basic clusters.