<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Automatic Construction and Re nement of a Class Hierarchy o ver Semistructured Data</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Nathalie</forename><surname>Pernelle</surname></persName>
							<email>fpernelle@lri.fr</email>
							<affiliation key="aff0">
								<orgName type="institution">University o f</orgName>
								<address>
									<addrLine>P aris-Sud Building 490</addrLine>
									<postCode>91405</postCode>
									<settlement>Orsay Cedex</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Marie-Christine</forename><surname>Rousset</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">University o f</orgName>
								<address>
									<addrLine>P aris-Sud Building 490</addrLine>
									<postCode>91405</postCode>
									<settlement>Orsay Cedex</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Veronique</forename><surname>Ventos</surname></persName>
							<email>ventosg@lri.fr</email>
							<affiliation key="aff0">
								<orgName type="institution">University o f</orgName>
								<address>
									<addrLine>P aris-Sud Building 490</addrLine>
									<postCode>91405</postCode>
									<settlement>Orsay Cedex</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Automatic Construction and Re nement of a Class Hierarchy o ver Semistructured Data</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">3315BC45AF33181D34B66AC0D4B7FEA4</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T17:34+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract/>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>In many applications, it becomes crucial to help users to access to a huge amount of data by clustering them in a small number of classes described at an appropriate level of abstraction. In this paper, we present an approach based on the use of two languages of description of classes for the automatic clustering of semistructured data. The rst language of classes has a high power of abstraction and guides the construction of a lattice of classes covering the whole set of the data. The second language of classes, more expressive and more precise, is the basis for the re nement of a part of the lattice that the user wants to focus on. Our approach has been implemented and experimented on real data in the setting of the GAEL project 1 which aims at building exible electronic catalogs organized as a hierarchy of classes of products. Our experiments have been conducted on real data coming from the C/Net (http://www.cnet.com) electronic catalog of computer products.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Languages of instances and classes</head><p>In this section, we de ne the language of instances, L 1 , in which w e describe semistructured data, and the two languages of classes L 2 and L 3 that we use to describe classes over those data, at two levels of abstraction. First, we provide some notations and preliminaries.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Preliminaries and notations</head><p>Given a language of instances, a language of classes L de nes the expressions that are allowed as class descriptions. A class description is intended to represent i n a n abstract and concise way the properties that are common to the set of its instances. A membership relation, denoted by isa L , establishes the necessary connection between a given language of instances and an associated language of classes L. De nition 1 (Extension of a class description) Let I be a set of instances, and C a L class description.</p><p>The extension of C w.r.t I is the following set: ext I (C) = fi 2 I j i isa L Cg 1 GAEL is funded by the French Ministry of Industry it is a joint project with the Verso database group of INRIA and a startup, MatchVision, specialized in Electronic Commerce</p><p>The subsumption relation is a preorder relation between class descriptions, induced by the inclusion relation between class extensions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>De nition 2 (Subsumption between classes)</head><p>Let C 1 and C 2 be t w o L class descriptions. C 1 is subsumed b y C 2 , denoted C 1 L C 2 , i for every set I of instances, ext I (C 1 ) ext I (C 2 ).</p><p>In sections 2.3, and 2.4, we will provide a constructive characterization of subsumption for the two languages of classes that we consider.</p><p>The notion of abstraction of an instance in a language of classes L corresponds, when it exists, to the most speci c class description in L which it is an instance of. De nition 3 (Abstraction of an instance) Let i be an instance, the L class description C is an abstraction of i in L (for short C = abs L (i)) i 1. i isa L C, and 2. if D is a class description such that i isa L D, t h e n C L D. The notion of least common subsumer (a.k.a msg) will be the basis for gathering classes in our clustering algorithm.</p><p>De nition 4 (Least Common Subsumer) Let C 1 : : : C n be class descriptions in L. The L class description C is a least common subsumer of C 1 : : : C n in L (for short C = lcs L (C 1 : : :</p><formula xml:id="formula_0">C n )) i 1. C i L C for all 1 i n, a n d 2. if D is a class description satisfying C i L D for all 1 i n, then C L D</formula></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">The language of instances</head><p>The data that serve as instances of the classes that we build are semistructured data in the following sense: each data is described by a set of pairs (Attribute, Values) but the attributes used for describing the data may vary from an item to another. In addition, each d a t a i s typed (i.e., labelled by the name of a basic type).</p><p>De nition 5 (Terms of L 1 ) Let B be a nite set of basic types, A a nite set of attributes, a n d V a set of values. A term of L 1 is of the form: fc att 1 = V 1 : : : a t t n = V n g where c 2 B , 8i 2 The connection between the language of instances L 1 and the language of classes L 2 is based on the following de nition of the membership relation.</p><p>De nition 7 (Membership relation for L 2 ) Let i be an instance description in L 1 . L et C be a L 2 class description: i is an instance o f C i every attribute appearing in C also appears in i.</p><p>The following proposition, whose proof is straightforward, characterizes subsumption, least common subsumer and abstraction in L 2 . Proposition 1 (Properties of L 2 ) .</p><p>Let C 1 and C 2 be two L 2 class descriptions. C 1 L2</p><p>C 2 i every attribute of C 2 is also an attribute of C 1 .</p><p>Let fatt 1 = V 1 : : : a t t n = V n g be an instance d escription in L 1 . Its abstraction in L 2 is unique: it is fatt 1 : : : a t t n g.</p><p>Let C 1 : : : C n be n L 2 class descriptions. Their least common subsumer is unique: it is made of the set of attributes that are c ommon to all the C i 's.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>2.4</head><p>The language of classes L 3 L 3 is richer than L 2 on di erent aspects: it makes possible to restrict the possible values of an attribute it enables to distinguish the numb e r o f v alues of an attribute through di erent su xes ( + ? ) whose notation is inspired by the one used in XML for describing document t ype de nitions (DTDs), and whose formal semantics corresponds to standard description logics constructors. In fact, as it will become clearer in the following, : V 1 : : : a t t suf fn n : V n g where 8i 2 1::n], att i 2 A , V i V , and suff i 2 f + ? g</p><p>The following de nition formalizes the membership relation between an instance and a class description in L 3 .</p><p>De nition 9 (Membership relation for L 3 ) Let i be an instance description in L 1 . L et C be a L 3 class description. i is an instance o f C i every attribute in i appears in C and for every term att suf f : V appearing in C, -when suff = , i f t h e r e exists V 0 s.t att = V 0 2 i, t h e n V 0 V , -when suff = + , there exists V 0 V s.t att = V 0 2 i, -when suff =?, if there exists V 0 s.t att = V 0 2 i, t h e n V 0 is a singleton and V 0 V , -when suff = , there exists V 0 singleton s.t V 0 V and att = V 0 2 i.</p><p>The product described in 2. i all the attributes appearing in C 1 appear also in C 2 and for every pair att suf f : V appearing in C 2 , -when suff = , i f t h e r e e x i s t s att suf f 0 : V 0 2 C 1 , t h e n V 0 V , -when suff = + , there exists V 0 V s.t att + : V 0 2 C 1 or att : V 0 2 C 1 -when suff = ?, if there e x i s t s att suf f 0 : V 0 2 C 1 , t h e n suff 0 = ? o r suff 0 = , a n d V 0 V , -when suff = , there exists V 0 s.t V 0 V and att :</p><formula xml:id="formula_1">V 0 2 C 1 .</formula><p>The complexity o f c hecking subsumption in L 3 is quadratic w.r.t the maximal size of class descriptions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Proposition 3 (Characterization of abstraction in L 3 )</head><p>Let fatt 1 = V 1 : : : a t t n = V n g be an instance description in L 1 . Its abstraction in L 3 is unique: abs L3 = fatt suf f1 1 : V 1 : : : a t t suf fn n : V n g, where 8i 2 1::n], i f j V i j 2 then suff i = + else suff i = .</p><p>Proposition 4 (Characterization of lcs in L 3 ) Let C 1 : : : C n be n L 3 class descriptions. Le t A b e the set of attributes belonging to at least one description C i . C 1 : : : C n have a unique least common subsumer in L 3 , whose description is characterized as follows:</p><p>for every attribute att 2 A, l e t V be the union of the sets of values associated with att in the class</p><formula xml:id="formula_2">descriptions C i 's: V = S n 1 fv 2 V i j att suf f : V i 2 C i g.</formula><p>{ att :V 2 lcs(C 1 : : : C n ) i att :V i 2 C i 8i 2 Computing the lcs of L 3 descriptions is linear in the number of descriptions and quadratic in their size.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Construction of a lattice of L 2 classes</head><p>The goal is to structure a set of data described in L 1 into clusters labelled by L 2 descriptions, and organized in a lattice providing a browsable semantic interface facilitating the access to the data for end-users. We proceed to a two-step clustering:</p><p>1. In the rst step, the data are partitioned according to their type: for each t ype c, w e create a basic class named c. Its set of instances, denoted inst(c), is the set of data of type c. Its L 2 description, desc(c), is obtained by computing the least common subsumer of the abstractions of its instances. The result of this step is a set C of basic classes and a set A of attributes supporting the L 2 descriptions of the classes of C. F or each attribute a, the set classes(a)</p><p>of basic classes having a in their description is computed. This preliminary clustering step has a linear data complexity.</p><p>2. In the second step, a lattice of clusters is constructed by gathering basic classes according to similarities of their L 2 descriptions. In this step, clusters are unions of basic classes. The computational complexity of this step does not depend on the number of initial data but only on the size of the L 2 descriptions of basic classes. We n o w detail this second step. A cluster c i1 : : : c ik will appear in the lattice if the L 2 descriptions of the classes c i1 : : : c ik are judged similar enough to gather their instances. The similarity b e t ween class descriptions is stated by attributes in common. However, we take i n to account only attributes that do not occur in too many (or all the) classes. For instance, the attribute price may appear in all the instances of a catalog describing products, and is therefore not useful to discriminate product descriptions. Among the set A of attributes, we select meaningful attributes as being the attributes a 2 A such that jclasses(a)j jclassesj s where s is a certain threshold (e.g., s = 0 :8). Let A 0 be the set of meaningful attributes. We redescribe all the basic classes in terms of the attributes of A 0 only: for a basic class c, w e call its short description, d e n o t e d shortdesc(c), the L 2 description of c restricted to the meaningful attributes:</p><formula xml:id="formula_3">shortdesc(c) = desc(c) \ A 0 .</formula><p>Our clustering algorithm, L 2 -Cluster, is described in Algorithm 1. It is adapted from a frequent i t e m s e t algorithm <ref type="bibr" target="#b1">( 2]</ref>). It iteratively builds levels of clusters, starting with building the level of the coarsest clusters corresponding to unions of classes having atleast one attribute in common. Each iteration k is guided by a ttribute sets of increasing size k which, being common to some class descriptions, are the support of the creation of a potential node gathering those classes. Among those potential nodes, we e ectively add to the lattice those whose L 2 short description is equal to their k-support:</p><p>the k-support of a node generated at iteration k is the k-itemset supporting the generation of that node. By doing so, we guarantee that the description of the nodes added to the lattice is strictly subsumed by those of their fathers.</p><p>Notation: We c a l l a k-itemset a set of attributes of size k. W e assume that attributes in itemsets are kept sorted in their lexicographic order. We use the notation p i] t o represent the i-th attribute of the k-itemset p consisting of the attributes p 1] : : : p k] where p 1] &lt; : : : &lt; p k].</p><p>Figure <ref type="figure" target="#fig_2">1</ref> shows the lattice returned by L 2 -Cluster when it is applied on C = fc 1 c 2 c 3 c 4 c 5 g and A = fa 1 a 2 a 3 a 4 g such that:</p><p>shortdesc(c 1 ) = fa 1 a 2 a 3 g shortdesc(c 2 ) = fa 2 g shortdesc(c 3 ) = fa 1 a 3 g shortdesc(c 4 ) = fa 3 a 4 g shortdesc(c 5 ) = fa 1 a 3 g</p><p>The following proposition summarizes the properties of the algorithm L 2 -Cluster. Proposition 5 (Properties of L 2 -Cluster) Let H be the lattice r eturned b y L 2 -Cluster.</p><p>For each node n 2 H, let shortdesc(n) and classes(n) be r espectively the description and the set Require: a set A 0 of meaningful attributes: for each a 2 A 0 , classes(a) is the set of basic classes of C whose L 2 short description contains a. Ensure: return a lattice organized in levels of nodes.</p><p>Each n o d e n is characterized by classes(n): the basic classes it gathers, and shortdesc(n): the least common subsumer of the short description of the basic classes of the cluster. if p 1] = q 1] : : : p k ; 1] = q k ; 1] p k] &lt; q k] then 13: let newp = p f q k]g, and let S k be the set of k-subsets of newp. where fi 1 : :</p><formula xml:id="formula_4">: i k g = S c2classes(n) inst(c).</formula><p>H is a Galois lattice, i.e. for every node n, the pair 4 Re nement i n L 3</p><formula xml:id="formula_5">(classes(n) shortdesc(n)) is</formula><p>The goal of this step is to re ne a part of the lattice H computed by L 2 -Cluster based on the more expressive language L 3 . T h i s s t e p i s a c hieved after a user chooses one node F a t nand one of its descendants Sonn in H.</p><p>Algorithm 2 describes how new nodes are possibly added between Sonn and F a t n . Those new nodes correspond to clusters whose descriptions in L 2 did not distinguish from those of F a t nor Sonn, while having distinct descriptions in L 3 . A closure operation on those nodes is necessary in order to make their L 3 descriptions maximal w.r.t the union of basic classes which they gather.</p><p>L 3 -Cluster applies after the descriptions in L 3 (denoted desc3 in Algorithm 2) have been computed for Sonn and Fatn. Those computations are least common subsumer calculations whose overall time cost is polynomial w.r.t to the size and the number of the instances of the basic classes involved in Fatn.</p><p>Let us illustrate the application of L 3 -Cluster on the nodes c1 c3 c5 a n d c1 of Figure <ref type="figure" target="#fig_2">1</ref>, assuming that the L 3 descriptions of the involved basic classes are: desc(c 1 )=fatt + 1 :fv1 v 3g att + 2 :fv2 v 4g a t t 3 :fv6gg desc(c 3 )=fatt 1 :fv3g a t t ? 2 :fv4g att 3 :fv7gg desc(c 5 )=fatt 1 :fv5g a t t 3 :fv7 v 8gg.</p><p>LRes-Cl and L-Cl are initialized to ffc1 c 3 c 5g fc1gg.</p><p>Gathering c3 with c1 is considered rst: desc3=fatt + 1 :fv1 v 3g att 2 :fv2 v 4g a t t 3 :fv6 v 7gg, classes=fc1 c 3g Since desc3(c5) is not subsumed by desc3, the node c1c3 is added. LRes-Cl becomes ffc1,c3,c5g,fc1g,fc1,c3gg.</p><p>Require: Two nodes F a t n and Sonn such that classes(Sonn) classes(F a t n ).</p><p>Ensure: return a lattice between Fatn and Sonn 1: L-Cl f classes(Fatn) c l a s s e s (Sonn)g where fi 1 : :</p><formula xml:id="formula_6">: i k g = S c2classes(n) inst(c).</formula><p>H 3 is a complete Galois lattice, i.e. for every node n, the pair (classes(n) desc3(n)) is maximal, and H 3 contains every node verifying the maximality criteria and whose set of classes includes C 2 and are included i n C 1 .</p><p>The worst time complexity of L 3 -Cluster is exponential w.r.t j C 1 j ; j C 2 j.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Conclusion and Discussion</head><p>This paper has proposed an approach to organize into clusters large sets of semistructured data. The scaling up of the approach is made possible because its complexity is remained in control in di erent w ays: (1) the data are aggregated into basic classes and the clustering applies on the set of those basic classes instead of applying on the data set (2) the two-step clustering method rst builds a coarse hierarchy, based on a simple language for describing the clusters, and uses a more elaborate language for re ning a small subpart of the hierarchy delimited by t wo nodes.</p><p>Experimental results: We h a ve e v aluated our approach using a real dataset composed of 2274 computer products extracted from the C/Net catalog. Each p r o duct is described using a subset of 234 attributes, possibly multi-valued. There are 59 types of products and each product is labelled by one and only one type. The goal of the experiment w as twofold : to assess the e ciency and the simplicity of the lattice for the rst clustering step and to show the accuracy of the re nement of a part of the lattice using the second clustering step.</p><p>In order to make t h e L 2 lattice even simpler, the number of nodes obtained with L 2 -Cluster may b e parametrized by a threshold n used to restrict the nodes that appear in the lattice to gather at least n basic classes. Figure <ref type="figure" target="#fig_5">2</ref> shows that, as it is mentionned in 15], this quantitative criteria allows us to signi cantly decrease the size of the lattice. L 3 -Cluster allows to distinguish nodes that cannot be distinguished by L 2 -Cluster. For instance, if L 3 -Cluster is applied when an end-user chooses to re ne the L 2 lattice between the node (a) and the node (b) in Figure <ref type="figure" target="#fig_0">3</ref>, the aggregation of all types of drivers (i.e. RemovableT apeDrive, RemovableDiskDrive and HardDiskDrive) is part of the L 3 lattice. This new cluster appears for the following reasons : no driver is described using the attributes StorageController=RAIDLevel, Networking=DataLinkProtocol or Networking=T ype (those attributes were optional in L 3 description of (a)). In addition, the value SCSI for the attribute StorageController=T ype is not possible for a driver.</p><p>Related work: Our work can be compared with existing work in machine learning based on more expressive languages than propositional language and/or using a Figure <ref type="figure" target="#fig_0">3</ref>: A part of the L 2 lattice for C/net (n=3) shift of representation. Most work on expressive l a nguages has been developped in a supervised setting (e.g. Inductive Logic Programming), while little work exists in an unsupervised setting. We can cite KBG 4], TIC 5] and 11] which perform clustering in a relational setting. The main di erence with our approach is that they use a distance as a numerical estimation of similarity. Although the best representation of a cluster is the least common subsumer of its instances, they approximate it numerically by the cluster centroid (i.e., the point that minimizes the sum of squared distances). The reason is that, in contrast with our setting where the lcs computation in L 3 is polynomial, lcs computing in their rstorder language may be exponential. 3] presents lcs computing for a more expressive language than L 3 but for this language subsumption is exponential. <ref type="bibr">KLUSTER 10]</ref> re nes a basic taxonomy of concepts in the setting of a description logic for which computing lcs is polynomial. In KLUSTER, the clusters are not unions but subconcepts of primitive concepts, and the re nement aims at learning discriminating de nitions of mutually disjoint subconcepts of a same concept. As for the use of a shift of representation, it is used in supervised learning in order to increase accuracy (i.e. the proportion of correctly predicted concepts in a set of test examples) <ref type="bibr">8 14]</ref> or to search e ciently a reduced space of concepts <ref type="bibr">9 6]</ref>. In unsupervised learning, shift of representational bias may be used to change the point of view about the data <ref type="bibr">12 13]</ref>. For instance, Cluster/2 12] provides a user with a set of parameters about his preferences on the concepts to be created. Finally, the two-step clustering approach proposed in 1] is similar in spirit with our clustering in L 2 since it rst identi es basic clusters (as high density clusters) before building more general clusters that are unions of those basic clusters.</p><p>Perspectives: We plan to extend our current w ork to take nested attributes and textual values into account i n L 3 in order to fully deal with XML data.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>L 3</head><label>3</label><figDesc>is a subset of the C-CLASSIC description logic 7]. De nition 8 (Class description in L 3 ) A L 3 class description (of size n) is a tuple fatt suf f1 1</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head></head><label></label><figDesc>2 is an instance of the L 3 class description C 1 : f RemovableDiskDrive :ftrueg, CD=DV D=ReadSpeed ? :f20x 32x 24xg, CD=DV D=Type :fCDROM CDRWg, Compatibility + :fMA C PCg, StorageRemovableType :fS u p e r D i s k Z I P J A Z gg It represents the set of products that have in their description (i) necessarily the monovalued and boolean attribute RemovableDiskDrive whose value must be true, (ii) possibly the attribute CD=DV D=ReadSpeed, and if that is the case, this attribute is monovalued and its value belongs to the set f20x 32x 24xg, (iii) necessarily the attribute CD=DV D=Type, w h i c h i s m o n o valued and takes its value in the set fCDROM CDRWg, (iv) necessarily the attribute Compatibility, which c a n b e m ultivalued and takes its value(s) in the set fMA C PCg, (v) necessarily the attribute StorageRemovableT ype, which i s m o n o valued and takes its value in the set fSuperDisk ZIP JAZg. The following propositions state the main properties of L 3 . Their proofs follow from results in tractable description logics where structural subsumption is complete. Proposition 2 (Characterization of subsumption in L 3 ) Let C 1 and C 2 be t w o L 3 class descriptions. C 1 L3 C 2</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>1 :</head><label>1</label><figDesc>(* Initialization step gathering the biggest unions of classes having atleast one attribute in common:*) 2: A 1 A 0 , level<ref type="bibr" target="#b0">(1)</ref> : : : l e v e l (j C j ) 3: for every a 2 A 1 do 4: let classes(a) = fc a * Generation of new nodes supported by k + 1-itemsets : *) 10: repeat 11: for every pair (p q) 2 A k do 12:</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Example of a lattice constructed by L 2 -Cluster of basic classes returned b y L 2 -Cluster: shortdesc(n) = lcs L2 (abst L2 (i 1 ) : : : a b s t L2 (i k ))</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head></head><label></label><figDesc>maximal in the following sense: there i s n o m 2 H such that classes(n) classes(m) and shortdesc(n) = shortdesc(m), o r shortdesc(n) shortdesc(m) and classes(n) = classes(m).The worst time complexity of L 2 -Cluster is exponential in the maximal size of the basic classes L 2 descriptions.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>2 :</head><label>2</label><figDesc>LRes-Cl L-Cl 3: Nodes f Sonng 4: for eve r y n o d e n 2 Nodes do 5: for every class c 2 classes(F a t n ) n classes(nnode p to Nodes such that classes(p) = classes and desc3(p) = desc3 18: LRes-Cl LRes-Cl fclasses(p)g 19: if Change = true then 20: L-Cl L-Cl classes(p) 21: Suppress n from Nodes Algorithm 2: L 3 -Cluster Gathering c5 with c1 i s n o w considered: desc3=fatt + 1 :fv1 v 3 v 5g a t t 2 :fv2 v 4g att 3 :fv6 v 7 v 8gg, classes = fc1 c 5g Since desc3(c3) is subsumed by desc3, c3 i s a d d e d t o classes, which is updated to fc1 c 3 c 5g. The node corresponding to c1 c5 is not added since it is not closed, the node corrresponding to its closure c1 c3 c5 i s n o t added either because fc1 c 3 c 5g is already in LRes-Cl. The following proposition summarizes the main properties of the algorithm L 3 -Cluster. Proposition 6 (Properties of L 3 -Cluster) Let C 1 be the set of basic classes of the father node. Let C 2 (C 2 C 1 ) b e the set of basic classes of the son node. Let H 3 be the lattice r eturned b y L 3 -Cluster. For each node n 2 H 3 , let desc3(n) and classes(n) be r espectively the description and the set of basic classes returned b y L 3 -Cluster: desc3(n) = lcs L3 (abst L3 (i 1 ) : : : a b s t L3 (i k ))</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_6"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Quantitative results of L 2 -Cluster Figure 3 illustrates the simplicity o f t h e L 2 descriptions and the signi cance of the nodes.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head></head><label></label><figDesc>1::n], att i 2 A and V i V . The description of an instance is a term of L 1 . F or example, we can nd a product in the C/Net catalog,</figDesc><table><row><cell>whose L 1 description is:</cell></row><row><cell>fRemovableDiskDrive, CD=DV D=T ype=fCDRWg, StorageRemovableT ype=fSuperDiskg, Compatibility=fMA C PCgg</cell></row><row><cell>In the following, we will consider that the type c of a L 1 description is a boolean attribute.</cell></row><row><cell>2.3 The language of classes L 2 De nition 6 (Class description in L 2 ) A L 2 class description (of size n) is a tuple of attributes fatt 1 : : : a t t n g, where 8i 2 1::n], att i 2 A .</cell></row></table></figure>
		</body>
		<back>
			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0" />			</div>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Automatic subspace clustering of high dimensional data for data mining applications</title>
		<author>
			<persName><forename type="first">R</forename><surname>Agrawal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gehrke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Gunopulos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Raghavan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">SIG-MOD Record</title>
		<imprint>
			<biblScope unit="volume">27</biblScope>
			<biblScope unit="page" from="94" to="105" />
			<date type="published" when="1998">1998</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Fast algorithms for mining association rules</title>
		<author>
			<persName><forename type="first">R</forename><surname>Agrawal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Srikant</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">VLDB-94, S a n tiago</title>
				<meeting><address><addrLine>, Chile</addrLine></address></meeting>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Computing Least Common Subsumers in Description Logics with Existencial Restrictions</title>
		<author>
			<persName><forename type="first">F</forename><surname>Baader</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Usters</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Molitor</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IJCAI-99, S t o c kholm</title>
				<meeting><address><addrLine>, Sweden</addrLine></address></meeting>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Conceptual clustering in a rst order logic representation</title>
		<author>
			<persName><forename type="first">G</forename><surname>Bisson</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ECAI-92</title>
				<meeting><address><addrLine>Vienne, Austria</addrLine></address></meeting>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Top-down induction of clustering trees</title>
		<author>
			<persName><forename type="first">H</forename><surname>Blockeel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">De</forename><surname>Raedt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Ramon</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ICML-98</title>
				<imprint>
			<publisher>Morgan Kaufmann</publisher>
			<date type="published" when="1998">1998</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Tabata: a learning algorithm performing a bidirectional search in a reduced search space using a tabu strategy</title>
		<author>
			<persName><forename type="first">P</forename></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Soldano</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">I n ECAI-98</title>
				<meeting><address><addrLine>Brighton, Angleterre</addrLine></address></meeting>
		<imprint>
			<date type="published" when="1998">1998</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Learning the CLAS-SIC description logic: Theoretical and experimental results</title>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">W</forename><surname>Cohen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Hirsh</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">KR</title>
		<imprint>
			<biblScope unit="volume">94</biblScope>
			<biblScope unit="issue">1 9 9 4</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Learning through progressive re nement</title>
		<author>
			<persName><forename type="first">W</forename><surname>Van De Velde</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the third European Working Session on Learning (EWSL&apos;98)</title>
				<editor>
			<persName><surname>Pitman</surname></persName>
		</editor>
		<meeting>the third European Working Session on Learning (EWSL&apos;98)<address><addrLine>London</addrLine></address></meeting>
		<imprint>
			<date type="published" when="1983">1983</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Tdis: an algebraic formalization</title>
		<author>
			<persName><forename type="first">J</forename><surname>Ganascia</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IJCAI-93</title>
				<meeting><address><addrLine>Vancouver, Canada</addrLine></address></meeting>
		<imprint>
			<date type="published" when="1993">1993</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">A polynomial approach t o the constructive induction of structural knowledge</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">U</forename><surname>Kietz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Morik</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Machine Learning</title>
		<imprint>
			<biblScope unit="volume">14</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="193" to="217" />
			<date type="published" when="1994">1994</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Extending k-means clustering to rst-order representations</title>
		<author>
			<persName><forename type="first">Mathias</forename><surname>Kirsten</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Stephan</forename><surname>Wrobel</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the 10th International Conference on Inductive Logic Programming</title>
				<editor>
			<persName><forename type="first">J</forename><surname>Cussens</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Frish</surname></persName>
		</editor>
		<meeting>of the 10th International Conference on Inductive Logic Programming</meeting>
		<imprint>
			<publisher>Springer Verlag</publisher>
			<date type="published" when="2000">2000</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Learning from observation: Conceptual clustering</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">S</forename><surname>Michalski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">E</forename><surname>Stepp</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Machine learning : An arti cial intelligence a p p r oach</title>
				<editor>
			<persName><forename type="first">Morgan</forename><surname>Kaufmann</surname></persName>
		</editor>
		<imprint>
			<biblScope unit="volume">1</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Local Scaling in Conceptual data Systems</title>
		<author>
			<persName><forename type="first">G</forename><surname>Stumme</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="s">ICCS, LNAI</title>
		<imprint>
			<biblScope unit="volume">1115</biblScope>
			<date type="published" when="1996">1996</date>
			<publisher>Springer Verlag</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Shift of bias for inductive concept learning</title>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">E</forename><surname>Utgo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Machine learning : An arti cial intelligence approach</title>
				<editor>
			<persName><forename type="first">Morgan</forename><surname>Kaufmann</surname></persName>
		</editor>
		<imprint>
			<date type="published" when="1986">1986</date>
			<biblScope unit="volume">2</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Restructuring lattice theory</title>
		<author>
			<persName><forename type="first">R</forename><surname>Wille</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Symposium on Ordered S e t s</title>
				<meeting><address><addrLine>Boston</addrLine></address></meeting>
		<imprint>
			<date type="published" when="1982">1982</date>
		</imprint>
		<respStmt>
			<orgName>, U n i v ersity of Calgary</orgName>
		</respStmt>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
