<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>MOD Record</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Stor./Type, (b) Removable Disk Drive Stor. Controller/Type, Removable Stor/Type, CD/DVD/Type}</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>1998</year>
      </pub-date>
      <volume>27</volume>
      <issue>94</issue>
      <fpage>2</fpage>
      <lpage>7</lpage>
      <abstract>
        <p>2.1 Preliminaries and notations. Given a language of instances, a language of classes L defines the expressions that are allowed as class descriptions. A class description is intended to represent in an abstract and concise way the properties that are common to the set of its instances. A membership relation, denoted by isa<sub>L</sub>, establishes the necessary connection between a given language of instances and an associated language of classes L.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>In many applications, it becomes crucial to help users access a huge amount of data by clustering them into a small number of classes described at an appropriate level of abstraction. In this paper, we present an approach based on the use of two languages of description of classes for the automatic clustering of semistructured data. The first language of classes has a high power of abstraction and guides the construction of a lattice of classes covering the whole set of the data. The second language of classes, more expressive and more precise, is the basis for the refinement of a part of the lattice that the user wants to focus on. Our approach has been implemented and experimented on real data in the setting of the GAEL project, which aims at building flexible electronic catalogs organized as a hierarchy of classes of products. Our experiments have been conducted on real data coming from the C/Net (http://www.cnet.com) electronic catalog of computer products.</p>
      <p>The connection between the language of instances L<sub>1</sub> and the language of classes L<sub>2</sub> is based on the following definition of the membership relation.</p>
      <p>Definition 7 (Membership relation for L<sub>2</sub>) Let i be an instance description in L<sub>1</sub>. Let C be a class description in L<sub>2</sub>: i is an instance of C iff every attribute appearing in C also appears in i.</p>
      <p>In the following, we will consider that the type c of an L<sub>1</sub> description is a boolean attribute. The following proposition, whose proof is straightforward, characterizes subsumption, least common subsumer and abstraction in L<sub>2</sub>.</p>
      <p>L<sub>3</sub> is richer than L<sub>2</sub> in several respects: it makes it possible to restrict the possible values of an attribute; it makes it possible to distinguish the number of values of an attribute through different suffixes (including + and ?) whose notation is inspired by the one used in XML for describing document type definitions (DTDs), and whose formal semantics corresponds to standard description logics constructors. In fact, as will become clearer in the following, L<sub>3</sub> is a subset of the C-CLASSIC description logic [7].</p>
      <p>Proposition 4 (Characterization of lcs in L<sub>3</sub>) Let C<sub>1</sub>, ..., C<sub>n</sub> be n class descriptions in L<sub>3</sub>. Let A be the set of attributes belonging to at least one description C<sub>i</sub>. C<sub>1</sub>, ..., C<sub>n</sub> have a unique least common subsumer in L<sub>3</sub>, whose description is characterized as follows: for every attribute att ∈ A, let V be the union of the sets of values associated with att in the class descriptions C<sub>i</sub>'s:
        <disp-formula><tex-math><![CDATA[V = \bigcup_{i=1}^{n} \{ v \in V_i \mid att^{suff} : V_i \in C_i \}]]></tex-math></disp-formula>
      </p>
      <p>1. In the first step, the data are partitioned according to their type: for each type c, we create a basic class named c. Its set of instances, denoted inst(c), is the set of data of type c. Its L<sub>2</sub> description, desc(c), is obtained by computing the least common subsumer of the abstractions of its instances. The result of this step is a set C of basic classes and a set A of attributes supporting the L<sub>2</sub> descriptions of the classes of C. For each attribute a, the set classes(a) of basic classes having a in their description is computed. This preliminary clustering step has a linear data complexity.</p>
      <p>2. In the second step, a lattice of clusters is constructed by gathering basic classes according to similarities of their L<sub>2</sub> descriptions. In this step, clusters are unions of basic classes. The computational complexity of this step does not depend on the number of initial data but only on the size of the L<sub>2</sub> descriptions of basic classes.</p>
      <p>Perspectives: We plan to extend our current work to take nested attributes and textual values into account in L<sub>3</sub> in order to fully deal with XML data.</p>
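      <p>As an illustration only, the membership relation of Definition 7, the first clustering step, and the value union V of Proposition 4 might be sketched in Python as follows. The sketch assumes that a description in the first language of classes is a plain set of attribute names, so that the abstraction of an instance is the set of its attributes (plus its type, treated as a boolean attribute) and the least common subsumer is the intersection of such sets; the text does not spell out that characterization, so this is an assumption. The function names, the dictionary encoding of descriptions, and the sample products in the note below are hypothetical.</p>
      <preformat><![CDATA[
```python
from collections import defaultdict


def abstraction(type_name, instance):
    """Assumed L2 abstraction: the attribute names of the instance plus its
    type, which the text treats as a boolean attribute."""
    return frozenset(instance) | {type_name}


def is_instance(instance_attrs, class_desc):
    """Definition 7: every attribute appearing in the class description
    also appears in the instance."""
    return set(class_desc).issubset(instance_attrs)


def lcs(descriptions):
    """Assumed L2 least common subsumer: the attributes shared by all."""
    descriptions = list(descriptions)
    if not descriptions:
        return frozenset()
    return frozenset.intersection(*descriptions)


def first_step(data):
    """Partition (type_name, instance) pairs by type in one pass, then
    compute desc(c) for each basic class and classes(a) for each attribute."""
    inst = defaultdict(list)
    for type_name, instance in data:
        inst[type_name].append(instance)
    desc = {
        c: lcs(abstraction(c, i) for i in members)
        for c, members in inst.items()
    }
    classes = defaultdict(set)
    for c, attrs in desc.items():
        for a in attrs:
            classes[a].add(c)
    return dict(inst), desc, dict(classes)


def value_union(att, descriptions):
    """Set V of Proposition 4: union of the value sets attached to att.
    Each L3 description is modeled (hypothetically) as a dict mapping an
    attribute name to a (suffix, set_of_values) pair."""
    values = set()
    for description in descriptions:
        if att in description:
            _suffix, allowed = description[att]
            values |= set(allowed)
    return values
```
]]></preformat>
      <p>For instance, with two hypothetical Laptop instances carrying the attributes {brand, ram, weight} and {brand, ram}, desc(Laptop) in this sketch is {Laptop, brand, ram}, and classes(brand) collects every basic class whose description mentions brand. Grouping and intersecting touch each instance once, which is in line with the linear data complexity stated for the first step.</p>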
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>