<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Homogeneity and Stability in Conceptual Analysis</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Paula Brito</string-name>
          <email>mpbrito@fep.up.pt</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Geraldine Polaillon</string-name>
          <email>geraldine.polaillon@supelec.fr</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Faculdade de Economia &amp; LIAAD/INESC-Porto L.A., Universidade do Porto Rua Dr. Roberto Frias</institution>
          ,
          <addr-line>4200-464 Porto</addr-line>
          ,
          <country country="PT">Portugal</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>SUPELEC Science des Systemes (E3S) - Departement Informatique Plateau de Moulon</institution>
          ,
          <addr-line>3 rue Joliot Curie, 91192 Gif-sur-Yvette cedex</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This work comes within the field of data analysis using Galois lattices. We consider ordinal data and numerical (single-valued or interval-valued) data, as well as data that consist of frequency/probability distributions on a finite set of categories. Data are represented and dealt with in a common framework, by defining a generalization operator that determines intents by intervals. In the case of distribution data, the obtained concepts are more homogeneous and more easily interpretable than those obtained by using the maximum and minimum operators previously proposed. Since the number of obtained concepts is often rather large, and to limit the influence of atypical elements, we propose to identify stable concepts using interval distances in a cross-validation-like approach.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        This work concerns multivariate data analysis using Galois concept lattices. Let
$E = \{\omega_1, \dots, \omega_n\}$ be the set of elements to be analyzed, described by $p$ variables
$Y_1, \dots, Y_p$. In this paper we consider the specific case where the variables $Y_j$ are
numerical (real or interval-valued), ordinal and modal. Modal variables allow
associating with each element of $E$ a probability/frequency distribution on an
underlying finite set of categories (see [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]).
      </p>
      <p>
        The use of Galois lattices in Data Analysis was first introduced by Barbut
and Monjardet in the 1970s [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and was then further developed
and widely disseminated through the work of R. Wille and B. Ganter (see, e.g., [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]). Let
$(A, \leq_1)$ and $(B, \leq_2)$ be two ordered sets. A Galois connection is a pair $(f, g)$,
where $f : A \to B$ and $g : B \to A$ are mappings such that $f$ and $g$ are
antitone, and both $h = g \circ f$ and $h' = f \circ g$ are extensive; $h$ and $h'$ are then closure
operators. In the classical binary case, the mapping $f$ defines the intent of a set $S \subseteq E$, and the mapping $g$
allows obtaining the extent in $E$ associated with a set of attributes $T \subseteq O$,
where $O$ is the set of the considered (binary) attributes. The couple $(f, g)$ then
constitutes a Galois connection between $(\mathcal{P}(E), \subseteq)$ and $(\mathcal{P}(O), \subseteq)$. A concept is
defined as a couple $(S, T)$ where $S \subseteq E$, $T \subseteq O$, $S = g(T)$ and $T = f(S)$, i.e., we
have $h(S) = S$; $S$ is the extent of the concept and $T$ its intent. This approach
has been applied to non-binary variables, but in this case data are generally
first submitted to a "binarization", by performing a binary coding of the
data array; for numerical or ordinal variables $Y$, attributes of the form "$Y \leq x$",
for each observed value $x$, are considered.
      </p>
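      <p>As an illustration of the binary case only, the Python sketch below implements the two mappings on a small hypothetical context (objects <monospace>w1</monospace> to <monospace>w3</monospace>, attributes <monospace>a</monospace> to <monospace>c</monospace>; all names are assumptions made for the example) and checks that $h = g \circ f$ is extensive and idempotent on every subset of objects, as a closure operator should be.</p>
      <preformat>
```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Set

# Hypothetical binary context: each object is mapped to the attributes it has.
CONTEXT: Dict[str, Set[str]] = {
    "w1": {"a", "b"},
    "w2": {"a", "c"},
    "w3": {"a", "b", "c"},
}
ATTRIBUTES: FrozenSet[str] = frozenset({"a", "b", "c"})


def f(S: Iterable[str]) -> FrozenSet[str]:
    """Intent of S: the attributes shared by every object of S."""
    common = set(ATTRIBUTES)  # the intent of the empty set is the whole of O
    for w in S:
        common = common.intersection(CONTEXT[w])
    return frozenset(common)


def g(T: Iterable[str]) -> FrozenSet[str]:
    """Extent of T: the objects having every attribute of T."""
    required = set(T)
    return frozenset(w for w, attrs in CONTEXT.items() if required.issubset(attrs))


def h(S: Iterable[str]) -> FrozenSet[str]:
    """Closure h = g o f on sets of objects."""
    return g(f(S))


def is_concept(S: Iterable[str], T: Iterable[str]) -> bool:
    """(S, T) is a concept iff S = g(T) and T = f(S)."""
    return g(T) == frozenset(S) and f(S) == frozenset(T)


# h is extensive and idempotent on every subset of objects.
objects = list(CONTEXT)
for r in range(len(objects) + 1):
    for S in combinations(objects, r):
        assert set(S).issubset(h(S)) and h(h(S)) == h(S)

print(sorted(h({"w1"})))                     # ['w1', 'w3']
print(is_concept({"w1", "w3"}, {"a", "b"}))  # True
```
      </preformat>
      <p>Here $h(\{w_1\}) = \{w_1, w_3\}$, so $\{w_1\}$ alone is not an extent, while $(\{w_1, w_3\}, \{a, b\})$ is a concept.</p>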
      <p>
        In [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] this approach has been extended by directly defining the intent of a
set of elements, which has allowed obtaining, for each variable type (classical or
otherwise), appropriate couples of mappings $(f, g)$ forming a Galois connection.
This has the advantage of allowing the data to be analyzed directly as they are presented,
without imposing any sort of binary pre-coding, which may, and generally does,
drastically increase the size of the data array to be analyzed. In this way, Galois lattices whose
intents are obtained by union and by intersection are defined. This approach has
been further extended to modal variables (see [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]). The case of ordinal variables
has been dealt with in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], using an approach similar to that of [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] for modal
variables.
      </p>
      <p>
        Ganter and Kuznetsov [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] proposed a general construction, called pattern
structures, which allows for arbitrary descriptions with a semilattice operation
on them; since union and intersection of intervals define semilattices, each of them
gives rise to a pattern structure. An application to gene expression data is
presented in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>Here, we consider a common framework for numerical (real or interval-valued),
ordinal and modal variables, by defining a generalization operator that
determines the intent in the form of vectors of intervals. For ordinal and modal (i.e.,
distribution-valued) variables, the obtained concepts are more homogeneous and
therefore easier to interpret than those obtained by applying the minimum and
maximum operators, as previously proposed. In the next sections, we detail how
generalization of a set of elements is performed for each variable type.</p>
      <p>
        Since the number of obtained concepts is often rather large, we propose to
identify stable concepts (see also [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]), using distances designed for
interval data. The criterion is that the intent of a concept should not be too different
from those obtained by sequentially removing one element of the extent at a time;
a large difference would reveal that this particular element provokes a drastic change
in the concept's intent. Should this occur, the concept is considered to be
non-stable.
      </p>
      <p>
        In the case of multi-valued data, other approaches to lattice reduction,
directly applied to the concept lattice, have been proposed in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. These
two approaches rely on the same idea of merging together similar attribute values
(with respect to a given threshold), thereby reducing the number of concepts.
      </p>
      <p>The remainder of the paper is organized as follows. Section 2 describes the
generalization procedure for real and interval-valued variables, which is extended
in Section 3 to modal variables. In Section 4 a common generalization approach
by vectors of intervals is presented. In Section 5 the problem of concept stability
is considered, and a method using interval distances is proposed, which allows
addressing the question of lattice reduction. Section 6 concludes the paper,
opening paths for future research.</p>
    </sec>
    <sec id="sec-2">
      <title>Real and interval-valued variables</title>
      <p>Let $E = \{\omega_1, \dots, \omega_n\}$ be the set of $n$ elements or objects to be analyzed, and
$Y_1, \dots, Y_p$ real or interval-valued variables such that $Y_j(\omega_i) = [l_{ij}, u_{ij}]$. We shall
consider real-valued variables as a special case of interval-valued ones; it is
therefore equivalent to write $Y_j(\omega_i) = x$ or $Y_j(\omega_i) = [x, x]$.</p>
      <p>
        Let $A = \{\omega_1, \dots, \omega_h\} \subseteq E$. Generalization by union is defined (see [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]) by the
mapping $f : \mathcal{P}(E) \to \mathcal{I}^p$, where $\mathcal{I}$ is the set of intervals of $\mathbb{R}$ endowed with the
inclusion order, such that $f(A) = (I_1, \dots, I_p)$, with $I_j = [\min\{l_{ij}\}, \max\{u_{ij}\}]$,
$\omega_i \in A$, $j = 1, \dots, p$, i.e., for each $j = 1, \dots, p$, $I_j$ is the minimum interval (for
the inclusion order) that covers all values taken by the elements of $A$ for variable
$Y_j$. Let $g : \mathcal{I}^p \to \mathcal{P}(E)$ be the mapping defined as
$g((I_1, \dots, I_p)) = \{\omega_i \in E : Y_j(\omega_i) \subseteq I_j, j = 1, \dots, p\}$, i.e., the set of elements of $E$ taking values
within $I_j$, for $j = 1, \dots, p$. The couple $(f, g)$ is a Galois connection.
      </p>
      <p>Likewise, we may generalise by intersection, defining $f^*$ and $g^*$ as follows:
$f^* : \mathcal{P}(E) \to \mathcal{I}^p$, $f^*(A) = (I_1, \dots, I_p)$, with $I_j = [\max\{l_{ij}\}, \min\{u_{ij}\}]$ if
$\max\{l_{ij}\} \leq \min\{u_{ij}\}$, $\omega_i \in A$, and $I_j = \emptyset$ otherwise (i.e., $I_j$ is the largest interval
contained in all intervals taken by the elements of $A$ for variable $Y_j$, which
may be empty), for $j = 1, \dots, p$; and $g^* : \mathcal{I}^p \to \mathcal{P}(E)$ with
$g^*((I_1, \dots, I_p)) = \{\omega_i \in E : Y_j(\omega_i) \supseteq I_j, j = 1, \dots, p\}$ (the set of elements of $E$ taking
interval values that contain $I_j$, for $j = 1, \dots, p$). The couple $(f^*, g^*)$ also forms a Galois
connection.</p>
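      <p>A minimal Python sketch of both Galois connections for interval-valued data, assuming each element is stored as a tuple of <monospace>(lower, upper)</monospace> pairs (one per variable, a real value $x$ being stored as $(x, x)$) and that <monospace>None</monospace> encodes the empty interval, which by convention is contained in every interval. The function names are illustrative and not taken from the authors' implementation; the data are those of Table 1 (Example 1).</p>
      <preformat>
```python
from typing import Dict, List, Optional, Tuple

Interval = Tuple[float, float]
Description = Tuple[Interval, ...]       # one interval per variable
Intent = Tuple[Optional[Interval], ...]  # None encodes the empty interval


def f_union(data: Dict[str, Description], A: List[str]) -> Intent:
    """f: per variable, the smallest interval covering all values of A (A non-empty)."""
    p = len(next(iter(data.values())))
    return tuple(
        (min(data[w][j][0] for w in A), max(data[w][j][1] for w in A))
        for j in range(p)
    )


def g_union(data: Dict[str, Description], intent: Intent) -> List[str]:
    """g: elements whose value for every variable lies inside the intent's interval."""
    return [
        w for w, desc in data.items()
        if all(lo >= I[0] and I[1] >= hi for (lo, hi), I in zip(desc, intent))
    ]


def f_inter(data: Dict[str, Description], A: List[str]) -> Intent:
    """f*: per variable, the largest interval contained in all values of A (A non-empty)."""
    p = len(next(iter(data.values())))
    intent: List[Optional[Interval]] = []
    for j in range(p):
        lo = max(data[w][j][0] for w in A)
        hi = min(data[w][j][1] for w in A)
        intent.append((lo, hi) if hi >= lo else None)
    return tuple(intent)


def g_inter(data: Dict[str, Description], intent: Intent) -> List[str]:
    """g*: elements whose value for every variable contains the intent's interval."""
    return [
        w for w, desc in data.items()
        if all(I is None or (I[0] >= lo and hi >= I[1])
               for (lo, hi), I in zip(desc, intent))
    ]


# Table 1 (Example 1); ages are stored as degenerate intervals.
table1 = {
    "Ann": ((25, 25), (15, 20)),
    "Bob": ((32, 32), (25, 30)),
    "Charles": ((40, 40), (10, 20)),
}

D = f_union(table1, ["Bob", "Charles"])
print(D)                                    # ((32, 40), (10, 30))
print(g_union(table1, D))                   # ['Bob', 'Charles']
print(f_inter(table1, ["Ann", "Charles"]))  # (None, (15, 20))
```
      </preformat>
      <p>With the same data, the intersection connection gives $f^*(\{\text{Ann}, \text{Charles}\}) = (\emptyset, [15, 20])$, whose extent under $g^*$ is again $\{\text{Ann}, \text{Charles}\}$.</p>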
      <sec id="sec-2-1">
        <title>Example 1:</title>
        <p>Consider three persons, Ann, Bob and Charles, characterized by two variables:
age and the amount of time (in minutes) necessary to go to work (which varies
from day to day, and is therefore represented by an interval-valued variable), as
presented in Table 1.</p>
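        <table-wrap id="tab-1">
          <label>Table 1</label>
          <table>
            <thead>
              <tr><th /><th>Age</th><th>Time (min)</th></tr>
            </thead>
            <tbody>
              <tr><td>Ann</td><td>25</td><td>[15, 20]</td></tr>
              <tr><td>Bob</td><td>32</td><td>[25, 30]</td></tr>
              <tr><td>Charles</td><td>40</td><td>[10, 20]</td></tr>
            </tbody>
          </table>
        </table-wrap>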
        <p>Let $A = \{\text{Bob}, \text{Charles}\}$. Generalization by union leads to
$f(A) = ([32, 40], [10, 30])$, describing people who are between 32 and 40 years
old and take 10 to 30 minutes to go to work; in this dataset, the people meeting this
description are given by $g(f(A)) = g(([32, 40], [10, 30])) = \{\text{Bob}, \text{Charles}\} = A$.
Hence $(\{\text{Bob}, \text{Charles}\}, ([32, 40], [10, 30]))$ is a concept.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Modal variables</title>
      <p>
        Two Galois connections may also be defined for the case of modal variables (see
[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]). Let $Y_1, \dots, Y_p$ be $p$ modal variables, $O_j = \{m_{j1}, \dots, m_{jk_j}\}$ the set of $k_j$
possible categories of variable $Y_j$, $M_j$ the set of distributions defined on $O_j$, for
$j = 1, \dots, p$, and $M = M_1 \times \dots \times M_p$. For variable $Y_j$ and element $\omega_i \in E$,
$Y_j(\omega_i) = \{m_{j1}(p^j_{\omega_i 1}), \dots, m_{jk_j}(p^j_{\omega_i k_j})\}$, where $p^j_{\omega_i \ell}$ is the probability/frequency
associated with category $m_{j\ell}$ ($\ell = 1, \dots, k_j$) of variable $Y_j$ and element $\omega_i$. Let
$A = \{\omega_1, \dots, \omega_h\} \subseteq E$.</p>
      <p>To generalise by the maximum, we take, for each category $m_{j\ell}$, the maximum
of its probabilities/frequencies in $A$. Let $f : \mathcal{P}(E) \to M$ be such that
$f(A) = (d_1, \dots, d_p)$, with $d_j = \{m_{j1}(t_{j1}), \dots, m_{jk_j}(t_{jk_j})\}$, where $t_{j\ell} = \max\{p^j_{\omega_i \ell}, \omega_i \in A\}$, $\ell = 1, \dots, k_j$.
The intent of a set $A \subseteq E$ is then to be interpreted as "objects
with at most a proportion $t_{j\ell}$ of cases presenting category $m_{j\ell}$, $\ell = 1, \dots, k_j$, $j = 1, \dots, p$". The
couple $(f, g)$, with $g : M \to \mathcal{P}(E)$ defined, for $d_j = \{m_{j1}(p_{j1}), \dots, m_{jk_j}(p_{jk_j})\}$, as
$g((d_1, \dots, d_p)) = \{\omega_i \in E : p^j_{\omega_i \ell} \leq p_{j\ell}, \ell = 1, \dots, k_j, j = 1, \dots, p\}$,
forms a Galois connection.</p>
      <p>Similarly, we may generalise by the minimum, taking for each category the
minimum of its probabilities/frequencies. Let $f^* : \mathcal{P}(E) \to M$, $f^*(A) = (d_1, \dots, d_p)$,
with $d_j = \{m_{j1}(v_{j1}), \dots, m_{jk_j}(v_{jk_j})\}$, where $v_{j\ell} = \min\{p^j_{\omega_i \ell}, \omega_i \in A\}$, $\ell = 1, \dots, k_j$.
The intent of a set $A \subseteq E$ is now interpreted as "objects with at
least a proportion $v_{j\ell}$ of cases presenting category $m_{j\ell}$, $\ell = 1, \dots, k_j$, $j = 1, \dots, p$".</p>
      <p>The couple $(f^*, g^*)$, with $g^* : M \to \mathcal{P}(E)$ such that, for $d_j = \{m_{j1}(p_{j1}), \dots, m_{jk_j}(p_{jk_j})\}$,
$g^*((d_1, \dots, d_p)) = \{\omega_i \in E : p^j_{\omega_i \ell} \geq p_{j\ell}, \ell = 1, \dots, k_j, j = 1, \dots, p\}$,
likewise forms a Galois connection.</p>
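      <p>The Python sketch below implements the maximum and minimum generalizations for a single modal variable. Each element is assumed to be stored as a dictionary from category to probability/frequency, and the three groups are hypothetical; several modal variables would simply repeat the computation per variable. Function names are illustrative only.</p>
      <preformat>
```python
from typing import Dict, List

Dist = Dict[str, float]  # category -> probability/frequency


def f_max(data: Dict[str, Dist], A: List[str]) -> Dist:
    """Intent by the maximum: largest frequency of each category over A (A non-empty)."""
    return {m: max(data[w][m] for w in A) for m in data[A[0]]}


def g_max(data: Dict[str, Dist], d: Dist) -> List[str]:
    """Elements whose frequency is at most d[m] for every category m."""
    return [w for w, dist in data.items() if all(d[m] >= dist[m] for m in d)]


def f_min(data: Dict[str, Dist], A: List[str]) -> Dist:
    """Intent by the minimum: smallest frequency of each category over A (A non-empty)."""
    return {m: min(data[w][m] for w in A) for m in data[A[0]]}


def g_min(data: Dict[str, Dist], d: Dist) -> List[str]:
    """Elements whose frequency is at least d[m] for every category m."""
    return [w for w, dist in data.items() if all(dist[m] >= d[m] for m in d)]


# Hypothetical distributions of marks a, b, c for three groups of students.
groups = {
    "G1": {"a": 0.3, "b": 0.3, "c": 0.4},
    "G2": {"a": 0.2, "b": 0.6, "c": 0.2},
    "G3": {"a": 0.1, "b": 0.5, "c": 0.4},
}

pair = ["G1", "G2"]
print(f_max(groups, pair), g_max(groups, f_max(groups, pair)))
print(f_min(groups, pair), g_min(groups, f_min(groups, pair)))
```
      </preformat>
      <p>Here the maximum intent of G1 and G2 is $\{a(0.3), b(0.6), c(0.4)\}$ and its extent also contains G3, whose frequencies all stay below these bounds; the minimum intent $\{a(0.2), b(0.3), c(0.2)\}$ excludes G3 because of its 10% of marks a.</p>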
      <sec id="sec-3-1">
        <title>Example 2:</title>
        <p>Consider groups of students for each of which the distribution of marks is given,
according to the following categorical scale: a: mark &lt; 10, b: mark between 10 and 15, c:
mark &gt; 15, as summarized in Table 2.</p>
        <p>The intent, obtained by the maximum operator, of the set formed by groups
1 and 2 is $\{a(0.3), b(0.6), c(0.4)\}$ and is interpreted as "groups of students with at
most 30% of marks a, at most 60% of marks b and at most 40% of marks c".
The corresponding extent comprises groups 1, 2, 3 and 4. If, instead,
we determine the intent of the same set by the minimum operator, we obtain
$\{a(0.2), b(0.3), c(0.2)\}$, to be read as "groups of students with at least 20% of marks
a, at least 30% of marks b and at least 20% of marks c", whose extent is formed
by groups 1, 2 and 5.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>A common approach: generalization by intervals</title>
      <p>We now present a single framework for performing generalization of
numerical (real or interval-valued), ordinal and modal variables,
based on generalization by intervals.</p>
      <p>For numerical (real or interval-valued) data, we are in the above-mentioned
case of generalization by union.</p>
      <p>For modal variables, it amounts to considering, for each category, an interval
corresponding to the range of its probability/frequency. Indeed, it has often been
observed that generalization either by the maximum or by the minimum, as
defined in Section 3, may quickly lead to over-generalization; as a consequence,
$f(A)$, $A \subseteq E$, is often not very informative.</p>
      <p>Let $M_j^I = \{m_{j\ell}(I_{j\ell}), \ell = 1, \dots, k_j\}$, with $m_{j\ell} \in O_j$ and $I_{j\ell} \subseteq [0, 1]$, and let $M^I = M_1^I \times \dots \times M_p^I$. Generalization is now defined as</p>
      <p>$$f^I : \mathcal{P}(E) \to M^I, \qquad f^I(A) = (d_1, \dots, d_p), \qquad d_j = \{m_{j1}(I_{j1}), \dots, m_{jk_j}(I_{jk_j})\},$$
where $I_{j\ell} = [\min\{p^j_{\omega_i \ell}\}, \max\{p^j_{\omega_i \ell}\}]$, $\omega_i \in A$, $\ell = 1, \dots, k_j$, $j = 1, \dots, p$, and</p>
      <p>$$g^I : M^I \to \mathcal{P}(E), \qquad g^I((d_1, \dots, d_p)) = \{\omega_i \in E : p^j_{\omega_i \ell} \in I_{j\ell}, \ \ell = 1, \dots, k_j, \ j = 1, \dots, p\}.$$
The so-defined couple of mappings $(f^I, g^I)$ forms a new Galois connection.</p>
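      <p>A minimal Python sketch of $(f^I, g^I)$ for one modal variable, assuming each element is stored as a dictionary from category to frequency; the groups are hypothetical and the function names are illustrative only.</p>
      <preformat>
```python
from typing import Dict, List, Tuple

Dist = Dict[str, float]                          # category -> probability/frequency
IntervalIntent = Dict[str, Tuple[float, float]]  # category -> [min, max]


def f_interval(data: Dict[str, Dist], A: List[str]) -> IntervalIntent:
    """f^I: for each category, [min, max] of its frequency over the (non-empty) set A."""
    return {m: (min(data[w][m] for w in A), max(data[w][m] for w in A))
            for m in data[A[0]]}


def g_interval(data: Dict[str, Dist], d: IntervalIntent) -> List[str]:
    """g^I: elements whose frequency for every category lies in the intent's interval."""
    return [w for w, dist in data.items()
            if all(dist[m] >= lo and hi >= dist[m] for m, (lo, hi) in d.items())]


# Hypothetical distributions of marks a, b, c for three groups of students.
groups = {
    "G1": {"a": 0.3, "b": 0.3, "c": 0.4},
    "G2": {"a": 0.2, "b": 0.6, "c": 0.2},
    "G3": {"a": 0.1, "b": 0.5, "c": 0.4},
}

intent = f_interval(groups, ["G1", "G2"])
print(intent)                      # {'a': (0.2, 0.3), 'b': (0.3, 0.6), 'c': (0.2, 0.4)}
print(g_interval(groups, intent))  # ['G1', 'G2']
```
      </preformat>
      <p>G3, with 10% of marks a, falls outside $[0.2, 0.3]$ and is excluded, whereas it satisfies every upper bound of the maximum intent $\{a(0.3), b(0.6), c(0.4)\}$ of the same two groups.</p>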
      <p>On the data of Example 2, generalization by intervals of groups 1 and 2
provides the intent $\{a[0.2, 0.3], b[0.3, 0.6], c[0.2, 0.4]\}$, to be read as "groups of students
having between 20% and 30% cases of mark a, between 30% and 60%
cases of mark b and between 20% and 40% cases of mark c", and whose extent
now only contains groups 1 and 2.</p>
      <p>
        The case of ordinal variables has been addressed in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], performing
generalization using either the maximum or the minimum. To allow for more flexibility,
the author proposes to choose the operator individually for each variable.
Nevertheless, one of these generalization operators must be chosen in each case, and
over-generalization is not prevented. Our proposal for this type of variable is
to generalise a set $A \subseteq E$ by considering no longer a minimum or a maximum, but
rather an interval of ordinal values.
      </p>
      <p>Example 3:
Consider the marks given by four cinema critics when evaluating three
movies, Movie 1, Movie 2 and Movie 3, as given in Table 3.</p>
      <p>The intent of the group formed by critics 1 and 2 obtained by using the maximum operator
is $(5, 5, 4)$, to be interpreted as "critics giving at most mark 5 to
Movie 1, at most mark 5 to Movie 2 and at most mark 4 to Movie 3", which
is obviously too general and would cover almost everyone; in this dataset, the
corresponding extent contains critics 1, 2, 3 and 4. Therefore, the class formed
by critics 1 and 2, who present a similar behavior, does not correspond to a
concept. The intent of the group formed by critics 3 and 4 obtained by using the minimum operator
is $(1, 1, 1)$, to be read as "critics giving at least mark 1 to Movie 1,
at least mark 1 to Movie 2 and at least mark 1 to Movie 3", which would cover
every critic; its extent in this dataset again consists of critics 1, 2, 3 and 4. Here
again, the class formed by critics 3 and 4, who give quite similar marks, does not
correspond to a concept. If we now perform generalization by interval-vectors
of the group formed by critics 1 and 2, we obtain the intent $([5, 5], [4, 5], [4, 4])$;
likewise, for the group formed by critics 3 and 4, we obtain $([1, 2], [1, 2], [1, 2])$.
In the first case we are clearly referring to critics giving high marks, while in
the second case we describe critics giving low marks to all movies. The
corresponding extents no longer contain other critics presenting a rather different
profile from those considered each time. Furthermore, both
$(\{\text{Critic 1}, \text{Critic 2}\}, ([5, 5], [4, 5], [4, 4]))$ and $(\{\text{Critic 3}, \text{Critic 4}\}, ([1, 2], [1, 2], [1, 2]))$ are concepts.</p>
      <p>When determining concepts according to the minimum or the maximum
operators, e.g., in a clustering context, there is therefore a risk of forming
heterogeneous clusters, since over-generalization may lead to an overly large extent. By taking
interval-vectors of observed values, the over-generalization problem is avoided.
To conclude this section, we now present a more general example, with variables
of the different considered types.</p>
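      <p>The contrast described in Example 3 can be reproduced with the Python sketch below. The marks are hypothetical values chosen only so that the intents quoted in Example 3 are obtained; they are not the data of Table 3, and all function names are assumptions.</p>
      <preformat>
```python
from typing import Dict, List, Tuple

Marks = Tuple[int, ...]  # one ordinal mark per movie

# Hypothetical marks (Movie 1, Movie 2, Movie 3).
critics: Dict[str, Marks] = {
    "Critic 1": (5, 4, 4),
    "Critic 2": (5, 5, 4),
    "Critic 3": (1, 2, 1),
    "Critic 4": (2, 1, 2),
}


def columns(data: Dict[str, Marks], A: List[str]) -> List[Tuple[int, ...]]:
    """Marks of the critics in A, grouped by movie."""
    return list(zip(*(data[c] for c in A)))


def f_max(data, A):
    return tuple(max(col) for col in columns(data, A))


def g_max(data, d):
    # critics giving at most d[j] to every movie j
    return [c for c, m in data.items() if all(dj >= mj for mj, dj in zip(m, d))]


def f_min(data, A):
    return tuple(min(col) for col in columns(data, A))


def g_min(data, d):
    # critics giving at least d[j] to every movie j
    return [c for c, m in data.items() if all(mj >= dj for mj, dj in zip(m, d))]


def f_interval(data, A):
    return tuple((min(col), max(col)) for col in columns(data, A))


def g_interval(data, d):
    # critics whose mark for every movie j lies in the interval d[j]
    return [c for c, m in data.items()
            if all(mj >= lo and hi >= mj for mj, (lo, hi) in zip(m, d))]


top, low = ["Critic 1", "Critic 2"], ["Critic 3", "Critic 4"]
print(f_max(critics, top), g_max(critics, f_max(critics, top)))
print(f_interval(critics, top), g_interval(critics, f_interval(critics, top)))
```
      </preformat>
      <p>With these marks, the maximum operator gives $(5, 5, 4)$ for critics 1 and 2 with an extent of all four critics, whereas the interval intent $([5, 5], [4, 5], [4, 4])$ has extent $\{\text{Critic 1}, \text{Critic 2}\}$. Interval generalization of ordinal marks is computed exactly as the union for numerical data; only the order of the mark scale is used.</p>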
      <sec id="sec-4-1">
        <title>Example 4:</title>
        <p>Consider the data in Table 4, where four persons are described by their age (a
real-valued variable), the time (in minutes) they take to go to work (an
interval-valued variable), the means of transportation they use (a modal variable) and the
ratings they give to three newspapers, A, B and C (ordinal variables).</p>
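        <p>[Table 4: four persons described by age (25, 40, 32, 58), time taken to go to work, means of transportation used and ratings of newspapers A, B and C.]</p>
      </sec>
      <sec id="sec-4-2">
        <title>Concept stability</title>
        <p>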
Concepts are theoretically very interesting and provide rich information on
the values shared by subsets of elements of the set under study. However, the
number of concepts of a data array is often rather large, even for relatively small
numbers of elements and variables. This makes the analysis
and interpretation of results rather delicate. It is often noticed that, when
analyzing the concepts generated by numerical or modal variables, groups of
quite similar concepts appear. This may be due to noise or to minor
differences, which are generally not relevant. The idea is therefore to extract only those
concepts which are representative of these groups of similar concepts, so as to
obtain a more concise representation with significantly homogeneous concepts.</p>
        <p>
          Several solutions may be envisaged to achieve this objective. We will focus on the
notion of stability, as introduced in [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] and [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ], which evaluates the amount of
information of the intent that depends on specific objects of the concept's extent.
Formally, the stability of a concept is defined as the probability of keeping its
intent unchanged while deleting arbitrarily chosen objects of its extent.
        </p>
        <p>When analyzing data described by numerical (real or interval-valued), ordinal
or modal variables, and generalizing using interval-vectors (as described in the
previous sections), we apply a similar approach to each formed concept, but
introduce a distance measure. Since the objective is to retain homogeneous
concepts, we wish to avoid that a single element of a concept's extent
produces a large increase in the ranges of the intent's intervals.</p>
        <p>To identify the stable concepts, a threshold depending on the maximum
distance is defined (so as not to depend on the variables' scales). A
concept is said to be "stable" if the distance between its original intent and each of the
intents obtained by removing one element of the extent at a time does not exceed the
given threshold. This is in fact a cross-validation-like approach, in
that one element of the extent is removed at a time, and the resulting intent is
compared with the original one.</p>
        <p>When data have an interval form, interval distances should be used.
Different measures are available in the literature; we will focus on three interval
distance measures: the Hausdorff distance, the interval Euclidean distance and
the interval City-Block distance.</p>
        <p>Let $I_i = [l_i, u_i]$ and $I_h = [l_h, u_h]$ be two intervals we wish to compare. The
Hausdorff distance $d_H$, the interval Euclidean distance $d_2$ and the interval
City-Block distance $d_1$ between $I_i$ and $I_h$ are, respectively,
$$d_H(I_i, I_h) = \max\{|l_i - l_h|, |u_i - u_h|\},$$
$$d_2(I_i, I_h) = \sqrt{(l_i - l_h)^2 + (u_i - u_h)^2},$$
$$d_1(I_i, I_h) = |l_i - l_h| + |u_i - u_h|.$$</p>
        <p>The Hausdorff distance between two sets is the largest distance from a point of either set
to the nearest point of the other set, i.e., two sets are close in terms of the
Hausdorff distance if every point of either set is close to some point of the other
set. The interval Euclidean and City-Block distances are just the counterparts of the
corresponding distances for real values: if we embed the set of intervals in $\mathbb{R}^2$, where
one dimension is used for the lower and the other for the upper bound of the
intervals, then these distances are just the Euclidean and City-Block distances
between the corresponding points in the two-dimensional space.</p>
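        <p>The three distances translate directly into code. The Python sketch below (function names are our own) also evaluates the set-based Hausdorff distance on sampled points, to check numerically that for intervals it reduces to the largest difference between corresponding bounds.</p>
        <preformat>
```python
import math
from typing import Tuple

Interval = Tuple[float, float]  # (lower bound, upper bound)


def d_hausdorff(a: Interval, b: Interval) -> float:
    """d_H: largest absolute difference between corresponding bounds."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def d_euclidean(a: Interval, b: Interval) -> float:
    """d_2: Euclidean distance between the points (l, u) in R^2."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def d_cityblock(a: Interval, b: Interval) -> float:
    """d_1: City-Block distance between the points (l, u) in R^2."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def hausdorff_sampled(a: Interval, b: Interval, n: int = 1000) -> float:
    """Set-based Hausdorff distance, evaluated on n + 1 evenly spaced points of each interval."""

    def gap(x: float, I: Interval) -> float:
        # distance from the point x to the nearest point of the interval I
        return max(I[0] - x, 0.0, x - I[1])

    xs = [a[0] + (a[1] - a[0]) * k / n for k in range(n + 1)]
    ys = [b[0] + (b[1] - b[0]) * k / n for k in range(n + 1)]
    return max(max(gap(x, b) for x in xs), max(gap(y, a) for y in ys))


I1, I2 = (1.0, 2.0), (4.0, 6.0)
print(d_hausdorff(I1, I2), d_euclidean(I1, I2), d_cityblock(I1, I2))  # 4.0 5.0 7.0
print(hausdorff_sampled(I1, I2))                                      # 4.0
```
        </preformat>
        <p>The sampled value coincides with $d_H$ because the distance from a point to an interval is convex in the point, so its maximum over an interval is attained at an endpoint, and the endpoints are always among the sampled points.</p>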
        <p>Let $C = (A, D)$ be a concept, where $A = \{\omega_1, \dots, \omega_h\} \subseteq E$ is its extent
and $D = (I_1, \dots, I_p)$ is its intent, $D = f(A)$. The considered criterion is then
based on the distance between $D$ and $D_{-i}$, where $D_{-i} = f(A \setminus \{\omega_i\})$ is the intent of $A$
without $\omega_i$, $i = 1, \dots, h$; it is defined by
$$\Delta(C) = \max\{\delta(D, D_{-i}), \omega_i \in A\},$$
where $\delta$ measures the dissimilarity between interval-vectors.</p>
        <p>Let $d$ be the distance (according to the chosen measure) between the intervals
corresponding to variable $Y_j$ in a concept's intent. Two options may then be
foreseen for $\delta$, depending on whether we wish to consider the maximal or the average distance
over the intervals defining the intents:</p>
        <list list-type="order">
          <list-item>
            <p>$\delta_{\mathrm{Max}}(D, D_{-i}) = \max_j\{d(I_j, I_j^{-i})\}$, where $j$ indexes the variable set $Y_j$, $j = 1, \dots, p$,
in the case of numerical and ordinal variables, and the global category set
$O = O_1 \cup \dots \cup O_p$ in the case of $p$ modal variables;</p>
          </list-item>
          <list-item>
            <p>$\delta_{\mathrm{Mean}}(D, D_{-i}) = \operatorname{mean}_j\{d(I_j, I_j^{-i})\}$, with $j$ as in 1.</p>
          </list-item>
        </list>
        <p>A concept $C = (A, D)$ is then considered to be stable if $\Delta(C) \leq \alpha$, where $\alpha$ is the chosen threshold. This
approach allows keeping only the stable, and therefore more representative,
concepts, avoiding the effect of outlier observations.</p>
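        <p>The stability check can be sketched as a leave-one-out loop. In the Python sketch below, intents are computed by interval generalization, the per-interval distance defaults to the Hausdorff distance, and the <monospace>aggregate</monospace> parameter plays the role of the Max or Mean option; the criterion returned is the largest dissimilarity between the intent and each leave-one-out intent. The marks are hypothetical, the function names are assumptions, and singleton extents are rejected because this sketch does not define the intent of the empty set.</p>
        <preformat>
```python
from typing import Dict, List, Sequence, Tuple

Interval = Tuple[float, float]
Intent = Tuple[Interval, ...]


def d_hausdorff(a: Interval, b: Interval) -> float:
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def f_union(data: Dict[str, Intent], A: Sequence[str]) -> Intent:
    """Interval generalization: per variable, [min of lower bounds, max of upper bounds]."""
    p = len(next(iter(data.values())))
    return tuple(
        (min(data[w][j][0] for w in A), max(data[w][j][1] for w in A))
        for j in range(p)
    )


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


def criterion(data, A, d=d_hausdorff, aggregate=max) -> float:
    """Largest dissimilarity between f(A) and f(A without w), over all w in A."""
    if len(A) in (0, 1):
        raise ValueError("leave-one-out intent is undefined for this extent")
    D = f_union(data, A)
    worst = 0.0
    for w in A:
        D_minus = f_union(data, [x for x in A if x != w])
        # per-variable distances, aggregated by max (Max option) or mean (Mean option)
        delta = aggregate([d(I, J) for I, J in zip(D, D_minus)])
        worst = max(worst, delta)
    return worst


def is_stable(data, A, alpha, d=d_hausdorff, aggregate=max) -> bool:
    return alpha >= criterion(data, A, d, aggregate)


# Hypothetical marks (Movie 1, Movie 2, Movie 3) stored as degenerate intervals.
data = {
    "Critic 1": ((5, 5), (4, 4), (4, 4)),
    "Critic 2": ((5, 5), (5, 5), (4, 4)),
    "Critic 5": ((1, 1), (5, 5), (4, 4)),
}
pair = ["Critic 1", "Critic 2"]
trio = ["Critic 1", "Critic 2", "Critic 5"]
print(criterion(data, pair), criterion(data, trio), criterion(data, trio, aggregate=mean))
```
        </preformat>
        <p>With these marks, the extent of critics 1 and 2 has criterion value 1 and is stable for threshold 1, while adding the atypical Critic 5 stretches the Movie 1 interval from $[5, 5]$ to $[1, 5]$ and raises the value to 4 under the Max option; under the Mean option the same extent gives $4/3$, which illustrates why Mean retains more concepts.</p>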
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Illustrative application</title>
      <p>Consider again the marks given by cinema critics evaluating three movies,
Movie 1, Movie 2 and Movie 3, where $Y_j(\text{Critic}_i)$ is the mark given by Critic $i$
to Movie $j$, $i = 1, \dots, 5$, $j = 1, 2, 3$, as given in Table 5.</p>
      <p>Tables 6 and 7 list the concepts obtained when the Minimum and the
Maximum generalization operators are used, respectively.</p>
      <p>The concepts (except for the one with empty extent) obtained from this data
table using generalization by intervals, i.e., for $A \subseteq E$, $f(A) = (I_1, I_2, I_3)$ with
$I_j = [\min\{Y_j(\text{Critic}_i)\}, \max\{Y_j(\text{Critic}_i)\}]$, $\text{Critic}_i \in A$, $j = 1, 2, 3$, are listed
in Table 8.</p>
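      <p>For small tables, the concepts obtained by interval generalization can be enumerated by brute force: every non-empty subset $A$ is closed into $(g(f(A)), f(A))$, which is a concept because $f(g(f(A))) = f(A)$ for a Galois connection. The Python sketch below does this with hypothetical marks, chosen so that Critics 1, 3 and 4 agree with the intents discussed next (the marks of Critics 2 and 5 are arbitrary). It visits all $2^n - 1$ subsets, so it is only meant for small examples; function names are illustrative.</p>
      <preformat>
```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Marks = Tuple[int, ...]
IntervalIntent = Tuple[Tuple[int, int], ...]

# Hypothetical marks (Movie 1, Movie 2, Movie 3).
critics: Dict[str, Marks] = {
    "Critic 1": (3, 2, 3),
    "Critic 2": (2, 1, 4),
    "Critic 3": (5, 5, 1),
    "Critic 4": (4, 3, 2),
    "Critic 5": (1, 4, 5),
}


def f_interval(data: Dict[str, Marks], A: Sequence[str]) -> IntervalIntent:
    """Per movie, [min, max] of the marks given by the critics in A."""
    return tuple((min(col), max(col)) for col in zip(*(data[c] for c in A)))


def g_interval(data: Dict[str, Marks], d: IntervalIntent) -> List[str]:
    """Critics whose mark for every movie lies in the corresponding interval."""
    return [c for c, m in data.items()
            if all(x >= lo and hi >= x for x, (lo, hi) in zip(m, d))]


def all_concepts(data: Dict[str, Marks]) -> Dict[Tuple[str, ...], IntervalIntent]:
    """Map each extent g(f(A)) to its intent f(A), for every non-empty subset A."""
    found: Dict[Tuple[str, ...], IntervalIntent] = {}
    names = list(data)
    for r in range(1, len(names) + 1):
        for A in combinations(names, r):
            intent = f_interval(data, A)
            found[tuple(g_interval(data, intent))] = intent
    return found


concepts = all_concepts(critics)
for extent, intent in sorted(concepts.items(), key=lambda kv: (len(kv[0]), kv[0])):
    print(extent, intent)
```
      </preformat>
      <p>Among the concepts found with these marks are $(\{\text{Critic 1}\}, ([3, 3], [2, 2], [3, 3]))$ and $(\{\text{Critic 3}, \text{Critic 4}\}, ([4, 5], [3, 5], [1, 2]))$.</p>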
      <p>We notice that all the concepts obtained using the Minimum or the Maximum
operator are also concepts for the interval generalization, although with a different
meaning, given the different intent mapping. As discussed before, even in this
small example it may be observed that concepts obtained using the Minimum
or the Maximum operator often present a rather general intent, thus leading to
over-generalization in the concept formation. Consider, for instance, the concept
$(\{1\}, (\text{Movie 1} \geq 3, \text{Movie 2} \geq 2, \text{Movie 3} \geq 3))$ in Table 6: it indicates that
Critic 1 gives high marks to each movie, which is not really the case, whereas
the concept $(\{1\}, (\text{Movie 1} \in [3, 3], \text{Movie 2} \in [2, 2], \text{Movie 3} \in [3, 3]))$ in Table
8 gives a much more accurate description of the concept's extent. Also, the concept
$(\{3\}, (\text{Movie 1} \leq 5, \text{Movie 2} \leq 5, \text{Movie 3} \leq 1))$ in Table 7 describes Critic 3
as giving any marks to Movies 1 and 2, and low marks to Movie 3; using interval
generalization, we learn that the marks given by Critic 3 to Movies 1 and 2 are
the highest ones and none other. Consider now the concept
$(\{3, 4\}, (\text{Movie 1} \geq 4, \text{Movie 2} \geq 3, \text{Movie 3} \geq 1))$ in Table 6: the intent allows any mark for Movie 3 (in
particular, high marks are possible); if we use interval generalization instead, we
obtain the concept $(\{3, 4\}, (\text{Movie 1} \in [4, 5], \text{Movie 2} \in [3, 5], \text{Movie 3} \in [1, 2]))$,
which describes the observed situation more accurately.</p>
      <p>We now compare the concepts retained as stable with each of the three
distances, using both the Max and the Mean criteria, and threshold values of 1 and 2. The
stable concepts identified in each case, represented by the corresponding extent,
are listed in Table 9.</p>
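      <p>[Table 9: stable concepts, represented by their extents, for each distance ($d_H$, $d_2$, $d_1$), criterion (Max, Mean) and threshold (1, 2).]</p>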
      <p>As can be seen from Table 9, for all distances and both criteria, a
demanding threshold identifies a small number of stable concepts, thus leading to
a substantial reduction in the number of retained concepts; with a more
liberal threshold, a larger number of concepts are retained as stable, as was to
be expected. The maximum criterion is naturally stricter than the mean,
which retains more concepts as stable, for all distances and both threshold
values. Finally, in this example, no important difference appears between the results
obtained for the different distance measures.</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>A common generalization procedure for numerical, ordinal and modal variables,
which uses a representation based on interval-vectors, has been presented. It allows
defining more homogeneous concepts than generalization operators that use the
maximum and/or the minimum. The proposed approach for ordinal variables
makes it possible to address recommendation systems by analyzing preference data tables. It
would also be interesting to explore how the proposed generalization operator
behaves in a supervised learning context.</p>
      <p>Since the number of obtained concepts is often rather large, a method for
identifying stable concepts has been proposed, using a cross-validation-like approach.
This allows avoiding the effect of atypical elements in the concepts' formation.
Naturally, the value of the threshold used has an important influence on the
rate of concept reduction. The next step will be to explore this methodology for
larger data tables, so as to obtain a more accurate evaluation of its efficiency in
concept reduction. Another interesting issue to investigate is the comparison
of the list of concepts with those obtained with a subset of the given variables.
This leads to the problem of variable selection in the context of Galois
lattice construction and analysis. As concerns applications, we are particularly
interested in analyzing real preference data for use in recommendation
systems.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Assaghir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kaytoue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Messai</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Napoli</surname>
          </string-name>
          (
          <year>2009</year>
          ).
          <article-title>On the mining of numerical data with Formal Concept Analysis and similarity</article-title>
          .
          In
          <source>Proc. Société Francophone de Classification</source>
          , pp.
          <fpage>121</fpage>
          -
          <lpage>124</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Barbut</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          and
          <string-name>
            <given-names>B.</given-names>
            <surname>Monjardet</surname>
          </string-name>
          (
          <year>1970</year>
          ).
          <article-title>Ordre et Classification</article-title>
          ,
          <source>Algèbre et Combinatoire</source>
          , Tomes I et II. Paris: Hachette.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Brito</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          (
          <year>1994</year>
          ).
          <article-title>Order structure of symbolic assertion objects</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          <volume>6</volume>
          (
          <issue>5</issue>
          ),
          <fpage>830</fpage>
          -
          <lpage>835</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Brito</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          and G.
          <string-name>
            <surname>Polaillon</surname>
          </string-name>
          (
          <year>2005</year>
          ).
          <article-title>Structuring probabilistic data by Galois lattices</article-title>
          .
          <source>Math. &amp; Sci. Hum. / Mathematics and Social Sciences</source>
          <volume>169</volume>
          (
          <issue>1</issue>
          ),
          <fpage>77</fpage>
          -
          <lpage>104</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Ganter</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          and
          <string-name>
            <given-names>S.O.</given-names>
            <surname>Kuznetsov</surname>
          </string-name>
          (
          <year>2001</year>
          ).
          <article-title>Pattern structures and their projections</article-title>
          . In: G. Stumme and H.
          <string-name>
            <surname>Delugach</surname>
          </string-name>
          (Eds.),
          <source>Proc. 9th Int. Conf. on Conceptual Structures, ICCS'01, Lecture Notes in Artificial Intelligence</source>
          , vol.
          <volume>2120</volume>
          , pp.
          <fpage>129</fpage>
          -
          <lpage>142</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Ganter</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Wille</surname>
          </string-name>
          (
          <year>1999</year>
          ).
          <source>Formal Concept Analysis: Mathematical Foundations</source>
          . Berlin: Springer.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Kaytoue</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>S.O.</given-names>
            <surname>Kuznetsov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Napoli</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Duplessis</surname>
          </string-name>
          (
          <year>2011</year>
          ).
          <article-title>Mining gene expression data with pattern structures in formal concept analysis</article-title>
          .
          <source>Information Sciences</source>
          <volume>181</volume>
          (
          <issue>10</issue>
          ),
          <fpage>1989</fpage>
          –
          <lpage>2001</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Kuznetsov</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (
          <year>2007</year>
          ).
          <article-title>On stability of a formal concept</article-title>
          .
          <source>Annals of Mathematics and Artificial Intelligence</source>
          <volume>49</volume>
          (
          <issue>1-4</issue>
          ),
          <fpage>101</fpage>
          –
          <lpage>115</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Noirhomme-Fraiture</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Brito</surname>
          </string-name>
          (
          <year>2011</year>
          ).
          <article-title>Far beyond the classical data models: Symbolic Data Analysis</article-title>
          .
          <source>Statistical Analysis and Data Mining</source>
          <volume>4</volume>
          (
          <issue>2</issue>
          ),
          <fpage>157</fpage>
          –
          <lpage>170</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Pernelle</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>M.-C.</given-names>
            <surname>Rousset</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V.</given-names>
            <surname>Ventos</surname>
          </string-name>
          (
          <year>2001</year>
          ).
          <article-title>Automatic construction and refinement of a class hierarchy over multi-valued data</article-title>
          . In:
          <string-name>
            <given-names>L.</given-names>
            <surname>De Raedt</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Siebes</surname>
          </string-name>
          (Eds.),
          <source>Principles of Data Mining and Knowledge Discovery, Lecture Notes in Computer Science</source>
          , pp.
          <fpage>386</fpage>
          –
          <lpage>398</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Pfaltz</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>2007</year>
          ).
          <article-title>Representing numeric values in concept lattices</article-title>
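          . In:
          <string-name>
            <given-names>J.</given-names>
            <surname>Diatta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Eklund</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Liquiere</surname>
          </string-name>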
          (Eds.),
          <source>Proc. Fifth International Conference on Concept Lattices and Their Applications</source>
          , pp.
          <fpage>260</fpage>
          –
          <lpage>269</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Roth</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Obiedkov</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Kourie</surname>
          </string-name>
          (
          <year>2008</year>
          ).
          <article-title>On succinct representation of knowledge community taxonomies with Formal Concept Analysis</article-title>
          .
          <source>International Journal of Foundations of Computer Science</source>
          <volume>19</volume>
          (
          <issue>2</issue>
          ),
          <fpage>383</fpage>
          –
          <lpage>404</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>