<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Mining Biclusters of Similar Values with Triadic Concept Analysis</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mehdi Kaytoue</string-name>
          <email>kaytoue@dcc.ufmg.br</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sergei O. Kuznetsov</string-name>
          <email>kuznetsovs@yandex.ru</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Juraj Macko</string-name>
          <email>juraj.macko@upol.cz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Wagner Meira Jr.</string-name>
          <email>meira@dcc.ufmg.br</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Amedeo Napoli</string-name>
          <email>napoli@loria.fr</email>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Palacky University</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Universidade Fereral de Minas Gerais</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Biclustering numerical data became a popular data-mining task in the beginning of 2000's, especially for analysing gene expression data. A bicluster re ects a strong association between a subset of objects and a subset of attributes in a numerical object/attribute data-table. So called biclusters of similar values can be thought as maximal sub-tables with close values. Only few methods address a complete, correct and non redundant enumeration of such patterns, which is a well-known intractable problem, while no formal framework exists. In this paper, we introduce important links between biclustering and formal concept analysis. More speci cally, we originally show that Triadic Concept Analysis (TCA), provides a nice mathematical framework for biclustering. Interestingly, existing algorithms of TCA, that usually apply on binary data, can be used (directly or with slight modi cations) after a preprocessing step for extracting maximal biclusters of similar values.</p>
      </abstract>
      <kwd-group>
        <kwd>Triadic concept analysis</kwd>
        <kwd>numerical biclustering</kwd>
        <kwd>scaling</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>Numerical data biclustering mainly appeared in the beginning of 2000's as a
rst answer to new challenges raised by biological data analysis, and especially
gene expression data analysis [13]. Starting from an object/attribute numerical
data-table (e.g. Table 1), the goal is to group together some objects with some
attributes according to the values taken by these attributes for these objects [13].
Accordingly, a bicluster is formally de ned as a pair composed of a set of
objects and a set of attributes. Such pair can be represented as a rectangle in the
numerical table, modulo lines and columns permutations. Table 1 is a numerical
dataset with objects in lines and attributes in columns, while each table entry
corresponds to the value taken by the attribute in column for the object in line.
Table 2 illustrates bicluster (fg1; g2; g3g; fm1; m2; m3g) as a grey rectangle.</p>
      <p>
        There are several types of biclusters in the literature (see [13] for a survey),
depending on the relation between the values taken by their attributes for their
objects. The most simple case can be understood as rectangles of equal
values: a bicluster corresponds to a set of objects whose values taken by a same
set of attributes are exactly the same, e.g. (fg1; g2; g3g; fm5g). Constant
biclusters only appear in idyllic situations: generally numerical data are noisy.
Accordingly, a straightforward generalization of such biclusters lies in so called
biclusters of similar values: they are represented by rectangles with almost
identical, say similar, values [
        <xref ref-type="bibr" rid="ref1 ref7">13, 1, 7</xref>
        ]. Table 2 illustrates a bicluster of similar values
(fg1; g2; g3g; fm1; m2; m3g) where two values are said to be similar if their
difference is no more than 1. Moreover, this bicluster is maximal: neither an object
nor an attribute can be added without violating the similarity condition.
      </p>
      <p>
        Only few methods address a complete, correct and non redundant
enumeration of such patterns [
        <xref ref-type="bibr" rid="ref1 ref7">1, 7</xref>
        ], which is a well-known intractable problem [13],
while no formal framework exists. In this paper, we show that Formal Concept
Analysis (FCA) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], and especially Triadic Concept Analysis (TCA) [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]
provides a suitable and well de ned framework for this task: Basically, an object
has an attribute under a condition (a value). After a simple scaling procedure
(turning original data into binary), a bicluster is represented as a triadic
concept, composed of a set of objects, a set of attributes (both characterizing the
corresponding \rectangle") and a set of values. All sets are maximal thanks to
existing concept forming derivation operators of TCA. This comes with several
advantages:
{ Two values w1; w2 of the original data are said to be similar i their di erence
does not exceed a given parameter . In this case, we write w1 ' w2 ()
jw1 w2j . Otherwise, we write w1 6' w2. The trilattice produced with
TCA after scaling gives all maximal biclusters of similar values for any
ordered w.r.t. similarity of their values.
{ The well known notion of frequency takes a semantics w.r.t. similarity of
values. For example, let (A; B; C) be a triconcept, where A is a set of objects,
B a set of attributes, and C a set of similar values. Assume (A; B) to be the
corresponding bicluster. The higher jCj, the more similar are the values of
the bicluster. If all jA , Bj, and jCj are high we obtain a bicluster represented
j j
as a large rectangle of close values.
{ Existing algorithms from TCA [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and n-ary closed set mining [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] can be used
directly after scaling. We also provide a new algorithm to compute biclusters
maximal only for a given (see algorithm TriMax later on).
{ Both scaling procedure and algorithm TriMax computations can be directly
distributed to several computing cores.
{ The method can be adapted to n-ary numerical datasets. For example, with
n = 3, a n-cluster would be a maximal 3D-box of similar values. It can be
applied to 3D gene expression data, monitoring the behaviour of genes in
di erent samples over time. It follows that mining n-dimensional clusters
can be achieved with n + 1-adic concept analysis.
      </p>
      <p>The paper is organized as follows. Firstly, preliminaries regarding TCA are
presented in Section 2. Then Section 3 formally states the problem. It is followed
by the description of our two methods, respectively in Section 4 and 5. The
rst shows how TCA can help characterizing all maximal biclusters for any
, while the second restricts the problem to a user-given . This is followed
by experiments on the proposed approaches. Finally, the paper ends with a
discussion and perspectives of further research.
hKi; Kj ; YAijk i, where (ai; aj ) 2 YAijk i ai; aj ; ak are related by Y for all ak 2 Ak.</p>
      <p>
        From a computational point of view, [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] developed the algorithm Trias for
extracting frequent triadic concepts, i.e. whose extent, intent and modus
cardinalities are higher than user-de ned thresholds (see also [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]). Cerf et al. presented
a more e cient algorithm called Data-peeler able to handle n-ary relations [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]
while formal de nitions lie in so called Polyadic Concept Analysis [14].
3
      </p>
    </sec>
    <sec id="sec-2">
      <title>Notations and problem settings</title>
      <p>
        A numerical dataset is realized by a many-valued context [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and we de ne
accordingly (maximal) biclusters of similar values.
      </p>
      <p>De nition 1 (Many-valued context). Let G be a set of objects, M be a set
of attributes, W be the set of attribute values and I be a ternary relation de ned
on the Cartesian product G M W . The fact (g; m; w) 2 I, also written
m(g) = w, means that \Attribute m takes the value w for the object g". The
tuple (G; M; W; I) is called many-valued context, or simply numerical dataset in
this paper.</p>
      <p>Example 1. Table 1 is a numerical dataset, or many-valued context, with objects
G = fg1; g2; g3; g4g, attributes M = fm1; m2; m3; m4; m5g, W = f0; 1; 2; 6; 7; 8; 9g
and for example m5(g2) = 6.</p>
      <p>De nition 2 (Bicluster). In a numerical dataset (G; M; W; I), a bicluster is
a tuple (A; B) with A G and B M .</p>
      <p>De nition 3 (Similarity relation and bicluster of similar values). Let
w1; w2 2 W be two attribute values and 2 N be a user-de ned parameter,
called similarity parameter. w1 and w2 are said to be similar i jw1 w2j
and we note w1 ' w2. (A; B) is bicluster of similar values if m(g) ' n(h) for
all g; h 2 A and for all m; n 2 B.</p>
      <p>De nition 4 (Maximal bicluster of similar values). A bicluster of similar
values (A; B) is maximal if adding either an object in A or an attribute in B
does not result in a bicluster of similar values.</p>
      <p>Example 2 (From Table 1). (fg1; g4g; fm2; m4g) is a bicluster. (fg1; g2g; fm2g)
is a bicluster of similar values with 1. However, it is not maximal. With
1 &lt; 5, (fg1; g2; g3g; fm1; m2; m3g) is maximal. Finally, with = 7 the
bicluster (fg1; g2; g3g; fm1; m2; m3; m4; m5g) is maximal. Note that a constant
(maximal) bicluster is a (maximal) bicluster of similar values with = 0.</p>
      <p>
        Thus the problem that we address in this paper is the extraction of all
maximal biclusters of similar values from a numerical dataset. We desire the
extraction to be complete, correct and non-redundant compared to several existing
methods of the literature based on heuristics [13]. For that matter, we
propose in the next section a rst method aiming at extracting biclusters for any
similarity parameter . This method establishes new links between biclustering
and FCA in general, and TCA in particular. Then, the present methodology is
adapted to characterize and extract biclusters that are maximal for a given
only as usually done in the literature [
        <xref ref-type="bibr" rid="ref1 ref7">1, 7, 13</xref>
        ].
      </p>
      <p>Biclusters of similar values in Triadic Concept Analysis
Firstly, we consider the problem of generating maximal biclusters for any .
Starting from a numerical dataset (G; M; W; I), the basic idea lies in building a
triadic context (G; M; T; Y ) where the two rst dimensions remain formal objects
and formal attributes, while W is scaled into a third dimension denoted by T .
This new dimension T is called the scale dimension: intuitively, it gives di erent
\spaces of values" that each object-attribute pair (g; m) 2 G M can take. Once
the scale is given, a triadic context is derived from which triadic concepts are
characterized.</p>
      <p>
        We use the interordinal scaling [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] to build the scale dimension. It allows to
encode in 2T all possible intervals of values in W . This scale allows to derive a
triadic context from which any bicluster of similar values can be characterized
as a triadic concept. We made more precise these statements and illustrate the
whole procedure with examples.
      </p>
      <p>De nition 5 (Interordinal Scaling). A scale is a binary relation J W T
associating original elements from the set of values W to their derived
elements in T . In the case of interordinal scaling, T = f[min(W ); w]; 8w 2 W g [
f[w; max(W )]; 8w 2 W g. Then (w; t) 2 J i w 2 t.</p>
      <p>Example 3. Table 3 gives the tabular representation of the interordinal scale for
Table 1. Intuitively, each line describes a single value, while dyadic concepts
represent all possible intervals over W . An example of dyadic concept in this
table is given by (f6; 7; 8g; ft6; t7; t8; t9; t10g), rewritten as (f6; 7; 8g; f[6; 8]g) since
ft6; t7; t8; t9; t10g represents the interval [0; 8] \ [0; 9] \ [1; 9] \ [2; 9] \ [6; 9] = [6; 8].
];0 ];1 ];2 ];6 ];7 ];8 ];9 ];9 ];9 ];9 ];9 ];9 ];9
[0 [0 [0 [0 [0 [0 [0 [1 [2 [6 [7 [8 [9
Example 4. The object-attribute pair (g1; m1) taking value m1(g1) = 1 is scaled
into triples (g1; m1; t) 2 Y where t takes any interval in f[0; 1]; [0; 2]; [0; 6]; [0; 7],
[0; 8]; [0; 9]; [1; 9]g. The intersection of intervals in this set is the original value
itself, i.e. m1(g1) = 1, a basic property of interordinal scaling. As a result, Table 4
illustrates the whole scaled triadic context derived from the numerical dataset
given in Table 1 using interordinal scale. The very rst cross ( ) in this table
(upper left) represents the tuple (g2; m4; t1), meaning that m4(g2) 2 [0; 0].</p>
      <p>We present now our rst main result: there is a one-to-one correspondence
between (i) the set of maximal biclusters of similar values in a given numerical
dataset for any similarity parameter and (ii) the set of all triadic concepts in
the triadic context derived with interordinal scaling.</p>
      <p>Proposition 1. Tuple hA; B; U i, where A G, B G and U
concept i (A; B) is a maximal bicluster of similar values for some
T is triadic
0.</p>
      <p>
        Proof. We leave the proof in the Appendix of the paper since we need to
introduce notations and propositions not necessary in the rest of the paper.
Example 5. For example, (fg1; g2; g3g; fm1; m2; m3g; ft3; t4; t5; t6; t7; t8g) is a
triadic concept from the context depicted in Table 4. It corresponds to the maximal
bicluster (fg1; g2; g3g; fm1; m2; m3g) with = 1. = 1 since ft3; t4; t5; t6; t7; t8g
is maximal (it is a modus), it corresponds to interval [1; 2] and naturally 2 1 = 1
is the length of this interval.
scaling, the more intervals in U , the smaller their interval intersection. Mining
so called top-k frequent triadic concepts can accordingly be achieved with the
existing algorithm Data-Peeler [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        On another hand, extracting maximal biclusters for all may be neither
e cient nor e ective with large numerical data: their number tends to be very
large and all biclusters are not relevant for a given analysis. Furthermore, both
size and density of contexts derived with interordinal scaling are known to be
problematic w.r.t algorithmic scalability, see e.g. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. In existing methods of the
literature, is set a priori. We show now how to handle this case with slight
modi cations, our second main result.
5
      </p>
      <p>
        Extracting biclusters of similar values for a given
In this section we consider the problem of extracting maximal biclusters of
similar values in TCA for a given only. It comes with slight modi cations of the
methodology presented in last section. Intuitively, consider the previous scaling
applied on a numerical dataset (G; M; W; I). It scales W into dimension T and
subsets of T characterize all intervals of values over W . To get maximal biclusters
for a given only, we should not consider all possible intervals in W , but rather
all intervals (i) having a range size that is less or equal than to avoid biclusters
with non similar values, and (ii) having a range size the closest as possible to
to avoid non-maximal biclusters. For example, if we set = 2, it is probably
not interesting to consider interval [0; 8] in the scale dimension since 8 0 &gt; .
Similarly, considering the interval [6; 6] may not be interesting as well, since a
bicluster with all its values equal to 6 may not be maximal. As introduced in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ],
those maximal intervals of similar values used for the scale are called blocks of
tolerance over the set of numbers W with respect to the tolerance relation ' .
      </p>
      <p>Therefore we rstly recall basics on tolerance relations over a set of numbers.
It allows us to de ne a simpler scaling procedure. The resulting triadic context
is then mined with a new TCA algorithm called TriMax to extract maximal
biclusters of similar values for a given .</p>
      <p>Blocks of tolerance over W are de ned as maximal sets of pairwise similar
values from W :
De nition 7 (Tolerance blocks from a set of numbers). The similarity
relation ' is called a tolerance relation, i.e. re exive, symmetric but not
transitive. Given a set W of values, a subset V W , and a tolerance relation '
over W , V is a block of tolerance if:
(i) 8w1; w2 2 V; w1 ' w2 (pairwise similarity)
(ii) 8w1 62 V; 9w2 2 V; w1 6' w2 (maximality).</p>
      <p>From Table 1 we have W = f0; 1; 2; 6; 7; 8; 9g. With = 2, one has 0 '2 2 but
2 6'2 6. Accordingly, one obtains 3 blocks of tolerance, namely the sets f0; 1; 2g,
f6; 7; 8g and f7; 8; 9g. These three sets can be renamed as the convex hull of their
elements on N: respectively, [0; 2], [6; 8] and [7; 9]: any number lying between the
minimal and the maximal elements (w.r.t. natural number ordering) of a block
of tolerance is naturally similar to any other element of the block.</p>
      <p>To derive a triadic context from a numerical dataset, we simply use tolerance
blocks over W to de ne the scale dimension.</p>
      <p>De nition 8 (Trimax scale relation). The scale relation is a binary relation
J W C, where C is the set of blocks of tolerance over W renamed as their
convex hulls. Then, (w; c) 2 J i w 2 c.</p>
      <p>Example 6. From Table 1 we have: C = f[0; 1]; [1; 2]; [6; 7]; [7; 8]; [8; 9]g with
1, and C = f[0; 2]; [6; 8]; [7; 9]g with = 2.
=</p>
      <p>Then, we can apply the same context derivation as in previous section: scaling
is still based on intervals, but this time it uses tolerance blocks.</p>
      <p>De nition 9 (TriMax triadic scaled context). Let Y G M C be a
ternary relation. Then (g; m; c) 2 Y i (m(g); c) 2 J , or simply m(g) 2 c, where
J is the scale relation. (G; M; C; Y ) is called the TriMax triadic scaled context.
Example 7. Table 5 is the Trimax triadic scaled concept derived from the
numerical dataset lying in Table 1 with = 1.</p>
      <p>label 1 label 2 label 3 label 4 label 5
[0; 1] [1; 2] [6; 7] [7; 8] [8; 9]
m1 m2 m3 m4 m5 m1 m2 m3 m4 m5 m1 m2 m3 m4 m5 m1 m2 m3 m4 m5 m1 m2 m3 m4 m5</p>
      <p>De nition 10 (Dyadic context associated with a block of tolerance).
Consider a block of tolerance c 2 C. The dyadic context associated with this block
is given by (G; M; Z) where z 2 Z denotes all (g; m) 2 G M such as m(g) 2 c.
Example 8. In Table 5, each such dyadic context is labelled by its corresponding
block of tolerance.</p>
      <p>Now, remark that blocks of tolerance over W are totally ordered: let [v1; v2]
and [w1; w2] be two blocks of tolerance, one has [v1; v2] &lt; [w1; w2] i v1 &lt; w1.
Hence, associated dyadic contexts are also totally ordered and we use a
corresponding indexing set to label them. In Table 5, contexts for blocks h[0; 1]; [1; 2],
[6; 7], [7; 8]; [8; 9]i are respectively labelled h1; 2; 3; 4; 5i.</p>
      <p>We now present our second main results: The scaled triadic context supports
the extraction of maximal biclusters of similar values for a given . In this case
however, existing algorithms of TCA cannot be applied directly. For example, in
Table 5, the triconcept (fg3g; fm4g; f3; 4g) corresponds to a bicluster of similar
values which is not maximal. Hence we present hereafter a new TCA algorithm
for this task, called TriMax.</p>
      <p>The basic idea of TriMax relies on the following facts. Firstly, since each
dyadic context corresponds to a block of tolerance, we do not need to compute
intersections of contexts, such as classically done in TCA. Hence each dyadic
context is processed separately. Secondly, a dyadic concept of a dyadic context
necessarily represents a bicluster of similar values, but we cannot be sure it is
maximal (see previous example). Hence, we need to check if a concept is still
a concept in other dyadic contexts, corresponding to other classes of tolerance.
This is made precise with the following proposition.</p>
      <p>Proposition 2. Let (A; B; U ) be a triadic concept from Trimax triadic scaled
context (G; M; C; Y ), such that U is the outer closure of a singleton fcg C.
If jU j = 1, (A; B) is a maximal bicluster of similar values. Otherwise, (A; B)
is a maximal bicluster of similar values i @y 2 [min(U ); max(U )], y &lt; c s.t.
(A; B) 6= y0 ( y((A; B))), where y0 (:) and y(:) correspond to inner derivation
operators associated with yth dyadic context.</p>
      <p>Proof. When jU j = 1, (A; B) is a dyadic concept only in one dyadic context
corresponding to a block of tolerance. By properties of tolerance blocks, (A; B)
is a maximal bicluster. If jU j 6= 1, (A; B) is a dyadic concept in jU j dyadic
contexts. Since the tolerance block set is totally ordered, it directly implies that
modus U is an interval [min(U ); max(U )]. Hence, if 9y 2 [min(U ); max(U )] s.t.
(A; B) = y0 ( y((A; B))) this means that (A; B) is not a maximal bicluster of
similar values.</p>
      <p>Description of the TriMax algorithm. TriMax starts with scaling
initial numerical data into several dyadic contexts, each one standing for a block of
tolerance over W with given . The set of all dyadic contexts forms accordingly
a triadic context. Then, each dyadic context is mined with any FCA algorithm
(or closed itemset mining algorithm), and all formal concepts are extracted. For
a given concept (A; B), we compute outer derivation 0 ((A; B)), i.e. to obtain
the set of dyadic contexts labels in which the current dyadic concept holds. If it
results in a singleton, this means that (A; B) is a concept for the current block
of tolerance only, i.e. it is a maximal bicluster of similar values, and it has been,
or will never be, generated twice. Otherwise, (A; B) is a concept in other
contexts, and can be generated accordingly several times (as much as the number of
contexts in which it holds). Then, we only consider (A; B) if we are sure it is the
last time it is computed. Finally, we need to check if current concept represents
a maximal bicluster, i.e. there should not exist a context from the modus where
(A; B) is not a dyadic concept.</p>
      <p>Proposition 3. TriMax outputs a (i) complete, (ii) correct and (iii) non
redundant collection of all maximal biclusters of similar values for a given
numerical dataset and similarity parameter .</p>
      <p>Proof. (i) and (ii) follow directly from Proposition 2. Statement (iii) is ensured
by the second if condition of the algorithm: a dyadic concept (or equivalently
bicluster) is considered i it has been extracted in the last dyadic context in
which it holds.</p>
      <sec id="sec-2-1">
        <title>Algorithm 1: TriMax</title>
        <p>input : Numerical dataset (G; M; W; I), tolerance parameter
output: Maximal biclusters of similar values
Let C = f[ai; bi]g be the totally ordered set of all blocks over W for given .
Indices i form an indexing set.
forall the [ai; bi] 2 C do</p>
        <p>Build context (G; M; Zi) such that (g; m) 2 Zi , m(g) 2 [ai; bi]
forall the (G; M; Zi) do</p>
        <p>Use any FCA algorithm to extract all its concepts (A; B)
forall the dyadic concepts (A; B) in the current context (G; M; Zi) do
if j 0 ((A; B))j = 1 then</p>
        <p>print (A; B)
else if max( 0 ((A; B)) = i then
x min( 0 ((A; B))</p>
        <p>0
if @y 2 [x; i[ s.t. (A; B) 6= y( y((A; B))) then</p>
        <p>print (A; B)
6</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Computer experiments</title>
      <p>In this section, we experiment with the algorithm TriMax and highlight various
aspects of its practical complexity.</p>
      <p>
        Data. We explore a gene expression dataset of the species Laccaria bicolor
available at NCBI5. More details on this dataset can be found in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. This gene
expression dataset monitors the behaviour of 11; 930 genes in 12 biological situations,
re ecting various stages of Laccaria bicolor biological cycle. Attribute values in
W vary between 0 and 60; 000.
      </p>
      <p>TriMax implementation. TriMax is written in C++. It uses the boost
library 1.42 for data structures and the implementation of InClose from its
authors6 for dyadic concepts extraction. At each iteration of the main loop,
i.e. each tolerance block, the current scaled dyadic context is produced: We do
not generated the whole triadic context which cannot t into memory for large
databases. It turns out that the modus computation for a given dyadic concept
requires to compute scaling \on the y", i.e. when computing the set of dyadic
contexts in which a current concept holds. The experiments were carried out on
an Intel CPU 2.54 Ghz machine with 8 GB RAM running under Ubuntu 11.04.
Experiment settings. The goal of the present experiments is not to give a
qualitative evaluation of the present approach (say biological interpretation),
but rather a quantitative evaluation. Indeed, the present work aims at showing
5 http://www.ncbi.nlm.nih.gov/geo/ as series GSE9784
6 http://sourceforge.net/projects/inclose/
(i) Numbers of patterns (Y-axis)
w.r.t. (X-axis) and jGj (Z-axis)
(ii) Execution times in seconds (Y-axis)
w.r.t. (X-axis) and jGj (Z-axis)
(iii) Numbers of blocks of tolerance (Y-axis) (iv) Density of triadic contexts (Y-axis)
w.r.t. (X-axis) and jGj (Z-axis) w.r.t. (X-axis) and jGj (Z-axis)
(v) Comparing the number of generated dyadic
concepts w.r.t. the actual number of maximal
biclusters varying with jGj = 500
(vi) Repartition of execution time</p>
      <p>
        w.r.t main steps of TriMax
with = 33; 000 and jGj = 500
Fig. 1: Monitoring with di erent settings (i) the number of maximal biclusters,
(ii) the execution times of TriMax, (iii) the number of tolerance blocks, (iv)
the derived triadic context density, (v) the number of non-maximal biclusters
generated as dyadic-concepts w.r.t. the number of maximal biclusters, and (vi)
repartition of execution time in the TriMax algorithm.
how an existing type of biclusters can be mined with Triadic Concept Analysis.
For a qualitative evaluation, the reader may refer for example to [
        <xref ref-type="bibr" rid="ref1 ref9">1, 9</xref>
        ].
      </p>
      <p>
        Accordingly, we designed the following experiments to monitor various
aspects of the TriMax algorithm. For most of the experiments, the dataset used is
composed of an increasing number of objects and all attributes. The objects are
chosen randomly once and for all so that the di erent experiment results can be
compared. We also vary the parameter in the same way across all experiments.
Then, we monitor the following aspects, as presented in Figure 1:
i. Number of maximal biclusters of similar values
ii. Execution time (in seconds)
iii. Number of tolerance blocks
iv. Density of the triadic context, where density is de ned as d(G; M; C; Y ) =
jY j=(jGj jM j jCj). This information is important, since contexts with
high density are known to be hard to process with FCA algorithms [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], and
we use the InClose algorithm for dyadic contexts processing.
v. Comparison between the number of non-maximal biclusters produced by
TriMax (i.e. dyadic concepts that do not corresponds to maximal
biclusters) with the number of maximal biclusters.
vi. Execution time pro ling of the main procedures of TriMax. This is achieved
with the tool GNU GProf and gives us what parts of the algorithm are
the most time consuming.
      </p>
      <p>Experiment results. Figure 1 presents the results of our experiments with
di erent settings. In these settings, we vary the number of objects jGj and the
parameter . A rst observation arises from graph (i): the number of biclusters
is the highest when ' 30; 000. A rst explanation is that 30; 000 is the half of
the maximal value of W and almost all multiples of 100 in [0; 60; 000] belongs
to W . In graph (ii), execution time has the same behaviour as graph (i). These
results can be understood by paying attention to the next graphs (iii) and (iv).
In (iii) is monitored the number of tolerance blocks. The maximal number is
reached when = 0, i.e. jCj = jW j. When = max(W ), we have jCj = 1. Now
we observe in (iv) that the density follows a reverse behaviour: When = 0, the
density tends towards 0%; when = max(W ), then density exactly equal 1%.
Combining both graph (iii) and (iv), the worst cases happen when both density
and tolerance bloc count are high.</p>
      <p>Another observation, which explains also the execution times, arises from
graph (v). Here are compared the number of maximal biclusters and the number
of non-maximal biclusters generated as dyadic concepts. Here again, worst case
is reached when ' 30; 000. Looking at graph (vi), we learn that this is however
not the major problem. The mostly consuming procedure of TriMax is the
computation of the modus of a dyadic concept. The explanation is that we
compute modus with \on the y scaling".</p>
      <p>Therefore, the bottleneck of our algorithm reveals itself to be the modus
computation. In practical applications however, the analyst is not interested in
all biclusters of similar values. Some constraints are generally de ned, such as
a minimal (resp. maximal) number of objects (resp. attributes) in a bicluster
(A; B), or a minimal area jAj jBj, etc. Interestingly, most of those constraints
can be evaluated on a generated dyadic concept. Therefore, before computing
the modus of such concept, we can check such properties and discard the concept
if not respecting the constraints. Although not re ected in this paper, we tested
how adding minimal (resp. maximal) size constraints on a bicluster a ects both
number of biclusters and execution times. The results are very interesting: for
example with = 33; 000, jGj = 500, and minimal (resp. maximal) size for jAj
set to 10 (resp. 40), TriMax produces only 5; 332 maximal biclusters in 2:1
seconds compared to 104; 226 maximal biclusters extracted in 16:130 seconds
without any constraint.</p>
      <p>
        Finally, the most interesting aspect of TriMax is its direct distributed
computation capacity. Indeed, each iteration, i.e. for each block of tolerance, can
be achieved independently from the others. Furthermore, the core of TriMax
consisting in extracting dyadic contexts can also be distributed, see e.g. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
A deeper investigation remains to be done in this case. Note that although the
method description involves W as a set of natural numbers, TriMax can directly
handle numerical data real numbers, and has been implemented as such.
Comparison with existing methods. Two existing methods in the literature
also consider the problem of extracting all maximal biclusters of similar values
from a numerical dataset. The rst method is called Numerical Biset Miner
(NBS-Miner [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]). The second method is based on interval pattern structures
(IPS [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ]). Limited by space, we do not detail these methods. Both NBS-Miner
and IPS algorithms have been implemented in C++. First experiments show
that NBS-Miner is not scalable compared to IPS and TriMax. On another
hand, it seems that TriMax outperforms IPS, but a deeper investigation is
required. The main problem in IPS is to nd an e cient algorithm able to
compute tolerance blocks over a set of intervals.
7
      </p>
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>
        We addressed the problem of biclustering numerical data with Formal Concept
Analysis. So called (maximal) biclusters of similar values can be characterized
and extracted with Triadic Concept Analysis, which turns out to be a novel
mathematical framework for this task. We properly de ned a scaling procedure
turning original numerical data into triadic contexts from which biclusters can
be extracted as triadic concepts with existing algorithms. This approach allows a
correct, complete and non-redundant extraction of all maximal biclusters, for any
similarity parameter and can be extended to n-ary numerical datasets while
their computation can be directly distributed. The interpretation of triadic
concepts is very rich: both extent and intent allow to characterize a bicluster (i.e. the
rectangle), while the modus gives the range of values of the biclusters, and for
which is the bicluster maximal. Moreover, the larger the modus, the more
similar the values within current bicluster. It follows a perspective of research, aiming
at extracting the top-k frequent tri-concepts with Data-Peeler [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], which can
help to handle the problem of top-k biclusters extraction. We also adapted the
TCA machinery with algorithm TriMax to extract maximal biclusters for a
user-de ned , which is classical in the existing literature. It appears that
TriMax is a fully customizable algorithm: any concept extraction algorithm can be
used inside its core (along with several constraints on produced dyadic concepts),
while its distributed computation is direct. Among several other experiments,
it remains now to determine which are the best core algorithms for a given
parameter, the very last directly in uencing derived contexts density.
Acknowledgements. Authors would like to thank Dmitry Andreevich Morozov
for implementing the algorithms NBS-Miner and IPS. The second author was
supported by the project of the Russian Foundation for Basic Research, grant
no. 08-07-92497-NTsNIL a. Juraj Macko acknowledges support by Grant No.
202/10/0262 of the Czech Science Foundation.
      </p>
      <p>A</p>
    </sec>
    <sec id="sec-5">
      <title>Proof of the Proposition 1.</title>
      <p>Before proving this proposition, we need to introduce the following. For sake of
simplicity, we now consider W as the set of all natural numbers from a numerical
dataset that are greater or equal than the minimal value and lower or equal than
the maximal value, i.e. W = f0; 1; 2; 3; 4; 5; 6; 7; 8; 9g with the example of Table 1.
De nition 11 (Scale value and scale relation). We call scale value s =
q r where r = min(W ) and q = max(W ). The scale relation is a binary
relation J W T , where T = ft1; : : : ; t2s+1g r w q and hw; tii 2 J i
i 2 [w r + 1; w r + 1 + s].</p>
      <p>Note that J is equivalent to interordinal scale of W previously given, but
this notations are used for the proof.</p>
      <sec id="sec-5-1">
        <title>De nition 12 (E w - cluster base). We introduce E w</title>
        <p>[tw+ r+1; tw r+1+s] for given and w 2 W .</p>
        <p>T de ned as E w =
Example 9 (E w - cluster base). E12 = [t2+1 0+1; t2 0+1+9] = [t4; t12].
Proposition 4. (wb = m(g)) ' (n(h) = wc) i (hg; mi 2 Y 12
E b
YE12b ).
and hh; ni 2
[twb+
of ' .</p>
        <p>Proof. Let Eb; Ec T and wc wb. According to the de nition (g; m) 2 YE12b
i m; g; t are related by Y for all t 2 E b. Using scaling and de nition we have
[twb r+1; twb r+1+s] = Eb E b = [twb+ r+1; twb r+1+s] which is
straightforward. We just need to show that (h; n) 2 Y 12
E b holds as well. With scaling
de nition and previous de nition we get [twc r+1; twc r+1+s] = Ec E b =
r+1; twb r+1+s] holding i wc wb , which is equal to the de nition
Moreover we can easily see as a corollary that wc wb holds i Eb \Ec E b
and wc wb = holds i Eb \ Ec = E b. Now we can prove the Proposition 1
from the main text.</p>
        <p>Proposition 1. Tuple hA1; A2; U i, where A1 G, A2 M and U T is
triadic concept i (A1; A2) is a maximal bicluster of similar values for some
0. Furthermore the value of is de ned as = s jU j + 1.</p>
        <p>Proof. Let U = E b and consider dyadic context Y 12 = Y 12 for some wb. Using
U E b
dyadic closure operator 0 ( ((A1)) we get (A1; A2). From de nition of triconcept
we know that A1 B1 implies A1 = B1 (the same for A2). From de nition of
maximal bicluster of similar values we know that hA1; A2i is maximal when it
does not exists hB1; B2i s.t. B1 A1 (the same applies for A2). It is obvious
that both sets are maximal from de nition and when we have the same dyadic
context Y 12 = Y 12 . Now we need to look at dyadic context Y 12 = YE12b . In</p>
        <p>U E b U
jU j = jE bj = j[twb+ r+1; twb r+1+s]j we can easily see that jU j = s + 1,
which gives = s jU j + 1.</p>
        <p>Finally, U is maximal (as being modus of a triconcept) and E b is maximal
as well because wc wb holds i Eb \ Ec E b. All facts mentioned in this
proof leads to equality of the triconcept and maximal bicluster of similar values.
13. Madeira, S., Oliveira, A.: Biclustering algorithms for biological data analysis: a
survey. IEEE/ACM Transactions on Computational Biology and Bioinformatics
1(1), 24{45 (2004)
14. Voutsadakis, G.: Polyadic concept analysis. Order 19(3), 295{304 (2002)</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Besson</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Robardet</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Raedt</surname>
            ,
            <given-names>L.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boulicaut</surname>
            ,
            <given-names>J.F.</given-names>
          </string-name>
          :
          <article-title>Mining bi-sets in numerical data</article-title>
          . In: Dzeroski, S.,
          <string-name>
            <surname>Struyf</surname>
            ,
            <given-names>J</given-names>
          </string-name>
          . (eds.)
          <source>KDID. Lecture Notes in Computer Science</source>
          , vol.
          <volume>4747</volume>
          , pp.
          <volume>11</volume>
          {
          <fpage>23</fpage>
          . Springer (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Cerf</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Besson</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Robardet</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boulicaut</surname>
            ,
            <given-names>J.F.</given-names>
          </string-name>
          :
          <article-title>Closed patterns meet n-ary relations</article-title>
          .
          <source>TKDD</source>
          <volume>3</volume>
          (
          <issue>1</issue>
          ) (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Ganter</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wille</surname>
          </string-name>
          , R.:
          <source>Formal Concept Analysis</source>
          . Springer (
          <year>1999</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4. Jaschke, R.,
          <string-name>
            <surname>Hotho</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schmitz</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ganter</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stumme</surname>
          </string-name>
          , G.:
          <article-title>Trias - an algorithm for mining iceberg tri-lattices</article-title>
          . In: ICDM. pp.
          <volume>907</volume>
          {
          <issue>911</issue>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Ji</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tan</surname>
            ,
            <given-names>K.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tung</surname>
            ,
            <given-names>A.K.H.</given-names>
          </string-name>
          :
          <article-title>Mining frequent closed cubes in 3d datasets</article-title>
          .
          <source>In: Proceedings of the 32nd International Conference on Very Large Data Bases (VLDB)</source>
          . pp.
          <volume>811</volume>
          {
          <fpage>822</fpage>
          .
          <string-name>
            <surname>ACM</surname>
          </string-name>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Kaytoue</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Assaghir</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Napoli</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuznetsov</surname>
            ,
            <given-names>S.O.</given-names>
          </string-name>
          :
          <article-title>Embedding tolerance relations in formal concept analysis: an application in information fusion</article-title>
          .
          <source>In: CIKM</source>
          . pp.
          <volume>1689</volume>
          {
          <fpage>1692</fpage>
          .
          <string-name>
            <surname>ACM</surname>
          </string-name>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Kaytoue</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuznetsov</surname>
            ,
            <given-names>S.O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Napoli</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Biclustering numerical data in formal concept analysis</article-title>
          . In: Valtchev,
          <string-name>
            <surname>P.</surname>
          </string-name>
          , Jaschke, R. (eds.)
          <source>ICFCA. LNCS</source>
          , vol.
          <volume>6628</volume>
          , pp.
          <volume>135</volume>
          {
          <fpage>150</fpage>
          . Springer (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Kaytoue</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuznetsov</surname>
            ,
            <given-names>S.O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Napoli</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Revisiting numerical pattern mining with formal concept analysis</article-title>
          .
          <source>In: Proceedings of the 22nd International Joint Conference on Arti cial Intelligence (IJCAI)</source>
          .
          <source>IJCAI/AAAI</source>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Kaytoue</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuznetsov</surname>
            ,
            <given-names>S.O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Napoli</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Duplessis</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Mining gene expression data with pattern structures in formal concept analysis</article-title>
          .
          <source>Inf. Sci</source>
          .
          <volume>181</volume>
          (
          <issue>10</issue>
          ),
          <year>1989</year>
          {
          <year>2001</year>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Krajca</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vychodil</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Distributed algorithm for computing formal concepts using map-reduce framework</article-title>
          .
          <source>In: IDA</source>
          . pp.
          <volume>333</volume>
          {
          <fpage>344</fpage>
          . Springer (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Kuznetsov</surname>
            ,
            <given-names>S.O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Obiedkov</surname>
            ,
            <given-names>S.A.:</given-names>
          </string-name>
          <article-title>Comparing performance of algorithms for generating concept lattices</article-title>
          .
          <source>J. Exp. Theor. Artif. Intell</source>
          .
          <volume>14</volume>
          (
          <issue>2-3</issue>
          ),
          <volume>189</volume>
          {
          <fpage>216</fpage>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Lehmann</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wille</surname>
            ,
            <given-names>R.:</given-names>
          </string-name>
          <article-title>A triadic approach to formal concept analysis</article-title>
          .
          <source>In: ICCS. LNCS</source>
          , vol.
          <volume>954</volume>
          , pp.
          <volume>32</volume>
          {
          <fpage>43</fpage>
          . Springer (
          <year>1995</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>