
Three Related FCA Methods for Mining Biclusters of Similar Values on Columns

Mehdi Kaytoue

mehdi.kaytoue@insa-lyon.fr 2

Victor Codocedo

Jaume Baixieres

Amedeo Napoli

0 LORIA (CNRS - Inria Nancy Grand Est - Université de Lorraine), B.P. 239, F-54506, Vandœuvre-lès-Nancy

1 Universitat Politècnica de Catalunya, 08032, Barcelona, Catalonia

2 Université de Lyon, CNRS, INSA-Lyon, LIRIS, UMR5205, F-69621, France

Biclustering numerical data tables consists in detecting particular and strong associations between both subsets of objects and attributes. Such biclusters are interesting since they model the data as local patterns. Whereas there exist several definitions of biclusters, depending on the constraints they should respect, we focus in this paper on biclusters of similar values on columns. There are several ad hoc methods for mining such biclusters in the literature. We focus here on two aspects: genericity and efficiency. We show that Formal Concept Analysis provides a mathematical framework not only to characterize them in several ways, but also to compute them with existing and efficient algorithms. The proposed methods, which rely on pattern structures and triadic concept analysis, are evaluated experimentally and compared on two different datasets.

Keywords: biclustering, triadic concept analysis, pattern structure

1 Introduction

Biclustering has attracted a lot of attention for many years now, as it has been used extensively for mining biological data [7]. Given a data table with objects as rows and attributes as columns, the goal is to find "sub-tables", i.e. pairs of subsets of objects and attributes, such that the values in the sub-tables respect well-defined constraints or maximize a given measure [17].

There exist several types of biclusters depending on the relation the values should respect. For example, constant biclusters are sub-tables with equal values [12, 6, 17]. Biclusters with similar values on columns (BSVC) are sub-tables where all values are pairwise similar within each column [4, 17]. The latter can be generalized to biclusters of similar values (BSV): any two values in the sub-table are similar [2, 3, 12, 21]. Dozens of algorithms, mostly ad hoc, have been proposed for computing the different types of biclusters. In this paper, we are interested in possible extensions of the Formal Concept Analysis (FCA) formalism for addressing the problem of biclustering. This comes with two goals: (i) formalizing and understanding bicluster formation and structure, and (ii) reusing existing algorithms for genericity purposes.

The present paper continues the work of the authors on the use of pattern structures (an extension of FCA for mining complex data [8, 12]) for discovering functional dependencies in crisp and fuzzy settings [1], as well as on the adaptation of pattern structures to a specific biclustering task: the discovery of biclusters of type BSV [6, 11]. Moreover, the biclustering task is usually considered as a "two-dimensional" (2D) process where biclusters are rectangles in a table verifying some prior constraints. One main idea of [11] was to transpose the problem into a "three-dimensional" setting by using and adapting triadic concept analysis [16] to the biclustering task.

Here we follow the same line and propose a new approach for discovering biclusters in a numerical dataset where biclusters have "similar values" w.r.t. their columns (type BSVC). This work is a new attempt to extend the capabilities of FCA and of pattern structures in dealing with the important problem of biclustering. Biclustering can also be considered in a (pure) numerical setting, where it is sometimes called co-clustering [18] and where kernel or spectral methods are often used for achieving the task. Here we keep the discrete setting, and more precisely an FCA-based setting.

The rest of this paper is organized as follows. In Section 2 we formally introduce the biclustering problem. Then, we recall in Section 3 the FCA basics that are necessary for developing our three methods in Section 4. We experiment with these methods and compare them by processing two real-world datasets in Section 5, before concluding.

2 Problem Definition

We introduce the problem of mining biclusters of similar values on columns, or simply biclusters when no confusion can be made. A numerical dataset is defined as a many-valued context in which biclusters are pairs of subsets of objects and attributes for which a particular similarity constraint holds.

Definition 1 (Many-valued context and numerical dataset). A many-valued context consists of a quadruple $(G, M, W, I)$ where $G$ is a set of objects, $M$ a set of attributes, $W$ a set of attribute values, and $I \subseteq G \times M \times W$ a ternary relation. An element $(g, m, w) \in I$, also written $m(g) = w$ or $g(m) = w$, can be interpreted as: $w$ is the value taken by the attribute $m$ for the object $g$. The relation $I$ is such that $g(m) = w$ and $g(m) = v$ implies $w = v$.

In the present work, $W$ is a set of numbers, and $\mathbb{K}_{num} = (G, M, W, I)$ denotes a numerical dataset, i.e. a many-valued context with numerical values.

Example. A tabular representation of a numerical dataset is given in Table 1: objects $G = \{g_1, g_2, g_3, g_4, g_5\}$ are represented by rows, while attributes $M = \{m_1, m_2, m_3, m_4\}$ are represented by columns. $W = \{0, 1, 2, 6, 7, 8, 9\}$ and we have, for example, $g_2(m_4) = 9$.

      m1  m2  m3  m4
g1     1   2   2   8
g2     2   1   2   9
g3     2   1   1   2
g4     1   0   7   6
g5     6   6   6   7

Fig. 1. A numerical dataset.

Definition 2 (Biclusters with similar values on columns). Given a numerical dataset $(G, M, W, I)$, a pair $(A, B)$ (where $A \subseteq G$, $B \subseteq M$) is called a bicluster of similar values on columns when the following statement holds:

$\forall g, h \in A, \forall m \in B: m(g) \simeq_\theta m(h)$

where $\simeq_\theta$ is a similarity relation defined, for a parameter $\theta \in [0, \max(W) - \min(W)]$, by $\forall w_1, w_2 \in W: w_1 \simeq_\theta w_2 \iff |w_1 - w_2| \le \theta$. A bicluster $(A, B)$ is maximal if $\nexists g \in G \setminus A$ such that $(A \cup \{g\}, B)$ is a bicluster, and $\nexists m \in M \setminus B$ such that $(A, B \cup \{m\})$ is a bicluster.

Example. In Table 1, with $\theta = 1$, $(A, B) = (\{g_1, g_2\}, \{m_1, m_2, m_3\})$ is a bicluster. Indeed, considering each attribute of $B$ separately, the values taken by the objects of $A$ are pairwise similar. However, $(A, B)$ is not maximal, since both $(A \cup \{g_3\}, B)$ and $(A, B \cup \{m_4\})$ are also biclusters. Then, $(\{g_1, g_2, g_3\}, \{m_1, m_2, m_3\})$ and $(\{g_1, g_2\}, \{m_1, m_2, m_3, m_4\})$ are both maximal.

Problem (Biclustering). Given a numerical dataset $(G, M, W, I)$ and a similarity parameter $\theta$, the goal of biclustering is to extract the set of all maximal biclusters $(A, B)$ respecting the similarity constraint.
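Definition 2 and the problem statement can be checked mechanically on the running example. The following Python sketch is our own illustration, not the authors' code: it encodes Table 1 in a dictionary `DATA` and implements the two maximality conditions literally. The names `is_bicluster`, `is_maximal` and `maximal_biclusters` are hypothetical helpers, and the brute-force enumeration is only meant for tiny tables.

```python
from itertools import chain, combinations

# Table 1 (running example): object -> values of (m1, m2, m3, m4).
DATA = {
    "g1": (1, 2, 2, 8),
    "g2": (2, 1, 2, 9),
    "g3": (2, 1, 1, 2),
    "g4": (1, 0, 7, 6),
    "g5": (6, 6, 6, 7),
}
ATTRS = ("m1", "m2", "m3", "m4")


def value(g, m):
    """m(g): the value taken by attribute m for object g."""
    return DATA[g][ATTRS.index(m)]


def is_bicluster(A, B, theta):
    """Definition 2: values of A are pairwise theta-similar on every column of B."""
    return all(
        abs(value(g, m) - value(h, m)) <= theta
        for m in B
        for g, h in combinations(sorted(A), 2)
    )


def is_maximal(A, B, theta):
    """No object outside A and no attribute outside B can be added."""
    if not is_bicluster(A, B, theta):
        return False
    add_row = any(is_bicluster(A | {g}, B, theta) for g in set(DATA) - A)
    add_col = any(is_bicluster(A, B | {m}, theta) for m in set(ATTRS) - B)
    return not (add_row or add_col)


def subsets(items):
    """All subsets of items, as sets (exponential: toy data only)."""
    items = sorted(items)
    return (set(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)))


def maximal_biclusters(theta):
    """Brute-force answer to the biclustering problem."""
    return {
        (frozenset(A), frozenset(B))
        for A in subsets(DATA)
        for B in subsets(ATTRS)
        if is_maximal(A, B, theta)
    }


A, B = {"g1", "g2"}, {"m1", "m2", "m3"}
print(is_bicluster(A, B, 1), is_maximal(A, B, 1))  # True False
print(is_maximal(A | {"g3"}, B, 1), is_maximal(A, B | {"m4"}, 1))  # True True
print(len(maximal_biclusters(1)))  # 10
```

By our count, θ = 1 yields ten maximal biclusters on Table 1. They include the degenerate pair (G, ∅), which satisfies Definition 2 because no single column of Table 1 has all of its values within θ = 1.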

Remark. Note that, in the formal definition, the similarity parameter $\theta$ is the same for all attributes. However, a different parameter can be used for each attribute without changing either the problem definition or its resolution. For real-world datasets, one can choose different similarity parameters $\theta_m$ ($\forall m \in M$), or alternatively normalize/scale the attribute domains and use a single similarity parameter $\theta$.

3 Basics on Formal Concept Analysis

In this paper, we show how our biclustering problem can be formalized and solved in FCA in different ways: (i) using standard FCA [9], (ii) using pattern structures [8], and (iii) using triadic concept analysis [16]. We recall below the basics of each approach.

Dyadic Concept Analysis. Let $G$ be a set of objects, $M$ a set of attributes and $I \subseteq G \times M$ a binary relation. The fact $(g, m) \in I$ is interpreted as "$g$ has attribute $m$". The two following derivation operators $(\cdot)'$ are defined:

$A' = \{m \in M \mid \forall g \in A: gIm\}$ for $A \subseteq G$,
$B' = \{g \in G \mid \forall m \in B: gIm\}$ for $B \subseteq M$,

which define a Galois connection between the powersets of $G$ and $M$. For $A \subseteq G$, $B \subseteq M$, a pair $(A, B)$ such that $A' = B$ and $B' = A$ is called a (formal) concept. Concepts are partially ordered by $(A_1, B_1) \le (A_2, B_2) \iff A_1 \subseteq A_2$ ($\iff B_2 \subseteq B_1$). With respect to this partial order, the set of all formal concepts forms a complete lattice called the concept lattice of the formal context $(G, M, I)$. For a concept $(A, B)$, the set $A$ is called the extent and the set $B$ the intent of the concept.

Triadic Concept Analysis. A triadic context is given by $(G, M, B, Y)$ where $G$, $M$ and $B$ are respectively called sets of objects, attributes and conditions, and $Y \subseteq G \times M \times B$. The fact $(g, m, b) \in Y$ is interpreted as the statement "object $g$ has the attribute $m$ under condition $b$". A (triadic) concept of $(G, M, B, Y)$ is a triple $(A_1, A_2, A_3)$ with $A_1 \subseteq G$, $A_2 \subseteq M$ and $A_3 \subseteq B$ satisfying the two following statements: (i) $A_1 \times A_2 \times A_3 \subseteq Y$, and (ii) for any $X_1 \times X_2 \times X_3 \subseteq Y$, $A_1 \subseteq X_1$, $A_2 \subseteq X_2$ and $A_3 \subseteq X_3$ implies $A_1 = X_1$, $A_2 = X_2$ and $A_3 = X_3$. If $(G, M, B, Y)$ is represented by a three-dimensional table, (i) means that a concept stands for a three-dimensional rectangle full of crosses, while (ii) characterizes the component-wise maximality of concepts. For a triadic concept $(A_1, A_2, A_3)$, $A_1$ is called the extent, $A_2$ the intent and $A_3$ the modus. To derive triadic concepts, two pairs of derivation operators are defined. The reader can refer to [16] for their definitions, which are not necessary for understanding the present work.

Pattern Structures. Let $G$ be a set of objects, let $(D, \sqcap)$ be a meet-semilattice of potential object descriptions and let $\delta: G \rightarrow D$ be a mapping. Then $(G, (D, \sqcap), \delta)$ is called a pattern structure. Elements of $D$ are called patterns and are ordered by a subsumption relation $\sqsubseteq$ such that, given $c, d \in D$, one has $c \sqsubseteq d \iff c \sqcap d = c$. Within the pattern structure $(G, (D, \sqcap), \delta)$ we can define the following derivation operators $(\cdot)^\square$, given $A \subseteq G$ and a description $d \in D$:

$A^\square = \bigsqcap_{g \in A} \delta(g)$
$d^\square = \{g \in G \mid d \sqsubseteq \delta(g)\}$

These operators form a Galois connection between $(\wp(G), \subseteq)$ and $(D, \sqsubseteq)$. (Pattern) concepts of $(G, (D, \sqcap), \delta)$ are pairs of the form $(A, d)$, $A \subseteq G$, $d \in D$, such that $A^\square = d$ and $A = d^\square$. For a pattern concept $(A, d)$, $d$ is called a pattern intent and is the common description of all objects in $A$, called the pattern extent. When partially ordered by $(A_1, d_1) \le (A_2, d_2) \iff A_1 \subseteq A_2$ ($\iff d_2 \sqsubseteq d_1$), the set of all concepts forms a complete lattice called a (pattern) concept lattice.
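The pattern derivation operators only need a meet operation and the subsumption test it induces. Below is a minimal Python sketch, assuming descriptions are hashable values stored in a dictionary `delta` (all names are hypothetical). Plugging in set intersection as the meet recovers the dyadic operators $A'$ and $B'$, since subsumption then reduces to set inclusion.

```python
from functools import reduce


def box_of_objects(A, delta, meet):
    """A^box: meet of the descriptions of the objects in A (A must be non-empty)."""
    return reduce(meet, (delta[g] for g in A))


def box_of_pattern(d, delta, meet):
    """d^box: objects whose description is subsumed by d, with c ⊑ e iff c ⊓ e == c."""
    return {g for g, e in delta.items() if meet(d, e) == d}


def is_pattern_concept(A, d, delta, meet):
    """(A, d) is a pattern concept iff A^box == d and d^box == A."""
    return box_of_objects(A, delta, meet) == d and box_of_pattern(d, delta, meet) == set(A)


# Dyadic FCA as a special case: a description is the attribute set g' of an
# object and the meet is set intersection (hypothetical toy context).
context = {
    "g1": frozenset({"a", "b"}),
    "g2": frozenset({"a", "c"}),
    "g3": frozenset({"a", "b", "c"}),
}
meet = frozenset.intersection

print(sorted(box_of_objects({"g1", "g3"}, context, meet)))  # ['a', 'b']
print(is_pattern_concept({"g1", "g3"}, frozenset({"a", "b"}), context, meet))  # True
print(is_pattern_concept({"g1"}, frozenset({"a", "b"}), context, meet))  # False
```

Only `meet` changes between pattern structures, which mirrors the remark below that FCA algorithms need an overridden intersection and ordering test to process pattern structures.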

Computing Concepts and Concept Lattices. Processing a formal context in order to generate its set of concepts can be achieved by various algorithms (see [15] for a survey and a comparison; see also itemset mining [19]). For processing pattern structures, such algorithms generally need minor adaptations. Basically, one needs to override the code for (i) computing the intersection (meet) of any two arbitrary descriptions, and (ii) testing the ordering between two descriptions. Processing a triadic context is, however, less direct, and can be done with nested FCA algorithms [10] or dedicated data-mining algorithms [5].

Similarity relations in FCA. The notion of similarity can be formalized by a tolerance relation: a symmetric, reflexive, but not necessarily transitive relation. The similarity relation $\simeq_\theta$ used for defining biclusters of similar values is a tolerance. Given a set of numbers $W$, any maximal subset of pairwise similar values is called a block of tolerance.

Definition 3. A binary relation $T \subseteq W \times W$ is called a tolerance relation if:
(i) $\forall x \in W: xTx$ (reflexivity)
(ii) $\forall x, y \in W: xTy \rightarrow yTx$ (symmetry)

Definition 4. Given a set $W$, a subset $K \subseteq W$ is a block of tolerance if:
(i) $\forall x, y \in K: xTy$ (pairwise similarity)
(ii) $\forall z \notin K, \exists u \in K: \neg(zTu)$ (maximality)

It is shown that tolerance blocks can be obtained from the formal context of a tolerance relation [14]. In the context $(W, W, \simeq_\theta)$, one can characterize all blocks of tolerance $K$ (and only them) as formal concepts $(K, K)$.
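For numbers and the tolerance $\simeq_\theta$, blocks of tolerance have a simple shape: once the values are sorted, each block is a maximal contiguous run whose spread is at most $\theta$. The sketch below relies on that numeric assumption and uses hypothetical function names. It finds the blocks in one sweep, then checks the characterization of [14] by verifying that each block $K$ satisfies $K' = K$ in the context $(W, W, \simeq_\theta)$.

```python
def tolerance_blocks(values, theta):
    """Blocks of tolerance of a set of numbers under |x - y| <= theta.

    After sorting, every block is a contiguous run [start..end] of the distinct
    values; a run is kept only if it is not contained in the previous one.
    """
    vals = sorted(set(values))
    blocks, prev_end, end = [], -1, 0
    for start in range(len(vals)):
        end = max(end, start)
        # Extend the run while its spread stays within theta.
        while end + 1 < len(vals) and vals[end + 1] - vals[start] <= theta:
            end += 1
        if end > prev_end:  # maximal: not included in the previous run
            blocks.append(vals[start:end + 1])
            prev_end = end
    return blocks


def derive(K, W, theta):
    """K' in the formal context (W, W, ~theta): values similar to every element of K."""
    return {w for w in W if all(abs(w - k) <= theta for k in K)}


W = {8, 9, 2, 6, 7}  # values of attribute m4 in Table 1
blocks = tolerance_blocks(W, 1)
print(blocks)  # [[2], [6, 7], [7, 8], [8, 9]]
print(all(derive(K, W, 1) == set(K) for K in blocks))  # True
```

With real-valued data such as temperatures, floating-point rounding can make a difference slightly exceed θ. A small epsilon in the comparison may then be needed.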

4 Mining biclusters of similar values on columns in FCA

The basic notions of FCA recalled in the previous section now allow us to answer our biclustering problem in various ways: (i) an original method using interval pattern structures, (ii) a recently introduced method using partition pattern structures [6], and (iii) an original method relying on triadic concept analysis. We thereby emphasize the genericity of FCA for answering a data mining problem.

4.1 Interval Pattern Structure Approach

For a dataset $\mathbb{K}_{num} = (G, M, W, I)$, an interval pattern structure $(G, (D, \sqcap), \delta)$ is defined as follows [13]: the objects from $G$ are described by vectors of intervals, where each dimension gives a range of values for an attribute $m \in M$ (following a canonical ordering of the dimensions, i.e. dimension $i$ corresponds to attribute $m_i \in M$). Then, for $m \in M$, the semi-lattice of intervals $(D_m, \sqcap_m)$ is given by:

$D_m = \{[w_1, w_2] \mid \exists g, h \in G \text{ s.t. } m(g) = w_1 \text{ and } m(h) = w_2\}$
$[a, b] \sqcap_m [c, d] = [\min(a, c), \max(b, d)]$
$c \sqcap_m d = c \iff c \sqsubseteq_m d$
$[a, b] \sqsubseteq_m [c, d] \iff [c, d] \subseteq [a, b]$

The description space $(D, \sqcap)$ of the interval pattern structure is a product of meet-semi-lattices, $(D, \sqcap) = \prod_{m \in M} (D_m, \sqcap_m)$, which is a semi-lattice.

Examples. In Table 1, $(\{g_1, g_2, g_3\}, \langle[1, 2], [1, 2], [1, 2], [2, 9]\rangle)$ is a pattern concept:

$\delta(g_1) = \langle[1, 1], [2, 2], [2, 2], [8, 8]\rangle$
$\{g_1, g_2, g_3\}^\square = \delta(g_1) \sqcap \delta(g_2) \sqcap \delta(g_3) = \langle[1, 2], [1, 2], [1, 2], [2, 9]\rangle$
$\langle[1, 2], [1, 2], [1, 2], [2, 9]\rangle \sqsubseteq \langle[1, 2], [1, 2], [1, 2], [8, 9]\rangle$
$\{g_1, g_2, g_3\}^{\square\square} = \{g_1, g_2, g_3\}$

We now give the intuitive idea of how the interval pattern concept lattice can be used to characterize the biclusters. Consider first the concept $(A_1, d_1) = (\{g_1, g_2\}, \langle[1, 2], [1, 2], [1, 2], [8, 9]\rangle)$. Consider also a function $\mathrm{attr}: D \rightarrow \wp(M)$ which returns, for an interval pattern, the set of attributes whose interval is not larger than the parameter $\theta$: for $d = \langle[a_i, b_i]\rangle$, $i \in [1, |M|]$, $\mathrm{attr}(d) = \{m_i \in M \mid a_i \simeq_\theta b_i\}$. Then $(A_1, \mathrm{attr}(d_1)) = (\{g_1, g_2\}, \{m_1, m_2, m_3, m_4\})$ is a maximal bicluster. Consider now the interval pattern concept $(A_2, d_2) = (\{g_1, g_2, g_3\}, \langle[1, 2], [1, 2], [1, 2], [2, 9]\rangle)$: $(A_2, \mathrm{attr}(d_2)) = (\{g_1, g_2, g_3\}, \{m_1, m_2, m_3\})$ is a maximal bicluster (with $\theta = 1$). This means that biclusters can be characterized by pattern concepts.

Proposition 1. Consider a numerical dataset $(G, M, W, I)$ as an interval pattern structure $(G, (D, \sqcap), \delta)$. For any maximal bicluster $(A, B)$, there exists a pattern concept $(A, d)$ such that $(A, B) = (A, \mathrm{attr}(d))$.

Proof. To ease reading, the proof is given in the appendix. ∎
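Proposition 1 can be sanity-checked by brute force on Table 1. The sketch below is our own exponential, toy-sized illustration. It closes every non-empty set of objects to obtain the interval pattern concepts $(A, d)$ and maps each one to $(A, \mathrm{attr}(d))$. It then verifies that every maximal bicluster, enumerated directly from Definition 2, appears among these pairs. The proposition only states one direction, so some pattern concepts are expected to yield non-maximal biclusters.

```python
from functools import reduce
from itertools import chain, combinations

DATA = {
    "g1": (1, 2, 2, 8),
    "g2": (2, 1, 2, 9),
    "g3": (2, 1, 1, 2),
    "g4": (1, 0, 7, 6),
    "g5": (6, 6, 6, 7),
}
ATTRS = ("m1", "m2", "m3", "m4")
THETA = 1


def subsets(items):
    """All subsets of items as tuples (exponential: toy data only)."""
    items = sorted(items)
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))


def is_bicluster(A, B):
    return all(
        abs(DATA[g][ATTRS.index(m)] - DATA[h][ATTRS.index(m)]) <= THETA
        for m in B
        for g, h in combinations(sorted(A), 2)
    )


def maximal_biclusters():
    """Definition 2, checked literally for every pair (A, B)."""
    found = set()
    for A in map(set, subsets(DATA)):
        for B in map(set, subsets(ATTRS)):
            if (is_bicluster(A, B)
                    and not any(is_bicluster(A | {g}, B) for g in set(DATA) - A)
                    and not any(is_bicluster(A, B | {m}) for m in set(ATTRS) - B)):
                found.add((frozenset(A), frozenset(B)))
    return found


def meet(p, q):
    return tuple((min(a, c), max(b, d)) for (a, b), (c, d) in zip(p, q))


def box_objects(A):
    return reduce(meet, (tuple((v, v) for v in DATA[g]) for g in A))


def box_pattern(d):
    # d ⊑ delta(g) iff every value of g lies in the corresponding interval of d
    return frozenset(g for g in DATA if all(a <= v <= b for (a, b), v in zip(d, DATA[g])))


def attr(d):
    return frozenset(m for m, (a, b) in zip(ATTRS, d) if b - a <= THETA)


# Every pattern concept with a non-empty extent is (A^box^box, A^box) for some A.
concepts = {(box_pattern(box_objects(A)), box_objects(A)) for A in subsets(DATA) if A}
from_concepts = {(A, attr(d)) for A, d in concepts}

print(maximal_biclusters() <= from_concepts)  # True (Proposition 1)
```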

4.2 Partition Pattern Structure Approach

A partition pattern structure is a pattern structure instance where the description space is given by a semi-lattice of partitions over a set $X$ [2]. Formally, we have $(G, (D, \sqcap), \delta)$ where $D = \mathit{Part}(X)$ and $d_1 \sqcap d_2 = \{p_i \cap p_j \mid p_i \in d_1, p_j \in d_2\}$ with $p_i, p_j \subseteq X$, empty intersections being discarded. The semi-lattice is actually a complete lattice of set partitions in which the bottom element is not considered. In [1], we showed that the definition of $\sqcap$, and equivalently of $\sqsubseteq$, needs a slight modification when $D = 2^{2^X}$, i.e. when a description $d \in D$ is a set of subsets of $X$ that cover $X$ (possibly with overlaps). In that case, we have $d_1 \sqcap d_2 = \max(\{p_i \cap p_j \mid p_i \in d_1, p_j \in d_2\})$, where $\max(\cdot)$ returns the maximal sets w.r.t. inclusion.

Now we show that such a pattern structure can be constructed from a numerical dataset, and that the corresponding concepts allow us to generate all maximal biclusters. From a numerical dataset $(G, M, W, I)$, we build the structure $(M, (D, \sqcap), \delta)$ where $D = 2^{2^G}$. The description of an object $m \in M$ (an object of the pattern structure, i.e. an attribute of the numerical dataset) is given by $\delta(m) = \{p_1, p_2, \dots\}$ where $p_1, p_2, \dots \subseteq G$ and:

$m(g_1) \simeq_\theta m(g_2), \forall g_1, g_2 \in p_i$ (similarity)
$\nexists g \in G \setminus p_i$ such that $m(g) \simeq_\theta m(g_k), \forall g_k \in p_i$ (maximality)
$\bigcup_i p_i = G$ (covering)

In other words, each original attribute $m \in M$ is described by a family of subsets of $G$, each of which is a block of tolerance w.r.t. the values of attribute $m$. Let $(A, d = \{p_i\})$ be a partition pattern concept; it is easy to see that the pairs $bic_i = (p_i, A)$ are biclusters with rows $g \in p_i$ and columns $m \in A$ (to keep consistency with the previous notation, biclusters are written in the inverse order of partition pattern concepts). While any $bic_i = (p_i, A)$ is a bicluster, it is not necessarily a maximal bicluster. Nevertheless, maximal biclusters can be identified using the concept lattice.

Proposition 2. Consider a pattern concept $(A, d = \{p_i\})$. The bicluster $bic_i = (p_i, A)$ is maximal if there is no pattern concept $(C, \{p_i, \dots\})$ with $A \subset C$.

Proof. The proof of this proposition is straightforward. Recall from Section 2 that the bicluster $(p_i, A)$ is maximal if two conditions are met, namely $\nexists g \in G \setminus p_i$ such that $(p_i \cup \{g\}, A)$ is a bicluster, and $\nexists m \in M \setminus A$ such that $(p_i, A \cup \{m\})$ is a bicluster. The first condition holds for $bic_i$ given the maximality condition of the tolerance block $p_i$; the second follows from the statement of the proposition. ∎

Example. With $\theta = 1$, the numerical dataset $(G, M, W, I)$ given in Table 1 can be turned into a pattern structure as follows:

$\delta(m_1) = \{\{g_1, g_2, g_3, g_4\}, \{g_5\}\}$
$\delta(m_2) = \{\{g_2, g_3, g_4\}, \{g_1, g_2, g_3\}, \{g_5\}\}$
$\delta(m_3) = \{\{g_1, g_2, g_3\}, \{g_4, g_5\}\}$
$\delta(m_4) = \{\{g_4, g_5\}, \{g_1, g_5\}, \{g_1, g_2\}, \{g_3\}\}$

Indeed, each component of a description is a maximal set of objects having pairwise similar values for a given attribute. The pattern concept lattice is given in Figure 2. We remark that (i) any concept corresponds to a bicluster, (ii) some of them correspond to maximal biclusters, and most importantly, (iii) any maximal bicluster can be found as a concept. For example, from the concept $(A_1, d_1) = (\{m_3, m_4\}, \{\{g_1, g_2\}, \{g_4, g_5\}, \{g_3\}\})$ we obtain the following biclusters: $bic_1 = (\{g_1, g_2\}, \{m_3, m_4\})$ and $bic_2 = (\{g_4, g_5\}, \{m_3, m_4\})$. Whereas $bic_2$ is a maximal bicluster, $bic_1$ is not, since there is the concept $(A_2, d_2) = (\{m_1, m_2, m_3, m_4\}, \{\{g_1, g_2\}, \{g_3\}, \{g_4\}, \{g_5\}\})$ with $A_1 \subset A_2$, whose description also contains $\{g_1, g_2\}$. In turn, $bic_3 = (\{g_1, g_2\}, \{m_1, m_2, m_3, m_4\})$ is a maximal bicluster.
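The tolerance block pattern structure of the running example, together with the extraction rule of Proposition 2, can be prototyped as follows. This brute-force sketch rests on our own assumptions and is not the AddIntent-based implementation used in the experiments. Descriptions are frozensets of frozensets, the meet keeps the inclusion-maximal non-empty intersections, and concepts are obtained by closing every subset of $M$. The sketch reproduces the description $d_1$ of $\{m_3, m_4\}$ given above.

```python
from itertools import combinations

DATA = {
    "g1": (1, 2, 2, 8),
    "g2": (2, 1, 2, 9),
    "g3": (2, 1, 1, 2),
    "g4": (1, 0, 7, 6),
    "g5": (6, 6, 6, 7),
}
ATTRS = ("m1", "m2", "m3", "m4")
THETA = 1
G = frozenset(DATA)


def maximal_sets(sets):
    """Keep the inclusion-maximal sets only."""
    sets = set(sets)
    return frozenset(s for s in sets if not any(s < t for t in sets))


def delta(m):
    """delta(m): tolerance blocks of objects w.r.t. the values of m.

    A block is the set of objects whose value lies in [v, v + THETA], where v is
    the block's smallest value; non-maximal windows are discarded.
    """
    i = ATTRS.index(m)
    return maximal_sets(
        frozenset(h for h in DATA if 0 <= DATA[h][i] - DATA[g][i] <= THETA)
        for g in DATA
    )


def meet(d1, d2):
    """d1 ⊓ d2 = max{p ∩ q | p ∈ d1, q ∈ d2}, ignoring empty intersections."""
    return maximal_sets(p & q for p in d1 for q in d2 if p & q)


def box_attrs(B):
    """B^box; the empty meet is the top pattern {G}."""
    d = frozenset({G})
    for m in B:
        d = meet(d, delta(m))
    return d


def box_pattern(d):
    """d^box: attributes m with d ⊑ delta(m), i.e. d ⊓ delta(m) == d."""
    return frozenset(m for m in ATTRS if meet(d, delta(m)) == d)


concepts = {
    (box_pattern(box_attrs(B)), box_attrs(B))
    for r in range(len(ATTRS) + 1)
    for B in combinations(ATTRS, r)
}

# Proposition 2: (p, A) is maximal unless p also occurs in a concept with larger A.
maximal = {
    (p, A)
    for A, d in concepts
    for p in d
    if not any(A < C and p in d2 for C, d2 in concepts)
}
print(sorted(map(sorted, box_attrs(["m3", "m4"]))))  # [['g1', 'g2'], ['g3'], ['g4', 'g5']]
print(len(maximal))  # 10
```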

Remark. An equivalent formal context can also be built. By equivalent, we mean that the concept lattices produced by both structures are isomorphic. To obtain this formal context, we use a slight modification of the data transformation of [9] (p. 92): $(M, B_2(G), I)$, where $B_2(G)$ is the set of pairs of distinct objects of $G$, s.t. $(m, (g, h)) \in I \iff m(g) \simeq_\theta m(h)$. This concept lattice is isomorphic to the pattern concept lattice [2], and thus it can be used in the same way to obtain maximal biclusters. In our running example, such a context is given in Table 1, and its associated concept lattice is given in Figure 2 (right), a lattice isomorphic to the one obtained from the pattern structure (left). The proof can be done in a similar manner as in [2].
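The formal context $(M, B_2(G), I)$ of the Remark is straightforward to build. In this illustrative sketch (names are hypothetical), $B_2(G)$ is taken to be the set of unordered pairs of distinct objects, following the column headers of the running example, and the two derivation operators are implemented directly. The derivation of $\{m_3, m_4\}$ consists of the pairs $(g_1, g_2)$ and $(g_4, g_5)$, which matches the non-singleton blocks of the pattern $d_1$ above.

```python
from itertools import combinations

DATA = {
    "g1": (1, 2, 2, 8),
    "g2": (2, 1, 2, 9),
    "g3": (2, 1, 1, 2),
    "g4": (1, 0, 7, 6),
    "g5": (6, 6, 6, 7),
}
ATTRS = ("m1", "m2", "m3", "m4")
THETA = 1

# B2(G): unordered pairs of distinct objects.
PAIRS = list(combinations(sorted(DATA), 2))

# (m, (g, h)) in I iff m(g) ~theta m(h).
INCIDENCE = {
    m: {(g, h) for g, h in PAIRS if abs(DATA[g][i] - DATA[h][i]) <= THETA}
    for i, m in enumerate(ATTRS)
}


def common_pairs(B):
    """B': pairs of objects that are similar on every attribute of B."""
    return set(PAIRS).intersection(*(INCIDENCE[m] for m in B))


def common_attrs(P):
    """P': attributes on which every pair of P is similar."""
    return {m for m in ATTRS if set(P) <= INCIDENCE[m]}


# Cross table: one row per attribute of the numerical dataset.
for m in ATTRS:
    print(m, "".join("x" if p in INCIDENCE[m] else "." for p in PAIRS))

pairs = common_pairs({"m3", "m4"})
print(sorted(pairs), sorted(common_attrs(pairs)))
```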

[Formal context $(M, B_2(G), I)$ of the running example: objects $m_1, \dots, m_4$; attributes $(g_1, g_2), (g_1, g_3), (g_1, g_4), (g_1, g_5), (g_2, g_3), (g_2, g_4), (g_2, g_5), (g_3, g_4), (g_3, g_5), (g_4, g_5)$.]

We present another original result: any maximal bicluster of similar values on columns is characterized as a triadic concept. The triadic context is derived from the numerical dataset by encoding the tolerance relation between the values.

Proposition 3. Given a numerical dataset $(G, M, W, I)$, consider the derived triadic context $(M, G, G, Y)$ s.t. $(m, g_1, g_2) \in Y \iff m(g_1) \simeq_\theta m(g_2)$. There is a one-to-one correspondence between the set of all maximal biclusters $(A, B)$ and the set of all triadic concepts of the form $(B, A, A)$ of the derived context.

Proof. Consider a maximal bicluster $(A, B)$. For every $m \in M$ we have $(\forall g, h \in A: m(g) \simeq_\theta m(h)) \iff m \in B$, and hence (by the definition of $Y$) $B \times A \times A \subseteq Y$. We now take $B' \times A' \times A' \subseteq Y$ such that $B \subseteq B'$ and $A \subseteq A'$. Then $g(m) \simeq_\theta h(m)$ for any pair of objects $g, h \in A'$ and any $m \in B'$; since $(A, B)$ is a maximal bicluster, this implies that $g, h \in A$ and $m \in B$, i.e. $A' = A$ and $B' = B$. Conversely, let $(B, A, A)$ be a triadic concept. For any pair of objects $g, h \in A$ and any $m \in B$ we have $g(m) \simeq_\theta h(m)$, and by the component-wise maximality of the concept, $\forall g, h \in A: g(m) \simeq_\theta h(m) \iff m \in B$, which is the alternative definition of a maximal bicluster. ∎

Example. Taking again $\theta = 1$, the triadic context derived from the numerical dataset of Table 1 is given in Table 2. An example of a triadic concept is $(\{m_1, m_2, m_3\}, \{g_1, g_2, g_3\}, \{g_1, g_2, g_3\})$, which corresponds to the maximal bicluster $(\{g_1, g_2, g_3\}, \{m_1, m_2, m_3\})$.
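Proposition 3 can likewise be checked on the running example. The sketch below is our own brute-force illustration, not DataPeeler. It builds $Y$ for $\theta = 1$ and tests conditions (i) and (ii) of a triadic concept. Any sub-box of a box included in $Y$ is also included in $Y$, so component-wise maximality only requires trying single-element extensions. Restricting to triples $(B, A, A)$ yields ten triples for Table 1, one per maximal bicluster. The sketch also exhibits a triadic concept whose last two components differ, the kind of redundancy mentioned in the conclusion.

```python
from itertools import chain, combinations, product

DATA = {
    "g1": (1, 2, 2, 8),
    "g2": (2, 1, 2, 9),
    "g3": (2, 1, 1, 2),
    "g4": (1, 0, 7, 6),
    "g5": (6, 6, 6, 7),
}
ATTRS = ("m1", "m2", "m3", "m4")
THETA = 1

# Y ⊆ M × G × G: (m, g, h) in Y iff m(g) ~theta m(h).
Y = {
    (m, g, h)
    for i, m in enumerate(ATTRS)
    for g in DATA
    for h in DATA
    if abs(DATA[g][i] - DATA[h][i]) <= THETA
}


def in_Y(X1, X2, X3):
    """Condition (i): X1 × X2 × X3 ⊆ Y."""
    return all(t in Y for t in product(X1, X2, X3))


def is_triadic_concept(X1, X2, X3):
    """Conditions (i) and (ii); single-element extensions suffice for (ii)."""
    if not in_Y(X1, X2, X3):
        return False
    return not (
        any(in_Y(X1 | {m}, X2, X3) for m in set(ATTRS) - X1)
        or any(in_Y(X1, X2 | {g}, X3) for g in set(DATA) - X2)
        or any(in_Y(X1, X2, X3 | {g}) for g in set(DATA) - X3)
    )


def subsets(items):
    items = sorted(items)
    return (set(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)))


A, B = {"g1", "g2", "g3"}, {"m1", "m2", "m3"}
print(is_triadic_concept(B, A, A))  # True
print(is_triadic_concept({"m4"}, {"g1"}, {"g1", "g2", "g5"}))  # True, but X2 != X3

# Triadic concepts of the form (B, A, A), i.e. maximal biclusters (A, B).
diagonal = [(objs, attrs) for attrs in subsets(ATTRS) for objs in subsets(DATA)
            if is_triadic_concept(attrs, objs, objs)]
print(len(diagonal))  # 10
```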

5 Experiments

We experiment with the different FCA methods introduced in the previous section. We report preliminary results on two aspects, efficiency (running time) and compactness (number of concepts), in order to discuss the strengths and weaknesses of the different methods.

[Table 2. Triadic context $(M, G, G, Y)$ derived from Table 1 with $\theta = 1$: one $5 \times 5$ slice over $g_1, \dots, g_5$ for each of $m_1, m_2, m_3, m_4$.]

Data and experimental settings. The first dataset, "Diagnosis"⁶, contains 120 objects with 8 attributes. The first attribute provides the temperature of a given patient, with range $[35.5, 41.5]$ (numerical). For this attribute we used $\theta = 0.1$ and then $\theta = 0.3$. The other 7 attributes are binary ($\theta = 0$). The second dataset, "dataSample 1.txt", is provided with the BicAT software⁷. It contains 420 objects and 70 numerical attributes with range $[-5.9, 6.7]$. We used $\theta = 0.05$ for all attributes. Table 3 gives results for the three different FCA methods discussed in this article, namely interval pattern structures (IPS), tolerance blocks/partition pattern structures (TBPS) and triadic concept analysis (TCA). We also report on the use of standard FCA with the discretization technique discussed at the end of Section 4.2 (FCA). Finally, we consider the computation of clarified contexts (FCA-CL), since clarification can dramatically reduce the size of the context while keeping the same concept lattice. A context is clarified when there exist neither two objects with the same description nor two attributes shared by the same set of objects.
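Clarification, used for the FCA-CL variant, merges objects with identical intents and attributes with identical extents. A minimal Python sketch of this operation might look as follows. The toy context is hypothetical, and naming a merged item by the sorted tuple of its members is our own convention. Any concept-lattice algorithm can then be run on the smaller clarified context.

```python
def clarify(context, attributes):
    """Return a clarified copy of a formal context.

    context maps each object to its set of attributes; attributes is the full
    attribute set (attributes with an empty extent are kept). Objects with
    identical intents, and attributes with identical extents, are merged into a
    single item named by the sorted tuple of the merged names.
    """
    # Merge objects that share the same intent.
    by_intent = {}
    for g, intent in context.items():
        by_intent.setdefault(frozenset(intent), []).append(g)
    objects = {tuple(sorted(gs)): intent for intent, gs in by_intent.items()}

    # Merge attributes that share the same extent (computed on merged objects).
    by_extent = {}
    for m in attributes:
        extent = frozenset(o for o, intent in objects.items() if m in intent)
        by_extent.setdefault(extent, []).append(m)
    rename = {m: tuple(sorted(ms)) for ms in by_extent.values() for m in ms}

    clarified = {o: frozenset(rename[m] for m in intent) for o, intent in objects.items()}
    return clarified, sorted(set(rename.values()))


ctx = {"g1": {"a", "b"}, "g2": {"a", "b"}, "g3": {"c"}}
objects, attributes = clarify(ctx, {"a", "b", "c", "d"})
print(sorted(objects))  # [('g1', 'g2'), ('g3',)]
print(attributes)       # [('a', 'b'), ('c',), ('d',)]
```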

For the methods based on FCA and pattern structures (FCA, FCA-CL, IPS, TBPS), we used a C++ version of the AddIntent algorithm [20]⁸. No restrictions were imposed on the size of the biclusters. The TCA method was implemented using DataPeeler [5]. All experiments were performed on a Linux machine with an Intel Xeon E7 processor running at 2.67 GHz and 1 TB of RAM.

Discussion. Results in Table 3 show that, for the Diagnosis dataset, the clarified context using standard FCA (FCA-CL) is the best of the five methods w.r.t. execution time, while for BicAT sample 1 the best is TCA. Times are expressed as the sum of the time required to create the input representation of the dataset for the corresponding technique and the time of its execution. In the case of FCA and FCA-CL, the pre-processing time can be as high as the time required for applying the AddIntent algorithm. However, for large datasets such as the BicAT example, these times can be ignored. It is also worth noticing that the pre-processing depends on the chosen $\theta$ value; hence, for each different configuration, a new pre-processing task has to be executed. This is not the case for interval and partition pattern structures, whose pre-processing is linear w.r.t. the number of objects (it is actually just a change of format). We can also appreciate a more compact representation of the biclusters with partition pattern structures (TBPS) and their formal context versions (FCA and FCA-CL). While TBPS is the slowest of the five methods, it is also the cheapest in terms of machine resources, more specifically RAM. TCA is the most expensive method in terms of machine resources and data representation; however, it yields results faster. Interval pattern structures lie in the middle, as a good trade-off between compactness and execution time.

Table 3.
Technique                      FCA            FCA-CL         TCA            IPS             TBPS
Time [s] (Preproc. + Exec.)    0.11 + 0.335   0.11 + 0.02    0.04 + 33.3    0.011 + 0.303   0.011 + 1.76

⁶ http://archive.ics.uci.edu/ml/datasets/Contraceptive+Method+Choice
⁷ http://www.tik.ee.ethz.ch/sop/bicat/
⁸ https://code.google.com/p/sephirot/

For this initial experimentation, we report neither the number of maximal biclusters nor the bicluster extraction algorithms that can be implemented for each technique, but only the FCA techniques themselves. The number of maximal biclusters is the same for each technique, since all of them are bicluster enumeration techniques, i.e. all maximal biclusters are extracted. Hence, the difference among techniques lies not in the number of maximal biclusters extracted, but in the number of formal concepts found and in the complexity of the post-processing needed to extract the maximal biclusters from them. In general, it is easy to observe from Propositions 1, 2 and 3 that the post-processing of TCA is linear w.r.t. the number of triadic concepts found, while for IPS it is linear w.r.t. the number of interval pattern concepts times the squared number of columns of the numerical dataset, and for TBPS it is linear w.r.t. the number of super-sub concept relations in the tolerance block pattern concept lattice. Nevertheless, different strategies for bicluster extraction can be implemented for each technique, which would render the comparison unfair. For example, [6] proposes an optimization for biclustering with partition pattern structures (which can easily be adapted to TBPS) that cuts its execution time in half by breaking the structure of the lattice. Similar strategies for IPS and TCA could also be implemented, but are still a matter of research.

6 Conclusion

Biclustering is an important data analysis task that is used in several applications, such as transcriptome analysis in biology and the design of recommender systems. Biclustering methods produce a collection of local patterns that are easier to interpret than a global model. There are several types of biclusters and corresponding algorithms, ad hoc most of the time. In this paper, our main contribution is to show how biclusters of similar values on columns can be characterized or generated from formal concepts, pattern concepts and triadic concepts. Bringing this biclustering problem back into formal concept analysis settings allows the use of existing and efficient algorithms without any modification. However, and this is among our research perspectives, several optimizations can be made. For example, with the triadic method, one should not generate both concepts $(A, B, C)$ and $(A, C, B)$: they are redundant, since only concepts with $B = C$ correspond to maximal biclusters.

7 Appendix: Proof of Proposition 1

We first introduce some notation, and then recall and prove Proposition 1, which relates maximal biclusters to interval pattern concepts of a pattern structure. The intuition lies in the relation between the set of attributes $M$ of $(G, M, W, I)$ and the dimensions of the interval patterns of $(G, (D, \sqcap), \delta)$. Let $d = \langle[a_1, b_1], [a_2, b_2], \dots, [a_n, b_n]\rangle \in D$ be an interval pattern in an interval pattern structure $(G, (D, \sqcap), \delta)$, where $|M| = n$. For any $m_i \in M$, we define $d(m_i) = [a_i, b_i]$ and $|d(m_i)| = |a_i - b_i|$.

Definition 5. Let $d$ be a pattern in an interval pattern structure $(G, (D, \sqcap), \delta)$. The function $\mathrm{attr}: D \rightarrow \wp(M)$ is defined as $\mathrm{attr}(d) = \{m \in M \mid |d(m)| \le \theta\}$.

Definition 6. Let $A \subseteq G$ be a set of objects and $m \in M$ an attribute. We define $A(m) = \{g(m) \mid g \in A\}$. For instance, in Table 1, if $A = \{g_1, g_2, g_3\}$, then $A(m_4) = \{2, 8, 9\}$.

Proposition 4. For $A \subseteq G$, we have that:

$A^\square = \langle[\min(A(m_1)), \max(A(m_1))], \dots, [\min(A(m_n)), \max(A(m_n))]\rangle$

Proof. Since the operation $\sqcap$ is associative and commutative, we have that

$A^\square = \bigsqcap_{g_i \in A} \delta(g_i) = \langle[\min(A(m_1)), \max(A(m_1))], \dots, [\min(A(m_n)), \max(A(m_n))]\rangle$ ∎

Proposition 5. Consider a numerical dataset $(G, M, W, I)$ as an interval pattern structure $(G, (D, \sqcap), \delta)$. For any maximal bicluster $(A, B)$, we define $d = A^\square$. Then: 1. $B = \mathrm{attr}(d)$, and 2. $(A, d)$ is a pattern concept in $(G, (D, \sqcap), \delta)$.

Proof. 1. $B = \mathrm{attr}(d)$. We prove that $m \in \mathrm{attr}(d) \iff m \in B$. By the definition of maximal bicluster, we have that $\forall m \in M: m \in B \iff \forall w, w' \in A(m): |w - w'| \le \theta$, if and only if $|\min(A(m)) - \max(A(m))| \le \theta$, if and only if (by the definition of $d = A^\square$ and Proposition 4) $m \in \mathrm{attr}(d)$. ∎

2. We need to prove that $A^\square = d$ and that $d^\square = A$. $A^\square = d$ holds by the definition of $d$. As for $d^\square = A$, the inclusion $A \subseteq d^\square$ holds since every $g \in A$ satisfies $g(m) \in d(m)$ for all $m \in M$. Conversely, take $g \in d^\square$, which means that $\forall m \in M: g(m) \in d(m)$; in particular, for every $m \in B$, $g(m)$ lies in an interval of width at most $\theta$ that contains $A(m)$, which implies that $g \in A$ by the definition of maximal bicluster. ∎

References

1. J. Baixeries, M. Kaytoue, and A. Napoli. Computing similarity dependencies with pattern structures. In M. Ojeda-Aciego and J. Outrata, editors, CLA, volume 1062 of CEUR Workshop Proceedings, pages 33-44. CEUR-WS.org, 2013.

2. J. Baixeries, M. Kaytoue, and A. Napoli. Characterizing Functional Dependencies in Formal Concept Analysis with Pattern Structures. Annals of Mathematics and Artificial Intelligence, pages 1-21, Jan. 2014.

3. Besson, Robardet, L. De Raedt, and J.-F. Boulicaut. Mining bi-sets in numerical data. In S. Dzeroski and J. Struyf, editors, KDID, volume 4747 of Lecture Notes in Computer Science, pages 11-23. Springer, 2007.

4. Califano, G. Stolovitzky, and Tu. Analysis of gene expression microarrays for phenotype classification. In P. E. Bourne, Gribskov, R. B. Altman, Jensen, D. A. Hope, Lengauer, J. C. Mitchell, E. D. Scheeff, Smith, Strande, and H. Weissig, editors, Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology, August 19-23, 2000, La Jolla / San Diego, CA, USA, pages 75-85. AAAI, 2000.

5. Cerf, Besson, Robardet, and J.-F. Boulicaut. Closed patterns meet n-ary relations. TKDD, 3(1), 2009.

6. V. Codocedo and A. Napoli. Lattice-based biclustering using Partition Pattern Structures. In 21st European Conference on Artificial Intelligence (ECAI), 2014.

7. A. V. Freitas, Ayadi, Elloumi, Oliveira, Oliveira, and J.-K. Hao. Biological Knowledge Discovery Handbook: Preprocessing, Mining, and Postprocessing of Biological Data, chapter Survey on Biclustering of Gene Expression Data. John Wiley & Sons, Inc., 2013.

8. Ganter and S. O. Kuznetsov. Pattern structures and their projections. In ICCS '01: Proceedings of the 9th International Conference on Conceptual Structures, volume 2120 of LNCS, pages 129-142. Springer-Verlag, 2001.

9. Ganter and Wille. Formal Concept Analysis. Springer, 1999.

10. R. Jäschke, Hotho, Schmitz, Ganter, and Stumme. Trias - an algorithm for mining iceberg tri-lattices. In ICDM, pages 907-911, 2006.

11. M. Kaytoue, S. O. Kuznetsov, J. Macko, and A. Napoli. Biclustering meets triadic concept analysis. Annals of Mathematics and Artificial Intelligence, 70(1-2), 2014.

12. M. Kaytoue, S. O. Kuznetsov, and A. Napoli. Biclustering numerical data in formal concept analysis. In P. Valtchev and R. Jäschke, editors, ICFCA, volume 6628 of LNCS, pages 135-150. Springer, 2011.

13. M. Kaytoue, S. O. Kuznetsov, A. Napoli, and S. Duplessis. Mining gene expression data with pattern structures in formal concept analysis. Information Sciences, 181(10):1989-2001, 2011.

14. S. O. Kuznetsov. Galois connections in data analysis: Contributions from the Soviet era and modern Russian research. In B. Ganter, G. Stumme, and R. Wille, editors, Formal Concept Analysis, volume 3626 of Lecture Notes in Computer Science, pages 196-225. Springer, 2005.

15. S. O. Kuznetsov and S. A. Obiedkov. Comparing performance of algorithms for generating concept lattices. J. Exp. Theor. Artif. Intell., 14(2-3):189-216, 2002.

16. Lehmann and Wille. A triadic approach to formal concept analysis. In ICCS, volume 954 of LNCS, pages 32-43. Springer, 1995.

17. Madeira and Oliveira. Biclustering algorithms for biological data analysis: a survey. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 1(1):24-45, 2004.

18. Rogovschi, Labiod, and Nadif. A spectral algorithm for topographical co-clustering. In IJCNN, pages 1-6. IEEE, 2012.

19. T. Uno, Kiyomi, and Arimura. LCM ver. 2: Efficient mining algorithms for frequent/closed/maximal itemsets. In R. J. B. Jr., B. Goethals, and M. J. Zaki, editors, FIMI, volume 126 of CEUR Workshop Proceedings. CEUR-WS.org, 2004.

20. D. van der Merwe, S. Obiedkov, and Kourie. AddIntent: A New Incremental Algorithm for Constructing Concept Lattices. In P. Eklund, editor, Concept Lattices, volume 2961 of LNCS, pages 205-206. Springer, Berlin/Heidelberg, 2004.

21. Veroneze, Banerjee, and F. J. V. Zuben. Enumerating all maximal biclusters in real-valued datasets. CoRR, abs/1403.3562, 2014.