Introduction

Advances in algorithms based on CbO

Petr Krajca

krajcap@inf.upol.cz 0

Jan Outrata

jan.outrata@upol.cz 0

Vilem Vychodil?

vilem.vychodil@upol.cz 0 0 Department of Computer Science, Palacky University , Olomouc , Czech Republic Tr. 17. listopadu 12, 771 46 Olomouc , Czech Republic

325 337

The paper presents a survey of recent advances in algorithms for computing all formal concepts in a given formal context which result as modi cations or extensions of CbO. First, we present an extension of CbO, so called FCbO, and an improved canonicity test that signi cantly reduces the number of formal concepts which are computed multiple times. Second, we outline a parallel version of the proposed algorithm and discuss various scheduling strategies and their impact on the overall performance and scalability of the algorithm. Third, we discuss important data preprocessing issues and their in uence on the algorithms. Namely, we focus on the role of attribute permutations and present experimental observations about the e ciency of the proposed algorithms with respect to the number of inversions in such permutations.

Introduction

The major issue of widely-used algorithms for computing formal concepts, including CbO [12{14], NextClosure [ 5, 6 ], or UpperNeighbor [ 16 ], is that some concepts are computed multiple times which brings signi cant overhead. This paper deals with various ways to reduce the overhead. Notice that recently an increasing attention has been paid to various modi cations of CbO, see [ 1, 10, 17 ].

This paper presents a survey of recent advances in three interconnected areas. First, we present an algorithm called FCbO which achieves better performance than CbO by reducing the total number of formal concepts that are computed multiple times. The reduction is achieved by introducing an additional canonicity test which e ectively prunes the CbO tree during the computation. Second, we elaborate on issues related to parallel execution of FCbO. We have already proposed a parallel variant of CbO, so-called PCbO [ 10 ]. In this paper, we propose an analogous parallelization of FCbO and we discuss various workload distribution strategies that may have impact on the overall performance of such parallelization. Third, we focus on data preprocessing|an important issue that is often underestimated. Namely, some algorithms for FCA (including those from the CbO family) achieve better performance if attributes are processed in particular order. In this paper, we present a preliminary study of the role of attribute permutations on the performance of CbO and the derived algorithms. Notation Throughout the paper, X = f0; 1; : : : ; mg and Y = f0; 1; : : : ; ng are nite nonempty sets of objects and attributes, respectively, and I X Y is an incidence relation. The triplet hX; Y; Ii is a formal context. The concept-forming operators induced by I will be denoted by "I : 2X ! 2Y and #I : 2Y ! 2X , respectively, see [ 6 ] for details. We assume that reader has knowledge of basic algorithms for FCA. 2

FCbO: Fast Close-by-One with New Canonicity Test In this section we brie y describe the new canonicity test and a new algorithm derived from CbO which uses this test. Recall that the original canonicity test used by CbO (and NextClosure) is always used after a new formal concept is computed. For B Y and j 62 B, one checks whether

B \ Yj = D \ Yj ; where D = (B [ fjg)#I "I and Yj = fy 2 Y j y < jg. FCbO employs an additional test that is performed before D is computed, eliminating thus the computation of #I "I . Notice that (1) fails iff B j 6= ;, where

B j = (D n B) \ Yj = (B [ fjg)#I "I n B \ Yj : (1) (2) The new canonicity test exploits the fact that if (1) fails for given B and j 62 B, the monotony of #I "I yields that the test will also fail for each B0 B such that j 62 B0 and B j * B0. The conclusion can be done without computing D. If B j B0, we are still compelled to perform the original canonicity test. Thus, the new (additional) canonicity test is based on the following assertion: Lemma 1 (See [ 17 ]). Let B Y , j 62 B, and B ; j 6= ;. Then, for each B0 B such that j 62 B0 and B ; j 6 B0, we have B0 ; j 6= ;. tu

FCbO can be seen as an extended version of CbO in that we propagate the information about sets (2) which take part in the new test. The information is propagated in the top-down direction. In order to apply the new test, we have to change the search strategy of the algorithm from the depth- rst search (as it is in CbO) to a combined depth- rst and breadth- rst search. FCbO is represented by a recursive procedure FastGenerateFrom, see Algorithm 1, which accepts three arguments: a formal concept hA; Bi (an initial formal concept), an attribute y 2 Y ( rst attribute to be processed), and a set fNy Y j y 2 Y g of subsets of attributes Y the purpose of which is to carry information about sets (2). Each invocation of FastGenerateFrom has its own local queue used to store information about computed concepts. Unlike CbO, if the canonicity tests succeed (line 7 and line 10), we do not call FastGenerateFrom recursively but we store information about the concept in the queue (line 11). After each attribute is processed, we perform the recursive calls, see lines 17{19. The new canonicity test is performed in line 7 based on information stored in Nj 's. The original canonicity test is performed in line 10. If the test in line 10 fails, we update the contents of Mj , see line 13. Note that Mj 's can be seen as local copies

Algorithm 1: Procedure FastGenerateFrom(hA; Bi, y, fNy j y 2 Y g) 1 list hA; Bi // concept hA; Bi is processed, e.g., listed or stored // check halting condition of the current call 2 if B = Y or y > n then 3 return 4 end

// process all attributes beginning with y 5 for j from y upto n do 6 set Mj to Nj // Mj is a pointer to Nj // perform new canonicity test 7 if j 62 B and Nj \ Yj B \ Yj then

// compute new concept hC; Di = hA \ fjg#I ; (B [ fjg)#I"I i 8 set C to A \ fjg#I 9 set D to C"I // perform original canonicity test if B \ Yj = D \ Yj then // store new concept for further processing put hhC; Di; ji to queue 13 14 end 15 end 16 end

// perform recursive calls of FastGenerateFrom 17 while get hhC; Di; ji from queue do 18 FastGenerateFrom(hC; Di; j + 1; fMy j y 2 Y g) 19 end

// terminate current call 20 return // update information about implied attributes set Mj to D // Mj becomes a pointer to D of Nj's which are used as the third argument for consecutive calls of FastGenerateFrom. Sets Nj are used instead of (2) because it is actually easier (and more e cient) to maintain a set of pointers to intents than to compute (and allocate memory for) sets (2) during the computation.

FCbO is correct: when invoked with h;#I ; ;#I"I i, y = 0, and fNy = ; j y 2 Y g, Algorithm 1 lists all formal concepts in hX; Y; Ii in the same order as CbO, each of them exactly once. Let us note that FCbO can be turned into a \Fast NextClosure" (i.e., an algorithm that lists concepts in the lexicographical order [ 5 ]) by either (i) using a stack instead of a queue or by (ii) modifying the loop in line 5 so that it goes \from n downto y". See [ 17 ] for further details on FCbO. Example 1. Consider a context with X = f0; : : : ; 3g, Y = f0; : : : ; 5g, and I = fh0; 0i; h0; 1i; h0; 2i; h1; 0i; h1; 2i; h1; 3i; h1; 4i; h1; 5i; h2; 0i; h2; 1i; h2; 4i; h3; 1i; h3; 2ig. This formal context induces 12 formal concepts denoted C1; : : : ; C12. In case of both CbO and FCbO, the computation can be depicted by a tree. Moreover, a hC10, 2i 3 5 4 2 C5

hC1, 0i 1 3 5 4 2

0 C8

C9 hC12, 3i

Experimental Evaluation We have evaluated FCbO and compared CbO and FCbO using various real data sets and arti cial data sets. The impact of the new canonicity test is presented in Table 1 comparing the total numbers of closures computed by CbO and FCbO in selected benchmark data sets [ 2, 8 ]. The table includes numbers of concepts and ratios of the number of computed closures to the number of (distinct) formal concepts in the data, i.e., the frequency of successful canonicity tests. Apparently, FCbO has a higher rate of successful canonicity tests than CbO. Thus, in terms of the number of computed closures, FCbO is more e cient than CbO. Since the total number of computed closures directly in uences the speed of the algorithm, FCbO is (usually) faster than CbO [ 17 ]. The reduction of total time needed for computing all formal concepts is apparent from Table 2. The table shows total time (in seconds) needed to mushroom tic-tac-toe debian tags anon. web size 8; 124 119 958 29 14; 315 475 32; 710 295 density 19 % 34 % < 1 % 1 % FCbO

CbO

NextClosure UpperNeighbor

Berry's [ 3 ] analyze the data sets. For the purpose of comparison the table contains also other well-known algorithms. The experiments were performed on an Apple MacPro computer equipped with two quad-core processors (Intel Xeon, 2.8 GHz) and 16 GB of RAM and all algorithms were implemented in ANSI C using bitarray representation [ 11 ]. Notice that in the worst case, FCbO collapses into CbO (e.g., in case of I being the inequality relation on X = Y ). FCbO is a polynomial time-delay algorithm [ 7, 9 ] because the additional canonicity test has a linear time-delay overhead compared to CbO, see [ 17 ] for further details on FCbO and its performance. 3

PFCbO: Parallel FCbO and Workload Distribution This section is devoted to parallelization issues of FCbO. Recall that in [ 10 ], we have described PCbO which results by a parallelization of CbO. FCbO can be turned into a parallel algorithm in much the same way as the original CbO can be turned into PCbO. Since the procedure of parallelization is fairly similar to that presented in [ 10 ], we focus mainly on issues that are not discussed in [ 10 ]. Namely, we compare several strategies to balance the workload distribution among independent processors and compare their e ciency.

Following the ideas from [ 10 ], a parallel variant of FCbO consists of three stages: First, we compute and process all concepts that are derivable in less than L steps. Second, we store all concepts derivable in exactly L steps in a new queue. Third, we distribute concepts from the queue among P independent processors and we let each of the processors compute the remaining concepts using FCbO. Typically, each processor r has its own queue denoted queuer containing concepts assigned to this processor. A parallel algorithm based on these ideas shall be called Parallel Fast Close-byOne (PFCbO).

Clearly, the practical e ciency of both PCbO and PFCbO depends on the choice of the strategy that distributes concepts among processors during the third step of the computation. The decision how to assign concepts to particular queue is generally di cult since we do not know the distribution of formal concepts in the search space of all formal concepts until we actually compute them all and reveal the structure of the call tree. As a consequence, the distribution of workload may be in some cases unbalanced. In [ 10 ], we have used a simple round-robin principle which turned out to be reasonably e cient. Nevertheless, there are other schemes of the workload distribution that can be considered: (i) round-robin|concepts are distributed to queues attached to each processor, in the way that n-th concept is placed into a queuer where r = (n mod P )+1 and P is the number of processors. For instance, if we consider P = 4 and concepts C1; : : : ; C10, they are assigned to queues as follows: queue1 = fC1; C5; C9g; queue3 = fC3; C7g; queue2 = fC2; C6; C10g; queue4 = fC4; C8g: (ii) zig-zag|this strategy is similar to the previous strategy but it uses a di erent formula to determine the queuer. The queuer is given by r = min n mod z; z (n mod z) + 1 (3) where z = 2 P + 1 assuming that P is number of processors. For P = 4 and concepts C1; : : : ; C10 the distribution of concepts is queue1 = fC1; C8; C9g; queue3 = fC3; C6g; queue2 = fC2; C7; C10g; queue4 = fC4; C5g: r = (iii) blocks|this workload distribution scheme divides the queue of all concepts into chunks of approximately equal size and these \blocks of concepts" are redistributed into the queues of independent processors. In this case, the n-th concept is placed into queuer, where (n P )

Q with P being the number of processors, Q being the number of all concepts, and dxe being the usual ceiling function. For instance, in case of C1; : : : ; C10 (i.e., Q = 10) and four queues (i.e., P = 4), we get: (4) queue1 = fC1; C2g; queue3 = fC6; C7g; queue2 = fC3; C4; C5g; queue4 = fC8; C9; C10g: (iv) fair|all concepts remain stored in one shared queue and each processor gets concepts from the queue one by one. The bene t of this scheme is that it allows to react on the revealing structure of the call tree. On the other hand, this method of distributing concepts requires synchronization among processors while accessing this queue. Note that in contrast to the abovedescribed schemes, this scheme has no xed structure and the workload is distributed non-deterministically. (v) random|the workload is spread among processors randomly. We are considering this strategy to be referential and it is included for the purpose of comparison.

Experimental Evaluation In order to evaluate the strategies of workload distribution, we have tested our algorithm for each strategy using various data sets and various number of processors. Table 3 depicts the time needed to compute all formal concepts using particular strategy. Surprisingly, there are only small random (5000 random (10000 debian tags anon. web mushroom tic-tac-toe 100 10) 100 15) di erences among the considered schemes of the workload distribution, i.e., the ordinary round-robin used in [ 10 ] is indeed adequate for the job. Nevertheless, the fair strategy seems to be the most e cient. One can see that the round-robin and zig-zag strategies provide performance slightly better than the random workload distribution. On the other hand, the blocks scheme of distribution provides performance even worse than the random distribution and seems to be inappropriate for PFCbO. 4

Data Preprocessing Issues

Algorithms for computing concepts can be classi ed in many ways, see, e.g. [ 15 ]. An important attribute of algorithms for FCA is whether their performance depends on the order of objects and attributes in the input data table. Therefore, an algorithm for computing formal concepts shall be called (permutation) resistant whenever all isomorphic copies of a formal context hX; Y; Ii with Y = f0; 1; : : : ; ng require the same number of elementary computation steps in order to compute all concepts. For our purposes, an elementary computation step will be represented by computation of a single xpoint of the concept-forming operators "I and #I .

One can easily see that, e.g., Lindig's UpperNeighbor algorithm [ 16 ] is resistant. On the other hand, CbO and FCbO are not resistant. Indeed, a di erent order of attributes in a data table can yield di erent CbO and FCbO trees that may have di erent numbers of nodes (notice that the loop in line 5 of Algorithm 1 processes attributes from left to right). Since CbO and FCbO are not resistant, a proper ordering of attributes before computation can further reduce the number of concepts that are computed multiple times, thus improving the e ciency. In this section, we investigate particular permutations of attributes and explore the impact of inversions on the number of computed closures.

In order to describe various formal contexts with respect to the structure of the data table, we introduce a notion of an ordered formal context and inversion: De nition 1. An ordered formal context is a formal context hX; Y; Ii where Y = f0; : : : ; ng and for all attributes jf0g#I j jf1g#I j jfng#I j: (5)

A pair of attributes hy1; y2i 2 Y

an inversion.

Y such that jfy1g#I j 6 jfy2g#I j shall be called

Verbally, the attributes in an ordered formal context are sorted in the ascending order according to their support, i.e., the number of objects having these attributes. As a consequence of the previous de nition, an ordered formal context contains no inversions.

From the point of view of formal concepts and concept lattices, the order of objects and attributes in which they appear in the data table is not essential. Therefore, one can reorder attributes in an arbitrary way. From the computational point of view, however, it may happen that certain orderings of attributes yield better results in conjunction with particular argorithms than other orderings. In case of our algorithms, the order has an important impact on the process of the execution of both CbO and FCbO since the canonicity test is driven by the order of attributes. The following assertions show that for an ordered formal context with pairwise distinct columns, the canonicity tests succeed for all attribute concepts. We rst prove a technical claim: Lemma 2. Let hX; Y; Ii be an ordered formal context with Y = f0; : : : ; ng. Then, for each k; j 2 Y such that k < j, we have k 2 fjg#I "I iff fkg#I = fjg#I . Proof. Note that attributes from Y are integers and \<" denotes the usual strict linear order on the set of all integers. Suppose that k 2 fjg#I "I , i.e., fkg fjg#I "I . By the antitony of #I , we get fkg#I fjg#I "I #I = fjg#I . Thus, it remains to show the converse inclusion. Since hX; Y; Ii is ordered and k < j, we get jfkg#I j jfjg#I j, see (5). Hence, jfkg#I j jfjg#I j and fkg#I fjg#I yield fkg#I = fjg#I . Conversely, if fkg#I = fjg#I then obviously k 2 fjg#I "I , proving the claim.

Applying Lemma 2, we get: Theorem 1. Let hX; Y; Ii be an ordered formal context where Y = f0; : : : ; ng and fag#I 6= fbg#I for any a; b 2 Y . Then for each j 2 Y such that j 62 ;#I "I , ;#I "I \ Yj = fjg#I "I \ Yj ; where Yj = fy 2 Y j y < jg.

Proof. Take j 2 Y such that j 62 ;#I "I . Observe that condition (6) holds true iff there is no attribute k 2 Y such that k 62 ;#I "I , k < j, and k 2 fjg#I "I . Thus, consider any k 2 Y such that k < j. Since hX; Y; Ii is ordered, our assumption j 62 ;#I "I yields k 62 ;#I "I . By the assumption, fkg#I 6= fjg#I , i.e., Lemma 2 yields k 62 fjg#I "I , nishing the proof. tu

Theorem 1 shows that for an ordered formal context with pairwise distinct columns, invocations of FastGenerateFrom in the rst level of recursion always succeeds and generates concepts. Moreover, from the proof of Theorem 1 it follows that in any ordered formal context, the rst derivation [ 10 ] does not exists for attribute j if there is an attribute k such that k < j and fkg#I = fjg#I . tu (6) 16 · 106 CbO FCbO

FCbO 0

This has a practical consequence for the parallel variants of CbO and FCbO in case of ordered contexts, because it allows us to determine the number of concepts generated during the rst stages of the algorithms. If the number of attributes is signi cantly larger than the number of processors, and this condition is usually ful lled, it is su cient to compute only the rst derivations and then distribute the workload among all processors.

Furthermore, our empirical experiments have shown an interesting tendency that while processing ordered formal contexts, canonicity tests fail less frequently than in case of contexts containing inversions. In addition, the experiments have shown that with the increasing number of inversions in a data table, the average number of computed closures grows. For instance, Fig. 2 shows how the number of inversions in the mushroom data set a ects the total number of computed closures. The rst graph (at the top) depicts this dependency for CbO and FCbO. The second graph (at the bottom) provides a more detailed view for FCbO. Similar tendency can be observed for other benchmark data sets. Remark 1. Let us note that the ordering of attributes introduced by (5) has already been used in [ 4 ] but the purpose of the ordering in [ 4 ] is di erent. In [ 4 ], the authors use this particular ordering of attributes in a parallel version of Ganter's NextClosure algorithm to achieve soundness of the algorithm (each 0 · 107 normal uniform 0 concept is then listed only once) while in our case, [ 4 ] is used for the sake of increased e ciency.

Remark 2. We have observed a general tendency that certain data sets are more a ected by the above-discussed phenomenon than others. For instance, if 1's in a data table are approximately uniformly spread among attributes (i.e., each attribute has approximately the same support), the ordering of attributes (usually) does not have a considerable e ect on decreasing the number of closures. Fig. 3 depicts how the increasing number of inversions a ects the number of computed closures in two arti cial data sets where 1's are distributed (i) approximately uniformly among the attributes and (ii) approximately normally among the attributes. Both data sets have the same parameters in that they consist of 1000 objects, 100 attributes, and contain 15 % of 1's, however, the distributions of 1's among the attributes are quite di erent. As one can see from Fig. 3, the impact of the number of inversions on the number of computed closures is more signi cant in case of normally distributed 1's among the attributes. Experimental Evaluation From our observations it follows that it is desirable to incorporate a preprocessing step which transforms a formal context into a corresponding ordered formal context. In order to evaluate the bene ts of this preprocessing step, we have used similar approach as in case of evaluation of mushroom tic-tac-toe debian tags anon. web PFCbO (P = 1) PFCbO (P = 2) PFCbO (P = 4) PFCbO (P = 8)

PCbO (P = 1) PCbO (P = 2) PCbO (P = 4) PCbO (P = 8) the new canonicity test. We have focused on the total numbers of closures and concepts computed by FCbO while processing various ordered and unordered data sets. The results are presented in Table 4 which also includes corresponding ratios.

Apparently, reordering of attributes reduces the number of computed closures, and thus, can reduce time of computation. Note that the Ganter's algorithm [ 5, 6 ] is in principle equivalent to CbO. As such, it is also not permutation resistant. Thus, the preprocessing step which reorders attributes can also increase its performance. 5

Overall Evaluation

So far, we have proposed and evaluated several improvements and re nements of the original CbO and PCbO algorithms, namely, new canonicity test, workload distribution schemes, and reordering attributes. However, we have evaluated the impact of each improvement separately. Therefore, we conclude this paper with the evaluation of PFCbO which includes all these improvements.

Table 5 table shows the total time (in seconds) needed to compute all formal concepts in the benchmark data sets using PCbO and PFCbO run on the Apple MacPro computer, equipped with eight processor cores. The parameter P indicates the number of used processors for particular experiment.

Fig. 4 demonstrates the scalability of PFCbO, i.e., the ability to decrease the time of computation by using more processors. In the depicted two experiments, we have used computer equipped with Sun UltraSPARC T1 processor having eight cores (each capable to process up to 4 threads simultaneously) and 8 GB of RAM. Fig. 4 (at the top) shows relative speed up for data sets having 10000 objects, 10 % density of 1's in the data table and various counts of attributes. Fig. 4 (at the bottom) shows relative speed up for data sets having 1000 objects, 100 attributes, and various densities of 1's in the data tables. The 1's in both data sets spread approximately normally among the attributes. Note that each graph contains a certain point from which the increasing number of processors does not allow to take advantage of more processors and performance of the number of processors 100 attrs. 150 attrs. 200 attrs. 250 attrs. 100 attrs. 150 attrs. 200 attrs. algorithm may even decline due to the overhead related to the management of multiple threads of execution. However, this is a quite common behavior of parallel algorithms.

12 16 20 24 28

32 number of processors

1. Andrews

: In-Close, a Fast Algorithm for Computing Formal Concepts . In: Rudolph , Dau, Kuznetsov (Eds.): Supplementary Proceedings of ICCS '09 , CEUR

483( 2009 ), 14 pages. http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol- 483 /paper1.pdf

2. Asuncion

, Newman

D.:

UCI Machine Learning Repository . University of California, Irvine, School of Information and Computer Sciences, 2007 .

3. Berry

, Bordat

J.-P.

, Sigayret

: A local approach to concept generation . Annals of Mathematics and Arti cial Intelligence , 49 ( 2007 ), 117 { 136 .

4. Fu

, Mephu Nguifo E. : A Parallel Algorithm to Generate Formal Concepts for Large Data . ICFCA 2004 , LNCS 2961, pp. 394 { 401 .

5. Ganter

: Two basic algorithms in concept analysis . (Technical Report FB4- Preprint No. 831) . TH Darmstadt , 1984 .

6. Ganter

, Wille

: Formal concept analysis . Mathematical foundations . Berlin: Springer, 1999 .

7. Goldberg L . A.: E cient Algorithms for Listing Combinatorial Structures . Cambridge University Press, 1993 .

8. Hettich

, Bay S . D.: The UCI KDD Archive University of California, Irvine, School of Information and Computer Sciences, 1999 .

9. Johnson D. S, Yannakakis

, Papadimitriou

C. H.

: On generating all maximal independent sets . Information Processing Letters 27 ( 3 )( 1988 ), 119 { 123 .

10. Krajca

, Outrata

, Vychodil

: Parallel Recursive Algorithm for FCA . In: Belohlavek, Kuznetsov (Eds.): Proc. CLA 2008 , CEUR WS 433 ( 2008 ), 71 { 82 . http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol- 433 /paper6.pdf

11. Krajca

, Vychodil

: Comparison of data structures for computing formal concepts . In: Proc. MDAI 2009 , LNAI 5861 ( 2009 ), 114 { 125 .

12. Kuznetsov

S.:

Interpretation on graphs and complexity characteristics of a search for speci c patterns . Automatic Documentation and Mathematical Linguistics , 24 ( 1 )( 1989 ), 37 { 45 .

13. Kuznetsov

S.:

A fast algorithm for computing all intersections of objects in a nite semi-lattice (Bystryi$ algoritm postroeni vseh pereseqenii$ obektov iz koneqnoi$ polurexetki , in Russian). Automatic Documentation and Mathematical Linguistics , 27 ( 5 )( 1993 ), 11 { 21 .

14. Kuznetsov

: Learning of Simple Conceptual Graphs from Positive and Negative Examples . PKDD 1999 , pp. 384 { 391 .

15. Kuznetsov

, Obiedkov

S.:

Comparing performance of algorithms for generating concept lattices . J. Exp. Theor. Artif. Int. , 14 ( 2002 ), 189 { 216 .

16. Lindig

: Fast concept analysis . Working with Conceptual Structures{ {Contributions to ICCS 2000 , pp. 152 { 161 , 2000 . Aachen: Shaker Verlag.

17. Outrata

, Vychodil

: Fast algorithm for computing xpoints of galois connections induced by object-attribute relational data (in preparation).