=Paper=
{{Paper
|id=None
|storemode=property
|title=A Closure Algorithm Using a Recursive Decomposition of the Set of Moore Co-families
|pdfUrl=https://ceur-ws.org/Vol-959/paper9.pdf
|volume=Vol-959
|dblpUrl=https://dblp.org/rec/conf/cla/ColombIRR11
}}
==A Closure Algorithm Using a Recursive Decomposition of the Set of Moore Co-families==
A closure algorithm using a recursive decomposition of the set of Moore co-families Pierre Colomb1 , Alexis Irlande2 , Olivier Raynaud1 , Yoan Renaud3 1 Clermont Université, Université Blaise Pascal, Campus des Cézeaux 63173 Clermont-Ferrand, France pierre@colomb.me, raynaud@isima.fr 2 Universidad Nacional de Colombia Bogota, Colombia irlande@lirmm.fr 3 INSA de Lyon, Bâtiment Blaise Pascal Campus de la Doua, 69621 Villeurbanne, France yoan.renaud@insa-lyon.fr Abstract. Given a set Un = {1, ..., n}, a collection M of subsets of Un that is closed under intersection and contains Un is known as a Moore family. The set of Moore families for a fixed n is in bijection with the set of Moore co-families (union-closed families containing the empty set) denoted itself Mn . This paper follows the work initiated in [8] and [9] about the recursive decomposition of the lattice of Moore co-families. We first show that each Moore co-family can be represented by a decom- position tree and we use this principle to design an original algorithm to close under union any given family. Then we discuss about the time complexity and give some experimental results. 1 Introduction The concept of collection of sets on a ground set Un closed under intersection appears with different names depending of the scientific fields. The name ‘Moore families’ was first used by Birkhoff in [4] referring to E.H. Moore’s researches. But, very frequently, such a collection is called ‘closure spaces’ (or ‘closure sys- tems’) or ‘convexity spaces’. This concept is applied in numerous fields in pure or applied mathematics and computer science. For mathematical researches in algebra and topology we can cite [7, 19, 20]. For researches in order and lattice theory we have to cite [4, 10] for their works on closure operators. Formally a closure operator is an extensive, isotone and idempotent function on 2Un , and a closure system is then the sets of its fixed points. In particular it is shown that any closure system is a complete lattice. From 1937 Birkhoff in [3] gave a com- pact representation of ‘quasi ordinal spaces’ (in other words of collections closed by intersection and union also called distributive lattices). More recently the col- lection appears again as the main concept in computer science with researches in relational databases ([11]), in data analysis and formal concept analysis ([13, 1, 15]). More precisely, Ganter and Wille define a mathematical framework for clas- sification and Barbut and Monjardet used the Galois lattice for equivalent tasks. In the same time, in 1985 ([12]) such collections are called ‘knowledge spaces’ by Doignon et Falmagne. An important fact is that the collection of Moore families on Un is itself a Moore family or a closure system. Indeed, the system composed of Moore families contains one maximum element (2Un , all subsets of Un ) and the intersection of two Moore families is a Moore family itself. To get an over- all view of the properties of this closure system, see the survey of Caspard and Monjardet [6]. Previously in the introduction, we have defined a Moore family on Un as a collection of sets containing Un and closed by intersection. But for reasons of legi- bility of the results and simplification of expressions, this article deals with the set of families closed by union and containing the empty set called Moore co-families and denoted by Mn . Basically, the set of Moore families is in bijection with this set. For a given Moore family, one only has to complement every set to obtain a Moore co-family. For example, the Moore family {{1}, {1, 2}, {1, 3}, {1, 2, 3}} on U3 corresponds to the Moore co-family {∅, {2}, {3}, {2, 3}} and vice versa. In [8], Colomb & al counted the exact number of Moore co-families for n = 7 and stated a recursive decomposition theorem of the lattice of Moore co-families in [9]. In the same article authors state that the set Mn is endowed with the quotient partition associated with the operator h (each class of the partition contains all the families which have the same image by h) and prove that each class has a distributive lattice structure. This operator h is the main concept underlying the recursive description of Mn . More recently in [2] authors give a complete characterization of the set h(M) \ M. In the present article we pursue investigations involving Moore co-families by using the recursive decomposition theorem. In the third section we describe suc- cinctly the theorem and we show that each Moore co-family can be represented by a decomposition tree. In the fourth section we describe an original algorithm to generate a Moore co-family from the set of its join-irreducible elements. Then we discuss about the time complexity of the process. Some experimental results are given in section five. In the rest of the paper, we denote elements by numbers (1, 2, 3, . . . ). Sets are denoted by capital letters (A, B, C, . . . ). Families of sets are denoted by calligraphic letters (A, B, C, . . . ). Finally, we denote the sets of families of sets by black board letters (A, B, C, . . . ). 2 Definitions and notations For any integer n ≥ 1, let Un denote the set {1, . . . , n}. Let M be a family, we note (M, ⊆) the corresponding partial order. Two sets M, M 0 in M such that neither M ⊆ M 0 and M 0 ⊆ M are said incomparable in (M, ⊆). A family where every pair of sets is incomparable is called an antichain. We say that M is covered by M 0 in (M, ⊆), denoted M ≺ M 0 , if M ⊂ M 0 and there is not M ” ∈ M such that M ⊂ M ” and M ” ⊂ M 0 . Given a family M, a subfamily I of M is an ideal of M if it satisfies the following implication for any pair M and M 0 in M, M ⊆ M 0 and M 0 ∈ I ⇒ M ∈ I. In other words, an ideal of M is some antichain of M and everything below it. We shall use IM to denote the sets of ideals on M. Given a set X in a family M, there exists a unique ideal I of M with X as a maximum set. Let IM (X) denote this ideal. A set J ∈ M is called a join-irreducible if J covers only one set. The set of all join-irreducible sets of M is denoted JM . Let M be a Moore co-family and x an element, we denote Mx (resp. Mx̄ ) the restriction of M to its sets containing x (resp. to its sets not containing x). The empty set is added to Mx . By extension, we denote by Jx (resp. Jx̄ ) the set of join-irreducible elements of M which contain the element x (resp. which do not contain the element x). 3 Recursive decomposition of the Moore co-families lattice In previous work, Colomb & al. gave a recursive definition of the Moore co- families on Un , for any n ≥ 1 (the only Moore co-family on U0 = ∅, is {∅}). Definition 1. For any integer n such that n ≥ 1, we define g1,n : Mn−1 −→ Mn M 7−→ {X ∈ 2Un | ∃M ∈ M \ {∅}such that X = M ∪ {n}} ∪ {∅}, g2,n : Mn−1 −→ Mn M 7−→ {X ∈ 2Un | ∃M ∈ M such that X = M ∪ {n}} ∪ {∅}, hn : Mn−1 −→ Mn−1 M 7−→ {X ∈ 2Un−1 | ∀M ∈ g1,n (M) \ {∅}, X ∪ M ∈ g1,n (M)}. We may use h, g1 and g2 instead of hn , g1,n and g2,n when no confusion is possible. Function g2 consists in adding element n to every set of a family M in Mn−1 (thus including the singleton {n}) plus the empty set. Function g1 behaves similarly but removes the singleton {n}. With these functions defined, we can state Theorem 1, giving a description of Mn with respect to Mn−1 . Theorem 1 (Colomb & al. [9]). For any integer n such that n ≥ 1, Mn = M∈Mn−1 {g1 (M) ∪ M0 | M0 ∈ IMn−1 (h(M))} ∪ S 00 00 S M∈Mn−1 {g2 (M) ∪ M | M ∈ IMn−1 (M)} In other words there exists a natural bi-partition of Mn . Families not con- taining the singleton {n} under the form g1 (M) ∪ M0 with M0 a sub-Moore co-family of h(M) and families containing {n} under the form g2 (M) ∪ M00 with M00 a sub-Moore co-family of M. 3.1 Decomposition tree of a Moore co-family Another interpretation is that for any element x in Un and any Moore co-family M in Mn , M can be written Mx̄ ∪ gi,x (M0 ) with M0 such that gi,x (M0 ) cor- responds to Mx . By applying recursively this decomposition principle to M we obtain a decomposition binary tree of M (each leaf is an empty set which cannot be further decomposed). Basically each leaf corresponds to a set of the initial family M: to obtain the set corresponding to a leaf, one has only to apply it- eratively the different functions g1,x or g2,x that can be found in the path from the chosen leaf to the root of the tree. An example of decomposition of Moore co-family is given in Figure 1. 1234 123 134 234 12 13 23 24 1 3 ø 1 g2,1 234 234 23 34 23 24 2 3 3 ø ø 2 g1,2 2 g2,2 34 34 34 3 3 4 3 3 ø ø ø ø 3 g2,3 3 g2,3 3 g2,3 3 g2,3 4 4 4 4 ø ø ø ø ø 3 ø ø 1 ø 12 ø 4 g2,4 4 g2,4 4 g2,4 4 g2,4 ø ø ø ø ø ø ø ø 24 23 234 13 134 123 1234 Fig. 1. Decomposition of a Moore co-family 4 A new algorithm to compute a union-closed family In this section, we present an algorithm to generate a Moore co-family M from its set of join-irreducible elements denoted JM . This algorithm is based on the theorem 1. 4.1 Closure under union algorithm Problem of generation of closed-sets from a representative family has been well- studied these last years. Most of existing algorithms are based on a decompo- sition strategy [14, 5, 17, 16]. As previously stated, generation of closed-sets and co-closed sets can be treated in the same way (i.e. one only have to complement every sets of input and output families to obtain both closed and co-closed fam- ilies). The question we ask here is, given a family of sets J , how to compute its associated Moore co-family M, i.e. the smallest Moore co-family that con- tains all sets of J . We choose to denote J the given set because the smallest representative family for a Moore co-family is the family of its join-irreducible sets. We propose an algorithm based on the decomposition of a Moore co-family that only uses families of join-irreducible sets. In other words, for each step of the recursive decomposition, given a local set JM , we compute families of join-irreducible sets JMx̄ and JM0 corresponding to Mx̄ and M0 such that M = Mx̄ ∪ gi (M0 ). Leaves of the decomposition tree obtained after the whole recursive decomposition correspond to the closure of the initial set M. In the following, we describe how to compute JMx̄ and JM0 . Proposition 1. Let M, M0 in Mn , x in Un and i in [1, 2] with M = Mx̄ ∪ gi,x (M0 ). Then – JMx̄ = Jx̄ . – JM0 ⊆ {J\{x} | J ∈ Jx } ∪ {J1 ∪ J2 | J1 ∈ {J\{x} | J ∈ Jx }, J2 ∈ Jx̄ } Proof. – JMx̄ = Jx̄ Any join-irreducible element of M not containing x is also join-irreducible of Mx̄ . Indeed, every predecessors of each set in M not containing x, doesn’t contain x itself. Similarly, there is no new join-irreducible element in JMx̄ . – JM0 ⊆ {J\{x} | J ∈ Jx } ∪ {J1 ∪ J2 | J1 ∈ {J\{x} | J ∈ Jx }, J2 ∈ Jx̄ } Let us show that gi (JM0 ) ⊆ Jx ∪ {J1 ∪ J2 | J1 ∈ Jx , J2 ∈ Jx̄ }. By contradiction, let J be a set in gi,x (JM0 ) such that J 6∈ Jx ∪{J1 ∪J2 | J1 ∈ Jx , J2 ∈ Jx̄ }. So, we have x ∈ J, J 6∈ Jx and J 6∈ Jx̄ . Hence, J 6∈ JM . Let S = {s1 , s2 , ..., sn } (resp. S 0 = {s01 , s02 , ..., s0m }) be the family of maximal join-irreducible sets of Mx̄ (resp. of Mx ) such that ∀i ∈ [1, n], si ⊂ J (resp. ∀j ∈ [1, m], s0j ⊆ J). Since J 6∈ Jx ∪ {J1 ∪ J2 | J1 ∈ {J\{x} | J ∈ Jx }, J2 ∈ Jx̄ } we have 6 ∃i ∈ [1, n] and 6 ∃j ∈ [1, m] such that si ∪ s0j = J or s0j = J. So, ∃i, j ∈ [1, n] and ∃k, l ∈ [1, m] such that si ∪ s0k is incomparable with sj ∪ s0l and such that ∀u ∈ [1, n] and ∀v ∈ [1, m], either su ∪ s0v ⊆ si ∪ s0k , or su ∪ s0v ⊆ sj ∪ s0l . But, with J = si ∪s0k ∪sj ∪s0l , we can say that J covers si ∪s0k and sj ∪s0l . Hence, J is not a join-irreducible set of Mx , J \ {x} is not a join-irreducible set of M0 and we conclude that J does not belongs to gi,x (JM0 ). Contradiction. t u Straightforward from proposition 1 we can design an algorithm to close any given input family (see algorithm 1). For the first step of algorithm Co-closure, P is initialized to Un that represents all possible choices of an element for the next decomposition. S is initialized to the empty set and allows to store element that would form a co-closed set. V erif is initialized to true. An example of an execution is given in figure 2. 4.2 Discussion about the time complexity of algorithm Co-closure. Basically, the decomposition tree obtained at the end of the whole process of Algorithm 1 is a binary tree. Internal nodes have zero, one or two children. Only the last case induces the algorithm to compute a direct product which create a new path in the tree corresponding to a new closed set. For this reason we can say that Algorithm 1 computes as many direct products as the number of different closed sets of the final result. By this way, the most important part of the whole time complexity, which corresponds to the computation of direct products, can be dispatched on each internal node with two children. Locally, the algorithm computes one direct product between two families whose the size is variable. Let S be an hypothetic maximal size of a family involved in a product. Then, treatment of an internal node is done with runtime of O(n.S 2 ). But one have to know that the size of the family resulting of the product is never superior to the number of leaves of the sub tree rooted to this internal node. That means that the local complexity is polynomially linked with the result. We give in the last section the experimental results. They are far to be favorable to our approach and one explication could be the following: we said that the size of the direct product computed for each internal node is linked to the number of families which have to be generated further, but the process to compute this product is not linear with the size of the local result of this product. This point give us a huge space to improve our global process. That means that Algorithm 1 makes the link between computing a closure and computing a direct product under constraints to be defined. Algorithm 1: Co-closure(J , P , S, V erif ) Input: J a family of sets; P , S sets of elements; V erif a Boolean. begin if J =6 {∅} and J =6 {} then Choose an element Sx ∈ P ; JMx̄ ← Jx̄ ; Co-closure(JMx̄ , JMx̄ , S, V erif ); T ← {J\{x} | J ∈ Jx }; JM0 ← T ∪ {T ∪ J | T ∈ T , J ∈ JMx̄ }; if {x} 6∈ J then V erif ← f alse; else V erif ← true; S Co-closure(JM0 , JM0 , S ∪ {x}, V erif ); else if V erif = true and J = 6 {} then return(S); S ← {}; end 134 12 23 24 1 3 1 g2,1 34 23 24 2 3 3 g1,2 2 g2,2 2 34 34 3 3 4 3 3 3 g2,3 3 g2,3 3 g2,3 3 g2,3 4 4 4 4 ø 3 1 12 4 g2,4 4 g2,4 4 g2,4 g2,4 4 24 23 234 13 134 123 1234 Fig. 2. Decomposition tree corresponding to the execution of the algorithm on the input family {{1}, {1, 2}, {3}, {2, 3}, {1, 3, 4}, {2, 4}} 5 Experimental results 5.1 Experimental design In order to assess performances, the approach described previously has been implemented in C. The executable file was generated with compiler GCC 4.6. The experiments were performed on a Intel Core2 Q9300 processor with 2.2 GHz of speed and 8 Go of memory. Several well-known benchmark real-world data-sets were chosen from [21]. The following table summarizes the data-sets caracteristics. Data-sets # Attributes # Sets # Attributes per set # Closed set Mushroom 119 8124 23 238709 Chess 75 3196 37 930851336 Connect 129 67557 43 1415737994 To compare performances of Co-closure and previous approaches with respect to the size of family, we implemented a version of Norris’s algorithm (see [18]) as well as a version of Ganter’s algorithm called Next-closure (see [14]). Note that we have used the natural one to one mapping between Moore and co-Moore families to compute closed sets using Co-Closure algorithm. 5.2 Results We have compared execution time of these different algorithms on parts of bench- mark data-sets. In that way, we have executed algorithms on the first m sets of each benchmark data-sets by varying m. On figures 3, 4, and 5 we give numbers of co-closed sets we obtained by execution of the three algorithms for each size of the considered data-sets partition. In the last three rows we give the execution time for each algorithm. #sets # closed Ganter Norris Co-Closure 100 10867 20ms 10ms 160ms 200 73033 80ms 80ms 1s56 500 1051399 1s54 2s98 7s3 1000 38558373 2m04 2m48 31m52 1500 183655113 13m24 1h04 ? 2000 292107880 26m36 ? ? 2500 386235477 46m25 ? ? 3196 930851336 2h33 ? ? Fig. 3. Chess #sets # closed Ganter Norris Co-Closure 10 75 20ms 20ms 20ms 20 270 20ms 20ms 30ms 50 1208 30ms 20ms 40ms 100 3459 30ms 30ms 110ms 200 7086 30ms 3ms 380ms 500 17781 60ms 5ms 2s 1000 32513 210ms 100ms 6s 1500 48414 460ms 170ms 15s 2000 58982 750ms 250ms 30s 2500 72008 1s23 360ms 45s 3000 80901 1s75 480ms 79s 3500 94350 2s32 650ms 112s 4000 104104 2s99 810ms 137s 4500 122950 3s 1s21 194s 5000 136401 4s 1s47 237s 5500 150948 5s 1s78 304s 6000 156573 6s 1s92 348s 6500 195677 7s 2s80 517s 7000 214950 8s 3s30 593s 7500 230882 10s 3s73 718s 8000 237874 11s 3s94 815s 8124 238709 12s 3s96 818s Fig. 4. Mushroom #sets # closed Ganter Norris Co-Closure 100 13406 410ms 430ms 550ms 200 63360 470ms 440ms 2s92 500 445676 1s 1s48 2m09 1000 1069569 2s 8s 14m26 2000 4732622 22s 56s ? 5000 22543073 3m51 16m50 ? 10000 69916189 23m18 3h58 ? 20000 242644240 2h50 ? ? 67557 1415737994 4j ? ? Fig. 5. Connect 6 Conclusion In [8] authors have computed the exact size of |M7 |. From this first challenge has derived a recursive decomposition theorem which has been formalized in [9]. In the present article we first show that each Moore co-family can be represented by a decomposition tree and we use this principle to design an original algorithm to close under union any given family. The process has been implemented and the results on well-known benchmarks are given in the last section. Experimentations state the correctness of the process but show that time complexity and space complexity are not favorable. However, we think the process is interesting for two main reasons. Firstly, the decomposition tree obtained by the process is based on the recursive definition of the set of Moore co-families and corresponds to a new approach. And secondly the computing of each closure is shown to be linked to the computing of a direct product under constraints to be defined. It will be interesting to use some better practices concerning the direct product to improve the process. In that case the time complexity for each generated set will correspond to the time complexity needed for the algorithmic treatment of an internal node with two children. So, any new performance to compute the local direct product should lead to a more efficient algorithm. References 1. Barbut, M., Monjardet, B.: Ordre et classification. Hachette (1970) 2. Beaudou, L., Colomb, P., Raynaud, O. In: submitted to ISAAC. (2011) 3. Birkhoff, G.: Rings of sets. Duke Mathematical Journal 3 (1937) 443–454 4. Birkhoff, G.: Lattice Theory. Third edn. American Mathematical Society (1967) 5. Bordat, J.P.: Calcul pratique du treillis de galois d’une correspondance. Math. Sci. Hum. 96 (1986) 31–47 6. Caspard, N., Monjardet, B.: The lattices of closure systems, closure operators, and implicational systems on a finite set : a survey. Discrete Applied Mathematics 127 (2003) 241–269 7. Cohn, P.: Universal Algebra. Harper and Row, New York (1965) 8. Colomb, P., Irlande, A., Raynaud, O.: Counting of Moore families on n=7. In: ICFCA, LNAI 5986, Agadir, Marocco. (2010) 9. Colomb, P., Irlande, A., Raynaud, O., Renaud, Y.: About the recursive décompo- sition of the lattice of moore co-families. In: ICFCA, Nicosia, Cyprius. (2011) 10. Davey, B.A., Priestley, H.A.: Introduction to lattices and orders. Second edn. Cambridge University Press (2002) 11. Demetrovics, J., Libkin, L., Muchnik, I.: Functional dependencies in relational databases: A lattice point of view. Discrete Applied Mathematics 40(2) (1992) 155–185 12. Doignon, J.P., Falmagne, J.C.: Knowledge Spaces. Springer, Berlin (1999) 13. Duquenne, V.: Latticial structure in data analysis. Theoretical Computer Science 217 (1999) 407–436 14. Ganter, B.: Two basic algorithms in concept analysis., Preprint 831, Technische Hochschule Darmstadt (1984) 15. Ganter, B., Wille, R.: Formal concept analysis, mathematical foundation, Berlin- Heidelberg-NewYork et al.:Springer (1999) 16. Gely, A.: A generic algorithm for generating closed sets of a binary relation. In: ICFCA’05. (2005) 17. Lindig, C.: Fast concept analysis. In: Working with Conceptual Structures - Contributions to ICCS 2000, Shaker Verlag (August 2000) 152–161 18. Norris, E.M.: An algorithm for computing the maximal rectangles in a binary relation. Revue Roumaine de Mathématiques Pures et Appliquées 23(2) (1978) 243–250 19. G.Sierksma: Convexity on union of sets. Compositio Mathematica volume 42 (1981) 391–400 20. Ven, L.V.D.: Theory of convex structures. North-Hollande, Amsterdam (1993) 21. : Uci machine learning repository.