Random extents and random closure systems Bernhard Ganter Institut für Algebra Technische Universität Dresden Abstract. We discuss how to randomly generate extents of a given formal context. Our basic method involves counting the generating sets of an extent, and we show how this can be done using the Möbius function. We then show how to generate closure systems on seven elements uniformly at random. 1 Introduction Let Random(0,1] denote an operator that generates a random number between 0 and 1 with equal probability. From such a (memoryless) random number gen- erator an operator Random_subset(S ) can be derived that produces, upon each invocation, a random subset of a given nite set S , such that all subsets are equally likely (see, e.g., [6]). Building on this we derive in this article an operator that randomly selects a 1 closed set from a given closure system on a nite set. Note that this is a trivial task for moderately sized systems of which you can label the closed sets by numbers 1, . . . , n. For such you could simply randomly pick a number between 1 and n and select the closed set labeled by this number. Since the size of a closure system is at most exponential in the size of its carrier, this trivial algorithm clearly requires polynomial time. However, a potentially exponential list of closed sets must be pre-computed and stored. For example we aim at generating closure systems at random2 . But there are many closure systems, even for small carrier sets. On seven elements the number was recently computed by Colomb, Irlande, and Raynaud [3] to be 14 087 648 235 707 352 472. Maintaining a list of this size is not an inviting idea, and thus the trivial approach is not very realistic. Our motivation comes from recent experimental computer investigations by D. Borchmann that yielded surprising results. Borchmann raised the question if these were artefacts caused by the non-uniform choice of the random input data. Have a look at Figure 1. It shows ve diagrams, each with 27 rows and 13 columns, corresponding to the possible number of meet reducible and irreducible closed sets in a closure system on a ve element set (the trivial system with zero irreducibles is omitted). A system with r reducibles and i irreducibles corresponds 1 That is, from an intersection-closed familiy of sets. Such families are also called Moore families. 2 The family of all closure systems on a xed set is itself a closure system. a b c d e Fig. 1. Closure systems on ve elements by their number of meet-irreducible (horizontal) and reducible closed sets (vertical). Tile a shows the possible values, tile b the true relative frequencies, tile c and d come from random contexts and tile e from picking systems uniformly at random. to the cell in the r -th row from the bottom and the i-th column from the left. The rst diagram depicts which combinations of r and i are possible, while the other four display relative frequencies (the darker, the higher). The second diagram shows the true frequencies, counted over all 1 385 552 closure systems on ve elements. The other three show frequencies of randomly chosen closure systems (1 000 000 samples each). For the diagram in the middle, the systems were made by putting random crosses in a 5×13context. The fourth diagram was obtained by putting random crosses with random density in a formal context with a random number of columns. The fth diagram shows the distribution of a sample picked with uniform distribution. We use the language of Formal Concept Analysis [4] and, in particular, that every closure system is the set Ext(G, M, I) of all extents of some formal context (G, M, I). We construct an operator Random_extent(G, M, I ) which selects, upon each invocation, randomly an extent of (G, M, I), with equal probability for all extents. The closure operator for the extents will be denoted by X 7→ X 00 . 00 If X = Y , then Y is called agenerating set of the extent X (not necessarily a minimal one). The number of generating sets of an extent E shall be denoted by egen(E). We extend this denition to arbitrary subsets so that egen(Y ) := |{X ⊆ G | X 00 = Y 00 }| gives the number of generating sets of the extent generated by Y . Of course then, egen(Y ) = egen(Y 00 ). Therefore if E is an extent then obviously X 1 = 1. egen(Y ) {Y |Y 00 =E} Computing the function egen() is a nontrivial task. We shall discuss this below. Our method could theoretically be applied to many instances, such as generat- ing random partitions, random subgroups, etc. However, its runtime performance is very bad. For most such situations algorithms are known that are much more ecient than what we suggest. Indeed, we do not believe that our method will be very useful in practice. Our contribution is meant as a challenge to come up with a more ecient approach. We are grateful to the referees for several useful hints. We were unaware of the paper by Boley, Gärtner, and Grosskreutz [2], which addresses the same problem, but with a dierent and more general approach. It may well be that their algorithm yields better results even for generating random closure systems. We have also learnt that the problem of generating random extents is known to belong to a (dicult) complexity class: it is equivalent to the #RHΠ1 -hard problem of counting formal concepts (again, see [2] and the references given there). We already knew (because our colleagues of the stochastics group told us so and recommended the book by Asmussen and Glynn [1] as a standard reference) that our approach is an instance of the so-called acceptance-rejection method. 2 Random Extent Our innocent looking algorithm for generating a random extent of a given formal context (G, M, I) goes like this: Algorithm 1 Random_extent: Generating a random extent Input: A formal context (G, M, I). Output: A random extent of (G, M, I) repeat S := Random_subset(G) 1 until Random(0,1] ≤ egen (S) return S 00 . What the algorithm does essentially is to pick a random subset and output 3 its closure with probability one over the number of generating sets . It is quite elementary to prove that it does what it is supposed to do: Proposition 1 The algorithm Random_extent generates extents of (G, M, I) with equal probability. The proposition is an instance of the following lemma from elementary stochas- tics, for which we provide a proof. To obtain the proposition from the lemma, let 3 One of the referees pointed out that a much simpler algorithm with the same number of expected iterations is obtained by replacing the until statement by until S is closed. We see however no straightforward way to a recursive version of this algorithm. A be the set of all subsets of G, let B be the set of all extents, and let f be the map that associates a subset to the extent it generates. Lemma 1 Let f : A → B be a surjective (i.e., onto) map between nite sets A and B and let Random(A) be an operator that picks elements from A with equal probability. Then Algorithm 2 outputs elements of B with equal probability. Algorithm 2 Random image: Random image of a mapping Input: An onto map f : A → B and an operator Random(A) Output: A random element of B repeat a := Random(A) r := Random(0,1] b := f (a); until r ≤ |f −11(b)| return b. Proof It is obvious that the algorithm produces elements of B . In order that a given element b is produced in one iteration of the loop, the element a must belong −1 to f (b) and, independently, r ≤ |f −11(b)| . The probability that this happens is |f −1 (b)| 1 1 · −1 = , |A| |f (b)| |A| independently of b. The probability that some element is selected after one step |B| thus is . The probability that the element b is produced after k steps is |A|  k−1 |B| 1 1− · . |A| |A| The probability that b is produced is ∞  k−1 X |B| 1 |A| 1 1 1− · = · = , k=1 |A| |A| |B| |A| |B| as claimed.  The expected number of iterations until success is |A| #subsets = . |B| #extents The algorithm may therefore need quite some time. For example, would this algo- rithm be applied to the standard context of closure systems to generate a random closure system on a 6-element set, it requires, on average, 121 402 088 iterations 6 of the loop, since that context has 2 − 1 objects and 75 973 751 474 extents ([5]). For closure systems on a seven-element set the average number of loop iterations for obtaining a single random closure system would be 12 077 330 482 260 320 447. As already mentioned we shall develop a better method for this case below. Before we do so, we study the problem of computing the value of egen(A). 3 Counting generating sets and hitting sets The algorithm in the previous section uses the number egen(A) of a given extent A, and that by itself is not easy. Of course, each such generating set must be a subset of A. On the other hand, a subset S ⊆ A is a generating set of A i it is not contained in a lower neighbor of A. It is worthwhile to consider the formal context (A, N , ∈), where N is the family of lower neighbor extents of A. For this context, the elements of N are precisely the maximal extents below A, and thus the generating sets of A are the same as before. Counting generating sets thereby has been reduced to counting generating sets of the unit element in a co-atomistic lattice. Every subset of A is generating set of exactly one extent of (A, N , ∈). The |A| total number of generating sets thus is 2 . Indeed, for every extent B we obtain X egen(E) = 2|B| , E≤B where E runs over extents. By Möbius inversion we obtain X egen(A) = µ(E, A) · 2|E| , E≤A where µ is the Möbius function of the lattice B(A, N , ∈). The evaluation of this formula poses no algorithmic diculties. Using the standard Next_intent algorithm ([4]) to generate the extents in descending order, and using, for every constructed extent E , the same algorithm again for producing all extents F between E and A, suces to compute the Möbius function by the well known recursion X µ(E, A) = − µ(F, A). E 2n−1 ) do if F [i] = 2 then j := 2n−1 − 1 while success and (j > 0) do meet := i and j if (j 6= meet) and (F [meet] 6= 1) then success := (Random(0,1] < 0.5) F [meet] := 1 j := j − 1 i := i − 1 until success return F . We have implemented Algorithm 5 for n = 7 and present rst experimental results. Note that the number of random samples produced by this experiment is small compared to the number of closure systems: we have generated less than 0.000 000 000 000 4% of all closure systems on seven points. 85 85 80 80 75 75 70 70 65 65 60 60 55 55 50 50 45 45 40 40 35 35 30 30 25 25 20 20 15 15 10 10 5 5 5 10 15 20 25 30 35 5 10 15 20 25 30 35 Fig. 2. A random sample of 50 000 closure systems on a seven element set, plotted according to their number of irreducible closed sets (horizontal) and reducible closed sets (vertical). The left image shows which sizes occurred at least once. The right image expresses higher frequencies by darker shadings. The computation took one night on a 1.4 GHz PC. We did not even attempt to generate random closure systems on eight elements using Algorithm 5. We believe that a substantially better idea is needed for that case and beyond. References 1. S. Asmussen and P. W. Glynn. Stochastic Simulation. Springer-Verlag, New York, 2007. 2. Mario Boley, Henrik Grosskreutz, and Thomas Gärtner. Formal concept sampling for counting and thresholdfree local pattern mining. In Proc. of the SIAM Int. Conf. on Data Mining (SDM 2010). SIAM, 2010. 3. Pierre Colomb, Alexis Irlande, and Olivier Raynaud. Counting of Moore families for n = 7. In ICFCA'10, pages 7287, 2010. 4. Bernhard Ganter and Rudolf Wille. Formal Concept Analysis - mathematical foundations. Springer Verlag, 1999. 5. Michel Habib and Lhouari Nourine. The number of Moore families on n = 6. Discrete Mathematics, pages 291296, 2005. 6. Albert Nijenhuis and Herbert S. Wilf. Combinatorial algorithms. Academic Press, 1975.