=Paper= {{Paper |id=None |storemode=property |title=Random Extents and Random Closure Systems |pdfUrl=https://ceur-ws.org/Vol-959/paper21.pdf |volume=Vol-959 |dblpUrl=https://dblp.org/rec/conf/cla/Ganter11 }} ==Random Extents and Random Closure Systems== https://ceur-ws.org/Vol-959/paper21.pdf
                               Random extents
                         and random closure systems
                                           Bernhard Ganter

                                            Institut für Algebra
                                      Technische Universität Dresden




         Abstract. We discuss how to randomly generate extents of a given formal context. Our
         basic method involves counting the generating sets of an extent, and we show how this
         can be done using the Möbius function. We then show how to generate closure systems
         on seven elements uniformly at random.




1       Introduction
Let Random(0,1] denote an operator that generates a random number between
0 and 1 with equal probability. From such a (memoryless) random number gen-
erator an operator Random_subset(S ) can be derived that produces, upon
each invocation, a random subset of a given nite set S , such that all subsets are
equally likely (see, e.g., [6]).
      Building on this we derive in this article an operator that randomly selects a
                                                    1
closed set from a given closure system                  on a nite set.
      Note that this is a trivial task for moderately sized systems of which you can
label the closed sets by numbers 1, . . . , n. For such you could simply randomly
pick a number between 1 and n and select the closed set labeled by this number.
Since the size of a closure system is at most exponential in the size of its carrier,
this trivial algorithm clearly requires polynomial time. However, a potentially
exponential list of closed sets must be pre-computed and stored.
      For example we aim at generating                  closure systems at random2 . But there
are many closure systems, even for small carrier sets. On seven elements the
number was recently computed by Colomb, Irlande, and Raynaud [3] to be
14 087 648 235 707 352 472. Maintaining a list of this size is not an inviting
idea, and thus the trivial approach is not very realistic.
      Our motivation comes from recent experimental computer investigations by
D. Borchmann that yielded surprising results. Borchmann raised the question if
these were artefacts caused by the non-uniform choice of the random input data.
      Have a look at Figure 1. It shows ve diagrams, each with 27 rows and 13
columns, corresponding to the possible number of meet reducible and irreducible
closed sets in a closure system on a ve element set (the trivial system with zero
irreducibles is omitted). A system with r reducibles and i irreducibles corresponds

 1
     That is, from an intersection-closed familiy of sets. Such families are also called Moore families.
 2
     The family of all closure systems on a xed set is itself a closure system.
                 a                     b                      c                    d                     e




Fig. 1. Closure systems on ve elements by their number of meet-irreducible (horizontal) and reducible
closed sets (vertical). Tile a shows the possible values, tile b the true relative frequencies, tile c and d
come from random contexts and tile e from picking systems uniformly at random.




to the cell in the r -th row from the bottom and the i-th column from the left. The
rst diagram depicts which combinations of r and i are possible, while the other
four display relative frequencies (the darker, the higher). The second diagram
shows the true frequencies, counted over all 1 385 552 closure systems on ve
elements. The other three show frequencies of randomly chosen closure systems
(1 000 000 samples each). For the diagram in the middle, the systems were made
by putting random crosses in a 5×13context. The fourth diagram was obtained
by putting random crosses with random density in a formal context with a
random number of columns. The fth diagram shows the distribution of a sample
picked with uniform distribution.
    We use the language of Formal Concept Analysis [4] and, in particular, that
every closure system is the set Ext(G, M, I) of all extents of some formal context
(G, M, I). We construct an operator Random_extent(G, M, I ) which selects,
upon each invocation, randomly an extent of (G, M, I), with equal probability
for all extents.
    The closure operator for the extents will be denoted by


                                                X 7→ X 00 .
        00
If X = Y , then Y is called agenerating set of the extent X (not necessarily a
minimal one). The number of generating sets of an extent E shall be denoted by
egen(E). We extend this denition to arbitrary subsets so that

                               egen(Y ) := |{X ⊆ G | X 00 = Y 00 }|

gives the number of generating sets of the extent generated by Y . Of course then,
egen(Y ) = egen(Y 00 ). Therefore if E is an extent then obviously
                                           X             1
                                                               = 1.
                                                      egen(Y )
                                       {Y |Y 00 =E}
Computing the function egen() is a nontrivial task. We shall discuss this below.
     Our method could theoretically be applied to many instances, such as generat-
ing random partitions, random subgroups, etc. However, its runtime performance
is very bad. For most such situations algorithms are known that are much more
ecient than what we suggest. Indeed, we do not believe that our method will
be very useful in practice. Our contribution is meant as a challenge to come up
with a more ecient approach.
     We are grateful to the referees for several useful hints. We were unaware
of the paper by Boley, Gärtner, and Grosskreutz [2], which addresses the same
problem, but with a dierent and more general approach. It may well be that their
algorithm yields better results even for generating random closure systems. We
have also learnt that the problem of generating random extents is known to belong
to a (dicult) complexity class: it is equivalent to the #RHΠ1 -hard problem
of counting formal concepts (again, see [2] and the references given there). We
already knew (because our colleagues of the stochastics group told us so and
recommended the book by Asmussen and Glynn [1] as a standard reference) that
our approach is an instance of the so-called             acceptance-rejection method.

2      Random Extent
Our innocent looking algorithm for generating a random extent of a given formal
context (G, M, I) goes like this:



Algorithm 1 Random_extent: Generating a random extent
Input: A formal context (G, M, I).
Output: A random extent of (G, M, I)
  repeat
      S := Random_subset(G)
                           1
    until Random(0,1] ≤ egen (S)
    return S 00 .


     What the algorithm does essentially is to pick a random subset and output
                                                                                       3
its closure with probability one over the number of generating sets . It is quite
elementary to prove that it does what it is supposed to do:

Proposition 1 The algorithm Random_extent generates extents of (G, M, I)
with equal probability.
     The proposition is an instance of the following lemma from elementary stochas-
tics, for which we provide a proof. To obtain the proposition from the lemma, let

3
    One of the referees pointed out that a much simpler algorithm with the same number of expected
    iterations is obtained by replacing the until statement by until S is closed. We see however no
    straightforward way to a recursive version of this algorithm.
A be the set of all subsets of G, let B be the set of all extents, and let f be the
map that associates a subset to the extent it generates.

Lemma 1 Let f : A → B be a surjective (i.e., onto) map between nite sets A
and B and let Random(A) be an operator that picks elements from A with equal
probability. Then Algorithm 2 outputs elements of B with equal probability.

Algorithm 2 Random image: Random image of a mapping
Input: An onto map f : A → B and an operator Random(A)
Output: A random element of B
  repeat
      a := Random(A)
      r := Random(0,1]
        b := f (a);
 until r ≤ |f −11(b)|
 return b.



Proof    It is obvious that the algorithm produces elements of B . In order that a
given element b is produced in one iteration of the loop, the element a must belong
     −1
to f    (b) and, independently, r ≤ |f −11(b)| . The probability that this happens is

                                |f −1 (b)|     1       1
                                           · −1     =     ,
                                   |A|      |f (b)|   |A|
independently of b. The probability that some element is selected after one step
        |B|
thus is     . The probability that the element b is produced after k steps is
        |A|
                                         k−1
                                      |B|         1
                                   1−          ·     .
                                      |A|        |A|
The probability that b is produced is

                         ∞        k−1
                        X      |B|         1    |A| 1      1
                            1−          ·     =    ·    =     ,
                        k=1
                               |A|        |A|   |B| |A|   |B|

as claimed.                                                                        
The expected number of iterations until success is

                                    |A|   #subsets
                                        =          .
                                    |B|   #extents
The algorithm may therefore need quite some time. For example, would this algo-
rithm be applied to the standard context of closure systems to generate a random
closure system on a 6-element set, it requires, on average, 121 402 088 iterations
                                     6
of the loop, since that context has 2 − 1 objects and 75 973 751 474 extents ([5]).
For closure systems on a seven-element set the average number of loop iterations
for obtaining a single random closure system would be 12 077 330 482 260 320 447.
As already mentioned we shall develop a better method for this case below. Before
we do so, we study the problem of computing the value of egen(A).



3    Counting generating sets and hitting sets
The algorithm in the previous section uses the number egen(A) of a given extent
A, and that by itself is not easy. Of course, each such generating set must be a
subset of A. On the other hand, a subset S ⊆ A is a generating set of A i it is
not contained in a lower neighbor of A. It is worthwhile to consider the formal
context
                                     (A, N , ∈),
where  N is the family of lower neighbor extents of A. For this context, the
elements of N are precisely the maximal extents below A, and thus the generating
sets of A are the same as before. Counting generating sets thereby has been
reduced to counting generating sets of the unit element in a co-atomistic lattice.
   Every subset of A is generating set of exactly one extent of (A, N , ∈). The
                                         |A|
total number of generating sets thus is 2 . Indeed, for every extent B we obtain

                               X
                                     egen(E) = 2|B| ,
                               E≤B


where E runs over extents. By Möbius inversion we obtain

                                       X
                          egen(A) =          µ(E, A) · 2|E| ,
                                       E≤A


where µ is the Möbius function of the lattice B(A, N , ∈).
    The evaluation of this formula poses no algorithmic diculties. Using the
standard Next_intent algorithm ([4]) to generate the extents in descending
order, and using, for every constructed extent E , the same algorithm again for
producing all extents F between E and A, suces to compute the Möbius function
by the well known recursion

                                             X
                           µ(E, A) = −             µ(F, A).
                                         E 2n−1 ) do
           if F [i] = 2 then
                 j := 2n−1 − 1
                 while success and (j > 0) do
                      meet := i and j
                      if (j 6= meet) and (F [meet] 6= 1) then
                           success := (Random(0,1] < 0.5)
                           F [meet] := 1
                      j := j − 1
           i := i − 1
 until success
 return F .
    We have implemented Algorithm 5 for n = 7 and present rst experimental
results. Note that the number of random samples produced by this experiment
is small compared to the number of closure systems: we have generated less than
0.000 000 000 000 4% of all closure systems on seven points.


            85                                           85
            80                                           80
            75                                           75
            70                                           70
            65                                           65
            60                                           60
            55                                           55
            50                                           50
            45                                           45
            40                                           40
            35                                           35
            30                                           30
            25                                           25
            20                                           20
            15                                           15
            10                                           10
             5                                            5

                   5   10 15 20 25 30 35                        5   10 15 20 25 30 35

Fig. 2. A random sample of 50 000 closure systems on a seven element set, plotted according to their
number of irreducible closed sets (horizontal) and reducible closed sets (vertical). The left image shows
which sizes occurred at least once. The right image expresses higher frequencies by darker shadings.




    The computation took one night on a 1.4 GHz PC. We did not even attempt
to generate random closure systems on eight elements using Algorithm 5. We
believe that a substantially better idea is needed for that case and beyond.



References
1. S. Asmussen and P. W. Glynn. Stochastic Simulation. Springer-Verlag, New York, 2007.
2. Mario Boley, Henrik Grosskreutz, and Thomas Gärtner. Formal concept sampling for counting and
   thresholdfree local pattern mining. In Proc. of the SIAM Int. Conf. on Data Mining (SDM 2010).
   SIAM, 2010.
3. Pierre Colomb, Alexis Irlande, and Olivier Raynaud.        Counting of Moore families for n = 7.   In
   ICFCA'10, pages 7287, 2010.
4. Bernhard Ganter and Rudolf Wille. Formal Concept Analysis - mathematical foundations. Springer
   Verlag, 1999.
5. Michel Habib and Lhouari Nourine. The number of Moore families on n = 6. Discrete Mathematics,
   pages 291296, 2005.
6. Albert Nijenhuis and Herbert S. Wilf. Combinatorial algorithms. Academic Press, 1975.