A Markov Chain Approach to Random
             Generation of Formal Concepts

                            Dmitry V. Vinogradov1,2
      1
       All-Russia Institute for Scientific and Technical Information (VINITI),
       Intelligent Information Systems Laboratory, Moscow 125190, Russia
                                   vin@viniti.ru
    2
      Russian State University for Humanities, Intelligent Robotics Laboratory,
                              Moscow 125993, Russia
         http://isdwiki.rsuh.ru/index.php/User:Vinogradov.Dmitry


      Abstract. A Monte Carlo approach using Markov Chains for random
      generation of concepts of a finite context is proposed. An algorithm sim-
      ilar to CbO is used. We discuss three Markov chains: non-monotonic,
      monotonic, and coupling ones. The coupling algorithm terminates with
      probability 1. These algorithms can be used for solving various informa-
      tion retrieval tasks in large datasets.

      Keywords: formal context, formal concept, Markov chain, termination
      with probability 1


1   Introduction
In many natural problems of information retrieval the piece of information which
we are looking for is not contained in few documents. The query generates a
huge amount of relevant documents and the task is to generates a cluster of
related documents together with the set of common terms describing its common
meaning. There are many choices for such cluster. So, the user has to look for
plausible answers to his query.
     JSM-method is a logical device to provide plausible reasoning for generation,
verification and falsification of such clusters and for explanation of whole collec-
tion of all relevant documents by means of accepted clusters. The first variant
of JSM method was presented by Prof. V.K. Finn in 1983 [1] (in Russian). FCA
corresponds to the generation (“induction”) step of JSM method. See [5] for
details. This correspondence allows to use FCA algorithms in JSM-method and
vice versa. For example, the well-known algorithm “Close-by-One” (CbO) was
initially introduced by S.O. Kuznetsov in [4] for JSM-method and later trans-
lated into FCA framework. The state of art for JSM-method is represented in
[2].
     In our opinion, the main drawback of ’old-fashioned’ JSM-method is the
computational complexity of JSM algorithms, especially for the induction step.
Paper [7] presents a results of comparison between various (partially improved
by the survey’s authors) variants of famous deterministic algorithms of FCA.
128      Dmitry V. Vinogradov

Paper [6] provides theoretical bounds on computational complexities of various
JSM tasks.
   The development of JSM-method has resulted in intelligent systems of JSM
type that were applied in various domains such as sociology, pharmacology,
medicine, information retrieval, etc. In practice there were situations when a
JSM system generates more than 10,000 formal concepts (JSM similarities) from
a context with about 100 objects. In our opinion the importance of all gener-
ated concepts is doubtful, because when experts manually select important JSM
causes they reject majority of generated JSM similarities.
   In this paper we propose Monte Carlo algorithms using Markov Chain ap-
proach for random generation of concepts of a finite context. In other words, we
replace the lattice of all concepts by small number of its random elements.


2     Background

2.1    Basic definitions and facts of FCA

Here we recall some basic definitions and facts of Formal Concept Analysis
(FCA) [3].
   A (finite) context is a triple (G, M, I) where G and M are finite sets and
I ⊆ G × M . The elements of G and M are called objects and attributes,
respectively. As usual, we write gIm instead of hg, mi ∈ I to denote that object
g has attribute m.
   For A ⊆ G and B ⊆ M , define

                                A0 = {m ∈ M |∀g ∈ A(gIm)},                                        (1)
                                  0
                               B = {g ∈ G|∀m ∈ B(gIm)};                                           (2)

so A0 is the set of attributes common to all the objects in A and B 0 is the set of
objects possesing all the attributes in B. The maps (·)0 : A 7→ A0 and (·)0 : B 7→
B 0 are called derivation operators (polars) of the context (G, M, I).
    A concept of the context (G, M, I) is defined to be a pair (A, B), where
A ⊆ G, B ⊆ M , A0 = B, and B 0 = A. The first component A of the concept
(A, B) is called the extent of the concept, and the second component B is
called its intent. The set of all concepts of the context (G, M, I) is denoted by
B(G, M, I).

Example 1 (Boolean cube with n atoms). Consider the context (G, M, I), where
G = {g1 , . . . , gn }, M = {m1 , . . . , mn }, and

                                        gj Imk ⇔ j 6= k.                                          (3)

Then ({gj1 , . . . , gjk })0 = M \{mj1 , . . . , mjk }, ({mj1 , . . . , mjk })0 = G\{gj1 , . . . , gjk },
A00 = A for all A ⊆ G, and B 00 = B for all B ⊆ M . Hence B(G, M, I) has element
({gj1 , . . . , gjk }, M \ {mj1 , . . . , mjk }) for every {gj1 , . . . , gjk } ⊆ G.
      A Markov Chain Approach to Random Generation of Formal Concepts             129

   Let (G, M, I) be a context. For concepts (A1 , B1 ) and (A2 , B2 ) in B(G, M, I)
we write (A1 , B1 ) ≤ (A2 , B2 ), if A1 ⊆ A2 . The relation ≤ is a partial order on
B(G, M, I).
   A subset A ⊆ G is the extent of some concept if and only if A00 = A in which
case the unique concept of which A is the extent is (A, A0 ). Similarly, a subset
B of M is the intent of some concept if and only if B 00 = B and then the unique
concept with intent B is (B 0 , B).
   It is easy to check that A1 ⊆ A2 implies A01 ⊇ A02 and for concepts (A1 , A01 )
and (A2 , A02 ) reverse implication is valid too, because A1 = A001 ⊆ A002 = A2 .
Hence, for (A1 , B1 ) and (A2 , B2 ) in B(G, M, I)

                  (A1 , B1 ) ≤ (A2 , B2 ) ⇔ A1 ⊆ A2 ⇔ B2 ⊆ B1 .                    (4)

   Fix a context (G, M, I). In the following, let J be an index set. We assume
that Aj ⊆ G and Bj ⊆ M , for all j ∈ J.

Lemma 1. [3] Assume that (G, M, I) is a context and let A ⊆ G, B ⊆ M and
Aj ⊆ G and Bj ⊆ M , for all j ∈ J. Then

                                 A ⊆ A00    and B ⊆ B 00 ,                         (5)
               A1 ⊆ A2 ⇒ A01 ⊇ A02          and B1 ⊆ B2 ⇒ B10 ⊇ B20 ,              (6)
                          0     000                        0          000
                              A =A          and B = B ,                            (7)
                     [          \                 [         \
                            0
                 (       Aj ) =   A0j       and (   Bj )0 =   Bj0 ,                (8)
                  j∈J              j∈J                    j∈J               j∈J

                                   A ⊆ B 0 ⇔ A0 ⊇ B.                               (9)

Proposition 1. [3] Let (G, M, I) be a context. Then (B(G, M, I), ≤) is a lattice
with join and meet given by
                       _                 [          \
                         (Aj , Bj ) = ((   Aj )00 ,   Bj ),                (10)
                          j∈J                    j∈J            j∈J
                          ^                      \              [
                                (Aj , Bj ) = (         Aj , (        Bj )00 );    (11)
                          j∈J                    j∈J           j∈J


Corollary 1. For context (G, M, I) the lattice (B(G, M, I), ≤) has (M 0 , M ) as
the bottom element and (G, G0 ) as the top element. In other words, for all
(A, B) ∈ B(G, M, I) the following inequalities hold:

                            (M 0 , M ) ≤ (A, B) ≤ (G, G0 ).                       (12)


2.2   The “Close-by-One” operations: definition and properties

With the help of expressions for the infinum and the supremum operations in
B(G, M, I) given by Proposition 1 we can introduce local steps of our Markov
chains:
130     Dmitry V. Vinogradov

Definition 1. For (A, B) ∈ B(G, M, I), g ∈ G, and m ∈ M define

                    CbO((A, B), g) = ((A ∪ {g})00 , B ∩ {g}0 ),                (13)
                                                 0            00
                   CbO((A, B), m) = (A ∩ {m} , (B ∪ {m}) ).                    (14)

so CbO((A, B), g) is equal to (A, B) ∨ ({g}00 , {g}0 ) and CbO((A, B), m) is equal
to (A, B) ∧ ({m}0 , {m}00 ).
   We call these operations CbO because the first one is used in Close-by-One
(CbO) Algorithm to generate all the elements of B(G, M, I), see [4] for details.
Lemma 2. Assume that (G, M, I) is a context and let (A, B) ∈ B(G, M, I),
g ∈ G, and m ∈ M . Then

                       g ∈ A ⇒ CbO((A, B), g) = (A, B),                        (15)
                      m ∈ B ⇒ CbO((A, B), m) = (A, B),                         (16)
                       g∈
                        / A ⇒ (A, B) < CbO((A, B), g),                         (17)
                      m∈
                       / B ⇒ CbO((A, B), m) < (A, B).                          (18)

            / A then A ⊂ A ∪ {g} ⊆ (A ∪ {g})00 by (5). By definition of the order
Proof. If g ∈
between concepts this inclusion and (13) imply (17). Relation (18) is proved in
the same way, the rest is obvious.
Lemma 3. Assume that (G, M, I) is a context and let (A1 , B1 ), (A2 , B2 ) ∈
B(G, M, I), g ∈ G, and m ∈ M . Then

         (A1 , B1 ) ≤ (A2 , B2 ) ⇒ CbO((A1 , B1 ), g) ≤ CbO((A2 , B2 ), g),    (19)
        (A1 , B1 ) ≤ (A2 , B2 ) ⇒ CbO((A1 , B1 ), m) ≤ CbO((A2 , B2 ), m).     (20)

Proof. If A1 ⊆ A2 then A1 ∪ {g} ⊆ A2 ∪ {g}. Hence (6) implies (A2 ∪ {g})0 ⊆
(A1 ∪ {g})0 . Second part of (6) implies (A1 ∪ {g})00 ⊆ (A2 ∪ {g})00 . By definition
of the order between concepts this is (19). Relation (20) is proved in the same
way by using (4).


3     Markov Chain Algorithms
Now we represent Markov chain algorithms for random generation of formal
concepts.
   Data: context (G, M, I), external function CbO( , )
   Result: random concept (A, B) ∈ B(G, M, I)
   t := 0; (A, B) := (M 0 , M );
   while (t < T ) do
       select random element x ∈ (G \ A) t (M \ B);
       (A, B) := CbO((A, B), x);
       t := t + 1;
   end
                 Algorithm 1: Non-monotonic Markov chain
     A Markov Chain Approach to Random Generation of Formal Concepts         131

Example 2 (Random walk on Boolean cube). Consider the context (G, M, I) of
Example 1. Then Non-monotonic Markov chain corresponds to Random Walk
on Boolean Cube of all the concepts in B(G, M, I).

Definition 2. An (order) ideal of partially odered set (poset) (S, ≤) is a subset
J of S such that
                      ∀s ∈ S ∀r ∈ J[s ≤ r ⇒ s ∈ J].                          (21)
A Markov chain St with values into poset (S, ≤) is called monotonic if for
every pair of start states a ≤ b (a, b ∈ S) and every order ideal J ⊆ S

                     P[S1 ∈ J|S0 = a] ≥ P[S1 ∈ J|S0 = b].                   (22)

Proposition 2. There exists the context (G, M, I) such that Non-monotonic
Markov chain for (B(G, M, I), ≤) isn’t monotonic one.

  See [8] for the proof of Proposition 2. The following Markov chain is always
monotonic one.
  Data: context (G, M, I), external function CbO( , )
  Result: random concept (AT , BT ) ∈ B(G, M, I)
  t := 0; X := G t M ; (A, B) := (M 0 , M );
  while (t < T ) do
      select random element x ∈ X;
      (A, B) := CbO((A, B), x);
      t := t + 1;
  end
                  Algorithm 2: Monotonic Markov chain

Proposition 3. For every context (G, M, I) the Monotonic Markov chain for
(B(G, M, I), ≤) is monotonic one.

   The proof of Proposition 3 can be found in [8].The Monotonic Markov chain
algorithm has another advantage: random selection of elements of the static set
X := G t M . However both previous algorithms have common drawback: the
unknown value T for the termination time of the calculation. In the Monte Carlo
Markov Chain (MCMC) theory this value corresponds to the mixing time of the
chain. For some special Markov chains (for instance, for the chain of Example 2)
the mixing time is estimated by sophisticated methods. In general case, it is an
open problem. The following algorithm has not this problem at all.
   Data: context (G, M, I), external function CbO( , )
   Result: random concept (A, B) ∈ B(G, M, I)
   X := G t M ; (A, B) := (M 0 , M ); (C, D) = (G, G0 );
   while ((A 6= C) ∨ (B 6= D)) do
       select random element x ∈ X;
       (A, B) := CbO((A, B), x); (C, D) := CbO((C, D), x);
   end
                   Algorithm 3: Coupling Markov chain
132    Dmitry V. Vinogradov

   Intermediate value of quadruple (A, B) ≤ (C, D) on step t corresponds to
Markov chain state Yt . The At , Bt , Ct and Dt are first, second, third, and
fourth components, respectively.
Definition 3. A coupling length for context (G, M, I) is defined by

                              L = min(| G |, | M |).                         (23)

A choice probability of fixed object or attribute in context (G, M, I) is equal to
                                       1
                               p=            .                               (24)
                                    |G|+|M |
Lemma 4. If | G |<| M | then for every integer r and every pair of start states
(A, B) ≤ (C, D) ((A, B), (C, D) ∈ B(G, M, I))

 P[Ar = Ar+L = Cr+L &Br = Br+L = Dr+L |Yr = (A, B) ≤ (C, D)] ≥ pL . (25)

If | G |≥| M | then for every integer r and every pair of start states (A, B) ≤
(C, D) ((A, B), (C, D) ∈ B(G, M, I))

 P[Ar+L = Cr+L = Cr &Br+L = Dr+L = Dr |Yr = (A, B) ≤ (C, D)] ≥ pL . (26)

Proof. For coupling to (A, B) ≤ (A, B) it suffices to get an element from A \ C,
but | A \ C |≤| G |= L. Relation (26) is proved in a similar way.              t
                                                                               u
Theorem 1. The coupling Markov chain has the probability of coupling (termi-
nation) before n steps with limit 1 when n → ∞.
Proof. Lemma 3 implies that (A(k−1)·L , B(k−1)·L ) ≤ (C(k−1)·L , D(k−1)·L ). Let
r = (k − 1) · L. Then Lemma 4 implies that P[Ak·L 6= Ck·L ∨ Bk·L 6= Dk·L |Yr =
(Ar , Br ) ≤ (Cr , Dr )] ≤ (1 − pL ). After k independent repetitions we have
P[Ak·L 6= Ck·L ∨ Bk·L 6= Dk·L |Y0 = (M 0 , M ) ≤ (G, G0 )] ≤ (1 − pL )k . But
when k → ∞ we have (1 − pL )k → 0.                                            t
                                                                              u


Conclusions
In this paper we have described a Monte Carlo approach using Markov Chains
for random generation of concepts of a finite context. The basic steps of proposed
Markov chains are similar to ones of algorithm CbO. We discuss three Markov
chains: non-monotonic, monotonic, and coupling ones. The coupling algorithm
terminates with probability 1.


Acknowledgements.
The author would like to thank Prof. Victor K. Finn and Tatyana A. Volkova
for helpful discussions. The research was supported by Russian Foundation for
Basic Research (project 11-07-00618a) and Presidium of the Russian Academy
of Science (Fundamental Research Program 2012-2013).
      A Markov Chain Approach to Random Generation of Formal Concepts             133

References
1. V.K. Finn, About Machine-Oriented Formalization of Plausible Reasonings in F.
   Beckon-J.S. Mill Style. Semiotika I Informatika, 20, 35–101 (1983)
2. V.K. Finn, The Synthesis of Cognitive Procedures and the Problem of Induction.
   Autom. Doc. Math. Linguist., 43, 149–195 (2009)
3. B. Ganter, R. Wille, Formal Concept Analysis: Mathematical Foundations, Springer-
   Verlag, 1999
4. S.O. Kuznetsov, A Fast Algorithm for Computing All Intersections of Objects in
   a Finite Semi-Lattice, Automatic Documentation and Mathematical Linguistics,
   vol.27, no.5, 11-21, 1993.
5. S.O. Kuznetsov, Mathematical aspects of concept analysis. Journal of Mathematical
   Science, Vol. 80, Issue 2, pp. 1654-1698, 1996.
6. S.O. Kuznetsov, Complexity of Learning in Concept Lattices from Positive and
   Negative Examples. Discrete Applied Mathematics, 2004, no. 142(1-3), pp. 111-125.
7. S.O. Kuznetsov and S.A. Obiedkov, Comparing Performance of Algorithms for Gen-
   erating Concept Lattices. Journal of Experimental and Theoretical Artificial Intel-
   ligence, Vol. 14, no. 2-3, pp. 189-216, 2002.
8. D.V. Vinogradov, Random generation of hypotheses in the JSM method using sim-
   ple Markov chains. Autom. Doc. Math. Linguist., 46, 221–228 (2012)