A Markov Chain Approach to Random Generation of Formal Concepts Dmitry V. Vinogradov1,2 1 All-Russia Institute for Scientific and Technical Information (VINITI), Intelligent Information Systems Laboratory, Moscow 125190, Russia vin@viniti.ru 2 Russian State University for Humanities, Intelligent Robotics Laboratory, Moscow 125993, Russia http://isdwiki.rsuh.ru/index.php/User:Vinogradov.Dmitry Abstract. A Monte Carlo approach using Markov Chains for random generation of concepts of a finite context is proposed. An algorithm sim- ilar to CbO is used. We discuss three Markov chains: non-monotonic, monotonic, and coupling ones. The coupling algorithm terminates with probability 1. These algorithms can be used for solving various informa- tion retrieval tasks in large datasets. Keywords: formal context, formal concept, Markov chain, termination with probability 1 1 Introduction In many natural problems of information retrieval the piece of information which we are looking for is not contained in few documents. The query generates a huge amount of relevant documents and the task is to generates a cluster of related documents together with the set of common terms describing its common meaning. There are many choices for such cluster. So, the user has to look for plausible answers to his query. JSM-method is a logical device to provide plausible reasoning for generation, verification and falsification of such clusters and for explanation of whole collec- tion of all relevant documents by means of accepted clusters. The first variant of JSM method was presented by Prof. V.K. Finn in 1983 [1] (in Russian). FCA corresponds to the generation (“induction”) step of JSM method. See [5] for details. This correspondence allows to use FCA algorithms in JSM-method and vice versa. For example, the well-known algorithm “Close-by-One” (CbO) was initially introduced by S.O. Kuznetsov in [4] for JSM-method and later trans- lated into FCA framework. The state of art for JSM-method is represented in [2]. In our opinion, the main drawback of ’old-fashioned’ JSM-method is the computational complexity of JSM algorithms, especially for the induction step. Paper [7] presents a results of comparison between various (partially improved by the survey’s authors) variants of famous deterministic algorithms of FCA. 128 Dmitry V. Vinogradov Paper [6] provides theoretical bounds on computational complexities of various JSM tasks. The development of JSM-method has resulted in intelligent systems of JSM type that were applied in various domains such as sociology, pharmacology, medicine, information retrieval, etc. In practice there were situations when a JSM system generates more than 10,000 formal concepts (JSM similarities) from a context with about 100 objects. In our opinion the importance of all gener- ated concepts is doubtful, because when experts manually select important JSM causes they reject majority of generated JSM similarities. In this paper we propose Monte Carlo algorithms using Markov Chain ap- proach for random generation of concepts of a finite context. In other words, we replace the lattice of all concepts by small number of its random elements. 2 Background 2.1 Basic definitions and facts of FCA Here we recall some basic definitions and facts of Formal Concept Analysis (FCA) [3]. A (finite) context is a triple (G, M, I) where G and M are finite sets and I ⊆ G × M . The elements of G and M are called objects and attributes, respectively. As usual, we write gIm instead of hg, mi ∈ I to denote that object g has attribute m. For A ⊆ G and B ⊆ M , define A0 = {m ∈ M |∀g ∈ A(gIm)}, (1) 0 B = {g ∈ G|∀m ∈ B(gIm)}; (2) so A0 is the set of attributes common to all the objects in A and B 0 is the set of objects possesing all the attributes in B. The maps (·)0 : A 7→ A0 and (·)0 : B 7→ B 0 are called derivation operators (polars) of the context (G, M, I). A concept of the context (G, M, I) is defined to be a pair (A, B), where A ⊆ G, B ⊆ M , A0 = B, and B 0 = A. The first component A of the concept (A, B) is called the extent of the concept, and the second component B is called its intent. The set of all concepts of the context (G, M, I) is denoted by B(G, M, I). Example 1 (Boolean cube with n atoms). Consider the context (G, M, I), where G = {g1 , . . . , gn }, M = {m1 , . . . , mn }, and gj Imk ⇔ j 6= k. (3) Then ({gj1 , . . . , gjk })0 = M \{mj1 , . . . , mjk }, ({mj1 , . . . , mjk })0 = G\{gj1 , . . . , gjk }, A00 = A for all A ⊆ G, and B 00 = B for all B ⊆ M . Hence B(G, M, I) has element ({gj1 , . . . , gjk }, M \ {mj1 , . . . , mjk }) for every {gj1 , . . . , gjk } ⊆ G. A Markov Chain Approach to Random Generation of Formal Concepts 129 Let (G, M, I) be a context. For concepts (A1 , B1 ) and (A2 , B2 ) in B(G, M, I) we write (A1 , B1 ) ≤ (A2 , B2 ), if A1 ⊆ A2 . The relation ≤ is a partial order on B(G, M, I). A subset A ⊆ G is the extent of some concept if and only if A00 = A in which case the unique concept of which A is the extent is (A, A0 ). Similarly, a subset B of M is the intent of some concept if and only if B 00 = B and then the unique concept with intent B is (B 0 , B). It is easy to check that A1 ⊆ A2 implies A01 ⊇ A02 and for concepts (A1 , A01 ) and (A2 , A02 ) reverse implication is valid too, because A1 = A001 ⊆ A002 = A2 . Hence, for (A1 , B1 ) and (A2 , B2 ) in B(G, M, I) (A1 , B1 ) ≤ (A2 , B2 ) ⇔ A1 ⊆ A2 ⇔ B2 ⊆ B1 . (4) Fix a context (G, M, I). In the following, let J be an index set. We assume that Aj ⊆ G and Bj ⊆ M , for all j ∈ J. Lemma 1. [3] Assume that (G, M, I) is a context and let A ⊆ G, B ⊆ M and Aj ⊆ G and Bj ⊆ M , for all j ∈ J. Then A ⊆ A00 and B ⊆ B 00 , (5) A1 ⊆ A2 ⇒ A01 ⊇ A02 and B1 ⊆ B2 ⇒ B10 ⊇ B20 , (6) 0 000 0 000 A =A and B = B , (7) [ \ [ \ 0 ( Aj ) = A0j and ( Bj )0 = Bj0 , (8) j∈J j∈J j∈J j∈J A ⊆ B 0 ⇔ A0 ⊇ B. (9) Proposition 1. [3] Let (G, M, I) be a context. Then (B(G, M, I), ≤) is a lattice with join and meet given by _ [ \ (Aj , Bj ) = (( Aj )00 , Bj ), (10) j∈J j∈J j∈J ^ \ [ (Aj , Bj ) = ( Aj , ( Bj )00 ); (11) j∈J j∈J j∈J Corollary 1. For context (G, M, I) the lattice (B(G, M, I), ≤) has (M 0 , M ) as the bottom element and (G, G0 ) as the top element. In other words, for all (A, B) ∈ B(G, M, I) the following inequalities hold: (M 0 , M ) ≤ (A, B) ≤ (G, G0 ). (12) 2.2 The “Close-by-One” operations: definition and properties With the help of expressions for the infinum and the supremum operations in B(G, M, I) given by Proposition 1 we can introduce local steps of our Markov chains: 130 Dmitry V. Vinogradov Definition 1. For (A, B) ∈ B(G, M, I), g ∈ G, and m ∈ M define CbO((A, B), g) = ((A ∪ {g})00 , B ∩ {g}0 ), (13) 0 00 CbO((A, B), m) = (A ∩ {m} , (B ∪ {m}) ). (14) so CbO((A, B), g) is equal to (A, B) ∨ ({g}00 , {g}0 ) and CbO((A, B), m) is equal to (A, B) ∧ ({m}0 , {m}00 ). We call these operations CbO because the first one is used in Close-by-One (CbO) Algorithm to generate all the elements of B(G, M, I), see [4] for details. Lemma 2. Assume that (G, M, I) is a context and let (A, B) ∈ B(G, M, I), g ∈ G, and m ∈ M . Then g ∈ A ⇒ CbO((A, B), g) = (A, B), (15) m ∈ B ⇒ CbO((A, B), m) = (A, B), (16) g∈ / A ⇒ (A, B) < CbO((A, B), g), (17) m∈ / B ⇒ CbO((A, B), m) < (A, B). (18) / A then A ⊂ A ∪ {g} ⊆ (A ∪ {g})00 by (5). By definition of the order Proof. If g ∈ between concepts this inclusion and (13) imply (17). Relation (18) is proved in the same way, the rest is obvious. Lemma 3. Assume that (G, M, I) is a context and let (A1 , B1 ), (A2 , B2 ) ∈ B(G, M, I), g ∈ G, and m ∈ M . Then (A1 , B1 ) ≤ (A2 , B2 ) ⇒ CbO((A1 , B1 ), g) ≤ CbO((A2 , B2 ), g), (19) (A1 , B1 ) ≤ (A2 , B2 ) ⇒ CbO((A1 , B1 ), m) ≤ CbO((A2 , B2 ), m). (20) Proof. If A1 ⊆ A2 then A1 ∪ {g} ⊆ A2 ∪ {g}. Hence (6) implies (A2 ∪ {g})0 ⊆ (A1 ∪ {g})0 . Second part of (6) implies (A1 ∪ {g})00 ⊆ (A2 ∪ {g})00 . By definition of the order between concepts this is (19). Relation (20) is proved in the same way by using (4). 3 Markov Chain Algorithms Now we represent Markov chain algorithms for random generation of formal concepts. Data: context (G, M, I), external function CbO( , ) Result: random concept (A, B) ∈ B(G, M, I) t := 0; (A, B) := (M 0 , M ); while (t < T ) do select random element x ∈ (G \ A) t (M \ B); (A, B) := CbO((A, B), x); t := t + 1; end Algorithm 1: Non-monotonic Markov chain A Markov Chain Approach to Random Generation of Formal Concepts 131 Example 2 (Random walk on Boolean cube). Consider the context (G, M, I) of Example 1. Then Non-monotonic Markov chain corresponds to Random Walk on Boolean Cube of all the concepts in B(G, M, I). Definition 2. An (order) ideal of partially odered set (poset) (S, ≤) is a subset J of S such that ∀s ∈ S ∀r ∈ J[s ≤ r ⇒ s ∈ J]. (21) A Markov chain St with values into poset (S, ≤) is called monotonic if for every pair of start states a ≤ b (a, b ∈ S) and every order ideal J ⊆ S P[S1 ∈ J|S0 = a] ≥ P[S1 ∈ J|S0 = b]. (22) Proposition 2. There exists the context (G, M, I) such that Non-monotonic Markov chain for (B(G, M, I), ≤) isn’t monotonic one. See [8] for the proof of Proposition 2. The following Markov chain is always monotonic one. Data: context (G, M, I), external function CbO( , ) Result: random concept (AT , BT ) ∈ B(G, M, I) t := 0; X := G t M ; (A, B) := (M 0 , M ); while (t < T ) do select random element x ∈ X; (A, B) := CbO((A, B), x); t := t + 1; end Algorithm 2: Monotonic Markov chain Proposition 3. For every context (G, M, I) the Monotonic Markov chain for (B(G, M, I), ≤) is monotonic one. The proof of Proposition 3 can be found in [8].The Monotonic Markov chain algorithm has another advantage: random selection of elements of the static set X := G t M . However both previous algorithms have common drawback: the unknown value T for the termination time of the calculation. In the Monte Carlo Markov Chain (MCMC) theory this value corresponds to the mixing time of the chain. For some special Markov chains (for instance, for the chain of Example 2) the mixing time is estimated by sophisticated methods. In general case, it is an open problem. The following algorithm has not this problem at all. Data: context (G, M, I), external function CbO( , ) Result: random concept (A, B) ∈ B(G, M, I) X := G t M ; (A, B) := (M 0 , M ); (C, D) = (G, G0 ); while ((A 6= C) ∨ (B 6= D)) do select random element x ∈ X; (A, B) := CbO((A, B), x); (C, D) := CbO((C, D), x); end Algorithm 3: Coupling Markov chain 132 Dmitry V. Vinogradov Intermediate value of quadruple (A, B) ≤ (C, D) on step t corresponds to Markov chain state Yt . The At , Bt , Ct and Dt are first, second, third, and fourth components, respectively. Definition 3. A coupling length for context (G, M, I) is defined by L = min(| G |, | M |). (23) A choice probability of fixed object or attribute in context (G, M, I) is equal to 1 p= . (24) |G|+|M | Lemma 4. If | G |<| M | then for every integer r and every pair of start states (A, B) ≤ (C, D) ((A, B), (C, D) ∈ B(G, M, I)) P[Ar = Ar+L = Cr+L &Br = Br+L = Dr+L |Yr = (A, B) ≤ (C, D)] ≥ pL . (25) If | G |≥| M | then for every integer r and every pair of start states (A, B) ≤ (C, D) ((A, B), (C, D) ∈ B(G, M, I)) P[Ar+L = Cr+L = Cr &Br+L = Dr+L = Dr |Yr = (A, B) ≤ (C, D)] ≥ pL . (26) Proof. For coupling to (A, B) ≤ (A, B) it suffices to get an element from A \ C, but | A \ C |≤| G |= L. Relation (26) is proved in a similar way. t u Theorem 1. The coupling Markov chain has the probability of coupling (termi- nation) before n steps with limit 1 when n → ∞. Proof. Lemma 3 implies that (A(k−1)·L , B(k−1)·L ) ≤ (C(k−1)·L , D(k−1)·L ). Let r = (k − 1) · L. Then Lemma 4 implies that P[Ak·L 6= Ck·L ∨ Bk·L 6= Dk·L |Yr = (Ar , Br ) ≤ (Cr , Dr )] ≤ (1 − pL ). After k independent repetitions we have P[Ak·L 6= Ck·L ∨ Bk·L 6= Dk·L |Y0 = (M 0 , M ) ≤ (G, G0 )] ≤ (1 − pL )k . But when k → ∞ we have (1 − pL )k → 0. t u Conclusions In this paper we have described a Monte Carlo approach using Markov Chains for random generation of concepts of a finite context. The basic steps of proposed Markov chains are similar to ones of algorithm CbO. We discuss three Markov chains: non-monotonic, monotonic, and coupling ones. The coupling algorithm terminates with probability 1. Acknowledgements. The author would like to thank Prof. Victor K. Finn and Tatyana A. Volkova for helpful discussions. The research was supported by Russian Foundation for Basic Research (project 11-07-00618a) and Presidium of the Russian Academy of Science (Fundamental Research Program 2012-2013). A Markov Chain Approach to Random Generation of Formal Concepts 133 References 1. V.K. Finn, About Machine-Oriented Formalization of Plausible Reasonings in F. Beckon-J.S. Mill Style. Semiotika I Informatika, 20, 35–101 (1983) 2. V.K. Finn, The Synthesis of Cognitive Procedures and the Problem of Induction. Autom. Doc. Math. Linguist., 43, 149–195 (2009) 3. B. Ganter, R. Wille, Formal Concept Analysis: Mathematical Foundations, Springer- Verlag, 1999 4. S.O. Kuznetsov, A Fast Algorithm for Computing All Intersections of Objects in a Finite Semi-Lattice, Automatic Documentation and Mathematical Linguistics, vol.27, no.5, 11-21, 1993. 5. S.O. Kuznetsov, Mathematical aspects of concept analysis. Journal of Mathematical Science, Vol. 80, Issue 2, pp. 1654-1698, 1996. 6. S.O. Kuznetsov, Complexity of Learning in Concept Lattices from Positive and Negative Examples. Discrete Applied Mathematics, 2004, no. 142(1-3), pp. 111-125. 7. S.O. Kuznetsov and S.A. Obiedkov, Comparing Performance of Algorithms for Gen- erating Concept Lattices. Journal of Experimental and Theoretical Artificial Intel- ligence, Vol. 14, no. 2-3, pp. 189-216, 2002. 8. D.V. Vinogradov, Random generation of hypotheses in the JSM method using sim- ple Markov chains. Autom. Doc. Math. Linguist., 46, 221–228 (2012)