Σοφια: how to make FCA polynomial?

Aleksey Buzmakov1,2, Sergei Kuznetsov2, and Amedeo Napoli1

1 LORIA (CNRS – Inria NGE – Université de Lorraine), Vandœuvre-lès-Nancy, France
2 National Research University Higher School of Economics, Moscow, Russia
aleksey.buzmakov@loria.fr, skuznetsov@hse.ru, amedeo.napoli@loria.fr

Abstract. In pattern mining, one of the most important problems is fighting the exponential explosion of the set of patterns. A typical solution is to generate only the patterns satisfying some criteria. The best-known criterion is the support of a pattern, whose monotonicity allows one to generate only frequent (highly supported) patterns. Many other useful criteria are not monotonic, which makes it difficult to generate the best patterns efficiently. In this paper we introduce the notion of "generalized monotonicity" and the Σοφια algorithm, which allow one to generate top patterns in polynomial time modulo basic operations, e.g., measure computation, for criteria that are not monotonic. This approach is applicable not only to itemsets, but also to complex descriptions such as sequences, graphs, numbers, or interval tuples. In this paper we consider stability and Δ-measures, which are not monotonic. In the experiments, we compute the top best patterns w.r.t. these measures and obtain very promising results.

1 Introduction

To solve the problem of exponential explosion of patterns valid in a dataset, many kinds of interestingness measures have been proposed [1]. For example, pattern support, i.e., the number of objects covered by the pattern, is one of the best-known measures of pattern quality. Among others, the stability of a formal concept [2] can be mentioned. Unlike support, this measure is not monotonic w.r.t. the order of pattern inclusion, and it is hard to generate only the most interesting patterns w.r.t. such measures, so one has to find a large set of patterns and then postprocess it, choosing the best ones.

Due to the increasing importance of pattern mining, efficient approaches for finding the best patterns are appearing. In [3] the authors introduce an approach for efficiently searching for the most interesting associations w.r.t. lift or leverage of a pattern. Another approach is searching for cosine-interesting patterns [4]. The cosine interestingness of a pattern is not a monotonic measure, but the authors take advantage of a conditional anti-monotonicity property of cosine interestingness to efficiently mine interesting patterns. However, none of the mentioned approaches is polynomial in the worst case.

In this paper we introduce a new algorithm Σοφια (Sofia, for Searching for Optimal Formal Intents Algorithm) for extracting top best patterns of different kinds, i.e., itemset, string, and graph patterns, etc. The Σοφια algorithm is applicable to a class of measures including classical monotonic measures, stability, δ-freeness [5], etc. For itemset mining, our algorithm can find the top best patterns w.r.t. a measure from this class in polynomial time, modulo the complexity of measure computation. For more complex descriptions the time is polynomial modulo the complexity of basic operations (intersecting descriptions, testing containment on descriptions, computing a measure).

2 Preliminaries

FCA is a formalism convenient for describing models of itemset mining and knowledge discovery [6]. Here we briefly define pattern structures and the corresponding notations [7].
A pattern structure is a triple P = (G, (D, ⊓), δ), where G is a set of objects, (D, ⊓) is a meet-semilattice of descriptions such that the meet of the descriptions of any subset of objects is defined, i.e., (∀X ⊆ G) ⊓_{g∈X} δ(g) ∈ D, and δ : G → D maps an object to a description. The meet operation ⊓ gives the similarity of two descriptions. Let us denote the derivative operators of the Galois connection between 2^G and D by (·)□ (see [7]). A pattern concept of a pattern structure (G, (D, ⊓), δ) is a pair (A, d), where A ⊆ G is called the pattern extent and d ∈ D is called the pattern intent, such that A□ = d and d□ = A. The set of all pattern concepts is partially ordered w.r.t. inclusion on extents, i.e., (A1, d1) ≤ (A2, d2) iff A1 ⊆ A2 (or, equivalently, d2 ⊑ d1), making a lattice, called the pattern lattice.

For real datasets, the number of patterns can be large. In order to select the most interesting concepts, different measures can be used. In this paper we rely on stability [2], which measures the independence of a concept intent w.r.t. randomness in the data. Because of limited space we do not discuss this measure in detail here. Moreover, since concept stability is hard to compute, we rely on an estimate of concept stability which can be computed in polynomial time for a single concept [8].

The approach proposed in this paper is based on projections, introduced for reducing the complexity of computing pattern lattices [7]. A projection operator ψ : D → D is an "interior operator", i.e., it is (1) monotonic (x ⊑ y ⇒ ψ(x) ⊑ ψ(y)), (2) contractive (ψ(x) ⊑ x) and (3) idempotent (ψ(ψ(x)) = ψ(x)). An o-projected pattern structure (projected pattern structure for simplicity) ψ((G, (D, ⊓), δ)) is a pattern structure ψ(P) = (G, (Dψ, ⊓ψ), ψ ∘ δ), where Dψ = ψ(D) = {d ∈ D | ∃d̃ ∈ D : ψ(d̃) = d} and, for all x, y ∈ D, x ⊓ψ y := ψ(x ⊓ y) [9].

Given a projection ψ, the fixed set of ψ is the set of all elements of D which are mapped to themselves by ψ; by idempotency it coincides with ψ(D) = {d ∈ D | ψ(d) = d}. Any element outside the fixed set of ψ is pruned from the description space. We say that a projection ψ1 is simpler than a projection ψ2, denoted by ψ1 < ψ2, if ψ1(D) ⊂ ψ2(D), i.e., ψ2 prunes fewer descriptions than ψ1. Our algorithm is based on this order on projections. The simpler a projection ψ is, the fewer patterns we can find in ψ(P), and the less computational effort is required. Thus, we compute the set of patterns for a simpler projection, then remove unpromising patterns, and extend the pattern structure and the found patterns to a more detailed projection. This allows one to reduce the number of patterns kept within the simpler projection and hence the computational cost of processing the more detailed projection.
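To make the notation above concrete, here is a minimal sketch, assuming a toy itemset pattern structure; the names delta, extent, intent and remove_attribute are illustrative and are not taken from the paper. It shows the derivative operators (·)□ and a simple attribute-removal projection satisfying contractivity and idempotency.

```python
from itertools import combinations

# Toy pattern structure on itemsets: delta maps each object to its description,
# the meet ⊓ is set intersection, and d1 ⊑ d2 corresponds to d1 ⊆ d2,
# so d ⊑ δ(g) simply means that the row of object g contains d.
delta = {
    "g1": frozenset({"a", "b", "c"}),
    "g2": frozenset({"a", "b"}),
    "g3": frozenset({"a", "c"}),
}
G = set(delta)
M = frozenset({"a", "b", "c"})

def extent(d):
    """Derivative d□: all objects whose description contains d."""
    return frozenset(g for g in G if d <= delta[g])

def intent(A):
    """Derivative A□: the meet (intersection) of the descriptions of objects in A."""
    descs = [delta[g] for g in A]
    return frozenset.intersection(*descs) if descs else M

def remove_attribute(m):
    """Projection ψ_m that drops attribute m from every description."""
    return lambda d: d - {m}

psi = remove_attribute("c")
for d in (frozenset({"a", "c"}), frozenset({"b"})):
    assert psi(d) <= d            # contractive: ψ(d) ⊑ d
    assert psi(psi(d)) == psi(d)  # idempotent: ψ(ψ(d)) = ψ(d)

# Pattern concepts (A, d) with A□ = d and d□ = A, enumerated naively.
concepts = set()
for r in range(len(G) + 1):
    for A in combinations(sorted(G), r):
        d = intent(set(A))
        concepts.add((extent(d), d))
print(sorted(concepts, key=lambda c: len(c[0])))
```

Such a naive enumeration is only meant to illustrate the definitions; for real data the pattern lattice is computed by dedicated algorithms.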
3 Σοφια Algorithm

3.1 Monotonicity w.r.t. a Projection

Our algorithm is based on projection monotonicity, a new idea introduced in this paper. Many interestingness measures for patterns, e.g., stability, are not monotonic w.r.t. the subsumption order on patterns, i.e., given patterns X and Y such that X ⊑ Y, and a nonmonotonic measure M, one does not necessarily have M(X) ≥ M(Y). For instance, support is a monotonic measure w.r.t. the pattern order, and this allows for efficient generation of patterns with support higher than a threshold [10]. Projection monotonicity is a generalization of standard monotonicity and allows for efficient work with a wider set of interestingness measures.

Definition 1. Given a pattern structure P and a projection ψ, a measure M is called monotonic w.r.t. the projection ψ if

  (∀p ∈ ψ(P)) (∀q ∈ P, ψ(q) = p)  Mψ(p) ≥ M(q),   (1)

where Mψ(p) is the measure M of pattern p computed in ψ(P).

Here, for any pattern p of the projected pattern structure, we require that every preimage q of p for ψ, i.e., every q with ψ(q) = p, has a measure value not larger than the measure of p. It should be noticed that a measure M for a pattern p can yield different values when M is computed in P or in ψ(P). Thus we use the notation Mψ for the measure M computed in ψ(P).

An important example is given by binary data, or formal contexts (G, M, I). In this case, a projection ψm corresponds to the removal of an attribute m ∈ M, i.e., ψm(B) = B ∩ (M \ {m}) for any B ⊆ M. So Definition 1 means that the interestingness of an itemset p w.r.t. a measure M computed in (G, M \ {m}, I \ G×{m}) should be higher than the interestingness of the itemsets p and p ∪ {m} (the preimages of p for ψm) w.r.t. the measure M computed in (G, M, I).

If the value of a measure for a pattern does not depend on the projection, this definition reduces to classical monotonicity. Indeed, because of the contractivity of ψ (ψ(p) ⊑ p), for any monotonic measure one has M(ψ(p)) ≥ M(p). Thus, given a measure M monotonic w.r.t. a projection ψ, if p is a pattern such that Mψ(p) < θ, then M(q) < θ for any preimage q of p for ψ. Hence, if, given a pattern p of ψ(P), one can find all patterns q of P such that ψ(q) = p, it is possible to find the patterns of ψ(P), then to filter them w.r.t. Mψ and a threshold, and finally to compute the preimages of the filtered patterns.

3.2 Monotonicity w.r.t. a Chain of Projections

However, given just one projection, it can be hard to discover the patterns efficiently, because either the projection is hard to compute or the number of unpromising patterns that can be pruned is not high. Hence we introduce a chain of projections ψ0 < ψ1 < · · · < ψk = 1, where the whole pattern lattice for ψ0(P) can be easily computed and 1 is the identity projection, i.e., (∀x) 1(x) = x. For example, to find frequent itemsets, we typically search for small itemsets first and then extend them to larger ones; this corresponds to passing to a more detailed projection.

Let us discuss what a chain of projections is in the case of a binary context K = (G, M, I) with M = {m1, ..., mN}. It can be seen that any subcontext Ks = (G, N, I ∩ G × N), where N ⊆ M, corresponds to a projection ψ such that ψ(B ⊆ M) = B ∩ N. If we put Mi = {m1, ..., mi}, then we can consider the chain of projections corresponding to the subsets of attributes M1 ⊂ M2 ⊂ · · · ⊂ MN = M. The corresponding projections are properly ordered. Now we define the monotonicity of M w.r.t. a chain of projections.

Definition 2. Given a pattern structure P and a chain of projections ψ0 < ψ1 < · · · < ψk = 1, a measure M is called monotonic w.r.t. the chain of projections if M is monotonic w.r.t. all ψi for 0 ≤ i ≤ k.
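As a small illustration of Definition 1 on binary data, the sketch below, with hypothetical helper names support and project_context, checks inequality (1) for the attribute-removal projection ψm, whose preimages of an itemset p are exactly p and p ∪ {m}. Support is used as the measure only to keep the example short; stability is the case of interest in the paper.

```python
# Toy formal context: object -> itemset (row of the binary table).
context = {
    "g1": {"a", "b", "c"},
    "g2": {"a", "b"},
    "g3": {"a", "c"},
    "g4": {"b", "c"},
}

def support(itemset, ctx):
    """Number of objects whose row contains the itemset."""
    return sum(1 for row in ctx.values() if itemset <= row)

def project_context(ctx, kept):
    """Subcontext keeping only the attributes in `kept` (projection ψ)."""
    return {g: row & kept for g, row in ctx.items()}

m = "c"
kept = {"a", "b"}                        # M \ {m}
ctx_psi = project_context(context, kept)

for p in ({"a"}, {"b"}, {"a", "b"}):     # patterns of the projected structure
    preimages = (p, p | {m})             # preimages of p under ψ_m
    for q in preimages:
        # Inequality (1) for support: M_ψ(p) >= M(q).
        assert support(p, ctx_psi) >= support(q, context)
```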
3.3 Algorithms

Given a measure monotonic w.r.t. a chain of projections, if we are able to find all preimages of any element of the fixed set of ψi that belong to the fixed set of ψi+1, then we can find all patterns of P with a value of M higher than a given threshold θ. We call this algorithm θ-Σοφια; its pseudocode is given in Fig. 1. In lines 11-12 we find all patterns for ψ0(P) satisfying the constraint that the value of M is higher than the threshold. Then in lines 13-15 we iteratively extend projections from simpler to more detailed ones. The extension is done by constructing the set Pi of preimages of the set Pi−1 (lines 2-5) and then by removing from Pi the patterns that do not satisfy the constraint (lines 6-9).

The algorithm is sound and complete because, first, when we compute the set of preimages of a pattern p, the pattern p is a preimage of itself (ψ(p) = p) and, second, if we remove a pattern p from the set P, then M(p) < θ and hence, by the projection chain monotonicity of M, the measure value of any preimage of p is also less than θ.

The worst-case time complexity of the θ-Σοφια algorithm is

  T(θ-Σοφια) = T(FindPatterns(ψ0)) + k · max_{0 < i ≤ k} |Pi| · (T(Preimages) + T(M)).   (2)
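The following is a compact sketch of the θ-Σοφια loop described above, not the pseudocode of Fig. 1; the callables find_patterns_0, preimages and measure are assumptions standing for FindPatterns, Preimages and the computation of Mψi, respectively.

```python
from typing import Callable, Iterable, List, TypeVar

P = TypeVar("P")  # pattern type: itemsets, sequences, graphs, ...

def theta_sofia(
    find_patterns_0: Callable[[], Iterable[P]],   # all patterns of ψ0(P)
    preimages: Callable[[int, P], Iterable[P]],   # preimages of p in the fixed set of ψ_i
    measure: Callable[[int, P], float],           # M_{ψ_i}(p) computed in ψ_i(P)
    k: int,                                       # index of the last projection (ψ_k = identity)
    theta: float,                                 # threshold on the measure
) -> List[P]:
    # Patterns of ψ0(P) whose measure already passes the threshold.
    patterns = [p for p in find_patterns_0() if measure(0, p) >= theta]
    # Iteratively move to more detailed projections ψ1 < ... < ψk.
    for i in range(1, k + 1):
        extended = []
        for p in patterns:
            # Every pattern is a preimage of itself, so nothing is lost here.
            extended.extend(preimages(i, p))
        # Prune: by projection-chain monotonicity, a pattern with measure < θ
        # cannot have a preimage with measure >= θ in a more detailed projection.
        patterns = [q for q in extended if measure(i, q) >= theta]
    return patterns
```

With the attribute-removal chain of Sect. 3.2, preimages(i, p) would return just p and p ∪ {mi}, and measure could be support or the polynomial-time stability estimate of [8].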