ITAT 2016 Proceedings, CEUR Workshop Proceedings Vol. 1649, pp. 147–154
http://ceur-ws.org/Vol-1649, Series ISSN 1613-0073, © 2016 J. Outrata, M. Trnecka



                     Evaluating Association Rules in Boolean Matrix Factorization

                                                           Jan Outrata and Martin Trnecka*

                                                         Department of Computer Science
                                                   Palacký University Olomouc, Czech Republic
                                               jan.outrata@upol.cz, martin.trnecka@gmail.com

* Support by grant No. GA15-17899S of the Czech Science Foundation is acknowledged. M. Trnecka also acknowledges partial support by grant No. PrF_2016_027 of IGA of Palacký University Olomouc.

Abstract: Association rules, or association rule mining, is a well-established and popular method of data mining and machine learning, successfully applied in many different areas since the mid-nineties. Association rules form the basis of the Asso algorithm for the discovery of the first (presumably most important) factors in Boolean matrix factorization. In Asso, the confidence parameter of association rules heavily influences the quality of the factorization. However, association rules, in a more general form, appear already in GUHA, a knowledge discovery method developed since the mid-sixties. In this paper, we evaluate the use of various (other) types of association rules from GUHA in Asso and, from the other side, a possible utilization of (particular) association rules in other Boolean matrix factorization algorithms not based on the rules. We compare the quality of factorization produced by the modified algorithms with that produced by the original algorithms.

1 Introduction

1.1 The problem and algorithms

Boolean matrix factorization (BMF), also called Boolean matrix decomposition, aims to find, for a given n × m object-attribute incidence Boolean matrix I, an n × k object-factor Boolean matrix A and a k × m factor-attribute Boolean matrix B such that the Boolean matrix product (see below) of A and B (approximately) equals I. A decomposition of I into A and B may be interpreted as a discovery of k factors exactly or approximately explaining the data. Interpreting the matrices I, A and B as the object-attribute, object-factor and factor-attribute incidence matrices, respectively, A and B explain I as follows: object i has attribute j (the entry I_ij corresponding to row i and column j is 1) if and only if there exists a factor l such that l applies to i and j is one of the particular manifestations of l. The optimization version of this basic BMF problem demands k to be as small as possible. The least k for which an exact decomposition of I exists is called the Boolean rank (or Schein rank) of I.

In the literature, two common variants of the optimization problem are defined and dealt with: the approximate factorization problem (AFP) [3], where the number k of factors is sought to be minimal for a prescribed maximal difference (error) of the Boolean product of A and B from I (usually zero), and the discrete basis problem (DBP) [11], where the error is sought to be minimal for a prescribed (fixed) k. The formal definitions of the problems are given below. For the former problem, greedy approach algorithms seem popular. One of the most efficient ones, GreConD, is proposed in [3] (where it is called Algorithm 2). Another one, GreEss [4], refines GreConD based on a deeper insight into the problem. For the latter problem, probably the best known (and a basic) algorithm is Asso, proposed in [11], together with a few of its variants. Other BMF algorithms proposed in the recent data mining literature that can be tailored for either of the two problems are, e.g., Hyper [16] or Panda [10].

GreConD (and GreEss and their variants) finds factors as maximal rectangles, i.e. Boolean matrices whose entries equal to 1 form a maximal rectangular area (full of 1s) upon a suitable permutation of rows and columns. This concept comes from the geometric view of BMF and from formal concept analysis (FCA). The view tells us that finding a decomposition of I means finding a coverage of the 1s in I by rectangles full of 1s [3, 4], and in FCA maximal rectangles full of 1s correspond to so-called formal concepts [5]. We will use maximal rectangles as factors to describe the algorithms now. The GreConD algorithm, in its search for a factor, starts with the empty set of attributes to which a selected attribute, possibly together with further attributes, is repeatedly added. The constructed set of attributes together with the set of all objects having all the attributes determines a maximal rectangle (forms a formal concept). The selected attribute is such that the rectangle with the attribute added covers as many still uncovered 1s in I as possible. Attributes are added repeatedly as long as the number of still uncovered 1s in I covered by the rectangle grows (note the greedy approach here). Further factors are sought in the same way and the algorithm stops when the prescribed number of 1s in I is covered by the maximal rectangles corresponding to factors (i.e. the algorithm is designed for the AFP). Characteristic vectors of the object sets determining the found maximal rectangles then constitute the columns of matrix A and characteristic vectors of the attribute sets of the maximal rectangles constitute the rows of matrix B.
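To illustrate these notions on a small example (ours, for illustration only, not taken from the data used later in the paper): the 3 × 3 matrix

    I = 1 1 0        A = 1 0        B = 1 1 0
        1 1 1            1 1            0 1 1
        0 1 1            0 1

decomposes exactly into the Boolean product of A and B with k = 2 factors. The first factor is the maximal rectangle formed by objects {1, 2} and attributes {1, 2}, the second by objects {2, 3} and attributes {2, 3}; the characteristic vectors of the object sets form the columns of A and those of the attribute sets form the rows of B.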

Asso, on the other hand, starts its search for each of (at most) the prescribed number k of factors with a selected set of attributes and searches for a set of objects such that the Boolean product of the characteristic vectors of the two sets, as a rectangle (not necessarily maximal) corresponding to the factor, covers as many 1s in I as possible. At the same time, however, the rectangle must also overcover as few 0s in I by 1s as possible (i.e. the algorithm is designed for the DBP). Note here that Asso is allowed to overcover 0s in I in its aim to cover 1s, while GreConD does not do that (hence it is said to perform a from-below decomposition of I). Characteristic vectors of the selected sets of attributes constitute the rows of matrix B while characteristic vectors of the corresponding found sets of objects constitute the columns of matrix A. The sets of attributes are selected, under the same conditions as for the search of objects, among candidate sets represented by the rows of an attribute-attribute matrix called the association matrix. This m × m Boolean matrix, created in the first step of the algorithm, contains 1 in the row corresponding to an attribute i and the column corresponding to an attribute j if the association rule i ⇒ j has (so-called) confidence – the ratio of the number of objects having both i and j to the number of objects having i – greater than a user-defined parameter τ; otherwise it contains 0 in that row and column.

1.2 Associations in BMF

The association rules in Asso, also known from association rule mining [1], are one type of relationship between two (Boolean) attributes in (Boolean) object-attribute data. Other types, in general between two logical formulas over attributes instead of between just two single attributes, have been introduced in the literature. After all, general association rules, whose validity in data is defined through a function of the numbers of objects having and not having both or one of the attributes (or, in general, satisfying and not satisfying the formulas over attributes), are introduced and operated with in GUHA [7, 8, 9, 13], one of the oldest, less known but most sophisticated methods of knowledge discovery. In GUHA, several logical and statistical functions, called quantifiers, used to interpret several different types of association rules, are introduced. Actually, one of the quantifiers (the basic one), founded implication, interprets the association rule used in Asso (and in association rule mining, see below).

The topic of this paper is twofold. First, we pick up several (other) types of association rules, interpreted by selected GUHA quantifiers, and use them in place of Asso's association rule in the Asso algorithm. Second, vice versa, we take the concept of the association matrix from Asso and utilize it in greedy-approach algorithms. Namely, we take a particular association matrix and use the rows of the matrix as characteristic vectors of candidates to initial sets of attributes, to which further attributes are added in GreConD instead of starting from the empty set. Both modifications of the algorithms are novel ideas not previously discussed in the literature. The main purpose of the paper is to evaluate the use of various types of association rules in the Asso algorithm and the use of the association matrix in the GreConD algorithm. The evaluation is done by experimental comparison of the quality of decompositions obtained from the modified algorithms with those obtained from their respective original versions.

The rest of the paper is organized as follows. In the following section 2 we briefly make precise the BMF problems introduced above and recall GUHA, namely the quantifiers and general association rules. Then, in section 3 the modification of Asso with GUHA association rules and the modification of GreConD with rows of a particular association matrix as initial attribute sets are presented. The modified algorithms are experimentally evaluated in section 4, and section 5 draws a conclusion and future research directions.

2 Basic notions of BMF and GUHA

2.1 Boolean Matrix Factorization

We make precise the basic BMF notions and problems recalled in the beginning of the previous section. Denote by {0,1}^{n×m} the set of all n × m Boolean matrices (i.e. with entries either 1 or 0). The basic problem in BMF is to find for a given I ∈ {0,1}^{n×m} matrices A ∈ {0,1}^{n×k} and B ∈ {0,1}^{k×m} for which

    I (approximately) equals A ∘ B,                                (1)

where ∘ is the Boolean matrix product, i.e.

    (A ∘ B)_ij = max_{l=1,...,k} min(A_il, B_lj).

The approximate equality in (1) is assessed by means of the well-known L1-norm ||I|| = ∑_{i,j} |I_ij| and the corresponding distance (error) function E(I, A ∘ B) defined by

    E(I, A ∘ B) = ||I − A ∘ B|| = ∑_{i,j} |I_ij − (A ∘ B)_ij|.

In BMF, one is usually interested in two parts of the error function E: E_u, corresponding to 1s in I that are 0s (and hence uncovered) in A ∘ B, and E_o, corresponding to 0s in I that are 1s (and hence overcovered) in A ∘ B:

    E(I, A ∘ B) = E_u(I, A ∘ B) + E_o(I, A ∘ B), where
    E_u(I, A ∘ B) = |{⟨i, j⟩; I_ij = 1, (A ∘ B)_ij = 0}|,
    E_o(I, A ∘ B) = |{⟨i, j⟩; I_ij = 0, (A ∘ B)_ij = 1}|,

or, more often, in the relative errors

    e_u(l) = E_u(I, A ∘ B)/||I||,    e_o(l) = E_o(I, A ∘ B)/||I||.    (2)

e_u and e_o are functions of the first l factors delivered by the particular algorithm and measure how well the data is explained by the l factors. The value of 1 − e_u represents the (pure) coverage of the data by the observed factors. We will use 1 − e_u and e_o in the experiments in section 4 below. Note that the value of c = 1 − e_u − e_o represents the overall coverage of the data by the factors and is commonly used to assess the quality of decompositions delivered by BMF algorithms [3, 4, 6, 11].
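As a concrete reading of these definitions, the following minimal NumPy sketch (ours, assuming dense 0/1 arrays; it is not the paper's MATLAB/C implementation) computes the Boolean product and the error and coverage measures:

    import numpy as np

    def boolean_product(A, B):
        # (A o B)_ij = max_l min(A_il, B_lj); for 0/1 arrays this is the
        # ordinary matrix product thresholded to {0, 1}
        return (A.astype(int) @ B.astype(int) > 0).astype(int)

    def errors(I, A, B):
        # returns (E_u, E_o): uncovered 1s and overcovered 0s of I
        P = boolean_product(A, B)
        E_u = int(np.sum((I == 1) & (P == 0)))
        E_o = int(np.sum((I == 0) & (P == 1)))
        return E_u, E_o

    def coverage_measures(I, A, B):
        # relative errors e_u, e_o of (2) and the overall coverage c = 1 - e_u - e_o
        norm_I = int(np.sum(I))          # L1-norm ||I|| of a 0/1 matrix
        E_u, E_o = errors(I, A, B)
        e_u, e_o = E_u / norm_I, E_o / norm_I
        return e_u, e_o, 1 - e_u - e_o

For an exact decomposition, such as the small illustrative example in section 1.1, coverage_measures returns e_u = e_o = 0 and c = 1.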

The two optimization BMF problems are defined as follows:

Definition 1 (Approximate Factorization Problem, AFP [3]). Given I ∈ {0,1}^{n×m} and a prescribed error ε ≥ 0, find A ∈ {0,1}^{n×k} and B ∈ {0,1}^{k×m} with k as small as possible such that ||I − A ∘ B|| ≤ ε.

AFP emphasizes the need to account for (and thus to explain) a prescribed (presumably reasonably large) portion of the data, which is specified by ε.

Definition 2 (Discrete Basis Problem, DBP [11]). Given I ∈ {0,1}^{n×m} and a positive integer k, find A ∈ {0,1}^{n×k} and B ∈ {0,1}^{k×m} that minimize ||I − A ∘ B||.

DBP emphasizes the importance of the first k (presumably most important) factors. A thorough study of Boolean matrix factorization from the point of view of the two problems is provided in [4].

2.2 GUHA

We will only recall the necessary notions from the GUHA (General Unary Hypotheses Automaton) method [7, 8] which are required to describe the various types of association rules used in the modified Asso algorithm. For a thorough treatise of the foundations of the method (of mechanized formation of hypotheses, in particular its observational predicate calculi), see the books [9] or [13].

GUHA, more precisely its ASSOC procedure [9] (not to be confused with the Asso algorithm!) or the more enhanced 4FT-MINER procedure [14] for Boolean data, inputs (Boolean) object-attribute incidence data with Boolean attributes, which we represent by an n × m Boolean matrix I. A (general) association rule (over a given set of attributes) is an expression

    i ≈ j

where i and j are attributes. (Note that in its full form, a GUHA general association rule is an expression ϕ ≈ ψ where ϕ and ψ are arbitrary complex logical formulas over the attributes. We consider the simplified case with just single attributes for the formulas, as in the association rules used in Asso and in association rule mining.) i ≈ j is true in I if there is an association between i and j, interpreted by a function q assigning the value 1 (logical truth) to the four-fold table 4ft(i, j, I) corresponding to i ≈ j and I. 4ft(i, j, I) is the quadruple

    ⟨a, b, c, d⟩ = ⟨fr(i ∧ j), fr(i ∧ ¬j), fr(¬i ∧ j), fr(¬i ∧ ¬j)⟩

where fr(i ∧ j) is the number of objects having both i and j in I (rows of I in which there is 1 in both columns corresponding to i and j) and ¬i is the attribute corresponding to the negation of attribute i (i.e. the column of I corresponding to i in which 1s are replaced by 0s and vice versa). 4ft(i, j, I) is usually depicted as

    I    |       j          |        ¬j
    i    |  a = fr(i ∧ j)   |  b = fr(i ∧ ¬j)
    ¬i   |  c = fr(¬i ∧ j)  |  d = fr(¬i ∧ ¬j)

A function q which assigns a logical value 0 or 1 to any four-fold table 4ft(i, j, I) defines a so-called (generalized, GUHA) quantifier. There are several different quantifiers, logical and statistical, summarized e.g. in [13], which interpret different types of association rules (with different meanings of the association ≈ between attributes); a computational sketch of the quantifiers used in this paper follows the list:

  • founded (p-)implication, ⇒_p (for ≈)

        q(a, b, c, d) = 1 if a/(a+b) ≥ p, and 0 otherwise.

    Parameter 0 < p ≤ 1 is a threshold for the confidence of the association rule i ⇒_p j, i.e. the ratio of the number of objects having in I both attributes i and j to the number of objects having i. Founded implication interprets the association rule used in the original Asso algorithm (with p denoted τ instead) and in association rule mining, which is thus a particular case of GUHA general association rules.

  • double founded implication, ⇔_p

        q(a, b, c, d) = 1 if a/(a+b+c) ≥ p, and 0 otherwise.

    Compared to founded implication, double founded implication, to evaluate to 1, requires that the number of objects having in I both i and j is at least 100·p % of the number of objects having i or j.

  • founded equivalence, ≡_p

        q(a, b, c, d) = 1 if (a+d)/(a+b+c+d) ≥ p, and 0 otherwise.

    Meaning: at least 100·p % of all objects in I agree on the attributes i and j, i.e. have both of them or neither of them.

  • E-equivalence, ∼_E^δ

        q(a, b, c, d) = 1 if max(b/(a+b), c/(c+d)) < δ, and 0 otherwise.

    Meaning: less than 100·δ % (0 < δ ≤ 0.5) of the objects having i do not have j, and less than 100·δ % of the objects not having i do have j.

  • negative Jaccard distance

        q(a, b, c, d) = 1 if (b+c)/(b+c+d) ≥ p, and 0 otherwise.

    This is our new quantifier resembling the Jaccard distance dissimilarity measure used in data mining (which is one minus the Jaccard similarity coefficient [15], which in turn equals the ratio used in double founded implication above). Compared to double founded implication, this quantifier, to evaluate to 1, requires that, among the objects not having both i and j, at least 100·p % have i or j.
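The following minimal sketch (ours, assuming a dense 0/1 NumPy matrix I; the function names are ours as well) computes the four-fold table and the quantifiers listed above. Ratios with a zero denominator are treated as false here, which is only one possible convention.

    import numpy as np

    def four_fold_table(I, i, j):
        # 4ft(i, j, I) = <a, b, c, d> for attribute columns i and j of a 0/1 matrix I
        ci, cj = I[:, i].astype(bool), I[:, j].astype(bool)
        a = int(np.sum(ci & cj))      # objects having both i and j
        b = int(np.sum(ci & ~cj))     # objects having i but not j
        c = int(np.sum(~ci & cj))     # objects having j but not i
        d = int(np.sum(~ci & ~cj))    # objects having neither i nor j
        return a, b, c, d

    # each quantifier returns 1 (rule true) or 0 (rule false)
    def founded_implication(a, b, c, d, p):
        return int(a + b > 0 and a >= p * (a + b))

    def double_founded_implication(a, b, c, d, p):
        return int(a + b + c > 0 and a >= p * (a + b + c))

    def founded_equivalence(a, b, c, d, p):
        return int(a + d >= p * (a + b + c + d))

    def e_equivalence(a, b, c, d, delta):
        fi = b / (a + b) if a + b > 0 else 0.0
        fj = c / (c + d) if c + d > 0 else 0.0
        return int(max(fi, fj) < delta)

    def negative_jaccard(a, b, c, d, p):
        return int(b + c + d > 0 and b + c >= p * (b + c + d))

    def association_matrix(I, quantifier, param):
        # Q_ij = quantifier(4ft(i, j, I), param); cf. lines 1-5 of Algorithm 1 in section 3
        m = I.shape[1]
        Q = np.zeros((m, m), dtype=int)
        for i in range(m):
            for j in range(m):
                Q[i, j] = quantifier(*four_fold_table(I, i, j), param)
        return Q

With founded implication and p = τ, association_matrix essentially reproduces the association matrix of the original Asso algorithm recalled in section 1.1; the modified algorithms in section 3 plug in the other quantifiers instead.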

In fact, the above presented quantifiers, except for the last one, are simplified versions of quantifiers defined in [9], where an additional parameter s > 0 is included:

  • founded implication, ⇒_{p,s}

        q(a, b, c, d) = 1 if a/(a+b) ≥ p and a ≥ s, and 0 otherwise,

and similarly for the other quantifiers. For an association rule i ≈ j, s is a threshold for the so-called support of the rule – the number of objects having in I both attributes i and j (or, if normalized as in association rule mining, the ratio of that number to the number of all objects in I).

3 The modified algorithms

3.1 Asso with GUHA association rules

The modified Asso algorithm involving a GUHA (general) association rule interpreted by a GUHA quantifier is depicted in Algorithm 1.

    Algorithm 1: Modified Asso algorithm
      Input: A Boolean matrix I ∈ {0,1}^{n×m}, a positive integer k, a threshold value τ ∈ (0,1], real-valued weights w+, w− and a quantifier q_τ (with parameter τ) interpreting i ≈ j
      Output: Boolean matrices A ∈ {0,1}^{n×k} and B ∈ {0,1}^{k×m}
     1  for i = 1,...,m do
     2      for j = 1,...,m do
     3          Q_ij = q_τ(a, b, c, d)
     4      end
     5  end
     6  A ← empty n × k Boolean matrix
     7  B ← empty k × m Boolean matrix
     8  for l = 1,...,k do
     9      (Q_i_, e) ← argmax_{Q_i_, e ∈ {0,1}^{n×1}} cover([B; Q_i_], [A e], I, w+, w−)
    10      A ← [A e], B ← [B; Q_i_]
    11  end
    12  return A and B

Q_i_ denotes the ith row vector of the (Boolean) association matrix Q, ⟨a, b, c, d⟩ on line 3 is the four-fold table 4ft(i, j, I), [A e] and [B; Q_i_] denote A extended with the column e and B extended with the row Q_i_, respectively, and the function cover(B, A, I, w+, w−) is defined as

    w+ |{⟨i, j⟩; I_ij = 1, (A ∘ B)_ij = 1}| − w− |{⟨i, j⟩; I_ij = 0, (A ∘ B)_ij = 1}|.

The original algorithm was described in the introduction, section 1. The only modification in Algorithm 1 to the (generic) version of the original algorithm in [11] is computing the association matrix Q (lines 1–5) using the given quantifier q_τ(a, b, c, d) interpreting a (general) association rule i ≈ j instead of using the confidence of the association rule i ⇒_τ j.

3.2 GreConD using association matrix

Due to the particular way of finding factors in the GreConD algorithm, namely as maximal rectangles, we need to use a particular association matrix whose rows are used as characteristic vectors of candidates to initial attribute sets in the algorithm. The matrix used is computed using the GUHA quantifier founded implication with parameter p set to 1; hence the confidence of the interpreted association rule i ⇒_p j must be 1 for the rule to be true (which precisely coincides with the notion of attribute implication between attributes i and j in formal concept analysis, see [5]). This, at the same time, means that the association matrix is the special case of the association matrix of the Asso algorithm with τ = 1.

The modified GreConD algorithm using the association matrix is depicted in Algorithm 2.

    Algorithm 2: Modified GreConD algorithm
      Input: A Boolean matrix I ∈ {0,1}^{n×m} and a prescribed error ε ≥ 0
      Output: Boolean matrices A ∈ {0,1}^{n×k} and B ∈ {0,1}^{k×m}
     1  Q ← empty m × m Boolean matrix
     2  for i = 1,...,m do
     3      for j = 1,...,m do
     4          if i ⇒_1 j is true in I then
     5              Q_ij = 1
     6          end
     7      end
     8  end
     9  A ← empty n × k Boolean matrix
    10  B ← empty k × m Boolean matrix
    11  while ||I − A ∘ B|| > ε do
    12      D ← argmax_{Q_i_} cover(Q_i_, I, A, B)
    13      V ← cover(D, I, A, B)
    14      while there is j such that D_j = 0 and cover(D + [j], I, A, B) > V do
    15          j ← argmax_{j, D_j = 0} cover(D + [j], I, A, B)
    16          D ← (D + [j])↓↑
    17          V ← cover(D, I, A, B)
    18      end
    19      A ← [A D↓], B ← [B; D]
    20  end
    21  return A and B
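The first phase of Algorithm 2 (lines 1–8) only needs the implication-based association matrix. A minimal sketch (ours, assuming a dense 0/1 NumPy matrix I) of its computation might look as follows:

    import numpy as np

    def implication_association_matrix(I):
        # Q_ij = 1 iff the rule i =>_1 j holds in I, i.e. every object having
        # attribute i also has attribute j (confidence 1, cf. lines 1-8 of Algorithm 2)
        I = I.astype(bool)
        m = I.shape[1]
        Q = np.zeros((m, m), dtype=int)
        for i in range(m):
            for j in range(m):
                # vacuously true if no object has attribute i
                Q[i, j] = int(np.all(I[I[:, i], j]))
        return Q

The rows Q_i_ of the resulting matrix are the candidate initial attribute sets among which line 12 of Algorithm 2 selects.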

D_j denotes the jth item of the (row Boolean) vector D ∈ {0,1}^{1×m}, [j] ∈ {0,1}^{1×m} denotes the (row Boolean) vector with the jth item equal to 1 and all other items equal to 0, the function cover(D, I, A, B) is defined as

    ||(D↓ × D↓↑) · (I − A ∘ B)||

(× denoting the matrix product of the two vectors and · the entry-wise product), and the (formal concept-forming [5]) operators C↑ and D↓, for a (column Boolean) vector C ∈ {0,1}^{n×1} and the vector D, respectively, are defined as

    C↑ = ∑ { [j] ∈ {0,1}^{1×m} ; I_ij = 1 for each i with C_i = 1 },
    D↓ = ∑ { [i] ∈ {0,1}^{n×1} ; I_ij = 1 for each j with D_j = 1 },

i.e. C↑ is the characteristic (row) vector of the set of all attributes shared by all objects in C, and D↓ is the characteristic (column) vector of the set of all objects having all attributes in D. Again, the original algorithm was described in the introduction, section 1. The only modifications in Algorithm 2 to the (generic) version of the original algorithm in [3] are computing the particular association matrix Q (lines 1–8) using the quantifier founded implication with p = 1, interpreting the association rule i ⇒_1 j, and using the rows of the matrix as characteristic vectors of candidates to initial attribute sets (line 12) in the factor construction (lines 12–19).

4 Experimental evaluation

In this section, we provide an experimental evaluation of the modified algorithms and their comparison to the original versions, the Asso algorithm and the GreConD algorithm. Due to the lack of space we do not present a comparison with other algorithms and approaches to general BMF. A comparison that includes both algorithms can be found for example in [4].

As in the typical experiment scenario—which occurs in various BMF papers—we use both synthetic and real datasets. Experiments on synthetic datasets enable us to analyze the performance of the algorithms on data with the same and known characteristics—we can analyze results in the average case. On real data, we can study the meaning of the obtained results. Let us also note that synthetic data are artificial while real data are influenced by real factors.

Synthetic data. We created 1000 randomly generated datasets. Every dataset X_i has 500 rows and 250 columns and was obtained as a Boolean product X_i = A_i ∘ B_i of matrices A_i and B_i that were generated randomly with the parameters shown in Table 1. The inner dimension k was set to 40 for all X_i, i.e. the expected number of factors is 40.

    Table 1: Synthetic data
    Dataset   k    dens A   dens B   dens I
    Set C1    40   0.07     0.04     0.10
    Set C2    40   0.07     0.06     0.15
    Set C3    40   0.11     0.05     0.20

Real data. We used the datasets DNA [12], Mushroom [2], and Zoo [2], the characteristics of which are shown in Table 2. All of them are well known and widely used in the literature on BMF.

    Table 2: Real data
    Dataset    Size       ||I||
    DNA        4590×392   26527
    Mushroom   8124×119   186852
    Zoo        101×28     862

4.1 Asso with GUHA association rules

We observe the values of 1 − e_u(l) (2) for l = 0,...,k where k is the number of factors delivered by a particular algorithm. Clearly, for l = 0 (no factors, A and B are "empty") we have 1 − e_u(l) = 0. In accordance with general requirements on BMF, for a good factorization algorithm 1 − e_u(l) should be increasing in l, should have relatively large values already for small l, and it is desirable that for l = k we have I = A ∘ B, i.e. the data is fully explained by all k factors computed (in which case 1 − e_u(l) = 1). For the synthetic datasets C1, C2 and C3, the values of 1 − e_u(l) are shown in Figures 1, 2 and 3, respectively.

As we mentioned above, the Asso algorithm admits overcovering of 0s of the input data matrix. The number of overcovered 0s is a very important value, and the values of e_o(l) (2) for the synthetic datasets C1, C2 and C3 are shown in Figures 4, 5 and 6. Let us note that the results marked as "founded implication" are in fact the results of the original Asso algorithm. Note also that all variants of Asso require us to set τ and (one of) w+ and w−, see Algorithm 1. Based on some previous experimental results (see [4]) we fixed w+ = w− = 1 and used the value of τ for which the particular algorithm produces the best results. In most cases, the best choice was 0.8 < τ < 1. This observation corresponds with the results in [4].

We can see that the original algorithm is outperformed in terms of both coverage (1 − e_u) and overcoverage (e_o) by the modification utilizing double founded implication. This modification produces large values of coverage and, compared to the original Asso algorithm, commits a smaller overcoverage error. This is true for both synthetic and real datasets. The modification utilizing the negative Jaccard distance quantifier is also very promising.

The modifications utilizing founded equivalence and E-equivalence, however, do not perform well for the synthetic datasets. In the case of dataset C1—the most sparse one—both modifications commit an extremely large overcoverage error; the values are beyond the scale of Figure 4. In the cases of C2 and C3, Figures 2 and 3, they produce poor coverage while the overcoverage error is, for higher numbers of factors, not much different from the modifications utilizing the other quantifiers. On the other hand, for the real datasets the results are comparable with the other modifications (Figures 7, 8), with significantly smaller overcoverage errors (Figures 10, 11). The only exception is the Mushroom dataset, where founded equivalence is again beyond the scale of Figure 10. Due to the limited space we do not include results for the DNA dataset, which are very close to the results obtained for the Zoo dataset.
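The coverage and overcoverage curves in Figures 1–12 are functions of the number l of factors used. A minimal sketch (ours, again assuming NumPy and the factor matrices A and B returned by one of the algorithms) of how such curves can be computed is:

    import numpy as np

    def curves(I, A, B):
        # 1 - e_u(l) and e_o(l) of (2) for l = 0, ..., k, using only the first l factors
        norm_I = float(np.sum(I))
        coverage, overcoverage = [0.0], [0.0]   # l = 0: "empty" A and B
        k = A.shape[1]
        for l in range(1, k + 1):
            # Boolean product of the first l columns of A with the first l rows of B
            P = (A[:, :l].astype(int) @ B[:l, :].astype(int) > 0).astype(int)
            E_u = np.sum((I == 1) & (P == 0))
            E_o = np.sum((I == 0) & (P == 1))
            coverage.append(1.0 - E_u / norm_I)
            overcoverage.append(E_o / norm_I)
        return coverage, overcoverage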


[Figure 1: Coverage for synthetic dataset C1 (coverage 1 − e_u vs. number of factors; curves: founded implication, double founded implication, founded equivalence, negative Jaccard distance, E-equivalence)]

[Figure 2: Coverage for synthetic dataset C2]

[Figure 3: Coverage for synthetic dataset C3]

[Figure 4: Overcoverage for synthetic dataset C1 (overcoverage e_o vs. number of factors; same curves as in Figure 1)]

[Figure 5: Overcoverage for synthetic dataset C2]

[Figure 6: Overcoverage for synthetic dataset C3]


[Figure 7: Coverage for Mushroom dataset (same curves as in Figure 1)]

[Figure 8: Coverage for Zoo dataset]

[Figure 9: Original and modified GreConD on Mushroom dataset (coverage vs. number of factors; curves: GreConD, GreConD implication)]

[Figure 10: Overcoverage for Mushroom dataset]

[Figure 11: Overcoverage for Zoo dataset]

[Figure 12: Original and modified GreConD on DNA dataset]

The presented results are representative. We performed the same evaluations for several other datasets which we have not included in the paper and observed the same type of results as described above.

4.2 GreConD using association matrix

Due to the limited space we do not present a comparison of the original GreConD algorithm and the modification utilizing the association matrix (Algorithm 2) on the synthetic datasets. On each X_i the modified GreConD slightly outperforms the original algorithm from the standpoint of coverage. Moreover, the modified algorithm also tends to produce fewer factors, i.e. it outperforms the original GreConD from both the AFP and the DBP views (see Section 2).

Values of 1 − e_u for the Mushroom and DNA datasets are shown in Figures 9 and 12, respectively. We can see that the modified algorithm at first loses for small numbers of factors but in the end it outperforms the original GreConD algorithm—i.e. it outperforms GreConD from the AFP view. We observed similar behavior also for the other real datasets.

Time complexity. We implemented all used algorithms in MATLAB with the critical parts written in C. Theoretical time complexity is not of our primary concern. Practically, it follows from our observations that the modification of the GreConD algorithm is slightly faster than the original algorithm and all modifications of the Asso algorithm are as fast as the original.

5 Conclusion

We evaluated the use of various types of (general) association rules from the GUHA knowledge discovery method in the Boolean matrix factorization (BMF) algorithm Asso and a utilization of (particular) association rules in the greedy-approach BMF algorithm GreConD, which is not based on association rules. The comparison of the quality of factorizations produced by the modified algorithms with those produced by the original algorithms on both synthetic and selected real data showed that our modified algorithms outperform, for some types of rules, the original ones. Namely, the double founded implication and (our new) negative Jaccard distance quantifiers interpreting the association rules in Asso perform better than the founded implication quantifier used in the original Asso. Also, the utilization of the association matrix in the factor search initialization stage of the GreConD algorithm improves the factorization results produced by the algorithm.

The observed results suggest the following future research directions. First, as the role of the association matrix is crucial for the Asso algorithm (and, as we have seen, the algorithm can be improved by using other types of association rules), we have an idea for an algorithm which computes, or updates, the matrix in the process of searching for factors instead of computing it once before the search. Second, we will look for a way to use, in the utilization of the association matrix, the so-called essential elements of the input data matrix, which are crucial for the GreEss algorithm [4] (which improves the GreConD algorithm).

References

 [1] Agrawal R., Imieliński T., Swami A.: Mining association rules between sets of items in large databases, Proc. ACM SIGMOD 1993, 207–216.
 [2] Bache K., Lichman M.: UCI Machine Learning Repository [http://archive.ics.uci.edu/ml], Irvine, CA: University of California, School of Information and Computer Science, 2013.
 [3] Belohlavek R., Vychodil V.: Discovery of optimal factors in binary data via a novel method of matrix decomposition, J. Comput. Syst. Sci. 76(1)(2010), 3–20 (preliminary version in Proc. SCIS & ISCIS 2006).
 [4] Belohlavek R., Trnecka M.: From-below approximations in Boolean matrix factorization: Geometry and new algorithm, J. Comput. Syst. Sci. 81(8)(2015), 1678–1697.
 [5] Ganter B., Wille R.: Formal Concept Analysis: Mathematical Foundations, Springer, Berlin, 1999.
 [6] Geerts F., Goethals B., Mielikäinen T.: Tiling databases, Proc. Discovery Science 2004, 278–289.
 [7] Hájek P., Holeňa M., Rauch J.: The GUHA method and its meaning for data mining, J. Comput. Syst. Sci. 76(1)(2010), 34–48.
 [8] Hájek P., Havel I., Chytil M.: The GUHA method of automatic hypotheses determination, Computing 1(1966), 293–308.
 [9] Hájek P., Havránek T.: Mechanizing Hypothesis Formation (Mathematical Foundations for a General Theory), Springer-Verlag, 1978, 396 pp. New edition available online at http://www.cs.cas.cz/~hajek/guhabook/.
[10] Lucchese C., Orlando S., Perego R.: Mining top-K patterns from binary datasets in presence of noise, SIAM DM 2010, 165–176.
[11] Miettinen P., Mielikäinen T., Gionis A., Das G., Mannila H.: The discrete basis problem, IEEE Trans. Knowledge and Data Eng. 20(10)(2008), 1348–1362 (preliminary version in Proc. PKDD 2006).
[12] Myllykangas S. et al.: DNA copy number amplification profiling of human neoplasms, Oncogene 25(55)(2006), 7324–7332.
[13] Rauch J.: Observational Calculi and Association Rules, Springer-Verlag, 2013.
[14] Rauch J., Šimůnek M.: Mining for 4ft rules, Proc. Discovery Science 2000, Springer-Verlag, 2000.
[15] Tan P.-N., Steinbach M., Kumar V.: Introduction to Data Mining, Addison Wesley, Boston, MA, 2006.
[16] Xiang Y., Jin R., Fuhry D., Dragan F. F.: Summarizing transactional databases with overlapped hyperrectangles, Data Mining and Knowledge Discovery 23(2011), 215–251 (preliminary version in Proc. ACM KDD 2008).