<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Series</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Evaluating Association Rules in Boolean Matrix Factorization</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jan Outrata</string-name>
          <email>jan.outrata@upol.cz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Martin Trnecka</string-name>
          <email>martin.trnecka@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, Palacký University Olomouc</institution>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2016</year>
      </pub-date>
      <volume>1649</volume>
      <fpage>147</fpage>
      <lpage>154</lpage>
      <abstract>
        <p>Association rules, and association rule mining, form a well-established and popular method of data mining and machine learning that has been successfully applied in many different areas since the mid-nineties. Association rules form the basis of the Asso algorithm for the discovery of the first (presumably most important) factors in Boolean matrix factorization. In Asso, the confidence parameter of association rules heavily influences the quality of factorization. Association rules, however, appear in a more general form already in GUHA, a knowledge discovery method developed since the mid-sixties. In the paper, we evaluate the use of various (other) types of association rules from GUHA in Asso and, conversely, a possible utilization of (particular) association rules in other Boolean matrix factorization algorithms not based on the rules. We compare the quality of factorization produced by the modified algorithms with that produced by the original algorithms.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <sec id="sec-1-1">
        <title>1.1 The problem and algorithms</title>
        <p>Boolean matrix factorization (BMF), also called Boolean
matrix decomposition, aims to find, for a given n × m
object-attribute incidence Boolean matrix I, an n × k
object-factor Boolean matrix A and a k × m factor-attribute
Boolean matrix B such that the Boolean matrix product
(see below) of A and B (approximately) equals I. A
decomposition of I into A and B may be interpreted as a
discovery of k factors exactly or approximately explaining
the data. Interpreting the matrices I, A and B as the
object-attribute, object-factor and factor-attribute incidence
matrices, respectively, A and B explain I as follows: the
object i has the attribute j (the entry I_ij corresponding to
row i and column j is 1) if and only if there exists a
factor l such that l applies to i and j is one of the particular
manifestations of l. The optimization version of this
basic BMF problem demands k to be as small as possible.
The least k for which an exact decomposition of I exists is
called the Boolean rank (or Schein rank) of I.</p>
        <p>
          In the literature, two common variants of the
optimization problem are defined and dealt with: the approximate
factorization problem (AFP) [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], where the number k of
factors is sought to be minimal for a prescribed maximal
difference (error) of the Boolean product of A and B from I
(usually zero), and the discrete basis problem (DBP) [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ],
where the error is sought to be minimal for a prescribed
(fixed) k. The formal definitions of the problems are given
below. For the former problem, greedy approach
algorithms seem popular. One of the most efficient ones,
GRECOND, is proposed in [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] (where it is called Algorithm 2).
Another one, GREESS [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], refines GRECOND based on a
deeper insight into the problem. For the latter problem,
probably the best-known (and a basic) algorithm is
ASSO, proposed in [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], together with a few of its
variants. Other BMF algorithms proposed in the recent data
mining literature that can be tailored for either of the two
problems are e.g. HYPER [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] or PANDA [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].
          *Support by grant No. GA15-17899S of the Czech Science
Foundation is acknowledged. M. Trnecka also acknowledges partial support by
grant No. PrF_2016_027 of IGA of Palacký University Olomouc.
        </p>
        <p>
          GRECOND (and GREESS and their variants) finds
factors as maximal rectangles, Boolean matrices whose
entries with 1 form a maximal rectangular area (full of 1s)
upon a suitable permutation of rows and columns. This
concept comes from the geometric view of BMF and
formal concept analysis (FCA). The view tells us that finding
a decomposition of I means finding a coverage of the 1s in I
by rectangles full of 1s [
          <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
          ], and in FCA maximal
rectangles full of 1s correspond to so-called formal concepts [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
We will use maximal rectangles as factors to describe the
algorithms now. The GRECOND algorithm, in its search for
a factor, starts with the empty set of attributes to which a
selected attribute, possibly together with other attributes, is
repeatedly added. The constructed set of attributes, together with
the set of all objects having all the attributes, determines a
maximal rectangle (forms a formal concept). The selected
attribute is such that the rectangle with the attributes added
covers as many still uncovered 1s in I as possible. The
attributes are added repeatedly as long as the number of still
uncovered 1s in I covered by the rectangle grows (note
the greedy approach here). Further factors are sought the
same way and the algorithm stops when the prescribed
number of 1s in I is covered by the maximal rectangles
corresponding to factors (i.e. the algorithm is designed for
the AFP). Characteristic vectors of the object sets
determining the found maximal rectangles then constitute columns of
matrix A and characteristic vectors of the attribute sets of the
maximal rectangles constitute rows of matrix B.
        </p>
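        <p>To make the rectangle-growing step concrete, the following minimal
sketch (in Python with NumPy; our own illustration, not the authors'
implementation) shows how a set of attributes determines a maximal rectangle,
i.e. a formal concept, of a Boolean matrix I: the extent collects the objects
having all the attributes, and the intent collects all the attributes shared
by those objects.</p>
        <preformat>
# Illustrative sketch: the maximal rectangle (formal concept) generated
# by a Boolean attribute vector D (dtype=bool) in a Boolean matrix I.
import numpy as np

def extent(I, D):
    # objects having all attributes j with D[j] = 1
    if not D.any():
        return np.ones(I.shape[0], dtype=bool)
    return I[:, D].all(axis=1)

def intent(I, C):
    # attributes shared by all objects i with C[i] = 1
    if not C.any():
        return np.ones(I.shape[1], dtype=bool)
    return I[C, :].all(axis=0)

def maximal_rectangle(I, D):
    # closing D yields the maximal rectangle (formal concept) containing
    # the rectangle generated by D
    C = extent(I, D)
    return C, intent(I, C)
</preformat>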
        <p>ASSO, on the other hand, in its search for each of (at
most) the prescribed number k of factors, starts with a selected set
of attributes and searches for a set of objects such that the
Boolean product of the characteristic vectors of the two
sets, as a rectangle (not necessarily maximal)
corresponding to the factor, covers as many 1s in I as possible. At the
same time, however, the rectangle must overcover as
few 0s in I by 1s as possible (i.e. the algorithm is designed
for the DBP). Note here that ASSO is allowed to
overcover 0s in I in its aim to cover 1s, while GRECOND does
not do that (it is therefore said to perform a from-below
decomposition of I). Characteristic vectors of the selected
sets of attributes constitute rows of matrix B, while
characteristic vectors of the corresponding found sets of objects
constitute columns of matrix A. The sets of attributes are
selected, under the same conditions as in the search for
objects, among candidate sets represented by rows of the
attribute-attribute matrix called the association matrix. This
m × m Boolean matrix, created in the first step of the
algorithm, contains 1 in the row corresponding to an attribute i
and the column corresponding to an attribute j if the
association rule i ⇒ j has a (so-called) confidence – the ratio
of the number of objects having both i and j to the number
of objects having i – greater than a user-defined parameter
τ; otherwise it contains 0 at that position.</p>
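        <p>For illustration only, the association matrix can be computed directly
from the confidences of the rules i ⇒ j; the following short sketch (our
notation, assuming a non-strict threshold τ) makes the construction explicit
and is not meant to reproduce the original ASSO implementation.</p>
        <preformat>
# Sketch: Boolean association matrix Q with Q[i, j] = 1 iff the confidence
# of the rule i =&gt; j (objects with both i and j / objects with i) reaches tau.
import numpy as np

def association_matrix(I, tau):
    I = np.asarray(I, dtype=float)
    supp_i = I.sum(axis=0)              # number of objects having attribute i
    supp_ij = I.T @ I                   # number of objects having both i and j
    conf = np.divide(supp_ij, supp_i[:, None],
                     out=np.zeros_like(supp_ij), where=supp_i[:, None] &gt; 0)
    return (conf &gt;= tau).astype(int)    # row i, column j of the matrix
</preformat>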
      </sec>
      <sec id="sec-1-2">
        <title>1.2 Associations in BMF</title>
        <p>
          The association rules in ASSO, also known from
association rule mining [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], are one type of relationship
between two (Boolean) attributes in (Boolean)
object-attribute data. There are other types, in general between
two logical formulas over attributes rather than between
just two single attributes, introduced in the literature. In fact,
general association rules, whose validity in data is
defined through a function of the numbers of objects having
and not having both or one of the attributes (or, in general,
satisfying and not satisfying the formulas over the attributes),
are introduced and operated with in GUHA [
          <xref ref-type="bibr" rid="ref13 ref7 ref8 ref9">7, 8, 9, 13</xref>
          ],
one of the oldest, less known, yet most sophisticated
methods of knowledge discovery. In GUHA, several
logical and statistical functions, called quantifiers, used to
interpret several different types of association rules, are
introduced. Actually, one of the quantifiers (the basic one),
founded implication, interprets the association rule used in
ASSO (and in association rule mining, see below).
        </p>
        <p>The topic of this paper is twofold. First, we pick
several (other) types of association rules, interpreted by
selected GUHA quantifiers, and use them in place of
ASSO's association rule in the ASSO algorithm.
Second, vice versa, we take the concept of the association matrix
from ASSO and utilize it in greedy-approach algorithms.
Namely, we take a particular association matrix and use
the rows of the matrix as characteristic vectors of
candidates for initial sets of attributes, to which further attributes
are added in GRECOND, instead of the empty set. Both
modifications of the algorithms are novel ideas not
previously discussed in the literature. The main purpose of
the paper is to evaluate the use of various types of
association rules in the ASSO algorithm and the use of the association
matrix in the GRECOND algorithm. The evaluation is done
by an experimental comparison of the quality of decompositions
obtained from the modified algorithms with those obtained
from their respective original versions.</p>
        <p>The rest of the paper is organized as follows. In the
following section 2 we make precise the BMF problems
introduced above and recall GUHA, namely the quantifiers
and general association rules. Then, in section 3, the
modification of ASSO with GUHA association rules and the
modification of GRECOND with rows of a particular
association matrix as initial attribute sets are presented. The
modified algorithms are experimentally evaluated in
section 4, and section 5 draws conclusions and outlines future
research directions.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2 Basic notions of BMF and GUHA</title>
      <sec id="sec-2-1">
        <title>2.1 Boolean Matrix Factorization</title>
        <p>We make precise the basic BMF notions and problems recalled
at the beginning of the previous section. Denote by
{0, 1}^{n×m} the set of all n × m Boolean matrices (i.e. with
entries either 1 or 0). The basic problem in BMF is to
find for a given I ∈ {0, 1}^{n×m} matrices A ∈ {0, 1}^{n×k} and
B ∈ {0, 1}^{k×m} for which</p>
        <p>I (approximately) equals A ○ B,   (1)
where ○ is the Boolean matrix product, i.e.</p>
        <p>(A ○ B)_ij = max_{l=1,...,k} min(A_il, B_lj).</p>
        <p>The approximate equality in (1) is assessed by means of
the well-known L1-norm ||I|| = ∑_{i,j=1}^{m,n} |I_ij| and the
corresponding distance (error) function E(I, A ○ B) defined by
E(I, A ○ B) = ||I − A ○ B|| = ∑_{i,j=1}^{m,n} |I_ij − (A ○ B)_ij|.</p>
        <p>In BMF, one is usually interested in two parts of the
error function E: Eu, corresponding to 1s in I that are 0s
(and hence uncovered) in A ○ B, and Eo, corresponding to 0s
in I that are 1s (and hence overcovered) in A ○ B:
E(I, A ○ B) = Eu(I, A ○ B) + Eo(I, A ○ B), where</p>
        <p>Eu(I, A ○ B) = |{⟨i, j⟩ ; I_ij = 1, (A ○ B)_ij = 0}|,</p>
        <p>Eo(I, A ○ B) = |{⟨i, j⟩ ; I_ij = 0, (A ○ B)_ij = 1}|,
or, more often, in the relative errors</p>
        <p>
          eu(l) = Eu(I, A ○ B) / ||I||,   eo(l) = Eo(I, A ○ B) / ||I||.   (2)
eu and eo are functions of the first l factors delivered by
the particular algorithm and measure how well the data is
explained by the l factors. The value of 1 − eu represents
the (pure) coverage of the data by the observed factors. We
will use 1 − eu and eo in the experiments in section 4 below.
Note that the value of c = 1 − eu − eo represents the overall
coverage of the data by the factors and is commonly used to
assess the quality of decompositions delivered by BMF
algorithms [
          <xref ref-type="bibr" rid="ref11 ref3 ref4 ref6">3, 4, 6, 11</xref>
          ].
        </p>
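        <p>A small self-contained sketch (our illustration, using the notation
above) of the Boolean product and of the error measures E, Eu, Eo and the
relative errors eu, eo:</p>
        <preformat>
# Sketch of the Boolean matrix product and the (relative) error measures.
import numpy as np

def bool_product(A, B):
    # (A o B)_ij = max_l min(A_il, B_lj): 1 iff some factor l covers i, j
    return (np.asarray(A, int) @ np.asarray(B, int) &gt;= 1).astype(int)

def bmf_errors(I, A, B):
    AB = bool_product(A, B)
    E_u = int(np.sum((I == 1) &amp; (AB == 0)))   # 1s of I left uncovered
    E_o = int(np.sum((I == 0) &amp; (AB == 1)))   # 0s of I overcovered
    norm_I = int(np.sum(I == 1))              # ||I||, the number of 1s in I
    return E_u + E_o, E_u / norm_I, E_o / norm_I   # E, e_u, e_o
</preformat>
        <p>For a from-below decomposition, such as the one computed by
GRECOND, eo is zero and E reduces to the number of uncovered 1s.</p>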
        <p>
          The two optimization BMF problems are defined as
follows:
Definition 1 (Approximate Factorization Problem,
AFP [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]). Given I ∈ {0, 1}^{n×m} and a prescribed error ε ≥ 0,
find A ∈ {0, 1}^{n×k} and B ∈ {0, 1}^{k×m} with k as small as
possible such that ||I − A ○ B|| ≤ ε.
        </p>
        <p>AFP emphasizes the need to account for (and thus to
explain) a prescribed (presumably reasonably large) portion
of data, which is specified by ε.</p>
        <p>
          Definition 2 (Discrete Basis Problem, DBP [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]). Given
I ∈ {0, 1}^{n×m} and a positive integer k, find A ∈ {0, 1}^{n×k} and
B ∈ {0, 1}^{k×m} that minimize ||I − A ○ B||.
        </p>
        <p>
          DBP emphasizes the importance of the first k
(presumably most important) factors. A thorough study of
Boolean matrix factorization from the point of view of
the two problems is provided in [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2 GUHA</title>
        <p>
          We will only recall the necessary notions of the GUHA
(General Unary Hypotheses Automaton) method [
          <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
          ]
which are required to describe the various types of
association rules used in the modified ASSO algorithm. For a
thorough treatment of the foundations of the method (of
mechanized formation of hypotheses, in particular its
observational predicate calculi), see the books [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] or [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ].
        </p>
        <p>
          GUHA, more precisely its ASSOC procedure [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] (not to be
confused with the ASSO algorithm!) or the more enhanced
4FT-MINER procedure [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] for Boolean data, inputs
(Boolean) object-attribute incidence data with Boolean
attributes, which we represent by an n × m Boolean matrix I.
A (general) association rule (over a given set of attributes) is
an expression
        </p>
        <p>i ≈ j
where i and j are attributes. (Note that in its full form, a
GUHA general association rule is an expression ϕ ≈ ψ
where ϕ and ψ are arbitrarily complex logical formulas
over the attributes. We consider the simplified case with
just single attributes for the formulas, as in the
association rules used in ASSO and association rule mining.) i ≈ j
is true in I if there is an association between i and j,
interpreted by a function q assigning the value 1 (logical true)
to the four-fold table 4ft(i, j, I) corresponding to i ≈ j and I.
4ft(i, j, I) is the quadruple
⟨a, b, c, d⟩ =</p>
        <p>⟨fr(i ∧ j), fr(i ∧ ¬j), fr(¬i ∧ j), fr(¬i ∧ ¬j)⟩
where fr(i ∧ j) is the number of objects having both i and
j in I (rows of I in which there is 1 in both columns
corresponding to i and j) and ¬i is the attribute corresponding to
the negation of attribute i (i.e. the column of I
corresponding to i in which 1s are replaced by 0s and vice versa).
4ft(i, j, I) is usually depicted as a four-fold (2 × 2)
contingency table with rows i, ¬i and columns j, ¬j.</p>
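        <p>For concreteness, the four-fold table can be computed directly from
the two attribute columns of I; the following sketch (our illustration)
returns the quadruple ⟨a, b, c, d⟩.</p>
        <preformat>
# Sketch: the four-fold table 4ft(i, j, I) for attribute columns i and j.
import numpy as np

def four_fold_table(I, i, j):
    Ib = np.asarray(I, dtype=bool)
    col_i, col_j = Ib[:, i], Ib[:, j]
    a = int(np.sum(col_i &amp; col_j))     # objects having both i and j
    b = int(np.sum(col_i &amp; ~col_j))    # objects having i but not j
    c = int(np.sum(~col_i &amp; col_j))    # objects having j but not i
    d = int(np.sum(~col_i &amp; ~col_j))   # objects having neither i nor j
    return a, b, c, d
</preformat>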
        <p>
          A function q which assigns to any four-fold table 4ft(i, j,
I) a logical value 0 or 1 defines a so-called (generalized,
GUHA) quantifier. There are several different quantifiers,
summarized e.g. in [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], logical and statistical, which
interpret different types of association rules (with different
meanings of the association ≈ between attributes):
• founded (p-)implication, ⇒p (for ≈)
q(a, b, c, d) = 1 if a/(a+b) ≥ p, and 0 otherwise.
        </p>
        <p>Parameter 0 &lt; p ≤ 1 has the meaning of a threshold for the
confidence of the association rule i ⇒p j, i.e. the ratio
of the number of objects having in I both attributes i
and j to the number of objects having i. Founded
implication interprets the association rule used in the
original ASSO algorithm (with p denoted as τ
instead) and in association rule mining, which is thus a
particular case of GUHA general association rules.
• double founded implication, ⇔p
q(a, b, c, d) = 1 if a/(a+b+c) ≥ p, and 0 otherwise.</p>
        <p>Compared to founded implication, double founded
implication, to evaluate to 1, requires that the number
of objects having in I both i and j is at least 100 ⋅ p %
of the number of objects having i or j.
• founded equivalence, ≡p
q(a, b, c, d) = 1 if (a+d)/(a+b+c+d) ≥ p, and 0 otherwise.</p>
        <p>Meaning: At least 100 ⋅ p % among all objects in I
have the same attributes (i.e. have both i and j or neither).
• E-equivalence, ∼δE
q(a, b, c, d) = 1 if max(b/(a+b), c/(c+d)) &lt; δ, and 0 otherwise.</p>
        <p>Meaning: Less than 100 ⋅ δ % (0 &lt; δ ≤ 0.5) of the objects
having i do not have j, and less than 100 ⋅ δ % of the objects
not having i have j.
• negative Jaccard distance</p>
        <p>q(a, b, c, d) = 1 if (b+c)/(b+c+d) ≥ p, and 0 otherwise.</p>
        <p>
          This is our new quantifier resembling the Jaccard
distance dissimilarity measure used in data mining
(which is one minus the Jaccard similarity
coefficient [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ], which in turn is equal to the double founded
implication above). Compared to double founded
implication, this quantifier, to evaluate to 1, requires that
at least 100 ⋅ p % of the objects that do not have both i and j
have at least one of them.
        </p>
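        <p>The quantifiers above translate directly into 0/1 functions of the
four-fold table; the following sketch (reconstructed from the definitions
above, with denominators cleared to avoid division by zero) is our
illustration, not the original GUHA code.</p>
        <preformat>
# Sketches of the quantifiers as 0/1 functions of the four-fold table.
def founded_implication(a, b, c, d, p):
    return int(a &gt;= p * (a + b))                  # a/(a+b) &gt;= p

def double_founded_implication(a, b, c, d, p):
    return int(a &gt;= p * (a + b + c))              # a/(a+b+c) &gt;= p

def founded_equivalence(a, b, c, d, p):
    return int(a + d &gt;= p * (a + b + c + d))      # (a+d)/(a+b+c+d) &gt;= p

def e_equivalence(a, b, c, d, delta):
    # max(b/(a+b), c/(c+d)) &lt; delta, empty denominators treated as 0
    r1 = b / (a + b) if a + b else 0.0
    r2 = c / (c + d) if c + d else 0.0
    return int(max(r1, r2) &lt; delta)

def negative_jaccard(a, b, c, d, p):
    return int(b + c &gt;= p * (b + c + d))          # (b+c)/(b+c+d) &gt;= p
</preformat>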
        <p>
          In fact, the above presented quantifiers, except for the
last one, are simplified versions of quantifiers defined
in [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], where an additional parameter s &gt; 0 is included:
• founded implication, ⇒p,s
        </p>
        <p>q(a, b, c, d) = 1 if a/(a+b) ≥ p and a ≥ s, and 0 otherwise,
and similarly for the other quantifiers. For an association rule
i ≈ j, s has the meaning of a threshold for the so-called
support of the rule – the number of objects having in I both
attributes i and j (or, if normalized as in association rule
mining, the ratio of this number to the number of all objects
in I).</p>
        <p>Algorithm 1: Modified ASSO algorithm
Input: A Boolean matrix I ∈ {0, 1}^{n×m}, a positive integer k,
a threshold value τ ∈ (0, 1], real-valued weights w+, w−
and a quantifier qτ (with parameter τ) interpreting i ≈ j
Output: Boolean matrices A ∈ {0, 1}^{n×k} and B ∈ {0, 1}^{k×m}
1 for i = 1, . . . , m do
2   for j = 1, . . . , m do
3     Q_ij = qτ(a, b, c, d) where ⟨a, b, c, d⟩ = 4ft(i, j, I)
4   end
5 end
6 A ← empty n × k Boolean matrix
7 B ← empty k × m Boolean matrix
8 for l = 1, . . . , k do
9   (Qi_, e) ← arg max_{Qi_, e ∈ {0, 1}^{n×1}} cover([B ; Qi_], [A e], I, w+, w−)
10   A ← [A e], B ← [B ; Qi_]
11 end
12 return A and B</p>
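        <p>To summarize Algorithm 1, the following compact sketch (our own
Python illustration under simplifying assumptions, not the authors'
MATLAB/C implementation) builds the association matrix from a quantifier
passed as a 0/1 function of the four-fold table and then greedily selects k
factors; it uses the cover function defined in the next paragraph, and the
quantifier argument can be any of the functions sketched above.</p>
        <preformat>
# Sketch of Algorithm 1 (modified ASSO); quantifier is a 0/1 function of
# the four-fold table, e.g. founded_implication with threshold tau.
import numpy as np

def cover(B, A, I, w_plus, w_minus):
    AB = (np.asarray(A, int) @ np.asarray(B, int) &gt;= 1).astype(int)
    return (w_plus * np.sum((I == 1) &amp; (AB == 1))
            - w_minus * np.sum((I == 0) &amp; (AB == 1)))

def asso_modified(I, k, quantifier, tau, w_plus=1.0, w_minus=1.0):
    n, m = I.shape
    Q = np.zeros((m, m), dtype=int)              # association matrix, lines 1-5
    for i in range(m):
        ci = I[:, i] == 1
        for j in range(m):
            cj = I[:, j] == 1
            a, b = np.sum(ci &amp; cj), np.sum(ci &amp; ~cj)
            c, d = np.sum(~ci &amp; cj), np.sum(~ci &amp; ~cj)
            Q[i, j] = quantifier(a, b, c, d, tau)
    A = np.zeros((n, 0), dtype=int)
    B = np.zeros((0, m), dtype=int)
    for _ in range(k):                           # lines 8-11
        covered = (A @ B &gt;= 1)                   # entries already covered
        best_val, best_row, best_col = None, None, None
        for i in range(m):
            q = Q[i]                             # candidate row Q_i_
            new = (q == 1) &amp; ~covered            # entries the row could add
            gain = np.where(I == 1, w_plus, -w_minus) * new
            e = (gain.sum(axis=1) &gt; 0).astype(int)   # best column for this row
            val = cover(np.vstack([B, q[None, :]]),
                        np.hstack([A, e[:, None]]), I, w_plus, w_minus)
            if best_val is None or val &gt; best_val:
                best_val, best_row, best_col = val, q, e
        A = np.hstack([A, best_col[:, None]])    # line 10: A gets column e
        B = np.vstack([B, best_row[None, :]])    # and B gets row Q_i_
    return A, B
</preformat>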
        <p>
          Qi_ denotes the ith row vector of the (Boolean)
association matrix Q and the function cover(B, A, I, w+, w−) is
defined as
w+ ⋅ |{⟨i, j⟩ ; I_ij = 1, (A ○ B)_ij = 1}|
− w− ⋅ |{⟨i, j⟩ ; I_ij = 0, (A ○ B)_ij = 1}|.
The original algorithm was described in the introduction,
section 1. The only modification in Algorithm 1 to the
(generic) version of the original algorithm in [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] is that
the association matrix Q (lines 1–5) is computed using the given
quantifier qτ(a, b, c, d) interpreting a (general) association
rule i ≈ j instead of using the confidence of the association
rule i ⇒τ j.
        </p>
        <p>
          Due to the particular way of finding factors in the
GRECOND algorithm, namely as maximal rectangles, we need
to use a particular association matrix whose rows
are used as characteristic vectors of candidates for initial
attribute sets in the algorithm. The matrix used is computed
using the GUHA quantifier founded implication with the
parameter p set to 1; hence the confidence of the interpreted
association rule i ⇒p j must be 1 for the rule to be true
(which precisely coincides with the notion of attribute
implication between attributes i and j in formal concept
analysis, see [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]). This, at the same time, means that this
association matrix is a special case of the association matrix
of the ASSO algorithm with τ = 1.
        </p>
        <p>The modified GRECOND algorithm using the
association matrix is depicted in Algorithm 2.</p>
        <p>Algorithm 2: Modified GRECOND algorithm
Input: A Boolean matrix I ∈ {0, 1}^{n×m} and a prescribed error ε ≥ 0
Output: Boolean matrices A ∈ {0, 1}^{n×k} and B ∈ {0, 1}^{k×m}
1 Q ← empty m × m Boolean matrix
2 for i = 1, . . . , m do
3   for j = 1, . . . , m do
4     if i ⇒1 j is true in I then
5       Q_ij = 1
6     end
7   end
8 end
9 A ← empty n × k Boolean matrix
10 B ← empty k × m Boolean matrix
11 while ||I − A ○ B|| &gt; ε do
12   D ← arg max_{Qi_} cover(Qi_, I, A, B)
13   V ← cover(D, I, A, B)
14   while there is j such that D_j = 0 and cover(D + [j], I, A, B) &gt; V do
15     j ← arg max_{j, D_j = 0} cover(D + [j], I, A, B)</p>
        <p>D_j denotes the jth entry of the (row Boolean) vector D ∈
{0, 1}^{1×m}, [j] ∈ {0, 1}^{1×m} denotes the (row Boolean) vector
with the single 1 at position j, and C ∈ {0, 1}^{n×1} denotes the
(column Boolean) vector with, for each i, C_i = 1 iff I_ij = 1 for
each j such that D_j = 1 (i.e. the characteristic vector of the set
of objects having all the attributes of D).</p>
        <p>
          Again, the original algorithm was described in the
introduction, section 1. The only modifications in Algorithm 2
to the (generic) version of the original algorithm in [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] are
computing the particular association matrix Q (lines 1–8)
using the quantifier founded implication with p = 1,
interpreting the association rule i ⇒1 j, and using the rows of
the matrix as characteristic vectors of candidates for initial
attribute sets (line 12) in the factor construction (lines 12–19).
        </p>
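        <p>Since the listing of Algorithm 2 is truncated after line 15 in this
version of the paper, the following sketch (our own Python illustration; the
remainder of the factor-construction loop is filled in according to the
textual description of GRECOND and of the modification, and is therefore an
assumption rather than a transcription of lines 16–19) shows the whole
modified algorithm.</p>
        <preformat>
# Sketch of Algorithm 2 (modified GRECOND); rows of the association matrix
# for rules with confidence 1 serve as candidate initial attribute sets.
import numpy as np

def grecond_modified(I, eps=0):
    Ib = np.asarray(I, dtype=bool)
    n, m = Ib.shape
    Q = np.zeros((m, m), dtype=bool)               # lines 1-8
    for i in range(m):
        for j in range(m):
            Q[i, j] = bool(np.all(Ib[Ib[:, i], j]))  # every object with i has j
    covered = np.zeros_like(Ib)

    def cov(D):
        # still uncovered 1s in the rectangle given by D and its objects C
        C = Ib[:, D].all(axis=1) if D.any() else np.ones(n, dtype=bool)
        return int(np.sum(~covered[np.ix_(C, D)])), C

    A_cols, B_rows = [], []
    while np.sum(Ib &amp; ~covered) &gt; eps:             # line 11
        D = Q[int(np.argmax([cov(Q[i])[0] for i in range(m)]))].copy()  # line 12
        V = cov(D)[0]                              # line 13
        while True:                                # lines 14-15 and assumed rest
            gains = [(cov(D | np.eye(m, dtype=bool)[j])[0], j)
                     for j in np.flatnonzero(~D)]
            if not gains or max(gains)[0] &lt;= V:
                break
            V, j = max(gains)
            D = D | np.eye(m, dtype=bool)[j]       # assumed: D grows by [j]
        C = cov(D)[1]
        covered |= np.outer(C, D)                  # assumed: record the factor
        A_cols.append(C.astype(int))
        B_rows.append(D.astype(int))
    if not A_cols:
        return np.zeros((n, 0), dtype=int), np.zeros((0, m), dtype=int)
    return np.column_stack(A_cols), np.vstack(B_rows)
</preformat>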
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4 Experimental evaluation</title>
      <p>
        In this section, we provide an experimental evaluation of
the modified algorithms and their comparison to the
original versions, the ASSO algorithm and the GRECOND
algorithm. Due to the lack of space we do not present a
comparison with other algorithms and approaches to the
general BMF. A comparison that includes both the
algorithms can be found for example in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
        As in the typical experiment scenario—which occurs
in various BMF papers—we use both synthetic and real
datasets. Experiments on synthetic datasets enable us to
analyze the performance of the algorithms on data with the
same and known characteristics—we can analyze results
in the average case. On real data, we can study the meaning of
the obtained results. Let us also note that synthetic data are
artificial while real data are influenced by real factors.
Synthetic data. We created 1000 randomly generated
datasets. Every dataset Xi has 500 rows and 250 columns
and was obtained as a Boolean product Xi = Ai ○ Bi of
matrices Ai and Bi that were generated randomly with the
parameters shown in Table 1. The inner dimension k was set
to 40 for all Xi, i.e. the expected number of factors is 40.
Real data. We used the datasets DNA [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], Mushroom [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ],
and Zoo [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], the characteristics of which are shown in
Table 2. All of them are well known and widely used in the
literature on BMF.
      </p>
      <sec id="sec-3-1">
        <title>4.1 ASSO using GUHA association rules</title>
        <p>We observe the values of 1 − eu(l) (2) for l = 0, . . . , k where
k is the number of factors delivered by a particular
algorithm. Clearly, for l = 0 (no factors, A and B are “empty”)
we have 1 − eu(l) = 0. In accordance with general
requirements on BMF, for a good factorization algorithm 1−eu(l)
should be increasing in l, should have relatively large
values for small l, and it is desirable that for l = k we have
I = A ○ B, i.e. the data is fully explained by all k
factors computed (in which case 1 − eu(l) = 1). For synthetic
datasets C1, C2 and C3, values of 1 − eu(l) are shown in
Figures 1, 2 and 3, respectively.</p>
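        <p>The observed quantity can be computed from the first l columns of A
and rows of B; a hypothetical helper (our illustration) follows.</p>
        <preformat>
# Sketch: coverage curve 1 - e_u(l) of the first l factors, as in Figures 1-3.
import numpy as np

def coverage_curve(I, A, B):
    ones = np.sum(I == 1)
    curve = [0.0]                                  # l = 0: nothing is covered
    for l in range(1, A.shape[1] + 1):
        AB = (np.asarray(A, int)[:, :l] @ np.asarray(B, int)[:l, :] &gt;= 1)
        curve.append(float(np.sum((I == 1) &amp; AB)) / ones)   # = 1 - e_u(l)
    return curve
</preformat>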
        <p>
          As we mentioned above, the ASSO algorithm admits
overcovering of 0s of the input data matrix. The number of
overcovered 0s is a very important value, and the values of
eo(l) (2) for the synthetic datasets C1, C2 and C3 are shown in
Figures 4, 5 and 6. Let us note that the results marked as
“founded implication” are in fact results for the original
ASSO algorithm. Note also that all variants of ASSO
require us to set τ and (one of) w+ and w−, see Algorithm 1.
Based on some previous experimental results (see [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]) we
fixed w+ = w− = 1 and used the value of τ for which the
particular algorithm produces the best results. In most
cases, the best choice was 0.8 &lt; τ &lt; 1. This observation
corresponds with the results in [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ].
        </p>
        <p>We can see that the original algorithm is outperformed
in terms of both coverage (1 − eu) and overcoverage (eo)
by the modification utilizing double founded implication.
This modification produces large values of coverage and,
compared to the original ASSO algorithm, commits a smaller
overcoverage error. This is true for both synthetic and real
datasets. The modification utilizing the negative Jaccard
distance quantifier is also very promising.</p>
        <p>Modifications utilizing founded equivalence and
E-equivalence, however, do not perform well for synthetic
datasets. In the case of dataset C1—the sparsest one—
both modifications commit an extremely large overcoverage
error, with values beyond the scale of Figure 4. In the cases
of C2 and C3, Figures 2 and 3, they produce poor coverage
while the overcoverage error is not much different from the
modifications utilizing the other quantifiers, for higher
numbers of factors. On the other hand, for real datasets the
results are comparable with the other modifications (Figures
7, 8), with significantly smaller overcoverage errors
(Figures 10, 11). The only exception is the Mushroom dataset,
where founded equivalence is again beyond the scale of
Figure 10. Due to the limited space we do not include
results for the DNA dataset, which are very close to the
results obtained for the Zoo dataset.</p>
        <p>The presented results are representative. We performed the
same evaluations for several other datasets not included in
the paper and observed the same type of results as described
above.</p>
        <sec id="sec-3-1-1">
          <title>4.2 GRECOND using association matrix</title>
          <p>Due to the limited space we do not present a comparison
of the original GRECOND algorithm and the
modification utilizing the association matrix (Algorithm 2) on the
synthetic datasets. On each Xi the modified GRECOND
slightly outperforms the original algorithm from the
standpoint of coverage. Moreover, the modified algorithm also
tends to produce fewer factors, i.e. it outperforms the
original GRECOND from both the AFP and DBP views (see
Section 2).</p>
          <p>Values of 1 − eu for the Mushroom and DNA datasets
are shown in Figures 9 and 12, respectively. We can see
that the modified algorithm at first loses for small numbers
of factors but, in the end, outperforms the original
GRECOND algorithm—i.e. it outperforms GRECOND from
the AFP view. We observed similar behavior also for the
other real datasets.</p>
          <p>Time complexity. We implemented all the used algorithms in
MATLAB with critical parts written in C. Theoretical time
complexity is not of our primary concern. Practically, it
follows from our observations that the modification of the
GRECOND algorithm is slightly faster than the original
algorithm and all modifications of the ASSO algorithm are
as fast as the original.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>5 Conclusion</title>
      <p>We evaluated the use of various types of (general)
association rules from the GUHA knowledge discovery method in
the Boolean matrix factorization (BMF) algorithm ASSO
and a utilization of (particular) association rules in the
greedy-approach BMF algorithm GRECOND, which is not
based on association rules. The comparison of the
quality of factorization produced by the modified algorithms
with that produced by the original algorithms on both
synthetic and selected real data showed that our modified
algorithms outperform, for some types of rules, the
original ones. Namely, the double founded implication and (our
new) negative Jaccard distance quantifiers interpreting the
association rules in ASSO perform better than the founded
implication quantifier used in the original ASSO. Also, the
utilization of the association matrix in the factor search
initialization stage of the GRECOND algorithm improves the
factorization results produced by the algorithm.</p>
      <p>
        The observed results suggest the following
future research directions. First, as the role of the association
matrix is crucial for the ASSO algorithm (and, as we have
seen, the algorithm can be improved by using other types
of association rules), we plan an algorithm
which computes, or updates, the matrix during the
search for factors instead of computing it once before
the search. Second, we will look for a way to use, in
the utilization of the association matrix, the so-called essential
elements of the input data matrix, which are crucial for the
GREESS algorithm [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] (which improves the GRECOND
algorithm).
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Agrawal</surname>
            <given-names>R.</given-names>
          </string-name>
          , Imieliński T.,
          <string-name>
            <surname>Swami</surname>
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Mining association rules between sets of items in large databases</article-title>
          ,
          <source>Proc. ACM SIGMOD</source>
          <year>1993</year>
          ,
          <volume>207</volume>
          -
          <fpage>216</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Bache</surname>
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lichman</surname>
            <given-names>M.,</given-names>
          </string-name>
          <article-title>UCI Machine Learning Repository</article-title>
          [http://archive.ics.uci.edu/ml], Irvine, CA: University of California, School of Information and Computer Science,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Belohlavek</surname>
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vychodil</surname>
            <given-names>V.</given-names>
          </string-name>
          ,
          <article-title>Discovery of optimal factors in binary data via a novel method of matrix decomposition</article-title>
          ,
          <source>J. Comput. Syst. Sci</source>
          .
          <volume>76</volume>
          (
          <issue>1</issue>
          )(
          <year>2010</year>
          ),
          <fpage>3</fpage>
          -
          <lpage>20</lpage>
          (preliminary version in
          <source>Proc. SCIS &amp; ISCIS</source>
          <year>2006</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Belohlavek</surname>
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Trnecka</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <article-title>From-below approximations in Boolean matrix factorization: Geometry and new algorithm</article-title>
          ,
          <source>J. Comput. Syst. Sci</source>
          .
          <volume>81</volume>
          (
          <issue>8</issue>
          )(
          <year>2015</year>
          ),
          <fpage>1678</fpage>
          -
          <lpage>1697</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Ganter</surname>
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wille</surname>
            <given-names>R.</given-names>
          </string-name>
          ,
          <source>Formal Concept Analysis: Mathematical Foundations</source>
          , Springer, Berlin,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Geerts</surname>
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goethals</surname>
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mielikäinen</surname>
            <given-names>T.</given-names>
          </string-name>
          ,
          <article-title>Tiling databases</article-title>
          ,
          <source>Proc. Discovery Science</source>
          <year>2004</year>
          ,
          <fpage>278</fpage>
          -
          <lpage>289</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Hájek</surname>
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Holeňa</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rauch</surname>
            <given-names>J.:</given-names>
          </string-name>
          <article-title>The GUHA method and its meaning for data mining</article-title>
          .
          <source>Journal of Computer and System Sciences</source>
          <volume>76</volume>
          (
          <issue>1</issue>
          )(
          <year>2010</year>
          ),
          <fpage>34</fpage>
          --
          <lpage>48</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Hájek</surname>
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Havel</surname>
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chytil</surname>
            <given-names>M.:</given-names>
          </string-name>
          <article-title>The GUHA method of automatic hypotheses determination</article-title>
          .
          <source>Computing</source>
          <volume>1</volume>
          (
          <year>1966</year>
          ),
          <fpage>293</fpage>
          -
          <lpage>308</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Hájek</surname>
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Havránek</surname>
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Mechanizing Hypothesis Formation (Mathematical Foundations for a General Theory</article-title>
          ), Springer-Verlag
          <year>1978</year>
          ,
          <volume>396</volume>
          pp.
          <article-title>New edition available online</article-title>
          at http://www.cs.cas.cz/~hajek/guhabook/.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Lucchese</surname>
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Orlando</surname>
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Perego</surname>
            <given-names>R.</given-names>
          </string-name>
          ,
          <article-title>Mining top-K patterns from binary datasets in presence of noise</article-title>
          ,
          <source>SIAM DM</source>
          <year>2010</year>
          ,
          <volume>165</volume>
          -
          <fpage>176</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Miettinen</surname>
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mielikäinen</surname>
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gionis</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Das</surname>
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mannila</surname>
            <given-names>H.</given-names>
          </string-name>
          ,
          <article-title>The discrete basis problem</article-title>
          ,
          <source>IEEE Trans. Knowledge and Data Eng</source>
          .
          <volume>20</volume>
          (
          <issue>10</issue>
          )(
          <year>2008</year>
          ),
          <fpage>1348</fpage>
          -
          <lpage>1362</lpage>
          (preliminary version in
          <source>Proc. PKDD</source>
          <year>2006</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Myllykangas</surname>
            <given-names>S.</given-names>
          </string-name>
          et al,
          <year>2006</year>
          ,
          <article-title>DNA copy number amplification profiling of human neoplasms</article-title>
          ,
          <source>Oncogene</source>
          <volume>25</volume>
          (
          <issue>55</issue>
          )(
          <year>2006</year>
          ),
          <fpage>7324</fpage>
          -
          <lpage>7332</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Rauch</surname>
            <given-names>J.</given-names>
          </string-name>
          :
          <source>Observational Calculi and Association Rules</source>
          . Springer-Verlag,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Rauch</surname>
            <given-names>J.</given-names>
          </string-name>
          , Šimůnek M.
          <article-title>: Mining for 4ft rules</article-title>
          ,
          <source>in: Proceedings of Discovery Science</source>
          , Springer-Verlag,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Tan</surname>
            <given-names>P.-N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Steinbach</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kumar</surname>
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Introduction to Data Mining</article-title>
          .
          <source>Addison Wesley</source>
          , Boston, MA,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Xiang</surname>
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jin</surname>
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fuhry</surname>
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dragan</surname>
            <given-names>F. F.</given-names>
          </string-name>
          ,
          <article-title>Summarizing transactional databases with overlapped hyperrectangles</article-title>
          ,
          <source>Data Mining and Knowledge Discovery</source>
          <volume>23</volume>
          (
          <year>2011</year>
          ),
          <fpage>215</fpage>
          -
          <lpage>251</lpage>
          (preliminary version in
          <source>Proc. ACM KDD</source>
          <year>2008</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>