<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Likely-Occurring Itemsets for Pattern Mining</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tatiana Makhalova</string-name>
          <email>tatiana.makhalova@inria.fr</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sergei O. Kuznetsov</string-name>
          <email>skuznetsov@hse.ru</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Amedeo Napoli</string-name>
          <email>amedeo.napoli@loria.fr</email>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>National Research University Higher School of Economics</institution>
          ,
          <addr-line>Moscow</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Université de Lorraine</institution>
          ,
          <addr-line>CNRS, Inria, LORIA, F-54000 Nancy</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Université de Lorraine</institution>
          ,
          <addr-line>CNRS, Inria, LORIA, F-54000 Nancy</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We consider the itemset mining problem in general settings, e.g., mining association rules and itemset selection. We introduce the notion of likely-occurring itemsets and propose a greedy approach to itemset search space discovery that allows for reducing the number of arbitrary or closed itemsets. This method provides itemsets that are useful for different objectives and can be used as an additional constraint to curb the itemset explosion. In experiments, we show that the method is useful both for compression-based itemset mining and for computing good-quality association rules.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        A generic objective of itemset mining is to discover a small set of non-redundant
and interesting itemsets that describe together a large portion of data and that
can be easily interpreted [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>Itemset mining can be summarized into two steps: (i) discovering itemset
search space and (ii) selecting interesting itemsets among the discovered ones.</p>
      <p>This paper is devoted to the first step, i.e., the itemset search space
discovery. Since the itemset search space contains exponentially many elements, it is
important to discover as few useless itemsets as possible.</p>
      <p>
        There are several approaches to discover the itemset search space: (i) an
exhaustive enumeration of all itemsets followed by a selection of those satisfying
imposed constraints [19], (ii) a gradual enumeration of some itemsets guided
by an objective (or by constraints) [17], (iii) mining top-k itemsets w.r.t.
constraints [15], (iv) sampling a subset of itemsets w.r.t. a probability distribution
that conforms to an interestingness measure [
        <xref ref-type="bibr" rid="ref6 ref7">6,7</xref>
        ]. To reduce redundancy when
enumerating itemsets, the search space can be shrunk to closed itemsets, i.e., the
maximal itemsets among those that are associated with a given set of objects
(support).
      </p>
      <p>
        The exhaustive enumeration is the most universal way to discover itemset
search space. There exists a lot of very efficient algorithms for its enumeration,
e.g., CbO [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], In-Close [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], LCM [18], Alpine [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], and others [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <p>Despite its wide usage and applicability for a large spectrum of
interestingness measures, the exhaustive enumerators usually mine itemsets w.r.t.
frequency, which results in the following issues: using too high frequency threshold
results in a considerable amount of not interesting itemsets, while too low
frequency threshold results in itemset explosion and intractability of itemset mining
methods in practice.</p>
      <p>However, considering the itemset mining problem in more general settings,
e.g., mining association rules and implications, the exhaustive enumeration of
frequent itemsets is usually the only (universal) remedy for the pattern explosion
problem.</p>
      <p>In this paper, we revisit the notion of likely-occurring itemsets introduced
in [14] and propose a greedy approach to itemset search space discovery that
allows for reducing the number of closed itemsets. This method provides itemsets
that are useful for different objectives and can be used as an additional constraint
to curb the itemset explosion. In experiments we show that the method is
useful both for compression-based itemset mining and for computing good-quality
association rules.</p>
    </sec>
    <sec id="sec-2">
      <title>Preliminaries</title>
      <p>
        We deal with binary datasets within the FCA framework [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>A formal context is a triple K = (G, M, I), where G is a set of objects, M is
a set of attributes and I ⊆ G × M is the incidence relation, i.e., (g, m) ∈ I if
object g has attribute m.</p>
      <p>Two derivation operators (· )′ are defined for A ⊆ G and B ⊆ M as follows:</p>
      <p>A′ = { m ∈ M | ∀ g ∈ A : gIm} , B′ = { g ∈ G | ∀ m ∈ B : gIm} .</p>
      <p>For A ⊆ G, B ⊆ M , a pair (A, B) such that A′ = B and B′ = A, is called
a formal concept, then A and B are closed sets and called extent and intent (or
closed itemsets), respectively.</p>
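      <p>These definitions can be illustrated with a short sketch; the context below is a hypothetical toy example, not taken from the paper.</p>
      <p>
```python
# A toy formal context (hypothetical): objects mapped to their attribute sets.
context = {
    "g1": {"a", "b", "c"},
    "g2": {"a", "b"},
    "g3": {"a", "c"},
}
G, M = set(context), set().union(*context.values())

def prime_of_objects(A):
    """A' : the attributes shared by every object in A."""
    return {m for m in M if all(m in context[g] for g in A)}

def prime_of_attrs(B):
    """B' : the objects that have every attribute in B."""
    return {g for g in G if B.issubset(context[g])}

# ({g1, g2}, {a, b}) is a formal concept: the extent and intent map to each other.
extent = prime_of_attrs({"a", "b"})
intent = prime_of_objects(extent)
assert extent == {"g1", "g2"} and intent == {"a", "b"}
```
      </p>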
      <p>The (empirical or observed) probability of an itemset X ⊆ M is given by
P (X) = f r(X) = | X′| /| G| .</p>
    </sec>
    <sec id="sec-3">
      <title>Likely-occurring itemsets</title>
      <p>To reduce the itemset search space, we propose an additional constraint that
consists in considering only the itemsets whose observed probability is greater
than the estimated one. The estimated probability is computed under the
independence model. We give the details on the chosen independence model below.</p>
      <p>Definition 1. A closed itemset X ⊆ M is called likely-occurring closed (LOC)
if there exists m ∈ X and Y ⊆ X \ { m} with (Y ∪ { m} )′′ = X such that P (X) &gt;
Q · P (Y ) · P ({ m} ), where Q ≥ 1.</p>
      <p>Fig. 1: (a) The running example dataset: objects g1, g2, g3 have attributes a, b, c, d, e;
g4 and g5 have a, b, c; g6 has c; g7 has a, b; g8 has a, b, d; g9 has a, d, e; g10 has a.
(b) The execution tree of the algorithm, where the number i in a node indicates that
the node was created at step i.</p>
      <p>The empty itemset ∅ is considered to be likely-occurring by default. The
parameter Q controls how large the difference between the observed probability
P (X) and the estimated probability P (Y ) · P ({ m} ), Y ⊆ X \ { m} of the itemset
X may be. The least restrictive constraint, i.e., Q = 1, requires the observed
probability to be greater than the estimated one. The larger values of Q are
more restrictive, i.e., they require the observed probability to be much larger
than the estimated one.</p>
      <p>According to the definition above, at most | X| splittings should be
enumerated to check whether an itemset X is LOC or not. To make it more tractable
in practice, we propose a relaxation of the LOC itemset and a greedy approach
for computing it, where one needs to check only one splitting per itemset. Let
us proceed to this definition.</p>
      <p>Definition 2. Let { m1, m2, · · · , mk} be a set of attributes arranged in order of
decreasing frequency, i.e., f r(mi) ≥ f r(mj ) for any i ≤ j. A closed itemset
X is likely-occurring closed (LOC) if there exists a LOC itemset Y ⊂ X and
m ∈ X \ Y such that f r(m) ≥ min{ f r(m∗) : m∗ ∈ Y } , X = (Y ∪ { m} )′′ and P (X) &gt;
Q · P (Y ) · P ({ m} ).</p>
      <p>Example. Let us consider a running example from Fig. 1a, where the attributes
are arranged by decreasing frequency. Itemset ab is an LOC itemset because a is
an LOC itemset and P (ab) &gt; P (a) · P ({ b} ), the same for abd, namely, abd is an
LOC itemset because ab is an LOC itemset and P (abd) &gt; P (ab) · P ({ d} ), etc.</p>
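      <p>This check can be reproduced numerically. Below is a minimal sketch over the Fig. 1a data; it is an illustration only, not the authors' released implementation.</p>
      <p>
```python
# The running example dataset from Fig. 1a.
data = {
    "g1": set("abcde"), "g2": set("abcde"), "g3": set("abcde"),
    "g4": set("abc"), "g5": set("abc"), "g6": set("c"),
    "g7": set("ab"), "g8": set("abd"), "g9": set("ade"), "g10": set("a"),
}

def fr(itemset):
    """Observed probability P(X) = |X'| / |G|."""
    return sum(1 for attrs in data.values()
               if set(itemset).issubset(attrs)) / len(data)

Q = 1
# ab is LOC: P(ab) = 0.7 exceeds Q * P(a) * P(b) = 0.9 * 0.7 = 0.63
assert fr("ab") > Q * fr("a") * fr("b")
# abd is LOC: P(abd) = 0.4 exceeds Q * P(ab) * P(d) = 0.7 * 0.5 = 0.35
assert fr("abd") > Q * fr("ab") * fr("d")
```
      </p>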
      <p>We propose an algorithm to compute LOC itemsets using Definition 2; its
pseudocode is given in Algorithm 1. This algorithm computes LOC itemsets
gradually by considering attributes one by one in order of decreasing frequency.
Apart from the threshold Q on the difference in probabilities, the algorithm also
supports a threshold F on frequency. By default, we use minimal restrictions,
namely Q = 1 (we require the observed probability to be greater than the
estimated one) and F = 0 (we do not impose any frequency constraints).</p>
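      <p>The greedy strategy of Definition 2 can be sketched as follows. This is an illustration of one possible realization, not the authors' ComputeLOC, which may organize the merges differently; seeding the collection with closed singletons is our assumption, matching the running example where a is treated as LOC.</p>
      <p>
```python
def compute_loc(data, Q=1.0, F=0.0):
    """Greedy sketch of LOC itemset computation following Definition 2."""
    n = len(data)

    def fr(X):
        # observed probability P(X) = |X'| / |G|
        return sum(1 for attrs in data.values() if X.issubset(attrs)) / n

    def closure(X):
        # (X)'' : intersection of the attribute sets of all objects covering X
        ext = [attrs for attrs in data.values() if X.issubset(attrs)]
        return frozenset(set.intersection(*map(set, ext))) if ext else X

    # attributes arranged in order of decreasing frequency
    attrs = sorted({m for a in data.values() for m in a},
                   key=lambda m: -fr(frozenset({m})))
    loc = {frozenset()}  # the empty itemset is LOC by default
    for m in attrs:
        m_set = frozenset({m})
        if closure(m_set) == m_set:
            loc.add(m_set)  # closed-singleton seed (our assumption)
        for Y in list(loc):  # snapshot: one merge per itemset and attribute
            if m in Y:
                continue
            X = closure(Y | m_set)
            # the observed probability must exceed the estimated one
            if fr(X) > Q * fr(Y) * fr(m_set) and fr(X) >= F:
                loc.add(X)
    return loc

# The running example dataset from Fig. 1a.
data = {
    "g1": set("abcde"), "g2": set("abcde"), "g3": set("abcde"),
    "g4": set("abc"), "g5": set("abc"), "g6": set("c"),
    "g7": set("ab"), "g8": set("abd"), "g9": set("ade"), "g10": set("a"),
}
loc_itemsets = compute_loc(data)
assert frozenset("ab") in loc_itemsets and frozenset("abd") in loc_itemsets
```
      </p>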
      <sec id="sec-3-1">
        <title>Algorithm 1 ComputeLOC</title>
        <p>Example. Let us consider the execution tree of the algorithm for a dataset
from Fig. 1a. The algorithm starts constructing a tree adding the attributes of
decreasing frequency. The order in which itemsets are enumerated is specified in
the corresponding nodes.</p>
        <p>Likely-occurring itemsets and related notions.
Probability-based models are common in itemset and association rule mining.
In this section we consider two widespread approaches to assess itemsets and
association rules, and discuss how they are related to likely-occurring closed
itemsets.</p>
        <p>Independence model and lift. The models based on the comparison of estimated
and observed probabilities of itemsets are quite common in the scientific
literature. The simplest model is the attribute independence model. Under this
model, all items (attributes) are assumed to be independent. Attribute
probability is approximated straightforwardly using the attribute frequency. Then,
the probability of an itemset X is computed as follows:</p>
        <p>Pind(X) = ∏x∈X P (x) = ∏x∈X f r(x).</p>
        <p>
Despite its simplicity, this model is widely used in machine learning, e.g., Naïve
Bayes classifiers are based on it. A natural extension of the attribute
independence model is the partition independence model, where some partitions of X
are assumed to be independent. Lift [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] is one of the most common measures to
assess association rules under the partition independence model.
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>Definition 3.</title>
        <p>Let X → Y be an association rule; then lift is given by
lift(X → Y ) = P (XY ) / (P (X) · P (Y )) = f r(XY ) / (f r(X) · f r(Y )).</p>
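        <p>On the running example from Fig. 1a, lift can be computed directly; a minimal sketch with illustrative helper names:</p>
        <p>
```python
import math

# The running example dataset from Fig. 1a.
data = {
    "g1": set("abcde"), "g2": set("abcde"), "g3": set("abcde"),
    "g4": set("abc"), "g5": set("abc"), "g6": set("c"),
    "g7": set("ab"), "g8": set("abd"), "g9": set("ade"), "g10": set("a"),
}

def fr(itemset):
    """Observed probability P(X) = |X'| / |G|."""
    return sum(1 for attrs in data.values()
               if set(itemset).issubset(attrs)) / len(data)

def lift(antecedent, consequent):
    """lift(X -> Y) = P(XY) / (P(X) * P(Y))."""
    return (fr(set(antecedent) | set(consequent))
            / (fr(antecedent) * fr(consequent)))

# lift(ab -> d) = P(abd) / (P(ab) * P(d)) = 0.4 / 0.35, i.e., ab and d
# co-occur more often than the independence model predicts.
assert math.isclose(lift("ab", "d"), 0.4 / 0.35)
```
        </p>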
        <p>
          Apart from lift, there are many other measures (indices) based on the
comparison of the antecedent and consequent supports, e.g., redundancy
constraints [
          <xref ref-type="bibr" rid="ref4">4,22</xref>
          ], minimum improvement [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], etc. They are commonly used to select
association rules.
        </p>
        <p>The notion of lift can also be adapted in different ways for itemset assessment.
For example, one may assess the probability of an itemset under the assumption
that any partition of the itemset into two disjoint sets is independent. If the
observed probability is greater than all the estimated probabilities obtained under
this model, then the itemset is called productive [21].</p>
        <p>The LOC itemsets introduced above represent, in a certain sense, a particular
case of productive itemsets. Instead of considering all possible partitions of X
into two sets of items, we consider only its proper subset Y and an attribute m ∈
X \Y . Reformulating the definition of LOC in terms of lift (for association rules),
an LOC itemset X is an itemset that consists of an LOC itemset Y and an attribute m
such that Y ∪ { m} is the generator of X, and lift(X → m) &gt; Q, Q ≥ 1. Since
Y is also LOC, this reasoning can be done recursively.</p>
        <p>If needed, one may further reduce the number of the discovered LOC itemsets
by putting tighter constraints, i.e., setting higher values for Q (in line 7 of
the Merge procedure given in Algorithm 1):
| (Ic ∪ In)′| / | G| &gt; Q · (| Ic′| / | G| ) · (| In′| / | G| ).</p>
        <p>The constraint above is equivalent to the constraint on the lift of the association
rule In → Ic, i.e.,
lift(In → Ic) = P (In ∪ Ic) / (P (In) · P (Ic)) &gt; Q.</p>
        <p>Moreover, because of the greedy strategy, the constraints hold recursively,
i.e., there exist two disjoint subsets I∗n, I∗c ⊆ In such that lift(I∗n → I∗c) &gt; Q.</p>
        <p>In experiments we consider how the proposed greedy strategy works for mining
association rules on real-world datasets. Since the computing strategy is greedy,
there are no guarantees that all LOC itemsets (see Definition 1) will be
enumerated.</p>
        <p>Itemset mining through compression. Likely-occurring itemsets may also be useful
for the selection of itemsets. We consider the relation between the itemsets selected
by the compression-based itemset miner Krimp [20] and LOC itemsets.</p>
        <p>In Krimp, and similar methods, the length of the code word corresponding
to an itemset X is given by length(X) = − log P (X). Hence the compression
is achieved by replacing several code words representing the itemsets B with a
single code word, such that length(B) &lt; ∑X∈cover(B) length(X). The latter is
equivalent to log P (B) &gt; log(∏X∈cover(B) P (X)). Thus, we obtain the inequality
P (B) &gt; ∏X∈cover(B) P (X), which is very similar to one from the definition of
the LOC itemsets.</p>
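        <p>The code-length comparison can be made concrete with a small numeric sketch; this is a hypothetical illustration with probabilities borrowed from Fig. 1a, not Krimp itself.</p>
        <p>
```python
import math

def length(p):
    """Shannon-optimal code word length, in bits, for probability p."""
    return -math.log2(p)

# Replacing the two code words for ab and d by a single code word for
# B = abd pays off exactly when length(B) is smaller than their sum,
# i.e., when P(B) exceeds P(ab) * P(d): the LOC-style inequality.
P_B, P_ab, P_d = 0.4, 0.7, 0.5  # observed probabilities from Fig. 1a
assert length(P_ab) + length(P_d) > length(P_B)
assert P_B > P_ab * P_d
```
        </p>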
        <p>Intuitively, in both cases, an itemset is considered optimal if its observed
probability is greater than the estimated one. However, there are important
differences between the models underlying the definition of “itemset optimality”
(for the LOC itemsets and the model used in Krimp):
1. the two methods use different probability estimates of itemsets, namely,
P (X) = f r(X) (for the LOC itemsets) and P (X) = usage(X) / ∑Y ∈P usage(Y )
(for the Krimp-like models), where usage(X) is the frequency of X in the cover, and
P is the set of patterns;
2. the “optimality” of an itemset X in the compression-based model used in
Krimp is evaluated not only w.r.t. the dataset but also w.r.t. the other
itemsets selected so far.</p>
        <p>Thus, LOC itemsets may provide better results than the commonly used frequent
closed itemsets, which are used by Krimp. We compare different strategies
for discovering itemset search space on real-world datasets in the next section.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Experiments</title>
      <p>
        We use the discretized datasets from the LUCS-KDD repository [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and study
LOC itemsets4 for two tasks, namely association rule and itemset mining.
Association rule mining. Frequent (closed) itemsets are commonly used to mine
association rules. We study how useful LOC itemsets are compared to frequent closed
itemsets. In experiments we use two different sets of itemsets to compute rules:
frequent closed (FR.CL.) and likely-occurring closed (LOC) itemsets. The itemsets
are evaluated on 10 datasets, whose parameters are given in Table 1. The number
of objects and attributes is denoted by | G| and | M | , respectively. The density of
datasets (the ratio of 1’s) is given in the column “density”. The total number of
closed itemsets is reported in the column “#CL”. The total number of arbitrary
itemsets has not been computed.
      </p>
      <p>For each dataset we generate the whole set of LOC itemsets (Q = 1, F = 0),
the sizes of these sets are indicated in the column “#LOC”.</p>
      <p>We chose the frequency threshold for closed itemsets in such a way that
the number of closed itemsets is equal to the number of the LOC itemsets.
The frequency threshold is indicated in the column “fr.thr.” for closed itemsets.
The frequency threshold varies a lot from dataset to dataset. For example, the
smallest threshold is 0.06 for the “ecoli” and “glass” datasets and the largest one is
0.33 for the “breast” and “zoo” datasets. As we can see from the table, the sizes of
“#LOC” and “#FR.CL.” are quite close to one another.</p>
      <p>To compute association rules we use a miner from the MLxtend library
implemented in Python5. The number of rules generated based on LOC and frequent
closed (FR.CL.) itemsets is reported in the column “#rules”.
4 The source code for computing LOC itemsets is available at
https://github.com/makhalova/pattern_mining_tools/blob/master/modules/binary/likely_occurring_itemsets.py
5 http://rasbt.github.io/mlxtend/</p>
      <p>The number of rules generated based on the LOC itemsets is higher than the
number of rules generated based on frequent closed itemsets. For example, for
the “ecoli” dataset, the number of rules computed on 120 LOC and 120 frequent
closed itemsets is 4768 and 2950, respectively. It can be explained by the fact
that the size of the LOC itemsets is usually larger than the size of frequent
closed itemsets. Thus, a larger number of rules can be built on LOC itemsets by
splitting each itemset into an antecedent and a consequent.</p>
      <p>To evaluate their quality, we consider the most common quality measures for
the association rules, namely support, confidence, lift, leverage, and conviction.
We recall support and leverage below.</p>
      <p>Let X → Y be an association rule with the antecedent X and the consequent
Y ; then the rule support is given by
support(X → Y ) = support(X ∪ Y ) = | (X ∪ Y )′| / | G| ,
and leverage is given by
leverage(X → Y ) = support(X → Y ) − support(X) · support(Y ) ∈ [−1, 1].
For independent X and Y , leverage is equal to 0.</p>
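      <p>Both measures can be computed directly on the running example from Fig. 1a; a minimal sketch with illustrative helper names:</p>
      <p>
```python
import math

# The running example dataset from Fig. 1a.
data = {
    "g1": set("abcde"), "g2": set("abcde"), "g3": set("abcde"),
    "g4": set("abc"), "g5": set("abc"), "g6": set("c"),
    "g7": set("ab"), "g8": set("abd"), "g9": set("ade"), "g10": set("a"),
}

def support(itemset):
    """support(X) = |X'| / |G|."""
    return sum(1 for attrs in data.values()
               if set(itemset).issubset(attrs)) / len(data)

def leverage(antecedent, consequent):
    """leverage(X -> Y) = support(XY) - support(X) * support(Y)."""
    return (support(set(antecedent) | set(consequent))
            - support(antecedent) * support(consequent))

# leverage(ab -> d) = 0.4 - 0.7 * 0.5 = 0.05 (a positive dependence)
assert math.isclose(leverage("ab", "d"), 0.05)
assert leverage("ab", "d") > 0
```
      </p>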
      <p>Let us proceed to the results of the experiments.</p>
      <p>For the generated rules we consider mean values of the aforementioned
quality measures as well as the 75th, 90th, and 95th percentiles. Considering the
percentiles allows us to focus on the quality of the best itemsets, which are
usually of interest to analysts. The values averaged over the 10 datasets are reported in
Fig. 2.</p>
      <p>Since we do not set any frequency threshold for LOC, the support of LOC-based
rules, as expected, is lower than the support of the rules based on frequent
closed itemsets (FR.CL.). The top n% of LOC-based rules have higher values
than the top n% of FR.CL.-based ones. For example, the top 10% values (the
90th percentile) of confidence are at least 0.935 for the LOC-based rules and
only 0.885 for the FR.CL.-based rules. Thus, considering the top rules,
the LOC-based rules have higher confidence.</p>
      <p>Fig. 2: The averaged quality for 2 types of rules: computed based on frequent
closed (FR.CL.) and LOC itemsets. The quality is measured by support,
confidence, lift, and leverage. For each type of rules and each quality measure, the
average values of the mean and the 75th, 90th, and 95th percentiles over the 10
datasets from Table 1 are reported.</p>
      <p>Regarding lift, LOC-based rules provide the best results. The difference in
values is especially noticeable for the top 5% of rules (the 95th percentile). Top
5% LOC-rules have the highest values of lift, on average, 91.38. However, the
lift values of the top 5% of rules vary a lot from dataset to dataset (the standard
deviation is shown in plots by horizontal lines). Nevertheless, the quality,
measured by lift, is consistently higher for LOC-based rules than for FR.CL.-based
rules.</p>
      <p>The leverage is higher for FR.CL.-based rules. Despite the fact that lift and
leverage differ only in the mathematical operations they use to compare the
observed and estimated supports of rules and their parts, the analysis of rules
based on these measures may lead to very different results. The high values of
leverage (and low values of lift) for FR.CL.-based rules are caused by a different
order of magnitude of the supports. Very low supports (the case of LOC-based
rules) result in high values of lift and low values of leverage.</p>
      <p>Thus, the analysis of the generated rules allows us to conclude that the rules
generated based on LOC itemsets have better quality than the rules generated using
roughly the same number of frequent closed itemsets.</p>
      <p>Compression quality. In Section 4 we discussed the relation between LOC
itemsets and the itemsets ensuring good compression in Krimp.</p>
      <p>In this section we study the applicability of LOC itemsets for this task and
compare them with closed itemsets (used in the original version of Krimp). We
emphasize that, in the compared approaches, the itemset search space is
discovered independently of the itemset mining process.</p>
      <p>To evaluate the ability of the itemsets to compress data, we consider how
many itemsets we need to obtain a certain compression ratio. Fig. 3 shows how
the compression ratio changes w.r.t. the number of considered itemsets. The
initial state corresponds to the point (0,1), meaning that 0 itemsets have been
used to compress data, and the compression ratio is maximal and equal to 1.
The curves that are closer to the point (0,0) correspond to the best strategies
of itemset search space discovery (i.e., the itemset set allows for compressing
data better with a lower number of itemsets). The experiments show that for
“car evaluation”, “wine” and “nursery” datasets the LOC itemsets do not
provide any benefits over the closed itemsets. For the majority of datasets, the
number of LOC is too small to ensure as good compression as with the whole
set of closed itemsets, e.g., “adult”, “breast”, “led7”, and others. Among some
of these datasets, we may still observe better behavior of LOC itemsets, e.g.,
for “hepatitis”, “mushroom”, “letter recognition”, and “page blocks”. There are
also datasets where with the LOC itemsets we achieve as good compression as
with the closed ones, but use a much lower number of itemsets, e.g., “auto”,
“hepatitis”, “soybean”, “zoo”.</p>
      <p>In general, LOC itemsets may be quite useful for itemset selection based on
compression.</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>In this paper we studied likely-occurring closed itemsets in the context of
association rule mining and itemset selection. In our experiments we show that the
number of enumerated LOC itemsets is much lower than the number of frequent
closed itemsets. However, with LOC itemsets, we obtain association rules of better
quality. The proposed approach may be useful for compression as well; however,
it does not outperform the methods where itemsets are discovered in the direction
minimizing the total description length.</p>
      <p>14. Makhalova, T., Kuznetsov, S.O., Napoli, A.: On coupling FCA and MDL in pattern mining. In: Proceedings of the 15th International Conference on Formal Concept Analysis. pp. 332–340. Springer (2019)</p>
      <p>15. Mampaey, M., Vreeken, J., Tatti, N.: Summarizing data succinctly with the most informative itemsets. ACM Transactions on Knowledge Discovery from Data 6(4), 16 (2012)</p>
      <p>16. Piatetsky-Shapiro, G.: Discovery, analysis, and presentation of strong rules. Knowledge Discovery in Databases pp. 229–238 (1991)</p>
      <p>17. Smets, K., Vreeken, J.: Slim: Directly mining descriptive patterns. In: Proceedings of the International Conference on Data Mining. pp. 236–247. SIAM (2012)</p>
      <p>18. Uno, T., Asai, T., Uchida, Y., Arimura, H.: An efficient algorithm for enumerating closed patterns in transaction databases. In: Proceedings of the 7th International Conference on Discovery Science. pp. 16–31. Springer (2004)</p>
      <p>19. Vreeken, J., Tatti, N.: Interesting patterns. In: Aggarwal, C.C., Han, J. (eds.) Frequent Pattern Mining, pp. 105–134. Springer (2014)</p>
      <p>20. Vreeken, J., Van Leeuwen, M., Siebes, A.: Krimp: mining itemsets that compress. Data Mining and Knowledge Discovery 23(1), 169–214 (2011)</p>
      <p>21. Webb, G.I.: Self-sufficient itemsets: An approach to screening potentially interesting associations between items. ACM Transactions on Knowledge Discovery from Data 4(1), 1–20 (2010)</p>
      <p>22. Zaki, M.J.: Generating non-redundant association rules. In: Proceedings of the 6th International Conference on Knowledge Discovery and Data Mining. pp. 34–43. ACM SIGKDD (2000)</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Aggarwal</surname>
          </string-name>
          , C.C.,
          <string-name>
            <surname>Han</surname>
            ,
            <given-names>J</given-names>
          </string-name>
          . (eds.): Frequent Pattern Mining. Springer (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Agrawal</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Imieliński</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Swami</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Mining association rules between sets of items in large databases</article-title>
          .
          <source>In: Proceedings of the International Conference on Management of Data</source>
          . vol.
          <volume>22</volume>
          , pp.
          <fpage>207</fpage>
          -
          <lpage>216</lpage>
          . ACM SIGMOD (
          <year>1993</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Andrews</surname>
            ,
            <given-names>S.:</given-names>
          </string-name>
          <article-title>A partial-closure canonicity test to increase the efficiency of CbOtype algorithms</article-title>
          . In: International Conference on Conceptual Structures. pp.
          <fpage>37</fpage>
          -
          <lpage>50</lpage>
          . Springer (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Bastide</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Taouil</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pasquier</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stumme</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lakhal</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Mining frequent patterns with counting inference</article-title>
          .
          <source>In: ACM SIGKDD Explorations Newsletter</source>
          . vol.
          <volume>2</volume>
          .
          <string-name>
            <surname>ACM</surname>
            <given-names>SIGKDD</given-names>
          </string-name>
          (
          <year>2000</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Bayardo</surname>
            ,
            <given-names>R.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Agrawal</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gunopulos</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Constraint-based rule mining in large, dense databases</article-title>
          .
          <source>Data Mining and Knowledge Discovery</source>
          <volume>4</volume>
          (
          <issue>2-3</issue>
          ),
          <fpage>217</fpage>
          -
          <lpage>240</lpage>
          (
          <year>2000</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Boley</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lucchese</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paurat</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gärtner</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Direct local pattern sampling by efficient two-step random procedures</article-title>
          .
          <source>In: Proceedings of the 17th International Conference on Knowledge discovery and Data Mining</source>
          . pp.
          <fpage>582</fpage>
          -
          <lpage>590</lpage>
          . ACM SIGKDD (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Boley</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moens</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gärtner</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Linear space direct pattern sampling using coupling from the past</article-title>
          .
          <source>In: Proceedings of the 18th International Conference on Knowledge Discovery and Data Mining</source>
          . pp.
          <fpage>69</fpage>
          -
          <lpage>77</lpage>
          . ACM (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Brin</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Motwani</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ullman</surname>
            ,
            <given-names>J.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tsur</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Dynamic itemset counting and implication rules for market basket data</article-title>
          .
          <source>In: Proceedings of the International Conference on Management of Data</source>
          . pp.
          <fpage>255</fpage>
          -
          <lpage>264</lpage>
          . ACM SIGMOD (
          <year>1997</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Coenen</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>The LUCS-KDD discretised/normalised ARM and CARM data library</article-title>
          . http://www.csc.liv.ac.uk/~frans/KDD/Software/LUCS_KDD_DN (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Ganter</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wille</surname>
          </string-name>
          , R.:
          <source>Formal Concept Analysis</source>
          . Springer Berlin Heidelberg (
          <year>1999</year>
          ). https://doi.org/10.1007/978-3-
          <fpage>642</fpage>
          -59830-2, http://dx.doi.org/10. 1007/978-3-
          <fpage>642</fpage>
          -59830-2
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Hu</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Imielinski</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Alpine: Progressive itemset mining with definite guarantees</article-title>
          .
          <source>In: Proceedings of the International Conference on Data Mining</source>
          . pp.
          <fpage>63</fpage>
          -
          <lpage>71</lpage>
          . SIAM (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Kuznetsov</surname>
            ,
            <given-names>S.O.:</given-names>
          </string-name>
          <article-title>A fast algorithm for computing all intersections of objects from an arbitrary semilattice</article-title>
          .
          <source>Nauchno-Tekhnicheskaya Informatsiya Seriya 2- Informatsionnye Protsessy i Sistemy (1)</source>
          ,
          <fpage>17</fpage>
          -
          <lpage>20</lpage>
          (
          <year>1993</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Kuznetsov</surname>
            ,
            <given-names>S.O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Obiedkov</surname>
            ,
            <given-names>S.A.:</given-names>
          </string-name>
          <article-title>Comparing performance of algorithms for generating concept lattices</article-title>
          .
          <source>Journal of Experimental &amp; Theoretical Artificial Intelligence</source>
          <volume>14</volume>
          (
          <issue>2-3</issue>
          ),
          <fpage>189</fpage>
          -
          <lpage>216</lpage>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>