<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Generic Mining of Condensed Pattern Representations under Constraints</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sergey Paramonov</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tao Chen</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tias Guns</string-name>
          <email>tias.guns@vub.ac.be</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>KU Leuven</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Belgium</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Vrije Universiteit Brussel</institution>
          ,
          <country country="BE">Belgium</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Our goal is to design and develop a principled approach to generic constraintbased pattern mining of any pattern type. While implementation details for different pattern types are widely different, the principles with respect to enforcing constraints are similar. We focus specifically on local constraints (size, cost, structure) and condensed representations (closed, free, maximal), whose combination is not straightforward. Our approach is inspired by inductive databases and combines ideas from two research directions in pattern mining: the development of specialized algorithms and the use of generic databases or constraint processing methods. It builds on top of existing highly efficient systems tailored for only one particular task and generalizes it to a wider class of problems using filtering techniques. As a result we obtain the hyppro system (from Hybrid Pattern Processing system; pronounced [h2iprO:]) which can efficiently mine patterns under both local and condensed representation constraints, for the three most popular datatypes: graphs, sequences and itemsets. The resulting system can outperform generic pattern mining search methods through novel filtering techniques for condensed representations.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>However, both approaches are typically pattern-type specific. Furthermore, these methods focus either on local constraints
or condensed representations, e.g. the approaches based on set systems (with polynomial-delay) [BHPW10; AU] that
are applicable to multiple datatypes do not allow reasoning under local constraints and focus only on a particular type of
condensed patterns such as closed. Combining the two requires more care as the order in which the constraints are enforced
matters [BL06].</p>
      <p>What is missing is a unified approach that works for both local constraints and condensed representations, and with any
pattern type. Indeed, the principles for different pattern types are similar and would merit from a generic methodology,
even if the low-level implementations are different.</p>
      <p>Our approach borrows from the inductive databases concept [IM96] in which data and patterns are treated as equal
first-class citizens, and where both mining and querying/filtering are considered basic operations. A number of inductive
query mining and database techniques have been proposed [IV99; BCFGPR; RJLM02], which focused on integration with
SQL and existing database technology. We do not limit ourselves to the relational algebra or table representations, but
maintain the vision that pattern mining is the process of applying a sequence of operators to data/patterns alike. Following
this vision we propose a 3-step framework consisting of mining all frequent patterns, enforcing local constraints followed
by extracting a condensed representation. Each of the constraints can consist of multiple filtering steps as well as the use
of indexes and other techniques inspired by databases and constraint processing.</p>
      <p>The contributions of this paper are as follows:
• we propose a generic pattern-type agnostic framework for constraint-based pattern mining;
• we present a principled way of combining local constraints and condensed representation constraints;
• this includes a novel and highly effective multi-stage post-processing method for condensed representations.
We have implemented the proposed system for the pattern types of itemsets, sequence and graphs, each time using the
best known frequent pattern miner of that type. In terms of generality, the system is on par with constraint-based mining
systems developed for one pattern type, see Table 1. Furthermore, the experiments show that in many cases the additional
time spent post-processing is reasonably small, and that our system can outperform generic search-based pattern mining
systems that support both local constraints and condensed representations.
2</p>
    </sec>
    <sec id="sec-2">
      <title>Preliminaries</title>
      <p>In the foundational paper of Mannila and Toivonen [MT97] pattern mining is presented as the task of findingTh(L, r, q),
given a language of patterns L, a transactional database r and a selection predicate q, such that Th(L, r, q) = {ϕ ∈
L | q(r, ϕ) is true}.</p>
      <p>The language of patterns L determines the pattern type. This can be itemsets (sets of elements), sequences, graphs and so
forth. The database r consists of multiple entries, often called transactions in pattern mining from the original application
domain in customer purchase transaction analysis. A transaction is a tuple (tid, datapoint) where tid is the identifier of the
transaction and datapoint is the data associated with this transaction; it is typically of the same language/type as L. The
goal then is to find all patternsϕ that appear in the data and satisfy constraint q(r, ϕ).</p>
      <p>The most basic setting is that of frequent pattern mining, in this case, the constraint q states that each pattern ϕ must
occur more than θ times in the data, where θ is some user-chosen threshold. More formally q(r, ϕ) = |{t ∈ r|ϕ v t}| ≥ θ;
v is the occurence or subpattern relation. It determines when a pattern is a subpattern of another pattern/datapoint.</p>
      <p>We first look into more detail into the subpattern relation for different pattern types, after which we discuss local
constraints and condensed representations.
Pattern types and subpattern relations
The definition of when a pattern is a subpattern of another pattern depends on the languageL and hence the pattern type.
We consider three pattern types in this work: itemsets, sequences and graphs. Figure 1 demonstrates examples of these
three pattern types, including example datasets and a visualization of the subpattern relation.</p>
      <p>In itemset mining, the subpattern relation is defined as follows: let anitemset I be a set of items I ⊆ {1, . . . , n} and a
dataset D be a set of pairs (tid, J ), where tid is an identifier andJ is an itemset. Then, we say that an itemset I occurs in
transaction (tid, J ) iff I ⊆ J .</p>
      <p>In sequence mining, an ordered list of characters hs1, . . . , smi is called a sequence. A sequence S is called a subsequence
of another sequence C if all characters in S also appear in C and in the same order, with possibly many other characters
in between two matching characters. More formally for S = hs1, . . . , smi and C = hc1, . . . , cni : S v C iff m ≤ n and
for all i ∈ [1, m] there exist integers ji s.t. 1 ≤ j1 ≤ · · · ≤ jm ≤ n such that cji = si. Hence, a sequence S occurs in
transaction (tid, C) of dataset D iff S v C.</p>
      <p>In graph mining, a graph G is a triple (V, E, l) where is V is a set of vertices, E is a set of edges and l is a labeling
function that maps each edge and each vertex to a label. We say that G is a subgraph of a graph H = (Vh, Eh, lh) iff G is
isomorphic to a subgraph of H i.e. there exists an injective function f such that
• if e ∈ E, then f (e) ∈ Eh and l(e) = lh(f (e))
• if v ∈ V , then f (v) ∈ Vh and l(v) = lh(f (v))
We say a graph G occurs in a transaction (tid, H) of a dataset D iff G is isomorphic to a subgraph of H: G v H.</p>
      <p>While checking the subset relation is relatively simple, especially when itemsets are implemented as bitvectors, for
graphs this requires solving the subgraph isomorphism problem which is NP complete [Coo71].</p>
      <sec id="sec-2-1">
        <title>Local constraints</title>
        <p>are constraints that each individual pattern must satisfy. For itemsets, this can be constraints on the length of the set,
membership of an item or on the cost of an itemset in case every item has a known cost. For sequences, there can
additionally be constraints expressed as regular expressions that the target sequences must satisfy and on the arities of the
symbols. For graphs, an extra category involves constraints on the in/out degrees of the edges and the presence of particular
subgraphs.</p>
        <p>Obviously, these constraints can be combined using Boolean logic to obtain more complex constraints including
conjunctions, disjunctions and conditional constraints. However, it remains that all local constraints can be verified on each
pattern individually (hence local).</p>
      </sec>
      <sec id="sec-2-2">
        <title>Condensed representations</title>
        <p>The key idea is to remove redundancy by suppressing patterns with similar properties that are sub- or superpatterns of the
other. The best known example are closed frequent patterns; a pattern is closed [PHM00] if there is no superpattern with
the same frequency. Similarly, free patterns [NDGN13] are those for which there is no subpattern with the same frequency.
Patterns are called maximal [RGN08] if there is no superpattern that also satisfies the constraints.</p>
        <p>Condensed representation constraints can be seen as pairwise constraints checking if one pattern dominates the other
(e.g. if it is a subpattern with the same frequency) [NDGN13].</p>
        <p>When there are additional constraints at play, one can choose to first mine a condensed representation and enforce the
local constraints on the remaining patterns. While computationally beneficial, this can have unintended side effects [BL06].
Consider the case of a database with only a single closed itemset of length n. If we would add the local constraint that each
pattern should be at most length m &lt; n, then no pattern would remain. The more natural interpretation is that condensed
representation constraints should only apply to those patterns that satisfy all local constraints; that is, a pattern that does
not satisfy all local constraints can not dominate another pattern. This is the interpretation we use in this work.
3</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Method</title>
      <p>We design a three step procedure illustrated in Figure 2. This can be seen as an inductive databases pipeline, where we start
from a transactional dataset and run a specialized frequent mining algorithm to obtain the set of all frequent patterns. This
set of patterns is as much a first-class citizen as the dataset is. The set of patterns is then further and further processed to
obtain smaller sets of patterns. While not obvious in this figure, step 3 consists of multiple operations in itself for efficiency
reasons. It should be clear that more operations could be easily added to this framework, both on the dataset and on the
pattern sets in between or at the end.
Algorithm 1 shows the algorithm in more detail. We now explain the three major steps in more detail. Note that patterns
is assumed to be an ordered list of patterns throughout the algorithm; skip set is not.</p>
      <p>Algorithm 1 hyppro algorithm for computing condensed patterns under constraints
1: Input: A transactional database D of type t, a frequency threshold θ, the set of local constraints C and a condensed representation constraint r
2: Output: patterns – the set of condensed frequent patterns under constraints
3: patterns ← run minert(D, θ)
4: patterns ← process local constraints(patterns, C)
5: . Step 3: condensed representation constraint with dominance checks
6: skip set ← neighborhood dominance check(patterns, r)
7: for pattern in sort by lengthr(patterns) do
8: if pattern ∈ skip set then
9: continue
10: candidates ← select wherer(patterns \ skip set)
11: for candidate ∈ candidates do
12: if dominance checkr(pattern, candidate) then
13: skip set = skip set ∪ {candidate}
14: return patterns \ skip set
. 3c: sub/superpattern check
. candidate is dominated by pattern
. Step 1: frequent pattern mining</p>
      <p>. Step 2: local constraints
. 3a: Compare neighbor patterns
. 3b: Preserved properties
3.1.1</p>
      <sec id="sec-3-1">
        <title>Step 1: frequent pattern mining</title>
        <p>In the first step, a highly efficient specialized algorithm is run to extract all frequent patterns from the data. Obviously, the
system to run depends on the pattern type requested by the user.</p>
        <p>Any mining frequent pattern mining system can be used which outputs both the frequent patterns and their frequency
(or full cover set, but frequency is enough). Additionally, if the system is able to also output for each pattern what its direct
subpattern is (as gSpan [YH02] does), then this information can be used in subsequent steps.
3.1.2</p>
      </sec>
      <sec id="sec-3-2">
        <title>Step 2: local constraints</title>
        <p>A property of local constraints is that all constraints can be checked on each pattern individually. Hence, in this step, we
iterate over the set of frequent patterns once and check all local constraints on each pattern. Patterns that do not satisfy all
constraints are removed. In practice, during this iteration we also build an index mapping each frequency value to the set
of patterns with that frequency, for later use.
3.1.3</p>
      </sec>
      <sec id="sec-3-3">
        <title>Step 3: condensed representation constraint with dominance checks</title>
        <p>This is the most critical part. To illustrate, consider that the typical output of a graph pattern mining algorithm contains ten
thousands graphs or more. This means that a naive dominance check would involve solving up to 100 million subgraph
isomorphism problems as checking all patterns against all others requires O(n2) checks. This computational challenge
creates a need for a better approach, knowing that many of these patterns will not be sub/superpatterns in any case.</p>
        <p>The first key property that we use in our approach is that pairwise dominance is transitive: if a pattern dominates a
subpattern, then that subpattern is not needed to discover the superpatterns that dominate the pattern, hence the subpattern
can immediately be discarded.</p>
        <p>We combine this with the fact that the highly efficient specialized miners do depth-first search, and they output the
patterns the order they are found. This means that the patterns that are output before/after are more likely to be
sub/superpatterns. Hence, in step 3a we first do a linear scan over all patterns and check whether they dominate any of theδ patterns
that are output before or after this one. If so, those subpatterns are disregarded. Even with a reasonably small δ this already
significantly reduces the set of patterns and hence the number of further checks that need to happen.</p>
        <p>Note that in case a mining system outputs the pattern that is its immediate subpattern we can even avoid the dominance
check and only (and much more cheaply) check whether it is an immediate sub/superpattern.</p>
        <p>The second key property that we use is that a given pattern can only dominate patterns that satisfy certain properties,
for example that a subpattern must be smaller in size. It is important to realize that a number of properties are preserved
by the subpattern relationship for each datatype, meaning that if such a property does not hold for a given pattern and
other pattern, then the pattern can never be a subpattern of the other pattern. For the subset relation there are three basic
properties: a subpattern must be smaller in length and its minimal/maximal item must be greater/smaller or equal than the
minimal/maximal one of the other itemset. For the subsequence relation, a subsequence has to be smaller than the other
sequence, and each symbol in the subsequence must appear at least as many times in the other sequence as it does in the
subsequence. For the subgraph isomorphism relation, we can represent the length of a graph as a pair (#vertices,#edges)
and a subpattern must be Pareto-suboptimal, i.e. (x, y) is smaller than (x0, y0) iff x &lt; x0 and y ≤ y0 or x ≤ x0 and y &lt; y0;
we also consider a similar property as for sequences, namely let (lv, le, lw) be a label-triplet with labels lv, lw of nodes v, w
and their edge label le, then the arity of the label-triples must be at least as large in the other graph as in the subgraph.</p>
        <p>We use this to order the set of patterns such that dominating patterns are more likely to be encountered first, thereby
reducing the set of non-skipped patterns more quickly. More concretely, for closed/maximal it is ordered on
decreasing size and for free patterns on increasing size. For graphs, one does not order on size alone but a partial order on
(#nodes, #edges) instead.</p>
        <p>We also use it to efficiently select, in step 3b, from the set of all non-skipped pattern only those that can be dominated
based on such easy-to-check properties as size and label/symbol arities. Notably, to check the closedness or free relation,
patterns only have to be compared to patterns that have the same frequency. Following the inductive databases principles,
we borrow a technique from the database community. Namely, we first index all patterns by frequency (in step 2); then,
in step 3b we first query for all patterns with the same frequency and only iterate over those to verify the easy-to-check
properties.</p>
        <p>Steps 3a and 3b are visualized in Figure 3 for sequences and graphs, respectively:
Adding a new local constraint for a pattern type amounts to writing a function that receives as input a pattern of that type
and outputs yes or no. Adding a new condensed representation amounts to adding support for a new dominance relation on
a pattern type. This hence requires implementing the dominance checkr(pattern, candidate) function (which is also used
in step 3a), as well as the select wherer(patterns \ skip set) function.</p>
        <p>To add a new pattern type, run minert(D, θ) must be added to call a frequent pattern miner of that pattern type and parse
its output appropriately. Then, local constraints and dominance relations must be added as appropriate.
4</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Experimental Evaluation</title>
      <p>We start by comparing hyppro to other state-of-the-art pattern mining systems in terms of constraints it supports. As Table
1 shows, our system is the only one that supports condensed representations and local constraints of any pattern type.</p>
      <p>The key questions we next investigate in this experimental section are the following:
• Q1 What is the runtime distribution between three steps of the algorithm?
• Q3 How does the performance of hyppro compare to other generic pattern mining systems such as Dominance
Programing and ASP sequence mining models?</p>
      <p>All experiments have been run on a 64-bit Ubuntu machine with Intel Core i5-3570 CPU 3.40GHz and 8GB memory.
The runtime limit is set to one hour. The code and all datasets will be made available as a github repository at
http://github.com/SergeyParamonov.</p>
      <p>Firstly, in Question Q1, we analyze the runtimes of the different steps. As specialized algorithms (step 1) we used
state-of-the-art methods in pattern mining: Eclat, PPIC and gSpan∗. Table 2 summarizes the results for multiple datasets
for each datatype; we added a local dummy constraint that reads the pattern and always returns true. The runtime results
for the ’free’ condensed representation are similar to closed and not shown. As we see in most cases the post-processing
is bound by a constant factor of 10-20 with respect to the specialized algorithm. Step 2 is typically very fast (subsecond),
as it requires just a linear scan over all generated patterns. Step 3’s expected runtime is harder to characterize as it depends
on the effectiveness of filtering and the remaining number of dominance checks to do. In most cases in the table it takes
seconds and for 5 cases it takes more than 10 seconds.</p>
      <p>We next investigate Question Q2, by measuring the effect of disabling different filtering steps in step 3. Disabling 3a
corresponds to removing the sliding window filter on Line 6 of Algorithm 1; and disabling step 3b amounts to not using
any properties, that is no sorting (Line 7) and no selection filtering (Line 10). As we see from Figure 4 performance is
improved drastically over naive-postprocessing (both 3a and 3b disabled); for graph and sequence mining, pattern ordering
and filtering (3b) give the major runtime improvement. While not visible on the plots, step 3a typically gives a 2-3 times
speedup, especially for maximal patterns. Also for closed itemset mining step 3b plays a crucial role, especially the use of
an index to fetch all patterns with the same frequency. For maximal itemset mining step 3b has little effect but 3a plays a
crucial role here.</p>
      <p>Thirdly, in Figure 5 we compared our system to existing generic search-based pattern mining systems: Dominance
Programming (DP) for itemset mining using a constraint solver [NDGN13] and sequence mining with Answer Set
Programming (ASP) [GGQRS]. As Figure 5c indicates, hyppro outperforms the ASP model on mining condensed representations
(the runtime comparison has to be done on generated samples provided with the ASP solver [GGQRS]).
5</p>
    </sec>
    <sec id="sec-5">
      <title>Discussion and Conclusion</title>
      <p>While research on generic pattern mining techniques is still ongoing [PLDR15; NG; GDTNR13], these techniques always
focus on one specific pattern type. Furthermore, local constraints and condensed representations are usually studied
separately [PHM00; YHA; YH03]. A notable exception is in the use of constraint programming for itemset mining [NDGN13;
RGN08], of Answer Set Programming for sequence mining [GGQRS], and in a theoretical study of operators in structured
pattern mining [GPN16].</p>
      <p>We proposed a pattern-type agnostic approach. Closest in spirit to this is the C++ template library for pattern mining
[CHSZ08]. However, the goal there is to define pattern-type agnostic building blocks that are used in a generic depth first
∗http://www.borgelt.net/eclat.html
http://sites.uclouvain.be/cp4dm/spm/
https://github.com/Jokeren/DataMining-gSpan
search. In contrast, we chose to reuse highly optimized search methods and allow to extend the pipeline with local and
condensed representation constraints. Indeed, a key asset of the pattern mining community is the development of highly
optimized frequent mining algorithms [YH02; ZPOL97; AGS16]. A principled and generic way of reusing them can
strengthen this position.</p>
      <p>Another related system, from an inductive databases point of view, is the MiningZinc system [GDTNR13]. It provides
a constraint-based modeling language and the system can automatically detect whether a specialized mining algorithm can
be used to mine all patterns satisfying a subset of the constraints, while using generic constraint solvers to post-process
the rest. Our method is less generic in using only a predefined (but extensible) set of building blocks, but it also supports
condensed representations and is pattern-type agnostic. From a user perspective all the user has to do is provide a dataset
and choose the frequency value, the local constraints and the condensed representation.</p>
      <p>To summarize, we presented a pattern type agnostic framework for constraint-based pattern mining. Key to it is the
inductive databases view, under which it can be seen as a pipeline of pattern filtering techniques. To efficiently support
condensed representation constraints we use a number of novel techniques to avoid comparing all patterns to each other.
Also, our method allows to combine local constraints with dominance checks, which few systems support.</p>
      <p>Another trend in pattern mining research is to develop methods that extract a very small set of patterns that together
(a) Graph Mining: AIDS Compound 422. Closed (left), Maximal (right)
(b) Sequence Mining. Unix Users. Closed (left), Maximal (right)
(c) Itemset Mining. German Credit. Closed (left), Maximal (right)
make a good pattern set [GGM04; VLS11; BKS10]. These can be seen as adding constraints on the entire collection of
patterns. While this is outside the scope of this work, it should be noted that most of these methods actually post-process a
set of frequent (closed) patterns, hence they could be added to our pipeline.</p>
      <p>While parallelizing search algorithms is hard, we would like to point out that our approach could be parallelized quite
easily and there is much room for engineering. Since each step essentially filters or transforms the pattern set, one could
do parallel batch processing or use a framework like Apache Spark to handle this automatically.</p>
      <p>This work has been partially funded by the ERC AdG SYNTH (Synthesising inductive data models) of prof. Luc De
Raedt (KU Leuven).
[AIS93]</p>
      <p>Rakesh Agrawal, Tomasz Imielinski, and Arun N. Swami. Mining association rules between sets of items in large
databases. In ACM SIGMOD. ACM Press, 1993.</p>
      <p>Charu C. Aggarwal and Jiawei Han, editors. Frequent pattern mining. Springer, 2014. ISBN: 978-3-319-07820-5.</p>
      <p>(b) DP (blue-square), hyppro (red-star). Mushrooms, Closed(left), Maximal(middle), Size+Closed(right)</p>
      <p>Xifeng Yan, Jiawei Han, and Ramin Afshar. Clospan: mining closed sequential patterns in large databases. In SIAM
2003, pages 166–177.</p>
      <p>Xifeng Yan and Jiawei Han. Closegraph: mining closed frequent graph patterns. In ACM SIGKDD, washington, dc, usa,
august 24 - 27, 2003, 2003, pages 286–295.</p>
      <p>Luc De Raedt, Tias Guns, and Siegfried Nijssen. Constraint programming for itemset mining. In ACM SIGKDD, 2008,
pages 204–212.</p>
      <p>Benjamin Ne´grevergne, Anton Dries, Tias Guns, and Siegfried Nijssen. Dominance programming for itemset mining. In
IEEE ICDM, 2013, pages 557–566.</p>
      <p>John O. R. Aoga, Tias Guns, and Pierre Schaus. An efficient algorithm for mining frequent sequence with constraint
programming. In ECML PKDD, 2016, pages 315–330.</p>
      <p>Martin Gebser, Thomas Guyet, Rene´ Quiniou, Javier Romero, and Torsten Schaub. Knowledge-based sequence mining
with ASP. In IJCAI, ny, usa, 2016, pages 1497–1504.</p>
      <p>Mario Boley, Tama´s Horva´th, Axel Poigne´, and Stefan Wrobel. Listing closed sets of strongly accessible set systems
with applications to data mining. Theor. comput. sci., 411(3):691–700, 2010.</p>
      <p>Hiroki Arimura and Takeaki Uno. Polynomial-delay and polynomial-space algorithms for mining closed sequences,
graphs, and pictures in accessible set systems. In. Proceedings of the 2009 siam international conference on data mining,
pages 1088–1099.</p>
      <p>Francesco Bonchi and Claudio Lucchese. On condensed representations of constrained frequent patterns. Knowledge and
information systems, 9(2):180–201, 2006.
[PLDR15]</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Siegfried</given-names>
            <surname>Nijssen</surname>
          </string-name>
          and
          <string-name>
            <given-names>Albrecht</given-names>
            <surname>Zimmermann</surname>
          </string-name>
          .
          <article-title>Constraint-based pattern mining</article-title>
          . In, Frequent pattern mining, pages
          <fpage>147</fpage>
          -
          <lpage>163</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Jian</given-names>
            <surname>Pei</surname>
          </string-name>
          , Jiawei Han, and
          <string-name>
            <given-names>Runying</given-names>
            <surname>Mao</surname>
          </string-name>
          .
          <article-title>CLOSET: an efficient algorithm for mining frequent closed itemsets</article-title>
          .
          <source>InACM SIGMOD workshop dmkd</source>
          ,
          <year>2000</year>
          , pages
          <fpage>21</fpage>
          -
          <lpage>30</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Tomasz</given-names>
            <surname>Imielinski</surname>
          </string-name>
          and
          <string-name>
            <given-names>Heikki</given-names>
            <surname>Mannila</surname>
          </string-name>
          .
          <article-title>A database perspective on knowledge discovery</article-title>
          .
          <source>Commun. acm</source>
          ,
          <volume>39</volume>
          (
          <issue>11</issue>
          ):
          <fpage>58</fpage>
          -
          <lpage>64</lpage>
          ,
          <year>November 1996</year>
          . ISSN:
          <fpage>0001</fpage>
          -
          <lpage>0782</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Tomasz</given-names>
            <surname>Imielin</surname>
          </string-name>
          <article-title>´ski and Aashu Virmani. Msql: a query language for database mining</article-title>
          .
          <source>Data mining and knowledge discovery</source>
          ,
          <volume>3</volume>
          (
          <issue>4</issue>
          ):
          <fpage>373</fpage>
          -
          <lpage>408</lpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Hendrik</given-names>
            <surname>Blockeel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Toon</given-names>
            <surname>Calders</surname>
          </string-name>
          , E´lisa Fromont, Bart Goethals, Adriana Prado, and Ce´line Robardet.
          <article-title>An inductive database system based on virtual mining views</article-title>
          .
          <source>In Dmkd</source>
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [MT97] [Coo71] [YH02] Luc De Raedt, Manfred Jaeger, Sau Dan Lee,
          <string-name>
            <given-names>and Heikki</given-names>
            <surname>Mannila</surname>
          </string-name>
          .
          <article-title>A theory of inductive query answering</article-title>
          . In, Icdm,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Heikki</given-names>
            <surname>Mannila</surname>
          </string-name>
          and
          <string-name>
            <given-names>Hannu</given-names>
            <surname>Toivonen</surname>
          </string-name>
          .
          <article-title>Levelwise search and borders of theories in knowledge discovery</article-title>
          .
          <source>Dmkd</source>
          ,
          <volume>1</volume>
          (
          <issue>3</issue>
          ):
          <fpage>241</fpage>
          -
          <lpage>258</lpage>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>Stephen A.</given-names>
            <surname>Cook</surname>
          </string-name>
          .
          <article-title>The complexity of theorem-proving procedures</article-title>
          . In Acm stoc. New York, NY, USA,
          <year>1971</year>
          , pages
          <fpage>151</fpage>
          -
          <lpage>158</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Xifeng</given-names>
            <surname>Yan</surname>
          </string-name>
          and Jiawei Han.
          <article-title>Gspan: graph-based substructure pattern mining</article-title>
          .
          <source>In IEEE ICDM, 9-12 december</source>
          <year>2002</year>
          , maebashi city, japan,
          <year>2002</year>
          , pages
          <fpage>721</fpage>
          -
          <lpage>724</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>Sergey</given-names>
            <surname>Paramonov</surname>
          </string-name>
          , Matthijs van Leeuwen,
          <string-name>
            <surname>Marc Denecker</surname>
          </string-name>
          , and Luc De Raedt.
          <article-title>An exercise in declarative modeling for relational query mining</article-title>
          .
          <source>In ILP</source>
          ,
          <year>2015</year>
          , pages
          <fpage>166</fpage>
          -
          <lpage>182</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Benjamin</given-names>
            <surname>Ne</surname>
          </string-name>
          <article-title>´grevergne and Tias Guns. Constraint-based sequence mining using constraint programming</article-title>
          .
          <source>In CPAIOR</source>
          <year>2015</year>
          , barcelona, spain,
          <year>2015</year>
          , pages
          <fpage>288</fpage>
          -
          <lpage>305</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>Tias</given-names>
            <surname>Guns</surname>
          </string-name>
          , Anton Dries, Guido Tack, Siegfried Nijssen, and Luc De Raedt.
          <article-title>Miningzinc: A modeling language for constraint-based mining</article-title>
          .
          <source>In IJCAI</source>
          ,
          <year>2013</year>
          , pages
          <fpage>1365</fpage>
          -
          <lpage>1372</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>Tias</given-names>
            <surname>Guns</surname>
          </string-name>
          , Sergey Paramonov, and Benjamin Ne´grevergne.
          <article-title>On declarative modeling of structured pattern mining</article-title>
          .
          <source>In Declarative learning based programming, AAAI</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <given-names>Vineet</given-names>
            <surname>Chaoji</surname>
          </string-name>
          , Mohammad Al Hasan,
          <string-name>
            <given-names>Saeed</given-names>
            <surname>Salem</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Mohammed J.</given-names>
            <surname>Zaki</surname>
          </string-name>
          .
          <article-title>An integrated, generic approach to pattern mining: data mining template library</article-title>
          .
          <source>Data mining and knowledge discovery</source>
          ,
          <volume>17</volume>
          (
          <issue>3</issue>
          ):
          <fpage>457</fpage>
          -
          <lpage>495</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <surname>Mohammed J Zaki</surname>
            , Srinivasan Parthasarathy, Mitsunori Ogihara, and
            <given-names>Wei</given-names>
          </string-name>
          <string-name>
            <surname>Li</surname>
          </string-name>
          .
          <article-title>New algorithms for fast discovery of association rules</article-title>
          .
          <source>Technical report. Rochester</source>
          ,
          <string-name>
            <surname>NY</surname>
          </string-name>
          , USA,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <given-names>Floris</given-names>
            <surname>Geerts</surname>
          </string-name>
          , Bart Goethals, and
          <article-title>Taneli Mielika¨inen. Tiling databases</article-title>
          .
          <source>In Ds</source>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <given-names>Jilles</given-names>
            <surname>Vreeken</surname>
          </string-name>
          , Matthijs van Leeuwen,
          <string-name>
            <given-names>and Arno</given-names>
            <surname>Siebes</surname>
          </string-name>
          .
          <article-title>Krimp: mining itemsets that compress</article-title>
          .
          <source>Dmkd</source>
          ,
          <volume>23</volume>
          (
          <issue>1</issue>
          ):
          <fpage>169</fpage>
          -
          <lpage>214</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <surname>Tijl De Bie</surname>
          </string-name>
          ,
          <string-name>
            <surname>Kleanthis-Nikolaos Kontonasios</surname>
            , and
            <given-names>Eirini</given-names>
          </string-name>
          <string-name>
            <surname>Spyropoulou</surname>
          </string-name>
          .
          <article-title>A framework for mining interesting pattern sets</article-title>
          .
          <source>SIGKDD explorations</source>
          ,
          <volume>12</volume>
          (
          <issue>2</issue>
          ):
          <fpage>92</fpage>
          -
          <lpage>100</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>