<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Mining rare sequential patterns with ASP</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ahmed Samet Thomas Guyet</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>UniversitØ Rennes 1/IRISA-UMR6074 AGROCAMPUS-OUEST/IRISA-UMR6074 Benjamin Negrevergne UniversitØ Paris-Dauphine, PSL Research University</institution>
          ,
          <addr-line>CNRS, LAMSADE</addr-line>
        </aff>
      </contrib-group>
      <fpage>51</fpage>
      <lpage>60</lpage>
      <abstract>
        <p>This article presents an approach of meaningful rare sequential pattern mining based on the declarative programming paradigm of Answer Set Programming (ASP). The setting of rare sequential pattern mining is introduced. Our ASP approach provides an easy manner to encode expert constraints on expected patterns to cope with the huge amount of meaningless rare patterns. Encodings are presented and quantitatively compared to a procedural baseline. An application on care pathways analysis illustrates the interest of expert constraints encoding.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Copyright c by the paper’s authors. Copying permitted for private and academic purposes.</p>
      <p>In: Nicolas Lachiche, Christel Vrain (eds.): Late Breaking Papers of ILP 2017, OrlØans, France, September 4-6, 2017, published at
http://ceur-ws.org
Seq.</p>
      <p>s2
ha; c; b; ci</p>
      <p>s3
ha; b; ci</p>
      <p>s4
ha; b; ci</p>
      <p>s5
ha; ci
s6
hbi
s7
hci</p>
      <p>
        Set Programming (ASP), Constraint Programming (CP) and Satisability Solving (SAT). For each approach,
the objective is to propose a uniform representation for examples, background knowledge and hypotheses. ILP
started with encodings in Prolog [
        <xref ref-type="bibr" rid="ref3">4</xref>
        ], and alternative systems have been implemented based on CP [
        <xref ref-type="bibr" rid="ref5">6</xref>
        ] or on ASP
[
        <xref ref-type="bibr" rid="ref6">7</xref>
        ]. In declarative pattern mining, most of the approaches have been implemented using SAT [
        <xref ref-type="bibr" rid="ref7">8</xref>
        ], CP [
        <xref ref-type="bibr" rid="ref8 ref9">9, 10</xref>
        ] and
some with ASP [
        <xref ref-type="bibr" rid="ref10">11</xref>
        ]. In this work, we choose the paradigm of ASP to implement our rare sequential pattern
mining to benet from a rst-order logic programming language. It makes the encoding of extra constraints
easier and its solvers have proved their eciency [
        <xref ref-type="bibr" rid="ref10">11</xref>
        ].
      </p>
      <p>The contributions of this paper are threefold:
1. we formalize the general problem of rare sequential pattern mining and we model it in ASP, together with
two important variations of this problem (non-zero-rare and minimal rare patterns). Thanks to the exibility
of ASP, we were able to solve all three problems with the same ASP solver, avoiding the tedious work of
designing and implementing an algorithm for each new problem.
2. We provide important insights on modelling eciency by comparing several alternative models. The
experimental comparison shows that general ASP-based approaches can compete with ad-hoc specialized
algorithms.
3. We apply rare sequential mining to our target application of analyzing care pathway data. In particular, we
demonstrate the benet of additional constraints to extract meaningful results in this domain.
2</p>
      <p>Rare sequential patterns: denitions and properties
This section introduces the basic concepts of rare sequential pattern mining.</p>
      <p>Let I = fi1; i2; : : : ; ijIjg be a set of items. A sequence s, denoted by hsj i1 j n is an ordered list of items
sj 2 I, where n denotes the length of the sequence.</p>
      <p>Given two sequences s = hsj i1 j n and s0 = hs0j i1 j m with n m, we say that s is a subsequence of s0,
denoted s s0, i there exists n integers i1 &lt; : : : &lt; in such that sk = s0ik ; 8k 2 f1; : : : ; ng.</p>
      <p>Let D = s1; s2; : : : ; sN , be a dataset of N sequences. The support of any sequence s, denoted supp(s), is
the number of sequences si 2 D that contain s:
supp(s) =
si 2 Djs
si</p>
      <p>Let 2 N+ be a frequency threshold given by the data analyst. A sequence s is rare i supp(s) &lt; . In such
case, the sequence s is called a rare sequential pattern or a rare pattern for short.</p>
      <p>It is noteworthy that rare patterns are strongly related to the frequent patterns, i.e. the sequence s such that
supp(s) . The set of rare patterns is the dual of frequent patterns. A pattern that is not rare is frequent and
vice versa.</p>
      <p>Moreover, the rarity constraint is monotone, i.e. for two sequences s and s0, if s s0 and s is rare, then s0 is
rare. This property comes from the anti-monotonicity of the support measure.</p>
      <p>Example 2.1 (Sequences and rare patterns) Table 1 presents a simple dataset D, dened over a set of
items I = fa; b; c; dg. The support of sequence p1 = ha; bi is 4 and the support of p2 = hc; ci is 1. For a support
threshold = 2, p2 is a rare pattern but p1 is not. Due to the monotonicity of rarity, every extension of p2 is
also rare (e.g. ha; c; ci).</p>
      <p>We now introduce two additional denitions: zero-rare patterns and minimal rare patterns . A zero-rare
pattern is a rare pattern with a null support. This means that this pattern does not occur in dataset sequences.
A non-zero-rare pattern is a rare pattern which is not zero-rare.</p>
      <p>Finally, a sequence s is a minimal rare pattern (mRP) i it is rare but all its proper subsequences are frequent.
(1)
(2)
mRP = fsjsupp(s) &lt;
^ 8s0
s; supp(s0)</p>
      <p>g
fg
&lt; c; c &gt;
&lt; c &gt;
&lt; d; a &gt;
&lt; d; b &gt;</p>
      <p>&lt; d; c &gt;
&lt; d &gt;</p>
      <p>seq (1 ,1 , d ). seq (1 ,2 , a ). seq (1 ,3 , b ). seq (1 ,4 , c ).
seq (2 ,1 , a ). seq (2 ,2 , c ). seq (2 ,3 , b ). seq (2 ,4 , c ).
seq (3 ,1 , a ). seq (3 ,2 , b ). seq (3 ,3 , c ).
seq (4 ,1 , a ). seq (4 ,2 , b ). seq (4 ,3 , c ).
seq (5 ,1 , a ). seq (5 ,2 , c ).
seq (6 ,1 , b ).
seq (7 ,1 , c ).</p>
      <sec id="sec-1-1">
        <title>Listing 1: Facts specifying the sequence dataset of Table 1</title>
        <p>Example 2.2 (minimal rare patterns) Figure 1 illustrates the lattice of non-zero rare patterns in the dataset
D of Table 1 for = 2. For example, ha; c; bi is a non-zero-rare pattern, but is not a mRP since one of its
subpatterns is rare (i.e. hc; bi ha; c; bi). hb; ai (omitted in Figure 1) is a zero-rare pattern as it does not appear
in D.
3</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Mining rare sequential patterns with ASP</title>
      <p>
        In this section, we detail ASP encodings for mining rare and minimal rare sequential patterns. ASP is a declarative
programming paradigm founded on the semantic of stable models [
        <xref ref-type="bibr" rid="ref11">12</xref>
        ]. From a general point of view, declarative
programming gives a description of what is a problem instead of specifying how to solve it. A program describes
the problem and a dedicated solver nds out its solutions.
      </p>
      <p>
        An ASP program is a set of rules of the form a0 :- a1; : : : ; am; not am+1; : : : ; not an, where each ai is a
propositional atom for 0 i n and not stands for default negation. Atoms can be encoded using a rst-order logic
syntax. If n = 0, such rule is called a fact. If a0 is omitted, such rule represents an integrity constraint. The
reader can refer to Janhunen et al. [
        <xref ref-type="bibr" rid="ref11">12</xref>
        ] for a more detailed introduction to ASP. Dedicated solvers, such as
clingo [
        <xref ref-type="bibr" rid="ref12">13</xref>
        ], ensure the ecient solving of ASP encodings.
      </p>
      <p>In the following, Section 3.1 introduces the ASP generation of candidate sequential patterns, then the rarity
constraints is introduced in Section 3.2 and nally, Section 3.3 introduces additional constraints to extract
eciently only the minimal-rare patterns.</p>
      <p>First of all, the dataset has to be encoded as facts. It is represented by atoms seq(s,t,i) stating that item i
occurs at time t in sequence s. Listing 1 illustrates facts encoding of the sequence dataset of Table 1.
3.1</p>
      <p>
        Enumerate candidate sequential patterns in ASP
Listing 2 species the candidate generation. It reuses several notations and principles introduced by Gebser et
al. [
        <xref ref-type="bibr" rid="ref10">11</xref>
        ]. Line 1 lists all atoms present in the dataset. For each item i 2 I a unique atoms item(i) is generated.
The token "_" stands for an anonymous variable that does not recur anywhere. Line 2 denes the sequence
lengths of each sequence from the dataset. The length of the sequence is an item position for which there is no
items afterwards. Lines 4 to 7 enumerate all possible patterns. A pattern is a sequence of positions to ll up with
exactly one item 1. It is encoded by atoms pat(x,i) where x is a position in the pattern and i 2 I. Predicate
patpos/1 takes one variable and denes the sequence position for pattern items. Rules with curly brackets in
the head are choice rules such as lines 5 and 7. Rule at line 5 makes the pattern length choice which can not be
larger than maxlen. Line 7 chooses exactly one atom pat/2 for each position X and thus generates the candidate
patterns combinatorics.
      </p>
      <p>1We do not consider sequences of itemsets for sake of encoding simplication.</p>
      <p>
        Lines 9 to 13 evaluate the pattern support. This encoding uses the ll-gaps strategy proposed in [
        <xref ref-type="bibr" rid="ref10">11</xref>
        ] (see
Figure 2). The idea consists in mapping each sequence position to one (and exactly one) pattern position. In
addition, some sequence positions are associated with an item that does not match the item in the mapped
pattern position. This denotes lling positions and indicates which part of the pattern has already been read .
Lines 9 to 11 traverse each sequence s = hsji1 j n in D to yield atoms of the form occ(s,l,pl) where pl is a
position in the pattern and l is a position in the sequence s. occ(s,l,pl) indicates that the pl-th rst items of
the patterns have been read at position l of the sequence. Line 9 searches for a compatible position P with the
rst pattern item. Line 11 yields a new atom occ(s,l,pl) while the (l 1)-th item has been read at position
(pl 1) and the pl-th sequence item matches the l-th pattern item. In any case, line 10 yields an atom associating
position pl with already read positions of the pattern. The support(s) is yielded line 13 while sequence t holds
the current pattern.
      </p>
      <p>1 item (I) :- seq (_ , _ ,I ).
2 seqlen (S ,L) :- seq (S ,L ,_), not seq (S ,L +1 , _ ).
4 patpos (1).
5 { patpos (X +1) } :- patpos (X), X &lt; maxlen .
6 patlen (L) :- patpos (L), not patpos (L +1).
7 1 { pat (X ,I) : item (I) } 1 :- patpos (X ).
9 occ (S ,1 , P) :- pat (1 , I), seq (S ,P ,I ).
10 occ (S ,L ,P) :- occ (S , L , P -1) , seq (S ,P ,_ ).
11 occ (S ,L ,P) :- occ (S , L -1 , P -1) , seq (S ,P ,C), pat (L ,C ).
13 support (S) :- occ (S , L , LS ), patlen (L), seqlen (S , LS ).</p>
      <sec id="sec-2-1">
        <title>Listing 2: Encoding of candidate patterns generation For more details about ll-gaps strategy and its alternatives, we refer the readers to the article of Gebser et al. [11]. 3.2</title>
        <p>Ecient encoding of rarity constraint (ASP-RSM)
The modularity of ASP encoding enables to easily add constraints to the previous program. Where Gebser et
al. added constraints to select only the frequent patterns, we select the answer sets that correspond to
non-zerorare patterns.</p>
        <p>The following constraint states that the number of supported sequences must be strictly below the given
threshold th (i.e. not above th). th is a program constant that can be specied at solving time.
:- # count { S : support (S) } &gt;= th .</p>
        <p>Although this is a correct formulation, we propose an alternative encoding that explicitly expresses the
monotonicity property to help the solver pruning non-zero-rare patterns. This encoding introduces a rare/1 predicate
(see listing 3). Instead of evaluating the support of pattern hpji1 j n, it evaluates independently the support of
each of its prexes. For any l; 1 l n, rare(l) means that the subpattern hpj i1 j l is rare. Line 18 imposes
to have the atom rare(n) where n is the pattern length. Line 17 gives a simpler manner to yield such atom: if a
prex-pattern of length l is rare, then all patterns of length l0, l0 &gt; l, are rare. This rule prevents the solver from
evaluating all rules of line 15 and it potentially prevents from evaluating occ/3 atoms. Finally, line 19 prunes
zero-rare patterns imposing to have at least one supported sequence.
15 rare ( IL ): - IL =1.. L , patlen ( L ) ,
16 # count { S : occ (S , IL , LS ) , seqlen (S , LS ) } &lt; th .
17 rare ( L ) : - rare (L -1) , L &lt;= PL , patlen ( PL ).
18 : - not rare ( L ) , patlen ( L ).
19 : - not support ( S ) : seq (S ,_ , _ ).</p>
      </sec>
      <sec id="sec-2-2">
        <title>Listing 3: Encoding of rare sequence mining As the number of non-zero-rare patterns may be very high, the following section presents ASP-MRSM, an extension of ASP-RSM that extracts the condensed set of minimal-rare patterns, mRP. 3.3</title>
        <p>ASP Minimal Rare Sequence Miner (ASP-MRSM)
The encoding in Listing 4 can be used to mine minimal rare sequences. It is based on the observation that it
is sucient to evaluate the support of all sub-patterns of size n 1 to determine whether a pattern p of size n
is a minimal. Let p = hpii1 i n be a non-zero-rare pattern. p is a mRP i 8u 2 f1; : : : ; ng, the sub-pattern
pu = hpiii2f1;:::;u 1;u+1;:::;ng is frequent (i.e. not rare). According to Equation 2, p is a mRP i 8p0 p, p0 is
frequent. It is then clear that if p is a mRP, then 8u; pu is frequent. Conversely, for all sequence s such that
s p, 9u such that s pu. Then, as pu is frequent and by anti-monotonicity of the frequency constraint, s is
frequent too.</p>
        <p>Furthermore, the encoding takes advantage of two properties of sequential patterns:
1. a sequence that supports the pattern p supports each pattern pu,
2. if a sequence s is supported by pu but not by p, then for all l 2 [1; u 1], occ(s,l,pl) is part of the ASP
model computed by Listing 2, where (pl)l2[1;(n 1)] is an occurrence of pu. In fact, as there is no dierence
between pu and p before position u, there is no dierences between their occurrences.</p>
        <p>These properties are helpful to decide whether pu is rare. In Listing 4, line 20 enumerates all subpatterns,
identied by U. Similarly to occ/3 predicates, atoms spocc(s,u,l,pl) describe the occurrence of pu in sequence
s (Figure 3).</p>
        <p>Lines 22 to 26 compute occurrences of pattern pu according to the ll-gaps principle (see section 3.1) with
two dierences: (1) spocc/4 are yielded only if sequence t is not supported, otherwise the sequence is obviously
supported knowing that pu p; and (2) spocc/4 starts from pattern position u reusing occ(t,u 1,_) atoms. It
thus avoids redundant computation according to the second property above. Lines 28 to 33 determine whether
the subpattern pu is rare. This encoding diers from section 3.2 mainly at lines 30-32. The total number of
sequences supported by pu is the number of sequences supported by p (line 32) plus the number of sequences
supported by pu but not by p (line 31), i.e. sequences for which spocc/4 atoms reached the last pattern item.
Finally, line 33 eliminates answer sets for which a subpattern is rare.</p>
        <p>
          It is worth noticing that in ASP two answer sets can not share information during the solving process. Gebser
et al. [
          <xref ref-type="bibr" rid="ref10">11</xref>
          ] proposed to use a solver extension, called asprin, to encode the mining of closed patterns, i.e. patterns
which have any larger patterns with the same support. A similar approach may be used for mRP. Our approach
prefers a pure declarative but more ecient encoding.
22 spocc (S ,1 ,2 , P) :- seq (S ,P ,I), pat (2 , I), not support (S ).
23 spocc (S ,U ,U ,P) :- suppat (U), occ (S ,U -1 , P), not support (S ).
24 spocc (S ,U ,L ,P +1) :- spocc (S ,U ,L ,P), seq (S ,P +1 , _ ).
25 spocc (S ,U ,L +1 , P +1) :- spocc (S ,U ,L ,P), seq (S ,P +1 , I),
26 pat (L +1 , I ).
28 sprare (U ,U -1): - sprare (U -1) , suppat (U ).
29 sprare (U ,L) :- sprare (U , L -1) , L &lt;= LP , patlen ( LP ).
30 sprare (U , IL ): - suppat (U), IL =U +1.. L , patlen (L),
31 # count { S: spocc (S ,U ,IL , LS ), seqlen (S , LS );
32 S: support (S) } &lt; th .
33 :- sprare (U ,L), patlen (L ).
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>Listing 4: Encoding part for minimal rare patterns</title>
        <p>4</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Experiments and results</title>
      <p>In this section, we evaluate our approach on synthetic and real datasets. First, we use synthetic datasets to
compare performances of our ASP encodings with procedural algorithms. Then, we demonstrate the exibility
of our declarative pattern mining approach on a case study.</p>
      <p>In all experiments, we used clingo, version 5.0. All programs were run on a computing server with 16Gb RAM.
Multi-threading capabilities of clingo have been disabled (1) to make a fair comparison with the procedural
algorithms that is not parallel and (2) to prevent from having high variance in run times measurements due to
multi-threading strategies of clingo.
4.1</p>
      <p>
        Runtime and memory eciency on synthetic datasets
In this rst set of experiments, we use synthetic datasets to compare the eciency of our ASP encodings with
procedural algorithms. Four approaches have been compared: ASP-RSM and Apriori-Rare extract non-zero-rare
patterns; and ASP-MRSM and MRG-EXP extract minimal rare patterns. Since Apriori-Rare [
        <xref ref-type="bibr" rid="ref2">3</xref>
        ] and MRG-Exp
[
        <xref ref-type="bibr" rid="ref13">14</xref>
        ], were originally developed for rare itemset mining, they were adapted to mine rare sequential patterns 4.
      </p>
      <p>We use several synthetic datasets to measure the impact of various parameters such as vocabulary size and
mean sequence length on the runtime and on the memory usage. Our sequence generator 4 produces datasets
following a 3-step retro-engineering procedure inspired from the IBM Quest Synthetic Data Generator: 1) a set
of random patterns is generated, 2) for each pattern, a list of occurrences is generated, and 3) synthetic sequences
are built so that each sequence contains a pattern if the sequence is in the pattern occurrence list. The sequences
are then completed with random items so that the mean sequence length of the resulting dataset reaches the
input parameter .</p>
      <p>Figure 4 presents runtimes for with respect to dataset size (a), sequence length (b), and vocabulary size (c).
Table 2 presents the number of extracted patterns and the memory footprint.</p>
      <p>
        First, we can see in Figure 4-(a) that ASP-MRSM is an order of magnitude faster than alternatives approaches.
This is an important result since previous work on declarative pattern mining have shown that declarative
approaches are often slower than the procedural counterparts [
        <xref ref-type="bibr" rid="ref10 ref14">11, 15</xref>
        ]. This experiment demonstrates that the
overhead induced by the declarative encoding is balanced by the more ecient solving strategy achieved by the
ASP solver.
      </p>
      <p>Second, we focus on ASP-RSM and Apriori-Rare (non-zero-rare-patterns). In general, decreasing the threshold
induces a larger search space and thus higher runtimes. However, we can see that ASP-RSM is far less sensitive to
variations of the threshold than Apriori-Rare is. As a consequence, ASP-RSM is faster for very low thresholds.
Apriori-Rare is greatly impacted by the search space size, even though the number of rare patterns remains
almost constant as it can be seen in Table 2. In contrast the runtime of ASP-RSM remains fairly low, even
when the threshold is low. This demonstrates again that the ASP solver is able to explore the search space more
eciently than our procedural algorithm.</p>
      <p>4The source codes of the ASP programs, of procedural algorithms and of the dataset generator are available at http://www.
ilp-paper-1.co.nf/
0:5 1 1:5
Dictionary size ( jDj/1000)</p>
      <p>=20 = 20%</p>
      <sec id="sec-3-1">
        <title>ASP-MRSM ASP-RSM MRG-EXP Apriori-Rare</title>
        <p>2
(b)
10
15 20 25
Mean sequence length ( )
jDj = 250 = 20%
30
(c)
We now apply our approach to care pathway analytics to illustrate that the number of rare patterns can be
reduced using additional expert constraints. This is not a new result but we show that extracting rare patterns
becomes practically interesting with constraints. The ASP approach was here crucial to incorporate constraints
easily. Care pathway analytics is a research eld that consists in exploring patient care pathways to understand
care uses, more especially drugs consumption in our case, and their impacts on patients health.</p>
        <p>This case study analyzes care pathways of patients exposed to anti-epileptic drugs who had epileptic seizures.
Rare pattern mining aims at identifying (i) patients to exclude from the study because of their unexpected rare
care pathways, for instance due to specic heavy treatments (cancer, hepatitis C, AIDS, etc.); (ii) rare adverse
drugs reactions. The dataset is made of 500 care pathways with drugs deliveries during one year. This represents
a total amount of 101; 793 events, with jIj = 3; 671. Drug deliveries are encoded with seq(t,p,e) atoms where e
Threshold
( )
5%
10%
20%
30%
5%
10%
20%
30%
5%
10%
20%
30%
s
n
r
e
t
t
a
P
#
2;000
1;500
1;000</p>
        <p>Memory usage (bytes)
#Rare</p>
        <p>#mRP
is the active component of the delivered drug.</p>
        <p>The rough mining of minimal rare sequential patterns with = 10% yields 7; 758 mRPs of length at most 3
(maxlen = 3). This pattern number makes these results dicult to analyze by clinicians.</p>
        <p>Two types of constraints have been added in order to illustrate the pattern reduction. The rst one is a
relaxation of rare patterns. Too rare patterns are poorly signicant but so numerous. We thus introduced a
second parameters fmin such that the support of extracted patterns must be between fmin and . The second
type of constraints concerns the maximum delay between two events of a pattern occurrence. This constraint,
known as the maxgap constraint, avoids to enumerate an occurrence while two of its events are too far in time.
It assumes that the rst event may causally be related to the second one.</p>
        <p>Figure 5 illustrates the eciency of this constraint to reduce the number of rare patterns with = 10% and
maxlen = 3. The plot on the left illustrates the number of patterns w.r.t. the maxgap constraints value (from
3 to 25 events). Considering that having larger max-gap reduces the number of non-zero patterns, this number
increases with maxgap. The plot on the right illustrates the number of patterns w.r.t. the minimal frequency
constraint, fmin (from 0.4% to 6%). The closer fmin is to , the fewer patterns will be generated. This number
decreases exponentially with fmin. The number of patterns extracted with fmin = 6% fall down to 87. This
number of patterns is suciently low to be presented to clinicians. Moreover, the computing time is reduced
by using expert constraints. This nal experiment is solved in 425s which is signicantly faster than in our
experiments on synthetic datasets.</p>
        <p>The two constraints can be combined. We choose to select patterns with fmin = 5% and maxgap = 3. In such
case, the number of patterns fall down to 133. One interesting pattern describes the deliveries of lamotrigine,
an anti-epileptic drug, co-occurring with two antibiotics ceftriaxone and amoxicillin. This unexpected potential
adverse drug reaction may lead to new epidemiological questions.
5</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>
        This article extends Gebser et al. [
        <xref ref-type="bibr" rid="ref10">11</xref>
        ] work, on declarative sequential pattern mining with Answer Set
Programming by proposing an ecient encoding for rare patterns mining. The paper shows the versatility of the
declarative approach to design easily a new data mining task. More specically, we have shown that ASP
encodings is an ecient solution compared to procedural approaches. Unfortunatly, the memory consumption limits
the analysis to mid-size databases. This problem is known in pattern mining with ASP [
        <xref ref-type="bibr" rid="ref10">11</xref>
        ] and the solution can
be to use dedicated propagators [
        <xref ref-type="bibr" rid="ref14">15</xref>
        ] to improve computing eciency of the approach. We have also illustrated the
benet of our approach by extracting sequential patterns satisfying additional expert constraints, in the context
of a care pathway analytics application. In this context, our approach can eciently prune useless patterns and
reduce computation times. As future work, we aim at nding and adding contraints through a human-machine
interface in which the user expresses his needs and the machine converts them into ASP constraints.
      </p>
      <sec id="sec-4-1">
        <title>Acknowledgements</title>
        <p>This work is a part of the PEPS project funded by the French national agency for medicines and health products
safety (ANSM), and of the SePaDec project funded by Region Bretagne.
[1] Koh, Y.S., Ravana, S.D.: Unsupervised rare pattern mining: A survey. Transactions on Knowledge Discovery
from Data 10(4) (2016) 129
37 (2016) 1324
Potassco: The</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Hu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Qiao</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Deng</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Discovery of rare sequential topic patterns in document stream</article-title>
          .
          <source>In: Proceedings of the SIAM International Conference on Data Mining</source>
          . (
          <year>2014</year>
          )
          <fpage>533541</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Szathmary</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Finding minimal rare itemsets with an extended version of the Apriori algorithm</article-title>
          .
          <source>In: Proceedings of the International Conference on Applied Informatics</source>
          . Volume
          <volume>1</volume>
          . (
          <year>2014</year>
          )
          <fpage>8592</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Muggleton</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de Raedt</surname>
          </string-name>
          , L.:
          <article-title>Inductive logic programming: Theory and methods</article-title>
          .
          <source>The Journal of Logic Programming</source>
          <volume>19</volume>
          (
          <year>1994</year>
          )
          <fpage>629679</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>De</given-names>
            <surname>Raedt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            ,
            <surname>Guns</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            ,
            <surname>Nijssen</surname>
          </string-name>
          ,
          <string-name>
            <surname>S.:</surname>
          </string-name>
          <article-title>Constraint programming for itemset mining</article-title>
          .
          <source>In: Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining</source>
          ,
          <source>ACM</source>
          (
          <year>2008</year>
          )
          <fpage>204212</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Sebag</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rouveirol</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Constraint inductive logic programming</article-title>
          .
          <source>In: Advances in Inductive Logic Programming</source>
          . IOS Press (
          <year>1996</year>
          )
          <fpage>277294</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Corapi</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Russo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lupu</surname>
          </string-name>
          , E.:
          <article-title>Inductive logic programming in answer set programming</article-title>
          .
          <source>In: International Conference on Inductive Logic Programming</source>
          , Springer (
          <year>2011</year>
          )
          <fpage>9197</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Jabbour</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sais</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Salhi</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Decomposition based SAT encodings for itemset mining problems</article-title>
          .
          <source>In: Proceeding of the Pacic-Asia Conference Advances on Knowledge Discovery and Data Mining</source>
          ,
          <string-name>
            <surname>Part</surname>
            <given-names>II</given-names>
          </string-name>
          . (
          <year>2015</year>
          )
          <fpage>662674</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>De</given-names>
            <surname>Raedt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            ,
            <surname>Guns</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            ,
            <surname>Nijssen</surname>
          </string-name>
          ,
          <string-name>
            <surname>S.:</surname>
          </string-name>
          <article-title>Constraint programming for data mining and machine learning</article-title>
          .
          <source>In: Proceedings of the Twenty-Fourth AAAI Conference on Articial Intelligence (AAAI-10)</source>
          . (
          <year>2010</year>
          )
          <fpage>16711675</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Guns</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dries</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nijssen</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tack</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Raedt</surname>
          </string-name>
          , L.D.:
          <article-title>Miningzinc: A declarative framework for constraintbased mining</article-title>
          .
          <source>Artif. Intell</source>
          .
          <volume>244</volume>
          (
          <year>2017</year>
          )
          <fpage>629</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Gebser</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guyet</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Quiniou</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Romero</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schaub</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Knowledge-based sequence mining with ASP</article-title>
          .
          <source>In: Proceedings of the Twenty-Fifth International Joint Conference on Articial Intelligence (IJCAI)</source>
          .
          <article-title>(</article-title>
          <year>2016</year>
          )
          <fpage>14971504</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Janhunen</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Niemel</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>The answer set programming paradigm</article-title>
          .
          <source>AI Magazine</source>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Gebser</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kaminski</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , Kaufmann,
          <string-name>
            <given-names>B.</given-names>
            ,
            <surname>Ostrowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Schaub</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            ,
            <surname>Schneider</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.</surname>
          </string-name>
          :
          <article-title>Potsdam answer set solving collection</article-title>
          .
          <source>AI</source>
          Communications
          <volume>24</volume>
          (
          <issue>2</issue>
          ) (
          <year>2011</year>
          )
          <fpage>107124</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Szathmary</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Valtchev</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Napoli</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Generating rare association rules using the minimal rare itemsets family</article-title>
          .
          <source>International Journal on Software and Informatics</source>
          <volume>4</volume>
          (
          <issue>3</issue>
          ) (
          <year>2010</year>
          )
          <fpage>219238</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Negrevergne</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guns</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Constraint-based sequence mining using constraint programming</article-title>
          .
          <source>In: Proceedings of the International Conference on Integration of AI</source>
          and
          <article-title>OR Techniques in Constraint Programming</article-title>
          . (
          <year>2015</year>
          )
          <fpage>288305</fpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>