      Sequential Exceptional Pattern Discovery Using
      Pattern-Growth: An Extensible Framework for
    Interpretable Machine Learning on Sequential Data

                     Dennis Mollenhauer1 and Martin Atzmueller2
           1 University of Kassel, Wilhelmshöher Allee 73, 34121 Kassel, Germany
                               dennis@mollenhauer.is
           2 Tilburg University, Warandelaan 2, 5037 AB Tilburg, The Netherlands
                                 m.atzmuller@uvt.nl


       Abstract. Interpretable machine learning on complex data requires adequately cus-
       tomizable as well as scalable computational analysis methods. This paper presents
       a framework combining the paradigms of exceptional model mining with se-
       quential pattern mining in order to mine interesting patterns from complex data.
       We present a method for sequential exceptional pattern discovery using pattern-
       growth and further show how to adapt this method for large-scale complex data,
       with an adaptation to Map/Reduce. First evaluation results demonstrate the effi-
       cacy of the proposed method, for both synthetic as well as real-world datasets.

       Keywords: Exceptional Model Mining · Sequential Patterns · Complex Data.


1    Introduction
Complex data such as large-scale multi-relational temporal and/or spatial data requires
specific interpretable and explainable machine learning and analytics methods. Here the
goal is to enable human insight into the methods and/or models in order to ultimately
allow for computational sensemaking [6]. In particular, rule-based and pattern-based
methods, e. g., [22, 23, 37] have shown considerable impact and promise in this respect.
     In this paper, we consider interpretable machine learning on complex data, in par-
ticular focusing on sequence data – utilizing sequential pattern mining, which usually
provides simple-to-interpret models, similar to rule-based and association-based models,
cf. [23, 37, 45]. We present the extension of sequential pattern discovery via excep-
tional model mining [19] – a flexible technique for systematically obtaining interesting
patterns ranked by a given quality function (also called an interestingness measure),
as a variant of subgroup discovery [5]. We show how to combine exceptional model
mining and sequential pattern mining adapting a conceptually flexible pattern-growth
approach [30, 39, 40] in order to allow for accessible customization and extension.
For this, we build on the GP-Growth approach [30], while extending it for sequen-
tial pattern discovery using techniques of PrefixSpan [39, 40]. For enabling a scalable
approach on large datasets, we provide an adaptation of the presented method using the
Map/Reduce framework for Big Data processing. We perform first evaluations of the
proposed method on synthetic as well as real-world datasets. Our preliminary results
demonstrate the efficacy of the proposed method for large-scale complex data.




Copyright © 2020 for this paper by its authors.
Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
Our contributions are summarized as follows.
 1. We propose an extension of sequential pattern mining, enabling more advanced
    quality functions than a simple support measure by adapting the paradigm of ex-
    ceptional model mining. We present an integrated framework to discover such
    sequential patterns using a pattern-growth approach. With this, interpretable ma-
    chine learning using exceptional model mining of sequential patterns is enabled.
 2. We present an algorithm for Sequential Exceptional Pattern discovery using Pattern-
    growth (SEPP), as the combination of exceptional and sequential pattern mining,
    also scaling to large-scale data by building on the Map/Reduce framework.
 3. Evaluating the proposed method, we present first results using synthetic as well as
    real-world data demonstrating the efficacy of the proposed method.
   The rest of the paper is structured as follows: Section 2 introduces the necessary
background and discusses related work. After that, Section 3 introduces the proposed
method. Next, Section 4 presents our results. Finally, Section 5 concludes with a sum-
mary and interesting directions for future research.

2     Background
Recently, the concept of interpretable and explainable machine learning has received
a strong interest and momentum in the data mining and machine learning community,
e. g., [13, 31, 41]. In particular, in the scope of complex models, including black-box
methods, further explanation and interpretation are usually required to enable
domain experts and users to understand, trust, and ultimately transform novel and useful
model results into a real-world application [10, 32, 33, 42, 44]. Then, human-in-the-loop
approaches [28], and computational sensemaking [6] are enabled.
     In the past, rule-based/associative methods and approaches, e. g., [7, 8, 22, 23] have
shown considerable impact and promising perspectives regarding interpretability and
explainability, also regarding pattern mining based approaches. Here, we focus on min-
ing sequential patterns – specifically exceptional sequential patterns – which we define
according to the framework of exceptional model mining.
     Below, we introduce the necessary background and discuss related work regarding
these issues, focusing on (sequential) pattern mining, exceptional model mining, and
the Map/Reduce framework for processing large datasets, which form the basis of our
proposed methods that are introduced in the next section.

2.1   Exceptional Model Mining
Pattern mining typically aims to discover local models characterizing (or describing)
parts of the data given a quality function, e. g., [29]. This can be achieved, e. g., by
techniques like association rule mining [3] or subgroup discovery, e. g., [5, 46].
The interestingness of the respective subgroups is usually defined by a certain property
of interestingness formalized by a quality function. Here, exceptional model mining [5, 19]
can be seen as a variant of subgroup discovery, focusing on more complex quality func-
tions, i. e., considering complex target models, like comparing regression models or
graph structures [9]. Essentially, exceptional model mining tries to identify interesting
patterns with respect to a local model derived from a set of attributes, cf. [17, 18].
2.2   Basics of Subgroup Discovery and Exceptional Model Mining

In the scope of subgroup discovery and exceptional model mining, selectors or selection
expressions select the instances contained in a database D, which are covered by the re-
spective patterns. Typical selection expressions are given by attribute-value pairs in the
case of nominal attributes, or by intervals in the case of numeric attributes. More for-
mally, a subgroup description (or pattern) combines selectors into a boolean formula.
For a typical conjunctive description language, a pattern 𝑝 = {sel_1, . . . , sel_𝑘} is defined
by a set of selectors sel_𝑗, which are interpreted as a conjunction, i. e., 𝑝 = sel_1 ∧ . . . ∧ sel_𝑘.
In the general case, each selector sel_𝑘 considers restrictions on the domain of an at-
tribute contained in database D. In this paper – in our context of sequential pattern
mining – we will focus on simple description languages made up of items (and item-
sets), so that the selectors sel_𝑘 correspond to specific items 𝑖_𝑘 or sets of those. Please
note that without loss of generality we can always construct a mapping of selectors
to a set of items, if the set of selectors making up the description language is fixed. In
the following, let database D contain instances with (sequential) itemset information.
A subgroup corresponding to a pattern then contains all instances 𝑑 ∈ D for which the
respective formula for the pattern evaluates to true.
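
To make the coverage semantics concrete, the following minimal Python sketch (with a
hypothetical itemset data model; the names are illustrative, not part of the framework)
shows how a conjunctive pattern selects the instances of its subgroup:

    # Minimal sketch: selectors correspond to items, an instance is an itemset,
    # and a conjunctive pattern covers an instance iff all its selectors hold.
    def covers(pattern: set, instance: set) -> bool:
        return pattern <= instance  # subset test: sel_1 AND ... AND sel_k

    database = [{"a", "b", "c"}, {"a", "c"}, {"b"}]
    # the subgroup of the pattern "a AND c" contains the first two instances
    subgroup = [d for d in database if covers({"a", "c"}, d)]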
     For exceptional model mining, a model consists of a specific model class (e. g., a
regression or graph-structured model, cf. e. g., [9, 17, 18]), requiring a specific qual-
ity function. Its model parameters depend on the values of the model
attributes in the instances of the respective subgroup. The attributes consist of two (usu-
ally non-overlapping) sets of describing attributes and model attributes. The goal of
exceptional model mining is then to identify patterns, using a specific quality function,
for which the model parameters differ significantly from the parameters of the model
built from the entire dataset. For a more detailed discussion we refer to [19].
     Prominent approaches for exhaustive subgroup discovery and exceptional model
mining, e. g., the SD-Map [12]/SD-Map* [11] and GP-Growth [30] algorithms, extend
the FP-growth [26] algorithm – an efficient approach for frequent pattern mining – rely-
ing on a special data structure, the so-called FP-tree. The FP-tree is a prefix-tree built us-
ing the given set of selectors. It enables efficient access for generating frequent patterns,
while containing the complete condensed frequency information for database D. Due
to the limited space, we refer to [26] for more details. The basic FP-tree focuses only
on frequencies, for computing the support. However, extended FP-trees, e. g., for sub-
group discovery [11,12] also contain additional information for estimating the qualities
of patterns given a specific quality function. This principle is generalized in the GP-
Growth algorithm [30], which substitutes simple frequencies by a generic condensed
representation. This is called a valuation basis [30]. Essentially, it captures all the nec-
essary (local) information to enable the computation of the applied quality function.
As outlined in [30], a valuation basis is thus just represented by an 𝑛-tuple, contain-
ing all the relevant information. For computing supports of patterns, for example, just
basic frequency information needs to be stored (e. g., in a 1-tuple). Then, for adapting
FP-Growth to GP-Growth, the respective GP-tree structure considers valuation bases
instead of simple frequencies as in the FP-tree case. Due to the limited space, we refer
to [30] for a detailed discussion. Below, we outline how to apply the ideas of GP-Growth
to exceptional sequential pattern mining and its efficient large-scale implementation.
2.3   Sequential Pattern Mining
Sequential pattern mining, e. g., [20, 35, 36, 39, 40, 49] aims to identify frequent subse-
quences in sequential databases, taking into account the order of the respective items. In
the following, we define basic concepts following the formalism and notation of [40].
Let 𝐼 = {𝑖_1, 𝑖_2, ..., 𝑖_𝑛} denote a set of items (e. g., visited websites/locations, articles
bought one after another, etc.). Then, a sequence 𝑠 = ⟨𝑡_1, 𝑡_2, ..., 𝑡_𝑙⟩ is an ordered
list (𝑙-tuple) of 𝑙 non-empty itemsets 𝑡_𝑖 ⊆ 𝐼, |𝑡_𝑖| > 0, 𝑖 = 1 . . . 𝑙. An itemset 𝑡_𝑖 is rep-
resented as (𝑥_1, 𝑥_2, ..., 𝑥_𝑚), with 𝑥_𝑖 ∈ 𝐼; for an itemset of size 1 we omit the
brackets for simpler notation. Please note that an item can occur several times within a
sequence, but only once within an itemset. The length of a sequence is the number of
itemsets that appear in the sequence; a sequence of length 𝑙 is called an 𝑙-sequence.
We call a sequence 𝛼 = ⟨𝑎_1, 𝑎_2, ..., 𝑎_𝑛⟩ a subsequence of 𝛽 = ⟨𝑏_1, 𝑏_2, ..., 𝑏_𝑚⟩,
denoted 𝛼 ⊑ 𝛽, iff there are integers 1 ≤ 𝑗_1 < 𝑗_2 < ... < 𝑗_𝑛 ≤ 𝑚 such that
𝑎_1 ⊆ 𝑏_{𝑗_1}, 𝑎_2 ⊆ 𝑏_{𝑗_2}, ..., 𝑎_𝑛 ⊆ 𝑏_{𝑗_𝑛}. 𝛽 is then called a super-sequence of 𝛼.
     A sequence database 𝑆 is a set of tuples (𝑠𝑖𝑑, 𝑠), where 𝑠𝑖𝑑 is the (unique) sequence
ID and 𝑠 is the sequence. If 𝛼 ⊑ 𝑠 for a tuple (𝑠𝑖𝑑, 𝑠), then 𝛼 is said to be contained in
(𝑠𝑖𝑑, 𝑠). The frequency of a sequence 𝛼, i. e., its support in a sequence database 𝑆, is
determined by the number of tuples containing 𝛼: 𝑠𝑢𝑝𝑝𝑜𝑟𝑡_𝑆(𝛼) = |{(𝑠𝑖𝑑, 𝑠) | (𝑠𝑖𝑑, 𝑠) ∈
𝑆 ∧ 𝛼 ⊑ 𝑠}|. A sequential pattern is then defined as a sequence 𝛼 with a support greater
than or equal to a given minimum support threshold 𝜉, i. e., 𝑠𝑢𝑝𝑝𝑜𝑟𝑡_𝑆(𝛼) ≥ 𝜉, cf. [40].
The problem of sequential pattern mining is then defined as follows: Given a sequence
database 𝑆 and a minimal support threshold 𝜉, find all sequential patterns in the given
database [40]. In our proposed approach, we combine sequential pattern mining with
exceptional model mining enabling interpretable machine learning on sequential data.
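
As an illustration of these definitions, the following Python sketch (representing sequences
as lists of frozensets – an assumption for presentation, not an implementation prescribed
by [40]) checks the containment relation 𝛼 ⊑ 𝑠 and computes the support:

    def is_subsequence(alpha, beta):
        # Greedily embed the itemsets of alpha, in order, into beta:
        # each a_i must be a subset of some b_j at a strictly later position.
        j = 0
        for a in alpha:
            while j < len(beta) and not a <= beta[j]:
                j += 1
            if j == len(beta):
                return False
            j += 1
        return True

    def support(db, alpha):
        # number of (sid, s) tuples in db whose sequence contains alpha
        return sum(1 for _, s in db if is_subsequence(alpha, s))

    db = [(1, [frozenset("ad"), frozenset("c")]), (2, [frozenset("a")])]
    assert support(db, [frozenset("a"), frozenset("c")]) == 1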

2.4   Parallel Sequential Pattern Mining
Sequential pattern mining is a prominent data mining task, e. g., [20, 35]. Regarding
large-scale data processing frameworks and parallel processing, algorithmic adaptations
have been investigated, cf. [24] for a survey. One popular framework for implementing
such algorithms is the Map/Reduce [16] framework. In general, Map/Reduce is ap-
plicable for a certain computational task, if this task can be divided into independent
computational tasks, such that there is no required communication between these. Then,
large tasks can be split up into subtasks – a typical divide-and-conquer strategy.
    Due to these reasons, we design our proposed method for exceptional parallel se-
quential pattern mining using Map/Reduce - here, we can adapt the ideas of the GP-
Growth algorithm using the respective valuation bases for exceptional model min-
ing. As a method for sequential pattern mining, we build on the PrefixSpan algo-
rithm [39, 40], due to its conceptual adaptability and similarity to the GP-Growth al-
gorithm in partitioning the data. With this, PrefixSpan can then be parallelized with
Map/Reduce as we will describe below in more detail. Essentially, we can simply ex-
tend the support counter of the algorithm by a valuation basis. Further constraints and
extensions can then be simply implemented using both the PrefixSpan as well as the Ex-
ceptional Model Mining techniques. With this, we provide an extensible pattern-based
framework for interpretable machine learning on sequential data.
3     Method
For extending sequential pattern mining to exceptional sequential pattern mining, one
important step is to identify a suitable sequential pattern mining algorithm. In this pa-
per, we extend the PrefixSpan algorithm since it is “compatible” with the GP-Growth
approach w.r.t. our proposed extension for exceptional model mining and the incorpora-
tion of valuation bases, in particular the aggregation operation. From [30] we recall the
definition of a valuation basis, following the notation and terminology proposed in [30]:
Definition 1. For constructing valuation bases on an arbitrary set 𝑉, we consider an
abelian semi-group (𝑉, ⊕), where ⊕ is a binary operator on 𝑉, such that ⊕ : 𝑉 ×𝑉 → 𝑉,
and the following holds: (1) 𝑉 is closed under ⊕, i. e., 𝑎, 𝑏 ∈ 𝑉 ⇒ 𝑎 ⊕ 𝑏 ∈ 𝑉, (2) ⊕ is
associative, i. e., 𝑎, 𝑏, 𝑐 ∈ 𝑉 ⇒ 𝑎 ⊕ (𝑏 ⊕ 𝑐) = (𝑎 ⊕ 𝑏) ⊕ 𝑐, and (3) ⊕ is commutative,
i. e., 𝑎, 𝑏 ∈ 𝑉 ⇒ 𝑎 ⊕ 𝑏 = 𝑏 ⊕ 𝑎. An element 𝑣 ∈ 𝑉 is then called a valuation basis.
     The support (count) is a very simple example of a valuation basis (and its domain)
which can be simply aggregated. We refer to [30] for more details on more complex
valuation bases, specifically for more complex model classes such as the variance or
the linear regression model. Basically, the variance model capturing the variance of a
numeric model attribute in a sequential pattern can be represented using a 3-tuple, from
which the variance (aggregation) can be computed; similarly, the aggregation can be
computed for the linear regression model (for two model attributes) using a 5-tuple.
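
As a hedged sketch of this idea (following the 3-tuple representation from [30]; the class
and function names are our own), a valuation basis for the variance model can store
(count, sum, sum of squares), with component-wise addition as the ⊕ operator:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class VarianceBasis:
        n: int      # count
        s: float    # sum of the model attribute
        sq: float   # sum of squares of the model attribute

        def __add__(self, other):  # the aggregation operator ⊕
            return VarianceBasis(self.n + other.n, self.s + other.s,
                                 self.sq + other.sq)

        def variance(self):
            # aggregation: Var(X) = E[X^2] - E[X]^2
            mean = self.s / self.n
            return self.sq / self.n - mean * mean

    def basis_of(x):  # valuation basis of a single instance
        return VarianceBasis(1, x, x * x)

Since ⊕ is associative and commutative, partial bases can be combined in any order –
exactly the property that the pattern-growth traversal (and, later, the Map/Reduce
reducers) rely on.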
     For pattern-growth algorithms, the PrefixSpan algorithm is a good candidate since
it is conceptually flexible and extensible including various variants, e. g., [4, 15, 25, 43]
such that e. g., domain-driven constraints and extensions can be easily implemented and
integrated into a customized approach. Below, we first describe the problem of sequen-
tial exceptional model mining, briefly summarize the PrefixSpan algorithm, before we
present the approaches for sequential exceptionality detection using PrefixSpan.

3.1   Sequential Exceptional Model Mining
Below, we first define the problem of Sequential Exceptional Model Mining (SEMM).
Essentially, with SEMM the problems of exceptional model mining and sequential pat-
tern mining are combined. Sequential exceptional patterns are those for which their
model has a deviation in comparison to the overall model of the database from which
they were extracted. Let 𝑒 = (𝑒𝑖𝑑, 𝑒𝑎, 𝑠) denote an extended sequence, where 𝑒𝑖𝑑 is a
unique identifier of 𝑒, 𝑒𝑎 are the model attributes for estimating the model parameters,
and 𝑠 is a sequence. Let 𝐸 = {𝑒_1, 𝑒_2, ..., 𝑒_𝑛} be a database of extended sequences.
Definition 2. We consider an extended sequence database, a model 𝑀, a minimum sup-
port threshold 𝜉, a quality function 𝑞 𝑀 and an integer 𝑘. Sequential exceptional model
mining is the task of finding the 𝑘 best sequential patterns w.r.t. the quality function 𝑞 𝑀 ,
with a support greater or equal to the minimum support threshold 𝜉.
It is easy to see that if 𝜉 is set to one, then all possible patterns according to the given
quality function are retrieved. The threshold 𝜉 was introduced as a separate criterion in
order to provide a way to limit individual outliers and restrict the problem size easily.
However, it could potentially also be integrated into the quality function directly.
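
The concrete quality function 𝑞_𝑀 is left open by the framework; as one illustrative choice
(an assumption for this sketch, not prescribed by the paper) for the variance model class,
a subgroup can be scored by its size-weighted deviation from the global variance, reusing
the VarianceBasis sketch above:

    def variance_quality(subgroup_basis, global_basis):
        # size-weighted deviation of the subgroup variance from the
        # variance of the entire extended sequence database
        return subgroup_basis.n * abs(subgroup_basis.variance()
                                      - global_basis.variance())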
     Basically, the respective requirements for the sequential pattern mining algorithm
follow from the requirements of Map/Reduce and those of the valuation bases. The
Map/Reduce framework requires that each dataset (sequence) can be viewed individ-
ually and that the data can be partitioned. This is obviously given for the sequential
pattern mining problem, since each sub-sequence can be counted individually, and par-
titions can be formed at the borders of sequences. Furthermore, valuation bases only
guarantee a ⊕ operator, which can be understood as an addition. Since the support
counter is to be extended by valuation bases, the algorithm must only ever add to the
support counter, so that the ⊕ operator can take the place of this addition.
     The algorithm PrefixSpan [39, 40] by Pei et al. works with partial databases, each
of which is a projection of the entire sequence database with respect to one prefix. In
the comparative tests in [35], PrefixSpan performed well throughout and was the only
algorithm that was able to perform all test runs.
     Therefore, since PrefixSpan is a conceptually flexible yet effective algorithm which
can be extended as already discussed above, and can be parallelized on Map/Reduce,
we utilize this algorithm for our approach. In the following, we describe the PrefixSpan
algorithm. We assume that all elements of a sequence are lexicographically ordered. A
sequence 𝛽 = ⟨𝑒′_1, 𝑒′_2, ..., 𝑒′_𝑚⟩ is a prefix of a sequence 𝛼 = ⟨𝑒_1, 𝑒_2, ..., 𝑒_𝑛⟩ (𝑚 ≤ 𝑛), iff
(1) 𝑒′_𝑖 = 𝑒_𝑖 for 𝑖 ≤ 𝑚 − 1, (2) 𝑒′_𝑚 ⊆ 𝑒_𝑚, and (3) all items in (𝑒_𝑚 − 𝑒′_𝑚) occur
lexicographically after those in 𝑒′_𝑚. For example, ⟨(𝑎𝑑)𝑐⟩, ⟨(𝑎𝑑)⟩ and ⟨𝑎⟩ are prefixes
of ⟨(𝑎𝑑)𝑐(𝑏𝑐)(𝑎𝑒)⟩, but neither ⟨𝑎𝑐⟩ nor ⟨(𝑎𝑑)𝑏⟩ are.
     Let 𝛼 = ⟨𝑒_1, 𝑒_2, ..., 𝑒_𝑛⟩ denote a sequence and 𝛽 = ⟨𝑒_1, 𝑒_2, ..., 𝑒_{𝑚−1}, 𝑒′_𝑚⟩ a prefix
of 𝛼. Then 𝛾 = ⟨𝑒″_𝑚, 𝑒_{𝑚+1}, ..., 𝑒_𝑛⟩ is the corresponding suffix, where 𝑒″_𝑚 = (𝑒_𝑚 − 𝑒′_𝑚).
The concatenation of prefix 𝛽 with suffix 𝛾 into the sequence 𝛼 is also written as 𝛼 = 𝛽 · 𝛾.


Algorithm 1 PrefixSpan: outputs the complete set of sequential patterns
  procedure PrefixSpan(𝛼, 𝑙, 𝑆|𝛼)
     ⊲ 𝛼: prefix (𝑛𝑢𝑙𝑙 on the first call)
     ⊲ 𝑙: length of 𝛼
     ⊲ 𝑆|𝛼: database projected w.r.t. 𝛼, or the complete database on the first call
     𝐵 ← scan 𝑆|𝛼 and find all frequent items 𝑏 such that:
          1. 𝑏 can be added to the last itemset of 𝛼 to form a sequential pattern, or
          2. ⟨𝑏⟩ can be appended to 𝛼 as a new itemset to form a sequential pattern
     for all 𝑏 in 𝐵 do
          𝛼′ ← 𝛼 · 𝑏
          𝑠𝑢𝑝𝑝𝑜𝑟𝑡 ← 𝑠𝑢𝑝𝑝𝑜𝑟𝑡_{𝑆|𝛼}(𝑏)
          STDOUT(𝛼′, 𝑠𝑢𝑝𝑝𝑜𝑟𝑡)
          𝑆|𝛼′ ← project 𝑆|𝛼 w.r.t. 𝑏
          PrefixSpan(𝛼′, 𝑙 + 1, 𝑆|𝛼′)



    Algorithm 1 depicts the basic PrefixSpan algorithm. PrefixSpan expects a se-
quence database and a minimal support threshold as parameters. PrefixSpan assumes,
without loss of generality, that the itemsets of the sequences are in a total order, where
usually a lexicographic order is used. During the initial PrefixSpan call, frequent 1-
sequences are determined. Afterwards the passed sequence database is projected to
each frequent item and PrefixSpan is called recursively with the respective 1-sequence
as prefix. In the first recursion stage, the passed database is searched for frequent
items that extend the prefix to 2-sequences. Then, for each 2-sequence found, the database is
projected and PrefixSpan is called with the respective 2-sequence as the prefix. These
recursive calls end when there are no more frequent 𝑛-sequences in the respective pro-
jected database or the respective projected databases are empty.
    The central idea of PrefixSpan is to split the database into projections – an idea
which is similar to the conditional FP-trees of FP-Growth. A projected database is de-
fined as follows [40]:
Definition 3. Let 𝛼 be a sequential pattern in a sequence database 𝑆. Then the 𝛼-pro-
jected database 𝑆|𝛼 contains all the suffixes from the database 𝑆 w.r.t. the prefix 𝛼.
The support in the projected databases is determined as follows:
Definition 4. Let 𝛼 be a sequential pattern in a sequence database 𝑆 and 𝛽 be a se-
quence with prefix 𝛼. Then the support of 𝛽 in the 𝛼-projected database 𝑆|𝛼 is the
number of sequences 𝛾 in 𝑆|𝛼 such that 𝛽 ⊑ 𝛼 · 𝛾 [40].
    The database 𝑆| 𝛼 constructed in this way cannot be larger than 𝑆. This database
contains all the necessary suffixes to find all sequential patterns starting with the prefix
𝛼. For a more efficient implementation, the use of pseudo-sequences is suggested: when
projecting a sequence, no new sequence is created; instead, the original sequence is
referenced and a pointer marks the position where the projection begins.
A detailed discussion can be found in [40].
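
A minimal sketch of this pointer-based idea (single-item sequences assumed for brevity;
the helper name is hypothetical):

    def pseudo_project(db, pointers, item):
        # db: list of original sequences; pointers: (seq_id, start) pairs
        # marking where each projected pseudo-sequence begins
        new_pointers = []
        for seq_id, start in pointers:
            try:
                pos = db[seq_id].index(item, start)  # first occurrence from start
            except ValueError:
                continue  # this sequence does not support the extension
            new_pointers.append((seq_id, pos + 1))
        return new_pointers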
    In the algorithm, counting the support is only performed when determining the fre-
quent items. At this point, the valuation basis can be computed as follows: each sequence
in the passed database is tested to determine whether it can extend the current pattern;
if so, the counter for the corresponding pattern is increased by one. Assuming that each
sequence is extended by a valuation basis, i. e., 𝑠 = ⟨𝑠𝑖𝑑, 𝑣𝑎𝑙𝑢𝑎𝑡𝑖𝑜𝑛𝐵𝑎𝑠𝑖𝑠, 𝑠𝑒𝑞𝑢𝑒𝑛𝑐𝑒⟩,
the additional valuationBasis attribute can be aggregated at this point into the resulting
valuation basis, which is then stored with the result. Therefore, while determining the
support, all valuation bases for the respective sequential patterns are available and can
be evaluated. In this way, the valuation bases for a specific exceptional model class can
be integrated into PrefixSpan, provided the model class can be represented in this form.
For more details on limitations and restrictions we refer to the detailed discussion in [30].
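
To make the integration concrete, the following compact Python sketch (our own
illustration, not the production implementation: it assumes single-item sequences and
full instead of pseudo-projections) combines PrefixSpan-style projection with
valuation-basis aggregation:

    def sepp_serial(db, min_sup, prefix=(), results=None):
        # db: list of (valuation_basis, item_sequence) pairs
        if results is None:
            results = {}
        per_item = {}                 # collect one basis per supporting sequence
        for basis, seq in db:
            for b in set(seq):        # each sequence counts at most once
                per_item.setdefault(b, []).append(basis)
        for b, bases in per_item.items():
            if len(bases) < min_sup:
                continue
            total = bases[0]
            for v in bases[1:]:
                total = total + v     # aggregation via the ⊕ operator
            pattern = prefix + (b,)
            results[pattern] = (len(bases), total)
            # project: keep the (non-empty) suffix after the first occurrence of b
            projected = [(basis, seq[seq.index(b) + 1:]) for basis, seq in db
                         if b in seq and seq.index(b) + 1 < len(seq)]
            sepp_serial(projected, min_sup, pattern, results)
        return results

With, e. g., db = [(basis_of(3.0), ["a", "b", "c"]), (basis_of(5.0), ["a", "c"])] and
min_sup = 2, the pattern ("a", "c") is reported with support 2 and the aggregated basis
VarianceBasis(2, 8.0, 34.0), from which its variance – and thus its quality – can be
computed.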

3.2   Sequential Exceptional Pattern Discovery using Pattern-Growth (SEPP)
Due to its flexibility and strict partitioning of the problem, PrefixSpan is very well suited
for adaptation and for being parallelized using Map/Reduce. For our pattern-growth ap-
proach, we adapt and extend PrefixSpan, as shown in Algorithm 2. The determination
of frequent sequences has not been changed. Then, the extension using valuation bases
discussed above is straight-forward, by adding valuation bases to each sequence as addi-
tional information. With these valuation bases, the quality of an exceptional sequential
pattern can then be simply estimated using the respective quality function. We call this
general approach sequential exceptional pattern discovery using pattern-growth (SEPP).
     We call our method for parallelizing PrefixSpan using Map/Reduce parallel SEPP,
while we can also resort to a serial version for smaller problem sizes. The basic idea
of parallel SEPP is to convert the projection and the determination of the frequent
sequences into a parallel formulation. Since it is problematic to implement recursive
algorithms with Map/Reduce, (parallel) SEPP converts the depth-first search strategy
of PrefixSpan into a breadth-first search strategy. By replacing the call stack with a
queue, only the order in which the projections are visited changes, cf. [38]. This order
is not important for PrefixSpan, because there is no relationship between projections
of the same level. Each recursive call holds at most one projection of a prefix of
length 𝑛 + 1 if its predecessor has length 𝑛. The frequent patterns of length 𝑛 are then
searched for in parallel, and then the projections are calculated in parallel. This process
is repeated until no more new patterns are found or all projections are empty. Initially,
only frequent 1-sequences are searched for. (Parallel) SEPP is depicted in Algorithm 2.


Algorithm 2 SEPP: outputs the complete set of sequential exceptional patterns
  procedure SEPP(𝑆, 𝜉)
     ⊲ 𝑆: a sequence database
     ⊲ 𝜉: minimal support
     𝑛-sequences ← find all frequent 1-sequences in parallel
     STDOUT(𝑛-sequences)
     projections ← new set
     for all 𝛼 in 𝑛-sequences do                                                         ⊲ parallel
         projections ← projections + PROJECT(𝑆 to 𝛼)
     while projections ≠ ∅ do
         𝑛-sequences ← in parallel, search every 𝛼-projection
         and find frequent items 𝑏, such that either:
                1. 𝑏 can be added to the last itemset of 𝛼 to form a sequential exceptional pattern, or
                2. ⟨𝑏⟩ can be added to 𝛼 as a new itemset to form a sequential exceptional pattern
         STDOUT(𝑛-sequences)
         projectionsNew ← new set
         for all 𝑆|𝛼 in projections do                                                   ⊲ parallel
             𝛼′-sequences ← all sequences in 𝑛-sequences matching 𝑆|𝛼
             for all 𝛼′ in 𝛼′-sequences do
                   projectionsNew ← projectionsNew + PROJECT(𝑆|𝛼 to 𝛼′)
         projections ← projectionsNew



    (Parallel) SEPP does not change the projections or the way frequent sequences are
determined. It merely modifies the processing order so that the counting steps for length 𝑛
and the projections of length 𝑛 can be executed in parallel. Each counting and projection
step is performed as one Map/Reduce job. Consequently, twice as many Map/Reduce
jobs are required to execute parallel SEPP as the maximum pattern length. If we only
take into account the support of a sequential pattern, then parallel SEPP simplifies to
a parallelized version of a sequential pattern mining algorithm. Since longer patterns
result in many small projections, projections below a certain size can be processed
serially, i. e., without Map/Reduce parallelization – we refer to this as serial SEPP.
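
As a sketch of one such breadth-first level (emulated in-process with assumed helper
names; a real deployment would run the two phases as Hadoop Map/Reduce jobs, as in
our experiments):

    from collections import defaultdict
    from functools import reduce

    def map_phase(projections):
        # projections: iterable of (prefix, projected_db) partitions,
        # each processed independently by a mapper
        for prefix, part in projections:
            for basis, seq in part:
                for b in set(seq):       # candidate one-item extensions
                    yield prefix + (b,), basis

    def reduce_phase(pairs, min_sup):
        # group by candidate pattern, aggregate valuation bases with ⊕,
        # and keep only patterns reaching the minimum support
        groups = defaultdict(list)
        for key, basis in pairs:
            groups[key].append(basis)
        return {k: (len(v), reduce(lambda a, b: a + b, v))
                for k, v in groups.items() if len(v) >= min_sup}

A subsequent (parallel) projection step then builds the projected databases for the
surviving patterns, and the pair of jobs is repeated once per pattern length.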
3.3   Interpretable Machine Learning on Sequential Data using SEPP


Figure 1 depicts the proposed framework for interpretable machine learning on sequen-
tial data – using our proposed combination of exceptional model mining and sequential
pattern mining using SEPP. It includes the workflow from data, to extended sequences
(contained in a database), to sequential patterns and finally a pattern-based model. Two
further important components are an exceptionality model capturing the exceptional
model information, and a human-in-the-loop approach enabled by the interpretable
SEPP method.



[Figure 1: workflow – data is modeled into Extended Sequences (guided by Valuation
Constraints, refined via Enrichment/Refinement), mined by SEPP (parameterized by the
Exceptionality Model with its Quality Function and Model Class) into Sequential Patterns,
which are modeled, subject to Pattern Set Constraints, into a Pattern-Based Model
enabling Analysis and Sensemaking.]
       Fig. 1. Interpretable ML Framework – Sequential Exceptional Pattern Discovery.



    In a human-centered approach, pattern-based modeling is performed in an itera-
tive approach, which can also result in incremental modeling. Both the exceptionality
model as well as the human-in-the-loop provide important information and constraints
for the process, e. g., by including specific (objective) design constraints for the valua-
tion basis, or by providing (subjective) domain-driven interestingness constraints. Ba-
sically, the process starts with the modeling of the extended sequence database given
the original data. Here, valuation constraints, e. g., for building and modeling valua-
tion bases and domains can be applied, ultimately resulting in the extended sequence
database. These can also be enhanced using refinement and enrichment steps by the
human-in-the-loop. Next, the SEPP algorithm is applied – given the respective quality
function for the applied model class. Here, depending on the problem size, either the
serial or the parallel algorithm is applied. Finally, the obtained patterns are provided
to the pattern-based model, which can also incorporate pattern set constraints given
the exceptionality model. Both of these enable analysis and sensemaking capabilities
for the human-in-the-loop, relying on the applied inherently interpretable pattern-based
approach [22, 23, 37].
4     Results

For evaluation, we created a synthetic dataset using the IBM Quest synthetic data gen-
erator [2]. In addition, we utilized two real-world datasets, as described below.
    For the experiments, the Hadoop cluster of the IT Service center of the University
of Kassel was used. This cluster consisted of twelve nodes. Each node had two AMD
Dual-Core Opteron 2218 processors (2.6 GHz), with 16 GB of RAM and about 960
GB free disk space. The Hitachi Ultrastar A7K2000 hard drives were connected with
SATA. The nodes were running Red Hat Enterprise Linux Server 6.2 with Apache
Hadoop 1.1.1 on Oracle Java 1.7. The cluster had two map and two reduce
slots configured on each node, providing a total of 24 mappers and reducers.


4.1   Synthetic Dataset

The IBM Quest Synthetic Data Generator [2] was used to generate the synthetic data,
as a standard data generator in such evaluation settings, e. g., [35, 39, 40, 47, 48]. The
Quest generator provides synthetic shopping cart transactions, simulating real-world
shopping basket composition. Here, most sequences have a similar length and only a
few are particularly long. The same applies to the number of items per transaction.
    For the evaluation, we created a dataset with 27.8 million sequences using the pa-
rameters shown in Table 1. This dataset will be referenced as 27M in the following. The
number of itemsets per sequence is determined using a Poisson distribution with pa-
rameter 𝜇 = |𝐶 |. The same procedure is used to determine the size of the itemsets,
here with 𝜇 = |𝑇 |. Furthermore, the frequency of the items is exponentially distributed [1].


                Table 1. Parameters of the IBM Quest Synthetic Data Generator.

                  |𝐷|         Number of customers (= number of sequences)
                  |𝐶 | = 8    Average number of itemsets per customer sequence
                  |𝑇 | = 2.5  Average number of items per itemset
                  |𝑆| = 4     Average length of maximal potentially frequent sequences
                  |𝐼 | = 1.25 Average size of maximal potentially frequent itemsets
                  𝑁 𝑆 = 5000 Number of maximal potentially frequent sequences
                  𝑁 𝐼 = 25000 Number of maximal potentially frequent itemsets
                  𝑁 = 20000 Number of different items




4.2   Parallel SEPP with a Simple Exceptional Model Class (Support)

First, the runtime was examined for different support thresholds with 24 mappers
and reducers (Figure 2), where we applied a simple exceptional model class and quality
function (support – as the standard baseline for sequential pattern mining). The runtime
investigation in [40] showed, for similarly structured datasets, that the runtime of Pre-
fixSpan approximately doubles when the support threshold is halved.
                        Fig. 2. SEPP (support quality function), 27M: Duration in relation to support.


Figure 2 shows that SEPP behaves similarly or better: when the minimum support decreases
from 100000 to 50000 the factor is 2, while from 20000 to 10000 the factor is only 1.3.
    In a further series of tests the speedup of SEPP was examined. Speedup specifies the
factor of acceleration of a parallel program when the number of processors is increased.

Definition 5. The speedup 𝑆_𝑝 is defined as 𝑆_𝑝 = 𝑇_1/𝑇_𝑝, where 𝑝 represents the number
of processors, 𝑇_1 the runtime with one processor, and 𝑇_𝑝 the runtime with 𝑝 processors.

For an ideal speedup, 𝑆_𝑝 = 𝑝. This ideal result is usually difficult to achieve due to
the overhead involved in managing the respective computational tasks.
In order to be able to run this experiment in an acceptable time, a minimum support
of 100000 was chosen. With this support the calculation with 24 mappers and reducers
took about two hours. Assuming that an almost ideal speedup is achieved, a runtime
of two days can be expected with one mapper and reducer. The result of this test can
be seen in Figure 3. We observe that, for up to 24 reducers, the speedup is close to
the ideal speedup. Since only 24 slots are available for mappers/reducers, it was ex-
pected that no further increase could be observed with 48 mappers and reducers, which
is confirmed by the experiments. Thus, in this evaluation the algorithm fulfilled all ex-
pectations stated above, demonstrating its scalability and efficacy for processing and
analyzing complex datasets. In particular, with the 27M dataset, it could be shown that
SEPP scales almost linearly and can handle even larger datasets with very small
minimum support thresholds in less than twelve hours.
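
As a worked instance of Definition 5 with the approximate runtimes reported above
(about 48 h expected with one mapper/reducer, about 2 h measured with 24):

    𝑆_24 = 𝑇_1 / 𝑇_24 ≈ 48 h / 2 h = 24 = 𝑝,

i. e., a near-ideal speedup at 24 mappers/reducers.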

4.3   Parallel SEPP with More Complex Exceptional Model Classes
For more complex model classes than support – as discussed above – the quality
estimation implemented in SEPP is basically an extension of the standard support
calculation. Therefore, for more complex valuation bases, the runtime of SEPP differs
only by a constant factor from the runtime when using support alone.
Fig. 3. Speedup of SEPP (support quality function) on 27M, in relation to the applied number of
mappers and reducers.

Fig. 4. Corresponding speedup of SEPP on 27M, using the simple linear regression model (Slope)
and the variance model (Variance).



To confirm this, the experiments shown in Figure 2 were repeated with more complex
valuation bases. As a valuation input, the number of items per sequence was calculated
in advance; as the valuation domain, the variance model was used. Furthermore, we
applied the simple linear regression model, taking the first two items of a sequence as
the valuation input, i. e., as the two respective variables (independent, dependent) for
the linear regression modeling.
    The results of these experiments are shown in Table 2, where the column “Factor”
indicates the runtime of the variance model divided by the runtime using simple
support. This factor is about 1.5 for all support thresholds. We observed a similar
behavior for the linear regression model class.


Table 2. SEPP runtimes of the variance and support model classes in minutes on dataset 27M.

                                     Support   Variance Model Class   Support Model Class   Factor
                                      100000                     157                   107     1.47
                                       70000                     239                   164     1.46
                                       50000                     327                   222     1.47
                                       30000                     490                   324     1.51
                                       10000                     803                   543     1.48




    In a further series of experiments, the speedup for the variance model and the simple
linear regression model was determined. The result is shown in Figure 4. The nearly
linear speedup of parallel SEPP is achieved with different model classes. However, for
the simple linear regression model the speedup is slightly worse than for the variance
model. This can be explained by the fact that the variance model’s valuation basis is only
a 3-tuple, whereas the simple linear regression model uses a 5-tuple basis. This leads to
higher effort when sorting and copying the data between mappers and reducers.
4.4   Results on Real-World Datasets

Two real-world datasets were applied for evaluation, i. e., a dataset from MSNBC, and
data from a web server log of a Hungarian news portal. For our experiments, we
applied a simple valuation basis for SEPP, i. e., using the support as quality function.

MSNBC Dataset The dataset “msnbc.com anonymous web data” from the UCI Ma-
chine Learning Repository [21, 27] contains a formatted web server log of the site
msnbc.com for September 28, 1999. The dataset contains all requests for the entire day
that were not answered by caches. The calls were divided into 17 thematic categories,
such as main page, news, technology, and local. A sequence of these categories was
created for each user. In total, the dataset contains 989818 sequences with an average
of 5.7 calls per user. Each category contains between ten and 5000 requests [27].


 Table 3. Runtimes on the MSNBC dataset with a support of 7500 and a limit for serial processing of 10000.


                          Reducers      48 24 12 6 3 2 1
                          Runtime (min) 20 16 15 18 18 22 30



    Table 3 shows the results of seven runs with different numbers of reducers. The
minimum support was set to 7500, and the size limit below which projected databases
are processed with the serial SEPP variant (serial limit) was set to 10000 sequences.
Since this dataset is small relative to the synthetic dataset, only a very low speedup can
be observed; with a larger number of reducers, the runtime even increased. With three
reducers, the serial PrefixSpan step dominated the computation at nine minutes, while
with twelve reducers this step took only 6.5 minutes. In both cases, all but one reducer
finished within two minutes, with the remaining time spent waiting for the last one. This
suggests that there are relatively dense partitions with fewer than 10000 sequences in
the data, or that the distribution of instances between the reducers is not balanced.
    In order to confirm that no significant runtime improvement can be achieved on this
dataset with more than twelve mappers and reducers, several runs with low support
thresholds were performed without serial processing. The result is shown in Figure 5.
As expected, the optimum is at twelve mappers and reducers.



                   Table 4. Runtimes on the Kosarak dataset; serial limit 10000.

                    Support Reducers      48 24 12 6 3 2 1
                    7500    Runtime (min) 16 13 13 14 19 23 38
                    5000    Runtime (min) 18 16 16 18 24 30 50
                    2500    Runtime (min) 29 27 29 33 49 59 NA
Fig. 5. Runtime of SEPP on the MSNBC dataset with different minimum support thresholds, in
relation to different numbers of mappers/reducers.

Fig. 6. Runtime of SEPP on the Kosarak dataset with different minimum support thresholds, in
relation to different numbers of mappers/reducers.


Kosarak Dataset The second applied real-world dataset3 covers a Hungarian news
portal and contains anonymous access sequences of 990002 users. The sequences cover
41270 different pages. The average sequence length is 8.09; the longest is 2498.
    We applied minimum support thresholds of 7500, 5000, and 2500, with a serial limit
of 10000. The runtimes are shown in Table 4 and Figure 6. In the first two cases, the
optimal number of mappers and reducers was twelve, as for the MSNBC dataset (see
Section 4.4). For a support of 2500, the optimum was 24 reducers. As for the synthetic
data, with only 24 available slots for mappers/reducers, it was expected that no further
increase could be observed with 48 mappers/reducers, as also shown in the experiments.


5             Conclusions
In this paper, we have proposed a pattern-based extensible framework for interpretable
machine learning on complex data, in particular focusing on sequence data – utiliz-
ing sequential pattern mining. We showed how to combine exceptional model min-
ing and sequential pattern mining into an extensible approach, building on existing se-
quential pattern mining methods, in particular pattern-growth approaches. We presented
the corresponding method for sequential exceptional pattern discovery using pattern-
growth (SEPP). For this basic algorithm, both serial as well as parallelized variants were
presented. We evaluated the proposed method on synthetic as well as real-world datasets.
Our preliminary results demonstrated the efficacy of the proposed methods.
    For future work, we aim to investigate more complex quality functions as well as in-
terpretability constraints, which we aim to integrate into the proposed approach, also in
different application contexts. In addition, we aim to compare the proposed framework
to further parallelized implementations, e. g., [24]. Yet another interesting direction for
future work is to investigate extended interpretability and sensemaking patterns during
the application of the proposed approach, e. g., [14, 34].
  3 FIMI Repository: http://fimi.uantwerpen.be/
References
 1. Agrawal, R., Srikant, R.: Mining sequential patterns. Tech. rep., Research Report RJ 9910,
    Almaden Research Center, IBM Research Division, San Jose, California (October 1994)
 2. Agrawal, R., Srikant, R.: Mining sequential patterns. In: Proc. ICDE. pp. 3 –14. IEEE (1995)
 3. Agrawal, R., Srikant, R.: Fast Algorithms for Mining Association Rules. In: Proc. VLDB.
    pp. 487–499. Morgan Kaufmann (1994)
 4. Antunes, C., Oliveira, A.L.: Generalization of pattern-growth methods for sequential pattern
    mining with gap constraints. In: Proc. MLDM. pp. 239–251. Springer (2003)
 5. Atzmueller, M.: Subgroup Discovery. WIREs DMKD 5(1), 35–49 (2015)
 6. Atzmueller, M.: Declarative Aspects in Explicative Data Mining for Computational Sense-
    making. In: Proc. International Conference on Declarative Programming. Springer (2018)
 7. Atzmueller, M., Baumeister, J., Puppe, F.: Quality Measures and Semi-Automatic Mining of
    Diagnostic Rule Bases. In: Proc. INAP/WLP. pp. 65–78. No. 3392 in LNAI, Springer (2005)
 8. Atzmueller, M., Baumeister, J., Puppe, F.: Semi-Automatic Learning of Simple Diagnostic
    Scores Utilizing Complexity Measures. Artif. Intell. Med. 37(1), 19–30 (2006)
 9. Atzmueller, M., Doerfel, S., Mitzlaff, F.: Description-Oriented Community Detection using
    Exhaustive Subgroup Discovery. Information Sciences 329, 965–984 (2016)
10. Atzmueller, M., Hayat, N., Schmidt, A., Klöpper, B.: Explanation-Aware Feature Selection
    using Symbolic Time Series Abstraction: Approaches and Experiences in a Petro-Chemical
    Production Context. In: Proc. IEEE INDIN. IEEE Press, Boston, MA, USA (2017)
11. Atzmueller, M., Lemmerich, F.: Fast Subgroup Discovery for Continuous Target Concepts.
    In: Proc. ISMIS. LNCS, vol. 5722, pp. 1–15. Springer, Berlin, Germany (2009)
12. Atzmueller, M., Puppe, F.: SD-Map - A Fast Algorithm for Exhaustive Subgroup Discovery.
    In: Proc. PKDD. pp. 6–17. Springer, Berlin, Germany (2006)
13. Biran, O., Cotton, C.: Explanation and Justification in Machine Learning: A Survey. In:
    IJCAI-17 Workshop on Explainable AI (2017)
14. Bloemheuvel, S., Kloepper, B., van den Hoogen, J., Atzmueller, M.: Enhancing sequential
    pattern mining explainability with markov chain probabilities (abstract). In: DBDBD (2019)
15. Chaudhari, M., Mehta, C.: Extension of prefix span approach with grc constraints for se-
    quential pattern mining. In: Proc. ICEEOT. pp. 2496–2498. IEEE (2016)
16. Dean, J., Ghemawat, S.: Mapreduce: Simplified data processing on large clusters. Commu-
    nications of the ACM 51(1), 107–113 (Jan 2008)
17. Duivesteijn, W., Knobbe, A., Feelders, A., van Leeuwen, M.: Subgroup Discovery Meets
    Bayesian Networks–An Exceptional Model Mining Approach. In: Proc. ICDM. IEEE (2010)
18. Duivesteijn, W., Feelders, A., Knobbe, A.J.: Different Slopes for Different Folks: Mining for
    Exceptional Regression Models with Cook’s Distance. In: Proc. KDD. pp. 868–876 (2012)
19. Duivesteijn, W., Feelders, A.J., Knobbe, A.: Exceptional Model Mining. Data Mining and
    Knowledge Discovery 30(1), 47–98 (2016)
20. Fournier-Viger, P., Lin, J.C.W., Kiran, R.U., Koh, Y.S., Thomas, R.: A survey of sequential
    pattern mining. Data Science and Pattern Recognition 1(1), 54–77 (2017)
21. Frank, A., Asuncion, A.: UCI machine learning repository. http://archive.ics.uci.edu/ml (2010)
22. Fürnkranz, J., Kliegr, T.: A brief overview of rule learning. In: International symposium on
    rules and rule markup languages for the semantic web. pp. 54–69. Springer (2015)
23. Fürnkranz, J., Kliegr, T., Paulheim, H.: On cognitive preferences and the plausibility of rule-
    based models. Machine Learning 109(4), 853–898 (2020)
24. Gan, W., Lin, J.C.W., Fournier-Viger, P., Chao, H.C., Yu, P.S.: A survey of parallel sequential
    pattern mining. ACM Transactions on Knowledge Discovery from Data 13(3), 1–34 (2019)
25. Han, J., Pei, J., Yan, X.: Sequential pattern mining by pattern-growth: Principles and exten-
    sions. In: Foundations and advances in data mining, pp. 183–220. Springer (2005)
26. Han, J., Pei, J., Yin, Y.: Mining Frequent Patterns Without Candidate Generation. In: Proc.
    ACM SIGMOD Intl. Conference on Management of Data. pp. 1–12. ACM Press (05 2000)
27. Heckerman, D.: msnbc.com anonymous web data. http://archive.ics.uci.edu/ml/machine-
    learning-databases/msnbc-mld/msnbc.data.html (1999)
28. Holzinger, A.: Interactive machine learning for health informatics: when do we need the
    human-in-the-loop? Brain Informatics 3(2), 119–131 (2016)
29. Knobbe, A.J., Cremilleux, B., Fürnkranz, J., Scholz, M.: From Local Patterns to Global Mod-
    els: The LeGo Approach to Data Mining. In: Proc. ECML/PKDD LeGo Workshop (2008)
30. Lemmerich, F., Becker, M., Atzmueller, M.: Generic Pattern Trees for Exhaustive Excep-
    tional Model Mining. In: Proc. ECML/PKDD. Springer, Berlin, Germany (2012)
31. Li, X., Huan, J.: Constructivism Learning: A Learning Paradigm for Transparent Predictive
    Analytics. In: Proc. SIGKDD. pp. 285–294. ACM (2017)
32. Liu, S., Wang, X., Liu, M., Zhu, J.: Towards Better Analysis of Machine Learning Models:
    A Visual Analytics Perspective. Visual Informatics 1(1), 48–56 (2017)
33. Lonjarret, C., Robardet, C., Plantevit, M., Auburtin, R., Atzmueller, M.: Why should I trust
    this item? Explaining the recommendations of any model. In: Proc. IEEE International Con-
    ference on Data Science and Advanced Analytics. pp. 526–535. IEEE (2020)
34. Lycett, M., Marshan, A.: Capturing sensemaking pattern during data analysis: A conceptual
    framework. In: Proc. ISD 2016. Springer, Berlin/Heidelberg, Germany (2016)
35. Mabroukeh, N.R., Ezeife, C.I.: A taxonomy of sequential pattern mining algorithms. ACM
    Computing Surveys 43(1), 3:1–3:41 (December 2010)
36. Mathonat, R., Nurbakova, D., Boulicaut, J.F., Kaytoue, M.: Seqscout: Using a bandit model
    to discover interesting subgroups in labeled sequences. In: Proc. IEEE International Confer-
    ence on Data Science and Advanced Analytics. pp. 81–90. IEEE (2019)
37. Molnar, C.: Interpretable Machine Learning. Lulu. com (2020)
38. Ottmann, T., Widmayer, P.: Algorithmen und Datenstrukturen. Spektrum, Heidelberg (2012)
39. Pei, J., Han, J., Mortazavi-Asl, B., Pinto, H., Chen, Q., Dayal, U., Hsu, M.: Prefixspan: Min-
    ing sequential patterns by prefix-projected growth. In: Young, D.C. (ed.) Proc. International
    Conference on Data Engineering. pp. 215–224. IEEE, Los Alamitos (April 2001)
40. Pei, J., Han, J., Mortazavi-Asl, B., Wang, J., Pinto, H., Chen, Q., Dayal, U., Hsu, M.: Min-
    ing sequential patterns by pattern-growth: The PrefixSpan approach. IEEE Transactions on
    Knowledge and Data Engineering 16(11), 1424–1440 (November 2004)
41. Ribeiro, M.T., Singh, S., Guestrin, C.: Why Should I Trust You?: Explaining the Predictions
    of Any Classifier. In: Proc. ACM SIGKDD. pp. 1135–1144. ACM (2016)
42. Samek, W., Wiegand, T., Müller, K.R.: Explainable Artificial Intelligence: Understanding,
    Visualizing and Interpreting Deep Learning Models. arXiv preprint arXiv:1708.08296 (2017)
43. Shabtay, L., Yaari, R., Dattner, I.: A guided fp-growth algorithm for multitude-targeted min-
    ing of big data. arXiv preprint arXiv:1803.06632 (2018)
44. Sternberg, E., Atzmueller, M.: Knowledge-Based Mining of Exceptional Patterns in Logis-
    tics Data: Approaches and Experiences in an Industry 4.0 Context. In: Proc. International
    Symposium on Methodologies for Intelligent Systems. LNCS, vol. 5722. Springer (2018)
45. Vojíř, S., Zeman, V., Kuchař, J., Kliegr, T.: EasyMiner.eu: Web framework for interpretable
    machine learning based on rules and frequent itemsets. Knowl.-Based Syst. 150 (2018)
46. Wrobel, S.: An Algorithm for Multi-Relational Discovery of Subgroups. In: Proc. PKDD.
    pp. 78–87. Springer (1997)
47. Zaki, M.J.: Efficient enumeration of frequent sequences. In: Gardarin, G., French, J.C., Pissi-
    nou, N., Makki, K., Bouganim, L. (eds.) Proc. CIKM. pp. 68–75. ACM, New York (1998)
48. Zaki, M.J.: Parallel sequence mining on shared-memory machines. Journal of Parallel and
    Distributed Computing 61(3), 401 – 426 (2001). https://doi.org/10.1006/jpdc.2000.1695
49. Zaki, M.J.: Spade: An efficient algorithm for mining frequent sequences. Machine learning
    42(1-2), 31–60 (2001)