=Paper=
{{Paper
|id=None
|storemode=property
|title=Leveraging Super-Scalarity and Parallelism to Provide Fast Declare Mining without Restrictions
|pdfUrl=https://ceur-ws.org/Vol-1021/paper_10.pdf
|volume=Vol-1021
|dblpUrl=https://dblp.org/rec/conf/bpm/WestergaardS13b
}}
==Leveraging Super-Scalarity and Parallelism to Provide Fast Declare Mining without Restrictions==
Michael Westergaard 1,2,* and Christian Stahl 1

1 Department of Mathematics and Computer Science, Eindhoven University of Technology, The Netherlands
2 National Research University Higher School of Economics, Moscow, 101000, Russia
m.westergaard@tue.nl, c.stahl@tue.nl

* Support from the Basic Research Program of the National Research University Higher School of Economics is gratefully acknowledged.

Abstract. UnconstrainedMiner is a tool for fast and accurate mining of Declare constraints from logs without imposing any assumptions about the model. Declare models impose constraints instead of explicitly stating event orders. Constraints can impose various choices and orderings of events; constraints typically have understandable names, but for details, we refer to [5]. Current state-of-the-art miners tend to fail due to a computational explosion and employ filtering to reduce it. Our tool is not intended to provide user-readable models, but instead to provide all constraints satisfied by a log. This allows post-processing to weed out uninteresting constraints, potentially obtaining better resulting models than when filtering beforehand out of necessity. Any post-processing (and complexity-reducing filtering) possible with existing miners is also possible with the UnconstrainedMiner; our miner just allows more intelligent post-processing, such as interactive filtering of models, due to having more information available. In our demonstration, we show how the new miner can handle large event logs in a short time, and how the resulting output can be imported into Excel for further processing. Our intended audience is researchers interested in Declare mining and users interested in an abstract characterization of relationships between events. We explicitly do not target end-users who wish to see a Declare model for a particular log (but we are happy to demonstrate the miner on other concrete data).

Processes that are not perfectly understood or have little structure are often more easily modeled using a declarative rather than a classical imperative approach. In a declarative approach, constraints between tasks are described rather than specifying for each task the next task to execute. Declarative modeling is a more recent and less mature approach, which has not yet found widespread application in industry. Declare [7] is an emerging language studied in academia over the last decade.

Existing miners. Typically, the complexity of checking a constraint is <math>O\!\left(m \cdot \tbinom{n}{k} \cdot k!\right)</math>, where m is the number of traces in the log, n is the number of different events in the entire log, and k is the number of parameters of the constraint: the number of ways to assign n events to k ordered parameters, <math>{}_{n}P_{k} = \tbinom{n}{k} \cdot k!</math>, times the number of traces m. For the BPI challenge log from 2012 [1] (we use the 2012 log, as the 2013 log is much simpler, with 819 ≤ m ≤ 7554 and 3 ≤ n ≤ 4), the parameters are m = 13,087 and n = 24, yielding from 314,088 (k = 1) to 66,749,981,760 (k = 5) checks.
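To make the arithmetic behind these counts concrete, the following minimal Java sketch (ours, purely illustrative and not part of the tool) reproduces the numbers above:

<pre>
/**
 * Illustration of the check-count arithmetic from the text: a k-parameter
 * constraint is checked for every ordered assignment of n distinct events
 * to its k parameters, for each of the m traces, i.e., m * nPk checks.
 */
public class CheckCount {
    static long orderedAssignments(int n, int k) {
        long p = 1;
        for (int i = 0; i < k; i++) p *= (n - i); // n * (n-1) * ... * (n-k+1)
        return p;
    }

    public static void main(String[] args) {
        int m = 13_087, n = 24; // BPI challenge 2012 log [1]
        for (int k = 1; k <= 5; k++)
            System.out.printf("k=%d: %,d checks%n", k, m * orderedAssignments(n, k));
        // prints 314,088 for k=1 up to 66,749,981,760 for k=5
    }
}
</pre>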
ProM 6 has a Declare miner which systematically checks each constraint and returns a model comprising the constraints satisfied for a certain percentage of traces. Due to the complexity, this miner employs a couple of tricks, including forcing users to pick among interesting constraints [3], employing a priori reduction to avoid considering rarely occurring events [3], and considering the relationships between constraints to avoid mining some constraints [4]. Even so, the ProM Declare miner did not mine a single constraint such as succession in 24 hours.

Fig. 1: Model with a branch (a hypothetical XOR-split and -join with a "High risk" and a "Low risk" path).

The assumption that rarely occurring events need not be checked crucially depends on the notion of support, i.e., that a constraint is only interesting if it is triggered. This is problematic in a case such as Fig. 1, where we have a hypothetical XOR-split and -join. If the top path is chosen, constraints in the bottom part are never triggered, so their support is low, even though they are arguably more interesting. If we want to mine without enforcing high support, we cannot use a priori reduction. Exploiting relationships between constraints is problematic if we want to display a simpler model, e.g., by removing constraints that are redundant due to transitivity. The problem here is that, generally, the support of a removed constraint cannot be derived from the support of the remaining constraints.

The MINERful miner [2] uses regular expressions and global computations to mine constraints. First, statistics are computed for the log; subsequently, constraints are mined from these. The MINERful miner can mine all constraints for the 2012 BPI challenge log in approximately 26 seconds, but only supports a subset of all constraints. Computing constraints from statistics makes it difficult to add new constraints, as it is necessary to develop and prove rules for doing so. For example, it is far from obvious how to extend this approach to also mine choices.

Removing all assumptions. Instead, we prefer to mine first and, subsequently, filter with full information. Not only does this avoid the problem of support and branches, it also gives us full knowledge of all constraints, so we can remove redundant constraints more intelligently using simple patterns. Finally, this allows us to provide users with a slider and let them interactively (and locally) balance simplicity against support.

Efficient preprocessing. We first make sure that our base mining algorithm is efficient. We transform the log into a simple array of arrays of identifiers instead of actual events. Declare constraints have a (finite-trace) LTL semantics, which can be represented as a finite automaton. Using a precomputed mapping between event identifiers and automaton labels, we obtain a fast and memory-efficient replay of the log. We can mine most of Declare's 34 constraints (excluding 2 with four or more parameters) on the BPI challenge log in 249 seconds using 12 MB of memory (including storing the log and all results). We mine the same set of constraints as MINERful in 32 seconds (vs. 26). This is done automatically, making it easy to add constraints by entering their LTL semantics.
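The replay itself then reduces to one table lookup per event. The following minimal Java sketch illustrates the idea under our own assumptions (the encoding and names are hypothetical, not the tool's actual API):

<pre>
/**
 * Replay of a log against one constraint automaton. The log is an array of
 * arrays of integer event identifiers; the constraint (its finite-trace LTL
 * semantics) is a precomputed deterministic automaton over those identifiers.
 */
public class AutomatonReplay {
    /** delta[state][event] = next state; accepting[state] = trace satisfies the constraint. */
    static int countSatisfiedTraces(int[][] log, int[][] delta, boolean[] accepting) {
        int satisfied = 0;
        for (int[] trace : log) {
            int state = 0;                   // assume state 0 is initial
            for (int event : trace)
                state = delta[state][event]; // one array lookup per event
            if (accepting[state]) satisfied++;
        }
        return satisfied;
    }
}
</pre>

The support of a constraint is then satisfied/m, and adding a new constraint only requires translating its LTL semantics into such an automaton.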
Symmetry reduction. We observe that some Declare constraints are symmetric in their parameters; this is, for example, the case for the choice 1 of 2 constraint. By only checking these constraints for one ordering of the parameters, we halve the number of checks (and the time needed). In general, this reduces the checking complexity to <math>O\!\left(m \cdot \tbinom{n}{k} \cdot k! \,/\, \textstyle\prod_{sg \in SG} |sg|!\right)</math>, i.e., divided by the factorial of the size of each symmetry group sg ∈ SG. The number we divide by tends to get larger exactly when the traditional approach has problems (i.e., when k is large). For example, the most complex standard Declare constraint is choice 1 of 5, which yields the aforementioned 67 billion checks for the BPI challenge log. As this constraint is symmetric in all parameters, this is reduced to 556,249,848 checks (66,749,981,760 divided by 5!). Our implementation can mine all 34 Declare constraints for the BPI challenge log in 287 seconds (so the reduction allows us to check all constraints in the same amount of time in which we could previously only check the easy ones). We now tie with MINERful at 26 seconds (the reason for the smaller gain here is that MINERful does not handle any constraints for which the reduction is large). For the standard constraints, we never have more than two symmetry groups, which means that as the number of parameters goes up, so does the reduction. We compute the symmetry reduction automatically, making it compatible with any added constraint.

Parallelization. As our miner is simple, we do not need any global computations so far. This means our miner is easy to parallelize. We can parallelize in the number of constraints or in the number of traces. With 34 constraints, we can mine each individually. Declare constraints are not equal, however; indeed, the choice 1 of 5 constraint is responsible for more than 50% of the total time spent on the BPI challenge log, making this strategy less compelling. As we generate the parameters for the mining deterministically, we could have each computation thread take care of this on its own, alleviating this (but making mining more complex). A simpler approach, and the one we have chosen, is to split the log and have each computation thread handle part of the log, as sketched below. As we have abolished a priori reduction, we do not need any central coordination in this case, and our overhead is so small (< 1 second) that we can just run the full mining process for each part of the log. The results can be added directly, making mining scale nearly arbitrarily (with constant overhead for preprocessing and aggregation). This has the added benefit of allowing each thread to use less memory (it only needs to store its own part of the log). Employing this allows us to mine all constraints of the BPI challenge log in 174 seconds (using two cores; the reason scalability is not completely linear is the Turbo Boost feature of modern CPUs, which allows one core of a multi-core system to run at extra speed if the other cores are inactive). We beat MINERful at 19 seconds (vs. 26). We have implemented this in a parallel setting (i.e., multiple CPU cores on a single machine), but it is equally applicable in a distributed setting, as synchronization is only necessary for aggregating the results at the end.
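A minimal sketch of this log-splitting strategy, reusing the hypothetical countSatisfiedTraces from the replay sketch above (the thread handling is our own illustration; the tool's actual implementation may differ):

<pre>
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Each thread runs the full mining process on its own slice of the log. */
public class ParallelMining {
    static int countSatisfiedParallel(int[][] log, int[][] delta, boolean[] accepting)
            throws InterruptedException, ExecutionException {
        int threads = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        int chunk = (log.length + threads - 1) / threads;
        List<Future<Integer>> parts = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            int[][] slice = Arrays.copyOfRange(log,
                    Math.min(t * chunk, log.length),
                    Math.min((t + 1) * chunk, log.length));
            // independent mining run on this part of the log; no coordination needed
            parts.add(pool.submit(() ->
                    AutomatonReplay.countSatisfiedTraces(slice, delta, accepting)));
        }
        int total = 0;
        for (Future<Integer> f : parts) total += f.get(); // results add directly
        pool.shutdown();
        return total;
    }
}
</pre>

Because each slice is mined completely independently, the same scheme carries over to a distributed setting, with the final summation collecting remote results instead.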
Super-scalar mining. Using the automaton representation from [6], it is possible to check more than one constraint at a time. This comes at a cost in memory, as the combined automaton is larger than the individual automata, sometimes even exponentially so, though in practice the size is always less than 1,000 states. We call this super-scalar mining, using the term from CPUs, where it refers to the ability to execute multiple instructions simultaneously. We do not wish to break our symmetry reduction from before. Thus, we build a directed acyclic graph with constraints as nodes and an arc from one constraint to another if the first implies that the second holds. We then group constraints by taking each leaf and iteratively grouping it with all predecessors and successors that do not extend the set of parameters and do not have incompatible symmetry groups, i.e., groups that would have to be split up (so [[A], [B]] is not compatible with [[A, B]], as it would split up the symmetry group, but the other way around is fine, as the symmetry group is already split). The rationale is that adding new parameters increases complexity, as does splitting symmetry groups. Adding a constraint with larger symmetry groups does not increase complexity; while such a constraint could be checked more efficiently on its own, adding it to the super-scalar group comes at no extra cost. We check all such maximal groups. We compute this automatically and tailor it to whichever constraints are selected and to any new constraints added. Using super-scalar mining with parallelism, we can mine the entire BPI log in only 57 seconds, and even without parallelism, we can mine the entire log in just 92 seconds. We beat MINERful on their constraints at just over 4 seconds, an improvement by a factor of 6 over 26 seconds. On a server with 8 slower processing cores, MINERful still runs in 26 seconds, whereas the UnconstrainedMiner runs in 30 seconds on the entire set of constraints. Using hyper-threading, the UnconstrainedMiner and MINERful tie in running time, with the UnconstrainedMiner mining more, and more complex, constraints. The gain obtained by super-scalar mining is independent of the log.
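The combination of automata behind super-scalar mining can be sketched as a standard product construction (our own illustration; the state encoding and names are not from [6] or the tool):

<pre>
/**
 * Product of two constraint automata over the same alphabet: a state encodes
 * a pair (a, b), so a single replay of a trace checks both constraints at
 * once. The product has at most |A| * |B| states, which is the memory cost
 * mentioned in the text.
 */
public class ProductAutomaton {
    final int[][] delta;     // product transition table
    final boolean[] acceptA; // first constraint satisfied in this product state
    final boolean[] acceptB; // second constraint satisfied in this product state

    ProductAutomaton(int[][] dA, boolean[] fA, int[][] dB, boolean[] fB, int alphabet) {
        int nA = dA.length, nB = dB.length;
        delta = new int[nA * nB][alphabet];
        acceptA = new boolean[nA * nB];
        acceptB = new boolean[nA * nB];
        for (int a = 0; a < nA; a++)
            for (int b = 0; b < nB; b++) {
                int s = a * nB + b; // encode the state pair (a, b)
                acceptA[s] = fA[a];
                acceptB[s] = fB[b];
                for (int e = 0; e < alphabet; e++)
                    delta[s][e] = dA[a][e] * nB + dB[b][e];
            }
    }
}
</pre>

Iterating this construction over a whole group of constraints multiplies the state counts, which explains the exponential worst case; per the text, the combined automata stay below 1,000 states in practice.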
Fig. 2: Screenshots of our miner running.

Usage. In Fig. 2 (top), we see a screenshot of the configuration of the mining process. All options are saved between runs, making mining from scratch a breeze; the number of threads is computed automatically as the number of cores of the local computer. We can manually inspect the LTL semantics and the constraint automaton on the right. In Fig. 2 (bottom left), we see the tool in the process of mining. The top progress bar shows overall progress, and the next progress bars show overall and individual constraint progress for each computation thread. The text area is continuously updated with information about what each thread is doing. In Fig. 2 (bottom right), we see the end result along with the total processing time. The end result is a list of all mined constraints, which can be imported into Excel for further processing, as shown in Fig. 3.

Maturity, availability, screencast. Our miner is very new but so far very robust, and it uses the same underlying library as the Declare tool (for representing automata) and ProM (for reading logs). The tool is written in Java and runs on all platforms supported by Java. We are currently employing the tool to construct hybrid models containing both declarative constraints and imperative modeling in a block-structured manner. Our miner is available from tinyurl.com/declareminer along with its source code. That page also contains a screencast showing simple use of the tool.

Fig. 3: The mining results imported into Excel.

References

1. BPI Challenge 2012 event log. DOI: 10.4121/uuid:3926db30-f712-4394-aebc-75976070e91f
2. Di Ciccio, C., Mecella, M.: A two-step fast algorithm for the automated discovery of declarative workflows. In: CIDM 2013. IEEE (2013)
3. Maggi, F.M., Bose, R.P.J.C., van der Aalst, W.M.P.: Efficient discovery of understandable declarative process models from event logs. In: CAiSE 2012. LNCS, vol. 7328, pp. 270–285. Springer (2012)
4. Maggi, F.M., Bose, R.P.J.C., van der Aalst, W.M.P.: A knowledge-based integrated approach for discovering and repairing Declare maps. In: CAiSE 2013. LNCS. Springer (2013)
5. Pesic, M.: Constraint-Based Workflow Management Systems: Shifting Controls to Users. Ph.D. thesis, Beta Research School for Operations Management and Logistics, Eindhoven (2008)
6. Westergaard, M.: Better Algorithms for Analyzing and Enacting Declarative Workflow Languages Using LTL. In: BPM 2011. LNCS, vol. 6896, pp. 83–98. Springer (2011)
7. Westergaard, M., Maggi, F.M.: Declare: A Tool Suite for Declarative Workflow Modeling and Enactment. In: Business Process Management Demonstration Track (BPMDemos 2011). CEUR Workshop Proceedings, vol. 820. CEUR-WS.org (2011)