    Know What You Stream: Generating Event Streams
            from CPN Models in ProM 6

                   S.J. van Zelst, B.F. van Dongen, and W.M.P. van der Aalst

                           Department of Mathematics and Computer Science
                          Eindhoven University of Technology, The Netherlands

          Abstract. The field of process mining is concerned with supporting the
          analysis, improvement and understanding of business processes. A range of
          promising techniques has been proposed for process mining tasks such as
          process discovery and conformance checking. However, there are challenges,
          originally stemming from the area of data mining, that have not been
          investigated extensively in the context of process mining. In particular,
          the incorporation of data stream mining techniques w.r.t. process mining has
          received little attention. In this paper, we present new developments that
          build on top of previous work related to the integration of data streams
          within the process mining framework ProM. We have developed means to use
          Coloured Petri Net (CPN) models as a basis for event-stream generation. The
          newly introduced functionality greatly enhances the use of event-streams in
          the context of process mining, as it allows us to be actively aware of the
          originating model of the event-stream under analysis.

          Keywords: Process mining, event-streams, coloured Petri nets, ProM, CPN Tools

1        Introduction
We assume the reader to be familiar with the basics of process mining and refer to [1]
for a detailed overview of the field.
     Streams of events have rarely been studied within the field of process mining. Given
the current state of the art in information technology, we are able to store huge
quantities of information related to business process execution. However, classical
analysis techniques are not able to cope with these quantities of data, e.g. within the
ProM Framework [2] it is currently impossible to analyze an event log whose size exceeds
the computer's physical memory. Moreover, some business process owners simply do not
gain anything from static a posteriori analysis. As an example, a chip manufacturer is
interested in detecting process deviations during production of a batch of chips rather
than after shipment of the batch to a customer. Thus, treating business process data as a
dynamic sequence of events rather than a static event log is a natural next step.
     In [3], we presented a standardized approach that extends the ProM framework with
basic support for handling streaming data. The approach allows us to connect dynamic
and volatile sources of data to the ProM framework. Although a preliminary
implementation of a connection to an external source was provided, the framework has not
boosted the integration of stream based analysis within the process mining community.
A potential cause is the lack of existing quality metrics for process models learned
over streams. In this paper we present the integration of CPN Tools [4] with the
existing stream framework within ProM. CPN Tools can be used to model, execute and
analyze any type of discrete event system in terms of Coloured Petri Nets [5]. The
connection to CPN Tools provides great flexibility w.r.t. generating streams and allows
us to generate streams for which the actual process model is known. This in turn can
greatly help in further development of established quality metrics for stream based
process mining.
    The remainder of this paper is organized as follows. In Section 2 we briefly touch
upon the architecture and implementation of the newly developed integration. In
Section 3 we demonstrate the use of the integration by means of an explanatory case
study. Section 4 concludes the paper and provides pointers to interesting topics for
future work. Additionally, we have recorded a screen-cast in which we discuss the
integration in more detail, based on the case study.

2        Architecture and Implementation
The core of the integration of CPN Tools with the ProM stream framework is an author
entity [3] that generates events and emits these onto a designated stream. The underlying
connection to CPN Tools, i.e. for the purpose of simulation of a CPN Model, is handled
by the Access/CPN framework [6, 7].
    In order to generate events, the author entity needs a CPN model, an initial marking
of the CPN model, a simulator object (SO) and a parameters object (PO). The PO
specifies certain parameters of the author:
    – The total number of times that the model should be executed starting from the initial
      marking, denoted by r_max.
    – The maximum number of steps within a single execution, denoted by s_max.
    – The emission rate of the author, specified as a delay between the emission of
      two consecutive packets, denoted by e_r.
    – The case identification technique. This property specifies which transitions will be
      emitted on the stream upon firing and how the corresponding events are identified.
      Currently, we have implemented two approaches: repetition based and CPN
      variable based.
    – Event decoration. We can choose whether we want to emit all variables associated
      to the firing of a transition within a data packet or only the core elements, being the
      trace identifier and the event name.
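
    The parameters listed above can be grouped into a single configuration object. The
following is a minimal sketch in Python, not the actual ProM implementation: the class
name AuthorParameters, the enum CaseIdentification and all field names are hypothetical
and chosen only to mirror r_max, s_max, e_r, the case identification technique and the
event decoration flag.

```python
import math
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class CaseIdentification(Enum):
    """The two case identification techniques named in the text."""
    REPETITION = "repetition"
    CPN_VARIABLE = "cpn_variable"


@dataclass(frozen=True)
class AuthorParameters:
    """Hypothetical stand-in for the parameters object (PO) of the author."""
    r_max: int = 1                       # number of executions from the initial marking
    s_max: float = math.inf              # maximum number of steps per execution
    emission_delay_ms: int = 50          # e_r: delay between two consecutive packets
    case_identification: CaseIdentification = CaseIdentification.REPETITION
    case_variable: Optional[str] = None  # only used for CPN variable based identification
    decorate_events: bool = False        # emit all bound variables, or core elements only

    def __post_init__(self) -> None:
        if self.r_max < 1:
            raise ValueError("r_max must be at least 1")
        if self.s_max < 1:
            raise ValueError("s_max must be at least 1")
        if self.emission_delay_ms < 0:
            raise ValueError("the emission delay must be non-negative")
        needs_variable = self.case_identification is CaseIdentification.CPN_VARIABLE
        if needs_variable and not self.case_variable:
            raise ValueError("CPN variable based identification needs a case variable")
```

Validating the combination of technique and case variable at construction time is a
design choice of this sketch; it keeps an inconsistent configuration from reaching the
event generation step.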
Fig. 1: Two CPN Model fragments representing different examples of case identification.
(a) CPN Model suitable for the repetition based case identification technique.
(b) CPN Model suitable for the CPN variable based case identification technique.

    In the repetition based case, each repetition of an execution of the CPN Model
is used as a basis for identifying a case. Thus all transitions fired in the first
repetition will have 1 as a case identifier, all transitions fired in the second
repetition will have 2 as a case identifier, etc. In this identification technique, every
transition that is fired will be emitted as an event where the transition name acts as
the event name. As an example of a CPN Model suitable for the repetition based case
identification technique, consider Figure 1a. Within the CPN model we have defined two
variables of type INT, i.e. var i,j: INT;. An example stream originating from the CPN
Model, where r_max, s_max ≥ 2, including event decoration could be S_r = ⟨{trace=1,
concept:name=t1, i=1}, {trace=1, concept:name=t2, j=1}, {trace=2,
concept:name=t1, i=1}, {trace=2, concept:name=t2, j=1}, ...⟩. Note that
within the repetition based case, all events related to trace 1 are emitted before any
events related to trace 2 are emitted, i.e. cases do not run concurrently.
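
    The repetition based technique can be illustrated with a small Python sketch. It
does not call CPN Tools or Access/CPN: the function figure_1a_run is a hypothetical
stand-in for one execution of the model of Figure 1a from its initial marking, and it
simply assumes that t1 fires with i=1 followed by t2 with j=1, in line with the example
stream above. All names are illustrative.

```python
from itertools import islice
from typing import Callable, Dict, Iterator, Optional, Tuple

# A firing is a transition name together with the variable binding it fired with.
Firing = Tuple[str, Dict[str, int]]


def figure_1a_run() -> Iterator[Firing]:
    """Assumed firing sequence of one execution: t1 with i=1, then t2 with j=1."""
    yield ("t1", {"i": 1})
    yield ("t2", {"j": 1})


def repetition_based_stream(
    run: Callable[[], Iterator[Firing]],
    r_max: int,
    s_max: Optional[int],
    decorate: bool = True,
) -> Iterator[Dict[str, object]]:
    """Emit one event per fired transition, using the repetition number as the case."""
    for repetition in range(1, r_max + 1):
        # Restart from the initial marking and take at most s_max steps
        # (s_max=None means no step limit).
        for name, binding in islice(run(), s_max):
            core = {"trace": repetition, "concept:name": name}
            # With decoration, the bound variables are added to the core elements.
            yield {**binding, **core} if decorate else core
```

Calling list(repetition_based_stream(figure_1a_run, r_max=2, s_max=2)) yields the four
events of S_r in the order shown above, since every repetition is completed before the
next one starts.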
    In the CPN variable based approach, the user specifies a variable present
within the CPN model to act as a case identifier. In this case, only those transitions
that fire and that have the specified variable associated will be emitted to the
event-stream. Consider Figure 1b, which depicts a CPN model suitable for CPN variable
based case identification. Again we have defined two variables, i.e. var i,j: INT;. If we
define variable i as the trace identification variable, given r_max ≥ 1, s_max ≥ 3,
a possible stream originating from the CPN Model could be S_v = ⟨{trace=1,
concept:name=t1, i=1}, {trace=2, concept:name=t1, i=2}, {trace=3,
concept:name=t1, i=3}, ...⟩. Note that using CPN variable based case identification
allows us to hide certain transitions present within the model from the stream, i.e.,
transition t2 will never be emitted to the stream as it uses variable j.
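
    The filtering effect of the CPN variable based technique can be sketched in Python
as follows. The sketch works on a plain list of firings rather than on a simulated CPN
model; the function name and the firing sequence used in the usage note are assumptions
made for illustration only.

```python
from typing import Dict, Iterable, Iterator, Tuple

# A firing is a transition name together with the variable binding it fired with.
Firing = Tuple[str, Dict[str, int]]


def variable_based_stream(
    firings: Iterable[Firing],
    case_variable: str,
    decorate: bool = True,
) -> Iterator[Dict[str, object]]:
    """Emit only firings that bind case_variable; its value becomes the case id."""
    for name, binding in firings:
        if case_variable not in binding:
            # Transitions that do not use the case variable stay hidden.
            continue
        core = {"trace": binding[case_variable], "concept:name": name}
        yield {**binding, **core} if decorate else core
```

For an assumed firing sequence [("t1", {"i": 1}), ("t2", {"j": 1}), ("t1", {"i": 2}),
("t2", {"j": 2}), ("t1", {"i": 3})] and case variable i, the generator emits the three
t1 events of S_v and drops both firings of t2.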
    All graphical components w.r.t. the author entity are inherited from the streaming
framework presented in [3]. The only new graphical component of the Stream/CPN
integration framework is the author configuration screen, which provides means to select
r_max, s_max, e_r, the case identification technique, and the event decoration. For an
impression of the UI of the plug-in we refer to the screen-cast accompanying this paper.

3       Case Study
As an explanatory case we have designed a hierarchical CPN model that is used as a
basis for stream generation. The model consists of one root model and two sub-models.
The root model is depicted in Figure 2 (the CPN model can be found at:
https://svn.win.tue.nl/repos/prom/Packages/). The CPN model uses two variables,
i.e. var trace, ignore: INT. The initial marking of the root model is one token of
colset INT, i.e. 1`1, in place source. The transition labeled start is connected to
place source and acts as a token generator. In its binding it uses the trace variable.
If transition start fires, it produces a token with the value of trace in place p1 and
it produces a token with value trace + 1 in place source. All tokens with an even
INT value will be routed to the sub-model named sub even whereas all tokens with
an odd INT value will be routed to the sub-model named sub odd. In routing to the
sub-models the variable ignore is used. The two sub-models are depicted in Figure 3.

Fig. 2: Root CPN Model of the hierarchical model used within the case study.

Fig. 3: Two CPN sub-models used within the case study.
(a) CPN sub-model executed in case of a token with an even INT value.
(b) CPN sub-model executed in case of a token with an odd INT value.
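
    The behaviour of the root model described above, i.e. the token generator and the
routing on even and odd values, can be mimicked by a few lines of Python. This is a
sketch of the described behaviour only, not an export of the CPN model; the function
name and the identifiers sub_even and sub_odd are illustrative spellings chosen for
this sketch.

```python
from typing import Iterator, Tuple


def root_model_steps(n_firings: int, initial_token: int = 1) -> Iterator[Tuple[int, str]]:
    """Yield (trace, sub-model) pairs for n_firings firings of transition start."""
    source = initial_token  # initial marking: a single INT token in place source
    for _ in range(n_firings):
        trace = source        # start binds the token in source to variable trace
        source = trace + 1    # ... and puts a token with value trace + 1 back in source
        # The token placed in p1 is routed by the parity of its value.
        target = "sub_even" if trace % 2 == 0 else "sub_odd"
        yield trace, target
```

Since the initial token has value 1, the first case is routed to the odd sub-model, the
second to the even sub-model, and so on in alternation.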
    After importing the hierarchical model in the ProM framework, we configure an
event-stream with the following parameters: r_max = 1, s_max = ∞, e_r = 50 ms,
case identification = CPN Variable with value trace, and event decoration = true.
After the event-stream object is created, we connect a stream-based implementation of
the Inductive Miner [8] to it. After receiving a number of events, the stream-based
Inductive Miner returns the Petri net depicted in Figure 4.

Fig. 4: Result of applying a stream-based implementation of the Inductive Miner to the
event-stream generated by the hierarchical CPN model.
    Although the stream-based miner is not able to discover hierarchy, the resulting
model aligns acceptably well with the input model, i.e., from a control-flow
perspective it exactly describes all possible traces that are emitted onto the
event-stream.

4   Conclusion
The newly presented CPN extension of the stream framework within ProM enables
researchers, business users, and developers to experiment with the concept of streaming
data within a process mining context. The extension allows the user to import a CPN
model into ProM, using any concept present within CPN Tools, e.g. time, hierarchy, etc.
The user is able to specify several parameters of the accompanying stream, such
as emission rates, event decoration and the trace identification technique.
    An interesting direction for future work concerns support for the use of multiple
case identification variables. This would allow us to discover multiple perspectives of
the model under study. Another interesting direction is the development of a stream
evaluation framework which allows us to manipulate certain elements of the stream, e.g.
case arrival rates, throughput time, etc., in order to investigate the impact of these
parameters on the stream-based algorithm under study.

