=Paper= {{Paper |id=Vol-2454/paper_73 |storemode=property |title=Temporal Deviations on Event Sequences |pdfUrl=https://ceur-ws.org/Vol-2454/paper_73.pdf |volume=Vol-2454 |authors=Janina Sontheim,Florian Richter,Thomas Seidl |dblpUrl=https://dblp.org/rec/conf/lwa/SontheimR019 }} ==Temporal Deviations on Event Sequences== https://ceur-ws.org/Vol-2454/paper_73.pdf
        Temporal Deviations on Event Sequences
               Janina Sontheim, Florian Richter, and Thomas Seidl
                  Ludwig-Maximilians-Universität, Munich, Germany
                    {sontheim, richter, seidl@dbs.i.lmu.de}




        Abstract.    Time deviations in business processes - depending on gradi-
        ent and severity - are crucial for the performance of a business and nally
        leading to a gain or a loss in return and reputation. Therefore, focusing
        on the time perspective of processes in general is very important. Even
        more important is the temporal behavior of a single execution, a case, to
        nd diculties and potential in the process. The identication of cases
        that dier from the default execution allows to understand individual
        instances.
        Conservative data mining on event sequences, so called process min-
        ing, exclusively focuses on structural aspects of the process. These ap-
        proaches, however, are unaware of temporal aspects regarding acceler-
        ations or decelerations of activity execution times and neglect a very
        powerful adjusting screw. Our novel signature for cases tackles this task
        by representing cases depending on their temporal deviation behavior.
        Thus processes with their cases can be monitored on a entirely new level
        and anomalies and derivations regarding time can be identied.
        Keywords:    Process Mining · Case Proling · Time Deviation.


1     Introduction
Determining the conformance of a singular case is one of the three key tasks
in process mining beside process discovery and model enhancement. It conrms
case compliance to the assumed underlying process model structure. Agrawal et
al. were the rst who tackled this topic in process mining in [2]. Senderovich et
al. dealed in [6][5] with mining of process delays on event-level. They focused on
structural deviations from the baseline process which is also the issue with all
current conformance checking approaches and neglect thereby the temporal per-
spective. We focus on the temporal perspective and thus improve the spectrum
of mining possibilities in the eld of conformance checking.
    For example considering a manufacturing process with some artisans produc-
ing chairs, as skeched in Fig. 1. Every artisan crafts components rst. Then they
assemble all the pieces to build the whole chair. Finally, the chair is checked
and deciencies get remedied. In the example in Fig. 1 there are two prominent
variants which do temporally not conform to the preceding cycles. Here the so
far published conformance checking approaches can only detect that both cases
act conform to the desired process.
    Copyright c 2019 for this paper by its authors. Use permitted under Creative Com-
    mons License Attribution 4.0 International (CC BY 4.0).
2       J. Sontheim, F. Richter et al.

         Origin Process:
         100%   Craft Components           Assemble Chair        Remedy Deficiencies

         Variants:
          81%   Craft Components -1        Assemble Chair        Remedy Deficiencies   +1

          19%   Craft Components      +2   Assemble Chair   -1   Remedy Def.     -2


Fig. 1. Partitioning the process into two variants reveal, that a minority of cases needs
less time and should be selected as a best practice for future cases.


    Another process mining key task is model enhancement. Of our special inter-
est is the direction regarding the temporal perspective, for which Cheikhrouhou
et al. wrote a survey [3] outlining existing approaches and ongoing research chal-
lenges. A subarea of it is time prediction. Van der Aalst et al. presented in [1]
an approach to predict the completion time of running instances. Though time
predictions are currently just for single events ignoring the existence of time cor-
relation between events of the same case. In our previous example e.g. there is
a dependency between the event Crafts Components and Remedy Deciencies.
With our temporal deviation signatures this eld gets a new perspective due to
considering event dependencies.
    One of our strongest areas of application is clustering which is a further
topic in process mining. Related to clustering is the topic of anomaly detection.
Rogge-Solti and Kasneci covered in [4] an anomaly detection approach with a
temporal perspective, though it is done after the process model is created from
the log whereby it can just nd temporal anomalies regarding the process model.
Our approach uses the log le to calculate the signature and hence it does not
contain any process modeling errors. Our signature can support clustering with
an additional view. Each case might be dierent but there are other cases that
are similar to the viewed case. Using our new signature for cases we are able
to dene a clustering regarding time aspects and thus among others remaining
time prediction will become more precise.
    Initially we focus on clustering of cases regarding temporal aspects. Therefore
we urgently need a signature for cases to be able to cluster them. While repre-
sentations of structural deviations of cases are researched in process mining, a
representation of temporal signatures for cases was not investigated to the best
of our knowledge.

2    Case Signatures for Temporal Deviations
To formally introduce events and logs we start with dening the activity space
A as the set of actions which can occur during a process execution. An event
e = (c, a, t) is then dened as an aggregation of a case identier c ∈ N, an activity
a ∈ A and a timestamp t ∈ N. The event space is denoted as E . The set of all
events containing the same case identier is called case. Some abbreviations will
help to keep the following explanations clearer: For any event e = (c, a, t) we
                                       Temporal Deviations on Event Sequences              3

                  strong
               deceleration
                 1             weak              no deviation / no occurrence
                            deceleration
                    ab
               aa             ad ba bb bc        ca cb     cc        da db dc dd
                         ac
                                            bd                  cd
                      weak                         medium
                -1 acceleration                  acceleration

Fig. 2. The graphical representation of a temporal deviation signature. The signature
contains all possible pairs for four activities a, b, c, d, plotted on the x-axis. The y-axis
corresponds to the z-scoring.



dene e.c = c, e.a = a and e.t = t. A log L is a (multi-)set of events resp. a set
of cases.
    We compute time intervals between pairs of activities within the same case.
Only considering directly consecutive activities should be avoided due to po-
tential temporal correlations. Instead we consider all succeeding activity pairs
within a case. This step requires quadratic eort. If the application consists of
rather long cases, it is reasonable to trade performance for accuracy. Each trace
can now be mapped to a vector of durations, as shown in Fig. 2, where blue bars
on top of the zero line indicate longer durations and red bars below indicate
shorter durations relative to the process means. The dimensions of the vectors
dier depending on the case lengths.
    As the aim of the signature is among others to support nding clusters of
traces it is very convenient that any clustering method for vector data can be
applied at this state. Since we are especially interested in abnormal temporal
behavior we use a normalization focusing on deviations. Thus we use z-scoring
as it puts emphasis on the degree of deviation by normalizing with an attributes'
standard deviation and mean value. Although this is mostly useful for Gaussian
distributed values, it works quite ne in process data due to the large amount
of events in most process logs. This allows us to utilize the central limit theo-
rem of statistics if we assume that activities among dierent cases are mostly
independently and identically distributed. Working with values relative to their
variance instead of absolute values counters vastly the balancing of dimensions.
    In many applications the gradient of small deviations is more important
than large dierences between large values. A small deviation of few minutes
can already point towards a major problem while it does rarely matter if an
event is delayed by 12 hours or by 24 hours. Here we use again a simple method
by applying a sigmoid function. The particular choice is not very important so
we apply the fastest one: S(x) = x/(1 + |x|). Applying these steps leads us to
a vector representation of a case containing temporal and structural properties.
One should keep in mind that the common vector space of all process instances
has a very high dimension although each case exists only in a lower dimensional
subspace.
4          J. Sontheim, F. Richter et al.

           Case       Activity     Timestamp
       1
       2
                  A
                  A
                                 0
                                 0
                                               C1   A               B                       C
       3          A              0
       4
       5
                  A
                  A
                                 0
                                 0
                                               C2   A                       B                   C
       3          B              2.8
       1
       5
                  B
                  B
                                 3.3
                                 4.4
                                               C3   A           B                       C
       2          B              4.5
       4
       3
                  B
                  C
                                 5.0
                                 7.3
                                               C4   A                       B                       C
       1          C              8.1
       2
       4
                  C
                  C
                                 8.9
                                 9.2
                                               C5   A                   B                           C
       5          C              10.0
                                                    0   1   2   3       4       5   6   7   8   9


Fig. 3. An example process log contains 5 short traces with the same structure but
dierent temporal behavior, although the start time is always the same. This log is
sorted by the timestamp.


    Given a process with activity space A we call R = [−1, 1]|A×A| the deviation
signature space. Let µ be the mean, σ the standard deviation of the chosen dis-
tribution, and (c, ai , ti ), (c, aj , tj ) ∈ C . Then we dene the temporal deviation
signature of case c, with c = h(c, a1 , t1 ), . . . , (c, an , tn )i, as the vector vc ∈ R:
                                           
                                          S |ti −tj |−µ(ai ,aj )
                                                                  
                                                  σ(ai ,aj )        ,i < j
                          vc (ai , aj ) =
                                          0                        , otherwise.

    To illustrate the previous steps in an example we give a small sample log
in Figure 3 with a schematic representation. The mean value and the standard
deviation of all activity pairs is then computed: µ(a,b) = 4.0, σ(a,b) = 0.82,
µ(a,c) = 8.7, σ(a,c) = 0.93, µ(b,c) = 4.7, σ(a,b) = 0.49. For case 3 we compute the
temporal deviation signature consisting of the activity pairs (ab, ac, bc). The rel-
evant interim times are (2.8, 7.3, 4.5) and the z-scoring is (−1.47, −1.51, −0.41).
After the application of S we receive the nal temporal deviation signature
(−0.59, −0.60, −0.29). For all cases 1 to 5 of this example process the temporal
deviation signatures are shown in comparison ordered as (ab, ac, bc):
                                                         
      −0.46         0.38         −0.59         0.55          0.33
v1 = −0.39 v2 =  0.18  v3 = −0.60 v4 =  0.35  v5 = 0.58
       0.17        −0.38         −0.29        −0.51          0.65


3    Research Directions
With the temporal deviation signature we present a novel process representation
which puts emphasis on the very important temporal view. The identication
of temporal deviations reveals great insights and improves a process or avoids
drawbacks. For future directions a clustering of cases with similar temporal de-
viation signatures can be modeled to reveal various clusters of process variants
across a whole process which leads to great possibilities for businesses.
                                    Temporal Deviations on Event Sequences          5

References
1. Van der Aalst, W.M., Schonenberg, M.H., Song, M.: Time prediction based on
   process mining. Information Systems 36(2), 450475 (2011)
2. Agrawal, R., Gunopulos, D., Leymann, F.: Mining process models from workow
   logs. In: EDBT. pp. 469483 (1998)
3. Cheikhrouhou, S., Kallel, S., Guermouche, N., Jmaiel, M.: The temporal perspective
   in business process modeling: a survey and research challenges. Service Oriented
   Computing and Applications 9(1), 7585 (2015)
4. Rogge-Solti, A., Kasneci, G.: Temporal anomaly detection in business processes. In:
   International Conference on Business Process Management. pp. 234249. Springer
   (2014)
5. Senderovich, A., Weidlich, M., Gal, A.: Temporal network representation of event
   logs for improved performance modelling in business processes. In: International
   Conference on Business Process Management. pp. 321. Springer (2017)
6. Senderovich, A., Weidlich, M., Yedidsion, L., Gal, A., Mandelbaum, A., Kadish, S.,
   Bunnell, C.A.: Conformance checking and performance improvement in scheduled
   processes: A queueing-network perspective. Information Systems 62, 185206 (2016)