A Novel Fitness Improvement Method for
                                         Mined Business Process Models

                                                       Yaguang Sun and Bernhard Bauer

                                 Software Methodologies for Distributed Systems, University of Augsburg, Germany
                                          {yaguang.sun,bernhard.bauer}@informatik.uni-augsburg.de


                                      Abstract. Business process model discovery (BPMD) is a significant
                                      research topic in the business process mining area. The present BPMD
                                      techniques encounter great challenges while dealing with real-life event
                                      logs that contain complex process behaviors. As a result, non-fitting pro-
                                      cess models might be obtained. In this paper, we propose a new mecha-
                                      nism for locating and handling the process behaviors recorded in event
                                      logs which cannot be expressed by the utilised model discovery algo-
                                      rithms.

                                      Keywords: Business Process Mining, Business Process Model Discov-
                                      ery, Process Model Fitness Improvement


                                1   Introduction
                                As one of the most important research directions in business process mining
                                area, the present business process model discovery (BPMD) techniques meet
                                great challenges while trying to mine process models from real-life event logs
                                that usually stem from the business processes implemented in highly flexible
                                environments, e.g., healthcare, customer relationship management (CRM) and
                                product development [1, 3]. Such real-life logs often contain complex process
                                behaviors and utilising existing BPMD techniques to mine these logs might gen-
                                erate non-fitting process models.
                                    The mining algorithm enhancement-based strategy has been put forward in
                                the literature for solving the problem of low-fitness process models mined from
                                real-life event logs. Such a strategy aims at improving the expressive ability of
                                existing BPMD algorithms or developing new algorithms that are able to model
                                more complex workflow patterns.
                                    In this paper, we develop a new method which inherits the basic idea of
                                mining algorithm enhancement-based strategy for assisting the present BPMD
                                techniques in generating high-fitness process models from real-life event logs. In
                                our method, the fitness improvement problem for the inaccurately mined process
                                models is surveyed from a new perspective and redefined as an issue of locating
                                the inexpressible process behaviors recorded in event logs and then transforming
                                them into expressible behaviors for the utilised BPMD algorithms. The structure
                                of the main contents of this paper is organised as:


                                Copyright c by the paper’s authors. Copying permitted only for private and academic
                                purposes.
                                In: S. España, M. Ivanović, M. Savić (eds.): Proceedings of the CAiSE’16 Forum at
                                the 28th International Conference on Advanced Information Systems Engineering,
                                Ljubljana, Slovenia, 13-17.6.2016, published at http://ceur-ws.org


CEUR
                  ceur-ws.org
Workshop      ISSN 1613-0073
Proceedings
26       Yaguang Sun and Bernhard Bauer

    - In Section 2, a novel method named IDC is put forward which is able to
      help the employed BPMD techniques mine more fitting process models from
      real-life logs.
    - In Section 3, we carry out the proposed method IDC on an example event
      log to test the correctness and effectiveness of it.
    - In Section 4, the related work from the academia is reviewed.
    - In Section 5, a conclusion together with the future problems that we are
      going to solve are elaborated.

2     Approach Design
We firstly introduce a method for detecting and structuring the process be-
haviors from real-life event logs in Section 2.1. Afterwards, based on the method
presented in Section 2.1 a heuristic technique named IDC for helping the utilised
BPMD algorithms mine more fitting process models is put forward in Section
2.2.

2.1    Construct Process Behavior Space (PBS)
Business process behaviors from real-life event logs cannot be analysed before
they are extracted and structured in an appropriate form. We solve the PBS
construction problem by providing the following two concepts:
Definition 1. (behavior-related activity)
Let SA be the set of activities for event log E. Let E represents a direct relation
between any two activities from SA. For instance, for two activities a, b ∈ SA,
a E b = true if activity a is directly followed by b at least once in E. Symbol
⇒E represents a behavior-based relation between any two activities from SA.
For two activities c, d ∈ SA, c ⇒E d = true if c E d = true or d E c = true
and d is also called a behavior-related activity of c.
Definition 2. (behavior-related sub-trace)
Let SA be the set of activities and ST be the multiset of traces for event log E.
Let t ∈ ST be a trace from E, st v t be a sub-trace of t and SAst be the set of
activities for st. Given an activity a ∈ SA, st is a behavior-related sub-trace of
a if ∀b ∈ SAst ∧ b 6= a such that a ⇒E b and ∃c ∈ SAst such that c = a.
     Take a simple event log E1 = {< l1 , m1 , n1 , x1 , y1 , z1 >15 , < l1 , n1 , m1 , y1 , z1 >15
, < l1 , m1 , x1 >3 , < l1 , n1 , m1 , z1 >5 } (utilised for the rest of Section 2) as an exam-
ple. According to Definition 1, the activity n1 from E1 has three behavior-related
activities which are activity l1 , m1 and x1 . Then, according to Definition 2, a set
of behavior-related sub-traces {< l1 , m1 , n1 , x1 >15 , < l1 , n1 , m1 >20 } for activity
n1 can be extracted from E1 . Let Ξ : (ST + , SA+ , SA) → SBT be a function
for finding all of the behavior-related sub-traces for a specific activity from a
given trace, where ST + represents the set of traces, SA+ represents the set of all
possible sets of behavior-related activities, SA represents the set of activities and
SBT stands for the set of all possible multisets of behavior-related sub-traces.
The PBS construction procedure is described in Algorithm 1.
   A Novel Fitness Improvement Method for Mined Business Process Models            27

Algorithm 1 Construct the PBS for a specific event log E (CPBS)
Input: an event log E, the set of activities SA for E
    Let BA be a set of behavior-related activities.
    Let BT be a set of behavior-related sub-traces.
    Let PBS be a set of sets of behavior-related sub-traces.
 1: BA ← null, BT ← null, PBS ← null
 2: for each activity a ∈ SA do
 3:   for each trace t1 ∈ E do
 4:      for each activity b ∈ t1 do
 5:        if b ∈
                / BA && a ⇒E b = true then
 6:           BA ← BA ∪ b
 7:        end if
 8:      end for
 9:   end for
10:   for each trace t2 ∈ E do
11:      BT ← BT ∪ Ξ(t2 , BA, a)
12:   end for
13:   PBS ← PBS ∪ BT
14:   BA ← null
15:   BT ← null
16: end for
Output: a set of sets of behavior-related sub-traces PBS


2.2   Detection and Conversion of Inexpressible Process Behaviors

In this subsection, we firstly define a new concept called activity environment
item. Afterwards, a new algorithm named IDC that is able to assist in detect-
ing and converting the inexpressible process behaviors related to one particular
activity is presented.

Definition 3. (activity environment item)
Let SA be the set of activities from event log E, activity a, b and c are three
activities from SA, the tuple (b, c) is an environment item of activity a if ∃t ∈ E
such that < b, < a . . . >, c >v t, where t stands for a trace from E and < a . . . >
represents a sub-trace that only consists of activity a.

    Take the event log E1 mentioned in Section 2.1 as an example. According to
Definition 3, the activity n1 ∈ E1 has two types of environment items which are
(m1 , x1 ) and (l1 , m1 ). Activity l1 ∈ E1 also has two types of environment items
that are (null, m1 ) (the value null indicates that l1 appears as a starting activity)
and (null, n1 ). Let Θ : (SA, SBT ) → SEI + be an environment item searching
function for a specific activity, where SA represents the set of activities, SBT
represents the set of all multisets of behavior-related sub-traces and SEI + stands
for the set of all sets of environment items. Let Φ : (SA, SEI, SBT, SNA) → SBT
be an activity conversion function, where SEI stands for the set of environment
items and SNA represents the set of activity names. Given an activity, function
Φ is capable of converting this activity into a new activity (with a new name)
28      Yaguang Sun and Bernhard Bauer

Algorithm 2 Inexpressible behaviors detection and conversion (IDC)
Input: an activity a from event log E, the PBS for log E, a model fitness threshold
         ε, a model fitness improvement threshold η
    Let BSTa and BSTt be two sets of behavior-related sub-traces.
    Let SEIa and SEIr be two sets of environment items.
    Let f1 and f2 be two variables of type Double.
    Let i be a variable of type Integer.
    Let eit be an environment item.
 1: f1 ← 0, f2 ← 0, i ← 0
 2: eit ← null, BSTa ← null, BSTt ← null, SEIa ← null, SEIr ← null
 3: BSTa ← PBS[a] # collect the set of behavior-related sub-traces for a from PBS
 4: SEIa ← Θ(a, BSTa ) # find the set of environment items for activity a
 5: f1 ← Σ(Ω(BSTa ), BSTa ) # calculate the fitness of the model mined from BSTa
 6: while (|SEIa | > 0 && f1 < ε) do
 7:    for each environment item eia ∈ SEIa do
 8:       BSTt ← Φ(a, eia , BSTa , ai ) # transform a into ai under environment eia
 9:       if (Σ(Ω(BSTt ), BSTt ) > f2 ) then
10:          f2 ← Σ(Ω(BSTt ), BSTt )
11:          eit ← eia
12:       end if
13:    end for
14:    if (f2 − f1 > η) then
15:       f1 ← f2
16:       BSTa ← Φ(a, eit , BSTa , ai )
17:       SEIr ← SEIr ∪ eit
18:       remove eit from SEIa
19:       i←i+1
20:    else
21:       break
22:    end if
23: end while
Output: a set of environment items SEIr for activity a.


under a certain environment in all of the behavior-related sub-traces of the given
activity. For the activity n1 ∈ E1 , given an environment item ei = (m1 , x1 )
of n1 , the set of behavior-related sub-traces BSTn1 = {< l1 , m1 , n1 , x1 >15 , <
l1 , n1 , m1 >20 } of n1 and a new activity name n1 0 , Φ(n1 , ei, BSTn1 , n1 0 ) = {<
l1 , m1 , n1 0 , x1 >15 , < l1 , n1 , m1 >20 }. Let SST be the set of all possible multisets
of traces, Ω : SST → SM be a workflow discovery algorithm, where SM is the
set of process models. Σ : (SM, SST ) → SV represents a process model fitness
evaluation schema with an input of a process model and its relevant set of traces
and an output of an assessed value from SV (the set of all possible values output
by Σ). The details of IDC is described in Algorithm 2.
      For a specific activity a from log E, the algorithm IDC firstly obtains the
set of behavior-related sub-traces BSTa for a through step 3, the set of environ-
ment items SEIa for a by step 4 and the fitness of the behavior-related model
(by mining the generated set of behavior-related sub-traces BSTa ) of a by step
    A Novel Fitness Improvement Method for Mined Business Process Models        29

5. Afterwards, steps 6 − 13 try to find the environment item of a under which
converting a into ai in the set of behavior-related sub-traces of a will help gen-
erate a behavior-related model (by mining BSTt ) with maximal fitness and the
discovered environment item is put in eit and the value of the maximal fitness
is stored by f2 . Then, it is checked if the fitness improvement acquired is larger
than or equal to a threshold η by step 14. If this is the case, the environment
item is put in set SEIr (step 17) which will be finally output by IDC. The orig-
inal set of behavior-related sub-traces is updated (step 16) by transforming the
relevant activity under the found environment which is removed from SEIa later
by step 18 and IDC continues to search for the next environment item if SEIa
is not empty and the fitness of the new behavior-related model is less than a
target value ε. Otherwise, the technique IDC stops.


3     Evaluation

In our experiment, we utilise the Heuristics Miner (HM) [7] from ProM 61 for
mining process models from event logs. At present, several application instances
[9–14] of the mining algorithm enhancement-based strategy (introduced in Section
1) have been put forward in the literature. These proposed methods are able to
help mine high-fitness process models expressed by Petri net [15]. However, few
efforts have been made to help the HM mine high-fitness models. According
to [16], the HM which expresses process model by Heuristics net is one of the
most popular BPMD tools in the ProM framework and this is why we adapt our
technique to HM for the evaluation. Adapting our technique to other BPMD
algorithms will be a future job. Furthermore, we use the ICS fitness [8] for
assessing the accuracy of the mined models because it has a computationally
efficient calculative process.
     We tested the correctness and effectiveness of our technique by utilising an
example event log Ee . As shown in Figure 1, log Ee has nine activities and 523
traces and the process model generated by implementing the HM on Ee has a
low ICS fitness value which is 0.7748.
     In the process of our experiment, we firstly utilise the algorithm CPBS (in-
troduced in Section 2.1) to build the PBS for log Ee . Afterwards, the algorithm
IDC (introduced in Section 2.2) is used to analyse the built PBS for discovering
the activities together with their relevant environment items that are the main
factors to turn out the inexpressible process behaviors for HM in log Ee . The
model fitness threshold ε is set to 1 and the model fitness improvement threshold
η is set to 0.3 for IDC.
     According to Figure 2, the method IDC finds that the activity F under
environment (D, E) and (H, I) in log Ee is the key factor for generating the
process behaviors that cannot be expressed by HM. With this discovery from
IDC, we replace the activity F under environment (H, I) by a new activity
named F0 and the activity F under environment (D, E) by using F1 in log Ee so
1
    http://www.promtools.org.
30        Yaguang Sun and Bernhard Bauer


     An Example Event Log : Ee                          Process Model Generated by Heuristics Miner

          <A, B, C, D, F, E, G>1

       <A, B, C, D, E, H, F, I, G>64

       <A, B, C, D, E, F, H, I, G>2                                        D
      <A, B, C, D, E, H, I, F, G>127      A         B         C                     E       H         I
                                                                                                              G
          <A, B, C, D, E, F, G>1                                           F

       <A, B, C, F, D, E, H, I, G>78
      <A, B, C, D, F, E, H, I, G>247
                                                    ICS Fitness : 0.7748
        <A, B, C, D, E, H, I, G>3


Fig. 1. The example event log Ee and the business process model mined from Ee by
HM.


that a new log Ee∗ (in Figure 2) can be generated. As shown in Figure 2, a more
fitting process model (with a fitness value 0.9987) is output by HM executed on
the newly generated log Ee∗ . This positive evaluation results indicate that our
technique IDC successfully locate the inexpressible process behaviors from Ee
for HM.


     Newly Generated Log : Ee*                          Process Model Generated by Heuristics Miner

         <A, B, C, D, F1, E, G>1

      <A, B, C, D, E, H, F0, I, G>64
                                                                                E       H        I
       <A, B, C, D, E, F, H, I, G>2
                                                                                            F0
                                    127                                                                       G
      <A, B, C, D, E, H, I, F, G>                                 D        F1
                                1                                                                         F
         <A, B, C, D, E, F, G>            A     B         C

      <A, B, C, F, D, E, H, I, G>78
      <A, B, C, D, F1, E, H, I, G>247
                                              ICS Fitness : 0.9987
        <A, B, C, D, E, H, I, G>3


Fig. 2. The newly generated event log Ee∗ and the business process model mined from
Ee∗ by HM.


4    Related Work
In the literature, several effective techniques stemming from the mining algo-
rithm enhancement-based strategy have been devised. Most of these presented
techniques are developed to mine fitting process models expressed by Petri net.
    A Novel Fitness Improvement Method for Mined Business Process Models       31

The authors in [10] put forward the State-based Region Miner which is able to
generate different categories of place-irredundant Petri nets from the transition
systems and assure the fitness of the generated Petri nets. In [9], the authors
point out that the BPMD approaches partly based on heuristic methods cannot
guarantee an accurate mining results. Then, they develop a series of methods
for adapting the technique of Petri net synthesis to the BPMD area so that
the advantage from the Petri net synthesis can be utilised to mine high fitness
process models from real-life event logs. The authors in [11] propose the ILP
Miner based on the idea that places can restrict the possible firing sequences
of a Petri net. The ILP Miner assures to mine highly fitting Petri nets whose
sizes are independent of the number of events in the event logs. The authors
in [12, 13] put forward the Inductive Miner which guarantees to return a fitting
model that is also sound. The process model repair technique is presented by
the authors in [14] for generating a high-fitness process model. Such a technique
tries to divide the original log into several sublogs with non-fitting subtraces.
For every sublog, a sub-model is obtained and added to the original model at
the appropriate place.


5    Conclusion

In this paper, we mainly proposed a technique IDC for helping detect and con-
vert the inexpressible process behaviors for the utilised BPMD techniques from
particular real-life event logs. Through the evaluation results from Section 3, we
demonstrated the effectiveness of our technique.
    A fitting model might still be complex. Our future job will be focused on
designing a new trace clustering technique [2–6] that contains two steps where
the first step focuses on building sub-models with lower complexity and the
second step focuses on improving the accuracy of the mined sub-models by using
the technique proposed in this paper. In the meantime, we will also validate our
method on some other real-life cases.


References

 1. W.M.P. van der Aalst. Process Mining: Discovery, Conformance and Enhancement
    of Business Processes. Springer, Berlin, 2011.
 2. M. Song, C.W. Gnther and W.M.P. van der Aalst. Trace Clustering in Process
    Mining. In Business Process Management Workshops 2008, volume 17 of Lecture
    Notes in Business Information Processing, pages 109-120. Springer, Berlin.
 3. J.D. Weerdt, S. vanden Broucke, J. Vanthienen and B. Baesens. Active Trace
    Clustering for Improved Process Discovery. IEEE Transactions on Knowledge and
    Data Engineering, 25(12):2708-2720, 2013.
 4. Y.G. Sun and B. Bauer. A Novel Top-Down Approach for Clustering Traces. In
    Advanced Information Systems Engineering (CAiSE 2015), volume 9097 of Lecture
    Notes in Computer Science, pages 331-345.
32      Yaguang Sun and Bernhard Bauer

 5. C.C. Ekanayake, M. Dumas, L. Garcia-Banuelos and M. La Rosa. Slice, Mine
    and Dice: Complexity-Aware Automated Discovery of Business Process Models.
    In Business Process Management (BPM 2013), volume 8094 of Lecture Notes in
    Computer Science, pages 49-64.
 6. L. Garcia, M. Dumas, M.L. Rosa, J.D. Weerdt and C.C. Ekanayake. Controlled
    Automated Discovery of Collections of Business Process Models. Inf. Syst., 46:85-
    101, 2014.
 7. A.J.M.M. Weijters, W.M.P. van der Aalst and A.K. Alves de Medeiros. Process
    Mining with the Heuristics Algorithm. TU Eindhoven, BETA Working Paper Series
    166, 2006.
 8. A.A. de Medeiros. Genetic Process Mining. PhD. thesis, Eindhoven University of
    Technology (2006).
 9. R. Bergenthum, J. Desel, R. Lorenz and S. Mauser. Process Mining Based On
    Regions of Languages. In Business Process Management (BPM 2007), volume 4714
    of Lecture Notes in Computer Science, pages 375-383.
10. J. Cortadella, M. Kishinevsky, L. Lavagno and A. Yakovlev. Deriving Petri Nets
    from Finite Transition Systems. IEEE Transactions on Computers, 47(8):859-882,
    1998.
11. J. van der Werf, B. van Dongen, C. Hurkens and A. Serebrenik. Process Discovery
    Using Integer Linear Programming. Applications and Theory of Petri Nets, pages
    368-387, 2008.
12. S.J.J. Leemans, D. Fahland and W.M.P. van der Aalst. Discovering Block-
    Structured Process Models from Event Logs-A Constructive Approach. In: Petri
    Nets, volume 7927 of Lecture Notes in Computer Science, pages 311-329, 2013.
13. S.J.J. Leemans, D. Fahland and W.M.P. van der Aalst. Discovering Block-
    Structured Process Models from Event Logs Containing Infrequent Behaviour. In
    Business Process Management Workshops 2013, volume 171 of Lecture Notes in
    Business Information Processing, pages 66-78.
14. D. Fahland and W.M.P. van der Aalst. Repairing Process Models to Reflect Reality.
    In Business Process Management (BPM 2012), volume 7481 of Lecture Notes in
    Computer Science, pages 229-245.
15. T. Murata. Petri Nets: Properties, Analysis and Applications. Proc. of the IEEE,
    77(4):541-580, 1989.
16. J. Claes and G. Poels. Process Mining and the ProM Framework: An Exploratory
    Survey. In Business Process Management Workshops 2012, volume 132 of Lecture
    Notes in Business Information Processing, pages 187-198.