<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>BPIC 2013: Volvo Incident and Problem Management Behavior Analysis</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Michael Arias</string-name>
          <email>michael.arias@uc.cl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Eric Rojas</string-name>
          <email>eurojas@uc.cl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Pontificia Universidad Católica de Chile Computer Science Department Vic. Mackenna 4860</institution>
          ,
          <addr-line>Santiago</addr-line>
          ,
          <country country="CL">Chile</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper presents the results of work carried out for the Third International Business Process Intelligence Challenge (BPIC 2013). The challenge provides an event log from Volvo IT Belgium covering incident and problem management, together with a set of questions from the process owner. We describe the analysis we performed with different tools and process mining techniques to answer these questions. The analysis reveals behavioral characteristics associated with products, resources and organizational lines. The results give Volvo useful information for understanding the process it executes, making decisions and improving the current process.</p>
      </abstract>
      <kwd-group>
        <kwd>Process mining</kwd>
        <kwd>Volvo IT Belgium</kwd>
        <kwd>Business Process Intelligence Challenge</kwd>
        <kwd>incident and problem management</kwd>
        <kwd>IT organization</kwd>
        <kwd>products</kwd>
        <kwd>support teams</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        Organizations have evolved over the years, and with them information systems have become key
players in storing the large amount of information generated every day. Process mining has emerged as a set of
techniques for extracting useful knowledge from the records stored in an event log. According to W. van der Aalst [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ],
process mining is a relatively young research discipline that sits between machine learning and data mining
on the one hand, and process modeling and analysis on the other. The idea of process mining is to
discover, monitor and improve real processes by extracting knowledge from event logs readily available in
today’s systems.
      </p>
      <p>
        This work shows how process mining can support Volvo IT Belgium by characterizing
and analyzing its business process for managing incidents and problems. We use the event log of the
“Third International Business Process Intelligence Challenge (BPIC’13)” [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The aim of this work is
to answer a number of questions posed by the process owner about incidents and problems,
covering push-to-front incident management, ping-pong behavior, use of the Wait User status
and process conformance. In addition, we include two further analyses as recommendations to the
process owner, to give a better understanding of the process and to identify possible improvements.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Push to Front (PTF)</title>
      <p>This section concerns incident management. The analysis centers on how incidents are handled and
resolved, in particular on whether they are resolved by the first line rather than by the second or
third line.</p>
      <sec id="sec-2-1">
        <title>For this specific analysis several questions need to be answered</title>
        <p>• For which products is PTF used most, and for which is it used least?
• Where in the organization is PTF used most, specifically comparing Org Line A2 and Org Line C?
• Which functions are most in line with PTF?</p>
        <p>
          To observe PTF, the original log was analyzed and then filtered, according to the question to be
answered, in DISCO [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ].
        </p>
        <sec id="sec-2-1-1">
          <title>2.1 For which products is PTF used most, and for which is it used least?</title>
          <p>In this case, the log was filtered as follows:
1. The complete cases were kept: cases in which an incident starts with the status Accepted – In Progress
and ends with one of the following completed statuses: Completed-Closed, Completed-In Call and
Completed-Cancel. All these end activities were included so that the log would not shrink too much and
the analysis could still be done.
2. Cases with status Unmatched were removed (5 cases in total).
3. To keep only the products for which no second- or third-line support team (ST) was involved, the
log was filtered to the cases handled only by the first line.</p>
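          <p>The filtering steps above, together with the opposite filter of step 4, can be sketched in Python as follows. This is a minimal illustration, not the DISCO filters themselves: the `Event` fields and the exact activity spellings are assumptions, and in particular the `line` attribute (the support line, 1 to 3, of the ST that handled the event) is hypothetical, since how the line level is derived from the log is not described here.</p>
```python
from dataclasses import dataclass
from datetime import datetime

START = "Accepted-In Progress"
ENDS = {"Completed-Closed", "Completed-In Call", "Completed-Cancel"}


@dataclass
class Event:
    case_id: str
    activity: str          # status and sub status, e.g. "Accepted-In Progress"
    timestamp: datetime
    support_team: str
    line: int              # hypothetical: support line (1, 2 or 3) of the ST


def group_by_case(events):
    """Group events into traces, each sorted by timestamp."""
    cases = {}
    for e in events:
        cases.setdefault(e.case_id, []).append(e)
    for trace in cases.values():
        trace.sort(key=lambda e: e.timestamp)
    return cases


def is_complete(trace):
    # Step 1: starts with Accepted-In Progress and ends with a completed status.
    return trace[0].activity == START and trace[-1].activity in ENDS


def has_unmatched(trace):
    # Step 2: any event whose status is Unmatched.
    return any(e.activity.startswith("Unmatched") for e in trace)


def split_ptf(cases):
    """Steps 3 and 4: first-line-only cases (PTF) vs. escalated cases."""
    ptf, escalated = {}, {}
    for case_id, trace in cases.items():
        if not is_complete(trace) or has_unmatched(trace):
            continue
        if all(e.line == 1 for e in trace):
            ptf[case_id] = trace
        else:
            escalated[case_id] = trace
    return ptf, escalated
```
          <p>Ranking products then amounts to counting the product attribute over the cases in each of the two groups.</p>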
          <p>As can be seen in the following figure, this filter retains 52 percent of the cases, namely those
without support from the second or third line. From this log, the following products are those with the
most incidents resolved in the first line.
4. Conversely, to find the products for which PTF is used least, the log was filtered to the cases in
which at least one second- or third-line support team was involved in the resolution. The following
products are those with the least use of PTF.</p>
        </sec>
        <sec id="sec-2-1-2">
          <title>2.2 Where in the organization is PTF used most, specifically comparing Org Line A2 and Org Line C?</title>
          <p>Using the filtered log, we examined how the different organizations handle their incidents,
classifying them into two types: those resolved with PTF (incidents resolved in the first line) and those
resolved without PTF (incidents resolved in the second or third line). Both categories are shown in
figure 4 and figure 5.
For these numbers, we computed the percentage of each organization's total cases that falls into each
type. Normalizing by each organization's own total in this way gives a fair comparison of PTF use in the
two organizations, regardless of their different case volumes.</p>
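          <p>The normalization described above can be illustrated with a small Python sketch. It assumes, hypothetically, that each complete case has already been assigned a single organization line and a flag saying whether it was resolved with PTF; the percentages are computed against each organization's own number of cases. The example data are toy values, not figures from the log.</p>
```python
from collections import Counter


def ptf_share_by_org(cases):
    """cases: iterable of (org_line, resolved_with_ptf) pairs, one per case.

    Returns {org_line: (pct_with_ptf, pct_without_ptf)}, where both
    percentages are relative to that organization's own number of cases.
    """
    totals, with_ptf = Counter(), Counter()
    for org_line, resolved_with_ptf in cases:
        totals[org_line] += 1
        if resolved_with_ptf:
            with_ptf[org_line] += 1
    shares = {}
    for org_line, n in totals.items():
        pct = 100.0 * with_ptf[org_line] / n
        shares[org_line] = (pct, 100.0 - pct)
    return shares


# Toy data for illustration only.
example = [("Org line A2", True), ("Org line A2", False),
           ("Org line C", True), ("Org line C", True),
           ("Org line C", False), ("Org line C", True)]
print(ptf_share_by_org(example))
# {'Org line A2': (50.0, 50.0), 'Org line C': (75.0, 25.0)}
```
          <p>Even though the second organization has twice as many cases in this toy example, the comparison is made on shares, not on absolute counts.</p>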
        </sec>
        <sec id="sec-2-1-3">
          <title>2.3 What functions are more in line with the PTF?</title>
          <p>To answer this question, we filtered the log with the same procedure described above.
The next figure shows the functions most associated with PTF; functions V3_2 and A2_1 clearly use it
the most.</p>
        </sec>
        <sec id="sec-2-1-4">
          <title>2.4 Support teams that use PTF the most</title>
          <p>The support teams that use PTF the most are G96, S42 and G97, in that order.</p>
        </sec>
        <sec id="sec-2-1-5">
          <title>2.5 Countries that generate the most incidents resolved with PTF</title>
          <p>The countries that use PTF most frequently are Sweden, Poland and Brazil.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Ping Pong Behavior (PP)</title>
      <p>This analysis concerns both incident and problem management. Ping-pong behavior occurs when
support teams send an incident back and forth between each other. This behavior is undesirable, and it
is directly related to the total time spent resolving an incident.</p>
      <sec id="sec-3-1">
        <title>For this analysis we want to answer several questions:</title>
        <p>• Which functions are responsible for PP?
• Which organizations are responsible for PP?
• Which STs are responsible for PP?
• Which products are most impacted by PP?</p>
        <p>To answer these questions, the original log had to be analyzed and several specific filters applied.
The filtering steps below reduce the original log to the content needed to answer the questions.</p>
        <p>1. When the original log is opened in DISCO, the resource column should be mapped to the ST column
instead of the default column.</p>
        <p>2. The complete cases were kept: cases in which an incident starts with the status
Accepted – In Progress and ends with one of the following completed statuses: Completed-Closed,
Completed-In Call and Completed-Cancel. All these end activities were included so that the log would
not shrink too much and the analysis could still be done.</p>
        <p>3. Cases with status Unmatched were removed (5 cases in total).</p>
        <p>4. The next filter keeps only the cases containing the activity Queued – Awaiting Assignment, the
activity that precedes the transfer of an incident from one ST to another.
5. From these variants, we removed all cases without PP cycles. Inspection of the log showed that PP
behavior occurs in cases containing the sequence Accepted-In Progress, Queued – Awaiting Assignment
and then Accepted-In Progress again; this filter keeps the cases that contain this sequence.</p>
        <p>6. We then performed a manual analysis to make sure that this filter kept only the cases exhibiting
this behavior.</p>
        <p>7. After confirming this, we discarded all activities except Accepted-In Progress and Queued –
Awaiting Assignment, because these are the activities in which PP appears.</p>
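        <p>Steps 4 to 7 can be sketched in Python as follows (the manual check of step 6 is omitted). The sketch assumes that a trace is a time-ordered list of (activity, support team) pairs and that the PP sequence must occur as directly consecutive events; whether the original DISCO filter required direct or eventual succession is not stated, so this is an assumption, as are the activity spellings.</p>
```python
IN_PROGRESS = "Accepted-In Progress"
QUEUED = "Queued-Awaiting Assignment"
PP_PATTERN = (IN_PROGRESS, QUEUED, IN_PROGRESS)


def has_pp_cycle(trace):
    """Steps 4-5: True if the PP pattern occurs as consecutive events.

    trace: time-ordered list of (activity, support_team) pairs.
    """
    activities = [activity for activity, _ in trace]
    n = len(PP_PATTERN)
    return any(tuple(activities[i:i + n]) == PP_PATTERN
               for i in range(len(activities) - n + 1))


def project(trace):
    """Step 7: keep only the two activities in which PP shows up."""
    return [(a, st) for a, st in trace if a in (IN_PROGRESS, QUEUED)]


def pp_log(cases):
    """Apply steps 4-7 to {case_id: trace} and return the reduced log."""
    return {cid: project(t) for cid, t in cases.items() if has_pp_cycle(t)}
```
        <p>Keeping the support team next to each activity is what later allows the handovers between STs to be measured on the reduced log.</p>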
        <p>
          8. Analyzing the performance of the cases in DISCO showed that PP matters in the long-running cases,
not in the cases that complete quickly even if they contain some PP activity. The log was therefore
filtered to keep only the cases lasting more than 33 days and 5 hours (only 6% of the total cases).
9. The resulting log was exported to ProM 5 [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] for further analysis with the Handover of Work metric of social network mining. This analysis
did not produce meaningful results, and no relationships were discovered. The results are shown in the
following figure.
10. Because the expected behavior could not be observed, the log was divided into several clusters
according to the sequences of its activities, so that each cluster could be checked for cycles between
the two activities that characterize PP behavior.
11. For the clustering, we used the Sequence Clustering algorithm in ProM 5, with its parameter set to
partition the log into 5 clusters for individual analysis.
12. Each resulting cluster was examined further to identify real PP behavior, so that the Handover of
Work (HoW) metric could be applied.
13. After applying HoW to each cluster, we analyzed the results using two threshold values: 0.0097 and
0.0194.
        </p>
        <p>14. Applying the two thresholds gave two models per cluster, and from each model we extracted the
components containing PP cycles. The following table shows the cycles identified for each cluster at
each threshold.</p>
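        <p>A simplified version of steps 8, 13 and 14 is sketched below; it is not ProM's implementation. It computes a handover-of-work weight as the relative frequency with which one ST directly follows another within a case (multiple transfers counted, no causality), keeps the pairs whose weight reaches a threshold, and reports pairs of distinct STs that hand work to each other in both directions as PP cycles. The trace format (time-ordered (timestamp, support team) pairs) is an assumption, and the sequence clustering of steps 10 to 12 is omitted.</p>
```python
from collections import Counter
from datetime import timedelta

MIN_DURATION = timedelta(days=33, hours=5)  # step 8 cut-off


def long_cases(cases):
    """Step 8: keep cases lasting more than 33 days and 5 hours.

    cases: {case_id: time-ordered list of (timestamp, support_team) pairs}.
    """
    return {cid: t for cid, t in cases.items()
            if t[-1][0] - t[0][0] > MIN_DURATION}


def handover_of_work(cases):
    """Relative frequency with which ST b directly follows ST a in a case."""
    counts, total = Counter(), 0
    for trace in cases.values():
        teams = [st for _, st in trace]
        for a, b in zip(teams, teams[1:]):
            counts[(a, b)] += 1
            total += 1
    return {pair: n / total for pair, n in counts.items()} if total else {}


def pp_cycles(how, threshold):
    """Steps 13-14: pairs of distinct STs above the threshold in both directions."""
    kept = {pair for pair, weight in how.items() if weight >= threshold}
    return sorted({tuple(sorted(pair)) for pair in kept
                   if pair[0] != pair[1] and pair[::-1] in kept})
```
        <p>Calling `pp_cycles(handover_of_work(long_cases(cases)), th)` for `th` in (0.0097, 0.0194) mirrors the two thresholds; a higher threshold keeps fewer, stronger handover relations and therefore fewer cycles.</p>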
      </sec>
      <sec id="sec-3-2">
        <title>Table 2. PP cycles identified for each cluster (thresholds 0.0097 and 0.0195)</title>
        <p>
          15. From the models in Table 2, we extracted the following data:
16. Next, the log was filtered in DISCO to include only the 13 participating STs, in order to verify with
HoW in ProM 6 [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] whether they really had a high presence of cycles. The resulting log was exported from DISCO to
ProM 6, and the following results were obtained:
As can be seen in figure 10, a group can be identified that executes most of the PP behavior when
resolving incidents and managing problems.
        </p>
        <p>Comparing these results with those in Table 3 confirms that the 6 STs with the most PP cycles are the
same ones discovered with the threshold of 0.0195, so the different tools yield the same result.</p>
        <p>17. In addition to this analysis, we applied the trace alignment algorithm in ProM 6.2 to the log
obtained in point 3 (6365 cases), configured to divide the log into 6 clusters according to the most
frequently repeated sequences. After a long execution time, we obtained 6 different trace alignments.
Each of them was exported for a more detailed analysis, to determine which clusters contained the most
ping-pong cycles. Of the 6 clusters, two presented PP: in one it is very frequent, and in the other it is
less frequent but still present. To confirm the presence of PP, some cases were inspected in the log,
and several of the STs found in point 15 were indeed present. Below are some images of the alignments
with PP. The activities highlighted in yellow and light blue show PP behavior; for example, figure 12
shows case 1638742591, taken from the cluster shown in figure 11.</p>
        <p>Fig. 11. Cluster 4/6: Cluster with cases that have PP</p>
        <p>Fig. 13. Cluster 6/6: Cluster with cases that present PP</p>
        <p>With this information, the following questions can be answered.</p>
        <sec id="sec-3-2-1">
          <title>3.1 Which functions are responsible for PP?</title>
          <p>The function responsible for the PP behavior is A2_1, together with the second-line STs V32 2nd and
V37 2nd, for which the log contains no information about the function they belong to.</p>
        </sec>
        <sec id="sec-3-2-2">
          <title>3.2 Which organizations are responsible for PP?</title>
        </sec>
      </sec>
      <sec id="sec-3-3">
        <title>The organizations responsible for PP are:</title>
        <p>• Org line V7n, which includes V32 2nd and V37 2nd
• Org line A2, which includes D4 and D8
• Org line C, which includes D2 and D5</p>
        <sec id="sec-3-3-1">
          <title>3.3 Which STs are responsible for PP?</title>
          <p>The STs with the largest number of cases with PP are D8, V37 2nd, D2, D4, D5 and V32 2nd.</p>
        </sec>
        <sec id="sec-3-3-2">
          <title>3.4 Which products are most impacted by PP?</title>
          <p>The products associated with the largest number of cases with PP are PROD542, PROD236, PROD424,
PROD158, PROD235 and PROD215.</p>
          <p>Note: other STs may also exhibit PP; this analysis identifies those that do so most, according to
the activity recorded in the log.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Status Wait User</title>
      <p>Another aspect that should be analyzed is the incorrect use of the status Wait User. This status is
intended for incidents or problems that are waiting for an action or a response from a user. It is misused
in the organization because moving an incident or problem to this status stops the clock on the time spent
in progress, so some users move their incidents to this status to reduce the work effort associated with
them. The work effort for an incident or problem is calculated from the time it spends in the In Progress
status, and it keeps accumulating until the status changes. To reduce this time, users who are unable to
resolve an incident or problem move it to the Accepted - Wait User status, which stops the effort count.
The log contains cases in which the status is changed to Wait User and, after some time, returns to
Accepted – In Progress so that work can continue; this is the incorrect way of using it. The correct use
of this status is when resolving the incident or problem requires an answer or an action from a specific
user before work can continue.</p>
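      <p>The incorrect pattern and its effect on the measured work effort can be sketched in Python. The sketch assumes that a trace is a time-ordered list of (timestamp, activity) pairs, that the work effort is the time spent in Accepted-In Progress until the next status change (as described above), and that a Wait User event directly followed by Accepted-In Progress counts as incorrect use; the activity spellings are assumed.</p>
```python
from datetime import timedelta

IN_PROGRESS = "Accepted-In Progress"
WAIT_USER = "Accepted-Wait User"  # assumed spelling of the sub status


def incorrect_wait_user(trace):
    """Count Wait User events directly followed by a return to In Progress.

    trace: time-ordered list of (timestamp, activity) pairs.
    """
    activities = [a for _, a in trace]
    return sum(1 for a, b in zip(activities, activities[1:])
               if a == WAIT_USER and b == IN_PROGRESS)


def work_effort(trace):
    """Work effort: each In Progress event lasts until the next status change."""
    effort = timedelta(0)
    for (t, activity), (t_next, _) in zip(trace, trace[1:]):
        if activity == IN_PROGRESS:
            effort += t_next - t
    return effort
```
      <p>For an incident that is in progress for 2 hours, parked in Wait User for 48 hours and then worked on for 1 more hour, `work_effort` returns 3 hours although 51 hours elapse, which is exactly the incentive for misuse described above.</p>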
      <sec id="sec-4-1">
        <title>From this point we would like to answer several questions:</title>
        <p>• Who uses this sub status the most?
• What is the behavior per ST?
• What is the behavior per function?
• What is the behavior per organization?
• Where is this status most often used incorrectly?</p>
        <p>To answer these questions, the DISCO tool was used and the following filters were applied.
1. The complete cases were kept: cases in which an incident starts with the status
Accepted – In Progress and ends with one of the following completed statuses: Completed-Closed,
Completed-In Call and Completed-Cancel. All these end activities were included so that the log would
not shrink too much and the analysis could still be done.</p>
        <p>2. Cases with status Unmatched were removed (5 cases in total).</p>
        <p>3. Cases that do not include this status were filtered out, so that only the relevant cases are
analyzed. After this filter, 2060 cases remain.</p>
        <p>4. The cases were analyzed to determine whether this status was used correctly or incorrectly, and the
log was then filtered to keep only the incorrect uses. It turned out that 10% of the total use of the status
(55570 cases) was incorrect.
5. Another filter was then applied to keep only the users who moved incidents or problems to this
status, reducing the log to 1707 cases.</p>
        <p>6. Because the system uses this status correctly, the user Siebel was removed.
All the questions can be answered on the basis of these results and this final filter. In the final
log, this status occurs 1660 times in 791 cases.
With the data from this last group of filters, we now answer the related questions:</p>
        <sec id="sec-4-1-1">
          <title>4.1 Who uses this sub status the most?</title>
          <p>Based on the filtered data, the next figure shows the users who use this sub status incorrectly in
the largest number of cases:</p>
        </sec>
        <sec id="sec-4-1-2">
          <title>4.2 What is the behavior per ST?</title>
          <p>Based on the filtered data, the next figure shows the STs that use this sub status incorrectly in the
largest number of cases:</p>
        </sec>
        <sec id="sec-4-1-3">
          <title>4.3 What is the behavior per function?</title>
          <p>Based on the filtered data, the next figure shows the functions that use this sub status incorrectly in
the largest number of cases:</p>
        </sec>
        <sec id="sec-4-1-4">
          <title>4.4 What is the behavior per organization?</title>
          <p>Based on the filtered data, the next figure shows the organizations that use this sub status
incorrectly:</p>
        </sec>
        <sec id="sec-4-1-5">
          <title>4.5 Where is this status most often used incorrectly?</title>
          <p>Based on the filtered data, the next figure shows the locations where this sub status is used
incorrectly in the largest number of cases:</p>
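          <p>The rankings in sections 4.1 to 4.5 count cases rather than events, so a case in which the same user parks an incident several times counts once. A minimal Python sketch, assuming each incorrect Wait User event is a dict with hypothetical keys such as "case_id", "user", "st", "function", "org_line" and "country", and excluding the system user Siebel as in the filters above:</p>
```python
from collections import Counter

SYSTEM_USER = "Siebel"  # excluded, since the system uses the status correctly


def top_by_cases(events, attribute, k=3):
    """Rank values of `attribute` by the number of distinct cases with incorrect use.

    events: incorrect Wait User events as dicts with hypothetical keys
    "case_id", "user" and the attribute of interest.
    """
    # A (value, case) pair is counted once, however many events it has.
    pairs = {(e[attribute], e["case_id"]) for e in events
             if e["user"] != SYSTEM_USER}
    return Counter(value for value, _ in pairs).most_common(k)


# Sections 4.1 to 4.5 differ only in the attribute, e.g.:
# for attribute in ("user", "st", "function", "org_line", "country"):
#     print(attribute, top_by_cases(incorrect_events, attribute))
```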
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Process Conformance by Organization</title>
      <p>Overall, Volvo is divided into two major organizations: Org line A2 and Org line C. It is important to
analyze how well both organizations are aligned with the incident and problem management
process.</p>
      <sec id="sec-5-1">
        <title>To analyze this point we performed the following steps:</title>
        <p>1. Complete cases were kept: cases in which an incident starts with the status
Accepted - In Progress and ends with the final status Completed-Closed. Other completed statuses were
excluded as final activities, since we wanted only fully closed cases.</p>
        <p>2. Traces with the Unmatched status were removed.</p>
        <p>3. The resulting event log covers the process performed by all organizations in Volvo, including
Org line A2 and Org line C.</p>
        <p>4. This log was exported to ProM 5.2 for analysis.
5. In ProM 5.2, the Heuristics Miner algorithm was applied, producing the following model. This is the
base model against which the behavior of the two organizations (Org line A2 and Org line C) is analyzed.
6. Analyzing the general model and the event log in DISCO yields the following observations:
a. There are 11 activities and 1174 resources.
b. Of these 11 activities, the most frequently performed are Accepted-In progress and Queued-Awaiting
assignment, which together represent 64.09% of all performed activities. This means that, after
Accepted-In progress, the most frequent activity is Queued-Awaiting assignment, that is, waiting to be
assigned.
c. There are 3 activities that take 6 or more days to be completed: Accepted-waiting user,
Accepted-waiting implementation and Accepted-waiting vendor.
7. Starting from the log obtained in point 3, we filtered the cases performed only by Org line A2. The
resulting log was exported to ProM 5.2 for analysis.
8. In ProM 5.2, the Heuristics Miner algorithm was applied, producing the following model, which
corresponds to the process performed only by Org line A2.
9. Analyzing the model of Org line A2 and the event log in DISCO yields the following observations:
a. There are 10 activities and 628 resources.
b. Of these 10 activities, the most frequently performed are Accepted-In progress and Queued-Awaiting
assignment, which together represent 67.6% of all performed activities; again, Queued-Awaiting
assignment is the second most frequent activity.
c. There are 3 activities that take 7.4 or more days to be completed: Accepted-waiting user,
Accepted-waiting implementation and Accepted-waiting vendor.</p>
        <p>In addition, the activity Accepted-waiting vendor takes on average 19.7 hours.
d. Apart from the system, the 3 resources that work the most are Marcin, Olga and Krzysztof. Together
with Siebel (the system), they perform 18.34% of all activities.
10. Starting from the log obtained in point 3, we filtered the cases performed only by Org line C. The
resulting log was exported to ProM 5.2 for analysis.
11. In ProM 5.2, the Heuristics Miner algorithm was applied, producing the following model, which
corresponds to the process performed only by Org line C.
12. Analyzing the model of Org line C and the event log in DISCO yields the following observations:
a. There are 11 activities and 841 resources.
b. Of these 11 activities, the most frequently performed are Accepted-In progress and Queued-Awaiting
assignment, which together represent 63.28% of all performed activities; again, Queued-Awaiting
assignment is the second most frequent activity.
c. No activity takes more than 6.4 days on average. The activities with the longest duration are
Accepted-wait implementation and Accepted-wait vendor.
d. Apart from the system, the 3 resources that work the most are Pawel, Andreas and Brecht. Together
with Siebel (the system), they perform 12.88% of all activities.
13. Comparing the general model with the model of Org line A2 yields the following observations:
a. It has one activity fewer than the general model: Org line A2 does not execute the activity
Completed-In Call, that is, it does not resolve user incidents over the phone.
b. The activities that take longest in the general model take less time there than the same activities
in Org line A2. Overall, however, the activities of this organization take less time than in the
general model.
14. Comparing the general model with the model of Org line C yields the following observations:
a. It has the same number of activities as the general model.
b. The activities that take longest in the general model take more time there than the same activities
in Org line C. Overall, the activities of this organization take less time than in the general model.
15. With this information, we built the following summary table:
Analyzing the table above, which includes both comparisons, we conclude that Org line C complies more
closely with the overall process: it performs all activities included in the general model, and performs
them slightly more efficiently, except for the activity Accepted-waiting vendor, which takes just 7.8
hours in Org line A2.</p>
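        <p>The Heuristics Miner builds its model from a dependency measure over the directly-follows relation, a => b = (|a>b| - |b>a|) / (|a>b| + |b>a| + 1), with |a>a| / (|a>a| + 1) for length-one loops. The sketch below computes these measures, together with the activity set and the share of the two most frequent activities used in the comparison above. It illustrates the standard measure, not the exact ProM 5.2 configuration used in this work; traces are assumed to be lists of activity names.</p>
```python
from collections import Counter


def dependency_measures(traces):
    """Heuristics Miner dependency a => b from the directly-follows counts |a>b|."""
    follows = Counter()
    for acts in traces:
        for a, b in zip(acts, acts[1:]):
            follows[(a, b)] += 1
    deps = {}
    for (a, b), n_ab in follows.items():
        if a == b:                       # length-one loop: |a>a| / (|a>a| + 1)
            deps[(a, b)] = n_ab / (n_ab + 1)
        else:
            n_ba = follows[(b, a)]       # Counter returns 0 for unseen pairs
            deps[(a, b)] = (n_ab - n_ba) / (n_ab + n_ba + 1)
    return deps


def summarize(traces):
    """Activity set and share (%) of the two most frequent activities."""
    freq = Counter(a for acts in traces for a in acts)
    top_two = freq.most_common(2)
    share = 100.0 * sum(n for _, n in top_two) / sum(freq.values())
    return set(freq), top_two, share
```
        <p>Taking the set difference between the activity sets returned for the general log and for an Org line sub-log would surface an activity, such as Completed-In Call, that the sub-log never executes.</p>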
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Additional Analysis</title>
      <p>In addition to the questions answered above, it is interesting to analyze the following:</p>
      <p>• The number of incidents by country and their duration.
• In which Org lines only incidents are handled, and in which problem management is also performed.</p>
      <sec id="sec-6-1">
        <title>For each of these questions, the analysis is discussed below.</title>
      </sec>
      <sec id="sec-6-2">
        <title>Analyze the number of incidents by country and its duration</title>
        <p>• Complete cases were kept: cases in which an incident starts with the status Accepted - In Progress
and ends with the final status Completed-Closed. Other completed statuses were excluded as final
activities, since we wanted only fully closed cases.
• Traces with the Unmatched status were removed.
• To determine which countries have the most incidents, we analyzed the start activity Accepted - In
Progress.
• Filtering the event log in DISCO shows the three countries that generate the most incidents:
• Sweden is the country with the most incidents, with 6736 cases
• Poland (pl) is second, with 1266 cases
• India (in) is third, with 966 cases
• Filtering the event log in DISCO also shows the two countries that generate the fewest incidents:
• Argentina, with only 1 case
• Austria, with 3 cases</p>
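        <p>One reading of the counting step above, sketched in Python: each complete case is counted once, by the country of its first event, provided that event is the start activity Accepted-In Progress. Whether DISCO counted cases or occurrences of the start activity is not stated, so this is an assumption, as are the event dict keys ("activity", "country", "product"). The same function can rank products, as in part b below.</p>
```python
from collections import Counter

START = "Accepted-In Progress"


def incidents_by(cases, attribute):
    """Count cases by the value of `attribute` on their first event.

    cases: {case_id: time-ordered list of event dicts}; only cases whose
    first event is the start activity are counted.
    """
    counts = Counter()
    for trace in cases.values():
        first = trace[0]
        if first["activity"] == START:
            counts[first[attribute]] += 1
    return counts


def percentages(counts):
    """Share of each value in the total, in percent."""
    total = sum(counts.values())
    return {value: 100.0 * n / total for value, n in counts.items()}


def fewest(counts, k=2):
    """The k values with the fewest cases."""
    return sorted(counts.items(), key=lambda item: item[1])[:k]
```
        <p>`incidents_by(cases, "country").most_common(3)` gives the top countries, `fewest(...)` the bottom ones, and `percentages(...)` the per-country shares of the listing that follows.</p>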
        <p>Below is the list of all countries and their percentage among the start activities of incidents
(note that this serves only as an indicator for the later analysis of the countries):
Analyzing the countries and the number of incidents they generate makes it possible to investigate why
each country generates a certain number of incidents, and which improvements could be introduced to
reduce it. In addition, we can analyze whether the countries with the highest number of incidents sold
more vehicles, or whether the incidents can be attributed to a lack of knowledge or poor training. This
analysis also gives an idea of what is considered an incident or a problem in different countries.</p>
        <p>b. What are the products that generate the highest number of incidents?</p>
        <p>Complete cases were kept: cases in which an incident starts with the status Accepted - In Progress
and ends with the final status Completed-Closed. Other completed statuses were excluded as final
activities, since we wanted only fully closed cases.</p>
        <p>Traces with Unmatched status were removed.</p>
        <p>To determine which products generate the most incidents, we analyzed the start activity
Accepted - In Progress.</p>
        <p>Filtering the event log in DISCO shows the three products that generate the most incident cases:
• Prod424 with 487 cases
• Prod660 with 248 cases
• Prod253 with 180 cases
Analyzing the products and the number of incidents they generate allows us to identify these products
and to study their incidents in more detail, in order to find opportunities for improvement.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. References</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Van der Aalst</surname>
          </string-name>
          , W.: Process Mining: Discovery, Conformance and Enhancement of Business Processes. Springer (
          <year>2011</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2] VINST - User Guide. V 3.13. Volvo Information Technology.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3] Official Web Page DISCO 1.3.5: http://www.fluxicon.com/disco/
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4] Official Web Page ProM 5.2: www.promtools.org/prom5/
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5] Official Web Page ProM 6.2: http://www.promtools.org/prom6/
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>