<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>BPI Challenge 2013 - Applied process mining techniques for incident and problem management</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Peter Van den Spiegel</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Leen Dieltjens</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Liese Blevi</string-name>
        </contrib>
        <aff>KPMG Advisory, IT Advisory, Bourgetlaan, Brussels, Belgium; [pvandenspiegel, ldieltjens, lblevi]@kpmg.com</aff>
      </contrib-group>
      <abstract>
        <p>The incident and problem management process forms an essential part of every organization. Since businesses rely heavily on IT, each outage, issue or user service request should be dealt with as quickly as possible in order to minimize its impact on operations. For Volvo IT Belgium, we analyzed the event log file of an incident and problem management system called VINST, in order to objectively verify the efficiency and effectiveness of the underlying process. Our analysis was performed by means of a combination of process mining and data mining techniques and tools, including Disco, ProM, Minitab and MS Excel. The log file itself consisted of 65.533 incident records and 6.660 problem records. As part of the exercise, we investigated aspects such as the total resolution times of tickets, the actual resolution process being followed, ping-pong behavior between the different helpdesk lines, and differences between distinct support teams. Finally, we also made recommendations to improve the current process and increase integration between incident and problem management.</p>
      </abstract>
      <kwd-group>
        <kwd>BPI Challenge</kwd>
        <kwd>Process management</kwd>
        <kwd>Problem management</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>2 Executive Summary</title>
      <p>Based on the analyses performed, we identified several areas for improvement
within the current incident and problem management processes, in particular with
respect to the overall throughput time (the time between registration and closing of
an incident), the wait time and the ping-pong behavior. Although a reasonable
percentage of incidents (i.e. 60%) was solved in first line, we noted that the
throughput time exceeds 10 days in more than 31% of the cases, and 12% of all
incidents have been open for more than 20 days.</p>
      <p>As the incident process is supposed to resolve incidents as quickly as possible
(e.g. via a quick fix or workaround), we believe that the way in which the incident
process is currently being executed could have a negative impact on the business
operations. We are of the opinion that assigning the correct support team from the
start (and thus limiting the amount of ping-pong taking place), limiting or closely
monitoring the use of the “Wait” statuses, as well as aligning IT organizations A2
and C, could significantly improve the overall incident management process and
business support.</p>
      <p>We also recommend linking the incident and problem management process, in
order to evaluate to what extent recurring or critical incidents are properly handled
within the problem management process. We noted 819 open problems. However, as
we did not have sufficient data available to investigate whether a) incidents are
recurring and b) incidents are “resulting” in problems, we were unable to evaluate
the effectiveness of the problem management process.</p>
      <p>Below, we provide an answer to the key questions asked by the process owner:
- Push to Front</p>
      <p>We noted that the majority (60%) of the incidents have been solved by the first
line. However, IT organization C had a significantly higher push-to-front rate (68,85%)
compared to only 22,8% for organization A2. We also noted that 17% of the incidents
managed in second line have not been initiated by a first line, but were immediately
handled in second line. The product appearing most frequently in the incidents, in
both first and second line, is Product 424, with respectively 15% of all incidents in
first line and 7,82% of all incidents in second line. Looking at the problem list, we
noticed that only 1,41% of the problems was related to this product. This leads us to
conclude that incidents regarding this product might not be sufficiently picked up
within the problem management process (in order to find a permanent fix).
- Ping Pong Behavior</p>
      <p>Our analyses showed significant evidence of ping pong behavior amongst teams.
For example, we noted that for calls solved within the first line, only 72% have been
solved by the initially appointed team. Overall this percentage was 49%. In 23,5% of
the cases the incident was passed on to another team. For 27%, two or more
movements were involved. On average we noted that 2,25 support teams are involved
per incident.
- Wait User abuse</p>
      <p>Based on our analysis, more than 34% of the total throughput time is caused by
“wait-user”. The “wait-user” status is particularly used in the first line and lasts more
than 1 week in more than 29% of the cases. This gives us an indication that the “Wait”
statuses might be abused to reduce the reported resolution time. Waiting time in
relation to the total throughput time is higher for C (38,45%) compared to A2 (28,35%).
- Process Conformity per Organization</p>
      <p>Clear differences were noted between the way both IT organizations execute their
incident management processes. Within organization C, the percentage push-to-front
is significantly higher (i.e. 68%) than within organization A2 (22,8%). We also noted
that although the median throughput time for organizations A2 and C is similar
(respectively 8,27 days and 7,48 days), the variation of the throughput time is
significantly higher for A2 (standard deviation of 57,46 days) compared to C (standard
deviation of 22,52 days). Based on our analyses we noted that the overall predictability
of the incident handling process is higher for organization C than for organization
A2. Moreover, given the significantly higher push-to-front for C compared to A2, we
are of the opinion that organization C is performing better than organization A2
(with the exception of the use of waiting time).</p>
    </sec>
    <sec id="sec-2">
      <title>3 Understanding the process</title>
      <sec id="sec-2-1">
        <title>3.1 Mapping statuses on the standard process flow</title>
        <p>On the basis of the general process information provided by Volvo IT Belgium and
the Information Technology Infrastructure Library (ITIL v3) we determined the
standard process flow (Figure 1).</p>
        <p>In our standard process flow, we identified 5 main activities:
- Register incident: logging, categorization and prioritization of the incident
- Investigate &amp; Diagnose: research of the issue to determine cause and remediation options
- Request input: request for input from customer, user, vendor, etc. in order to continue investigation and diagnosis
- Resolve incident: implementation of the solution or workaround in order to restore service
- Close incident: verification whether service has been restored and closure of the incident</p>
        <p>To facilitate the understanding of the process and further analysis, we mapped the
13 statuses on the main activities. The mapping is based on the description of the
statuses, as listed in Table 1. We assumed that the status of an incident is changed
upon execution of an activity. For example, if an activity owner completes activity
Register Incident the status of the incident is changed to Accepted/Assigned. We
marked the moments of status change on the process flow in Fig. 1. The same logic is
applied for the other statuses.</p>
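        <p>The mapping of statuses on activities can be sketched as a simple lookup table. A minimal sketch, assuming status labels inferred from the descriptions in Table 1 (the exact VINST identifiers are not confirmed by the log excerpt shown here):</p>
        <preformat>
```python
# Sketch of the status-to-activity mapping described above.
# The exact status labels are assumptions inferred from Table 1.
STATUS_TO_ACTIVITY = {
    "Accepted/Assigned": "Register Incident",
    "Accepted/In Progress": "Investigate and Diagnose",
    "Accepted/Wait": "Request Input",
    "Accepted/Wait - User": "Request Input",
    "Accepted/Wait - Customer": "Request Input",
    "Accepted/Wait - Vendor": "Request Input",
    "Accepted/Wait - Implementation": "Request Input",
    "Completed/In Call": "Resolve Incident",
    "Completed/Resolved": "Resolve Incident",
    "Completed/Cancelled": "Resolve Incident",
    "Completed/Closed": "Close Incident",
    "Queued/Awaiting Assignment": "Other",
    "Unmatched": "Other",
}

def activity_for(status):
    # Statuses outside the mapping fall back to "Other".
    return STATUS_TO_ACTIVITY.get(status, "Other")
```
        </preformat>
        <p>Applying this lookup to every event turns the 13-status log into the 5-activity process flow used in the remainder of the analysis.</p>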
      </sec>
      <sec id="sec-2-2">
        <title>Table 1. Description of the 13 statuses and their mapping on the main activities</title>
        <p>Register Incident – Accepted/Assigned: The incident is assigned and acknowledged by the suggested Support Team (ST).</p>
        <p>Investigate &amp; Diagnose – Accepted/In Progress: The incident is acknowledged and currently being worked on by the ST.</p>
        <p>Request Input – Accepted/Wait: The incident is acknowledged, but input is requested from a third party in order to diagnose the issue.</p>
        <p>Request Input – Wait – User: The incident is acknowledged, but input is requested from the User in order to diagnose the issue.</p>
        <p>Request Input – Wait – Customer: The incident is acknowledged, but input is requested from the Customer in order to diagnose the issue.</p>
        <p>Request Input – Wait – Vendor: The incident is acknowledged, but input is requested from the Vendor in order to diagnose the issue.</p>
        <p>Request Input – Wait – Implementation: The incident is acknowledged, but cannot be solved immediately because of implementation restrictions.</p>
        <p>Resolve Incident – Completed/In Call: A solution is found and implemented during call.</p>
        <p>Resolve Incident – Completed/Resolved: A solution is implemented.</p>
        <p>Resolve Incident – Completed/Cancelled: The incident is cancelled. No solution needs to be implemented, so the incident can be considered as resolved.</p>
        <p>Close Incident – Completed/Closed: The solution is verified and the incident is closed.</p>
        <p>Other – Queued/Awaiting Assignment: The incident cannot be solved by the assigned ST and is transferred to another one.</p>
        <p>Other – Unmatched: The incident could not be matched to existing incidents in the system.</p>
        <p>Remark 1. An incident can be reassigned multiple times, also within the same support
line. This means that the status of an incident can be changed to Queued/Awaiting
Assignment anywhere in the process flow.</p>
      </sec>
      <sec id="sec-2-3">
        <title>3.2 Main scenarios</title>
        <p>From the standard process flow, we derived 5 main scenarios:</p>
      </sec>
      <sec id="sec-2-4">
        <title>1. Incident is solved in First Line (normal flow)</title>
        <p>Register Incident  Investigate and Diagnose ( Request input)  Resolve Incident
 Close Incident</p>
      </sec>
      <sec id="sec-2-5">
        <title>2. Incident is not closed yet</title>
        <p>Register Incident  Investigate and Diagnose ( Request input)  Resolve Incident</p>
      </sec>
      <sec id="sec-2-6">
        <title>3. Incident is reopened</title>
        <p>Register Incident  Investigate and Diagnose ( Request input)  Resolve Incident
 Close Incident  Investigate and Diagnose</p>
      </sec>
      <sec id="sec-2-7">
        <title>4. Incident is transferred to Second Line</title>
        <p>Register Incident  Investigate and Diagnose ( Request input) 
Queued/Awaiting Assignment  Investigate and Diagnose  Resolve Incident 
Close Incident</p>
      </sec>
      <sec id="sec-2-8">
        <title>5. Incident is transferred to Third Line</title>
        <p>Register Incident  Investigate and Diagnose ( Request input) 
Queued/Awaiting Assignment (x2)  Investigate and Diagnose  Resolve Incident
 Close Incident</p>
        <p>In our analysis we will not focus on scenarios 2 and 3. We decided to focus on the
questions asked by the process owner, which can be linked to scenarios 1, 4 and 5.</p>
      </sec>
      <sec id="sec-2-9">
        <title>3.3 Link between incidents and problems</title>
        <p>According to leading practices in incident and problem management, there should
be a close relationship between incidents and problems. Whereas incident management
primarily focuses on helping the end-user as quickly as possible, the problem
management process investigates the root cause of recurring incidents to provide a
long-term solution. Known root causes and their solutions should be entered in a
known-error database (KEDB), allowing the service desk to reach a higher percentage
of first-line fixes and faster resolution times.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4 Pre-assessment of the data set</title>
      <p>Before analyzing the process, we performed a pre-assessment of the data set, in
order to increase our understanding of the data received.</p>
      <p>We generated a frequency table based on the creation dates of the tickets (i.e. the
moment a first registration in the system is available) until the closure of a ticket (i.e.
status completed).</p>
      <p>Given the apparent low number of tickets created in the preceding months,
compared to the tickets created in May, we believe that tickets that had been created
and closed before May 2012, have not been taken into account for the download. As a
result, we believe there is a risk that our analysis might not represent the actual
situation in terms of resolution times, involved support teams and comparison of
performance between teams.</p>
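      <p>A creation-date frequency table of this kind can be reproduced with a few lines of code. A minimal pandas sketch; the column names (case_id, timestamp) and the sample dates are illustrative, not the actual VINST field names:</p>
      <preformat>
```python
import pandas as pd

# Hypothetical event log: one row per recorded status change.
log = pd.DataFrame({
    "case_id": ["1-1", "1-1", "1-2", "1-3"],
    "timestamp": pd.to_datetime([
        "2012-03-10", "2012-05-02", "2012-05-04", "2012-05-20"
    ]),
})

# Creation date of each ticket = earliest recorded event per case.
created = log.groupby("case_id")["timestamp"].min()

# Frequency table of ticket creations per month.
per_month = created.dt.to_period("M").value_counts().sort_index()
print(per_month)
```
      </preformat>
      <p>A visible drop in the months preceding the extraction window, as in the table above, is exactly the pattern that suggests earlier closed tickets were excluded from the download.</p>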
    </sec>
    <sec id="sec-4">
      <title>5 Analysis of the process</title>
      <p>In our analysis of the process we focused on the questions formulated by the
process owner:
1. Push to Front (incidents only): Is there evidence that cases are pushed to the 2nd and 3rd line too often or too soon?
2. Ping Pong Behavior: How often do cases ping pong between teams and which teams are more or less involved in ping-pong?
3. Wait User abuse (incidents only): Is the “wait user” substatus abused to hide problems with the total resolution time?
4. Process Conformity per Organisation: Where do the two IT organisations differ and why?</p>
      <p>We used a combination of different tools to perform our analysis. We used the
demo version of Disco that was provided by Fluxicon, the open source process
mining tool ProM 5.2, Microsoft Excel and MiniTab.</p>
      <sec id="sec-4-1">
        <title>5.1 Push to Front</title>
        <p>The main objective of incident management is helping the end-user as quickly as
possible. To reach this objective it is important to have a good push to front process.
We evaluated the push to front process by answering the following questions:
1. How many incidents are resolved in First Line, without interference of a Second or Third Line Support Team?
2. Where in the organization is the push to front process most implemented, specifically if we compare Org line A2 with Org line C?
3. For what products is the push to front mechanism most used and where not?
4. What functions are most in line with the push to front process?</p>
      </sec>
      <sec id="sec-4-2">
        <title>5.1.1 How many incidents are resolved in First Line, without interference of a Second or Third Line Support Team?</title>
        <p>To answer this question, we used Disco’s built-in filtering algorithm. First of all
we removed all open incidents by setting an Endpoint filter on Activity. The result of
this filter was a reduction of less than 1% of the total number of cases. In absolute
figures we noted that out of the 7.554 incidents, 7.546 are completed. By setting an
additional Attribute filter on org:group (Involved ST), we were able to exclude all
cases that were completed with interference of a Second and/or Third Line Support
Team. More specifically, as shown in Figure 1, we used filtering mode Forbidden to
remove all cases that have an org:group with 2nd or 3rd in their name.</p>
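        <p>The two Disco filters (Endpoint, then Attribute in mode Forbidden) can be approximated outside the tool as well. A minimal pandas sketch with assumed column names (case_id, status, group) and toy data:</p>
        <preformat>
```python
import pandas as pd

# Hypothetical event log with one row per status change.
log = pd.DataFrame({
    "case_id": ["A", "A", "B", "B", "B"],
    "status":  ["Accepted/In Progress", "Completed/Closed",
                "Accepted/In Progress", "Queued/Awaiting Assignment",
                "Completed/Closed"],
    "group":   ["G97", "G97", "G97", "S33 2nd", "S33 2nd"],
})

# Endpoint filter: keep only cases whose last event is a completion.
last = log.groupby("case_id").last()
completed = last[last["status"].str.startswith("Completed")].index

# Attribute filter (mode Forbidden): drop cases in which any involved
# support team has "2nd" or "3rd" in its name.
has_higher_line = log["group"].str.contains("2nd|3rd")
escalated = log.loc[has_higher_line, "case_id"].unique()

first_line_cases = [c for c in completed if c not in escalated]
print(first_line_cases)
```
        </preformat>
        <p>Case A survives both filters; case B is completed but touched a 2nd line team and is therefore excluded, mirroring the Forbidden filter described above.</p>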
        <p>By applying the filter algorithm, we identified that 60% of the incidents were closed
in First Line. The variant statistics in Disco show us the following:</p>
        <p>Among the 4.542 incidents closed in First Line there is a total of 942 process
flow variants.</p>
        <p>The most common process flow variant, representing 37,87% of the cases, goes
through the following 3 statuses in sequence: Accepted/In Progress, Accepted/In
Progress and Completed/In Call.</p>
        <p>The most common process flow variant has a mean duration of 38 minutes and 4
seconds.</p>
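        <p>Disco derives a variant from the ordered sequence of statuses a case passes through. A minimal sketch of the same idea, with illustrative data and assumed column names:</p>
        <preformat>
```python
import pandas as pd

# Hypothetical completed-case log, already sorted by time within each case.
log = pd.DataFrame({
    "case_id": ["A", "A", "A", "B", "B", "B", "C", "C"],
    "status":  ["Accepted/In Progress", "Accepted/In Progress",
                "Completed/In Call",
                "Accepted/In Progress", "Accepted/In Progress",
                "Completed/In Call",
                "Accepted/Assigned", "Completed/In Call"],
})

# A variant is the ordered sequence of statuses a case goes through.
variants = log.groupby("case_id")["status"].agg(lambda s: " > ".join(s))
counts = variants.value_counts(normalize=True)
print(counts)
```
        </preformat>
        <p>The first entry of the resulting table is the most common variant and its share of cases, which is how the 37,87% figure above is obtained.</p>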
        <p>We noted that 72% of the incidents are resolved by the initially assigned support team.</p>
        <p>[Figure: share of incidents per support line (First Line 60%, Second Line 33%, Third Line 7%), with the number of days on the vertical axis.]</p>
        <p>We compared the mean throughput time of the incidents across the different
support lines. As shown in Fig. 4 below, the throughput time increases
significantly once the incident is transferred to Second Line.</p>
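        <p>The per-line comparison of mean throughput time reduces to a simple groupby. A sketch on illustrative dummy data (not values from the VINST log):</p>
        <preformat>
```python
import pandas as pd

# Hypothetical case-level table: highest support line reached per case
# and its throughput (registration to closing) in days.
cases = pd.DataFrame({
    "case_id": ["A", "B", "C", "D"],
    "line": ["First", "First", "Second", "Third"],
    "throughput_days": [0.5, 2.0, 12.0, 30.0],
})

mean_by_line = cases.groupby("line")["throughput_days"].mean()
print(mean_by_line)
```
        </preformat>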
        <p>[Fig. 4: mean throughput time for First Line, Second Line and Third Line.]</p>
        <p>5.1.2 Where in the organization is the push to front process most implemented,
specifically if we compare Org line A2 with Org line C?</p>
        <p>If we compare the metrics, we can conclude that Org line C resolves the most
incidents without interference of Second and/or Third Line. Therefore we can say that
Org line C is most in line with the Push to Front process.</p>
        <p>5.1.3 For what products is the push to front mechanism most used and where not?</p>
        <p>In Second Line we see the highest variety in affected products. We noted that in
both First Line and Second Line most incidents are related to product PROD424. In
Third Line, product PROD607 is most affected.</p>
        <p>If we compare the products across the different support lines, we note that
incidents related to product PROD566 are always resolved in First Line. In the tables
below an overview is given of the products that are most affected in First Line and an
overview of those products that are always solved in First Line.</p>
      </sec>
      <sec id="sec-4-3">
        <title>5.1.4 What functions are most in line with the push to front process?</title>
        <p>Function (occurrence): V3_2 (3533), A2_1 (498), E_5 (256), A2_5 (11), E_6 (2), A2_2 (1), E_10 (1).</p>
        <p>We identified that across the 20 functions, 8 are involved in incidents resolved in
First Line. Based on the figures as shown in the table above, function V3_2 is most in
line with the push to front process. We noted that out of the 4804 incidents handled by
V3_2, 3533 are resolved in First Line.</p>
      </sec>
      <sec id="sec-4-4">
        <title>5.2 Ping Pong Behavior</title>
        <p>Questions:
1.1 What are the support teams that are responsible for most of the ping pong?
1.2 What are the functions that are responsible for most of the ping pong?
1.3 What are the organizations that are responsible for most of the ping pong?
1.4 What products are most affected by it?</p>
      </sec>
      <sec id="sec-4-5">
        <title>5.2.1 Incidents. 5.2.1.1 What are the support teams that are responsible for most of the ping pong?</title>
        <p>Out of 649 different support teams, we noticed that three teams stand out in
effectuating the most status changes: G97, G96 and S42. When taking a look at the
top ten, we noticed that almost all of them are part of the first line support. The
average number of status changes each support team is involved in is about 100.</p>
        <p>Relative occurrence (%) for the top ten support teams: 11,39; 9,15; 6,68; 2,53; 2,53; 2,42; 2,41; 1,98; 1,86; 1,76.</p>
        <p>We considered these ten support teams in our analysis in ProM, using the social
network miner, and we saw that some of these support teams hand over the work to
each other.</p>
        <p>G96 and G97 exchange work with each other more often than the other teams do,
and G96 is very popular for receiving work from others.</p>
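        <p>The handover-of-work relation that ProM's social network miner visualizes can be sketched by counting, per case, consecutive events where the working team changes. Column names and team values below are illustrative:</p>
        <preformat>
```python
from collections import Counter

import pandas as pd

# Hypothetical event log; "team" is the support team working the event.
log = pd.DataFrame({
    "case_id": ["A", "A", "A", "B", "B"],
    "team":    ["G97", "G96", "G97", "G96", "G96"],
})

# Handover of work: consecutive events in the same case where the
# team changes (self-loops like G96 to G96 are not handovers).
handovers = Counter()
for _, events in log.groupby("case_id"):
    teams = events["team"].tolist()
    for a, b in zip(teams, teams[1:]):
        if a != b:
            handovers[(a, b)] += 1

print(handovers)
```
        </preformat>
        <p>A pair appearing in both directions, as G96 and G97 do here, is exactly the circular relationship reported above.</p>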
        <p>When we applied this mining technique on all support teams, we discovered the
following regarding the handover of work: (threshold = 0)</p>
        <p>G96 and G97 are the most popular support teams in receiving work from other
teams. The only circular relationship that was found is the one between G96 and
G97.</p>
      </sec>
      <sec id="sec-4-6">
        <title>5.2.1.2 What are the functions that are responsible for most of the ping pong?</title>
        <p>When considering the function division of the support teams, we noticed that this
field is not always filled out. We filtered out these blank function division lines and
only selected those service requests in which G96 and G97 are involved. Obviously,
function division V3_2 is responsible for most of the Ping Pong. Also note that
multiple support team function divisions are active within one service request.
</p>
      </sec>
      <sec id="sec-4-7">
        <title>5.2.1.3 What are the organizations that are responsible for most of the ping pong?</title>
        <p>When selecting only those service requests with G96 and/or G97 involvement,
we can state that Organization Line C is affected the most by the ping pong
behavior.</p>
      </sec>
      <sec id="sec-4-8">
        <title>5.2.1.4 What products are most affected by it?</title>
        <p>To find the products that are affected the most by the Ping Pong behavior, we
selected only those service requests with G96 and/or G97 involvement. Product
‘PROD424’ is obviously the most affected.</p>
        <p>5.2.2 Open problems. 5.2.2.1 What are the support teams that are responsible for
most of the ping pong? We noted that there are 187 different support teams that
contribute to the open problems. The three teams that issue the most status changes
are G42 3rd, S33 2nd and G88 2nd. The top ten of this list contains no first line
support teams. The average number of status changes each support team is involved
in is about 12, and the median is 4.</p>
        <p>If we use the social network miner in ProM (threshold = 0), we noticed that
there is a handover of work from the third line support team to the second line
support team: from G273 3rd to G88 2nd.</p>
        <p>When we applied this mining technique on all the support teams, we
discovered the following handover of work (threshold = 0):</p>
        <p>Fig. 8. Handover of work.</p>
        <p>G88 is very popular for receiving work from other support teams. G273 3rd
tends to hand over its work instead of receiving work from other teams. No
circular relationship was found.
</p>
      </sec>
      <sec id="sec-4-9">
        <title>5.2.2.2 What are the functions that are responsible for most of the ping pong?</title>
        <p>With the help of an SQL query we extracted the problem requests with
involvement of the support teams that were involved in the handover of work.
Support team function division E_4 is then responsible for most of the ping pong.</p>
      </sec>
      <sec id="sec-4-10">
        <title>5.2.2.3 What are the organizations that are responsible for most of the ping pong?</title>
        <p>Using the output of the same SQL query we can count which Organization
Lines have been responsible for most of the Ping pong behavior: Org line C.</p>
        <p>5.2.2.4 What products are most affected by it? Based on the SQL query previously
run, we discovered the product that is affected the most by the Ping Pong behavior:
PROD802.</p>
      </sec>
      <sec id="sec-4-11">
        <title>5.2.3 Closed problems. 5.2.3.1 What are the support teams that are responsible for most of the ping pong?</title>
        <p>There are 130 different support teams that have worked on the closed
problems. The three teams that were involved in the most status changes are
G199 3rd, S33 2nd and G21 2nd. The third line support is involved in the
majority of the closed problems. When we look at the top ten of the teams
involved in closed problems, we notice that mostly second and third line support
are involved.
Relative occurrence (%): G42 3rd (2,08), G51 2nd (1,40), M3 2nd (1,32).</p>
        <p>If we use the social network miner in ProM (threshold = 0) for these 10
support teams, we noticed that there is a handover of work from the second line
support team to the third line support team: from G21 2nd to G199 3rd.</p>
        <p>After applying the social network miner on the whole population of support
teams, we discovered the following handover of work (threshold = 0):</p>
        <p>S21 2nd and G97 are the most popular support teams to hand over the work to.</p>
        <p>No circular relationship was found.
</p>
      </sec>
      <sec id="sec-4-12">
        <title>5.2.3.2 What are the functions that are responsible for most of the ping pong?</title>
        <p>By using an SQL query we selected only those problem requests that have an
involvement in the handover of work: S21 2nd, S30 2nd, N22 2nd, G294 2nd,
G290 3rd, G270 3rd, G21 2nd, G199 3rd, G152 3rd, G260 2nd, G130 3rd, G181
2nd, G97.</p>
        <p>We noted that for more than 2000 status changes the support team function
division was not filled out.</p>
        <p>Support team function division C_6 is responsible for most of the ping pong.</p>
      </sec>
      <sec id="sec-4-13">
        <title>5.2.3.3 What are the organizations that are responsible for most of the ping pong?</title>
        <p>Using the output of the same SQL query we can count which Organization
Lines have been responsible for most of the Ping pong behavior: Org line C.</p>
        <p>The products involved are PROD97, PROD98, PROD802, PROD96, PROD374,
PROD412, PROD793, PROD597, PROD660 and PROD236.</p>
      </sec>
      <sec id="sec-4-14">
        <title>5.2.3.4 What products are most affected by it?</title>
        <p>Based on the SQL query previously run, we discovered the product that is
affected the most by the Ping Pong behavior: PROD97.</p>
        <p>5.3 Wait User abuse. We understand that Volvo IT Belgium applies a lot of KPIs
related to the resolution time of incidents. The use of sub status “Wait – User” may
have a significant impact on those KPIs. In order to provide insight in the “Wait –
User” usage across the organization, we provided an answer to the following questions:
1. Who is making most use of the sub status “Wait – User” (action owner)?
2. What is the behavior per support team?
3. What is the behavior per function?
4. What is the behavior per organization?
5. Is there any (mis)-usage per location?
6. What is the average duration an incident is in status “Wait – User”?</p>
      </sec>
      <sec id="sec-4-15">
        <title>5.3.1 Who is making most use of the sub status “Wait – User”?</title>
        <p>General metrics considered: the number of records with sub status “Wait – User”,
the number of owners, and the number of owners who made use of the sub status
“Wait – User”.</p>
        <p>We listed all owners with the number of times the owners used the sub status
“Wait – User” versus the number of times the owners were involved in any incident.
This analysis was performed to identify the “action owner – wait user ratio”, i.e. what
is the usage percentage of status “Wait – User” by an action owner in comparison
with the total number of times the owner was involved in an incident.</p>
        <p>We noted 143 action owners who used the sub status “Wait – User” in 20% or
more of all their actions, whereby 1 action owner “Sreeraghu” used the sub status
“Wait – User” in 100% of all his actions. However, this owner was only involved in 1
incident.</p>
        <p>To eliminate this type of users, we excluded all action owners who performed
fewer than 50 actions, in order to identify those action owners who are heavily
involved in incidents.</p>
        <p>We noted 15 action owners involved in 50 actions or more who used the sub
status “Wait – User” in 20% or more of all their actions. (See appendix Wait
user behavior – 1)</p>
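        <p>The “action owner – wait user ratio” with a minimum-activity threshold can be sketched as follows. The data and column names are illustrative, and the threshold is lowered from the 50 actions used in the actual analysis to 2 to keep the toy example small:</p>
        <preformat>
```python
import pandas as pd

# Hypothetical action log: one row per action, with its sub status.
actions = pd.DataFrame({
    "owner":      ["anna", "anna", "anna", "bert", "bert"],
    "sub_status": ["In Progress", "Wait - User", "Wait - User",
                   "In Progress", "In Progress"],
})

totals = actions.groupby("owner").size()
waits = actions[actions["sub_status"] == "Wait - User"].groupby("owner").size()

# Ratio of Wait - User actions to all actions; owners who never used
# the sub status get a ratio of 0.
ratio = (waits / totals).fillna(0)

# Keep only owners above the activity threshold whose ratio is 20%+.
eligible = ratio[totals >= 2]
flagged = eligible[eligible >= 0.20]
print(flagged)
```
        </preformat>
        <p>The same pattern, grouped on support team, function, organization or location instead of owner, yields the ratios used in sections 5.3.2 through 5.3.5.</p>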
      </sec>
      <sec id="sec-4-16">
        <title>5.3.2 What is the behavior per support team?</title>
        <p>General metrics considered: the number of records with sub status “Wait – User”,
the number of support teams, and the number of support teams who made use of the
sub status “Wait – User”.</p>
        <p>We listed all support teams with the number of times the support teams used the
sub status “Wait – User” versus the number of times the support teams were involved
in any incident. This analysis was performed to identify the “support team – wait user
ratio”, i.e. what is the usage percentage of status “Wait – User” by a support team in
comparison with the total number of times the support team was involved in an
incident. Both databases were linked based on the common data field “support team”.</p>
        <p>We noted 30 support teams who used the sub status “Wait – User” in 20% or more
of all their actions, whereby “G32 2nd” used it the most (i.e. 9 times Wait – User sub
status used with a total of 26 actions). To identify the support teams who used the sub
status Wait – User significantly more, we excluded all support teams who performed
less than 50 actions.</p>
        <p>We noted 3 support teams, which were involved in 50 actions or more, who used
the sub status “Wait – User” in 20% or more of all their actions: N45, L50 3rd and
G73. (See appendix Wait user behavior – 2)</p>
      </sec>
      <sec id="sec-4-17">
        <title>5.3.3 What is the behavior per function?</title>
        <p>General metrics considered: the number of records with sub status “Wait – User”,
the number of functions, and the number of functions that made use of the sub status
“Wait – User”.</p>
        <p>We listed all functions with the number of times a function used the sub status
“Wait – User” versus the number of times the function was involved in any incident.
This analysis was performed to identify the “function – wait user ratio”, i.e. what is
the usage percentage of status “Wait – User” by a function in comparison with the
total number of times the function was involved in an incident. Both databases were
linked based on the common data field “function”.</p>
        <p>We noted 11 functions who used the sub status “Wait – User” in 5% or more
of all their actions, whereby “D_1” used it the most (in %: 10,82% - i.e. 161
times Wait – User sub status used with a total of 1.488 actions). (See appendix
Wait user behavior – 3)</p>
      </sec>
      <sec id="sec-4-18">
        <title>5.3.4 What is the behavior per organization?</title>
        <p>General metrics: number of records with sub status “Wait – User”; number of
organizations; number of organizations that made use of the sub status “Wait – User”</p>
        <p>We listed all organizations with the number of times the organizations used the sub
status “Wait – User” versus the number of times the organizations were involved in
any incident. This analysis was performed to identify the “organization – wait user
ratio”, i.e. what is the usage percentage of status “Wait – User” by an organization in
comparison with the total number of times the organization was involved in an
incident. Both databases were linked based on the common data field “organization”.</p>
        <p>We noted 10 organizations that used the sub status “Wait – User” in 5% or
more of all their actions, whereby “Org line I” used it the most (20%). Note
however that Org line I used the Wait – User sub status only 2 times (i.e. 20%
out of a total of 10 actions). If we compare Org line A2 with Org line C, we note
that the “organization – wait user ratio” is similar (i.e. 6,77% vs. 6,60%). (See
appendix Wait user behavior – 4)</p>
      </sec>
      <sec id="sec-4-19">
        <title>5.3.5 What is the behavior per location?</title>
        <p>General metrics: number of records with sub status “Wait – User”; number of
locations; number of locations that made use of the sub status “Wait – User”</p>
        <p>We listed all locations with the number of times the location used the sub status
“Wait – User” versus the number of times the location was involved in any incident.
This analysis was performed to identify the “location – wait user ratio”, i.e. what is
the usage percentage of status “Wait – User” by a location in comparison with the
total number of times the location was involved in an incident. Both databases were
linked based on the common data field “location”.</p>
        <p>We noted 9 locations that used the sub status “Wait – User” in 5% or more of
all their actions, whereby Germany used it the most: 14,55% (i.e. 8 Wait – User
actions out of a total of 55 actions). (See appendix Wait user behavior – 5)</p>
      </sec>
      <sec id="sec-4-20">
        <title>5.3.6 What is the average duration an incident is in status ‘Wait-User’?</title>
        <p>In order to calculate how long an incident remains in the Wait-User sub status, we
eliminated all service requests that were still stuck in this status. When looking at the
top ten incidents with the longest Wait-User time, we noticed that these have been in
this status for more than 100 days. There is even an incident with more than 1 year of
Wait-User time.</p>
        <p>The average duration is 7,74 days. Furthermore, in 29,72% of the cases an incident
is more than 1 week in this status.</p>
        <p>Total Wait-User duration per SR (in days):
1-523391859 210,82
1-580987781 205,63
1-565045794 199,66
1-605313141 167,97
1-559795575 164,26
1-613379581 163,00
1-606902814 155,23
1-626766981 128,14
1-603556351 113,24</p>
        <p>Moreover, we calculated the correlation between the wait-user time and the
throughput time of incidents: 0,54. So the waiting time has a strong impact on the
total duration of the incident.</p>
        <p>When we considered the total user waiting time in function of the total duration of
all incidents together, we saw that the waiting user time contributes to the total
duration time for 38,42%.</p>
        <p>We also took a look at the portion of the waiting time in the throughput time, for
Organizations A2 and C separately. For organization A2, 28,35% of their throughput
time is user waiting time. For organization C it is 38,45%. (See appendix Wait user
behavior – 5)</p>
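<p>The correlation and the waiting-time share described above can be computed as in the following Python/pandas sketch; the per-incident durations and column names here are hypothetical, and the resulting figures are not the VINST results (0,54 and 38,42%):</p>

```python
import pandas as pd

# Hypothetical per-incident durations in days; the real figures came
# from the full VINST log, not from this toy data.
df = pd.DataFrame({
    "wait_user_days":  [0.0, 2.0, 7.0, 30.0, 1.0],
    "throughput_days": [1.0, 5.0, 12.0, 45.0, 4.0],
})

# Pearson correlation between Wait-User time and total throughput time.
corr = df["wait_user_days"].corr(df["throughput_days"])

# Share of the total duration of all incidents spent waiting on the user.
wait_share = df["wait_user_days"].sum() / df["throughput_days"].sum() * 100

# Percentage of incidents that spent more than a week in Wait-User.
over_week = (df["wait_user_days"] > 7).mean() * 100
```

The same three figures (correlation, overall waiting share, share of incidents waiting over a week) drive the conclusions in this section.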
      </sec>
      <sec id="sec-4-21">
        <title>5.4 Process Conformity per Organization</title>
        <p>First we loaded the complete incident log into ProM, with our own activities. This
resulted in the process flow shown in Fig. 11. The process map shows that the
standard process flow is generally followed; however, a lot of transfers between support
teams take place. We also see that input is requested in a high number of cases,
which points at the “Wait User” usage. It is remarkable that Register Incident is not
the main starting point.</p>
        <p>If we compare the process flows followed in Org line A2 and Org line C, as shown in
their respective process maps, we see a similar sequence. This shows that the process is
consistently followed across the different organisations. However, from our analysis
of the Push to Front process we can derive that in Org line A2 the status
Queued/Awaiting Assignment is significantly more often linked to a transfer to Second or
Third Line.</p>
        <p>Fig. 12 Process map of the incident management process followed in Org line A2</p>
        <p>As part of the verification of whether organisations A2 and C are operating in a
similar way, we compared both organisations on the throughput time of incidents.
As can be noted, 35% of all incidents for organisation C are closed within a day;
for A2 this is only 8%. [Table: throughput time distribution per organisation,
showing % of total closed per time bucket and cumulative %]</p>
        <p>Using Minitab, we confirmed that the distribution of throughput time is not normally
distributed. Applying the test of equal variances (i.e. the Levene test), we rejected the
null hypothesis of equal variances (P &lt; 0,05) and concluded that there is a difference
between the variances in the population.</p>
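<p>The same equal-variances check can be reproduced outside Minitab, for instance with scipy; the lognormal samples below are synthetic stand-ins with deliberately different spread, not the VINST throughput times:</p>

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic throughput times (days) for two organisations with deliberately
# different spread; the real samples came from the VINST log.
org_a2 = rng.lognormal(mean=2.0, sigma=1.0, size=500)
org_c = rng.lognormal(mean=1.0, sigma=0.5, size=500)

# Levene's test for equality of variances, which is robust against the
# non-normality observed in the throughput-time distribution.
stat, p = stats.levene(org_a2, org_c)

# True when the null hypothesis of equal variances is rejected at the 5% level.
reject_equal_variances = 0.05 > p
```

Levene's test is the appropriate choice here precisely because, unlike the classical F-test, it does not assume normally distributed samples.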
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Other interesting findings</title>
      <sec id="sec-5-1">
        <title>When is the helpdesk called the most?</title>
        <p>When we consider all the data over the 3 years, we can see that, in general, most
incidents are reported on Wednesday, Thursday and Friday.</p>
        <sec id="sec-5-1-1">
          <title>Fig. 15. Reported incidents.</title>
          <p>Total number of incidents reported.</p>
        </sec>
      </sec>
      <sec id="sec-5-2">
        <title>How long does it take to solve an incident?</title>
        <p>We calculated the time spans of the incidents.</p>
        <p>The average duration is 12 days, which falls in the category of &lt; 15 days. Most
of the incidents are solved within a week. Remarkably, 64 incidents were solved within
one minute.</p>
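<p>The span calculation and its bucketing into duration categories can be sketched as follows in Python/pandas; the timestamps and bucket boundaries here are hypothetical examples, not the VINST records:</p>

```python
import pandas as pd

# Hypothetical open/close timestamps; the real VINST log holds one row
# per status change, from which these spans would be derived.
incidents = pd.DataFrame({
    "opened": pd.to_datetime(["2012-05-01 09:00:00", "2012-05-02 10:00:00",
                              "2012-05-03 11:00:00"]),
    "closed": pd.to_datetime(["2012-05-01 09:00:30", "2012-05-09 10:00:00",
                              "2012-06-20 11:00:00"]),
})

# Resolution time per incident, as fractional days.
span_days = (incidents["closed"] - incidents["opened"]).dt.total_seconds() / 86400

# Bucket the resolution times into categories like the ones used in the report.
buckets = pd.cut(span_days,
                 bins=[0, 1 / 1440, 7, 15, float("inf")],
                 labels=["under 1 minute", "under 1 week",
                         "under 15 days", "15 days or more"],
                 right=False)

avg_days = span_days.mean()  # overall average duration in days
```

A value-counts over such buckets yields the duration distribution that the report's tables summarise.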
      </sec>
      <sec id="sec-5-3">
        <title>What is the most common impact of the incidents?</title>
        <p>The majority of the incidents have a low or medium impact.</p>
        <p>The incidents with a major impact lasted only 1 or 2 weeks. The low and high
impact incidents are mostly solved within two weeks, most of the medium and high
impact incidents within one week. There is however one incident that has taken
longer than 2 years.</p>
        <p>[Table: number of incidents per duration category, split by impact level.
Column totals: Low 3.245, Medium 4.045, High 260, Major 3; Grand Total 7.553
incidents.]</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Appendix: Wait user behavior</title>
      <sec id="sec-6-1">
        <title>Action owner – wait user ratio</title>
        <p>OWNER_FIRST_NAME NO_WAIT_USER NO_INCIDENTS %_WAIT_USER_VS_INCIDENTS
Amer 25 74 33,78
MUDIT 21 66 31,82
Mohammad 27 86 31,40
Anjali 31 122 25,41
Sharath 14 58 24,14
Patryk 26 108 24,07
Muthu 80 355 22,54
Aneesh V 21 97 21,65
Prashant 27 126 21,43
Meishan 12 57 21,05</p>
      </sec>
      <sec id="sec-6-2">
        <title>Support team – wait user ratio</title>
        <p>INVOLVED_ST NO_WAIT_USER NO_INCIDENTS %_WAIT_USER_VS_INCIDENTS
N45 19 88 21,59
L50 3rd 24 113 21,24
G73 11 55 20,00
G356 2nd 19 105 18,10
W4 11 62 17,74
V17 3rd 42 247 17,00
S24 20 118 16,95
G92 191 1154 16,55
N20 10 62 16,13
G297 15 94 15,96</p>
      </sec>
      <sec id="sec-6-3">
        <title>Function – wait user ratio</title>
        <p>FUNCTION_DIV, ranked by %_WAIT_USER_VS_INCIDENTS: D_1, V3_3, E_5, E_10,
A2_1, D_2, A2_2, V3_2, E_4, E_8</p>
      </sec>
      <sec id="sec-6-4">
        <title>Organization – wait user ratio</title>
        <p>[Table: organization – wait user ratio per INVOLVED_ORG_LINE_3, with
NO_RECORDS, NO_WAIT_USER and COUNTRY columns]</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>