<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Process Deviation Classification (Extended Abstract)</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Manal Laghmouch Faculty of Business Economics UHasselt - Hasselt University Hasselt</institution>
          ,
          <country country="BE">Belgium</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>II. METHODOLOGY</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>I. INTRODUCTION</title>
      <p>
        Process mining is a family of process analysis techniques
that enables the discovery of process models from an event log
(process discovery), the ability to check conformance between
the actual and the assumed process (conformance checking),
and other process-related analyses to enhance the process
(process enhancement) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Two of these types, process
discovery and conformance checking, are of primary interest in
the context of auditing. While process discovery can be used
to get an overall and objective understanding of the business
processes of a company, conformance checking enables the
automatic detection of process deviations between the actual
and assumed process [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
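      <p>As a minimal illustration of what conformance checking produces, the sketch below compares each trace in a toy event log with a normative model expressed as a set of allowed directly-follows pairs. It is not one of the conformance checking algorithms from the cited literature; the activity names, the model, and the helper find_deviations are hypothetical and chosen only for illustration.</p>
      <preformat preformat-type="code"><![CDATA[
```python
# Hypothetical normative model: allowed directly-follows pairs of a purchase process.
ALLOWED = {
    ("START", "create_po"),
    ("create_po", "approve_po"),
    ("approve_po", "receive_goods"),
    ("receive_goods", "pay_invoice"),
    ("pay_invoice", "END"),
}


def find_deviations(trace):
    """Return the directly-follows pairs of a trace that the model does not allow."""
    steps = ["START", *trace, "END"]
    return [pair for pair in zip(steps, steps[1:]) if pair not in ALLOWED]


# Toy event log: case identifier mapped to its ordered list of activities.
event_log = {
    "case_1": ["create_po", "approve_po", "receive_goods", "pay_invoice"],
    "case_2": ["create_po", "receive_goods", "pay_invoice"],  # approval skipped
}

# Keep only the cases with at least one deviation from the model.
deviating_cases = {}
for case_id, trace in event_log.items():
    deviations = find_deviations(trace)
    if deviations:
        deviating_cases[case_id] = deviations
```
]]></preformat>
      <p>In this toy log only case_2 deviates, because the approval activity is skipped. Whether such a deviation is an anomaly or an acceptable exception is the question the rest of this work addresses.</p>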
      <p>
        Although conformance checking has great potential in
detecting process deviations, challenges exist that hinder it from
full adoption in auditing practice. First, because a normative
process model is an idealistic and simplified representation
of a process, performing a conformance check results in a
large set of process deviations. This makes it impossible for
the auditor to check each deviation one-by-one and therefore
forces the auditor to take samples instead of auditing the
complete set of transactions [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Second, the large number
of detected process deviations consists not only of deviations
that are harmful from an auditing perspective (anomalies),
but also of exceptional, but acceptable, behaviour (exceptions)
[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]–[
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Since the auditor is only interested in anomalies,
the exceptions need to be filtered out. Manually classifying
process deviations is a time-consuming and costly task, which makes
the use of conformance checking impractical.
      </p>
      <p>In this doctoral research, we aim to cope with the challenges
associated with conformance checking in auditing by focusing
on the classification of process deviations. For this purpose, we
develop an active learning framework that combines machine
learning techniques with domain expertise. The framework
provides auditors with a practical approach to audit the
complete set of transactions instead of a sample. The idea is to
use conformance checking output (deviations) and combine
this with domain expertise in an iterative process.</p>
      <p>This extended abstract is structured as follows. In Section
II, we describe our research methodology and walk through
the different steps of our framework. An overview of related
work is given in Section III. Finally, we conclude our research
in Section IV.</p>
      <p>The doctoral thesis will consist of designing and testing a
deviation classification framework. In this section, we present
the current state of the framework and provide some
information on how we will validate our study.</p>
    </sec>
    <sec id="sec-2">
      <title>A. The framework</title>
      <p>In our specific context, we are researching how we can
make conformance checking feasible in auditing practice. To
this end, we propose an active learning framework for the
classification of process deviations. The proposed framework
consists of the following five steps:</p>
      <p>1) Label a small sample of the unlabeled set of deviating
cases: The framework’s goal is to label a set of deviating cases
that were detected using conformance checking. The possible
output labels for each deviating case are either anomaly or
exception. An anomaly is a case that consists of at least
one anomalous process deviation. An exception is a case that
consists of only exceptional process deviations.</p>
      <p>The framework starts with taking a small sample of a set
of deviating cases that an auditor has to label as anomaly
or exception. The labelled small sample is the input for the
second step of the framework.</p>
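      <p>The case-level labeling rule of step 1 can be sketched as follows. The sketch assumes that the sample is drawn uniformly at random, which the framework description does not prescribe; the function names label_case and draw_sample and the seed parameter are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import random

ANOMALY, EXCEPTION = "anomaly", "exception"


def label_case(deviation_labels):
    """Label a deviating case from the labels of its individual deviations.

    A case is an anomaly if at least one deviation is anomalous,
    and an exception if all of its deviations are exceptional.
    """
    if not deviation_labels:
        raise ValueError("a deviating case has at least one deviation")
    return ANOMALY if ANOMALY in deviation_labels else EXCEPTION


def draw_sample(case_ids, sample_size, seed=0):
    """Draw a small sample of deviating cases for the auditor to label.

    Assumption: uniform random sampling with a fixed seed for reproducibility.
    """
    rng = random.Random(seed)
    ordered = sorted(case_ids)
    return rng.sample(ordered, min(sample_size, len(ordered)))
```
]]></preformat>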
      <p>
        2) Mine rules from the labelled set: In the second step,
we use the small labelled sample to mine rules. We want to
discover two types of rules: control flow relations and other
data relations. Control flow relations refer to relations between
the order of the activities in a case. To discover such relations,
we are planning to use a declarative constraint miner [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Other
data relations can be discovered by using, for example, rule
association mining [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Further research has to identify which
algorithms best suit our context.
      </p>
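      <p>To make the idea of mining data relations from the labelled sample concrete, the sketch below derives simple rules of the form "attribute value implies label" and keeps those with sufficient support and confidence. It is a deliberately small stand-in for association rule mining, not the algorithm that will eventually be selected; the attribute names, thresholds, and the function mine_label_rules are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter


def mine_label_rules(labelled_cases, min_support=2, min_confidence=0.8):
    """Mine rules of the form (attribute, value) -> label from labelled cases.

    Support is the number of cases matching both antecedent and label.
    Confidence is that number divided by the cases matching the antecedent.
    """
    antecedent_counts = Counter()
    rule_counts = Counter()
    for attributes, label in labelled_cases:
        for item in attributes.items():
            antecedent_counts[item] += 1
            rule_counts[(item, label)] += 1

    rules = []
    for (item, label), count in rule_counts.items():
        confidence = count / antecedent_counts[item]
        if count >= min_support and confidence >= min_confidence:
            rules.append((item, label, confidence))
    return sorted(rules)


# Hypothetical labelled sample: case attributes plus the auditor's label.
sample = [
    ({"approver": "same_as_requester", "amount": "high"}, "anomaly"),
    ({"approver": "same_as_requester", "amount": "low"}, "anomaly"),
    ({"approver": "manager", "amount": "high"}, "exception"),
    ({"approver": "manager", "amount": "low"}, "exception"),
]
rules = mine_label_rules(sample)
```
]]></preformat>
      <p>On this sample, only the approver attribute yields rules, because the amount attribute is split evenly across both labels and therefore fails both thresholds.</p>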
      <p>
        3) Transform rules to labeling functions: In this step, the rules
mined in step 2 are transformed into so-called labeling
functions. Labeling functions are rule-based constructs used
in the tool Snorkel. Snorkel is a system that implements
weak supervision to train machine learning models without
the need for labelled data. Instead, Snorkel uses labeling
functions that encode domain knowledge in the form of rules
[
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Labeling functions individually label deviating cases as
anomalous or exceptional. As a consequence,
they can overlap or conflict with each other. For that reason,
labeling functions are combined in a generative model to
predict a final (unambiguous) label [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. We choose to use Snorkel because it can indirectly use domain knowledge to
guide the labeling of individual cases.
      </p>
      <p>Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
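      <p>The labeling functions of step 3 can be pictured as small functions that either vote for a label or abstain. The sketch below is written in plain Python and does not call Snorkel itself; it only follows the convention of returning -1 to abstain. The three rules, the case attributes, and the activity names are hypothetical examples, not rules mined in this research.</p>
      <preformat preformat-type="code"><![CDATA[
```python
ABSTAIN, EXCEPTION, ANOMALY = -1, 0, 1


def lf_self_approval(case):
    """Data rule: a requester approving their own order is voted anomalous."""
    return ANOMALY if case["approver"] == case["requester"] else ABSTAIN


def lf_payment_before_receipt(case):
    """Control-flow rule: payment before goods receipt is voted anomalous."""
    trace = case["trace"]
    if "pay_invoice" in trace and "receive_goods" in trace:
        if trace.index("pay_invoice") < trace.index("receive_goods"):
            return ANOMALY
    return ABSTAIN


def lf_small_amount(case):
    """Data rule: low-value orders are voted acceptable exceptions."""
    return EXCEPTION if case["amount"] < 100 else ABSTAIN


LABELING_FUNCTIONS = [lf_self_approval, lf_payment_before_receipt, lf_small_amount]


def apply_labeling_functions(cases):
    """Build the label matrix: one row per case, one column per labeling function."""
    return [[lf(case) for lf in LABELING_FUNCTIONS] for case in cases]


cases = [
    {"requester": "ann", "approver": "ann", "amount": 50,
     "trace": ["create_po", "approve_po", "receive_goods", "pay_invoice"]},
    {"requester": "bob", "approver": "eve", "amount": 5000,
     "trace": ["create_po", "approve_po", "pay_invoice", "receive_goods"]},
]
label_matrix = apply_labeling_functions(cases)
```
]]></preformat>
      <p>The first row of the resulting matrix contains both an anomaly vote and an exception vote, which is exactly the kind of conflict that the subsequent combination step has to resolve.</p>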
      <p>
        4) Combine labeling functions in a machine learning model
and label the remaining unlabeled cases: In this step, we
combine the labeling functions in a machine
learning model. Snorkel has a built-in generative model, called the
LabelModel. The LabelModel estimates the accuracy of each labeling
function and weights its votes accordingly to assign a final label to each case
in the data [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. The result of this step is a set of cases that
are labelled as anomaly or exception.
      </p>
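      <p>The principle of weighting labeling functions by their accuracy can be illustrated with a simple weighted vote. The sketch below is a simplified stand-in and not Snorkel's LabelModel: it assumes that accuracies are estimated on the small labelled sample, whereas a generative label model estimates them from the agreements and disagreements between functions. The function names and the smoothing parameter are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math

ABSTAIN, EXCEPTION, ANOMALY = -1, 0, 1


def estimate_accuracies(label_matrix, true_labels, smoothing=1.0):
    """Estimate the accuracy of each labeling function, ignoring abstentions."""
    accuracies = []
    for j in range(len(label_matrix[0])):
        votes = [(row[j], y) for row, y in zip(label_matrix, true_labels)
                 if row[j] != ABSTAIN]
        correct = sum(1 for vote, y in votes if vote == y)
        # Laplace smoothing keeps every accuracy strictly between 0 and 1.
        accuracies.append((correct + smoothing) / (len(votes) + 2 * smoothing))
    return accuracies


def weighted_vote(row, accuracies):
    """Combine one row of votes into a final label using log-odds weights."""
    score = 0.0
    for vote, acc in zip(row, accuracies):
        if vote == ABSTAIN:
            continue
        weight = math.log(acc / (1.0 - acc))
        score += weight if vote == ANOMALY else -weight
    if score == 0.0:
        return ABSTAIN  # no usable evidence for either label
    return ANOMALY if score > 0 else EXCEPTION


# Hypothetical labelled sample: votes of three labeling functions per case.
sample_matrix = [[1, -1, 0], [-1, 1, -1], [1, 1, -1], [-1, -1, 0], [-1, -1, 0]]
sample_labels = [1, 1, 1, 0, 0]
accuracies = estimate_accuracies(sample_matrix, sample_labels)

# Label remaining cases, including one with conflicting votes.
final_labels = [weighted_vote(row, accuracies)
                for row in [[1, -1, 0], [-1, -1, 0], [-1, -1, -1]]]
```
]]></preformat>
      <p>In the conflicting first case the anomaly vote wins because it comes from a function with a higher estimated accuracy. A case on which every function abstains stays unlabelled in this sketch.</p>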
      <p>5) Calculate the accuracy: In the last step of the
framework, we calculate the model accuracy by using a labelled
validation set. If the accuracy is below a threshold (defined
beforehand by the auditor), then we iterate again over the
framework by taking an additional sample of the unlabeled
data. If the accuracy is above the predefined threshold, then
the framework stops here, and the auditor ends up with the
set of labelled cases. The auditor can now focus the audit on
anomalous cases.</p>
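      <p>The stopping rule of step 5 can be sketched as a loop around the earlier steps. The sketch assumes that reaching the threshold exactly also stops the loop and that a maximum number of iterations is imposed; neither is specified in the framework description. The callback train_and_predict is a hypothetical stand-in for steps 2 to 4, and the stub used below only simulates a model that improves as more cases are labelled.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def accuracy(predicted, actual):
    """Share of validation cases whose predicted label matches the auditor's label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def run_framework(train_and_predict, validation_labels, threshold,
                  sample_size=10, max_iterations=5):
    """Iterate over the framework until validation accuracy reaches the threshold."""
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    n_labelled = 0
    for iteration in range(1, max_iterations + 1):
        n_labelled += sample_size  # the auditor labels an additional sample
        predicted = train_and_predict(n_labelled)
        score = accuracy(predicted, validation_labels)
        if score >= threshold:
            break  # accurate enough: the auditor keeps the labelled cases
    return iteration, score


validation = [1, 0, 1, 1, 0]


def stub_train_and_predict(n_labelled):
    """Hypothetical stand-in for steps 2 to 4: more labels give fewer mistakes."""
    mistakes = max(0, 3 - n_labelled // 10)
    return [1 - y for y in validation[:mistakes]] + validation[mistakes:]


iterations, final_accuracy = run_framework(
    stub_train_and_predict, validation, threshold=0.8)
```
]]></preformat>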
    </sec>
    <sec id="sec-3">
      <title>B. Validation</title>
      <p>The results of this doctoral study will be validated in two
steps. First, we artificially create event logs and process models
on which we apply our framework. Our approach will be
validated based on the experimental results. We plan to look
at the model accuracy, the impact of high or low-quality rules
on model performance, the impact of the number of rules
on model performance, and the number of iterations over the
framework.</p>
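      <p>One possible way to create an artificial event log with a known ground truth is sketched below: traces follow a normative sequence, and deviations are injected at a chosen rate by swapping two adjacent activities. This is only an assumption about how such logs could be generated, not the procedure used in this research; the activity names, the deviation_rate parameter, and the function generate_log are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import random

# Hypothetical normative sequence of a purchase process.
NORMATIVE_TRACE = ["create_po", "approve_po", "receive_goods", "pay_invoice"]


def generate_log(n_cases, deviation_rate, seed=0):
    """Generate traces and record which cases had a deviation injected."""
    rng = random.Random(seed)
    log, deviating = {}, set()
    for i in range(n_cases):
        case_id = f"case_{i}"
        trace = list(NORMATIVE_TRACE)
        if rng.random() < deviation_rate:
            # Inject a deviation by swapping two adjacent activities.
            k = rng.randrange(len(trace) - 1)
            trace[k], trace[k + 1] = trace[k + 1], trace[k]
            deviating.add(case_id)
        log[case_id] = trace
    return log, deviating


log, deviating = generate_log(n_cases=100, deviation_rate=0.3)
```
]]></preformat>
      <p>Because the set of injected deviations is known, the predicted labels can be compared with the ground truth, and the deviation rate or the quality of the supplied rules can be varied between experiments.</p>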
      <p>Second, after succeeding in the artificial setup, we apply
the framework in a real-life setting. For this purpose, we are
working with a Big Four auditing firm. In this stage, we will
again look at model performance, the impact of the quality
and quantity of given rules, and the number of iterations over
the framework.</p>
      <sec id="sec-3-1">
        <title>III. RELATED WORK</title>
        <p>
          Since data is widely available and technology is becoming
better, auditing is forced to change [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. The continuous
auditing research field responds to such changes by automating
auditing procedures [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. A family of techniques that has
received increased attention lately is process mining. With only an
event log as input, process mining can provide auditors with
objective views on the process [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. For in-depth analyses,
conformance checking is the relevant process mining type. However, it raises
an alarm flood of deviations between the assumed and the real
process of a company, which is impossible to check manually.
Consequently, conformance checking does not yet support a
full-population audit [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
        </p>
        <p>
          Although previous research proposes frameworks to cope
with alarm floods in continuous auditing [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ], [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ], a full
practical elaboration is still missing. This research proposes a practical
deviation classification framework that enables the automatic
labeling of process deviations in auditing.
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>IV. CONCLUSION</title>
        <p>In this doctoral research, we want to enable full-population
tests in auditing by making conformance checking more
feasible in practice. We propose an active learning framework
that combines conformance checking output with domain
knowledge, with the goal of labeling deviating cases as anomaly
or exception. The research follows a two-step approach. First,
we set up experiments on artificially generated event
logs and process models. After that, we test the framework in
a real audit environment at a Big Four auditing firm.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>W.</given-names>
            <surname>Van der Aalst</surname>
          </string-name>
          ,
          <source>Process mining: Data Science in Action</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Jans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Alles</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Vasarhelyi</surname>
          </string-name>
          , “
          <article-title>The case for process mining in auditing: Sources of value added and areas of application</article-title>
          ,”
          <source>International Journal of Accounting Information Systems</source>
          , vol.
          <volume>14</volume>
          , no.
          <issue>1</issue>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>20</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>R. J. C.</given-names>
            <surname>Bose</surname>
          </string-name>
          and
          <string-name>
            <given-names>W.</given-names>
            <surname>van der Aalst</surname>
          </string-name>
          , “
          <article-title>Trace alignment in process mining: opportunities for process diagnostics</article-title>
          ,” in
          <source>International Conference on Business Process Management</source>
          . Springer,
          <year>2010</year>
          , pp.
          <fpage>227</fpage>
          -
          <lpage>242</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M. G.</given-names>
            <surname>Alles</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kogan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Vasarhelyi</surname>
          </string-name>
          , “
          <article-title>Putting continuous auditing theory into practice: Lessons from two pilot implementations</article-title>
          ,”
          <source>Journal of Information Systems</source>
          , vol.
          <volume>22</volume>
          , no.
          <issue>2</issue>
          , pp.
          <fpage>195</fpage>
          -
          <lpage>214</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Jans</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Hosseinpour</surname>
          </string-name>
          , “
          <article-title>How active learning and process mining can act as Continuous Auditing catalyst</article-title>
          ,”
          <source>International Journal of Accounting Information Systems</source>
          , vol.
          <volume>32</volume>
          , pp.
          <fpage>44</fpage>
          -
          <lpage>58</lpage>
          , Mar.
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>B.</given-names>
            <surname>Depaire</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Swinnen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jans</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Vanhoof</surname>
          </string-name>
          , “
          <article-title>A process deviation analysis framework</article-title>
          ,” in International Conference on Business Process Management. Springer,
          <year>2012</year>
          , pp.
          <fpage>701</fpage>
          -
          <lpage>706</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>D. Y.</given-names>
            <surname>Chan</surname>
          </string-name>
          and
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Vasarhelyi</surname>
          </string-name>
          , “
          <article-title>Innovation and practice of continuous auditing</article-title>
          ,”
          <source>International Journal of Accounting Information Systems</source>
          , vol.
          <volume>12</volume>
          , no.
          <issue>2</issue>
          , pp.
          <fpage>152</fpage>
          -
          <lpage>160</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Vasarhelyi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. G.</given-names>
            <surname>Alles</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Kogan</surname>
          </string-name>
          , “
          <article-title>Principles of analytic monitoring for continuous assurance</article-title>
          ,”
          <source>Journal of Emerging Technologies in Accounting</source>
          , vol.
          <volume>1</volume>
          , no.
          <issue>1</issue>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>21</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M.</given-names>
            <surname>Pesic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Schonenberg</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W. M.</given-names>
            <surname>Van der Aalst</surname>
          </string-name>
          , “
          <article-title>Declare: Full support for loosely-structured processes</article-title>
          ,” in
          <source>11th IEEE International Enterprise Distributed Object Computing Conference (EDOC 2007)</source>
          . IEEE,
          <year>2007</year>
          , pp.
          <fpage>287</fpage>
          -
          <lpage>287</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhang</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <source>Association rule mining: models and algorithms</source>
          . Springer,
          <year>2003</year>
          , vol.
          <volume>2307</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Ratner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. H.</given-names>
            <surname>Bach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Ehrenberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Fries</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Ré</surname>
          </string-name>
          , “
          <article-title>Snorkel: rapid training data creation with weak supervision</article-title>
          ,”
          <source>The VLDB Journal</source>
          , vol.
          <volume>29</volume>
          , no.
          <issue>2</issue>
          , pp.
          <fpage>709</fpage>
          -
          <lpage>730</lpage>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A. J.</given-names>
            <surname>Ratner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. H.</given-names>
            <surname>Bach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. R.</given-names>
            <surname>Ehrenberg</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Ré</surname>
          </string-name>
          , “
          <article-title>Snorkel: Fast training set generation for information extraction</article-title>
          ,” in
          <source>Proceedings of the 2017 ACM International Conference on Management of Data</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>1683</fpage>
          -
          <lpage>1686</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Vasarhelyi</surname>
          </string-name>
          and
          <string-name>
            <given-names>F. B.</given-names>
            <surname>Halper</surname>
          </string-name>
          , “
          <article-title>The continuous audit of online systems</article-title>
          ,”
          <source>Auditing: A Journal of Practice and Theory</source>
          ,
          <year>1991</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>M.</given-names>
            <surname>Jans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Alles</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Vasarhelyi</surname>
          </string-name>
          , “
          <article-title>The case for process mining in auditing: Sources of value added and areas of application</article-title>
          ,”
          <source>International Journal of Accounting Information Systems</source>
          , vol.
          <volume>14</volume>
          , no.
          <issue>1</issue>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>20</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Perols</surname>
          </string-name>
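          and
          <string-name>
            <given-names>U. S.</given-names>
            <surname>Murthy</surname>
          </string-name>
          , “
          <article-title>Information fusion in continuous assurance</article-title>
          ,”
          <source>Journal of Information Systems</source>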
          , vol.
          <volume>26</volume>
          , no.
          <issue>2</issue>
          , pp.
          <fpage>35</fpage>
          -
          <lpage>52</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>P.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. Y.</given-names>
            <surname>Chan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Kogan</surname>
          </string-name>
          , “
          <article-title>Exception prioritization in the continuous auditing environment: A framework and experimental evaluation</article-title>
          ,”
          <source>Journal of Information Systems</source>
          , vol.
          <volume>30</volume>
          , no.
          <issue>2</issue>
          , pp.
          <fpage>135</fpage>
          -
          <lpage>157</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>