<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Data Quality in Process Mining: A Rule-based Approach</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>R.M.E. van Cruchten</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Tilburg University</institution>
          ,
          <addr-line>Warandelaan 2 5037 AB Tilburg</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>“Garbage in, garbage out” is a truism for any data analysis technique. Process mining, a form of data analysis that uses data from event logs, poses some unique data challenges in this respect, including missing events, event granularity, and case heterogeneity. Dealing with these challenges is often regarded as an a priori step rather than as an integral part of the process analysis itself. This research proposes a novel and integral approach to data quality in process mining. By investigating existing techniques for data quality rule discovery, a more systematic approach to measuring and enhancing event data quality is presented. Moreover, a framework for the application of data quality and transformation rules will be investigated to create a more transparent and auditable data preparation approach. Lastly, the extent to which data quality rules can be used to express event data compliance will be investigated.</p>
      </abstract>
      <kwd-group>
        <kwd>Process Mining</kwd>
        <kwd>Data Quality</kwd>
        <kwd>Rule-based approach</kwd>
        <kwd>Data pre-processing</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Process mining is a relatively new technique that fills the gap between data mining on
the one hand and business process analysis and modelling on the other. It aims to extract
process-related knowledge from event data and enables an organization to discover,
monitor and improve its processes
        <xref ref-type="bibr" rid="ref19">(van der Aalst, 2011)</xref>
        . The heavy dependence on
information systems in business processes has created a situation in which the digital and
physical worlds are tightly connected. This connectivity has made it possible to store
large amounts of data on the activities occurring in business processes, i.e.,
event data
        <xref ref-type="bibr" rid="ref19">(van der Aalst, 2011)</xref>
        . However, as with any data analysis technique, the
saying “garbage in, garbage out” holds true for process mining. The quality of event
data has been recognized as a major challenge in applying process mining in practice
        <xref ref-type="bibr" rid="ref10 ref16 ref4 ref5 ref5">(IEEE Task Force on Process Mining, 2012; Bose, Mans, &amp; van der Aalst, 2013; Bose,
van der Aalst, Žliobaitė, &amp; Pechenizkiy, 2014; Suriadi, Andrews, ter Hofstede, &amp;
Wynn, 2017)</xref>
        .
        <xref ref-type="bibr" rid="ref4 ref5">Bose et al. (2013)</xref>
        identify four broad categories of event data issues,
namely missing, incorrect, imprecise, and irrelevant event data. The IEEE Task Force
on Process Mining (2012) characterizes event data quality along two dimensions: (1) the
level of abstraction of the events and (2) the accuracy of the timestamp in terms of (i) its
granularity, (ii) directness of registration, and (iii) correctness
        <xref ref-type="bibr" rid="ref2">(Andrews, Suriadi, Chun,
&amp; Poppe, 2018)</xref>
        . While the event data quality frameworks presented by
        <xref ref-type="bibr" rid="ref4 ref5">Bose et al.
(2013)</xref>
        and the IEEE Task Force on Process Mining (2012) are useful for classifying and
describing the impact of various data quality issues, they provide no guidance
on how to identify or address these issues. Moreover, methodologies for applying process
mining in practice, such as the Process Mining Project Methodology (PM2) by van Eck,
Lu, Leemans, &amp;
        <xref ref-type="bibr" rid="ref20">van der Aalst (2015</xref>
        ) or the L* life cycle model by IEEE Task Force
on Process Mining (2012), mention the importance of event data quality but do not
include addressing data quality as an explicit step.
Furthermore, research on systematically addressing event data quality challenges is
scarce
        <xref ref-type="bibr" rid="ref2">(Andrews et al., 2018)</xref>
        . Traditionally, the database field has used integrity
constraints in the form of business rules to enforce data quality. Although the application of
business rules is not a new topic in computer science, it has yet to find its way into
the field of process mining. This research will therefore address the
following research question: how can event data quality in process mining be
systematically addressed using a rule-based approach? The remainder of this paper is
structured as follows. Section 2 discusses the research approach. Section 3 elaborates
on a rule-based approach to data quality, and Section 4 presents a process mining
framework for the proposed rule-based approach. Section 5 presents the conclusion and
future work.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Research Approach</title>
      <p>
        This research will apply a design science approach, since it aims to create and evaluate
an artifact intended to solve an identified organizational problem
        <xref ref-type="bibr" rid="ref21">(Wieringa, 2014)</xref>
        ,
namely the need to systematically address event data quality. Two types of artifacts will
be designed:
<list list-type="order">
<list-item><p>Method(s) to identify and solve event data quality issues.</p></list-item>
<list-item><p>A methodology for applying the designed method(s) to address event data quality.</p></list-item>
</list>
The designed artifacts are required to be:
<list list-type="bullet">
<list-item><p>Systematic;</p></list-item>
<list-item><p>Automated;</p></list-item>
<list-item><p>Generally applicable, i.e., domain and system agnostic;</p></list-item>
<list-item><p>Transparent, so that they can be easily audited.</p></list-item>
</list>
      </p>
      <p>Lab experiments and case studies using real-life event data will be used to develop and
validate the designed artifacts. Real-life event data have so far not been used extensively
to design process mining artifacts. Moreover, using real-life event data will contribute to
the applicability of the designed artifacts in practice.
</p>
    </sec>
    <sec id="sec-3">
      <title>A Rule-based Approach to Data Quality</title>
      <p>
        The dependence of process mining results on event data quality is evident.
However, dealing with data quality is often seen as an a priori, laborious activity that
requires a lot of manual effort. Research on a systematic and generalizable
approach to addressing the identified quality issues is scarce.
        <xref ref-type="bibr" rid="ref16">Suriadi et al. (2017)</xref>
        and
        <xref ref-type="bibr" rid="ref2">Andrews et al. (2018)</xref>
        present two recent approaches to systematically identifying and
addressing event data quality issues. Both papers, however, recognize the need for
further research on systematic approaches. This research addresses this gap
by focusing on how data quality rules can be systematically applied to repair data
and thereby improve data quality.
      </p>
      <sec id="sec-3-1">
        <title>Rules as a uniform language</title>
        <p>
          Dependency theory is as old as relational databases themselves. Data dependencies,
also called integrity rules, provide a uniform logical framework for describing
and defining data quality rules
          <xref ref-type="bibr" rid="ref9">(Fan &amp; Geerts, 2012)</xref>
          . Conditional functional
dependencies (CFDs) extend traditional functional dependencies (FDs) with patterns
of semantically related constants, yielding stricter rules aimed at improving the quality
of the data. For example, consider CFD1: ([Country = NL, ZIP = _] → [Street]). CFD1
extends the FD [Country, ZIP] → [Street], meaning that the combination of Country and
ZIP uniquely determines Street, but requires it to hold only on the subset of records that
satisfy the pattern Country = NL (the wildcard _ matches any ZIP value). CFDs thus allow
rules to be defined more specifically, creating stricter rules that are able to detect semantic
errors. Moreover, CFDs can be applied to repair data in a semi-automatic way by detecting
dirty records and suggesting the correct value to a user for inspection before the record is updated
          <xref ref-type="bibr" rid="ref9">(Fan &amp; Geerts,
2012)</xref>
          .
        </p>
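        <p>The CFD1 example can be made concrete with a short Python sketch that detects violating records and proposes a candidate value for a user to inspect, in the semi-automatic spirit described above. It assumes that records are plain dictionaries; the helper names <monospace>cfd_violations</monospace> and <monospace>suggest_repair</monospace> and the sample records are hypothetical, and the majority-vote suggestion is a simple illustrative heuristic rather than the repair algorithm of Fan &amp; Geerts (2012).</p>
        <preformat preformat-type="code">
```python
from collections import Counter, defaultdict


def cfd_violations(records, condition, lhs, rhs):
    """Group the records that violate a CFD.

    The CFD is read as: on the records that match every constant in
    `condition` (attribute -> value), the attributes in `lhs` must
    functionally determine the attribute `rhs`. An empty condition
    reduces the CFD to a plain FD.
    """
    groups = defaultdict(list)
    for rec in records:
        # Only records that satisfy the constant pattern are in scope.
        if all(rec.get(attr) == value for attr, value in condition.items()):
            groups[tuple(rec[attr] for attr in lhs)].append(rec)
    # A group violates the rule when it has more than one distinct rhs value.
    return {key: group for key, group in groups.items()
            if len({rec[rhs] for rec in group}) > 1}


def suggest_repair(group, rhs):
    """Suggest the most frequent rhs value in a violating group.

    The suggestion is meant for inspection by a user, not for an
    automatic update of the records.
    """
    return Counter(rec[rhs] for rec in group).most_common(1)[0][0]


# Hypothetical records; the second one contains a typo in Street.
records = [
    {"Country": "NL", "ZIP": "5037 AB", "Street": "Warandelaan"},
    {"Country": "NL", "ZIP": "5037 AB", "Street": "Warandelan"},
    {"Country": "NL", "ZIP": "5037 AB", "Street": "Warandelaan"},
    {"Country": "UK", "ZIP": "AB1", "Street": "High Street"},
    {"Country": "UK", "ZIP": "AB1", "Street": "Station Road"},
]

if __name__ == "__main__":
    # CFD1: ([Country = NL, ZIP = _] -> [Street])
    cfd1 = cfd_violations(records, {"Country": "NL"}, ["Country", "ZIP"], "Street")
    for key, group in cfd1.items():
        print(key, "->", suggest_repair(group, "Street"))
```
</preformat>
        <p>Checking the plain FD instead, i.e., passing an empty pattern, would also flag the two UK records, which illustrates how the constant pattern restricts the rule to the subset of records to which it is intended to apply.</p>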
      </sec>
      <sec id="sec-3-2">
        <title>Discovering data quality rules</title>
        <p>
          <xref ref-type="bibr" rid="ref7">Chiang &amp; Miller (2008)</xref>
          demonstrated a decade ago that conditional functional
dependencies (CFDs) can also be mined from a dataset and subsequently used as data
quality rules to measure and improve data quality. While CFDs are widely used in practice
for data cleaning, the discovery of CFDs has not been researched
extensively
          <xref ref-type="bibr" rid="ref14">(Rammelaere &amp; Geerts, 2019)</xref>
          . Defining a set of integrity constraints that
reflect an organization’s business rules and domain semantics is often very
time-consuming and requires extensive consultation of business experts with
knowledge of that domain. Discovery techniques that can (partially) automate this
effort are therefore valuable. Furthermore, a dataset may contain domain-specific rules
that users are not aware of but that can still be useful in enforcing semantic data
consistency
          <xref ref-type="bibr" rid="ref7">(Chiang &amp; Miller, 2008)</xref>
          . Rule discovery therefore provides a less biased approach to
defining data quality rules. However, rule discovery techniques cannot guarantee a
complete rule set, since it is not possible to determine the complete spectrum of
possible data issues with certainty
          <xref ref-type="bibr" rid="ref16">(Suriadi
et al., 2017)</xref>
          . Manual validation and refinement of the discovered data quality
rules therefore remain important. This research will investigate whether tacit process and
domain knowledge can be formalized by mining CFDs, or other forms of integrity rules,
from event data in a semi-automatic way, e.g., by mining a set of candidate rules and validating
them with domain experts.
        </p>
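        <p>As a rough illustration of rule discovery on event data, the Python sketch below searches for constant patterns under which a given FD holds. It is deliberately naive: it considers only a single condition attribute and a user-supplied FD, whereas dedicated algorithms such as the one by Chiang &amp; Miller (2008) explore a much larger space of candidate rules. The function names, the <monospace>min_support</monospace> threshold and the invoice events are hypothetical, and every returned pattern is only a candidate rule to be validated with domain experts.</p>
        <preformat preformat-type="code">
```python
from collections import defaultdict


def fd_holds(records, lhs, rhs):
    """Return True if every combination of lhs values maps to one rhs value."""
    seen = {}
    for rec in records:
        key = tuple(rec[attr] for attr in lhs)
        # setdefault stores the first rhs value seen for this key.
        if seen.setdefault(key, rec[rhs]) != rec[rhs]:
            return False
    return True


def discover_constant_cfds(records, cond_attr, lhs, rhs, min_support=2):
    """Find values c of cond_attr such that lhs -> rhs holds on the records
    with cond_attr == c and at least min_support records back the rule."""
    by_value = defaultdict(list)
    for rec in records:
        by_value[rec[cond_attr]].append(rec)
    candidates = []
    for value, subset in by_value.items():
        if len(subset) >= min_support and fd_holds(subset, lhs, rhs):
            candidates.append(({cond_attr: value}, tuple(lhs), rhs, len(subset)))
    return candidates


# Hypothetical invoice events with the role of the performing resource.
events = [
    {"case": 1, "activity": "Approve invoice", "amount": "high", "role": "Manager"},
    {"case": 2, "activity": "Approve invoice", "amount": "high", "role": "Manager"},
    {"case": 3, "activity": "Approve invoice", "amount": "low", "role": "Clerk"},
    {"case": 4, "activity": "Approve invoice", "amount": "low", "role": "Clerk"},
    {"case": 1, "activity": "Register invoice", "amount": "high", "role": "Clerk"},
    {"case": 2, "activity": "Register invoice", "amount": "high", "role": "Manager"},
]

if __name__ == "__main__":
    # The FD amount -> role does not hold on the whole log ...
    print(fd_holds(events, ["amount"], "role"))
    # ... but it does hold for the "Approve invoice" events.
    for rule in discover_constant_cfds(events, "activity", ["amount"], "role"):
        print(rule)
```
</preformat>
        <p>Here the plain FD is violated, while its conditional version restricted to the approval events holds with a support of four events. Whether such a pattern reflects a genuine business rule or merely a coincidence in the data is exactly what the validation with domain experts has to decide.</p>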
      </sec>
      <sec id="sec-3-3">
        <title>Multi-level perspective on rules</title>
        <p>
          While traditional integrity rules focus on identifying data issues at the record level
(i.e., “intra-record”), event data poses unique data quality challenges because of
the notion of cases (i.e., a subset of records that together describe a single process instance). For
example, identifying missing events in a case requires “inter-record” integrity rules.
Such rules could be based on business rules that prescribe how the process must be executed
          <xref ref-type="bibr" rid="ref18">(van
Cruchten &amp; Weigand, 2018)</xref>
          . It is therefore proposed that integrity rules can be defined at
both the recorded system event level and the case level. Moreover, if transformations
need to be performed on the recorded system events, such as semantic labeling
          <xref ref-type="bibr" rid="ref1">(Alves de
Medeiros et al., 2007)</xref>
          , aggregating events
          <xref ref-type="bibr" rid="ref13 ref15">(Smirnov, Reijers, &amp; Weske, 2012; Montani,
Leonardi, Striani, Quaglini, &amp; Cavallini, 2017)</xref>
          or mapping events to a different level
of abstraction
          <xref ref-type="bibr" rid="ref17 ref3">(Baier, Mendling, &amp; Weske, 2014; Tax, Haakma, Sidorova, &amp; Aalst,
2016)</xref>
          , integrity rules could also be defined at the transformed event level. Having
integrity rules at this level enables measuring data quality before and after
the transformation. Thus, a multi-level perspective on integrity rules is proposed,
in which rules can be defined at the system event, event, case, and model level, as shown
in Figure 1.
        </p>
        <p>
When comparing the events recorded in an information system with the events in a
process model defined by the organization, one often faces a difference in level of
abstraction. This is because a model is created as an abstraction of reality, whereas
information systems record events at a finer level of granularity
          <xref ref-type="bibr" rid="ref3">(Baier et al.,
2014)</xref>
          . This difference in granularity can lead to misinterpretation of process mining
results, as the discovered process model is less understandable from a business user's
perspective. To bridge this difference in granularity, the recorded system events
should be transformed into understandable business events, which will result in more
understandable process mining results
          <xref ref-type="bibr" rid="ref11">(Jareevongpiboon &amp; Janecek, 2013)</xref>
          . Moreover,
it is argued that event data transformations should be rule-based, so that the applied
domain logic is transparent and thus auditable, and the quality and integrity of
the transformed data can be guaranteed
          <xref ref-type="bibr" rid="ref18">(van Cruchten &amp; Weigand, 2018)</xref>
          . Rule-based
data transformation has not been researched extensively
          <xref ref-type="bibr" rid="ref12 ref16 ref18 ref8">(Claes &amp; Poels, 2014;
Leonardi, Striani, Quaglini, Cavallini, &amp; Montani, 2018; van Cruchten &amp; Weigand,
2018; Suriadi et al., 2017)</xref>
          .
          <xref ref-type="bibr" rid="ref8">Claes &amp; Poels (2014)</xref>
          apply rules to merge
inter-organizational event logs, and the notion of rule- and ontology-based transformation is also
proposed by
          <xref ref-type="bibr" rid="ref12">Leonardi et al. (2018)</xref>
          . Previous work by van Cruchten &amp; Weigand (2018)
has successfully demonstrated that rules can be used both for cleaning data and for
transforming process-“unaware” data (i.e., data that is not stored with the intentional goal of
process logging) to a higher level of abstraction.
        </p>
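        <p>The multi-level idea can be sketched in Python by evaluating one rule per level on a toy log: an intra-record rule on the recorded system events, a rule-based mapping from system actions to business activities followed by a rule on the transformed events, and an inter-record rule on cases that detects a missing approval event. The action codes, the mapping table, the rules and the helper names are all hypothetical, and the model level is omitted for brevity.</p>
        <preformat preformat-type="code">
```python
from collections import defaultdict

# Hypothetical system events as recorded by an information system.
system_events = [
    {"case": "A", "action": "INV_CREATE", "ts": 1},
    {"case": "A", "action": "INV_APPROVE", "ts": 2},
    {"case": "A", "action": "PAY_RUN", "ts": 3},
    {"case": "B", "action": "SYS_LOCK", "ts": 1},
    {"case": "B", "action": "PAY_RUN", "ts": None},
]

# Hypothetical transformation rules: system action -> business activity.
MAPPING = {
    "INV_CREATE": "Register invoice",
    "INV_APPROVE": "Approve invoice",
    "PAY_RUN": "Pay invoice",
}


def has_timestamp(event):
    """System event level (intra-record): every event needs a timestamp."""
    return event["ts"] is not None


def transform(events):
    """Apply the mapping rules; unmapped actions are labelled, not dropped."""
    return [dict(e, activity=MAPPING.get(e["action"], "UNMAPPED")) for e in events]


def is_mapped(event):
    """Transformed event level: every event should map to a business activity."""
    return event["activity"] != "UNMAPPED"


def payment_requires_approval(case_events):
    """Case level (inter-record): a paid invoice must also have been approved."""
    activities = {e["activity"] for e in case_events}
    return "Pay invoice" not in activities or "Approve invoice" in activities


def group_by_case(events):
    """Collect the events of each case into one list per case."""
    cases = defaultdict(list)
    for e in events:
        cases[e["case"]].append(e)
    return list(cases.values())


def violation_rate(items, rule):
    """Fraction of items (events or cases) that violate the rule."""
    items = list(items)
    return sum(not rule(item) for item in items) / len(items) if items else 0.0


if __name__ == "__main__":
    business_events = transform(system_events)
    print("system events:", violation_rate(system_events, has_timestamp))
    print("business events:", violation_rate(business_events, is_mapped))
    print("cases:", violation_rate(group_by_case(business_events), payment_requires_approval))
```
</preformat>
        <p>Keeping the mapping in an explicit table rather than in ad-hoc scripts is what makes such a transformation auditable: the UNMAPPED label and the rule at the transformed event level reveal exactly which system events the rules could not interpret.</p>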
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Revised Process Mining Framework</title>
      <p>
        Storing the discovered and/or defined rules in a repository allows these rules
to be used systematically in various event log preparation activities (e.g.,
cleaning, transformation, abstraction), regardless of the type of data analysis to be
performed
        <xref ref-type="bibr" rid="ref16">(Suriadi et al., 2017)</xref>
        . It is therefore proposed to extend the well-known process mining
positioning framework by van der Aalst (2011) with a rule repository,
as shown in Figure 2.
      </p>
      <p>
This rule repository could also serve as a conformance checking mechanism that
expresses the compliance of the process from a data perspective rather than a control-flow
perspective. Put differently, compliance as a process mining goal should be seen as
a multi-level concept, in which the control-flow perspective is the highest level. The
added value of compliance checking at the event data level is that no data is left out of
the analysis (i.e., data of “bad” quality is also considered in the compliance analysis).
It therefore provides more complete empirical evidence of compliance, which is
important when applying process mining in, for example, auditing
        <xref ref-type="bibr" rid="ref6">(Caron, Vanthienen,
&amp; Baesens, 2013)</xref>
        . Figure 2 can thus be regarded as the outline of a framework for the
application of a systematic, rule-based approach to event data quality improvement and
compliance checking. Moreover, the existing PM2 project methodology by van Eck,
Lu, Leemans, &amp;
        <xref ref-type="bibr" rid="ref20">van der Aalst (2015</xref>
        ) will be revised to incorporate more specific data
preparation steps to provide process mining practitioners with more guidance on this
topic.
      </p>
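      <p>A rule repository of the kind shown in Figure 2 could, in its simplest form, be a collection of named rules tagged with the level at which they apply. The Python sketch below is a minimal illustration under that assumption: the <monospace>Rule</monospace> and <monospace>RuleRepository</monospace> classes, the rules and the events are hypothetical. It reuses the same rules for a compliance report, in which non-compliant events are counted rather than filtered out, and for a cleaning step.</p>
      <preformat preformat-type="code">
```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    name: str
    level: str  # e.g. "system event", "event", "case" or "model"
    check: Callable[[dict], bool]


class RuleRepository:
    def __init__(self):
        self._rules = []

    def add(self, rule):
        self._rules.append(rule)

    def rules_for(self, level):
        return [rule for rule in self._rules if rule.level == level]

    def compliance_report(self, level, items):
        """Map each rule at `level` to (compliant items, total items).

        Every item is evaluated, including items of poor quality, so no
        data is left out of the compliance analysis.
        """
        items = list(items)
        return {rule.name: (sum(1 for item in items if rule.check(item)), len(items))
                for rule in self.rules_for(level)}


repo = RuleRepository()
repo.add(Rule("timestamp present", "event", lambda e: e["ts"] is not None))
repo.add(Rule("resource recorded", "event", lambda e: e["resource"] is not None))

# Hypothetical events with one missing timestamp and one missing resource.
events = [
    {"case": "A", "activity": "Register invoice", "ts": 1, "resource": "clerk1"},
    {"case": "A", "activity": "Approve invoice", "ts": None, "resource": "mgr1"},
    {"case": "B", "activity": "Register invoice", "ts": 4, "resource": None},
]

if __name__ == "__main__":
    # Compliance checking: report on all events.
    print(repo.compliance_report("event", events))
    # Cleaning: the same rules, reused to select events that satisfy all of them.
    clean = [e for e in events if all(r.check(e) for r in repo.rules_for("event"))]
    print(len(clean), "of", len(events), "events pass all event-level rules")
```
</preformat>
      <p>Storing rules as data rather than embedding them in preparation scripts is what allows the same definitions to serve cleaning, transformation and compliance checking alike.</p>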
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>This research proposes a novel systematic, rule-based approach to address event data
quality. Specifically, qualitative data cleaning techniques will be investigated to see if
data quality rules can be mined from event data and subsequently be applied to measure
and improve event data quality. Furthermore, method(s) to apply rules in data
transformation will be designed to create a more transparent and auditable data preparation
approach. The use of rules to express compliance from an event data
perspective will be investigated as well, along with a framework that defines the activities
to apply such a rule-based approach to data quality in practice.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Alves de Medeiros</surname>
            ,
            <given-names>A. K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pedrinaci</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van der Aalst</surname>
            ,
            <given-names>W. M. P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Domingue</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Song</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rozinat</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , …
          <string-name>
            <surname>Cabral</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          (
          <year>2007</year>
          ).
          <article-title>An Outlook on Semantic Business Process</article-title>
          . In OTM Confederated International Conferences “On the Move to Meaningful Internet Systems”
          (pp.
          <fpage>1244</fpage>
          -
          <lpage>1255</lpage>
          ). Retrieved from http://dx.doi.org/10.1007/978-3-540-76890-6_52
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Andrews</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suriadi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chun</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Poppe</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>Towards Event Log Querying for Data Quality: Let's Start with Detecting Log Imperfections</article-title>
          . In
          <string-name>
            <surname>M. R. Panetto</surname>
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Debruyne</surname>
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Proper</surname>
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ardagna</surname>
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Roman</surname>
            <given-names>D</given-names>
          </string-name>
          . (Eds.),
          <article-title>On the Move to Meaningful Internet Systems</article-title>
          .
          <source>OTM 2018 Conferences. Lecture Notes in Computer Science</source>
          , vol
          <volume>11229</volume>
          (pp.
          <fpage>116</fpage>
          -
          <lpage>134</lpage>
          ). Springer, Cham. https://doi.org/10.1007/978-3-030-02610-3
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Baier</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mendling</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Weske</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>Bridging abstraction layers in process mining</article-title>
          .
          <source>Information Systems</source>
          ,
          <volume>46</volume>
          ,
          <fpage>123</fpage>
          -
          <lpage>139</lpage>
          . https://doi.org/10.1016/j.is.2014.04.004
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Bose</surname>
            ,
            <given-names>R. P. J. C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mans</surname>
            ,
            <given-names>R. S.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>van der Aalst</surname>
            ,
            <given-names>W. M. P.</given-names>
          </string-name>
          (
          <year>2013</year>
          ).
          <article-title>Wanna Improve Process Mining Results? It's High Time We Consider Data Quality Issues Seriously</article-title>
          . In
          <source>IEEE Symposium on Computational Intelligence and Data Mining (CIDM 2013)</source>
          (pp.
          <fpage>127</fpage>
          -
          <lpage>134</lpage>
          ). https://doi.org/10.1109/CIDM.2013.6597227
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Bose</surname>
            ,
            <given-names>R. P. J. C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van der Aalst</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Žliobaitė</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Pechenizkiy</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>Dealing With Concept Drifts In Process Mining Using Event Logs</article-title>
          .
          <source>IEEE Transactions on Neural Networks and Learning Systems</source>
          ,
          <volume>25</volume>
          (
          <issue>1</issue>
          ),
          <fpage>154</fpage>
          -
          <lpage>171</lpage>
          . https://doi.org/10.1109/TNNLS.2013.2278313
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Caron</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vanthienen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Baesens</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          (
          <year>2013</year>
          ).
          <article-title>Comprehensive rule-based compliance checking and risk management with process mining</article-title>
          .
          <source>Decision Support Systems</source>
          ,
          <volume>54</volume>
          (
          <issue>3</issue>
          ),
          <fpage>1357</fpage>
          -
          <lpage>1369</lpage>
          . https://doi.org/10.1016/j.dss.2012.12.012
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Chiang</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Miller</surname>
            ,
            <given-names>R. J.</given-names>
          </string-name>
          (
          <year>2008</year>
          ).
          <article-title>Discovering Data Quality Rules</article-title>
          .
          <source>Proceedings of the VLDB Endowment</source>
          ,
          <volume>1</volume>
          (
          <issue>1</issue>
          ),
          <fpage>1166</fpage>
          -
          <lpage>1177</lpage>
          . https://doi.org/10.14778/1453856.1453980
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Claes</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Poels</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>Merging event logs for process mining: A rule based merging method and rule suggestion algorithm</article-title>
          .
          <source>Expert Systems with Applications</source>
          ,
          <volume>41</volume>
          (
          <issue>16</issue>
          ),
          <fpage>7291</fpage>
          -
          <lpage>7306</lpage>
          . https://doi.org/10.1016/j.eswa.2014.06.012
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>Fan</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Geerts</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          (
          <year>2012</year>
          ).
          <article-title>Foundations of Data Quality Management</article-title>
          . In
          <source>Synthesis Lectures on Data Management</source>
          (pp.
          <fpage>1</fpage>
          -
          <lpage>217</lpage>
          ). Morgan &amp; Claypool Publishers. https://doi.org/10.2200/S00439ED1V01Y201207DTM030
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <source>IEEE Task Force on Process Mining</source>
          . (
          <year>2012</year>
          ).
          <article-title>Process mining manifesto</article-title>
          .
          <source>Business Process Management Workshop 2011</source>
          (Vol.
          <volume>99</volume>
          ). Springer-Verlag. https://doi.org/10.1007/978-3-642-28108-2_19
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <surname>Jareevongpiboon</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Janecek</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          (
          <year>2013</year>
          ).
          <article-title>Ontological approach to enhance results of business process mining and analysis</article-title>
          .
          <source>Business Process Management Journal</source>
          ,
          <volume>19</volume>
          (
          <issue>3</issue>
          ),
          <fpage>459</fpage>
          -
          <lpage>476</lpage>
          . https://doi.org/10.1108/14637151311319905
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <surname>Leonardi</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Striani</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Quaglini</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cavallini</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Montani</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>Leveraging semantic labels for multi-level abstraction in medical process mining and trace comparison</article-title>
          .
          <source>Journal of Biomedical Informatics</source>
          ,
          <volume>83</volume>
          ,
          <fpage>10</fpage>
          -
          <lpage>24</lpage>
          . https://doi.org/10.1016/j.jbi.2018.05.012
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <surname>Montani</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leonardi</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Striani</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Quaglini</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Cavallini</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>2017</year>
          ).
          <article-title>Multilevel abstraction for trace comparison and process discovery</article-title>
          .
          <source>Expert Systems with Applications</source>
          ,
          <volume>81</volume>
          ,
          <fpage>398</fpage>
          -
          <lpage>409</lpage>
          . https://doi.org/10.1016/j.eswa.2017.03.063
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <surname>Rammelaere</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Geerts</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          (
          <year>2019</year>
          ).
          <article-title>Revisiting Conditional Functional Dependency Discovery: Splitting the “C” from the “FD”</article-title>
          . In
          <string-name>
            <surname>Berlingerio</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bonchi</surname>
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gärtner</surname>
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hurley</surname>
            <given-names>N.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Ifrim</surname>
            <given-names>G.</given-names>
          </string-name>
          (Eds.),
          <source>Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2018. Lecture Notes in Computer Science</source>
          , vol.
          <volume>11052</volume>
          . Springer, Cham.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <surname>Smirnov</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reijers</surname>
            ,
            <given-names>H. A.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Weske</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2012</year>
          ).
          <article-title>From fine-grained to abstract process models: A semantic approach</article-title>
          .
          <source>Information Systems</source>
          ,
          <volume>37</volume>
          (
          <issue>8</issue>
          ),
          <fpage>784</fpage>
          -
          <lpage>797</lpage>
          . https://doi.org/10.1016/j.is.2012.05.007
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <surname>Suriadi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Andrews</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>ter Hofstede</surname>
            ,
            <given-names>A. H. M.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Wynn</surname>
            ,
            <given-names>M. T.</given-names>
          </string-name>
          (
          <year>2017</year>
          ).
          <article-title>Event log imperfection patterns for process mining: Towards a systematic approach to cleaning event logs</article-title>
          .
          <source>Information Systems</source>
          ,
          <volume>64</volume>
          ,
          <fpage>132</fpage>
          -
          <lpage>150</lpage>
          . https://doi.org/10.1016/j.is.2016.07.011
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <surname>Tax</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Haakma</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sidorova</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>van der Aalst</surname>
            ,
            <given-names>W. M. P.</given-names>
          </string-name>
          (
          <year>2016</year>
          ).
          <article-title>Event Abstraction for Process Mining using Supervised Learning Techniques</article-title>
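          . In
          <string-name>
            <surname>Bi</surname>
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kapoor</surname>
            <given-names>S.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Bhatia</surname>
            <given-names>R.</given-names>
          </string-name>
          (Eds.),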
          <source>Proceedings of SAI Intelligent Systems Conference (IntelliSys) 2016. IntelliSys 2016. Lecture Notes in Networks and Systems</source>
          , vol.
          <volume>15</volume>
          (pp.
          <fpage>161</fpage>
          -
          <lpage>170</lpage>
          ). Springer, Cham. https://doi.org/10.1007/978-3-319-56994-9_18
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <surname>van Cruchten</surname>
            ,
            <given-names>R. M. E.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Weigand</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>Process Mining in Logistics: The need for rule-based data abstraction</article-title>
          . In
          <source>2018 12th International Conference on Research Challenges in Information Science (RCIS)</source>
          (pp.
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          ). https://doi.org/10.1109/RCIS.2018.8406653
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <surname>van der Aalst</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          (
          <year>2011</year>
          ).
          <article-title>Process Mining: Discovery, Conformance and Enhancement of Business Processes</article-title>
          . Berlin Heidelberg: Springer-Verlag. https://doi.org/10.1007/978-3-642-19345-3
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <surname>van Eck</surname>
            ,
            <given-names>M. L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leemans</surname>
            ,
            <given-names>S. J. J.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>van der Aalst</surname>
            ,
            <given-names>W. M. P.</given-names>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>PM²: A process mining project methodology</article-title>
          . In
          <source>International Conference on Advanced Information Systems Engineering</source>
          (pp.
          <fpage>297</fpage>
          -
          <lpage>313</lpage>
          ). https://doi.org/10.1007/978-3-319-19069-3_19
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <string-name>
            <surname>Wieringa</surname>
            ,
            <given-names>R. J.</given-names>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>Design Science Methodology for Information Systems and Software Engineering</article-title>
          . Berlin Heidelberg: Springer-Verlag. https://doi.org/10.1145/1810295.1810446
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>