<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards a Comprehensive Methodology for Process Mining</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Kiarash Diba</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Hasso Plattner Institute, University of Potsdam</institution>
          ,
          <addr-line>Potsdam</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <fpage>9</fpage>
      <lpage>12</lpage>
      <abstract>
        <p>Process mining exploits data recorded in the information systems of organizations to provide insight into and knowledge of their operational processes. As process mining techniques reach maturity, their applications are spreading across various domains. More research on the methodological and practical perspectives is therefore required to steer and guide these applications to successful results. This position paper sketches the first steps to be taken toward a standard methodology for process mining.</p>
      </abstract>
      <kwd-group>
        <kwd>Process Mining</kwd>
        <kwd>Process Mining Reference Model</kwd>
        <kwd>Process Mining Methodology</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Process mining has evolved into a well-known technology for gaining valuable
insight into the underlying processes and workflows of organizations. In
recent years, many process mining techniques and algorithms have been
developed and are reaching maturity, and their applications have been investigated
and proved valuable across a variety of domains. Despite this maturity
of techniques and algorithms, the broader process mining discipline has not yet
matured. Although process mining projects involve various steps and activities,
from extracting and preparing the required data to providing useful knowledge and
insight, the entire spectrum of activities has not been thoroughly investigated.
Instead, most academic work has concentrated on the development
and improvement of techniques and algorithms. Moreover, most case studies and
projects have been carried out in an unstructured, ad hoc manner involving
a great amount of manual, time-intensive work, and there is little guidance
on conducting such projects successfully in either industrial or academic
settings. Therefore, this work focuses on the methodological aspects of process mining
rather than on specific methods, in order to provide well-defined foundations for
process mining. This not only helps practitioners and academics conduct
successful process mining projects, but also positions process mining techniques within
a broader spectrum of process-related knowledge discovery and sheds more light
on steps that have received less research attention. Thus, inspired by related
work in the field of data mining and by practical experience, this work takes the
initial steps towards a comprehensive standard methodology for process mining.</p>
      <p>The remainder of this paper is structured as follows. The next section
discusses related work and its limitations. Section 3 outlines the approach to be
followed and the steps to be taken to establish the comprehensive
methodology, followed by a high-level overview of the initial developments of the
methodology in Section 4. Finally, Section 5 concludes the paper.</p>
    </sec>
    <sec id="sec-2">
      <title>Related work</title>
      <p>
        A methodology provides the theoretical foundation for understanding which
method, set of methods, or best practices can be applied to a specific case, and it is
employed for the design, planning and implementation of a project and the achievement
of its objectives [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Currently, only a few works focus on methodologies for
process mining, namely the L* lifecycle model [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], the Process Diagnostics Method (PDM)
[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], its extension to the healthcare domain [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], and PM<sup>2</sup> [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. However, these
methodologies are neither comprehensive nor suitable for every project, and each has a
number of limitations. Moreover, they have not been widely applied and evaluated
across projects. PDM has a narrow scope, focusing on a limited number of
capabilities of process mining. It also neglects the importance of business
considerations, planning and domain knowledge [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The L* lifecycle model primarily
focuses on the discovery of a single process model enriched with performance and
resource information and is therefore more suitable for structured processes and
narrowly scoped projects [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. In addition, neither of the two offers sufficient flexibility
or support for iteration. The suggested sequence of activities is assumed to be followed
rather strictly in every project, which is rarely the case in complex projects.
Depending on their requirements, different projects may skip some steps
or perform them in a different order. Although PM<sup>2</sup> addresses a few
limitations of the previously mentioned methodologies, it can still be improved
and extended with more flexibility, more detailed and specific steps, techniques,
best practices and practical guidelines. Successful examples of methodologies can
be found in the field of data mining, where similar works such as CRISP-DM [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]
have been applied successfully for many years in a variety of settings.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Proposed Approach</title>
      <p>In order to construct a comprehensive methodology, we will first define the
term, clarify what a methodology should contain, and motivate
the use and benefits of such methodologies. Then we will formally compare
related work from both the process mining literature and related fields such as data
mining with respect to structure, applicability and reputation. We will also
analyze case studies and use cases from a structural point of view before establishing
the methodology. In addition, we will use questionnaires among process
mining experts in both academia and industry to consolidate the motivation for and
the formation of the methodology. Afterwards, the methodology needs to be tested,
evaluated and adjusted accordingly, followed by continuous refinement. The
next section provides an initial high-level overview of the methodology to be
developed.</p>
    </sec>
    <sec id="sec-4">
      <title>Outline of the methodology</title>
      <p>
        A methodology should be applicable to specific projects in different
contexts, with different goals and requirements, while remaining as generic as possible
[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Therefore, the methodology proposed in this work will consist of different
levels of abstraction, each with different characteristics and purposes.
The highest level consists of the general phases and stages involved in process mining
projects. This high-level view needs to be as generic as possible, accounting for
all scenarios and contexts in which process mining can be applied. The lower
levels consist of more detailed generic and specific activities for each phase, driven
by the context, goals and requirements of a specific project. The lowest level of the
methodology is an actual run of these activities in a specific project. This
hierarchical structure, which allows flexibility and addresses the
challenge of balancing genericity and specificity, is one of the main features of
the methodology.
      </p>
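      <p>The levels described above can be pictured as a small data structure. The following is a minimal sketch under our own assumptions, not a definition of the methodology: the class names <monospace>Phase</monospace>, <monospace>Activity</monospace> and <monospace>ProjectRun</monospace>, the example activities and the project name are all hypothetical, and only the phase name is taken from the list of phases given later in this section. A run simply records the activities that were actually carried out, so steps can be skipped or performed in any order.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Activity:
    """Lower level: an activity; generic ones apply to any project."""
    name: str
    generic: bool = True


@dataclass
class Phase:
    """Highest level: a context-independent phase and its activities."""
    name: str
    activities: List[Activity] = field(default_factory=list)


@dataclass
class ProjectRun:
    """Lowest level: the activities actually executed in one project."""
    project: str
    executed: List[Tuple[str, str]] = field(default_factory=list)

    def record(self, phase: Phase, activity: Activity) -> None:
        # Only activities defined for the phase may be recorded.
        if activity not in phase.activities:
            raise ValueError(
                f"{activity.name!r} is not part of phase {phase.name!r}"
            )
        self.executed.append((phase.name, activity.name))


# Hypothetical example: one generic and one project-specific activity.
data_discovery = Phase("Data Discovery", [
    Activity("Identify relevant data sources"),
    Activity("Extract data from the ERP system", generic=False),
])

run = ProjectRun("hypothetical-purchasing-project")
# The first activity is skipped in this run; only the second is recorded.
run.record(data_discovery, data_discovery.activities[1])
print(run.executed)
```
]]></preformat>
      <p>Keeping the run separate from the phase definitions mirrors the balance between genericity and specificity: the phases stay reusable, while each project keeps its own record of what was done.</p>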
      <p>
        In addition, the methodology contains a user guide with best practices,
common approaches and techniques in order to guide the user through various
challenges. The methodology describes the overall approach to extracting knowledge
about and insight into processes and provides a roadmap to follow while planning and
carrying out process mining projects. It thereby addresses two of the process mining
challenges stated in the process mining manifesto [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], namely Improving Usability
for Non-Experts and Improving Understandability for Non-Experts. It also
facilitates and encourages efforts towards automation and reusability of process mining
project flows and of steps that are currently manual (or only partially automated) and
time-consuming, such as data extraction and preparation.
      </p>
      <p>Process mining projects usually involve the following high-level phases:</p>
      <p>Planning, which focuses on both the business and the technical aspects of the
project. In this phase, requirements, objectives and available
resources are identified and discussed, and a concrete project plan is prepared.</p>
      <p>Data Discovery, which comprises data extraction and event log preparation. This
phase, which can be one of the most challenging parts of the project,
starts with identifying relevant data sources, continues with finding, extracting,
merging and cleaning the data, and ends with an event log prepared in the required
format.</p>
      <p>Process Discovery, which consists of explorative analysis, process overview and
control-flow discovery. The steps taken in this phase vary depending on the
requirements and on whether the project is explorative or goal-driven. Initial insights
into and statistics about the process are gained, which support the subsequent analysis
phase. Based on these initial insights, the project might go back to previous phases to
adjust plans, to collect additional data, or to adopt different views on the
data.</p>
      <p>Analysis, which focuses on the main analysis and the evaluation of its results.
Different types of analysis can be performed, and process mining techniques can
be combined with data mining, statistics and other types of analysis to provide
useful insight and knowledge and to address the project objectives.</p>
      <p>Knowledge Transfer, which can consist of reporting diagnostics and
improvement insights and/or preparing a monitoring system for operational
support.</p>
      <p>Due to the iterative nature of process mining projects, the methodology should
allow multiple iterations between the different phases.</p>
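      <p>To make the Data Discovery and Process Discovery phases more concrete, the following minimal sketch merges two data sources into an event log and then derives directly-follows counts as a first process overview. It is an illustration under our own assumptions rather than a prescribed procedure: the records, the field names <monospace>order_id</monospace>, <monospace>step</monospace> and <monospace>time</monospace>, and the functions <monospace>build_event_log</monospace> and <monospace>directly_follows</monospace> are hypothetical, and counting directly-follows relations is only one common starting point for control-flow discovery.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical raw records, as they might come from two source tables.
orders = [
    {"order_id": "o1", "step": "Create order", "time": "2018-03-01T09:00:00"},
    {"order_id": "o2", "step": "Create order", "time": "2018-03-01T09:30:00"},
    {"order_id": "o1", "step": "Ship order", "time": "2018-03-03T14:00:00"},
]
payments = [
    {"order_id": "o1", "step": "Receive payment", "time": "2018-03-02T11:00:00"},
    {"order_id": "o2", "step": "Cancel order", "time": "2018-03-02T08:00:00"},
    {"order_id": None, "step": "Receive payment", "time": "2018-03-02T12:00:00"},
]


def build_event_log(*sources):
    """Merge sources, drop records without a case id, order events per case."""
    traces = defaultdict(list)
    for source in sources:
        for record in source:
            if record["order_id"] is None:  # cleaning: no case to attach to
                continue
            timestamp = datetime.fromisoformat(record["time"])
            traces[record["order_id"]].append((timestamp, record["step"]))
    # Sort each case by timestamp and keep only the activity names.
    return {case: [activity for _, activity in sorted(events)]
            for case, events in traces.items()}


def directly_follows(event_log):
    """Count how often one activity is directly followed by another."""
    counts = Counter()
    for trace in event_log.values():
        counts.update(zip(trace, trace[1:]))
    return counts


log = build_event_log(orders, payments)
dfg = directly_follows(log)

for case, trace in log.items():
    print(case, trace)
for (source, target), count in dfg.items():
    print(f"{source} -> {target}: {count}")
```
]]></preformat>
      <p>If the resulting overview reveals missing or implausible behavior, this is the point at which a project would iterate back to Data Discovery, for example to add a further data source or to revise the cleaning rules.</p>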
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>This paper outlines the landscape of a comprehensive
methodology for process mining. The prospective methodology will involve several
hierarchical levels, ranging from high-level phases to specific activities for each
phase. In addition, a user guide with best practices and techniques will be
included to facilitate successful projects in various settings. Continuous extension,
refinement and evaluation need to be performed before and after the establishment
of the methodology to ensure generality, completeness and applicability.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>van der Aalst</surname>
            ,
            <given-names>W.M.P.</given-names>
          </string-name>
          : Process Mining: Discovery, Conformance and Enhancement of Business Processes. Springer (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>van der Aalst</surname>
            ,
            <given-names>W.M.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Adriansyah</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>De Medeiros</surname>
            ,
            <given-names>A.K.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Arcieri</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Baier</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blickle</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bose</surname>
            ,
            <given-names>J.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Van Den Brand</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brandtjen</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Buijs</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Burattin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Process mining manifesto</article-title>
          .
          In:
          <source>International Conference on Business Process Management</source>
          , pp.
          <fpage>169</fpage>
          -
          <lpage>194</lpage>
          . Springer, Berlin, Heidelberg (
          <year>2011</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Bozkaya</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gabriels</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Werf</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Process diagnostics: a method based on process mining</article-title>
          . In:
          <source>International Conference on Information, Process, and Knowledge Management, eKNOW 2009</source>
          , pp.
          <fpage>22</fpage>
          -
          <lpage>27</lpage>
          .
          <publisher-name>IEEE</publisher-name>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>van Eck</surname>
            ,
            <given-names>M.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leemans</surname>
            ,
            <given-names>S.J.</given-names>
          </string-name>
          and
          <string-name>
            <surname>van der Aalst</surname>
            ,
            <given-names>W.M.P.</given-names>
          </string-name>
          :
          <article-title>PM<sup>2</sup>: A Process Mining Project Methodology</article-title>
          . In:
          <source>International Conference on Advanced Information Systems Engineering</source>
          , pp.
          <fpage>297</fpage>
          -
          <lpage>313</lpage>
          . Springer, Cham (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Irny</surname>
            ,
            <given-names>S.I.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Rose</surname>
            ,
            <given-names>A.A.</given-names>
          </string-name>
          :
          <article-title>Designing a Strategic Information Systems Planning Methodology for Malaysian Institutes of Higher Learning (isp-ipta)</article-title>
          ,
          <source>Information System</source>
          <volume>5</volume>
          (
          <issue>1</issue>
          ), (
          <year>2005</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Rebuge</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ferreira</surname>
            ,
            <given-names>D.R.</given-names>
          </string-name>
          :
          <article-title>Business Process Analysis in Healthcare Environments: a Methodology based on Process Mining</article-title>
          .
          <source>Information Systems</source>
          <volume>37</volume>
          (
          <issue>2</issue>
          ),
          <fpage>99</fpage>
          -
          <lpage>116</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Shearer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>The CRISP-DM model: the new blueprint for data mining</article-title>
          .
          <source>Journal of Data Warehousing</source>
          <volume>5</volume>
          (
          <issue>4</issue>
          ),
          <fpage>13</fpage>
          -
          <lpage>22</lpage>
          (
          <year>2000</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>