<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Dealing with Artifact-Centric Systems: a Process Mining Approach</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Guangming Li</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Renata Medeiros de Carvalho</string-name>
          <email>r.carvalho@tue.nl</email>
        </contrib>
      </contrib-group>
      <fpage>80</fpage>
      <lpage>84</lpage>
      <abstract>
        <p>Process mining provides a series of techniques to analyze business processes based on execution data in enterprises. It has been successfully applied to classical processes on WFM/BPM systems, in which one process execution consists of events attached with the same case id. However, existing process mining techniques suffer from problems when dealing with artifact-centric systems, such as ERP and CRM, in which a business process involves a set of interacting artifacts and a case notion for the whole process is missing. Some typical problems are convergence and divergence in XES logs, and lost interactions between multiple instances in process models. Existing artifact-centric approaches try to address these problems, but have not yet solved them satisfactorily. For instance, one has to pick an instance notion in each artifact, the description of the end-to-end behavior is distributed over multiple diagrams, and the interactions between the data perspective and the behavioral perspective are not explicitly presented. This paper proposes a set of new techniques, such as a novel log format and a novel modeling language, to enable process mining for artifact-centric systems.</p>
      </abstract>
      <kwd-group>
        <kwd>Artifact-Centric Systems</kwd>
        <kwd>Process Mining</kwd>
        <kwd>Object-Centric</kwd>
        <kwd>Event Logs</kwd>
        <kwd>Process Models</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Nowadays, information systems are widely used in enterprises to support their daily business
process executions. Such an information system is called Process-Aware Information System
(PAIS), since it needs to be aware of business processes [Du05]. A typical class of PAISs is
formed by generic systems that are process-centric and driven by explicit process models,
i.e., one process execution on these systems is constituted by a single case with a unique
case identifier. Examples are Workflow Management (WFM) systems [vdAvH04] and
Business Process Management (BPM) systems [We07]. Another class of PAISs consists of
artifact-centric systems that do not have a unique case notion, which could be used to trace
and isolate its executions. The entire process on these systems is seen as a set of interacting
business entities called artifacts. Examples are Enterprise Resource Planning (ERP) systems
(SAP, Oracle, etc.) and Customer Relationship Management (CRM) systems [O’00].
Process executions on PAISs generate various data, e.g., relational database tables and
event logs, which can be analyzed to discover insights to reflect the "health" condition of</p>
    </sec>
    <sec id="sec-2">
      <title>Challenges</title>
      <p>As mentioned above, artifact-centric systems do not assume case notions in their business
processes. Therefore, existing process mining techniques suffer from the following problems
when they are applied to these systems.</p>
      <p>The XES log format harms the quality of original data. The data generated by
artifact-centric systems often contain one-to-many and many-to-many relationships. Therefore,
a case notion for the whole process is difficult to identify. Forcing such data into XES
logs flattens multi-dimensional data into separate traces, which leads to
convergence and divergence problems. Besides, the XES format focuses on the
behavioral perspective (i.e., it only considers events and information related to events),
and may therefore omit useful information on the data perspective.</p>
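      <p>The effect of flattening can be illustrated with a minimal sketch. The event tuples, the identifiers (o1, l1, l2) and the two helper functions below are assumptions made for illustration only; they are not part of the XES standard or of any tool. One order with two order lines is flattened once per order line and once per order.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import defaultdict

# Hypothetical events: (event id, activity, related order, related order lines)
events = [
    ("e1", "create order", "o1", ["l1", "l2"]),
    ("e2", "pick item", "o1", ["l1"]),
    ("e3", "pick item", "o1", ["l2"]),
    ("e4", "wrap item", "o1", ["l1"]),
    ("e5", "wrap item", "o1", ["l2"]),
]


def flatten_by_order_line(events):
    """One trace per order line; an event touching several lines is copied."""
    traces = defaultdict(list)
    for event_id, activity, _order, lines in events:
        for line in lines:
            traces[line].append((event_id, activity))
    return dict(traces)


def flatten_by_order(events):
    """One trace per order; order-line information is dropped."""
    traces = defaultdict(list)
    for event_id, activity, order, _lines in events:
        traces[order].append((event_id, activity))
    return dict(traces)


by_line = flatten_by_order_line(events)
by_order = flatten_by_order(events)

# Convergence: "create order" happened once but is counted once per order line.
create_count = sum(
    1 for trace in by_line.values() for _, activity in trace
    if activity == "create order"
)

# Divergence: in the single order trace, it is no longer visible which
# "wrap item" belongs to which "pick item" (same order line).
order_activities = [activity for _, activity in by_order["o1"]]
```
]]></preformat>
      <p>With the order line as case notion, the single "create order" event is counted twice; with the order as case notion, the two "pick item" and two "wrap item" events end up in one trace without the order line that relates them.</p>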
      <p>Existing modeling languages have difficulty modeling interactions (i) between process instances
and (ii) between the data perspective and the control-flow perspective. Existing process
modeling languages (e.g., Petri nets, EPCs and BPMN) consider process instances in
isolation, so the interactions between instances cannot be described properly. Besides, they
mainly focus on the control-flow perspective. Powerful constructs present in ER models
[Ch88] and UML class models, which can easily deal with one-to-many and many-to-many
relationships, are not employed at all. Moreover, constraints on the data perspective must
influence behavior, but this interaction is not described by existing languages.</p>
      <p>Deviations are not totally detected. Some deviations on the behavioral perspective can only
be detected by considering multiple instances and constraints in the class model. In this
situation, the weak data perspective in existing models makes such deviations undetectable.</p>
      <p>Performance analysis may not be reliable. Due to convergence and divergence problems,
the performance analysis result may be imprecise (e.g., inaccurate frequencies). Besides,
because information on the data perspective is missing, performance analysis may not
provide users with useful insights on this perspective.</p>
    </sec>
    <sec id="sec-3">
      <title>Approaches</title>
      <p>The problems discussed in Section 2 prevent the employment of “classic” process mining
techniques on artifact-centric systems. In this section, we propose new process mining
techniques to solve these problems, as shown in Figure 1. In general, the spectrum of our
approaches is consistent with the lifecycle of “classic” process mining research, i.e., our
new process mining techniques try to reach the same goals on artifact-centric systems as the
“classic” process mining approaches reach on WFM/BPM systems. More precisely, based on
a novel log format and a new type of model, we propose new process mining approaches
covering log extraction, model discovery, conformance checking and performance analysis,
to enable process mining on artifact-centric systems.</p>
      <p>[Figure 1. Diagram labels: “business processes”, artifact-centric information system (supports/controls, reflect), extraction, XOC logs, discovery, OCBC models, conformance, performance, diagnosis.]</p>
      <p>
eXtensible Object-Centric event logs. We propose a novel log format named eXtensible
Object-Centric (XOC) to organize the data generated by artifact-centric systems [vdALM17].
Artifact-centric systems do not have a clear case notion for the whole process, but they
follow an intuitive principle: each event occurring on the system changes the state of the
system (i.e., it adds, updates or deletes records in the underlying database). Based on
this idea, a XOC log consists of a set of ordered events, and each event corresponds to an
object model representing the database, which provides an evolutionary view of the system.
Note that an object model may represent only the tables involved in the target process when
the database covers multiple processes.</p>
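      <p>The idea of pairing each event with a snapshot of the database can be sketched as follows. This is a minimal illustration under stated assumptions, not the XOC format defined in [vdALM17]: the class names ObjectModel and XOCEvent, the helper record_event, and the attribute "picked" are hypothetical, and timestamps and other event attributes are omitted.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import copy
from dataclasses import dataclass, field


@dataclass
class ObjectModel:
    # object id -> record, e.g. {"class": "order line", "picked": False}
    objects: dict = field(default_factory=dict)
    # set of (source object id, target object id) pairs
    relations: set = field(default_factory=set)


@dataclass
class XOCEvent:
    event_id: str
    activity: str
    model: ObjectModel  # snapshot of the state after the event


def record_event(log, event_id, activity, upserts=None, deletes=(), links=()):
    """Append an event whose object model is the previous snapshot plus changes."""
    previous = log[-1].model if log else ObjectModel()
    model = copy.deepcopy(previous)  # earlier snapshots stay untouched
    for obj_id, record in (upserts or {}).items():
        model.objects.setdefault(obj_id, {}).update(record)  # add or update
    for obj_id in deletes:
        model.objects.pop(obj_id, None)  # delete the record
        model.relations = {r for r in model.relations if obj_id not in r}
    model.relations |= set(links)
    log.append(XOCEvent(event_id, activity, model))
    return log


log = []
record_event(
    log, "e1", "create order",
    upserts={
        "o1": {"class": "order"},
        "l1": {"class": "order line", "picked": False},
        "l2": {"class": "order line", "picked": False},
    },
    links=[("o1", "l1"), ("o1", "l2")],
)
record_event(log, "e2", "pick item", upserts={"l1": {"picked": True}})
```
]]></preformat>
      <p>Reading the snapshots in order gives the evolutionary view: after e1 the order line l1 is not yet picked, and after e2 it is, while the snapshot stored with e1 is unchanged.</p>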
      <p>Object-Centric Behavioral Constraint models. We propose a novel modeling language
[vdALM17] that combines data modeling languages (ER, UML, or ORM) and declarative
languages (Declare, CMMN, or GSM), resulting in Object-Centric Behavioral Constraint
(OCBC) models. More precisely, as shown in Figure 2, an OCBC model consists of a
class model (presenting cardinality constraints between classes on the data perspective), a
behavioral model (presenting declarative constraints between activities on the control-flow
perspective), and relationships (labeled 1 to 4 in Figure 2) which connect these two models by relating
activities to classes. Unlike existing declarative languages, the scope of each behavioral
constraint (e.g., constraint 7) is identified by classes (e.g., “order line”) rather than case notions.
</p>
      <p>[Figure 2. OCBC model with the activities create order, pick item and wrap item, and the classes order, order line and product; numbered labels mark the relationships and constraints.]</p>
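      <p>The difference between scoping a behavioral constraint by a class and scoping it by a case can be sketched with a response-like constraint. The constraint chosen here ("every pick item is eventually followed by a wrap item") and the function unanswered are assumptions for illustration; they are not taken from the OCBC semantics in [vdALM17].</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import defaultdict


def unanswered(events, reference, target):
    """Return scope ids where some `reference` event has no later `target` event.

    `events` is an ordered list of (activity, scope id) pairs; the scope id is
    the object (or case) that delimits where the constraint is evaluated.
    """
    by_scope = defaultdict(list)
    for activity, scope_id in events:
        by_scope[scope_id].append(activity)
    violating = []
    for scope_id, activities in sorted(by_scope.items()):
        for index, activity in enumerate(activities):
            if activity == reference and target not in activities[index + 1:]:
                violating.append(scope_id)
                break
    return violating


# Scope identified by the class "order line": each event refers to one line.
per_line = [("pick item", "l1"), ("pick item", "l2"), ("wrap item", "l1")]
line_violations = unanswered(per_line, "pick item", "wrap item")

# Same events with the whole order "o1" used as a single case.
per_order = [(activity, "o1") for activity, _ in per_line]
order_violations = unanswered(per_order, "pick item", "wrap item")
```
]]></preformat>
      <p>Scoped by order line, the sketch reports l2 because its item is picked but never wrapped. Scoped by the order as a single case, the wrap item of l1 appears to answer both pick item events and nothing is reported.</p>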
      <p>Besides, we propose approaches to automatically extract XOC logs from relational databases
of artifact-centric systems and to discover OCBC models from XOC logs [LdCvdA]. Based on
a XOC log and a reference OCBC model, a set of rules is defined to check the conformance
between them [vdALM17]. In the future, we also plan to analyze the performance based on
the statistics of frequencies and time, which can be obtained by replaying a XOC log on an
OCBC model. More precisely, various metrics can be defined to analyze the performance
of business processes from different angles.</p>
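      <p>As a hedged illustration of the data-perspective part of such a conformance check, the sketch below tests one object model against the cardinality constraints of a single relationship. The function names, the encoding of bounds as (minimum, maximum) pairs with None for "unbounded", and the example constraint (each order has at least one order line, each order line belongs to exactly one order) are assumptions; the actual rule set is the one defined in [vdALM17].</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter


def within(count, bounds):
    """Check a count against (minimum, maximum); maximum None means unbounded."""
    low, high = bounds
    return count >= low and (high is None or count <= high)


def cardinality_violations(objects, relations, src_class, tgt_class,
                           targets_per_source, sources_per_target):
    """Return ids of objects that violate the cardinalities of one relationship.

    objects:   object id -> class name
    relations: set of (source id, target id) pairs
    """
    out_degree = Counter()
    in_degree = Counter()
    for source, target in relations:
        if objects.get(source) == src_class and objects.get(target) == tgt_class:
            out_degree[source] += 1
            in_degree[target] += 1
    violations = []
    for obj_id, cls in sorted(objects.items()):
        if cls == src_class and not within(out_degree[obj_id], targets_per_source):
            violations.append(obj_id)
        if cls == tgt_class and not within(in_degree[obj_id], sources_per_target):
            violations.append(obj_id)
    return violations


objects = {"o1": "order", "o2": "order", "l1": "order line", "l2": "order line"}
relations = {("o1", "l1")}

violations = cardinality_violations(
    objects, relations, "order", "order line",
    targets_per_source=(1, None),  # an order has 1..* order lines
    sources_per_target=(1, 1),     # an order line belongs to exactly 1 order
)
```
]]></preformat>
      <p>In this example the order o2 has no order line and the order line l2 belongs to no order, so both are reported. Running the same check on the object model of every event in a XOC log would give a per-event view of such deviations.</p>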
    </sec>
    <sec id="sec-4">
      <title>Related Work</title>
      <p>The artifact-centric approaches [CH09] (including the earlier work on proclets [Aa01])
consider the entire process as a set of interacting artifacts. Each of these artifacts can be
described by an information schema (called an artifact schema) and a non-trivial lifecycle
(indicating how the artifact evolves through a process execution). However, these approaches
suffer from the following problems: (i) within an artifact (proclet, or subprocess), one is
forced to pick a single instance notion (although a case notion for the whole process is
not required); (ii) the description of the end-to-end behavior needs to be distributed over
multiple diagrams (e.g., one process model per artifact); (iii) the control-flow cannot be
related to an overall data model (i.e., there is no explicit data model or it is separated from
the control-flow); (iv) interactions between different entities are not visible or are separated
(because artifacts are distributed over multiple diagrams); and (v) cardinality constraints in
the data model cannot be exploited while specifying the intended dynamic behavior.
Besides, colored (data-aware) Petri nets [Je96] add “color” to tokens to attach a data
perspective to the behavioral perspective. BPMN [Gr10], data flow charts and UML activity
diagrams [EP00] can describe the behavioral perspective and its communication with the data
perspective by means of data objects and data stores. Concepts like lanes, pools, and message flows
in conventional languages like BPMN can model interactions between process instances. In
summary, the models mentioned above can describe the data perspective and interactions
to some extent, but the more powerful constructs present in ER models and UML class models
are not employed at all in these models.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [Aa01]
          <string-name>
            <surname>van der Aalst</surname>
            ,
            <given-names>W.M.P.</given-names>
          </string-name>
          et al.:
          <article-title>Proclets: A framework for lightweight interacting workflow processes</article-title>
          .
          <source>International Journal of Cooperative Information Systems</source>
          ,
          <volume>10</volume>
          (
          <issue>04</issue>
          ):
          <fpage>443</fpage>
          -
          <lpage>481</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [Ch88]
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>P.P.S.</given-names>
          </string-name>
          :
          <article-title>The Entity-Relationship Model-Toward a Unified View of Data</article-title>
          . In:
          <source>Readings in Artificial Intelligence and Databases</source>
          , pp.
          <fpage>98</fpage>
          -
          <lpage>111</lpage>
          . Elsevier,
          <year>1988</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [CH09]
          <string-name>
            <surname>Cohn</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Hull</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Business Artifacts: A Data-Centric Approach to Modeling Business Operations and Processes</article-title>
          .
          <source>IEEE Data Engineering Bulletin</source>
          ,
          <volume>32</volume>
          (
          <issue>3</issue>
          ):
          <fpage>3</fpage>
          -
          <lpage>9</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [Du05]
          <string-name>
            <surname>Dumas</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          et al.:
          <article-title>Process-aware information systems: bridging people and software through process technology</article-title>
          . John Wiley &amp; Sons,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [EP00]
          <string-name>
            <surname>Eriksson</surname>
            ,
            <given-names>H.E.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Penker</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Business Modeling with UML</article-title>
          . New York, pp.
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [Gr10]
          <string-name>
            <surname>Object Management Group</surname>
          </string-name>
          :
          <article-title>Business Process Model and Notation</article-title>
          . OMG,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [Je96]
          Springer-Verlag, Berlin,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [LdCvdA]
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>de Carvalho</surname>
            ,
            <given-names>R.M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>van der Aalst</surname>
            ,
            <given-names>W.M.P.</given-names>
          </string-name>
          :
          <article-title>Automatic Discovery of Object-Centric Behavioral Constraint Models</article-title>
          . In:
          <source>Proceedings of BIS 2017</source>
          , pp. 43-58.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [O’00]
          <string-name>
            <surname>O'Leary</surname>
            ,
            <given-names>D.E.</given-names>
          </string-name>
          :
          <article-title>Enterprise resource planning systems: systems, life cycle, electronic commerce, and risk</article-title>
          . Cambridge University Press,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [vdALM17]
          <string-name>
            <surname>van der Aalst</surname>
            ,
            <given-names>W.M.P.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Montali</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Object-Centric Behavioral Constraints</article-title>
          .
          <source>CoRR technical report</source>
          , arXiv.org e-Print archive,
          <year>2017</year>
          . Available at https://arxiv.org/abs/1703.05740.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [vdAvH04]
          <string-name>
            <surname>van der Aalst</surname>
            ,
            <given-names>W.M.P.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>van Hee</surname>
            ,
            <given-names>K.M.</given-names>
          </string-name>
          :
          <article-title>Workflow management: models, methods, and systems</article-title>
          . MIT Press,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [We07]
          Springer-Verlag New York, Inc., Secaucus, NJ, USA,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>