<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Improving process model retrieval by accounting for gateway nodes: an ongoing work</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>S. Montani</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>G. Leonardi (</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>S. Quaglini</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>A. Baudi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>(1) DISIT, Computer Science Institute, Universita del Piemonte Orientale</institution>
          ,
          <addr-line>Alessandria</addr-line>
          ,
          <country country="IT">Italy (</country>
          <institution>2) Dipartimento di Informatica e Sistemistica, Universita di Pavia</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Process model comparison and similar processes retrieval is a key issue to be addressed in many real world situations, and a particularly relevant one in some applications (e.g., in medicine), where similarity quanti cation can be exploited in a quality assessment perspective. In a recent work, we have introduced a framework which allows to: (i) extract the actual process model from the available process execution traces, through process mining techniques; and (ii) compare (mined) process models, by relying on a novel distance measure. The tool has been successfully tested in stroke management. In this paper, we are further extending that contribution, by explicitly taking into account complex control ow information, represented in the process model as gateway nodes (i.e., AND/XOR join/splits). The work is still ongoing, but, once completed and tested, it will represent a signi cant advance with respect to the state of the art research in the eld, with a range of applications which is of course not limited only to medicine.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>Process model comparison and similar processes retrieval is a key issue to be
addressed in many real world situations. For example, when two companies are
merged, process engineers need to compare processes originating from the two
companies, in order to analyze their possible overlaps, and to identify areas for
consolidation. Moreover, large companies build over time huge process model
repositories, which serve as a knowledge base for their ongoing process
management/enhancement e orts. Before adding a new process model to the repository,
process engineers have to check that a similar model does not already exist, in
order to prevent duplication.</p>
      <p>Particularly interesting is the case of medical process model comparison,
where similarity quanti cation can be exploited in a quality assessment
perspective. Indeed, the process model actually implemented at a given healthcare
organization can be compared to the existing reference clinical guideline, e.g., to
check conformance, or to understand the level of adaptation to local constraints
that may have been required. As a matter of fact, the existence of local resource
constraints may lead to di erences between the models implemented at di erent
hospitals, even when referring to the treatment of the same disease (and to the
same guideline). A quanti cation of these di erences (and maybe a ranking of
the hospitals derived from it) can be exploited for several purposes, like , e.g.,
legal purposes, performance evaluation and funding distribution.</p>
      <p>
        The actual medical process models are not always explicitly available at the
healthcare organization. However, a database of process execution traces (also
called the \event log") is typically maintained, since all the hospital activities
are normally recorded by means of the work ow technology. In this case, process
mining techniques [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] can be exploited, to extract process related information
(e.g., process models) from log data.
      </p>
      <p>Stemming from these considerations, we have recently started to work at a
framework [10], which allows to:
1. extract the actual process model from the available process execution traces,
through process mining techniques;
2. perform process model comparison, to ful ll the objectives described above.</p>
      <p>
        Issue 2 has required the introduction of proper metrics, in order to quantify
process model similarity. We could rely on an extensive literature when studying
this topic (see, e.g., [
        <xref ref-type="bibr" rid="ref2">11, 2</xref>
        ]). In particular, since process mining extracts the
process model in the form of a graph, where nodes represent activities, and
edges represent the control ow, our work is located in the research stream on
structural similarity, and on graph-edit-distance-based approaches [
        <xref ref-type="bibr" rid="ref2 ref4">2, 4</xref>
        ].
      </p>
      <p>
        The state of the art on structural similarity on process models is represented
by the work by Dijkman et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. In our previous contribution [10], we have
extended the work in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], by: (i) exploiting domain knowledge; (ii) exploiting
process mining outputs (e.g., edge reliability, see section 3.1). The use of domain
knowledge is particularly signi cant in medical applications, the eld in which
our work has been tested so far.
      </p>
      <p>
        However, as in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], in [10] we have simpli ed the mined output, by removing
all information about parallel or mutually exclusive execution of activities.
      </p>
      <p>Addressing that limitation is the goal of the present work. In this paper, we
will thus describe:
{ a proper mapping of the mined output to a graph that includes gateway
(i.e., AND/XOR join/splits) nodes;
{ an enhanced version of the metric in [10], able to take into account gateway
nodes.</p>
      <p>This approach will be carefully analysed and experimentally tested in the
near future. While the rst experiments are foreseen in the eld of stroke
management (as we did in [10]), the work is meant to be general enough to be
exploited in very di erent applications as well.</p>
      <p>The paper is organized as follows: section 2 introduces the process mining
technique we are using; section 3.1 summarizes the metrics we de ned in [10];
section 3.2 illustrates the enhancements we are now developing, and represents
the core contribution of this work. Finally, section 4 addresses conclusions and
future work directions.</p>
    </sec>
    <sec id="sec-2">
      <title>Process Mining and the ProM tool</title>
      <p>Process mining describes a family of a-posteriori analysis techniques exploiting
the information recorded in event logs, to extract process related information
(e.g., process models). Typically, these approaches assume that it is possible
to sequentially record events such that each event refers to an activity (i.e., a
well de ned step in the process) and is related to a particular process instance.
Furthermore, some mining techniques use additional information such as the
timestamp of the event, or data elements recorded with the event.</p>
      <p>Traditionally, process mining has been focusing on discovery, i.e., deriving
process models and execution properties from event logs. It is important to
mention that there is no a-priori model, but, based on logs, some model, e.g.,
a Petri net, is constructed. However, process mining is not limited to process
models (i.e., control ow), and recent process mining techniques are more and
more focusing on other perspectives, e.g., the organisational perspective, the
performance perspective or the data perspective. Moreover, as well stated in [6],
process mining also supports conformance analysis and process enhancement. In
this paper, however, we only deal with the process perspective.</p>
      <p>In our work, we are relying on the Process Mining tool called ProM,
extensively described in [12]. ProM is a platform independent open source framework
which supports a wide variety of process mining and data mining techniques,
and can be extended by adding new functionalities in the form of plug-ins.</p>
      <p>In particular, we have chosen ProM's heuristic miner for mining the process
models. Heuristic miner takes in input the event log, and considers the order of
the events within every single process instance execution. The time stamp of an
activity is used to calculate this ordering. Heuristics miner is tolerant to noise,
and can be used to express the main behavior (i.e., not all details) registered in a
log. Indeed, some abstract information, such as the presence of composite tasks
(i.e., tasks semantically related to their constituent activities by means of the
\part-of" relation), cannot be derived by Heuristic Miner, that will only build a
model including ground (i.e., not further decomposable) activities. On the other
hand, it can mine the presence of short distance and long distance dependencies
(i.e., direct or indirect sequence of activities), and information about parallelism,
with a certain reliability degree (see also section 3.1). The output of the mining
process is provided as a graph, also called \dependency graph", where nodes
represent activities, and arcs represent control ow information.
3</p>
      <p>Extending graph edit distance for process model
comparison
3.1</p>
      <p>
        Exploiting domain knowledge and process mining output
In order to compare process models, in [10] we have introduced a distance de
nition that extends previous literature contributions [
        <xref ref-type="bibr" rid="ref2 ref4">4, 2</xref>
        ] by properly considering
domain knowledge, and additional information, learned through process mining.
      </p>
      <p>
        In particular, since mined process models are represented in the form of
graphs (where nodes represent activities and edges provide information about
the control ow), we de ne a distance based on the notion of graph edit
distance [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Such a notion calculates the minimal cost of transforming one graph
into another by applying insertions/deletions and substitutions of nodes, and
insertions/deletions of edges.
      </p>
      <p>While string edit distance looks for an alignment that minimizes the cost of
transforming one string into another by means of edit operations, in graph edit
distance we have to look for a mapping. A mapping is a function that matches
nodes to nodes, and edges to edges. Among all possible mappings, we will select
the one that leads to the minimal cost, having properly quanti ed the cost of
every type of edit operation.</p>
      <p>
        As in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], we have provided a normalized version of the approach in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        Moreover, with respect to both [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], we have introduced two novel
contributions:
1. we calculate the cost of node substitution f subn (see De nition 4 below)
by applying taxonomic distance [9, 8] (see De nition 1), and not string
edit distance on node names as in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Indeed, we have organized the various
activities executable in our domain in a taxonomy, where activities of the
same type (e.g., Computer Assisted Tomography (CAT) with or without
contrast) are connected as close relatives. The use of this de nition allows
us to explicitly take into account this form of domain knowledge, since the
distance between two activities is set to the normalized number of arcs on the
path between the two activities themselves in the taxonomy (see De nition
1);
2. we add a cost contribution related to edge substitution (f sube in De
nition 4 below), that incorporates information learned through process mining,
namely (i) the percentage of traces that have followed a given edge, and (ii)
the reliability of a given edge, i.e., of the control ow relationship between
two activities. Both items (i) and (ii) are outputs of heuristic miner (see
De nitions 2 and 3 below).
      </p>
      <p>Formally, the following de nitions apply:
De nition 1: Taxonomic Distance.</p>
      <p>Let and be two activities in the taxonomy t, and let be the closest common
ancestor of and . The Taxonomic Distance dt( ; ) between and is de ned
as:
dt( ; ) =</p>
      <p>N1 + N2
N1 + N2 + 2</p>
      <p>N3
where N1 is the number of arcs in the path from and in t, N2 is the number
of arcs in the path from and , and N3 is the number of arcs in the path from
the taxonomy root and .</p>
      <p>De nition 2: Reliability. The reliability of the edge ei assessing that activity
a directly follows activity b in sequence (i.e., ei is an arc from b to a) is calculated
as [13]:
rel(ei) =</p>
      <p>ja &gt; bj jb &gt; aj
ja &gt; bj + jb &gt; aj + 1
where ja &gt; bj is the number of occurrences in which activity a directly follows
activity b in the event log, and jb &gt; aj is the number of occurrences in which
activity b directly follows activity a.</p>
      <p>A negative reliability value means that we must conclude that the opposite
pattern holds, i.e., activity b follows activity a. Indeed, the reliability of a
relationship (e.g., activity a follows activity b) is not only in uenced by the number of
occurrences of this pattern in the logs, but is also (negatively) determined by the
number of occurrences of the opposite pattern (b follows a). However, edges with
a negative reliability will not appear in the dependency graph (due to threshold
mechanisms and proper heuristics [13], that rule them out). Therefore, we will
deal with reliability values 2 (0; 1)
De nition 3: Percentage of traces. The percentage of traces that crossed
edge ei, assessing that activity a directly follows activity b in sequence, is
calculated as:
ptrace(ei) =</p>
      <p>
        ja &gt; bjt
jALLT RACEj
where ja &gt; bjt is the number of traces in which activity a directly follows
activity b in the event log, and jALLT RACEj is the total number of available
traces in the event log. With this de nition, the percentage of traces 2 [0; 1].
De nition 4: Extended Graph Edit Distance. Let G1 = (N 1; E1) and
G2 = (N 2; E2) be two graphs, where Ei and N i represent the sets of edges
and nodes of graph Gi. Let M be a partial injective mapping [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] that maps
nodes in N 1 to nodes in N 2 and let subn, sube, skipn and skipe be the sets
of substituted nodes, substituted edges, inserted or deleted nodes and inserted
or deleted edges with respect to M . In particular, a substituted edge connects
a pair of substituted nodes in M . The fraction of inserted or deleted nodes,
denoted f skipn, the fraction of inserted or deleted edges, denoted f skipe, and
the average distance of substituted nodes, denoted f subn, are de ned as follows:
f skipn =
f skipe =
      </p>
      <p>jskipnj
jN 1j + jN 2j</p>
      <p>jskipej
jE1j + jE2j
∑n;m2M dt(n; m)
jsubnj
where n and m are two mapped nodes in M .</p>
      <p>The average distance of substituted edges f sube is de ned as follows:
f sube =
∑(n1;n2);(m1;m2)2M (jrel(e1) rel(e2)j + jptrace(e1) ptrace(e2)j)
jsubej
where edge e1 (connecting node n1 to node m1) and edge e2 (connecting
node n2 to node m2) are two substituted edges in M , rel(ei) is the reliability of
edge ei, and ptrace(ei) is the percentage of traces that crossed edge ei.</p>
      <p>The extended graph edit distance induced by the mapping M is:
extedit =
wskipn f skipn + wskipe f skipe + wsubn f subn + wsube f sube
wskipn + wskipe + wsubn + wsube
where wsubn, wsube, wskipn and wskipe are proper weights 2 [0; 1].</p>
      <p>
        The extended graph edit distance of two graphs is the minimal possible
distance induced by a mapping between these graphs. To nd the mapping that
leads to the minimal distance we resort to the greedy algorithm described in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
3.2
      </p>
      <p>Exploiting gateway nodes comparison
The dependency graph may contain complex control ow information (i.e., other
than sequence) between the mined process activities. Actually, parallelism and
mutual exclusion can be identi ed during the mining process. Activity nodes in
the dependency graph, in this case, can contain up to three types of information
(see \action B" in gure 1, left part): an input label (e.g., XOR, AND, or more
complex combinations, such as an AND of two XORs); the current activity name;
an output label. We will focus on simple labels (i.e. AND or XOR), for the sake
of clarity.</p>
      <p>The input label provides information about the incoming ow. Two situations
may take place:
1. if only one edge enters the node (see node B in gure 1), the input label
means that the current node is preceded by a split. E.g., it has to be carried
out in mutual exclusion with one or more other activities, if the label is a
XOR;
2. if, on the other hand, multiple edges enter the node, the input label means
that the current node is preceded by a join (see node D in gure 1).</p>
      <p>The output label works dually: multiple outgoing edges mean that the node
is followed by a split (see node A in gure 1), a single outgoing edge means that
the node is followed by a join (see node B in gure 1).</p>
      <p>
        Such a representation can make the dependency graph very complex, and
di cult to read. Moreover, a direct application of the metric in section 3.1 to
this graph is not possible. Actually, the information in input and output labels
was simply disregarded in the metric in section 3.1 (in line with the approach in
[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]).
      </p>
      <p>
        In the current version of our work, on the other hand, we are exploiting this
information, by mapping the dependency graph onto another graph structure,
in which gateway nodes (AND/XOR join/splits) are explicitly introduced, and
control ow patterns are clearer. The new graph structure is also more similar
to classic work ow representation languages [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        Speci cally, somehow similarly to [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], the mapping works as follows:
{ if a node has only one incoming edge, we remove the input label, if any;
{ otherwise, if the node has multiple incoming edges, we introduce a proper join
gateway node, as the input label indicates (i.e., AND or XOR); connecting
arcs are also introduced;
{ if a node has only one outgoing edge, we remove the output label, if any;
{ otherwise, if the node has multiple outgoing edges, we introduce a proper
split gateway node; connecting arcs are also introduced.
      </p>
      <p>For instance, gure 1 shows the mapping of a small dependency graph (on the
left) into a graph containing both activity and gateway nodes (on the right).</p>
      <p>While the metric in section 3.1 properly compares activity nodes, a new
contribution has to be added to it, in order to account for gateway nodes.</p>
      <p>To this end, we proceed as follows:
1. if the two gateway nodes x and y are of di erent types (i.e. a XOR and an</p>
      <p>AND) their distance is set to 1;
2. if the two gateway nodes x and y are of the same type (e.g., two ANDs), we
have to calculate the di erence between:
(a) the incoming gateway nodes;
(b) the incoming activity nodes;
(c) the outgoing gateway nodes;
(d) the outgoing activity nodes.</p>
      <p>As regards item 2.(b), let S1 be the sequence of incoming activity nodes of
the rst gateway node x; let S2 be the sequence for the second gateway node y.
Without loss of generality, suppose that S2 is not longer than S1. In order to
compare S1 ad S2, we try all possible permutations in the order of the activity
nodes in S2, and take the one that leads to the minimal distance with respect
to S1. The distance between the two sequences is the average of the distance
between single elements (i.e., pairs of activities), over the length of the longest
sequence. The distance between a pair of activities is calculated resorting to
taxonomic distance (see De nition 1 in section 3.1). Every activity in S2 that
cannot be mapped to any activity in S1 (because S1 is longer than S2) contributes
with a distance of 1.</p>
      <p>Item 2.(d) works analogously.</p>
      <p>Items 2.(a) and 2.(c) are simpler: identical incoming (respectively, outgoing)
gateway nodes in the two sequences (e.g., two AND) provide a contribution of 0;
di erent gateway nodes (i.e., an AND and a XOR) provide a contribution of 1.
As above, we then calculate the average over the length of the longest sequence
of gateway nodes. This procedure is obviously a simpli cation, since incoming
gateway nodes may have other gateway nodes in input as well, but we do not
consider this (recursive) information. Similar considerations hold for outgoing
gateway nodes. This choice was motivated by computational complexity issues,
but could be reconsidered in the future.</p>
      <p>The four contribution are then combined in a weighted average df (x; y), in
which weights can be experimentally set.</p>
      <p>In the current version of our framework, the f subn contribution of De nition
4 (see section 3.1) is therefore modi ed as follows:
2 (∑n;m2MA dt(n; m) + ∑x;y2MG df (x; y))</p>
      <p>jsubnj
where MA represents the set of mapped activity nodes in the mapping M ,
MG represents the set of mapped gateway nodes in M , and df (x; y) calculates the
distance between two gateway nodes x and y in MG by applying the procedure
described above.</p>
      <p>Action B</p>
      <p>Action C
XOR
XOR</p>
      <p>Action A</p>
      <p>XOR</p>
      <p>XOR
Action D</p>
      <p>XOR
XOR</p>
      <p>Action A</p>
      <p>XOR
Action B</p>
      <p>Action C</p>
      <p>XOR
Action D</p>
    </sec>
    <sec id="sec-3">
      <title>Discussion, conclusions and future work</title>
      <p>
        In this paper, we have described how we are extending a framework for process
mining and process comparison, whose rst version has been published in [10].
In particular, we are making distance calculation more general, and closer to
the semantic meaning of the mined process model, by explicitly taking into
account gateway nodes. This enhancement can potentially represent a signi cant
advance with respect to the state of the art research in the eld, since gateway
nodes are typically disregarded in process comparison approaches, with a few
exception (see e.g., [7], which is however limited to identify changes with respect
to an ongoing process instance, in an agile work ow context; see also [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], which
makes gateway nodes explicit in the process model, but does not introduce a
contribution related to them in the metric for process comparison).
      </p>
      <p>The current approach will undergo a thorough analysis in the next months,
and several experimental evaluations will be planned. We also wish to test
our work in combination with di erent mining algorithms (other than heuristic
miner). Finally, we plan to consider loops, a control ow structure which was
never observed in the stroke management domain, but which could be signi cant
in other applications.</p>
      <p>We believe then, once extended and validated, the tool will represent a useful
means for process comparison and retrieval, in medicine as well as in other
application elds.
5</p>
    </sec>
    <sec id="sec-4">
      <title>Ackowledgements</title>
      <p>We are grateful to A. Cavallini and G. Micieli (IRCCS Fondazione \C. Mondino",
Pavia, Italy) for providing stroke management data.</p>
      <p>This research is partially supported by the GINSENG Project, Compagnia
di San Paolo.
6. http : ==www:win:tue:nl=ieeetf pm. IEEE Taskforce on Process Mining: Process</p>
      <p>Mining Manifesto.
7. M. Minor, A. Tartakovski, D. Schmalen, and R. Bergmann. Agile work ow
technology and case-based change reuse for long-term processes. International Journal
of Intelligent Information Technologies, 4(1):80{98, 2008.
8. S. Montani and G. Leonardi. Retrieval and clustering for
supporting business process adjustment and analysis. Information Systems, DOI:
http://dx.doi.org/10.1016/j.is.2012.11.006.
9. S. Montani and G. Leonardi. Retrieval and clustering for business process
monitoring: results and improvements. In B. Diaz-Agudo and I. Watson, editors, Proc.
International Conference on Case-Based Reasoning (ICCBR) 2012, Lecture Notes
in Arti cial Intelligence 7466, page 269283. Springer-Verlag, Berlin, 2012.
10. S. Montani, G. Leonardi, S. Quaglini, A. Cavallini, and G. Micieli. Mining and
retrieving medical processes to assess the quality of care. In Sarah Jane Delany
and Santiago Ontan~on, editors, ICCBR, volume 7969 of Lecture Notes in Computer
Science, pages 233{240. Springer, 2013.
11. G. Valiente. Algorithms on Trees and Graphs. Springer, 2002.
12. B. van Dongen, A. Alves De Medeiros, H. Verbeek, A. Weijters, and W. Van der
Aalst. The proM framework: a new era in process mining tool support. In G. Ciardo
and P. Darondeau, editors, Knowledge Mangement and its Integrative Elements,
pages 444{454. Springer, 2005.
13. A. Weijters, W. Van der Aalst, and A. Alves de Medeiros. Process Mining with
the Heuristic Miner Algorithm, BETA Working Paper Series, WP 166. Eindhoven
University of Technology, Eindhoven, 2006.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <article-title>Work ow management coalition, http : ==www:wf mc:org=wf mc publications:html.</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>H.</given-names>
            <surname>Bunke</surname>
          </string-name>
          .
          <article-title>On a relation between graph edit distance and maximum common subgraph</article-title>
          .
          <source>Pattern Recognition Letters</source>
          ,
          <volume>18</volume>
          (
          <issue>8</issue>
          ):
          <fpage>689694</fpage>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>W.</given-names>
            <surname>Van der Aalst</surname>
          </string-name>
          ,
          <string-name>
            <surname>B. van Dongen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Herbst</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Maruster</surname>
          </string-name>
          ,
          <string-name>
            <surname>G.</surname>
          </string-name>
          <article-title>Schimm, and</article-title>
          <string-name>
            <given-names>A.</given-names>
            <surname>Weijters</surname>
          </string-name>
          .
          <article-title>Work ow mining: a survey of issues and approaches</article-title>
          .
          <source>Data and Knowledge Engineering</source>
          ,
          <volume>47</volume>
          :
          <fpage>237</fpage>
          {
          <fpage>267</fpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>R.</given-names>
            <surname>Dijkman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dumas</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Garca-Banuelos</surname>
          </string-name>
          .
          <article-title>Graph matching algorithms for business process model similarity search</article-title>
          .
          <source>In Proc. International Conference on Business Process Management</source>
          , pages
          <volume>48</volume>
          {
          <fpage>63</fpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>R.</given-names>
            <surname>Dijkman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Gfeller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kuster</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Volzer</surname>
          </string-name>
          .
          <article-title>Identifying refactoring opportunities in process model repositories</article-title>
          .
          <source>Information and Software Technology</source>
          ,
          <volume>53</volume>
          :
          <fpage>937</fpage>
          {
          <fpage>948</fpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>