<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Adaptive Process Model Matching – Improving the Effectiveness of Label-based Matching through Automated Configuration and Expert Feedback</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Christopher Klinkmüller</string-name>
          <email>christopher.klinkmuller@data61.csiro.au</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Data61 CSIRO</institution>
          ,
          <addr-line>Eveleigh NSW 2015</addr-line>
          ,
          <country country="AU">Australia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Organizations often store information about their process landscape in large process model collections. To utilize those knowledge bases, process model matchers assist experts in keeping track of model relationships by automatically identifying corresponding activities. Yet, state-of-the-art matchers achieve an overall low effectiveness. That is, their results contain only a few existing, but many non-existing correspondences. Hence, practical application is far from being realized. In light of this, the thesis introduces ADBOT, an interactive matching technique that analyzes manually corrected matcher results to adapt itself to the characteristics of the respective model collection. At heart, ADBOT relies on BOT and OPBOT. BOT is a configurable matcher that solely examines activity labels and OPBOT analyzes model collection characteristics to automatically configure BOT without human intervention. While the three matchers pose different requirements on model and feedback availability, they all achieve a high effectiveness in relation to the state-of-the-art, as shown in a comparative evaluation. ADBOT as the most effective technique even yields improvements of up to 70%.</p>
      </abstract>
      <kwd-group>
        <kwd>Process model matching</kwd>
        <kwd>process similarity</kwd>
        <kwd>process model collection management</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Process models are an essential tool to communicate, analyze, design and
continuously improve business processes. In organizations that use process models,
large process model collections are not uncommon. To leverage the knowledge
that is hidden in such extensive collections, research has devised approaches for
model collection management [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] such as process similarity search and querying,
process model merging, compliance checking, model comparison and clone
detection. Many of those approaches require the identification of correspondences, i.e.,
activities that represent similar functionality in different models. Yet, this step
is generally associated with huge manual efforts and research has thus developed
process model matchers to automate it. However, comparative evaluations in
the process model matching contests of 2013 and 2015 [
        <xref ref-type="bibr" rid="ref1 ref3">1, 3</xref>
        ] demonstrate that
state-of-the-art matchers achieve a low effectiveness when applied to real-world
model collections. That is, the matchers detect only a few of the existing
correspondences, while making many irrelevant suggestions. In this context, the thesis
studies how the effectiveness can be improved to achieve practical applicability.
      </p>
      <p>Fig. 1. Research design. Phase I: review literature. Phase II: develop matching propositions, develop technique candidates, and assess effectiveness, using model collections and gold standards; the outputs are matching propositions, technique candidates, and matching techniques.</p>
    </sec>
    <sec id="sec-2">
      <title>Research Methodology</title>
      <p>
        The research methodology that the thesis is based on is oriented towards the
information systems research framework [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. That is, the research design (shown
in Figure 1) is driven by the need for effective matchers in real-world scenarios
and aims to contribute new findings to the scientific knowledge base.
      </p>
      <p>
        The first phase deals with a literature review which follows guidelines for
literature reviews in information systems research [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The goal is to examine
prior work and to identify the shortcomings that the thesis must address.
      </p>
      <p>The second phase attends to the development of matching techniques under
the consideration of the shortcomings. The activities in this phase are carried out
iteratively. Each iteration starts with the development of matching propositions
where design decisions are examined with regard to their validity. Valid decisions
are then considered as matching propositions. In the development of technique
candidates the propositions are integrated into matching algorithms in different
ways. Lastly, the effectiveness is assessed and those candidates that achieve a
high effectiveness are proposed as matching techniques. It is important to note
that due to this research approach, matching techniques are not treated as black
boxes, i.e., as the sum of their parts. Instead, the design decisions are individually
examined. Thus, the thesis does not only deliver matching techniques, but also a
catalogue of (in-)validated design decisions that future work can build upon.</p>
      <p>
        The respective experiments rely on empirical datasets that contain model
collections, i.e., sets of model pairs, and the respective gold standards which
comprise manually identified correspondences. There are two types of datasets.
Development datasets are used in each iteration to evaluate the design decisions
and the effectiveness. By contrast, evaluation datasets are exclusively applied in
a final iteration to examine the generalizability of the findings. In total, there
are four datasets, each containing 36 model pairs. While the birth registration
(BR) and the university admission (UA) dataset from the matching contest in
2013 [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] are used for development, the SAP reference model (SR) and the alma
web (AW) dataset are the evaluation datasets. SR and AW were created in the
context of this thesis following the procedure that was applied to establish BR
and UA. SR was made available to the second matching contest [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        In the experiments a variety of research methods is employed to assess the
design decisions and the effectiveness of the matchers. Statistical measures such
as the Kolmogorov-Smirnov test [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], the information gain [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], or the correlation
coefficient are applied to quantitatively examine cause-effect relationships
associated with the design decisions. If a deeper understanding of the relationships
is required, categorizing qualitative analyses [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] are carried out. Additionally,
well-established measures from information retrieval [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and the matching
contests are used to evaluate the effectiveness based on a comparison of the gold
standards and the matchers' suggestions. These measures are (i) the precision
(P) as the percentage of suggested correspondences that are correct; (ii) the recall (R)
as the percentage of gold standard correspondences that are identified; and (iii) the
f-measure as their harmonic mean.
      </p>
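      <p>The three effectiveness measures can be illustrated with a minimal sketch. It assumes that suggestions and gold standard are plain sets of activity pairs; the function name evaluate and the example labels are hypothetical and not taken from the thesis, and any averaging over model pairs that the contests may apply is left out.</p>
      <preformatted preformat-type="code"><![CDATA[
```python
from typing import Set, Tuple

# A correspondence is modeled as (activity in model A, activity in model B).
Correspondence = Tuple[str, str]


def evaluate(suggested: Set[Correspondence],
             gold: Set[Correspondence]) -> Tuple[float, float, float]:
    """Return (precision, recall, f-measure) for one set of suggestions."""
    correct = len(suggested & gold)
    # Precision: share of suggestions that are in the gold standard.
    precision = correct / len(suggested) if suggested else 0.0
    # Recall: share of gold standard correspondences that were suggested.
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-measure: harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical example data.
gold = {
    ("Check application", "Verify application"),
    ("Send letter", "Mail letter"),
    ("Archive", "Store file"),
}
suggested = {
    ("Check application", "Verify application"),
    ("Send letter", "Archive"),
}
p, r, f = evaluate(suggested, gold)
print(f"P={p:.2f} R={r:.2f} F={f:.2f}")
```
]]></preformatted>
      <p>In this example one of two suggestions is correct and one of three gold standard correspondences is found, so a matcher with few but wrong suggestions is penalized on both measures.</p>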
    </sec>
    <sec id="sec-3">
      <title>Hypotheses and Evidence</title>
      <p>The results of the thesis are summarized in four hypotheses. In the following,
the hypotheses are outlined along with the respective evidence from the thesis.</p>
      <p>H1: The identification of correspondences between business process models
is a challenge for organizations which is not sufficiently supported by
existing approaches.</p>
      <p>This hypothesis addresses the practical relevance which was already outlined
in Section 1 where various problems related to model collection management
are listed. Moreover, the scientific demand for the development of more effective
matchers was verified through the literature review. A total of 17 publications
that introduced matchers were identified and examined. In nine publications the
applicability of matchers was restricted by assumptions that, e.g., limit the
supported modeling languages, impose requirements on the labeling vocabulary,
or exclude situations where sets of activities correspond. Further, the
matchers' effectiveness is low and varies across datasets. Finally, prior work focused on
illustrating the matchers' usage or on black-box evaluations that examined the
effectiveness, but not the validity of design decisions. These observations confirm
the need for further research aimed at improving the effectiveness.</p>
      <p>
        The remaining hypotheses address different sources of information for
matching techniques. Each perspective is associated with its own matcher. Table 1
shows the effectiveness of these matchers and of the top-performing matchers
from the matching contests [
        <xref ref-type="bibr" rid="ref1 ref3">1, 3</xref>
        ]. Note that matchers from this thesis were
excluded and that for each dataset a different matcher achieved the best results.
      </p>
      <p>H2: Label-based matching techniques yield a varying and generally
insufficient effectiveness.</p>
      <p>Various natural language analysis techniques related to (i) the decomposition
of labels into words, (ii) the syntactic and semantic comparison of words, and (iii)
the reduction of differences in label specificity were discussed and analyzed. The
results were then used to devise BOT, a label-based matcher that can be adjusted
to the model collection via five parameters. BOT comprises a default
configuration that performs nearly as well as the three best performing matchers from
the matching contests. Finally, a qualitative analysis of BOT's results reveals
that many falsely suggested correspondences are due to only small variations in
the labels, e.g., when two labels only differ with regard to one word. Further,
labels of correspondences that are not detected contain synonyms and homonyms
or involve very generic descriptions. This analysis shows that to achieve a high
effectiveness, domain-specific knowledge sources are required.</p>
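      <p>To make the two error classes tangible, the following is a minimal sketch of a purely syntactic, word-based label comparison. It is not BOT's algorithm, whose five parameters are not described here; the functions words, word_similarity and label_similarity are hypothetical, and the word comparison simply uses the SequenceMatcher ratio from Python's standard library as a stand-in.</p>
      <preformatted preformat-type="code"><![CDATA[
```python
import re
from difflib import SequenceMatcher
from typing import List


def words(label: str) -> List[str]:
    """Decompose a label into lowercase words."""
    return re.findall(r"[a-z0-9]+", label.lower())


def word_similarity(a: str, b: str) -> float:
    """Purely syntactic similarity in [0, 1]; no synonym knowledge."""
    return SequenceMatcher(None, a, b).ratio()


def label_similarity(label_a: str, label_b: str) -> float:
    """Average best-match word similarity, taken over both labels."""
    wa, wb = words(label_a), words(label_b)
    if not wa or not wb:
        return 0.0
    # For every word, take its most similar counterpart in the other label.
    best_a = [max(word_similarity(x, y) for y in wb) for x in wa]
    best_b = [max(word_similarity(y, x) for x in wa) for y in wb]
    return (sum(best_a) + sum(best_b)) / (len(wa) + len(wb))


# Labels that differ in one word still score high (risk of false suggestions).
print(label_similarity("Accept application", "Reject application"))
# Synonyms are invisible to a syntactic comparison (risk of missed matches).
print(label_similarity("Send letter", "Mail letter"))
```
]]></preformatted>
      <p>Under these assumptions, the shared word dominates the score in both cases, so the comparison cannot tell an opposite action from a synonym without additional, domain-specific knowledge.</p>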
      <p>H3: The maximization of the effectiveness of label-based matching
techniques is enabled by the analysis of control flow information.</p>
      <p>There are three scenarios for the use of control-flow information. First,
activities are compared with regard to control-flow properties such as their position in
the models. Second, candidates for complex correspondences can be established
by extracting connected sub-graphs from the models. Third, there is the
consistency analysis which assesses the degree to which control-flow dependencies
between activities in one model resemble those of their corresponding
counterparts in a second model. While experiments show that the first two options are
not generally valid, the third option is identified as a promising design decision.
Consequently, OPBOT automatically executes many BOT configurations and
suggests a combination of the most consistent results. OPBOT generally detects
configurations that perform better than or close to BOT's default configuration.</p>
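      <p>The idea of the consistency analysis can be sketched as follows. This is an illustration under strong assumptions, not OPBOT's actual measure: control-flow dependencies are reduced to a set of ordered pairs (x precedes y) per model, the hypothetical function consistency returns the share of correspondence pairs whose ordering agrees in both models, and most_consistent selects a single candidate result instead of combining several.</p>
      <preformatted preformat-type="code"><![CDATA[
```python
from itertools import combinations
from typing import List, Set, Tuple

Pair = Tuple[str, str]


def consistency(matches: Set[Pair], order_a: Set[Pair], order_b: Set[Pair]) -> float:
    """Share of correspondence pairs whose ordering agrees in both models."""
    pairs = list(combinations(sorted(matches), 2))
    if not pairs:
        return 1.0
    agreeing = 0
    for (a1, b1), (a2, b2) in pairs:
        # The order between a1 and a2 should mirror the order between b1 and b2.
        forward = ((a1, a2) in order_a) == ((b1, b2) in order_b)
        backward = ((a2, a1) in order_a) == ((b2, b1) in order_b)
        if forward and backward:
            agreeing += 1
    return agreeing / len(pairs)


def most_consistent(candidates: List[Set[Pair]],
                    order_a: Set[Pair], order_b: Set[Pair]) -> Set[Pair]:
    """Pick the candidate result (e.g. from one configuration) that is most consistent."""
    return max(candidates, key=lambda m: consistency(m, order_a, order_b))


# Hypothetical models: A then B then C, and X then Y then Z.
order_a = {("A", "B"), ("B", "C"), ("A", "C")}
order_b = {("X", "Y"), ("Y", "Z"), ("X", "Z")}
aligned = {("A", "X"), ("B", "Y"), ("C", "Z")}
reversed_result = {("A", "Z"), ("B", "Y"), ("C", "X")}

print(consistency(aligned, order_a, order_b))          # order preserved
print(consistency(reversed_result, order_a, order_b))  # order inverted
best = most_consistent([reversed_result, aligned], order_a, order_b)
```
]]></preformatted>
      <p>The appeal of such a score is that it needs no gold standard, which is what allows a configuration to be chosen without human intervention.</p>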
      <p>H4: The effectiveness of matching techniques is improved by the utilization
of expert feedback.</p>
      <p>The process of feedback collection is iterative: a pair of models is selected,
then automatically matched, and experts finally correct the suggestions. That
way, a ground truth is established which can be used to adapt matchers and
to prepare them for yet unmatched model pairs. In particular, ADBOT builds
upon OPBOT and contains three strategies to learn from the feedback. First, the
assessment of semantic word relations is adapted to better reflect the
domain-specific relationships. Second, it relies on transitivity, which is a reliable indicator:
two activities are matched if they correspond to the same activity. Third,
OPBOT's BOT configurations are constantly re-adjusted. ADBOT outperforms
the automated matchers from the contests by up to 70%. To reduce the expert
workload, ADBOT also sorts the model pairs, and feedback collection can be
turned off after 30 to 50% of the pairs, as the improvements level off at this point.</p>
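      <p>The transitivity strategy can be sketched with a union-find structure over expert-confirmed correspondences. This is a minimal illustration, not ADBOT's implementation: the class CorrespondenceGroups, its methods, and the representation of an activity as a (model, label) tuple are assumptions, and the sketch applies transitivity over chains of any length, which goes slightly beyond the single shared activity mentioned above.</p>
      <preformatted preformat-type="code"><![CDATA[
```python
from typing import Dict, Hashable


class CorrespondenceGroups:
    """Union-find over activities that experts confirmed as corresponding."""

    def __init__(self) -> None:
        self._parent: Dict[Hashable, Hashable] = {}

    def _find(self, item: Hashable) -> Hashable:
        self._parent.setdefault(item, item)
        root = item
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every visited node directly at the root.
        while self._parent[item] != root:
            self._parent[item], item = root, self._parent[item]
        return root

    def add_feedback(self, a: Hashable, b: Hashable) -> None:
        """Record one expert-confirmed correspondence between a and b."""
        self._parent[self._find(a)] = self._find(b)

    def transitively_matched(self, a: Hashable, b: Hashable) -> bool:
        """True if a and b are linked through confirmed correspondences."""
        return self._find(a) == self._find(b)


# Hypothetical feedback from two already corrected model pairs.
groups = CorrespondenceGroups()
groups.add_feedback(("M1", "Check application"), ("M2", "Verify application"))
groups.add_feedback(("M3", "Assess application"), ("M2", "Verify application"))

# M1 and M3 were never matched directly, but share a counterpart in M2.
print(groups.transitively_matched(("M1", "Check application"),
                                  ("M3", "Assess application")))
print(groups.transitively_matched(("M1", "Check application"),
                                  ("M3", "Send letter")))
```
]]></preformatted>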
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>The results of the thesis are summarized in the main research hypothesis which
was validated through the verification of the four sub-hypotheses H1 to H4:
H0: The adaptation of business process model matching techniques to model
collections is necessary to ensure a high effectiveness and the analysis
of the control flow as well as of expert feedback provides means to
implement this adaptation.</p>
      <p>An important aspect in future work is the extension of the datasets to cover
further matching scenarios and achieve a broader evaluation. Moreover, more
emphasis should be put on the expert involvement. As experts operate in
different contexts, matchers should be adaptive to different views. To reduce human
efforts, modeling tools must provide interfaces that allow for easily interpreting
and correcting correspondences. This also includes strategies to derive expert
feedback from other model collection tasks like model merging or querying.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Antunes</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bakhshandeh</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Borbinha</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cardoso</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dadashnia</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Di Francescomarino</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dragoni</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fettke</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gal</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ghidini</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hake</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khiat</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , Klinkmuller,
          <string-name>
            <given-names>C.</given-names>
            ,
            <surname>Kuss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            ,
            <surname>Leopold</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            ,
            <surname>Loos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            ,
            <surname>Meilicke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            ,
            <surname>Niesen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            ,
            <surname>Pesquita</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            ,
            <surname>Peus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            ,
            <surname>Schoknecht</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Sheetrit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            ,
            <surname>Sonntag</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Stuckenschmidt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            ,
            <surname>Thaler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            ,
            <surname>Weber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            ,
            <surname>Weidlich</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.:</surname>
          </string-name>
          <article-title>The process model matching contest 2015</article-title>
          .
          <source>In: EMISA</source>
          <year>2015</year>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name><surname>vom Brocke</surname>, <given-names>J.</given-names></string-name>,
          <string-name>
            <surname>Simons</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Niehaves</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Niehaves</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reimer</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Plattfaut</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cleven</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Reconstructing the giant: On the importance of rigour in documenting the literature search process</article-title>
          .
          In: <source>ECIS 2009</source>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Cayoglu</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dijkman</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dumas</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fettke</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name><surname>García-Bañuelos</surname>, <given-names>L.</given-names></string-name>,
          <string-name><surname>Hake</surname>, <given-names>P.</given-names></string-name>,
          <string-name><surname>Klinkmüller</surname>, <given-names>C.</given-names></string-name>,
          <string-name><surname>Leopold</surname>, <given-names>H.</given-names></string-name>,
          <string-name><surname>Ludwig</surname>, <given-names>A.</given-names></string-name>,
          <string-name><surname>Loos</surname>, <given-names>P.</given-names></string-name>,
          <string-name><surname>Mendling</surname>, <given-names>J.</given-names></string-name>,
          <string-name><surname>Oberweis</surname>, <given-names>A.</given-names></string-name>,
          <string-name><surname>Schoknecht</surname>, <given-names>A.</given-names></string-name>,
          <string-name><surname>Sheetrit</surname>, <given-names>E.</given-names></string-name>,
          <string-name><surname>Thaler</surname>, <given-names>T.</given-names></string-name>,
          <string-name><surname>Ullrich</surname>, <given-names>M.</given-names></string-name>,
          <string-name><surname>Weber</surname>, <given-names>I.</given-names></string-name>,
          <string-name><surname>Weidlich</surname>, <given-names>M.</given-names></string-name>:
          <article-title>The process model matching contest 2013</article-title>
          . In: <source>PMC-MR 2013</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Dijkman</surname>
            ,
            <given-names>R.M.</given-names>
          </string-name>
          ,
          <string-name><surname>La Rosa</surname>, <given-names>M.</given-names></string-name>,
          <string-name><surname>Reijers</surname>, <given-names>H.A.</given-names></string-name>
          :
          <article-title>Managing large collections of business process models – current techniques and challenges</article-title>
          .
          <source>Comput. Ind</source>
          .
          <volume>63</volume>
          (
          <issue>2</issue>
          ),
          <fpage>91</fpage>–<lpage>97</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Hevner</surname>
            ,
            <given-names>A.R.</given-names>
          </string-name>
          ,
          <string-name><surname>March</surname>, <given-names>S.T.</given-names></string-name>,
          <string-name>
            <surname>Park</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ram</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Design science in information systems research</article-title>
          .
          <source>MIS Quart</source>
          .
          <volume>28</volume>
          (
          <issue>1</issue>
          ),
          <fpage>75</fpage>–<lpage>105</lpage>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Raghavan</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , Schutze, H.: Introduction to Information Retrieval. Cambridge University Press, Cambridge, England (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name><surname>Massey Jr.</surname>, <given-names>F.J.</given-names></string-name>:
          <article-title>The Kolmogorov-Smirnov test for goodness of fit</article-title>
          .
          <source>J. Am. Stat. Assoc</source>
          .
          <volume>46</volume>
          (
          <issue>253</issue>
          ),
          <fpage>68</fpage>–<lpage>78</lpage>
          (
          <year>1951</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Mayring</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Qualitative content analysis</article-title>
          .
          <source>FQS</source>
          <volume>1</volume>
          (
          <issue>2</issue>
          ) (
          <year>2000</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Tan</surname>
            ,
            <given-names>P.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Steinbach</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kumar</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <source>Introduction to Data Mining</source>. Pearson Education Limited, Harlow
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>