<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>ciently Querying Business Process Models with BeehiveZ</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tao Jin</string-name>
          <email>jint05@mails.thu.edu.cn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jianmin Wang</string-name>
          <email>jimwang@tsinghua.edu.cn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lijie Wen</string-name>
          <email>wenlj00@mails.thu.edu.cn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science and Technology, Tsinghua University</institution>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Key Laboratory for Information System Security, Ministry of Education</institution>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>School of Software, Tsinghua University</institution>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The technology of business process management is being more widely used, and there are more and more business process models. In this demonstration, we show how to query a large number of models e ciently with BeehiveZ. Four types of queries are all supported in BeehiveZ, i.e. exact query based on structure, similarity query based on structure, exact query based on behavior, similarity query based on behavior. BeehiveZ provides di erent indexes to facilitate the e cient query processing of di erent types of queries. To make BeehiveZ more applicable, label similarity is considered.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>With the technology of business process management being more widely used,
there are more and more business process models. For example, there are more
than 6,000 models in Suncorp, an Australian bank and insurance company, and
there are more than 200,000 models in China CNR Corporation Limited. How
to query such a large number of models e ciently is challenging. For example,
before a designer creates a new process model, if the designer can retrieve the
related models and work on them instead of starting from scratch, it would save
a lot of time and is less error-prone. Another example is that, before China CNR
came into being there were more than 20 subsidiary companies, when these small
companies were merged into a bigger one, the similar business processes needed
to be integrated, so similarity query is needed.</p>
      <p>Since a business process can be represented as topologically di erent graphs,
behavior is their essential characteristic. Therefore, besides the query based on
structure the query based on behavior is also needed. In some cases, we cannot
nd the exact matches according to the user's requirement, but we can nd some
su ciently similar ones. Therefore, apart from the exact query the similarity
query is also needed. Based on these two orthogonal dimensions (i.e. (structure,
behavior) and (exact, similar)), we can classify the queries into four types: (1)
exact query based on structure, (2) similarity query based on structure, (3) exact
query based on behavior, (4) similarity query based on behavior.</p>
      <p>
        There have already been several research prototypes on business process
model query. BPMN-Q[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] is a graph-based query language, which extends the
notations of BPMN with some new elements (e.g. variable node). WISE[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] is a
work ow information search engine, where work ow models are represented
hierarchical. It takes keywords from users as input and returns synthesized work ow
hierarchies containing the matched keywords. The results contain only minimal
views of relevant work ow hierarchies. The distinguishing feature of VisTrails[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]
is its capability to leverage provenance information of work ows. It allows users
to query work ows by example and to re ne work ows by analogies. BP-QL[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] is
based on an abstraction of the BPEL standard for distributed environment and
supports query by example. In [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], feature-based similarity estimation is used
to improve the e ciency of similarity search. However, all the above work only
deals with the queries based on structure, while all the four types of queries are
supported in BeehiveZ. Moreover, since the number of models is large, query
e ciency is very important, BeehiveZ provides di erent indexes to speed up the
query processing. All these indexes are inverted indexes, which set up the
mapping between the indexed items and the models. The indexed items are di erent
according to di erent types of queries. These indexes can be used as lters to
discard many models that are impossible to be resulting models, so that the query
e ciency can be improved. Our contributions can be summarized as follows.
{ All four types of process model queries are supported in BeehiveZ, and
indexes are used to improve query e ciency, as presented in Section 3.
{ Since di erent business analysts would use di erent labels to describe
identical business activities, label similarity is considered in BeehiveZ so that
two tasks having the label similarity not less than a speci ed threshold are
regarded as identical, as presented in Section 2.
2
      </p>
    </sec>
    <sec id="sec-2">
      <title>Implementation of BeehiveZ</title>
      <p>
        The BeehiveZ system is implemented in Java, and it can be found at http:
//code.google.com/p/beehivez/. There are many functions in BeehiveZ
system. Here we only focus on the query function of BeehiveZ. All the models are
stored as Text in RDBMS(e.g. MySQL). There are many di erent notations to
capture business processes, to deal with the models with di erent formats in a
uniform way, we assume that all the models in the repository are represented as
or transformed to Petri nets. All the indexes used in this paper are managed by
Lucene1 or stored in B+ tree (only the path indexes are stored in B+ trees). We
use the source code from ProM[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] to represent and display Petri nets.
      </p>
      <p>Since di erent analysts would use di erent labels for identical business
activities, label similarity is considered. Two tasks having the label similarity not
less than a speci c threshold are regarded as identical. Whether to enable label
similarity and the label similarity threshold both can be con gured by users</p>
      <sec id="sec-2-1">
        <title>1 http://lucene.apache.org/</title>
        <p>during query time. To process queries e ciently when label similarity is
considered, the retrieval of similar labels should be e cient, so a label index is used in
BeehiveZ. The label index sets up the mapping between words and labels where
the investigated word appears.</p>
        <p>
          Let W (l) be the number of words can be extracted from the label string
l. Let SCW (l1; l2) be the number of words in the label l1 whose synonymous
can be found in the words of the label l2. In BeehiveZ, the similarity between
two labels is calculated using Equation 1, which is similar to Dices Coe cient[
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]
except that the synonymous are considered now.
        </p>
        <p>labelSim(l1; l2) =
2</p>
        <p>
          SCW (l1; l2) :
W (l1) + W (l2)
(1)
Note that before this equation is used, all the words must be extracted from the
labels, and converted into lower case. Stop words (e.g. a, the) must be removed
and the remaining words are replaced with their stem. Of course, we can replace
the Equation 1 with other term based similarity measures. To retrieve synonyms
quickly, we mapped WordNet2 synonyms into memory, which consumed
approximately 10MB. During the query processing, BeehiveZ will expand queries with
the synonymous labels existing in the repository with the help of the label index.
More details of how label index is constructed and used can be found in [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ].
3
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Querying with BeehiveZ</title>
      <p>
        The four types of business process model queries are all supported in BeehiveZ.
We will discuss them seperately. Fig. 1 shows the screenshot of querying business
process models with BeehiveZ. A model example is input. The user can choose
di erent indexes and con gure model similarity threshold for similarity queries.
The results are shown in a list and they can be viewed in a viewer.
Exact query based on structure Before a designer creates a new business
process model, s/he can draw a small part and then use it as a query to retrieve
the models containing the given query condition model as a subgraph. Since
subgraph isomorphism algorithm is NP-complete, we cannot scan all the models
in the repository sequentially. Path indexes (e.g. LengthOnePathIndex ) can be
used in BeehiveZ, and they can be used as a lter to discard many models that
are impossible to be the resulting models and obtain a set of candidate models
whose size is always much smaller than the size of the repository. More details of
our path indexes can be found in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Since we only need to use subgraph
isomorphism algorithm to check whether the candidate models contain the given query
condition model as a subgraph, the query e ciency can be improved greatly. For
example, if we use the model in Fig. 3(a) as a query on a repository containing
the models in Fig. 2, we will get pn3 in Fig. 2 as the resulting model.
      </p>
      <sec id="sec-3-1">
        <title>2 http://wordnet.princeton.edu/</title>
        <p>
          Similarity query based on structure In some cases, we cannot nd a model
through the exact query based on structure, but if we relax the query we can nd
su ciently similar ones. In other words, we can get some models containing a
subgraph su ciently similar to the given query condition model. There are many
methods to compute the similarity between graphs based on their structure such
as the method based on graph edit distance, maximum common subgraph. In
BeehiveZ, we use the maximum common edge subgraph (MCES) based
similarity, which is widely used. MCES-based similarity is superior to graph edit based
similarity in that no particular edit operations together with their costs need to
be de ned. Since the computation of MCES-based similarity is NP-hard, we use
a ltering-veri cation framework to reduce the number of times of MCES-based
similarity computation. In the ltering stage, an index based on task edges (i.e.
TaskEdgeLuceneIndex ) can be used. For example, if we use the model in Fig. 3(b)
as a query on the repository containing the models in Fig. 2, we cannot nd a
model containing this query as a subgraph, but with the MCES-based similarity
threshold con gured as 0.6, we can get pn2 as the resulting model.
Exact query based on behavior The behavior characteristic is the essential
characteristic of business process models. When we query the repository, we
often want to obtain some models satisfying the given behavior requirements. For
example, nd the models in which task \A" can be executed parallelly with \B".
To process this type of queries, BeehiveZ provide a language for users to describe
the requirements. To compute the ordering relations between tasks e ciently, we
use the unfolding technology[
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] instead of the methods based on reachability
graph so that the problem of state-explosion can be solved. To improve the
e ciency of query processing, an index based on the ordering relations between
tasks (i.e. TaskRelationIndex ) is used. More details of our language and task
relation index can be found in [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. For example, if we use the string in Fig. 3(c)
as a query on the repository containing the models in Fig. 2, we can get pn1 as
the resulting model.
        </p>
        <p>
          Similarity query based on behavior When some small companies are merged
into a bigger one, the similar business processes should be integrated, so the
function of similarity query based on behavior is needed. We measure the similarity
between business process models based on their task adjacency relations[
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. The
similarity measure based on task adjacency relations has been proved to be a
metric. To compute the task adjacency relations e ciently, we use the unfolding
technology in BeehiveZ. An index based on task adjacency relations (i.e.
TARLuceneIndex ) is used to facilitate the e cient query processing. For example, if
we use the model in Fig. 3(d) as a query with the model similarity threshold
con gured as 1:0 on the repository containing the models in Fig. 2, we can get
pn1 and pn2 as the resulting models.
4
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Demonstration</title>
      <p>What will be shown in the demo? We will demonstrate the query
processing of four types of queries, i.e. exact query based on structure,
similarity query based on structure, exact query based on behavior, similarity query
based on behavior. Videos for the demonstration are available at http://code.
google.com/p/beehivez/downloads/list. Two collections of models are used.
One consists of hundreds of thousands of synthesized models which are generated
by our generator. The other consists of 591 SAP Reference models. Through the
demonstration on the rst collection of models, we can see the query
performance is good with use of indexes. Through the demonstration on the second
collection of models, we can see the query performance is still good with the
label similarity considered. Besides the query demonstration, we will also show
the generators used to generate a large number of models for experiments and
discuss about the indicators to evaluate di erent indexes.</p>
      <p>Signi cance in BPM area. This paper shows that four types of business
process model queries can be processed by BeehiveZ e ciently. It is helpful for
e cient model design, business process model integration and so on.
Acknowledgments. We would like to thank Arthur H.M. ter Hofstede for his
suggestion to improve the writing of the rst half of this paper. The work is
supported by a HGJ project of China (No. 2010ZX01042-002-002-01), the National
Basic Research Program (973 Plan) of China (No. 2009CB320700), the National
High-Tech Development Program (863 Plan) of China (No. 2008AA042301) and
an NSF Project of China (No. 61003099).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Sherif</given-names>
            <surname>Sakr</surname>
          </string-name>
          and
          <string-name>
            <given-names>Ahmed</given-names>
            <surname>Awad</surname>
          </string-name>
          .
          <article-title>A Framework for Querying Graph-based Business Process Models</article-title>
          .
          <source>In WWW</source>
          , pages
          <volume>1297</volume>
          {
          <fpage>1300</fpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Qihong</given-names>
            <surname>Shao</surname>
          </string-name>
          , Peng Sun, and
          <string-name>
            <given-names>Yi</given-names>
            <surname>Chen</surname>
          </string-name>
          .
          <article-title>WISE: A Work ow Information Search Engine</article-title>
          . In ICDE, pages
          <volume>1491</volume>
          {
          <fpage>1494</fpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Carlos</given-names>
            <surname>Eduardo</surname>
          </string-name>
          <string-name>
            <surname>Scheidegger</surname>
          </string-name>
          , Huy T. Vo, David Koop,
          <string-name>
            <given-names>Juliana</given-names>
            <surname>Freire</surname>
          </string-name>
          , and
          <string-name>
            <surname>Claudio</surname>
            <given-names>T.</given-names>
          </string-name>
          <string-name>
            <surname>Silva</surname>
          </string-name>
          . Querying and
          <article-title>Re-Using Work ows with VisTrails</article-title>
          .
          <source>In SIGMOD Conference</source>
          , pages
          <volume>1251</volume>
          {
          <fpage>1254</fpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Catriel</given-names>
            <surname>Beeri</surname>
          </string-name>
          , Anat Eyal, Simon Kamenkovich, and
          <string-name>
            <given-names>Tova</given-names>
            <surname>Milo</surname>
          </string-name>
          .
          <article-title>Querying Business Processes</article-title>
          .
          <source>In VLDB</source>
          , pages
          <volume>343</volume>
          {
          <fpage>354</fpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Zhiqiang</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <string-name>
            <surname>Remco M. Dijkman</surname>
            , and
            <given-names>Paul</given-names>
          </string-name>
          <string-name>
            <surname>Grefen</surname>
          </string-name>
          .
          <article-title>Fast Business Process Similarity Search with Feature-Based Similarity Estimation</article-title>
          .
          <source>In OTM Conferences (1)</source>
          , pages
          <fpage>60</fpage>
          {
          <fpage>77</fpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Boudewijn</surname>
            <given-names>F. van Dongen</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ana Karla A. de Medeiros</surname>
            ,
            <given-names>H. M. W.</given-names>
          </string-name>
          <string-name>
            <surname>Verbeek</surname>
          </string-name>
          ,
          <string-name>
            <surname>A. J. M. M. Weijters</surname>
          </string-name>
          , and
          <string-name>
            <surname>Wil M. P. van der Aalst</surname>
          </string-name>
          .
          <article-title>The ProM Framework: A New Era in Process Mining Tool Support</article-title>
          . In ICATPN, pages
          <volume>444</volume>
          {
          <fpage>454</fpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>L.R.</given-names>
            <surname>Dice</surname>
          </string-name>
          .
          <article-title>Measures of the Amount of Ecologic Association between Species</article-title>
          . Ecology,
          <volume>26</volume>
          (
          <issue>3</issue>
          ):
          <volume>297</volume>
          {
          <fpage>302</fpage>
          ,
          <year>1945</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Tao</given-names>
            <surname>Jin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jianmin</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Lijie</given-names>
            <surname>Wen</surname>
          </string-name>
          .
          <source>Querying Business Process Models Based on Semantics. In DASFAA</source>
          , pages
          <volume>402</volume>
          {
          <fpage>409</fpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Tao</given-names>
            <surname>Jin</surname>
          </string-name>
          , Jianmin Wang,
          <string-name>
            <surname>Nianhua Wu</surname>
          </string-name>
          , Marcello La Rosa, and
          <string-name>
            <surname>Arthur H. M. ter Hofstede</surname>
          </string-name>
          .
          <article-title>E cient and Accurate Retrieval of Business Process Models through Indexing - (Short Paper)</article-title>
          .
          <source>In OTM Conferences (1)</source>
          , pages
          <fpage>402</fpage>
          {
          <fpage>409</fpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Javier</surname>
            <given-names>Esparza</given-names>
          </string-name>
          , Stefan Romer, and
          <string-name>
            <given-names>Walter</given-names>
            <surname>Vogler</surname>
          </string-name>
          .
          <article-title>An Improvement of McMillan's Unfolding Algorithm</article-title>
          .
          <source>Formal Methods in System Design</source>
          ,
          <volume>20</volume>
          (
          <issue>3</issue>
          ):
          <volume>285</volume>
          {
          <fpage>310</fpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Haiping</surname>
            <given-names>Zha</given-names>
          </string-name>
          , Jianmin Wang, Lijie Wen,
          <string-name>
            <given-names>Chaokun</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Jiaguang</given-names>
            <surname>Sun</surname>
          </string-name>
          .
          <article-title>A Work ow Net Similarity Measure Based on Transition Adjacency Relations</article-title>
          . Computers in Industry,
          <volume>61</volume>
          (
          <issue>5</issue>
          ):
          <volume>463</volume>
          {
          <fpage>471</fpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>