Refining Semantically Annotated Business Process Diagrams Hector G. Ceballos, Eduardo Fernandez, Jorge Espinosa ceballos@itesm.mx, yopichoi@gmail.com, arima121@gmail.com Tecnologico de Monterrey Av. Eugenio Garza Sada 2501 Monterrey, N.L., Mexico Abstract. Business Process Diagrams (BPDs) provide a rich graphical expressiveness for modeling not only business processes but also the de- velopment of collective human activities. In Multiagent platforms, BPDs has been used for decoupling the modeling of agent behavior from its im- plementation. Additionally, the annotation of BPD actions and events with conjunctive queries has opened the door for monitoring process de- velopment by querying a knowledge base. We study the properties of this annotation scheme for the refinement of activity diagrams through the removal of redundant nodes, the detection of disjoint alternative paths and the merge/split of event nodes. Keywords: Business Process Diagrams, Semantic Annotation, Descrip- tion Logics. 1 Introduction BPMN is a standard notation for modeling business processes widely adopted by industry that provides a rich graphical representation that can be used for common understanding of processes [13]. Furthermore, BPMN has been used for process automation with support of agent technologies given its direct correspon- dence with some MAS architectures [11, 9]. In these approaches the specification of agent behavior is decoupled from the implementation of agent actions (de- noted as task nodes). BPMN BPDs has been also used for modeling collective human activities, observable through the effects of human actions [2]. On the other hand, the use of conjunctive queries has been previously pro- posed for describing the meaning of events and actions [6, 3]. In [6], actions are described by their direct effects and these annotations are used for determining the status of a process instance. In [3], each possible outcome of a random vari- able representing an event or an action is annotated with a conjunctive query in order to enable process monitoring. Muehlen and Indulska performed a representational analysis of modeling lan- guages for business processes and business rules, finding that the most complete combination is BPMN and SWRL [12]. Rules in the Semantic Web Rule Lan- guage (SWRL) are constituted by conjunctive queries [7]. 11 ___ Copyright © 2016 for this paper by its authors. Copying permitted for private and academic purposes. 2 H. Ceballos, E. Fernandez, J. Espinosa This paper is organized as follows. We introduce basic notions of BPMN Busi- ness Process Diagrams and Conjunctive Queries in section 2. Then, in section 3 we introduce semantic descriptors on a BPD normal form, describe the proper- ties provided by the use of conjunctive queries and show how these properties can be used for refining a BPD by detecting disjoint alternative paths, delet- ing redundant nodes, and merging/splitting nodes. We conclude with closing remarks and future work in section 4. 2 Background Next we introduce the formal notions used for introducing conjunctive queries as semantic descriptors on top of BPMN BPDs. 2.1 Business Process Diagrams Business Process Modeling and Notation (BPMN) provides a formal represen- tation of Business Process Diagrams (BPDs), which basically describe a process in terms of events and tasks connected through control flows that indicate valid sequences in the process development. Gateways are special nodes connected through control flows that indicate whether the process develops in parallel (AND), alternatively (XOR) or optionally (OR). The beginning of the process is denoted by an initial event node and its conclusion by a set of end event nodes. Figure 1 shows a subset of the graphical notation of the BPMN 2.0 specifi- cation [13]. These graphical elements are used in approaches for software engi- neering [14, 4] and for modeling human activities [2] through BPDs. Fig. 1. BPMN graphical notation. Industry has adopted XML file formats for authoring and exchanging BPDs such as XML Process Definition Language (XPDL)[15] and BPMN 2.0[13]. In 12 Refining Semantically Annotated Business Process Diagrams 3 them, nodes representing tasks, events, gateways and control flows are repre- sented through unique identifiers along with properties and relationships with other nodes. 2.2 DL Conjunctive Queries We use the notation and properties of conjunctive queries given by Description Logics (DL) [1], which states that the interpretation of a query is not only given by statements specified in the query, but by constraints and definitions specified in the domain model (T ) as well. This model, known as ontology, is basically constituted by a set of concepts (used to group/classify objects) and relations (used to specify entity’s attributes or used to relate entities among them). Ontology languages such as OWL1 allows expressing concepts in terms of other concepts (definitions), and specifying constraints between concepts (e.g. disjointness or subsumption) and properties (e.g. reflexivity or transitivity). A DL conjunctive query (CQ) has the form: Q = (s1 , ...sn ).{T1 , ..., Tm } where si is the set of distinguished variables, denoted Dis(Q), that define the resulting binding sets (the information retrieved), and Ti is a finite set of ei- ther concept clauses (s rdf:type C) or relation clauses (s r s’), where s, s0 ∈ (NV ∪ NC ), C is a concept/class and r is a relation/property both defined in T , NV is a finite set of variables denoted V ar(Q), and NC is a finite set of constants. In SPARQL, the query language for RDF[5], a conjunctive query is coded as a SELECT query without filters. Distinguished variables (Dis(Q)) of a conjunctive query Q are listed in the section SELECT, whereas atoms Ti are specified in the section WHERE as statements or subject-predicate-object triplets. Query containment between conjunctive queries can be decided automati- cally observing the constraints and definitions given in T , as proposed by [8]. A query Q1 is contained in a query Q2 with respect to T (written T |= Q1 v Q2 ), if and only if, for every model I of T , Q1 (I) v Q2 (I). In other words, Q1 v Q2 and K |= Q1 implies that K |= Q2 also, i.e. the answer to Q1 will be included in the answer to Q2 . By definition, every well-formed query Q is subsumed by >, i.e. T |= Q v >. Queries Q1 and Q2 are equivalent if Q1 v Q2 and Q2 v Q1 , denoted Q1 ≡ Q2 . Q1 and Q2 are considered disjoint if K becomes inconsistent whenever K |= Q1 u Q2 for any replacement of common variables in both queries, also denoted by Q1 u Q2 v ⊥. By definition, if Q1 v Q2 then Q1 and Q2 are not disjoint. Merge of two queries Q1 and Q2 is given by its intersection and it is denoted by Q1 u Q2 , whereas the opposite operation is called split. 1 OWL Web Ontology Language. https://www.w3.org/TR/owl-features/ 13 4 H. Ceballos, E. Fernandez, J. Espinosa 3 Semantic Description of Business Process Diagrams Now we introduce semantic descriptors for describing the meaning of lanes, ac- tions and events, provide some basic properties for annotated nodes and show how these can be used for refining a BPD. 3.1 Semantic Descriptors In this paper we use the formalization of BPDs and the normal form proposed in [2]. In this approach, the BPD has the purpose of illustrating alternative sequences of human actions performed by activity participants, mediated by in- termediate events that the subject or other participants can observe. XOR gate- ways are used for representing disjoint alternatives. Activity development has a triggering condition (initial event) and a set of successful or failure outcomes (end events). A Business Process Diagram W is represented by a set of pools (P), lanes (L), nodes (N) and control flows (F). Nodes (N) allowed in the diagram are: start events (N S ), intermediate events (N I ), end events (N E ), human tasks (N A ) and gateways (N G ). A gateway g ∈ N G can be of type Parallel-AND (A), Optional- OR (O), or Exclusive-XOR (X), denoted respectively type(g, {A, O, X}). Uncon- ditional sequence flows are denoted as F (ni , nj ) ∈ F, ni → nj for short, where ni , nj ∈ N . All nodes n ∈ N are situated in a lane l ∈ L, denoted in(n, l). W = {P, L, N, F} N = NS ∪ NI ∪ NE ∪ NA ∪ NG Definition 1 (Semantic Descriptor). A semantic descriptor Ann(n, Q) uses the conjunctive query Q for representing the meaning of a lane, an observable event, or human task n ∈ (L ∪ N S ∪ N I ∪ N E ∪ N A ) in a BPD W. A lane descriptor describes the kind of individual that play a role in the activ- ity. It has the form Ann(l, Ql ), where l ∈ L and Ql might represent an absolute or relative role. An absolute role annotation is given by a Ql = (?role).{?role a RoleClass} where RoleClass indicates the type of individual, denoted by ?role, associated to l. A relative role annotation is given by a Ql = (?role).{?role rel ?role2} where the role associated to the lane (?role) is defined in terms of its relation- ship (rel) with a participant represented by another lane (?role2). An event descriptor has the form Ann(z, Qz ), where z ∈ (N S ∪ N I ∪ N E ) and Qz is a conjunctive query describing a condition (constraints between indi- viduals) that denotes the occurrence of the event. A task descriptor has the form Ann(x, Qx ), where x ∈ N A , Qx = (?act).{?act a T askClass . ?act doneBy ?role . ?act propi ?valuei }, 14 Refining Semantically Annotated Business Process Diagrams 5 TaskClass indicates the type of task performed or initiated by ?role, and the task description is expressed by statements such as ?act propi ?valuei . According to Activity Theory[10], task description might include the participation of other agents playing a role in the activity, the use of artifacts and the location where the task occurs. The distinguished variable ?act denotes the task execution (the action in act). Definition 2 (Annotated BPD). An annotated BPD WD is a BPD W where each node n ∈ (L ∪ N S ∪ N I ∪ N E ∪ N A ) is annotated with exactly one semantic descriptor Ann(n, Q) ∈ D. 3.2 Properties of Annotated BPDs The use of semantic descriptors allows to verify automatically if the meaning of a BPD node is equivalent to, disjoint with, or subsumed by another BPD node. Definition 3 (Node Disjointness). The BPD node n1 is disjoint with the BPD node n2 , denoted n1 ⊥n2 , if and only if Q1 u Q2 v ⊥ with respect to (w.r.t.) T , given Ann(n1 , Q1 ) and Ann(n2 , Q2 ). Definition 4 (Node Subsumption). The BPD node n1 is subsumed by the BPD node n2 , denoted n1 v n2 , if Q1 v Q2 and Q2 6v Q1 w.r.t. T , given Ann(n1 , Q1 ) and Ann(n2 , Q2 ). Definition 5 (Node Equivalence). The BPD node n1 is equivalent to the BPD node n2 , denoted n1 ≡ n2 , if and only if Q1 ≡ Q2 w.r.t. T , given Ann(n1 , Q1 ) and Ann(n2 , Q2 ). These definitions can be used for detecting redundant nodes, i.e. consecutive event nodes annotated with equivalent descriptions. Theorem 1 (Node Redundancy). Given two consecutive event nodes z1 , z2 ∈ (N S ∪ N I ∪ N E ), z1 → z2 , such that z2 v z1 , z2 will hold whenever z1 holds, making z2 redundant in the diagram. Proof. During process monitoring, the occurrence of the event z2 will be evalu- ated right after z1 . z2 being subsumed by z1 means that z2 will hold whenever z1 holds, hence checking the occurrence of z2 becomes unnecessary once z1 has occurred. In the example of Figure 2, the event z7.2 is redundant as long as z7.2 v z5.1 , making unnecessary the observation of z7.2 . The following corollary complements Theorem 1. Corollary 1. Given two consecutive event nodes z1 , z2 ∈ (N S ∪ N I ∪ N E ), z1 → z2 , such that z2 ≡ z1 , either z1 or z2 becomes redundant. Proof. Given that z1 will hold whenever z2 does and vice versa, either of the two can be chosen as redundant node. 15 6 H. Ceballos, E. Fernandez, J. Espinosa Fig. 2. Eliminating redundant nodes. Two other properties apply to event nodes preceded by the same gateway. Two BPD events zi and zj are called siblings, denoted as zi ||zj , if both of them are directly preceded by a gateway g ∈ N G , i.e. zi ← g → zj . Determining that all sibling events following to a gateway g are disjoint in- dicates that g must be XOR-Exclusive. Figure 3 shows two alternative scenarios as result of a medical consultation, denoted by two sibling events, z4.2 and z4.3 , with their corresponding descriptors. Ann(z4.2 ) and Ann(z4.3 ) are disjoint as long as the cardinality of the boolean role followUp is constrained to a single value in the concept Prescription, stated in T as: Prescription v ≤ 1 followUp Fig. 3. Detecting XOR gateways. Theorem 2 (Disjoint sibling events). If every pair of sibling events zi ||zj immediately preceded by a splitting gateway g ∈ N G are disjoint to each other (zi ⊥zj ), then g must be an event-driven exclusive (XOR) gateway, i.e. type(g, X). Proof. Once g is reached, process development will continue through g if and only if some event zi occur after g. Being disjoint every pair of sibling events, only one of them can occur at a given time, hence the first that occur will continue process’ monitoring, which is the definition of an event-driven exclusive (XOR) 16 Refining Semantically Annotated Business Process Diagrams 7 gateway. In contrast, g cannot be parallel (AND) as long as once that zi holds the condition represented by any of its sibling will not hold anymore. For the same reason g cannot be optional (OR), i.e. more than one sibling cannot hold. On the contrary, overlaps between sibling events following to an exclusive gateway violates process development rules. Theorem 3 (XOR-Equiv overlap). A pair of sibling events zi ||zj immedi- ately preceded by a splitting Exclusive-XOR gateway g ∈ N G such that zi ≡ zj , will provoke a violation of process monitoring after g. Proof. Once g is reached, process development must continue through one and only one path given by the sibling event holding next. zi ≡ zj means that both events will hold, violating the rule imposed by the XOR gateway. Theorem 4. A pair of sibling events zi ||zj immediately preceded by a splitting Exclusive-XOR gateway g ∈ N G such that zi v zj , can potentially provoke a violation of process monitoring after g. Proof. Observing zj means that zi will hold as well, enabling process develop- ment through both branches which constitutes a violation of the splitting XOR gateway rule. Nevertheless, observing zi does not assure that zj will hold as well, in which case the violation would not occur. Semantic description of sibling events following XOR gateways have similar or complementary descriptions as long as they represent alternative scenarios. In these cases, annotations of sibling events have a common sub-expression that can be captured before the splitting gateway. Definition 6 (Common Precondition). The conjunctive query Qg is consid- ered a common precondition for all sibling events zi preceded by g if and only if Qi ≡ Qg u Q0i for every Ann(zi , Qi ) ∈ D. Given that the normal form does not allow to annotate gateways, the common precondition must be represented through the introduction of an intermediate event preceding the gateway. Theorem 5. The observation of a precondition Qg , included in a set of disjoint sibling events zi following to the gateway g, can be represented through the in- troduction of an event node zg such that: np → g is replaced by np → zg → g, and D = D ∪ Ann(zg , Qg ) \ Ann(zi , Qi ) ∪ Ann(zi , Q0i ). Proof. Process monitoring requires observing zg before reaching g, and then evaluating every zi . Assuming world state does not change between the obser- vation of zg and the observation of zi , the observation of Q0i simultaneously to Qg is by definition equivalent to observing Qi . 17 8 H. Ceballos, E. Fernandez, J. Espinosa Fig. 4. Adding a common precondition. In both alternatives shown in Figure 3 the doctor prescribes medication, but in z4.3 the patient requires follow up whereas in z4.2 he does not. Figure 4 shows an equivalent representation where the event z4.2 3 represents the common condi- tion, and original events are replaced by the necessity of a follow up appointment (z4.3B ) or not (z4.2B ). Properties provided by semantic descriptors also permits to validate if the workflow can be monitored properly. For instance, Theorem 2 provides a rule for checking the correspondence between XOR gateways and disjoint events. Descriptors of nodes in the structure shown in Figure 4 must be also checked to prevent monitoring issues. For instance, if doctor’s prescription (evaluated as common precondition) indicates both more medication and a follow-up ap- pointment, and one of the exclusive alternative paths check for more medication whereas the other asks for the follow-up appointment, then monitoring could continue through both paths, but the XOR gateway forces to continue for only one (chosen arbitrarily). Theorem 6. A common precondition zg subsuming at least two alternative events zi immediately following a XOR gateway g, i.e. zi v zg , will provoke a violation of process monitoring after g. Proof. Assuming that all sibling events zi are evaluated immediately after the precondition zg , once that zg holds any zi will hold as well. Process monitoring will follow through the branch of the first zi evaluated, despite another branch could be followed (the one indicated by the other zi v zg ). In the same circumstances but having a Parallel-AND gateway, the precon- dition becomes redundant. Theorem 7. A common precondition zg is redundant in the diagram if it sub- sumes all alternative events zi immediately following a Parallel-AND gateway g, i.e. zg → g → zi and zi v zg . Proof. Given that process monitoring continues through all branches of a split- ting Parallel-AND gateway, the evaluation of zg before every zi is unnecessary as long as zi will necessarily hold once that zg holds. 18 Refining Semantically Annotated Business Process Diagrams 9 3.3 Refining and Validating Annotated BPDs Theorems 1 and 7, and corollary 1 are used for simplifying the BPD by absorbing redundant nodes. Theorem 5 is used for introducing common preconditions. Theorem 2 is used for signaling disjoint alternatives. Finally, theorems 3, 4 and 6 are used for warning about errors on process monitoring. The following procedure can be used for validating and automatically refining the annotated BPD WD : 1. Identify consecutive events z1 → z2 such that z1 , z2 ∈ (N I ∪ N E ), (a) if z1 v z2 then absorb z2 . i. if z2 ∈ N E then N I = N I \ z1 and N E = N E ∪ z1 . (b) if z1 ≡ z2 then absorb z1 . 2. For each splitting gateway g immediately followed by sibling events zi ||zj , (a) If all pairs (zi , zj ) are disjoint (zi ⊥zj ), then set type(g, XOR). (b) If exists a common precondition Qg such that Qi ≡ Qg u Q0i for each descriptor Ann(zi , Qi ) of all sibling events zi , insert an event zg → g, D = D ∪ Ann(zg , Qg ) \ Ann(zi , Qi ) ∪ Ann(zi , Q0i ). (c) If type(g, XOR) and zi ≡ zj , then warn: Monitoring violation after g. (d) If type(g, XOR) and zi v zj , then warn: Potential monitoring violation after g. 3. For each splitting gateway g preceded by an event zg and followed by two or more sibling events zi , (a) If type(g, AN D) and zi v zg for all zi , then absorb zg . (b) If type(g, XOR) and zi v zg for at least two zi , then warn: Monitoring violation after g. In this procedure, the absorption of a node ni means that all the arcs nj → ni must be replaced by arcs nj → nk for each nk given by every arc ni → nk in the diagram. Then node ni as well as all its incoming and outgoing arcs are removed from the diagram. Given that all refinement strategies are local, the complexity of the previous procedure is bounded by the computation of query containment between anno- tations of sibling event nodes. The identification of a common precondition for a set of sibling events zi requires checking if exists a common subset of triplets Q0i in every Qi . As long as query containment is not a standard reasoning service provided by state-of-the-art triplestores or DL reasoners it would be necessary to develop a tool that efficiently decide query containment among a set of CQs. Such a service would decide Q1 v Q2 by transforming Q2 ’s variables into symbols, asserting the resulting statements in a graph and asking Q1 to it (using the desired reasoning level). 4 Conclusions We introduced a taxonomy of semantic descriptors for annotating lanes, events and tasks in BPMN BPDs. The proposed notation not only provides a formal 19 10 H. Ceballos, E. Fernandez, J. Espinosa description of these elements but it can be also used for mapping the business process specification to its implementation in an agent-based software. We demonstrated how using conjunctive queries as semantic descriptors can help to detect relationships between BPMN BPD events. These relationships are then used for refining process specifications by detecting disjoint alternative paths, deleting redundant nodes, and merging/splitting nodes. We also illustrated why human actions should not only be represented through their immediate effects, but breaking them down helps Multiagent System.to de- termine whether the activity takes place according to how it was modeled. Semantic descriptors can be further used for determining common events or actions across diagrams, enabling the composition of BPDs. Diagram com- position would enable to introduce the execution of MAS protocols in human activities, both modeled through BPDs. Semantic descriptors can be also fur- ther used for monitoring process/activity development by inquiring a RDF triple store representing world state. References 1. Baader, F., Calvanese, D., McGuinness, D.L., Nardi, D., Patel-Schneider, P.F. (eds.): The Description Logic Handbook: Theory, Implementation, and Applica- tions. Cambridge University Press, New York, NY, USA (2003) 2. Ceballos, H.G., Flores-Solorio, V., Garcia, J.P.: A Probabilistic BPMN Normal Form to Model and Advise Human Activities. In: Baldoni, M., Baresi, L., Dastani, M. (eds.) Engineering Multi-Agent Systems: Third International Workshop, EMAS 2015, Istanbul, Turkey. LNAI 9318 (2015) 3. Ceballos, H.G., Garcia-Vazquez, J.P., Brena, R.: Using activity theory and causal diagrams for designing multiagent systems that assist human activities. In: Pro- ceedings of the 12th Mexican International Conference on Artificial Intelligence, MICAI 2013. pp. 185–198. Mexico City (Nov 2013) 4. Endert, H., Kuster, T., Hirsch, B., Albayrak, S.: Mapping BPMN to agents: An analysis. In: Agent, Web Services, and Ontologies Integrated Methodologies - In- ternational Workshop MALLOW-AWESOME 2007. pp. 43–58 (2007) 5. group, W.S.W.: SPARQL 1.1 Overview (2013), http://www.w3.org/TR/ sparql11-overview/ 6. Hinge, K., Ghosey, A., Koliadisz, G.: Process seer: A tool for semantic effect anno- tation of business process models. In: Proceedings of 13th IEEE International En- terprise Distributed Object Computing Conference, EDOC 2009. pp. 54–63 (2009) 7. Horrocks, I., Patel-Schneider, P., Boley, H., Tabet, S., Grosof, B., Dean, M.: SWRL: A semantic web rule language combining OWL and RuleML. W3C Member Sub- mission 21 May 2004 8. Horrocks, I., Sattler, U., Tessaris, S., Tobies, S.: How to decide query containment under constraints using a description logic. In: Logic for Programming and Au- tomated Reasoning, 7th International Conference, LPAR 2000, Reunion Island, France, November 11-12, 2000, Proceedings. pp. 326–343 (2000) 9. Kuster, T., Lutzenberger, M., Albayrak, S.: A Formal Description of a Mapping from Business Processes to Agents. In: Baldoni, M., Baresi, L., Dastani, M. (eds.) Engineering Multi-Agent Systems: Third International Workshop, EMAS 2015, Istanbul, Turkey. LNAI 9318 (2015) 20 Refining Semantically Annotated Business Process Diagrams 11 10. Leont’ev, A.: Activity, Consciousness, and Personality. Prentice-Hall, Inc. (1978) 11. Mitakides, N., Delias, P., Spanoudakis, N.: Validating Requirements Using Gaia Analysis Models. In: Baldoni, M., Baresi, L., Dastani, M. (eds.) Engineering Multi- Agent Systems: Third International Workshop, EMAS 2015, Istanbul, Turkey. LNAI 9318 (2015) 12. zur Muehlen, M., Indulska, M.: Modeling languages for business processes and business rules: A representational analysis. Information Systems 35(4), 379–390 (2010) 13. OMG: Business Process Model and Notation (BPMN), Version 2.0 (January 2011), http://www.omg.org/spec/BPMN/2.0 14. Ouyang, C., van der Aalst, W., Dumas, M., Hofstede, A.: Translating BPMN to BPEL. BPM Center Report BPM-06-02, BPMcenter.org (2006) 15. Shapiro, R., Gagne, D.: XML Process Definition Language (XPDL). Tech. Rep. WFMC-TC-025, Workflow Management Coallition (8 2012), http://www.omg. org/spec/BPMN/2.0/ 21