=Paper=
{{Paper
|id=Vol-1372/paper13
|storemode=property
|title=Discovery of Functional Architectures From Event Logs
|pdfUrl=https://ceur-ws.org/Vol-1372/paper13.pdf
|volume=Vol-1372
|dblpUrl=https://dblp.org/rec/conf/apn/WerfK15
}}
==Discovery of Functional Architectures From Event Logs==
Discovery of Functional Architectures From Event Logs Jan Martijn E.M. van der Werf and Erwin Kaats Department of Information and Computing Science Utrecht University P.O. Box 80.089, 3508 TB Utrecht, The Netherlands j.m.e.m.vanderwerf@uu.nl, e.j.kaats@students.uu.nl Abstract. The functional architecture focuses on decomposing func- tionality into modules that offer certain features. These features require interactions in order to complete their functionality. However, functional architectures typically only focus on the static aspects of the system de- sign. Additional modeling techniques, such as message sequence charts are often used in the early phases of software design to indicate how the software should behave. In this paper we investigate the use of process discovery techniques to discover from these scenarios the internal behavior of individual compo- nents. Based on event logs, this paper presents an approach (1) to derive the information flows between features, (2) identify the internal behav- ior of features, and (3) to discover the order between features within a module. The approach results in a sound workflow model for each mod- ule. We illustrate the approach using a running example of a payment system. 1 Introduction One of the principle tasks of a software architect is to design a software sys- tem [17], i.e., to organize the software elements the system is composed of in sets of structures, to allow reasoning about the system [4]. Many different Archi- tectural Description Languages (ADLs) exist to document software architecture. However, due to the large competitive market in the software product industry, architecture is often neglected in software product organizations [14]. Hence, not many ADLs are used in practice. As experienced in [14], in software product or- ganizations, architects rather use informal architectural models as an instrument of communication and discussion. An important aspect of software architecture is the functionality it offers. To decompose and specify the functionality of software, the authors of [6] in- troduced the Functional Architecture Model (FAM), which offers the desired modeling technique used by many software architects in software product orga- nizations [14]. The FAM separates the functionality into so-called features that are offered by the different modules the system is decomposed into. 228 PNSE’15 – Petri Nets and Software Engineering Features interact with other features via information flows to offer their func- tionality. However, FAM only offers a static view on this interaction, i.e., the information flow only shows possible interactions, but imposes no order on or dependencies between these flows. Thus, to show how functionality is offered by the system, the architect requires additional models. One way is to define scenarios on top of the models, in which the architect can specify which features interact in which order. These scenario then result in event logs, that can be analyzed using process mining techniques [2]. Another source for discovering the possible interactions between features is the use of system execution data [21], mapping events to the (partial) execution of features. In this way, execution data can be used to reconstruct a software architecture. In software product organizations, time to market is often a more important priority than having a properly documented software architecture. Consequently, architecture documentation is often outdated or even missing [9]. Therefore, dis- covering architectural models help such organizations in maintaining their soft- ware products. In this paper, we investigate the possibility to use process mining techniques to discover, the functional architecture of the system from an event log. We thereby focus on three basic questions on the functional architecture: 1. Which features interact? 2. What is the internal behavior of features? 3. What is the order in which features are executed within a module? The first question focuses on the discovery of information flows: given an event log, is it possible to derive which features interact? Next, we investigate whether it is possible to derive the internal behavior of features based on event logs. In other words, we focus on the question how does a feature use its in- formation flows to complete its functionality. The last question deals with the high-level view of the functional architecture. To execute the system’s function- ality, the features within a module are called in a certain order. Can process discovery techniques be used to discover these orders? The remainder of this paper is structured as follows. To illustrate the ap- proach, Sec. 2 presents a running example which we will use throughout the paper. Next, Sec. 3 presents the basic notions used in the paper. Section 4 in- troduces the functional architecture model in more detail, after which in Sec. 5 we will focus on solving the three questions posed in the introduction. Section 6 concludes the paper. 2 Running Example As an running example, consider the Payment System as introduced in [10]. The system consists of three modules, Debtor, Payment and Creditor. The payment module serves as an intermediate between the Debtor and the Creditor. An example of such a payment module is the european SEPA standard. The payment module initiates a transaction, which the debtor needs to accept. If the debtor accepts, the payment is continued, and the creditor is contacted to start the J. Van Der Werf, E. Kaats: Discovery of Functional Architectures 229 transaction. If for some reason the creditor rejects the transaction, the debtor is notified, and the transaction is terminated. Similarly, if the creditor accepts, the payment is passed to the debtor, and finally, the creditor receives the final payment information. As the software evolved into the current system, no precise model exists that specifies the behavior of this system. The system only recorded the order in which the different features of the modules have been called in an event log, as shown in Tbl. 1. Each pair in the table represents the feature and the module to which that feature belongs. For readability, the features and modules are abbreviated in this event log. The system is decomposed into three modules: the Debtor module (X), the Payment (Y) module, and the Creditor (Z). Based on the event log, the software architect finds the following features: – Receive transaction request (A); – Reject transaction (B); – Accept transaction (C); – Cancel transaction (D); – Initiate payment (E); – Send payment details (F); – Archive transaction request (G). – Send transaction request (H); – Reject transaction request (I); – Initiate creditor (J); – Cancel transaction (K); – Initiate payment (O); – Handle payment (M); – Archive transaction (N). – Start transaction (Q); – Handle transaction (S). A H B I G C J Q D K E O S F M Debtor N Creditor Payment Fig. 1. Initial functional architecture model of the running example 230 PNSE’15 – Petri Nets and Software Engineering Case Trace 1 (H, Y), (A, X), (B, X), (G, X), (I, Y), (N, Y) 2 (H, Y), (A, X), (B, X), (I, Y), (G, X), N, Y) 3 (H, Y), (A, X), (B, X), (I, Y), (N, Y), (G, X) 4 (H, Y), (A, X), (C, X), (J, Y), (Q, Z), (S, Z), (K, Y), (D, X), (G, X), (N, Y) 5 (H, Y), (A, X), (C, X), (J, Y), (Q, Z), (S, Z), (K, Y), (D, X), (N, Y), (G, X) 6 (H, Y), (A, X), (C, X), (J, Y), (Q, Z), (S, Z), (K, Y), (N, Y), (D, X), (G, X) 7 (H, Y), (A, X), (C, X), (J, Y), (Q, Z), (S, Z),( O, Y), (E, X), (F, X), (G, X), (M, Y), (S, Z), (N, Y) 8 (H, Y), (A, X), (C, X), (J, Y), (Q, Z), (S, Z), (O, Y), (E, X), (F, X), (G, X), (M, Y), (N, Y), (S, Z) 9 (H, Y), (A, X), (C, X), (J, Y), (Q, Z), (S, Z), (O, Y), (E, X), (F, X), (M, Y), (G, X), (S, Z), (N, Y) 10 (H, Y), (A, X), (C, X), (J, Y), (Q, Z), (S, Z), (O, Y), (E, X), (F, X), (M, Y), (G, X), (N, Y), (S, Y) 11 (H, Y), (A, X), (C, X), (J, Y), (Q, Z), (S, Z), (O, Y), (E, X), (F, X), (M, Y), (S, Z), (G, X), (N, Y) 12 (H, Y), (A, X), (C, X), (J, Y), (Q, Z), (S, Z), (O, Y), (E, X), (F, X), (M, Y), (S, Z), (N, Y), (G, X) 13 (H, Y), (A, X), (C, X), (J, Y), (Q, Z), (S, Z), (O, Y), (E, X), (F, X), (M, Y), (N, Y), (G, X), (S, Z) 14 (H, Y), (A, X), (C, X), (J, Y), (Q, Z), (S, Z), (O, Y), (E, X), (F, X), (M, Y), (N, Y), (S, Y), (G, X) Table 1. System execution data of the payment system Based on this information, the architect can draw the modules with their features, as shown in Fig. 1. In the remainder of this paper, we investigate a method to use event logs, such as the one shown in Tbl. 1, to complete the diagram and derive a behavioral specification of the system. 3 Preliminaries Let S be a set. The powerset of S is denoted by PpSq “ tS 1 | S 1 Ď Su. We use |S| for the number of elements in S. Two sets U and V are disjoint if U X V “ H. Some set S with relation ď is a partial order, denoted by pS, ďq, iff ď is reflexive, i.e. a ď a for all a P S, antisymmetric, i.e. a ď b and b ď a imply a “ b for all a, b P S, and transitive, i.e. a ď b and b ď c imply a ď c for all a, b, c P S. Given a relation R Ď S ˆ S for some set S, we denote its transitive closure by R` , and the transitive and reflexive closure by R˚ . A bag m over S is a function m : S Ñ IN , where IN “ t0, 1, . . .u denotes the set of natural numbers. We denote e.g. the bag m with an element a occurring once, b occurring three times and c occurring twice by m “ ra, b3 , c2 s. The set of all bags over S is denoted by IN S . Sets can be seen as a special kind of bag where all elements occur only once; we interpret sets in this way whenever we use them in operations on bags. We use ` and ´ for the sum and difference of two bags, and “, ă, ą, ď, ě for the comparison of two bags, which are defined in a standard way. A sequence over S of length n P IN is a function σ : t1, . . . , nu Ñ S. If n ą 0 and σpiq “ ai for i P t1, . . . , nu, we write σ “ xa1 , . . . , an y. The length of a sequence is denoted by |σ|. The sequence of length 0 is called the empty sequence, and is denoted by . The set of all finite sequences over S is denoted by S ˚ . We write a P σ if a 1 ď i ď |σ| exists such that σpiq “ a. Concatenation of two sequences ν, γ P S ˚ , denoted by σ “ ν; γ, is a sequence defined by σ : t1, . . . , |ν| ` |γ|u Ñ S, such that σpiq “ νpiq for 1 ď i ď |ν|, and σpiq “ γpi ´ |ν|q for |ν| ` 1 ď i ď |ν| ` |γ|. A sequence σ can be projected over some set U , denoted by σ|U , and is inductively defined by |U “ , pxay; σq|U “ xay; σ|U if a P U , and pxay; σq|U “ σ|U otherwise. J. Van Der Werf, E. Kaats: Discovery of Functional Architectures 231 Petri Nets A Petri net [16] is a tuple N “ xP, T, F y where (1) P and T are two disjoint sets of places and transitions respectively; and (2) F Ď pP ˆT qYpT ˆP q is a flow relation. The elements from the set P Y T are called the nodes of N . Elements of F are called arcs. Places are depicted as circles, transitions as squares. For each element pn1 , n2 q P F , an arc is drawn from n1 to n2 . Let N “ xP, T, F y be a Petri net. Given a node n P pP Y T q, we define its preset N‚ n “ tn1 | pn1 , nq P F u, and its postset n‚N “ tn1 | pn, n1 q P F u. We liftŤthe notation of preset ‚ ‚ ‚ Ť and ‚postset to sets. Given a set U Ď pP Y T q, N U “ nPU N n and U N “ nPU nN . If the context is clear, we omit the N in the subscript. A marking of N is a bag m P IN P , where mppq denotes the number of tokens in place p P P . If mppq ą 0, place p is called marked in marking m. A Petri net N with corresponding marking m is written as pN, mq and is called a marked Petri net. Given a marked Petri net pN, mq, transition t is enabled, denoted by pN, mqrty, if ‚ t ď m. If transition t is enabled in pN, mq, it can fire, resulting in a new marking m1 , denoted by pN, mqrtypN, m1 q, such that m1 ` ‚ t “ m ` t‚ . We lift the firing of transitions to the firing of sequences in a standard way, i.e., a sequence σ P T ˚ of length n is enabled in pN, mq if markings m0 , . . . , mn exist, such that m “ m0 and pN, mi´1 qrσpiqypN, mi q for all 1 ď i ď n. A marking m1 is reachable from some marking m in N , denoted by pN, mqr˚ypN, m1 q, if a firing sequence σ P T ˚ exists such that pN, mqrσypN, m1 q. A marking m1 is a home marking of pN, mq, if for all markings m2 with pN, mqr˚ypN, m2 q, we have pN, m2 qr˚ypN, m1 q. A special class of Petri nets are the workflow nets [1]. A workflow net is a tuple xP, T, F, i, f y with xP, T, F y a Petri net, (2) i P P is the only place with no incoming transitions, (3) f P P is the only place with no outgoing transitions, i.e., ‚ i “ f ‚ “ H, and (4) all transitions have at least one incoming and one outgoing arc, i.e., ‚ t ‰ H ‰ t‚ for all t P T . Open Petri Nets Within a network of asynchronously communicating systems, messages are passed between the elements within the network. The approach we follow is based on Open Petri nets [5]. Communication in an open Petri net (OPN) is represented by special places, called the interface places. An interface place is either an input place, receiving messages from the outside, or an output place that sends messages to the outside of the OPN. An input place is a place that has only outgoing arcs, and an output place has no incoming arcs. Definition 1. An Open Petri net is defined as an 7-tuple xP, I, O, T, F, i, Ωy where (1) xP Y I Y O, T, F y is a Petri net; (2) P is a set of internal places; (3) I is a set of input places, and ‚ I “ H; (4) O is a set of output places, and O‚ “ H; (5) P , I and O are pairwise disjoint; (6) i P IN P is the initial marking, and (7) Ω Ď IN P is the set of final markings. We call the set I Y O the interface places of the OPN. An OPN is called closed if I “ O “ H. An important behavioral property for OPNs is termination: an OPN should always have the possibility to terminate properly. We identify two termination properties: weak termination and soundness. 232 PNSE’15 – Petri Nets and Software Engineering Definition 2. Let xP, I, O, T, F, i, f y be an OPN. It is weakly terminating, if for every reachable marking of the marked Petri net pxP Y I Y O, T, F y, iq a marking f P Ω can be reached. It is sound, if for every reachable marking of the marked Petri net pxP, T, F y, iq a marking f P Ω can be reached. Communication between OPNs is done via the interface places. Two OPNs can only communicate if the input places of the one are the output places of the other, and vice versa. Definition 3. Two OPNs A and B are composable, denoted by A ‘ B, if and only if pIA X OB q Y pOB X IA q “ pPA Y TA Y IA Y OA q X pPB Y TB Y IB Y OB q. If A and B are composable, they can be composed into a new OPN, denoted by A‘B, with A‘B “ xP, I, O, T, F, i, Ωy where P “ PA YPB YG; I “ pIA YIB qzG; O “ pOA Y OB qzG; T “ TA Y TB ; F “ FA Y FB ; i “ iA ` iB ; and f “ ΩA Y ΩB with G “ pIA X OB q Y pOB X IA q. Event Logs and Behavioral Profiles Although event logs are defined as a tuple consisting of a set of case identifiers, events, and an attribute mapping [2], it is in this paper sufficient to consider an event log, denoted by L, as a set of sequences over some alphabet T , i.e., L Ď T ˚ . Given an event log L, we define the successor relation [20] by a ăL b if a sequence σ P L and 1 ď i ď |σ| exist, such that σpiq “ a and σpi ` 1q “ b. Using the successor relation, we define the behavioral profile pÑc , kc , `c qL as three relations: (1) the causality relation Ñc is defined by a Ñc b iff a ăL b and b ăL a, (2) the concurrency relation kc , which is defined by a kc b iff both a ăL b and b ăL a, and (3) the exclusive relation `c is defined by a `c b iff both a ăL b and b ăL a [20]. If the context is clear, we omit the subscript. Given a marked Petri net pN, mq with N “ xP, T, F y, an event log L Ď T ˚ is called complete with respect to pN, mq iff traces σ1 , σ2 P T ˚ exist such that pN, mqrσ1 ; xa, by; σ2 ypN,) implies a ăL b for all a, b P T . 4 Functional Architectures To model the overview of a system, the modules it consists of, and the features these modules offer, we propose the use of the functional architecture model (FAM). The functional architecture of a system is “an architectural model which represents at a high level the software products major functions from a usage perspective, and specifies the interactions of functions, internally between each other and externally with other products” [6]. It offers modules containing fea- tures. Features of different modules interact via so-called information flows. An example is shown in Fig. 2(a). The FAM contains 1 context module, E, 7 modules, A, B, C, D, X, Y and Z. Modules have features, depicted by the rounded rectangles. For example, module C contains two features, K and L. Be- tween features of different modules, information flows exist, e.g., the information flow pF, q, Lq between modules A and C. J. Van Der Werf, E. Kaats: Discovery of Functional Architectures 233 1 2 O p F q L O p F q L G K G K E A C E A C 3 r 5 3 r t s t s N 4 N u u H v M H v M B D B 4 D X Y X Y Z Z (a) Example Functional Architecture (b) Scenario as an overlay Model Fig. 2. Example Functional Architecture Model and corresponding scenario as overlay Definition 4. A Functional Architecture Model (FAM) is defined as a 6-tuple xM, C, F, h, m, Ñy where – M is a finite set of modules; – C is a finite set of context modules; – F is a finite set of features; – h : M Ñ M is the hierarchy function, such that the transitive closure h ˚ is irreflexive; – m : F Ñ M Y C is a feature map that maps each feature to a module, possibly in the context, and this module does not have any children, i.e. h´1 pmpF qq “ H for all F P F; – Ñ Ď F ˆ Λ ˆ F is the information flow, with Λ the label universe, such that for pA, l, Bq PÑ we have mpAq ‰ mpBq. The labels for the information flows are unique per feature, i.e., pA, l, Bq and pA, l, Cq imply B “ C for all labels l P Λ and pA, l, Bq, pA, l, Cq P Ñ. Although the information flows define the possible interactions between mod- ules, it remains a static overview of the system. Therefore, one can use scenarios on top of the functional architecture, e.g. by creating an overlay, highlighting the information flows that are executed and the order in which they should occur. Formally, we represent a scenario as a partial order. Definition 5. Let F “ xM, C, F, h, m, Ñy be a FAM. A scenario of F is a pair pS, ăq with S ĎÑ, such that pS, ďq with ď “ ă˚ is a partial order. An example is shown in Fig. 2(b). The scenario implied by the overlay can be represented by a partial order induced by pO, p, F q ă pF, q, Lq, pF, q, Lq ă pK, s, Hq, pF, q, Lq ă pK, r, N q, pK, r, N q ă pN, u, Hq, pK, s, Hq ă pH, t, Gq, pN, u, Hq ă pH, t, Gq, pK, r, N q ă pM, v, Hq, and pM, v, Hq ă pH, t, Gq. 234 PNSE’15 – Petri Nets and Software Engineering However, such scenarios are typically not specified. Another important draw- back of such scenarios is their analyzability. Although each scenario can be checked, the consistency between the different scenarios remains a difficult task. Therefore, in the remainder of this paper, we search for a method to derive the behavioral specification as a network of asynchronously communicating systems, given the system execution data produced by the actual system in the form of event logs. 5 Discovery of a Functional Architecture In this section, we study the possibilities process mining [2] offers to generate Petri nets for each of the different modules a system consists of. Event logs describe the order in which features of a system have been executed. Such event logs are system wide. Instead of each module having its own event log, only global sequences exist, i.e., sequences concatenate the executed features over all modules. As FAM only allows features to be contained in a single module, we assume that each feature belongs to exactly one module. Also, FAM prescribes communication to be one-directional, i.e., given two communicating features A and B, we assume that either A sends a message to B, or vice versa, that B sends a message to A, but not both. The behavioral specification of a system is three-fold: (1) communication between modules via their features, (2) the internal behavior within each feature, and (3) the order in which features are called within a module. In this section, we explore all three types of behavioral specification to come to a composed system of asynchronously communicating systems. In the remainder, let L be an event log over a set of features T , and let R : T Ñ M , with M the set of modules, be a function that maps each feature onto the module that contains that feature. 5.1 Communication between Features Communication between modules within a system is asynchronous of nature: messages are sent between features in order to complete their functionality. Within an event log, we need to consider the order in which events or fea- tures occur. For example, given some trace σ, if the resource is different for two subsequent events, i.e., Rpσpiqq ‰ Rpσpi ` 1qq, then this might indicate that the former sends a message to the latter. This is expressed by the communication successor. Definition 6 (Communication successor). Let L Ď T ˚ be an event log. We define the communication successor relation ÎL Ď T ˆ T by A ÎL B iff RpAq ‰ RpBq, σpiq “ A, and σpi ` 1q “ B for some σ P L and 1 ď i ă |σ|. Although at first sight the communication successors seem to work, we need to remember the concurrent nature of asynchronous communication. Consider for example the communication between modules M and N as depicted in Fig. 3, and J. Van Der Werf, E. Kaats: Discovery of Functional Architectures 235 M N A E Case Trace B F 1 A, E, F, G, B, C 2 A, E, F, B, G, C 3 A, E, B, F, G, C 4 A, E, B, F, C, G C G 5 A, E, F, B, C, G 6 A, B, E, F, G, C 7 A, B, E, F, C, G Fig. 3. Modules M and N Table 2. Corresponding event log E F G AÑ` ` B k k k C ` Ð k Table 3. Communication behavioral profile the corresponding allowed sequences in Tbl. 4. We have A Î E, which is indeed the communication as modeled in the composition M ‘ N . However, we also find G Î B, indicating a possible communication between G and B. Listing all communication successors, we get A Î E, G Î B, F Î B, B Î G, G Î C, E Î B, B Î F , F Î C, C Î G, and B Î E. Observe that because of the asynchronous nature of the communication, features B and E are concurrently enabled in Fig. 3. Assuming the event log to be complete, this should become visible in the communication successor relation, as for the normal successor relation on event logs. Definition 7 (Communication behavioral profile). Let L Ď T ˚ be an event log, and ÎL Ď T ˆ T the corresponding communication successor relation. The communication behavioral profile is the 3-tuple pÑc , kc , `c qCom L defined by: – A Ñc B iff A ÎL B and B ÎL A; – A kc B iff both A ÎL B and B ÎL A; and – A `c B iff both A ÎL B and A ÎL B. Calculating the behavioral profile of the communicating transitions using the communication successor relation, results in the communication behavioral profile as shown in Tbl. 3. It shows that B and E are concurrently enabled. Following the behavioral profile, we see that the causal relation of the behavioral profile correctly identifies the feature communication. Using the communication behavioral profile, we can construct the informa- tion flows from an event log as follows. If A Ñ B in the communication behavioral 236 PNSE’15 – Petri Nets and Software Engineering Debtor Payment Creditor A B C D E F G H I J K O MN Q S A Ð` ` ` ` ` ` ` ` B ` Ñ` ` ` ` ` ` ` C ` ` Ñ` ` ` ` ` ` D ` ` ` Ð` ` k ` ` E ` ` ` ` Ð` ` ` ` F ` ` ` ` ` Ñ` ` ` G ` k ` ` ` k k ` k H Ñ` ` ` ` ` ` ` ` I ` Ð` ` ` ` k ` ` J ` ` Ð` ` ` ` Ñ ` K ` ` ` Ñ` ` ` ` Ð O ` ` ` ` Ñ` ` ` Ð M ` ` ` ` ` Ð k ` Ñ N ` ` ` k ` ` k ` k Q ` ` ` ` ` ` ` ` ` Ð` ` ` ` S ` ` ` ` ` ` k ` ` ` ÑÑÐ k Table 4. Communication behavioral profile for the running example profile of the event log, then an information flow pA, x, Bq exists, with x a fresh label. This results in the following translation: Definition 8 (Generated FAM). Let L be an event log, and pÑc , kc , `c qCom L be its communication behavioral profile. Its corresponding functional architecture model xM, C, F, h, m, Ñy is defined by: – M “ RpLq; – C “ H; – F “ T; – h “ H; – m “ R; and – Ñ“ tpA, x, Bq | A Ñc B, and x P Λ a fresh labelu. After constructing the communication behavioral profile for the running ex- ample, shown in Tbl. 4, we can complete the functional architecture model. Based on the given system execution data, we see for example that feature H communicates with feature A, and feature S sends messages to features K and O, and receives messages from feature M . The complete functional architecture of the running example is shown in Fig. 4. 5.2 Internal Behavior of Features As can be seen in the running example, features can send and receive multiple messages. For example, feature S sometimes sends a message to feature K and sometimes to feature O. Therefore, the next step in discovering the functional J. Van Der Werf, E. Kaats: Discovery of Functional Architectures 237 A H B I G C J Q D K E O S F M Debtor N Creditor Payment Fig. 4. Functional architecture model of the running example architecture is to reconstruct the internal behavior of each of the features. For this, we create for each of the features an event log, containing the features that it communicates with. We call this the feature log. Definition 9 (Feature log). Let L Ď T ˚ be an event log, and let F P T be some feature. Let pÑc , kc , `c qCom L be the corresponding communication behav- ioral profile. The feature log LF is defined by LF “ tσ|CpF q | σ P L, F P σu where CpF q “ tA | A Ñc F _ F Ñc Au. Consider for example feature S in the running example. This feature com- municates with features K, O and M , i.e., CpSq “ tK, O, M u. Its feature log is the projection of the log on these features, i.e., LS “ txKy, xO, M yu. On these feature logs, we apply the inductive miner [13], that always returns a sound workflow net. Next, we transform the discovered workflow net into an open Petri net, to visualize the messages sent and received by the feature. This results in a feature net for each of the features present in the event log. Definition 10 (Feature Net). Let L Ď T ˚ be an event log, and let F P T be some feature. Let pÑc , kc , `c qCom L be the corresponding communication behav- ioral profile. The Feature net NF is the OPN xP, I, O, T, F, i, Ωy defined by – P “ P̄ , T “ T̄ , i “ r ī s, Ω “ t r f¯ s u; – I “ tpA´F | A Ñc F u; – O “ tpF ´A | F Ñc Au; – F “ F̄ Ytpt, pF ´A q | t P T, λptq “ A, F Ñc Au YtppA´F , tq | t P T, λptq “ A, A Ñc F qu. where xP̄ , T̄ , F̄ , ī, f¯y is the discovered workflow net. In our running example, each of the 16 features are transformed into a feature net. Most of the features are simple, like for feature H and A, consisting of 238 PNSE’15 – Petri Nets and Software Engineering O C S pS-O pJ-C pS-K K pS-K A H pH-A pH-A Q D pJ-Q pK-D M pM-S Feature S Feature H Feature A Feature J Feature K Fig. 5. Some of the feature nets for the running example a single transition sending a message to A, and receiving a message from H, respectively. A more complex feature net is the net for feature S, which internally decides whether it sends a message to K or to O. Figure 5 depicts some of the feature nets generated using the inductive miner [13]. 5.3 Feature Interaction within Modules Now that each feature has its internal behavior defined by means of a feature net, the next step is to determine the order in which features are executed within each of the modules. As for the features, we first create event logs for each of the modules, by filtering each trace on the features it contains. This results in a module log for each of the modules. Definition 11 (Module Log). Let L Ď T ˚ be an event log. Let M P RngpRq be a module. Let pÑc , kc , `c q be the corresponding communication behavioral profile. The Module log LM is defined by LM “ tσ|tF |RpF q“M u | σ P Lu. Within the running example, we obtain three module logs, one for each of the modules. For example, module Debtor, has module log LDebtor “ txA, B, Gy, xA, C, D, Gy, xA, C, E, F, Gyu, and for Creditor we have LCreditor “ txQ, Sy, Fig. 6. Refinement of a transition by a workflow net J. Van Der Werf, E. Kaats: Discovery of Functional Architectures 239 A H C B I J E D K O Q F M S G N Module Debtor Module Payment Module Creditor Fig. 7. Module nets for the running example xQ, S, Syu. Again applying the inductive miner results in the three workflow nets as depicted in Fig. 7. Notice that, although feature S occurs twice in one of the sequences, the algorithm only adds a single feature S in the resulting workflow model. 5.4 Composition of Feature Nets and Module Nets Last step in the process is to combine the feature nets generated for each of the features with the generated module nets. This results in an open Petri net for each of the modules, defining the interaction between the different modules. In the module net, each feature is represented by a single transition. Next step is to refine each feature by its feature net. For this, we first define the refinement of a transition by a workflow model on open Petri nets, as shown in Fig. 6. This refinement connect each input place of the refined transition with each of the transitions in the postset of the initial place of the workflow, and similarly each output place of the refined transition with each of the transitions in the preset of the final place of the refining workflow. It is straight-forward to prove that if (1) the initial net is sound, (2) each input place of the refined transition is 1-bounded, i.e., it can contain at most one token, and (3) workflow net W is sound, then the refinement yields a sound result. 240 PNSE’15 – Petri Nets and Software Engineering A H pH-A pH-A B I pB-I pB-I C J Q pC-J pC-J pJ-Q pJ-Q S D K pK-D pK-D pS-K pS-K E O pO-E pO-E pS-O pS-Q F M pF-M pF-M pM-S pM-S G N Debtor Payment creditor Fig. 8. Composed nets generated for the running example The result of refining each feature by its feature net is shown in Fig. 8. As features G and N have no feature net defining communication, these transitions are not refined. To verify whether the resulting open Petri nets are a true representation of the system, one can compose the nets into a single Petri net, and execute each of the sequences of the event log of Tbl. 1 on the resulting model, which in this example is possible. Further analyzing the resulting model shows that its only deadlocks are desirable markings: either all modules reach their final place, without any pending tokens, or the Creditor module remains untouched, while the Debtor and Payment module reach their final place. 6 Conclusions Within this paper, we discussed a method to automatically generate a functional architecture model from an event log together with a mapping of each feature to the module that offers that functionality. We showed how the information flows can be derived from the communication behavioral profile. This profile not only identifies the information flow for the static structure of the functional archi- tecture, but additionally offers sufficient information to construct the internal behavior for each of the features, and between the features within a module. Lastly, we showed how to compose feature and module nets into an open Petri net. J. Van Der Werf, E. Kaats: Discovery of Functional Architectures 241 Discovering the interaction between different modules is not new. Tchniques like service mining [3], apply process mining on event logs to discover a process model of how the services are orchestrated. In the approach presented in this paper, we focus on the discovery of the behavior of each of the modules, rather than a complete orchestration. In [15], the authors discover the internal behavior of services based on the interaction between two services, guaranteeing deadlock freedom of the discov- ered service. In the setting of this paper, the exact interaction between modules is unknown, and needs to be discovered first. The core idea of this paper is twofold: firstly to derive the information flows for a Functional Architecture Model, and secondly to derive the internal behavior for each of the modules within the architecture. Within software architecture, this is called Software Architecture Reconstruction [12]. Although some tech- niques take the dynamic aspects of the software operation into account, most techniques only focus on the static aspects of software architecture models, us- ing solely the available source code [8]. For example, system execution data is used to enrich architectures with performance data [11] or to visualize traces on how the software is used [19]. In this paper, we propose a method to not only visualize software usage, but to discover module communication and to generate the internal behavior of modules within a software architecture. Although the approach presented in this paper shows an application of the behavioral profile to discover feature interaction, additional research is required. First, the current approach requires the event log to be complete, i.e., if the log grows, the successor relation should not change. Further, for the generation of the internal feature behavior, we assume that if the sending feature is present in the event log, it enables all possible events, which is possibly a too strict assumption that deserves further investigation. The approach in this paper is very flexible, as we derive individual models for the features and modules. For this, we apply standard process discovery algo- rithms returning sound workflow models. However, their composition in general does not result in a sound system of asynchronously communicating systems. Further research is required to study the conditions under which this can be guaranteed. For this, we want to identify conditions which on the one hand re- sult in correct models, and on the other hand have a positive effect on model quality as described by [7]. Not only does this approach provide useful insights for the software architect, we expect the approach applicable to business process management as well, as for the discovery of separate business processes, the Business Process Modelling and Notation offers the swimlane notion. Therefore, we plan to implement the approach in the Process Mining toolkit ProM [18] to experiment and apply the approach on real-life examples. Acknowledgements The authors would like to thank the anonymous reviewers for their valuable feedback, and Sjaak Brinkkemper, Fabiano Dalpiaz, Garm Lucassen, Leo Pruijt and Erik Jagroep for the fruitfull discussions and valuable input on architecture and software products. 242 PNSE’15 – Petri Nets and Software Engineering References 1. W.M.P. van der Aalst. Verification of workflow nets. In Application and Theory of Petri Nets 1997, volume 1248 of Lecture Notes in Computer Science, pages 407 – 426. Springer, Berlin, 1997. 2. W.M.P. van der Aalst. Process Mining: Discovery, Conformance and Enhancement of Business Processes. Springer, Berlin, 2011. 3. W.M.P. van der Aalst. Challenges in service mining: Record, check, discover. In Web Engineering, volume 7977 of Lecture Notes in Computer Science, pages 1–4. Springer, Berlin, 2013. 4. L. Bass, P. Clements, and R. Kazman. Software Architecture in Practice. Series in Software Engineering. Addison Wesley, Reading, MA, USA, 2012. 5. D. Bera, K. M. van Hee, and J.M.E.M. van der Werf. Designing weakly termi- nating ros systems. In Applications and Theory of Petri Nets, (33th International Conference, Petri Nets 2012), volume 7347 of Lecture Notes in Computer Science, pages 328 – 347. Springer, Berlin, 2012. 6. S. Brinkkemper and S. Pachidi. Functional architecture modeling for the software product industry. In ECSA 2010, volume 6285 of Lecture Notes in Computer Science, pages 198 – 213. Springer, Berlin, 2010. 7. J.C.A.M. Buijs, B.F. van Dongen, and W.M.P. van der Aalst. On the role of fitness, precision, generalization and simplicity in process discovery. In On the Move to Meaningful Internet Systems: OTM 2012, volume 7565 of Lecture Notes in Computer Science, pages 305–322. Springer, Berlin, 2012. 8. S. Ducasse and D. Pollet. Software architecture reconstruction: A process-oriented taxonomy. Software Engineering, IEEE Transactions on, 35(4):573–591, July 2009. 9. S.A. Fricker. Software product management. In Software for People, Management for Professionals, pages 53–81. Springer, 2012. 10. K.M. van Hee, N. Sidorova, and J.M.E.M. van der Werf. When can we trust a third party? In Transactions on Petri Nets and Other Models of Concurrency VIII, volume 8100 of Lecture Notes in Computer Science, pages 106–122. Springer, Berlin, 2013. 11. T. Israr, M. Woodside, and G. Franks. Interaction tree algorithms to extract effective architecture and layered performance models from traces. Journal of Sys- tems and Software, 80(4):474 – 492, 2007. Software Performance 5th International Workshop on Software and Performance. 12. R.L. Krikhaar. Software Architecture Reconstruction. PhD thesis, VU Amsterdam, 1999. 13. S.J.J. Leemans, D. Fahland, and W.M.P. van der Aalst. Discovering block- structured process models from event logs - a constructive approach. In Appli- cation and Theory of Petri Nets and Concurrency, volume 7927 of Lecture Notes in Computer Science, pages 311–329. Springer, Berlin, 2013. 14. G. Lucassen, J.M.E.M. van der Werf, and S. Brinkkemper. Alignment of software product management and software architecture with discussion models. In IWSPM 2014, pages 21–30. IEEE, Aug 2014. 15. R. Müller, C. Stahl, W.M.P. van der Aalst, and M Westergaard. Service discovery from observed behavior while guaranteeing deadlock freedom in collaborations. In Service-Oriented Computing, volume 8274 of Lecture Notes in Computer Science, pages 358–373. Springer, Berlin, 2013. 16. W. Reisig. Petri Nets: An Introduction, volume 4 of Monographs in Theoretical Computer Science: An EATCS Series. Springer, Berlin, 1985. J. Van Der Werf, E. Kaats: Discovery of Functional Architectures 243 17. R.N. Taylor, N. Medvidovic, and E.M. Dashofy. Software Architecture: Founda- tions, Theory, and Practice. John Wiley & Sons, 2010. 18. H.M.W. Verbeek, J.C.A.M. Buijs, B.F. van Dongen, and W.M.P. van der Aalst. XES, XESame, and ProM 6. In Information System Evolution, volume 72 of Lecture Notes in Business Information Processing, pages 60–75. Springer, Berlin, 2011. 19. R.J. Walker, G.C. Murphy, J. Steinbok, and M.P. Robillard. Efficient mapping of software system traces to architectural views. In Proceedings of the 2000 Conference of the Centre for Advanced Studies on Collaborative Research, CASCON ’00, pages 12–. IBM Press, 2000. 20. M. Weidlich and J.M.E.M. van der Werf. On profiles and footprints – relational semantics for petri nets. In Applications and Theory of Petri Nets (ICATPN 2012), volume 7347 of Lecture Notes in Computer Science, pages 148 – 167. Springer, Berlin, 2012. 21. J.M.E.M. van der Werf and H.M.W. Verbeek. Online compliance monitoring of service landscapes. In BPM 2014 International Workshops, volume 202 of Lecture Notes in Business Information Processing, pages 89–95. Springer, Berlin, 2015.