StarStar Models: Using Events at Database Level for Process Analysis

Alessandro Berti [0000-0003-1830-4013] and Wil van der Aalst [0000-0002-0955-6940]

Process and Data Science department, Lehrstuhl für Informatik 9, RWTH Aachen University, 52074 Aachen, Germany

Abstract. Much of the time in process mining projects is spent on finding and understanding data sources and extracting the event data needed. As a result, only a fraction of the time is spent actually applying techniques to discover, control and predict the business process. Moreover, there is a lack of techniques to display relationships on top of databases without the need to express a complex query to get the required information. In this paper, a novel modeling technique that works on top of databases is presented. This technique is able to show a multigraph representing activities inferred from database events, connected with edges that are annotated with frequency and performance information. The representation may be the entry point to apply advanced process mining techniques that work on classic event logs, as the model provides a simple way to retrieve a classic event log from a specified part of the model. A comparison with similar techniques and an empirical evaluation are provided.

Keywords: Process Mining · Database Querying.

1 Introduction

This paper introduces StarStar models, a novel way to enable process mining on database events that combines the best qualities of competing techniques: it provides a model representation without requiring any effort from the user, and it offers drill-down possibilities to get a classic event log. The technique targets relational databases, which are often used to support information systems. Events in databases can be logged in several ways, including redo logs and in-table versioning.
To retrieve an event log suitable for process mining analysis, a case notion (a view on the data) has to be chosen by selecting the specific tables and columns to be included in the event log. Obtaining this view requires writing a SQL query, which in turn requires deep knowledge of the process. Moreover, the query may lead to performance issues, since it may require joins between several tables. Some approaches have been introduced in the literature to make the retrieval of event logs from databases easier: OpenSLEX meta-models [4] (this solution still requires specifying a case notion), object-centric models [3] (where a process model is built on top of databases, but from which it is impossible to retrieve an event log) and SPARQL query translation [2].

Fig. 1: Representation of a specific subset of activities in the A2A multigraph of the StarStar model extracted from a Dynamics CRM system as shown by the ProM plug-in.

StarStar models can be defined as a representation of the event data contained in a database, composed of several graphs: an event to object graph E2O that represents events and objects extracted from the database and the relationships between them; an event to event multigraph E2E that represents directly-follows relationships between events in the perspective of some object; and an activities multigraph A2A that represents directly-follows relationships between activities in the perspective of some object class. StarStar models are able to display relationships between activities without forcing the user to specify a case notion, since different case notions are combined in one succinct diagram. The visualization part of a StarStar model shows a multigraph between activities (A2A); however, the relations in the E2O graph and the E2E multigraph are important for filtering the model and for performing a projection on the selected case notion. The E2O graph is obtained directly from the data.
For the E2E and the A2A multigraphs, the constructions are introduced in Section 2. A representation of a StarStar model extracted from a Dynamics CRM system can be found in Fig. 1.

2 Approach

StarStar models take as input an event log in a database context. In order to provide a definition of this concept (Def. 1), let $U_E$ be the universe of events, $U_O$ be the universe of objects, $U_{OC}$ be the universe of object classes, $U_A$ be the universe of activities, $U_{attr}$ be the universe of attribute names, and $U_{val}$ be the universe of attribute values. It is possible to define a function $class : U_O \to U_{OC}$ that associates each object to the corresponding object class.

Definition 1 (Event log in a database context) An event log in a database context is a tuple $L_D = (E, act, attr, EO, \leq)$ where $E \subseteq U_E$ is a set of events, $act \in E \to U_A$ maps events onto activities, $attr \in E \to (U_{attr} \not\to U_{val})$ maps events onto a partial function assigning values to some attributes, $EO \subseteq E \times U_O$ relates events to sets of object references, and $\leq \subseteq E \times E$ defines a total order on events.

An example attribute of an event $e$ is the timestamp $attr(e)(time)$, which refers to the time the event happened. To project an event log in a database context to a classic event log, a case notion (a set $C_D \subseteq \mathcal{P}(E) \setminus \{\emptyset\}$ such that $\bigcup_{x \in C_D} x = E$) needs to be chosen, so that events that should belong to the same case can be grouped. The projection function is straightforward to define, and further details can be found in [1]. The E2O graph can then be introduced:

Definition 2 (E2O graph) Let $L_D = (E, act, attr, EO, \leq)$ be an event log in a database context. $(E \cup O, EO)$ with $EO \subseteq E \times O$ is an event to object graph relating events ($E$) and objects ($O$).

The E2O graph is obtained directly from the data without any transformation. The remaining steps in the construction of a StarStar model are the construction of the E2E multigraph and of the A2A multigraph.
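To make Definitions 1 and 2 concrete, the following is a minimal Python sketch under stated assumptions: the toy events, activities, timestamps and object identifiers are hypothetical, and the helper names `e2o_graph` and `is_case_notion` are illustrative only and are not taken from the ProM plug-in. It stores an event log in a database context as plain dictionaries and sets, reads the E2O graph directly off the relation EO, and checks the case notion condition (no empty case, and the cases together cover E).

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy log; event ids are listed in the total order on events.
events: List[str] = ["e1", "e2", "e3", "e4"]
act: Dict[str, str] = {
    "e1": "Create Order", "e2": "Add Item",
    "e3": "Create Invoice", "e4": "Pay Invoice",
}
attr: Dict[str, Dict[str, float]] = {
    "e1": {"time": 0.0}, "e2": {"time": 2.0},
    "e3": {"time": 5.0}, "e4": {"time": 9.0},
}
# EO: pairs (event, object) taken directly from the data.
EO: Set[Tuple[str, str]] = {
    ("e1", "order1"), ("e2", "order1"), ("e2", "item1"),
    ("e3", "order1"), ("e3", "invoice1"), ("e4", "invoice1"),
}


def e2o_graph(events, EO):
    """Return (nodes, edges) of the E2O graph: nodes are E and O, edges are EO."""
    objects = {o for _, o in EO}
    return set(events) | objects, set(EO)


def is_case_notion(cases, events):
    """Check that no case is empty and that the cases together cover E."""
    covered = set().union(*cases) if cases else set()
    return all(len(c) != 0 for c in cases) and covered == set(events)


nodes, edges = e2o_graph(events, EO)

# One candidate case per object: the events that reference that object.
cases = [{e for e, o2 in EO if o2 == o} for o in sorted({o for _, o in EO})]
print(len(nodes), len(edges), is_case_notion(cases, events))
```

On this toy log the E2O graph has 7 nodes (4 events and 3 objects) and 6 edges, and grouping events per object yields a valid case notion because every event references at least one object.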
Let $g : U_O \to \mathcal{P}(U_E)$, $g(o) = \{ e \in U_E \mid (e, o) \in EO \}$ be a function that for each object returns the set of events that are related to the object; let $w : U_O \to \mathbb{R}$, $w(o) = \frac{1}{|g(o)| + 1}$ be the weight of the object, defined as the inverse of the cardinality of the set of events related to the given object plus 1; and let $\sharp_k : U_O \to U_E$, $\sharp_k(o) = e$ such that $e \in g(o) \wedge |\{ e' \in g(o) \mid e' \leq e \}| = k$ for $1 \leq k \leq |g(o)|$, be a function that returns the $k$-th element of the totally ordered set $g(o)$.

Definition 3 (E2E multigraph) Let $L_D = (E, act, attr, EO, \leq)$ be an event log in a database context. Let $F_E = \{ (o, i) \mid o \in O \wedge 2 \leq i \leq |g(o)| \}$ such that for $f_E \in F_E$ the following attributes are defined: $\Pi^E_{obj}(f_E) \in O$ is the object associated to the edge, $\Pi^E_{in}(f_E) \in E$ is the input event associated to the edge, $\Pi^E_{out}(f_E) \in E$ is the output event associated to the edge, $\Pi^E_{weight}(f_E) \in \mathbb{R}^+$ associates each edge to a positive real number expressing its weight, and $\Pi^E_{perf}(f_E) \in \mathbb{R}^+ \cup \{0\}$ associates each edge to a non-negative real number expressing its performance. For $f_E = (o, i) \in F_E$:

$\Pi^E_{obj}(f_E) = o$, $\Pi^E_{in}(f_E) = \sharp_{i-1}(o)$, $\Pi^E_{out}(f_E) = \sharp_i(o)$, $\Pi^E_{weight}(f_E) = w(o)$, $\Pi^E_{perf}(f_E) = attr(\Pi^E_{out}(f_E))(time) - attr(\Pi^E_{in}(f_E))(time)$.

The event to event multigraph (E2E) can be introduced having events as nodes and associating each pair of events $(e_1, e_2) \in E \times E$ to the following set of edges:

$R_E(e_1, e_2) = \{ f_E \in F_E \mid \Pi^E_{in}(f_E) = e_1 \wedge \Pi^E_{out}(f_E) = e_2 \}$.

A representation of the E2E multigraph draws as many edges between a pair of events $(e_1, e_2) \in E \times E$ as the number of elements contained in the set $R_E(e_1, e_2)$. To each edge $f_E \in R_E(e_1, e_2)$, a label can be associated in the representation, for example the weight $\Pi^E_{weight}(f_E)$ or the performance $\Pi^E_{perf}(f_E)$.

Definition 4 (A2A multigraph) Let $L_D = (E, act, attr, EO, \leq)$ be an event log in a database context.
Let $F_A = \{ (c, (a_1, a_2)) \mid c \in U_{OC} \wedge (a_1, a_2) \in U_A \times U_A \}$ such that for $f_A \in F_A$ the following attributes are defined: $\Pi^A_{class}(f_A) \in U_{OC}$ is the class associated to the edge, $\Pi^A_{in}(f_A) \in U_A$ is the source activity associated to the edge, $\Pi^A_{out}(f_A) \in U_A$ is the target activity associated to the edge, $\Pi^A_{count}(f_A) \in \mathbb{N}$ associates each edge to a natural number expressing the number of occurrences, $\Pi^A_{weight}(f_A) \in \mathbb{R}^+$ associates each edge to a positive real number expressing its weight, and $\Pi^A_{perf}(f_A) \in \mathbb{R}^+ \cup \{0\}$ associates each edge to a non-negative real number expressing its performance. Let $AE : F_A \to \mathcal{P}(F_E)$ be a function such that for $f_A \in F_A$:

$AE(f_A) = \{ f_E \in F_E \mid class(\Pi^E_{obj}(f_E)) = \Pi^A_{class}(f_A) \wedge act(\Pi^E_{in}(f_E)) = \Pi^A_{in}(f_A) \wedge act(\Pi^E_{out}(f_E)) = \Pi^A_{out}(f_A) \}$.

Then for $f_A = (c, (a_1, a_2)) \in F_A$:

$\Pi^A_{class}(f_A) = c$, $\Pi^A_{in}(f_A) = a_1$, $\Pi^A_{out}(f_A) = a_2$, $\Pi^A_{count}(f_A) = |AE(f_A)|$, $\Pi^A_{weight}(f_A) = \sum_{f_E \in AE(f_A)} \Pi^E_{weight}(f_E)$, $\Pi^A_{perf}(f_A) = \frac{\sum_{f_E \in AE(f_A)} \Pi^E_{perf}(f_E)}{\Pi^A_{count}(f_A)}$.

The activities multigraph (A2A) can be introduced having activities as nodes and associating each pair of activities $(a_1, a_2) \in A \times A$ to the following set of edges:

$R_A(a_1, a_2) = \{ f_A \in F_A \mid \Pi^A_{in}(f_A) = a_1 \wedge \Pi^A_{out}(f_A) = a_2 \}$.

A representation of the A2A multigraph (that is, the visual element of a StarStar model) draws as many edges between a pair of activities $(a_1, a_2) \in A \times A$ as the number of elements contained in the set $R_A(a_1, a_2)$. To each edge $f_A \in R_A(a_1, a_2)$, a label can be associated in the representation, for example the number of occurrences $\Pi^A_{count}(f_A)$, the weight $\Pi^A_{weight}(f_A)$ or the performance $\Pi^A_{perf}(f_A)$.

Since by construction the edges in this graph can be associated to elements in the E2E graph (through the $AE$ function), the possibility to drill down to a classic event log (choosing a case notion) is maintained. Indeed, it is possible to define a projection function from an event log in a database context to a classic event log (more details on the differences can be found in [1]) in the following way: $proj(C_D, L_D) = (C, E, case\_ev, act, attr, \leq)$ where $C = \bigcup_{x \in C_D} id(x)$ and $case\_ev \in C \to \mathcal{P}(E)$ is such that for all $c \in C_D$, $case\_ev(id(c)) = c$. A simple case notion that can be used after choosing an object class $c \in U_{OC}$ is $C_D = \bigcup_{o \in O, class(o) = c} \{ g(o) \}$. More advanced case notions can be found in [1]. An example Petri net extracted from the Dynamics CRM model (whose A2A multigraph is shown in Fig. 1) can be found in Fig. 2.

Fig. 2: Representation of the Petri net obtained choosing the opportunity perspective on the graph and applying projection.

3 Support tool

In order to evaluate StarStar models, a ProM plug-in has been implemented that takes as input a representation of the events happening at database level, calculates the StarStar model from the data, and shows it to the end user using the mxGraph library. The supported input data types include XOC logs [3] (XML files storing events along with their related objects and the status of the object model at the time the event happened), OpenSLEX meta-models [4] and Neo4J databases. Tools for increasing or decreasing the level of complexity of the process (number of edges or number of activities) are provided. Moreover, a way is provided to graphically filter the activities and edges that are related to a given perspective. Projection functions are provided to get a classic event log out of a StarStar model when a perspective is chosen. A Petri net extracted after the projection is represented in Fig. 2.

4 Conclusions

This paper introduces StarStar models, providing a way to reduce ETL efforts on databases in order to enable process mining projects.
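As a compact recap of the construction in Section 2, the following Python sketch builds the E2E edges of Def. 3, aggregates them into A2A edges as in Def. 4, and projects the log onto one object class using the simple case notion. It is a sketch under stated assumptions rather than the plug-in's implementation: the toy log, the dictionary-based edge records and the function names (`events_per_object`, `build_e2e`, `build_a2a`, `project`) are hypothetical, and only A2A edges with at least one underlying E2E edge are materialised.

```python
from collections import defaultdict

# Hypothetical toy log: events in their total order, with activity and time.
order = ["e1", "e2", "e3", "e4", "e5", "e6"]
act = {"e1": "Create Order", "e2": "Add Item", "e3": "Create Invoice",
       "e4": "Pay Invoice", "e5": "Create Order", "e6": "Add Item"}
timestamp = {"e1": 0.0, "e2": 2.0, "e3": 5.0, "e4": 9.0, "e5": 10.0, "e6": 14.0}
EO = {("e1", "order1"), ("e2", "order1"), ("e2", "item1"), ("e3", "order1"),
      ("e3", "invoice1"), ("e4", "invoice1"), ("e5", "order2"), ("e6", "order2")}
obj_class = {"order1": "Order", "order2": "Order",
             "item1": "Item", "invoice1": "Invoice"}


def events_per_object(order, EO):
    """g(o): events related to object o, listed in the total order on events."""
    pos = {e: k for k, e in enumerate(order)}
    g = defaultdict(list)
    for e, o in sorted(EO, key=lambda p: (pos[p[0]], p[1])):
        g[o].append(e)
    return dict(g)


def build_e2e(g, timestamp):
    """One E2E edge per pair of consecutive events of the same object (Def. 3)."""
    edges = []
    for o, evs in g.items():
        w = 1.0 / (len(evs) + 1)  # w(o) = 1 / (|g(o)| + 1)
        for i in range(1, len(evs)):
            src, dst = evs[i - 1], evs[i]
            edges.append({"obj": o, "in": src, "out": dst, "weight": w,
                          "perf": timestamp[dst] - timestamp[src]})
    return edges


def build_a2a(e2e, act, obj_class):
    """Aggregate E2E edges by (class, (source activity, target activity)) (Def. 4)."""
    groups = defaultdict(list)
    for f in e2e:
        groups[(obj_class[f["obj"]], (act[f["in"]], act[f["out"]]))].append(f)
    return {key: {"count": len(fs),
                  "weight": sum(f["weight"] for f in fs),
                  "perf": sum(f["perf"] for f in fs) / len(fs)}
            for key, fs in groups.items()}


def project(g, act, obj_class, cls):
    """Simple case notion: one case per object of class cls, as activity traces."""
    return {o: [act[e] for e in evs] for o, evs in g.items() if obj_class[o] == cls}


g = events_per_object(order, EO)
e2e = build_e2e(g, timestamp)
a2a = build_a2a(e2e, act, obj_class)
order_log = project(g, act, obj_class, "Order")
```

In this toy log, the edge from "Create Order" to "Add Item" in the "Order" perspective aggregates two E2E edges (one per order), so its count is 2 and its performance is the mean of the two time differences. The projection onto the "Order" class returns one trace per order, which could then be handed to a classic process discovery technique.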
StarStar models provide a multigraph visualization of the relationships between activities happening in a database, and the possibility to drill down. By selecting any case notion interactively we get a classic event log that can be analyzed using existing process mining techniques. Each step in the construction of a StarStar model has linear complexity and can be done on graph databases. A plug-in has been implemented on the ProM framework that can import the data, build the StarStar model, provide a visualization of the activities multigraph, and provide projection functions.

References

1. Berti, A., van der Aalst, W.: StarStar models: Process analysis on top of databases. arXiv (2018)
2. Calvanese, D., Cogrel, B., Komla-Ebri, S., Kontchakov, R., Lanti, D., Rezk, M., Rodriguez-Muro, M., Xiao, G.: Ontop: Answering SPARQL queries over relational databases. Semantic Web 8(3), 471–487 (2017)
3. Li, G., de Carvalho, R.M., van der Aalst, W.: Automatic discovery of object-centric behavioral constraint models. In: International Conference on Business Information Systems. pp. 43–58. Springer (2017)
4. de Murillas, E.G.L., Reijers, H.A., van der Aalst, W.: Connecting databases with process mining: a meta model and toolset. Software & Systems Modeling pp. 1–39 (2018)