1. Introduction

March

Adaptive Handling of Out-of-order Streams in Conformance Checking

Kristo Raun

Riccardo Tommasini

0 2

Ahmed Awad

B Check Supply Apply Discount C

0 LIRIS Lab, INSA de Lyon , France 1 The British University in Dubai , Dubai , United Arab Emirates 2 University of Tartu , Tartu , Estonia

2024

25 2024 0000 0001

Organizations function through the execution of various business processes. Non-conformant behavior in these processes impacts organizations negatively through implications such as reduced eficiency, lower quality, and compliance risks. Thus, it is important to identify non-conformant process behavior rapidly. While this is a challenging problem on its own, it is further complicated by the advent of big data, distributed systems, and a fragmented landscape of cloud- and on-premise tools that all provide data for the analysis of a business process and for determining its conformance. In such a landscape, it is common that events may arrive out of order. This complicates the conformance-checking analysis, which commonly expects events within a process to arrive in a specific sequence allowed by the process model. This paper introduces the first streaming conformance-checking method that incorporates event time awareness, thus having the ability to correct imperfections stemming from the out-of-order arrival of events. The method is scalable, utilizing the Beamline framework built on top of Apache Flink. Furthermore, the method includes an adaptive approach for handling various levels of out-of-order events in event streams. Experiments were conducted to demonstrate the applicability of the method for real-world use cases with diferent levels of out-of-order events. The results indicate that the method is well suited for identifying conformance in business processes that rely on a multitude of underlying systems for aggregating a holistic view of the process.

eol>Streaming conformance checking Real-time analytics Process mining Scalability

1. Introduction

1 1 2 1 3 2 1

A B A D A C B of-order data may overcompensate at times when events arrive mostly on time and undercompensate when out-of- order event arrival [ 11 ]. An event is considered out of order event arrival is frequent. Thus, ideally, any method order when it arrives from a source after records with that handles out-of-order event arrival should be adap- later event timestamps. Many factors can explain why a tive in responding to the stream’s characteristics. Such stream may have events arriving out of order, the most an adaptive streaming conformance checker would be typical reasons being network reliability, bandwidth, and useful in industries where it is critical to have an up-to- load [ 12 ]. date overview of process executions, but the underlying Because a data stream is inherently infinite, the memsystems may introduce out-of-order events; for example, ory requirements on a stream processing system would healthcare, clickstream analytics, manufacturing, and be unbounded. Thus, the concept of windows is used smart mobility. to segment time into smaller units [13]. Windows allow

The contributions of this paper are threefold: (1) a aggregations and joins of multiple streams within spestreaming conformance-checking approach capable of cific timeframes. In the presence of out-of-order events, handling out-of-order events, (2) a novel approach for windows need to be maintained and possibly entirely adaptive handling of event-time progress in business recalculated. An example of out-of-order event arrival process streams, and (3) lifting an existing state-of-the- in the context of windows, and a comparison between art approach to using Apache Flink, allowing for a truly event time and system ingestion time is shown in Figstreaming and scalable conformance checking solution. ure 2. In this example, aggregating based on windows

The rest of the paper is structured as follows: in Sec- would indicate an increasing trend if looking at ingestion tion 2, we introduce the background, discussing out-of- time (3, 5, 6), while based on the event time, it is actually order event arrival in data streams, business process char- a downward trend (7, 5, 2). acteristics, and the current state of the art in streaming Since there is no limit on the potential delay of event conformance checking. In Section 3, we discuss an ex- arrival, an additional concept called watermarks is used in isting conformance-checking to be extended to handle data streams to keep track of event time progress and, esout-of-order event streams. We look at how the method sentially, limit the number of windows [14]. This avoids would handle out-of-order events and how the solution potential time lag due to outliers or faulty data that could can adapt based on the frequency of out-of-order event stretch windows indefinitely. While commonly, waterarrival. Section 4 describes the setting for running the marks are based on event time [ 12 ], this is complicated experiments and the results of the experiments, together for business process streams due to the characteristics of with a discussion of the results. Finally, Section 5 con- business processes and the fact that commonly, a single cludes the work. process execution does not have direct dependencies to other concurrent process executions, as we will see next.

2. Background 2.2. Business Process Characteristics

2.1. Out-of-order Event Streams Data analytics answers questions such as what happened and when [15]. However, traditional data analytics does Modern stream processing frameworks are designed for not generally consider data from a business process viewanalytics that build upon recent data stream aggregates, point. Processes have a defined course of action and especially in cases where real-time results are essential, constitute specific activities [ 1 ], thus enabling answers such as systems monitoring and decision support [ 6, 10 ]. to questions such as why something happened. A data stream that features data arriving from multiple Process models describe the behavior allowed in a prosources or a system built upon distributed systems will cess. Many notations exist with various characteristics, likely display a stream imperfection known as out-of

D B E

B E

B D C E

C B E such as Petri Nets, Directly-Follows Graphs, and BPMN models [ 1 ]. Most process models visualize the activities within the process and allow for behavior such as sequences, parallelism, choices, and loops. In the context of this paper, the method employs the trie data structure for describing process behavior [16]. A shortcoming of the trie is that it cannot explain parallelism and loops, thus being an approximation by a sample of the full process behavior; however, the trie is well suited for conformance-checking purposes, where the goal is to compare event data to the allowed behavior eficiently.

An example of the trie sampled from the process model in Figure 1 is shown in Figure 3.

The actual process executions are stored in event data.

Commonly, a single event contains attributes such as the case identifier (Case ID) , activity, and timestamp. All activities having the same case identifier make up a trace – a specific process execution instance. Events can be stored in static logs, called event logs, or in event streams, where each event arrives as it occurs in the real world.

2.3. Streaming Conformance Checking

Conformance checking is the examination of process executions and quantifying the non-conformant behavior compared to the behavior allowed by a process model [ 2 ]. The state-of-the-art output of a conformance checker is an alignment between the actual behavior (log moves) and a complete run allowed by the model (model moves) [17]. Alignments have the benefit of being more readily interpretable than most other diagnostic methods and have thus enjoyed wide adoption by the research community. An example alignment is shown in Table 2, with the log moves ⟨, , ⟩ indicating the actual process executions and the model moves ⟨, , ⟩ indicating a valid path in the model. The skip symbol ≫ indicates non-conformant behavior, typically associated with a conformance cost.

Traditional conformance checking is done on event logs extracted from business systems. While such a static approach is robust, it has some shortcomings; most notably, the data is obsolete by design. Thus, several recent research eforts have made conformance checking on event streams viable [ 3, 18, 5, 4 ].

log moves model moves

A A ≫ B

D D ≫ E

The work in [ 4 ] introduced the concept of alignments to streaming conformance checking by utilizing prefix alignments. In event streams, it is unknown whether the process executions have terminated, and a prefix alignment has the benefit of not penalizing the observations by expecting an entire model run. At the same time, a preifx alignment in a streaming setting allows a high level of interpretability of the discrepancy, which becomes especially relevant in larger processes.

While the initial prefix-alignment-based approaches ofered guarantees on the exactness of the results, they sufered from a relatively long latency, rendering the applicability of the methods challenging for faster event streams. Thus, new approaches have emerged recently. In [ 5 ], an approximate approach was introduced – IWS – for calculating prefix alignments on top of event streams. The algorithm utilizes the trie data structure as the allowed process behavior, improving the calculation time of alignments in some cases by several orders of magnitude while introducing only a moderate error compared to the previous state of the art. The work in [18] built upon the IWS algorithm, extending the approach with completeness and confidence metrics, allowing for quantification of warm-starting scenarios and a trace’s potential conclusion in a stream. It is the first prefix-alignment-based method that supports warm-starting scenarios in streaming settings – i.e., cases where the stream started before the conformance checker began measuring the results. In the next section, we will look deeper into the IWS approach to see why we think it is well suited to handle stream imperfections, and we present extensions to the original algorithm to handle out-of-order events and to adapt to changing frequency of out-of-order event arrival.

3. Methodology 3.1. Decay Time: Tracking Event Progress in IWS

The IWS algorithm [ 5 ] allows for fast and approximate conformance checking while under memory constraints in a streaming setting. For eficiency, IWS uses the trie trie

X E decay unprocessed time suffix - 3 | A arriving

events A 3 A

B C D E

A prefix alignment

X E 3 - 2 | A,X A 2 | A

B C D E

A X A X A X

X E - 1 | A,X,B A 1 X,B 2 | B

3 B C D E

A X

B A X B A >> B data structure to compare the event data to the expected repurposed to implement out-of-order handling into the process behavior. Due to the nature of business process algorithm, as will be described in the next section. executions, it may sometimes be necessary to re-calculate the optimal route in the trie, and due to erroneous activi- 3.2. Event Time Store ties, the method may end up on a wrong path in the trie.

Thus, the IWS approach uses a state bufer for keeping To handle out-of-order events, the algorithm is extended some past states available for recalculation and a dis- by an event time store that keeps the event time of each counted decay time hyperparameter to release old states arrived event. The event time store is an ordered keyfrom memory. The decay time is calculated based on value pair, with the key denoting the event time and the the diference between the average length from the root value being an array of events that occurred during this to a leaf node in the trie, indicating a roughly average time. For simplicity, we assume that if events have the expected trace length, and the current trace length. The same event time, the arrival order is the correct total outcome is multiplied with the discounting factor, a value order of these events. In other words, the array denotes between 0 and 1 to indicate the maximum percentage the arrival time of the events having this event time. of the average trace length that should be kept in mem- Formally, assume that is the set of timestamps, ℰ is the ory. For a more detailed explanation and formula, we set of events, and ℰ * is the set of all possible words over ℰ , refer to [ 5 ]. Conceptually, the decay time helps the algo- then the event store is a function : → ℰ * . As the rithm progress event time on a per-trace basis. Traces decay time releases states from memory, the event time may overlap and have diferent event execution patterns. store releases the earliest events from the event store. Thus, progressing based on event time, as it is commonly For out-of-order handling, each event time is first comdone with watermarks, would favor traces with rapid ex- pared to the largest key in the event store as events arrive. ecutions, while time-consuming process instances would If the new event has a timestamp equal to or larger than be quickly forgotten, and the analysis would sufer. Thus, the largest key, this event is arriving in order, and prothe decay time setting in IWS is not time-based but event- cessing continues as usual. If the largest key is larger based in the scope of a specific trace. than the timestamp of the arrived event, all the events

An example of the state bufer and the decay time are with a larger timestamp in the event store are considered shown in Figure 4. With the arrival of the first event, , out-of-order events and are piped for a new replay. Furtwo states are initiated, with the state at the root node thermore, any states in the state bufer that have played holding in its unprocessed sufix. The unprocessed out any out-of-order events are removed from the bufer, sufix is used to replay the moves upon the arrival of while the rest of the states remove the unprocessed sufix the next events. Decay time indicates how many events that matches the new sequence of events. within this trace should arrive before the state will be To illustrate, let’s consider an example of allowed becleared from memory. The optimal alignment at each havior as shown by the trie in Figure 3. event arrival is also shown. For event arrival , there Assume we observed events ⟨, , ⟩ with event are actually multiple optimal alignments, but only one timestamps of 1, 3 and 5, respectively. While this trace alignment is shown in the figure for illustrative purposes. could have multiple optimal alignments, for simplicity, Importantly, the state bufer allows the retraction of the assume we have the same alignment as in Table 2. false path traversal to node . This behavior can be If the algorithm now receives event with an event

Event time aware Non-event time aware A A A

B B ≫ B

D D D D ≫ ≫ E

B B would have events in order. Thus, the formula is applied globally to all traces within the process.

3.4. Implementation

Table 3 In order to be truly scalable, the original algorithm and Comparison of event time aware and non-aware alignments. the extensions introduced in this paper are implemented on top of the Beamline framework [20]. The Beamline timestamp 2, it first checks the event store to see whether framework utilizes Apache Flink as the runtime engine, the events are arriving in order. The event store’s largest allowing the algorithm’s execution to scale across a cluskey (5) is larger than the arrived event timestamp (2). ter of computing nodes. Commonly, each individual trace Thus, all events with timestamps larger than 2 are con- in a business process is looked at separately. Thus, partisidered out-of-order, and all of the states in memory that tioning by the case identifier would theoretically allow have played out the events and are removed. The scaling of the processing to as many nodes as there are resulting alignment of the event time aware solution is cases within the process. shown in Table 3, with a comparison to the original non- The source code for the implementation, together with event time aware version that assumes all events arrive instructions for running the experiments and the datasets in order, leading to a higher conformance cost. used, have been made available on GitHub1.

3.3. Adaptive Event Time Progress 4. Experiments

The arrival of out-of-order events cannot be expected to be static throughout the life of the stream. Thus, ap- 4.1. Setting proaches in stream processing have been devised to adapt Several real-life event logs were used to test the handling the watermarks based on concept drifts – changes in data of out-of-order events. The logs had to be manipulated arrival frequency and delays [ 9 ]. This paper introduces a to mimic the out-of-order scenario, as the original logs novel approach for adaptive event time progress suited were grouped by trace and in temporal order. The logs for business process data. Namely, we adopt the Expo- used in this paper are well-known real-life process event nentially Weighted Moving Averages (EWMA) metric logs: BPI 20122, BPI 20173, and BPI 2020 Travel Permits4. from inventory and financial planning [ 19] and modify The steps done for running the experiments are shown it to work as a sensor for indicating the level of out-of- in Figure 5. To limit the scope of the experiments, the orderedness. logs were first randomly sampled to 100 traces (step 1).

To adapt to the stream’s frequency of out-of-order Then, events within a trace were swapped with various events, we extend the algorithm with the following settings ranging from no out-of-order events to fully outmethod to modify the discounting factor. With every of-order events (step 2). The settings are described in new event, we check if the event is out of order. If it is Table 4, showing the probability of a swap, i.e., how likely out of order, we assign it a boolean value of 1 and 0 if it a single event is to trade places with another event within is not. Then, we increase (or decrease) the discounting the same trace, and max distance, i.e., how far from the factor using the following formula: current position can an event be swapped to. For example, with the swap_01 setting, each event has a one percent = * + (1 − ) * likelihood of getting swapped with a maximum distance of one, meaning that it will trade places with the event directly before or after.

The unaltered sampled log was used for generating the trie – the process model that describes the expected behavior (step 3). Since the IWS algorithm is capable of streaming conformance checking, the experiments were also conducted in a streaming fashion using an MQTT broker. A Python script published the out-oforder logs to MQTT topics (step 4), and the algorithm

Where is the discounting factor, is the smoothing factor, and is the Boolean value of whether it is an out-of-order event. In our experiments, we found an alpha of 0.005 to represent an appropriate change in the discounting factor.

Intuitively, if the proportion of out-of-order events has increased, then the discounting factor will increase, thus keeping in memory a larger amount of states and allowing for improved out-of-order event handling. If the frequency of out-of-order events decreases, so too will the discounting factor, releasing the memory strain.

We consider it unlikely that any specific trace would start exhibiting out-of-order behavior while other traces 1https://github.com/DataSystemsGroupUT/ StreamingConformanceChecker 2https://doi.org/10.4121/uuid:3926db30-f712-4394-aebc-75976070e91f 3https://doi.org/10.4121/uuid:5f3067df-f10b-45da-b98b-86ae4c7a310b 4https://doi.org/10.4121/uuid:52fb97d4-4588-43c9-9d04-3604d4613b51 MQTT broker

6 Algorithm

Results file Original log 1 Sampled

log is a further order of magnitude slower than the nonadaptive method, with values of 1.89 ms/event for nonaware and 139.32 ms/event for the adaptive methods.

For the other datasets, a similar pattern can be observed. For BPI2017, the diference between non-aware and out-of-order handling methods is clear starting from _50, and for _99 the diference between nonaware and the adaptive method is almost three orders of magnitude. It can be observed that the execution time of the non-aware version of the algorithm does not increase with increased out-of-orderedness – this is because the algorithm does no recalculation for out-of-order event arrival and simply assumes that there is a great deal of non-conformant behavior occurring in the event stream.

For BPI2020 log, the results vary slightly more, with non-aware and non-adaptive versions being roughly equivalent for _50, and the adaptive method is faster than the non-adaptive method for _99. However, this may be due to the fact that this is the smallest of the datasets, as can be seen by the execution time remaining under 1-2ms per event. received the events by subscribing to these topics (step 5). The results of the experiments were outputted to a ifle on an event-by-event basis (step 6), measuring the latency of the algorithm – how long it takes to process an event – and the cost of the latest alignment of the case to where this particular event belongs to.

The method was instantiated with the default settings for the decay time variable: a minimum decay time () of 3 and a discounting factor ( ) of 0.3. The method was run with the adaptive add-on from Section 3.3 turned on (adaptive), turned of ( non-adaptive), and the out-oforder handling turned of ( non-aware, i.e., the original 4.2.2. Cost.

IWS algorithm). A core measure of a conformance checker is the cost. In

All experiments were executed three times and aver- this paper, we measure the cost of an alignment, i.e., simaged to mitigate possible runtime outliers impacting the ilarly to an edit distance diference between the expected results. The experiments were conducted on a machine process behavior and the actual observed behavior. The using Java 11 and Python 3.9. The MQTT broker used benefit of using an alignment is explainability, indicating was EMQX 5.1, which was run using Docker. clearly which part of the process contains the discrepancy. It is important to remember that the non-aware version 4.2. Results naively assumes that the order in which the events arrive is the order in which the events happened, thus nega4.2.1. Latency. tively impacting the alignment cost because the events were actually in the correct order but swapped. The cost results are summarized in Table 5, with color-coding from green (the best result) to red (the worst result) per log and swap variation.

An observation can be made that as the amount of swaps increases, the cost increases. This is true for all executions, except the adaptive algorithm on BPI2012 that slightly decreases cost for _99 compared to _50. This is due to the randomness of the out-oforderedness in the generated event data. In general, the cost increase is due to the fact that with the swaps, we have introduced superficial non-conformant behavior. As was shown in Table 4, the higher swap settings increase the likelihood and distance of an event displacement, thus having an increased amount of non-conformant behavior.

Comparing the diferent versions of the algorithm, it is clear that the non-aware version severely penalizes the out-of-order events. The diference between adaptive and non-adaptive versions is minuscule until _20, when the adaptive versions starts to outperform the nonswap_01 swap_05 swap_10 swap_20 swap_50 swap_99 swap_00 swap_01 swap_05 swap_10 swap_20 swap_50 swap_99 swap_00 swap_01 swap_05 swap_10 swap_20 swap_50 swap_99 adaptive version, and the _50 and _99 variations have an almost double the diference in cost, in favor of the adaptive version.

4.3. Discussion

Based on the results, we can say that we have introduced a streaming conformance-checking approach capable of handling out-of-order events. Furthermore, it seems that the adaptive handling of event-time progress is well suited for adapting to an increased load of out-oforder event arrivals. In general, the introduced methodology seems to work well for handling out-of-order events, even for streams where the portion of out-of-orderedness is relatively high. As expected, higher amounts of out-oforder events impact latency negatively. At the same time, the cost is greatly improved compared to the original IWS algorithm, which is unaware of event time. For smaller business processes, such as BPI2020, the adaptive method has low latency even with extreme out-of-orderedness. However, with more complex business processes, such as BPI2017, the non-adaptive method may be more sensical from the latency perspective.

Some threats of validation include the fact that only a few datasets were used in this comparison. Furthermore, the out-of-orderedness had to be mimicked because no known process mining logs or streams that exhibit outof-order events are publicly available. The swap settings could be further investigated, as the probabilities and max distances could increase orthogonally, not in correlation.

One thing to address in future research would be the fact that if multiple events have the same timestamp, then the method should not blindly assume that the arrival order within the timestamp is correct. This is seen, for example, on the BPI2012 dataset, with many simultaneous timestamps. Having a method that would be able to ifnd the optimal solution from partial order would be a further improvement to the introduced approach.

Ultimately, as the results are positive, and the latencies are generally low for most experiments, we believe this method would be applicable for real-life use cases for running conformance checking on distributed systems.

5. Conclusion

This paper introduced an approach for handling outof-order event arrival in streaming conformance checking. To the best of our knowledge, this is the first conformance-checking approach that is aware of event time and, thus, is able to handle such stream imperfections. The contribution included a novel approach for adapting the event time progress based on the frequency of out-of-order event arrival. The approach was implemented using the Beamline framework built on top of adap.

non-adap.

non-aw.

adap.

non-adap.

non-aw.

adap.

non-adap.

non-aw. 0.0 1.6 6.0 13.7 24.9 44.9 48.7

Acknowledgments

This work was supported by the European Social Fund via "ICT programme" measure, the European Regional Development Fund, and the programme Mobilitas Pluss (2014-2020.4.01.16-0024).

Apache Flink, making the method easily scalable and able to handle large data volumes in stream processing.

Future research aims to look at handling partial order in event streams, as the current method assumes the arrival order for events having the same event time is the total order. Furthermore, a more extensive study could be done of the swap settings in event streams and their impact on out-of-order event handling. [18] K. Raun, M. Nielsen, A. Burattin, A. Awad, C-3PA: Streaming conformance, confidence and completeness in prefix-alignments, in: International Conference on Advanced Information Systems Engineering, Springer, 2023, pp. 437–453. [19] P. R. Winters, Forecasting sales by exponentially weighted moving averages, Management science 6 (1960) 324–342. [20] A. Burattin, Streaming process mining with beamline, ICPM Demos (2022).

[7]

Burattin , Streaming process mining, Process Mining Handbook ( 2022 ) 349 - 372 .

[8]

Awad ,

Weidlich ,

Sakr , Process mining over unordered event streams , in: 2020 2nd International Conference on Process Mining (ICPM) , IEEE, 2020 , pp. 81 - 88 .

[9]

Awad ,

Traub ,

Sakr , Adaptive watermarks: A concept drift-based approach for predicting eventtime progress in data streams ., in: EDBT , 2019 , pp. 622 - 625 .

[10]

Zhang ,

Soto ,

Markl , A survey on transactional stream processing , The VLDB Journal ( 2023 ) 1 - 29 .

[11]

Li ,

Tufte ,

Shkapenyuk ,

Papadimos ,

Johnson , D. Maier, Out-of-order processing: a new architecture for high-performance stream systems , Proceedings of the VLDB Endowment 1 ( 2008 ) 274 - 288 .

[12]

Fragkoulis ,

Carbone ,

Kalavri , A . Katsifodi-

[1] W. M. van der Aalst , Process mining: a 360 degree mos, A survey on the evolution of stream prooverview , in: Process Mining Handbook , Springer, cessing systems, arXiv preprint arXiv:2008.00842 2022 , pp. 3 - 34 . ( 2020 ).

[2]

Carmona , B. F. van Dongen ,

Solti ,

Weidlich , [13]

Patroumpas , T. Sellis, Window specification Conformance Checking - Relating Processes and over data streams , in: International Conference on Models, Springer, 2018 . Extending Database Technology, Springer, 2006 , pp.

[3]

Burattin ,

Carmona , A framework for online 445-464 . conformance checking, in: International Confer- [14]

Carbone ,

Katsifodimos ,

Ewen , V. Markl, ence on Business Process Management, Springer, S. Haridi,

Tzoumas , Apache flink: Stream and 2017 , pp. 165 - 177 . batch processing in a single engine, The Bulletin

[4] S. J. van Zelst ,

Bolt ,

Hassani , B. F. van Dongen , of the Technical Committee on Data Engineering W. M. van der Aalst, Online conformance check- 38 ( 2015 ). ing: relating event streams to process models using [15]

Sun ,

Strang , Big data analytics serprefix-alignments, International Journal of Data vices for enhancing business intelligence , Journal of Science and Analytics 8 ( 2019 ) 269 - 284 . Computer Information Systems 58 ( 2018 ) 162 - 169 .

[5]

Raun ,

Tommasini ,

Awad , I will survive: An [16]

Awad ,

Raun ,

Weidlich , Eficient approxievent-driven conformance checking approach over mate conformance checking using trie data strucprocess streams , in: DEBS, ACM, 2023 , pp. 49 - 60 . tures, in: 2021 3rd International Conference on

[6]

Isah ,

Abughofa ,

Mahfuz ,

Ajerla ,

Zulk- Process Mining (ICPM), IEEE, 2021 , pp. 1 - 8 . ernine, S. Khan, A survey of distributed data [17] W. van der Aalst ,

Adriansyah , B. van Dongen , Restream processing frameworks, IEEE Access 7 playing history on process models for conformance ( 2019 ) 154300 - 154316 . checking and performance analysis , WIREs Data Mining and Knowledge Discovery 2 ( 2012 ) 182 - 192 .