<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>March</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Adaptive Handling of Out-of-order Streams in Conformance Checking</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Kristo Raun</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Riccardo Tommasini</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ahmed Awad</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>B Check Supply Apply Discount C</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>LIRIS Lab, INSA de Lyon</institution>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>The British University in Dubai</institution>
          ,
          <addr-line>Dubai</addr-line>
          ,
          <country country="AE">United Arab Emirates</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Tartu</institution>
          ,
          <addr-line>Tartu</addr-line>
          ,
          <country country="EE">Estonia</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>25</volume>
      <issue>2024</issue>
      <fpage>0000</fpage>
      <lpage>0001</lpage>
      <abstract>
        <p>Organizations function through the execution of various business processes. Non-conformant behavior in these processes impacts organizations negatively through implications such as reduced eficiency, lower quality, and compliance risks. Thus, it is important to identify non-conformant process behavior rapidly. While this is a challenging problem on its own, it is further complicated by the advent of big data, distributed systems, and a fragmented landscape of cloud- and on-premise tools that all provide data for the analysis of a business process and for determining its conformance. In such a landscape, it is common that events may arrive out of order. This complicates the conformance-checking analysis, which commonly expects events within a process to arrive in a specific sequence allowed by the process model. This paper introduces the first streaming conformance-checking method that incorporates event time awareness, thus having the ability to correct imperfections stemming from the out-of-order arrival of events. The method is scalable, utilizing the Beamline framework built on top of Apache Flink. Furthermore, the method includes an adaptive approach for handling various levels of out-of-order events in event streams. Experiments were conducted to demonstrate the applicability of the method for real-world use cases with diferent levels of out-of-order events. The results indicate that the method is well suited for identifying conformance in business processes that rely on a multitude of underlying systems for aggregating a holistic view of the process.</p>
      </abstract>
      <kwd-group>
        <kwd>eol&gt;Streaming conformance checking</kwd>
        <kwd>Real-time analytics</kwd>
        <kwd>Process mining</kwd>
        <kwd>Scalability</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>1
1
2
1
3
2
1</p>
      <p>
        A
B
A
D
A
C
B
of-order data may overcompensate at times when events
arrive mostly on time and undercompensate when out-of- order event arrival [
        <xref ref-type="bibr" rid="ref5">11</xref>
        ]. An event is considered out of
order event arrival is frequent. Thus, ideally, any method order when it arrives from a source after records with
that handles out-of-order event arrival should be adap- later event timestamps. Many factors can explain why a
tive in responding to the stream’s characteristics. Such stream may have events arriving out of order, the most
an adaptive streaming conformance checker would be typical reasons being network reliability, bandwidth, and
useful in industries where it is critical to have an up-to- load [
        <xref ref-type="bibr" rid="ref6">12</xref>
        ].
date overview of process executions, but the underlying Because a data stream is inherently infinite, the
memsystems may introduce out-of-order events; for example, ory requirements on a stream processing system would
healthcare, clickstream analytics, manufacturing, and be unbounded. Thus, the concept of windows is used
smart mobility. to segment time into smaller units [13]. Windows allow
      </p>
      <p>The contributions of this paper are threefold: (1) a aggregations and joins of multiple streams within
spestreaming conformance-checking approach capable of cific timeframes. In the presence of out-of-order events,
handling out-of-order events, (2) a novel approach for windows need to be maintained and possibly entirely
adaptive handling of event-time progress in business recalculated. An example of out-of-order event arrival
process streams, and (3) lifting an existing state-of-the- in the context of windows, and a comparison between
art approach to using Apache Flink, allowing for a truly event time and system ingestion time is shown in
Figstreaming and scalable conformance checking solution. ure 2. In this example, aggregating based on windows</p>
      <p>
        The rest of the paper is structured as follows: in Sec- would indicate an increasing trend if looking at ingestion
tion 2, we introduce the background, discussing out-of- time (3, 5, 6), while based on the event time, it is actually
order event arrival in data streams, business process char- a downward trend (7, 5, 2).
acteristics, and the current state of the art in streaming Since there is no limit on the potential delay of event
conformance checking. In Section 3, we discuss an ex- arrival, an additional concept called watermarks is used in
isting conformance-checking to be extended to handle data streams to keep track of event time progress and,
esout-of-order event streams. We look at how the method sentially, limit the number of windows [14]. This avoids
would handle out-of-order events and how the solution potential time lag due to outliers or faulty data that could
can adapt based on the frequency of out-of-order event stretch windows indefinitely. While commonly,
waterarrival. Section 4 describes the setting for running the marks are based on event time [
        <xref ref-type="bibr" rid="ref6">12</xref>
        ], this is complicated
experiments and the results of the experiments, together for business process streams due to the characteristics of
with a discussion of the results. Finally, Section 5 con- business processes and the fact that commonly, a single
cludes the work. process execution does not have direct dependencies to
other concurrent process executions, as we will see next.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Background</title>
      <sec id="sec-2-1">
        <title>2.2. Business Process Characteristics</title>
        <p>
          2.1. Out-of-order Event Streams Data analytics answers questions such as what happened
and when [15]. However, traditional data analytics does
Modern stream processing frameworks are designed for not generally consider data from a business process
viewanalytics that build upon recent data stream aggregates, point. Processes have a defined course of action and
especially in cases where real-time results are essential, constitute specific activities [
          <xref ref-type="bibr" rid="ref7">1</xref>
          ], thus enabling answers
such as systems monitoring and decision support [
          <xref ref-type="bibr" rid="ref12 ref4">6, 10</xref>
          ]. to questions such as why something happened.
A data stream that features data arriving from multiple Process models describe the behavior allowed in a
prosources or a system built upon distributed systems will cess. Many notations exist with various characteristics,
likely display a stream imperfection known as
out-of
        </p>
        <p>D
B
E</p>
        <p>C</p>
        <p>E</p>
        <p>B
E</p>
        <p>B
D
C
E</p>
        <p>
          C
B
E
such as Petri Nets, Directly-Follows Graphs, and BPMN
models [
          <xref ref-type="bibr" rid="ref7">1</xref>
          ]. Most process models visualize the activities
within the process and allow for behavior such as
sequences, parallelism, choices, and loops. In the context
of this paper, the method employs the trie data
structure for describing process behavior [16]. A
shortcoming of the trie is that it cannot explain parallelism and
loops, thus being an approximation by a sample of the
full process behavior; however, the trie is well suited for
conformance-checking purposes, where the goal is to
compare event data to the allowed behavior eficiently.
        </p>
        <p>An example of the trie sampled from the process model
in Figure 1 is shown in Figure 3.</p>
        <p>The actual process executions are stored in event data.</p>
        <p>Commonly, a single event contains attributes such as
the case identifier (Case ID) , activity, and timestamp. All
activities having the same case identifier make up a trace
– a specific process execution instance. Events can be
stored in static logs, called event logs, or in event streams,
where each event arrives as it occurs in the real world.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.3. Streaming Conformance Checking</title>
        <p>
          Conformance checking is the examination of process
executions and quantifying the non-conformant
behavior compared to the behavior allowed by a process
model [
          <xref ref-type="bibr" rid="ref8">2</xref>
          ]. The state-of-the-art output of a conformance
checker is an alignment between the actual behavior
(log moves) and a complete run allowed by the model
(model moves) [17]. Alignments have the benefit of being
more readily interpretable than most other diagnostic
methods and have thus enjoyed wide adoption by the
research community. An example alignment is shown in
Table 2, with the log moves ⟨, , ⟩ indicating the
actual process executions and the model moves ⟨, , ⟩
indicating a valid path in the model. The skip symbol ≫
indicates non-conformant behavior, typically associated
with a conformance cost.
        </p>
        <p>
          Traditional conformance checking is done on event
logs extracted from business systems. While such a static
approach is robust, it has some shortcomings; most
notably, the data is obsolete by design. Thus, several recent
research eforts have made conformance checking on
event streams viable [
          <xref ref-type="bibr" rid="ref10 ref11 ref9">3, 18, 5, 4</xref>
          ].
        </p>
        <p>log moves
model moves</p>
        <p>A
A
≫
B</p>
        <p>D
D
≫
E</p>
        <p>
          The work in [
          <xref ref-type="bibr" rid="ref10">4</xref>
          ] introduced the concept of alignments
to streaming conformance checking by utilizing prefix
alignments. In event streams, it is unknown whether the
process executions have terminated, and a prefix
alignment has the benefit of not penalizing the observations
by expecting an entire model run. At the same time, a
preifx alignment in a streaming setting allows a high level
of interpretability of the discrepancy, which becomes
especially relevant in larger processes.
        </p>
        <p>
          While the initial prefix-alignment-based approaches
ofered guarantees on the exactness of the results, they
sufered from a relatively long latency, rendering the
applicability of the methods challenging for faster event
streams. Thus, new approaches have emerged recently.
In [
          <xref ref-type="bibr" rid="ref11">5</xref>
          ], an approximate approach was introduced – IWS –
for calculating prefix alignments on top of event streams.
The algorithm utilizes the trie data structure as the
allowed process behavior, improving the calculation time of
alignments in some cases by several orders of magnitude
while introducing only a moderate error compared to the
previous state of the art. The work in [18] built upon the
IWS algorithm, extending the approach with
completeness and confidence metrics, allowing for quantification
of warm-starting scenarios and a trace’s potential
conclusion in a stream. It is the first prefix-alignment-based
method that supports warm-starting scenarios in
streaming settings – i.e., cases where the stream started before
the conformance checker began measuring the results.
In the next section, we will look deeper into the IWS
approach to see why we think it is well suited to
handle stream imperfections, and we present extensions to
the original algorithm to handle out-of-order events and
to adapt to changing frequency of out-of-order event
arrival.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <sec id="sec-3-1">
        <title>3.1. Decay Time: Tracking Event Progress in IWS</title>
        <p>
          The IWS algorithm [
          <xref ref-type="bibr" rid="ref11">5</xref>
          ] allows for fast and approximate
conformance checking while under memory constraints
in a streaming setting. For eficiency, IWS uses the trie
trie
        </p>
        <p>X
E
decay unprocessed
time suffix
- 3 | A arriving</p>
        <p>events
A 3 A</p>
        <p>B
C
D
E</p>
        <p>A</p>
        <p>A
prefix
alignment</p>
        <p>X
E
3
- 2 | A,X
A 2 | A</p>
        <p>B
C
D
E</p>
        <p>A
X
A X
A X</p>
        <p>X
E
- 1 | A,X,B
A 1 X,B
2 | B</p>
        <p>3
B
C
D
E</p>
        <p>A
X</p>
        <p>B
A X B
A &gt;&gt; B
data structure to compare the event data to the expected repurposed to implement out-of-order handling into the
process behavior. Due to the nature of business process algorithm, as will be described in the next section.
executions, it may sometimes be necessary to re-calculate
the optimal route in the trie, and due to erroneous activi- 3.2. Event Time Store
ties, the method may end up on a wrong path in the trie.</p>
        <p>
          Thus, the IWS approach uses a state bufer for keeping To handle out-of-order events, the algorithm is extended
some past states available for recalculation and a dis- by an event time store that keeps the event time of each
counted decay time hyperparameter to release old states arrived event. The event time store is an ordered
keyfrom memory. The decay time is calculated based on value pair, with the key denoting the event time and the
the diference between the average length from the root value being an array of events that occurred during this
to a leaf node in the trie, indicating a roughly average time. For simplicity, we assume that if events have the
expected trace length, and the current trace length. The same event time, the arrival order is the correct total
outcome is multiplied with the discounting factor, a value order of these events. In other words, the array denotes
between 0 and 1 to indicate the maximum percentage the arrival time of the events having this event time.
of the average trace length that should be kept in mem- Formally, assume that  is the set of timestamps, ℰ is the
ory. For a more detailed explanation and formula, we set of events, and ℰ * is the set of all possible words over ℰ ,
refer to [
          <xref ref-type="bibr" rid="ref11">5</xref>
          ]. Conceptually, the decay time helps the algo- then the event store  is a function  :  → ℰ * . As the
rithm progress event time on a per-trace basis. Traces decay time releases states from memory, the event time
may overlap and have diferent event execution patterns. store releases the earliest events from the event store.
Thus, progressing based on event time, as it is commonly For out-of-order handling, each event time is first
comdone with watermarks, would favor traces with rapid ex- pared to the largest key in the event store as events arrive.
ecutions, while time-consuming process instances would If the new event has a timestamp equal to or larger than
be quickly forgotten, and the analysis would sufer. Thus, the largest key, this event is arriving in order, and
prothe decay time setting in IWS is not time-based but event- cessing continues as usual. If the largest key is larger
based in the scope of a specific trace. than the timestamp of the arrived event, all the events
        </p>
        <p>An example of the state bufer and the decay time are with a larger timestamp in the event store are considered
shown in Figure 4. With the arrival of the first event, , out-of-order events and are piped for a new replay.
Furtwo states are initiated, with the state at the root node thermore, any states in the state bufer that have played
holding  in its unprocessed sufix. The unprocessed out any out-of-order events are removed from the bufer,
sufix is used to replay the moves upon the arrival of while the rest of the states remove the unprocessed sufix
the next events. Decay time indicates how many events that matches the new sequence of events.
within this trace should arrive before the state will be To illustrate, let’s consider an example of allowed
becleared from memory. The optimal alignment at each havior as shown by the trie in Figure 3.
event arrival is also shown. For event arrival , there Assume we observed events ⟨, , ⟩ with event
are actually multiple optimal alignments, but only one timestamps of 1, 3 and 5, respectively. While this trace
alignment is shown in the figure for illustrative purposes. could have multiple optimal alignments, for simplicity,
Importantly, the state bufer allows the retraction of the assume we have the same alignment as in Table 2.
false path traversal to node . This behavior can be If the algorithm now receives event  with an event</p>
        <sec id="sec-3-1-1">
          <title>Event time aware</title>
        </sec>
        <sec id="sec-3-1-2">
          <title>Non-event time aware A A A</title>
          <p>A</p>
          <p>B
B
≫
B</p>
          <p>D
D
D
D
≫
≫
E</p>
          <p>B
B
would have events in order. Thus, the formula is applied
globally to all traces within the process.</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3.4. Implementation</title>
        <p>Table 3 In order to be truly scalable, the original algorithm and
Comparison of event time aware and non-aware alignments. the extensions introduced in this paper are implemented
on top of the Beamline framework [20]. The Beamline
timestamp 2, it first checks the event store to see whether framework utilizes Apache Flink as the runtime engine,
the events are arriving in order. The event store’s largest allowing the algorithm’s execution to scale across a
cluskey (5) is larger than the arrived event timestamp (2). ter of computing nodes. Commonly, each individual trace
Thus, all events with timestamps larger than 2 are con- in a business process is looked at separately. Thus,
partisidered out-of-order, and all of the states in memory that tioning by the case identifier would theoretically allow
have played out the events  and  are removed. The scaling of the processing to as many nodes as there are
resulting alignment of the event time aware solution is cases within the process.
shown in Table 3, with a comparison to the original non- The source code for the implementation, together with
event time aware version that assumes all events arrive instructions for running the experiments and the datasets
in order, leading to a higher conformance cost. used, have been made available on GitHub1.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Adaptive Event Time Progress</title>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments</title>
      <p>
        The arrival of out-of-order events cannot be expected
to be static throughout the life of the stream. Thus, ap- 4.1. Setting
proaches in stream processing have been devised to adapt Several real-life event logs were used to test the handling
the watermarks based on concept drifts – changes in data of out-of-order events. The logs had to be manipulated
arrival frequency and delays [
        <xref ref-type="bibr" rid="ref3">9</xref>
        ]. This paper introduces a to mimic the out-of-order scenario, as the original logs
novel approach for adaptive event time progress suited were grouped by trace and in temporal order. The logs
for business process data. Namely, we adopt the Expo- used in this paper are well-known real-life process event
nentially Weighted Moving Averages (EWMA) metric logs: BPI 20122, BPI 20173, and BPI 2020 Travel Permits4.
from inventory and financial planning [ 19] and modify The steps done for running the experiments are shown
it to work as a sensor for indicating the level of out-of- in Figure 5. To limit the scope of the experiments, the
orderedness. logs were first randomly sampled to 100 traces (step 1).
      </p>
      <p>To adapt to the stream’s frequency of out-of-order Then, events within a trace were swapped with various
events, we extend the algorithm with the following settings ranging from no out-of-order events to fully
outmethod to modify the discounting factor. With every of-order events (step 2). The settings are described in
new event, we check if the event is out of order. If it is Table 4, showing the probability of a swap, i.e., how likely
out of order, we assign it a boolean value of 1 and 0 if it a single event is to trade places with another event within
is not. Then, we increase (or decrease) the discounting the same trace, and max distance, i.e., how far from the
factor using the following formula: current position can an event be swapped to. For example,
with the swap_01 setting, each event has a one percent
 =  *  + (1 −  ) *  likelihood of getting swapped with a maximum distance
of one, meaning that it will trade places with the event
directly before or after.</p>
      <p>The unaltered sampled log was used for generating
the trie – the process model that describes the expected
behavior (step 3). Since the IWS algorithm is capable
of streaming conformance checking, the experiments
were also conducted in a streaming fashion using an
MQTT broker. A Python script published the
out-oforder logs to MQTT topics (step 4), and the algorithm</p>
      <p>Where  is the discounting factor,  is the smoothing
factor, and  is the Boolean value of whether it is an
out-of-order event. In our experiments, we found an
alpha of 0.005 to represent an appropriate change in the
discounting factor.</p>
      <p>Intuitively, if the proportion of out-of-order events
has increased, then the discounting factor will increase,
thus keeping in memory a larger amount of states and
allowing for improved out-of-order event handling. If
the frequency of out-of-order events decreases, so too
will the discounting factor, releasing the memory strain.</p>
      <p>We consider it unlikely that any specific trace would
start exhibiting out-of-order behavior while other traces
1https://github.com/DataSystemsGroupUT/
StreamingConformanceChecker
2https://doi.org/10.4121/uuid:3926db30-f712-4394-aebc-75976070e91f
3https://doi.org/10.4121/uuid:5f3067df-f10b-45da-b98b-86ae4c7a310b
4https://doi.org/10.4121/uuid:52fb97d4-4588-43c9-9d04-3604d4613b51
MQTT
broker</p>
      <p>6
Algorithm</p>
      <p>Results
file
Original
log
1
Sampled</p>
      <p>log
is a further order of magnitude slower than the
nonadaptive method, with values of 1.89 ms/event for
nonaware and 139.32 ms/event for the adaptive methods.</p>
      <p>For the other datasets, a similar pattern can be
observed. For BPI2017, the diference between non-aware
and out-of-order handling methods is clear starting from
_50, and for _99 the diference between
nonaware and the adaptive method is almost three orders of
magnitude. It can be observed that the execution time of
the non-aware version of the algorithm does not increase
with increased out-of-orderedness – this is because the
algorithm does no recalculation for out-of-order event
arrival and simply assumes that there is a great deal of
non-conformant behavior occurring in the event stream.</p>
      <p>For BPI2020 log, the results vary slightly more, with
non-aware and non-adaptive versions being roughly
equivalent for _50, and the adaptive method is
faster than the non-adaptive method for _99.
However, this may be due to the fact that this is the smallest
of the datasets, as can be seen by the execution time
remaining under 1-2ms per event.
received the events by subscribing to these topics (step
5). The results of the experiments were outputted to a
ifle on an event-by-event basis (step 6), measuring the
latency of the algorithm – how long it takes to process
an event – and the cost of the latest alignment of the case
to where this particular event belongs to.</p>
      <p>The method was instantiated with the default settings
for the decay time variable: a minimum decay time ()
of 3 and a discounting factor ( ) of 0.3. The method was
run with the adaptive add-on from Section 3.3 turned
on (adaptive), turned of ( non-adaptive), and the
out-oforder handling turned of ( non-aware, i.e., the original 4.2.2. Cost.</p>
      <p>IWS algorithm). A core measure of a conformance checker is the cost. In</p>
      <p>All experiments were executed three times and aver- this paper, we measure the cost of an alignment, i.e.,
simaged to mitigate possible runtime outliers impacting the ilarly to an edit distance diference between the expected
results. The experiments were conducted on a machine process behavior and the actual observed behavior. The
using Java 11 and Python 3.9. The MQTT broker used benefit of using an alignment is explainability, indicating
was EMQX 5.1, which was run using Docker. clearly which part of the process contains the discrepancy.
It is important to remember that the non-aware version
4.2. Results naively assumes that the order in which the events arrive
is the order in which the events happened, thus
nega4.2.1. Latency. tively impacting the alignment cost because the events
were actually in the correct order but swapped. The cost
results are summarized in Table 5, with color-coding from
green (the best result) to red (the worst result) per log
and swap variation.</p>
      <p>An observation can be made that as the amount of
swaps increases, the cost increases. This is true for all
executions, except the adaptive algorithm on BPI2012
that slightly decreases cost for _99 compared to
_50. This is due to the randomness of the
out-oforderedness in the generated event data. In general, the
cost increase is due to the fact that with the swaps, we
have introduced superficial non-conformant behavior. As
was shown in Table 4, the higher swap settings increase
the likelihood and distance of an event displacement, thus
having an increased amount of non-conformant behavior.</p>
      <p>Comparing the diferent versions of the algorithm, it
is clear that the non-aware version severely penalizes
the out-of-order events. The diference between adaptive
and non-adaptive versions is minuscule until _20,
when the adaptive versions starts to outperform the
nonswap_01
swap_05
swap_10
swap_20
swap_50
swap_99
swap_00
swap_01
swap_05
swap_10
swap_20
swap_50
swap_99
swap_00
swap_01
swap_05
swap_10
swap_20
swap_50
swap_99
adaptive version, and the _50 and _99
variations have an almost double the diference in cost, in
favor of the adaptive version.</p>
      <sec id="sec-4-1">
        <title>4.3. Discussion</title>
        <p>Based on the results, we can say that we have
introduced a streaming conformance-checking approach
capable of handling out-of-order events. Furthermore, it
seems that the adaptive handling of event-time progress
is well suited for adapting to an increased load of
out-oforder event arrivals. In general, the introduced
methodology seems to work well for handling out-of-order events,
even for streams where the portion of out-of-orderedness
is relatively high. As expected, higher amounts of
out-oforder events impact latency negatively. At the same time,
the cost is greatly improved compared to the original IWS
algorithm, which is unaware of event time. For smaller
business processes, such as BPI2020, the adaptive method
has low latency even with extreme out-of-orderedness.
However, with more complex business processes, such as
BPI2017, the non-adaptive method may be more sensical
from the latency perspective.</p>
        <p>Some threats of validation include the fact that only a
few datasets were used in this comparison. Furthermore,
the out-of-orderedness had to be mimicked because no
known process mining logs or streams that exhibit
outof-order events are publicly available. The swap settings
could be further investigated, as the probabilities and max
distances could increase orthogonally, not in correlation.</p>
        <p>One thing to address in future research would be the
fact that if multiple events have the same timestamp, then
the method should not blindly assume that the arrival
order within the timestamp is correct. This is seen, for
example, on the BPI2012 dataset, with many
simultaneous timestamps. Having a method that would be able to
ifnd the optimal solution from partial order would be a
further improvement to the introduced approach.</p>
        <p>Ultimately, as the results are positive, and the latencies
are generally low for most experiments, we believe this
method would be applicable for real-life use cases for
running conformance checking on distributed systems.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>This paper introduced an approach for handling
outof-order event arrival in streaming conformance
checking. To the best of our knowledge, this is the first
conformance-checking approach that is aware of event
time and, thus, is able to handle such stream
imperfections. The contribution included a novel approach for
adapting the event time progress based on the frequency
of out-of-order event arrival. The approach was
implemented using the Beamline framework built on top of
adap.</p>
      <p>non-adap.</p>
      <p>non-aw.</p>
      <p>adap.</p>
      <p>non-adap.</p>
      <p>non-aw.</p>
      <p>adap.</p>
      <p>non-adap.</p>
      <p>non-aw.
0.0
1.6
6.0
13.7
24.9
44.9
48.7</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work was supported by the European Social Fund
via "ICT programme" measure, the European Regional
Development Fund, and the programme Mobilitas Pluss
(2014-2020.4.01.16-0024).</p>
      <p>Apache Flink, making the method easily scalable and able
to handle large data volumes in stream processing.</p>
      <p>Future research aims to look at handling partial order
in event streams, as the current method assumes the
arrival order for events having the same event time is the
total order. Furthermore, a more extensive study could
be done of the swap settings in event streams and their
impact on out-of-order event handling.
[18] K. Raun, M. Nielsen, A. Burattin, A. Awad, C-3PA:
Streaming conformance, confidence and
completeness in prefix-alignments, in: International
Conference on Advanced Information Systems
Engineering, Springer, 2023, pp. 437–453.
[19] P. R. Winters, Forecasting sales by exponentially
weighted moving averages, Management science 6
(1960) 324–342.
[20] A. Burattin, Streaming process mining with
beamline, ICPM Demos (2022).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Burattin</surname>
          </string-name>
          , Streaming process mining,
          <source>Process Mining Handbook</source>
          (
          <year>2022</year>
          )
          <fpage>349</fpage>
          -
          <lpage>372</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Awad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Weidlich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sakr</surname>
          </string-name>
          ,
          <article-title>Process mining over unordered event streams</article-title>
          ,
          <source>in: 2020 2nd International Conference on Process Mining (ICPM)</source>
          , IEEE,
          <year>2020</year>
          , pp.
          <fpage>81</fpage>
          -
          <lpage>88</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Awad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Traub</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sakr</surname>
          </string-name>
          ,
          <article-title>Adaptive watermarks: A concept drift-based approach for predicting eventtime progress in data streams</article-title>
          .,
          <source>in: EDBT</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>622</fpage>
          -
          <lpage>625</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Soto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Markl</surname>
          </string-name>
          ,
          <article-title>A survey on transactional stream processing</article-title>
          ,
          <source>The VLDB Journal</source>
          (
          <year>2023</year>
          )
          <fpage>1</fpage>
          -
          <lpage>29</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Tufte</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Shkapenyuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Papadimos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Johnson</surname>
          </string-name>
          , D. Maier,
          <article-title>Out-of-order processing: a new architecture for high-performance stream systems</article-title>
          ,
          <source>Proceedings of the VLDB Endowment</source>
          <volume>1</volume>
          (
          <year>2008</year>
          )
          <fpage>274</fpage>
          -
          <lpage>288</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>M.</given-names>
            <surname>Fragkoulis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Carbone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Kalavri</surname>
          </string-name>
          ,
          <string-name>
            <surname>A</surname>
          </string-name>
          . Katsifodi-
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [1]
          <string-name>
            <surname>W. M. van der Aalst</surname>
          </string-name>
          ,
          <article-title>Process mining: a 360 degree mos, A survey on the evolution of stream prooverview</article-title>
          ,
          <source>in: Process Mining Handbook</source>
          , Springer, cessing systems,
          <source>arXiv preprint arXiv:2008.00842</source>
          <year>2022</year>
          , pp.
          <fpage>3</fpage>
          -
          <lpage>34</lpage>
          . (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Carmona</surname>
          </string-name>
          ,
          <string-name>
            <surname>B. F. van Dongen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Solti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Weidlich</surname>
          </string-name>
          , [13]
          <string-name>
            <given-names>K.</given-names>
            <surname>Patroumpas</surname>
          </string-name>
          , T. Sellis,
          <article-title>Window specification Conformance Checking - Relating Processes and over data streams</article-title>
          , in: International Conference on Models, Springer,
          <year>2018</year>
          . Extending Database Technology, Springer,
          <year>2006</year>
          , pp.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Burattin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Carmona</surname>
          </string-name>
          ,
          <article-title>A framework for online 445-464</article-title>
          . conformance checking, in: International Confer- [14]
          <string-name>
            <given-names>P.</given-names>
            <surname>Carbone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Katsifodimos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ewen</surname>
          </string-name>
          , V. Markl, ence on Business Process Management, Springer, S. Haridi,
          <string-name>
            <given-names>K.</given-names>
            <surname>Tzoumas</surname>
          </string-name>
          , Apache flink:
          <source>Stream and 2017</source>
          , pp.
          <fpage>165</fpage>
          -
          <lpage>177</lpage>
          .
          <article-title>batch processing in a single engine, The Bulletin</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [4]
          <string-name>
            <surname>S. J. van Zelst</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bolt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hassani</surname>
          </string-name>
          ,
          <string-name>
            <surname>B. F. van Dongen</surname>
          </string-name>
          , of the Technical Committee on Data Engineering W. M. van der Aalst, Online conformance check-
          <volume>38</volume>
          (
          <year>2015</year>
          ).
          <article-title>ing: relating event streams to process models using</article-title>
          [15]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Strang</surname>
          </string-name>
          ,
          <article-title>Big data analytics serprefix-alignments, International Journal of Data vices for enhancing business intelligence</article-title>
          ,
          <source>Journal of Science and Analytics</source>
          <volume>8</volume>
          (
          <year>2019</year>
          )
          <fpage>269</fpage>
          -
          <lpage>284</lpage>
          .
          <source>Computer Information Systems</source>
          <volume>58</volume>
          (
          <year>2018</year>
          )
          <fpage>162</fpage>
          -
          <lpage>169</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>K.</given-names>
            <surname>Raun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Tommasini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Awad</surname>
          </string-name>
          , I will survive: An [16]
          <string-name>
            <given-names>A.</given-names>
            <surname>Awad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Raun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Weidlich</surname>
          </string-name>
          ,
          <article-title>Eficient approxievent-driven conformance checking approach over mate conformance checking using trie data strucprocess streams</article-title>
          , in: DEBS, ACM,
          <year>2023</year>
          , pp.
          <fpage>49</fpage>
          -
          <lpage>60</lpage>
          . tures, in: 2021 3rd International Conference on
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>H.</given-names>
            <surname>Isah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Abughofa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mahfuz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ajerla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Zulk- Process Mining</surname>
          </string-name>
          (ICPM), IEEE,
          <year>2021</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          . ernine, S. Khan,
          <article-title>A survey of distributed data</article-title>
          [17]
          <string-name>
            <surname>W. van der Aalst</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Adriansyah</surname>
          </string-name>
          ,
          <string-name>
            <surname>B. van Dongen</surname>
          </string-name>
          ,
          <article-title>Restream processing frameworks, IEEE Access 7 playing history on process models for conformance (</article-title>
          <year>2019</year>
          )
          <fpage>154300</fpage>
          -
          <lpage>154316</lpage>
          .
          <article-title>checking and performance analysis</article-title>
          ,
          <source>WIREs Data Mining and Knowledge Discovery</source>
          <volume>2</volume>
          (
          <year>2012</year>
          )
          <fpage>182</fpage>
          -
          <lpage>192</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>