Towards Run-Time Adaptation of Complex Event Forecasting

Manolis Pitsikalis (1,*), Elias Alevizos (1,2), Nikos Giatrakos (3) and Alexander Artikis (4,1)

(1) NCSR Demokritos, Athens, Greece
(2) The American College of Greece, Athens, Greece
(3) Technical University of Crete, Chania, Greece
(4) University of Piraeus, Greece

Published in the Proceedings of the Workshops of the EDBT/ICDT 2025 Joint Conference (March 25-28, 2025), Barcelona, Spain.
* Corresponding author. Emails: manospits@iit.demokritos.gr (M. Pitsikalis); alevizos.elias@iit.demokritos.gr (E. Alevizos); ngiatrakos@tuc.gr (N. Giatrakos); a.artikis@iit.demokritos.gr (A. Artikis). Homepages: https://manospits.github.io/ (M. Pitsikalis); http://users.softnet.tuc.gr/~ngiatrakos/ (N. Giatrakos); https://users.iit.demokritos.gr/~a.artikis/ (A. Artikis).

Abstract
Complex Event Forecasting (CEF) is a process whereby complex events of interest are forecast over a stream of simple events. CEF facilitates proactive measures by anticipating the occurrence of complex events. This proactive property makes CEF a crucial task in many domains; for instance, in maritime situational awareness, forecasting the arrival of vessels at ports allows for better resource management and higher operational efficiency. However, our world's dynamic and evolving conditions necessitate the use of adaptive methods. For example, maritime vessels adapt their routes based on weather or human-caused factors. CEF systems typically rely on probabilistic models, trained on historical data. This renders such CEF systems inherently susceptible to data evolutions that can invalidate their underlying models. To address this problem, we propose RTCEF, a novel framework for Run-Time Adaptation of CEF, based on a distributed, service-oriented architecture. We evaluate RTCEF on a real-world maritime use case, and our reproducible results show that our proposed approach has significant benefits in terms of forecasting performance without sacrificing efficiency.

Keywords: complex event forecasting, run-time adaptation, distributed architecture

1. Introduction

Complex Event Forecasting (CEF) is akin to Complex Event Recognition (CER) [1, 2], but with a forward-looking perspective. Both tasks operate on a stream of simple events, while their output consists of Complex Events (CEs). For example, in maritime situational awareness [3], the stream of simple events would contain positional messages of vessels, while the output stream would contain maritime CEs such as fishing activities. The difference between CER and CEF is that, in the former, elements of the output stream refer to CE detections, while in CEF, elements of the output stream refer to the probability of a CE happening in the future. Consequently, CER enables reactive responses upon CE detections, while CEF supports proactive measures by anticipating future CEs. This proactive property renders CEF systems highly desirable. CER and CEF applications span diverse domains, such as maritime situational awareness [4, 5], whereby CEs such as (illegal) fishing or vessel rendezvous are detected or forecast over a stream of maritime data; credit card fraud management [5, 6], whereby frauds are detected or forecast over a stream of transaction data; and so on.

CEF operates over constantly evolving conditions. Moreover, CEF systems rely on probabilistic models trained on historical data [7, 8, 9]. This renders CEF systems inherently susceptible to evolutions in the input that can invalidate their underlying models. Additionally, as with the majority of trainable models, CEF models have hyperparameters that require fine-tuning for optimal performance. Wayeb [5], a state-of-the-art CEF engine, is no exception to the above.

To address the above challenges, we propose RTCEF, an open-source framework for Run-Time Adaptation of CEF over constantly evolving data streams. RTCEF adopts a distributed architecture comprising targeted services to effectively (a) enable run-time updates of CEF models with little to no downtime, and (b) ensure that the transition between models does not cause loss of ongoing forecasts. In other words, RTCEF supports continuous adaptation to dynamic changes in the input stream with little to no effect on efficiency. Furthermore, RTCEF provides a trend-based policy which acts as a decision-making mechanism to distinguish whether hyperparameter optimisation, or CEF model retraining without changing hyperparameters, is the best way to maintain accurate forecasts. We evaluate RTCEF on maritime situational awareness using real-world maritime data, and our results demonstrate that RTCEF has significant benefits for CEF over constantly evolving conditions.

The remainder of this paper is organised as follows. In Section 2 we present the necessary background. Next, in Section 3 we introduce offCEF and RTCEF, while in Section 4 we present our experimental setting and analyse our results. In Section 5 we mention works related to ours. Finally, in Section 6, we summarise and discuss future directions.

2. Background

CEF is a task that allows forecasting CEs of interest, such as fishing activities or vessel rendezvous, over an input stream of simple events, e.g., timestamped position messages of maritime vessels. Forecasts involve the occurrence of a CE in the future, accompanied by a degree of certainty [5]. This behaviour is usually derived from stochastic models that project into the future those evolutions of the input that can cause a detection of a CE. For the task of CEF, we utilise Wayeb, a CEF engine introduced in [5], which employs symbolic automata as its computational model. The user submits a query/pattern to Wayeb, which is then compiled into a symbolic, streaming automaton. This automaton may be used to perform event recognition, i.e., to detect instances of pattern satisfaction upon a stream of input events.
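To make the automaton-based detection described above concrete, the following is a minimal Python sketch of a streaming symbolic automaton. It is our own simplification for illustration, not Wayeb's actual implementation: transitions are labeled with predicate functions over events, a non-matching event resets matching, and a CE is reported whenever the final state is reached.

```python
# Minimal sketch of a streaming symbolic automaton (illustration only;
# Wayeb's actual implementation differs). Transitions are labeled with
# Boolean predicates over input events; entering a final state reports
# a complex event (CE) detection.

def run_symbolic_automaton(transitions, final_states, stream, start=0):
    """Feed events one by one; collect the timestamps of CE detections."""
    state = start
    detections = []
    for event in stream:
        for src, predicate, dst in transitions:
            if src == state and predicate(event):
                state = dst
                break
        else:
            state = start  # no outgoing transition satisfied: reset (simplification)
        if state in final_states:
            detections.append(event["timestamp"])
            state = start  # restart matching after a detection
    return detections

# Illustrative pattern: two consecutive events with speed above 10 knots.
gt10 = lambda e: e["speed"] > 10
transitions = [(0, gt10, 1), (1, gt10, 2)]
stream = [{"speed": s, "timestamp": t}
          for t, s in enumerate([5, 3, 9, 14, 11], start=1)]
print(run_symbolic_automaton(transitions, {2}, stream))  # CE at timestamp 5
```

The reset-on-mismatch behaviour is a simplification that suffices for strictly consecutive patterns such as the two-high-speed-events example; a full engine maintains several runs of the automaton in parallel.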
Whenever the automaton reaches a final state, a complex event is reported as having occurred. In order to perform forecasting, Wayeb constructs a probabilistic model of the compiled automaton, by using part(s) of a stream for training. The model allows us to infer, at any given moment, the possible paths that the automaton may follow in the future. By searching among the possible future paths, we can estimate when the automaton is expected to reach a final state and thus report a CE. The output of Wayeb thus consists of two streams: a) one reporting the detected events, and b) one reporting the forecasts of events expected to occur in the future.

Wayeb has clear, compositional semantics for the patterns expressed in its language and can support most of the common operators [1]. Wayeb's patterns are expressed as Symbolic Regular Expressions (SREs), where terminal expressions are Boolean expressions, i.e., logical formulae that use the standard Boolean connectives of conjunction '∧', disjunction '∨' and negation '¬' on predicates [5]. Wayeb SREs are defined using the grammar below:

R ::= R1 + R2 (union) | R1 · R2 (concatenation)
    | R1* (Kleene-star) | !R1 (complement)
    | ψ (Boolean expression)

R1, R2 are regular expressions, and ψ is a Boolean expression. The semantics of the above operators are detailed in [5]. Evaluation of SREs on a stream of events requires first their compilation into symbolic automata. Transitions in symbolic automata are labeled with Boolean expressions. For a symbolic automaton to move to another state, it first applies the Boolean expressions of its current state's outgoing transitions to the element last read from the stream. If an expression is satisfied, then the corresponding transition is triggered and the automaton moves to that transition's target state.

For example, in maritime situational awareness, a domain expert could use Wayeb's language to specify a pattern R := (speed > 10) · (speed > 10) for identifying speed violations in specific areas where the maximum allowed speed is 10 knots. This pattern is satisfied when there are two consecutive events where a vessel's speed exceeds the threshold. The compiled automaton corresponding to R is illustrated in Figure 1.

Figure 1: Streaming symbolic automaton created from the expression R := (speed > 10) · (speed > 10).

Table 1
An example stream composed of five events. Each event has a vessel identifier, a value for that vessel's speed and a timestamp.

vessel ID | 78986 | 78986 | 78986 | 78986 | 78986 | ...
speed     |     5 |     3 |     9 |    14 |    11 | ...
timestamp |     1 |     2 |     3 |     4 |     5 | ...

For an input stream consisting of the events in Table 1, the automaton would run as follows. For the first three input events, the automaton remains in state 0. After the fourth event, it moves to state 1, and after the fifth event it reaches its final state, state 2, also triggering a CE detection for R at timestamp = 5.

To perform CEF, Wayeb needs a probabilistic description for a symbolic automaton derived from a SRE. For this purpose, Wayeb employs Prediction Suffix Trees (PSTs) [10, 11], a form of Variable-order Markov Model. Variable-order Markov Models, compared to fixed-order Markov models, capture longer-term dependencies, as in practice they allow for higher order (m) values than the latter. Each node in a PST contains a "context" and a distribution that indicates the probability of encountering a symbol, conditioned on the context. Figure 3 (top left) shows an example of a PST. Each "symbol" of a PST corresponds to a predicate of the automaton for which we want to build a probabilistic model. For example, the predicate (speed > 10) may be such a "symbol" for the pattern R. The same predicate but negated, i.e., ¬(speed > 10), may be another such "symbol". Learning a PST from data is an incremental process that adds new nodes corresponding to symbols only when necessary [11, 6, 5]. The learning process involves two key hyperparameters. First, the pMin ∈ [0, 1] hyperparameter corresponds to a threshold determining which symbols are deemed "too rare" to be taken under consideration by the learning algorithm (symbols with a probability of appearance less than pMin are discarded). Second, the γ hyperparameter is a symbol distribution smoothing parameter.

With the resulting PST, for every state q of an automaton and the last m (order of the PST) symbols of the input stream, we can calculate the waiting-time distribution (W_q), that is, the probability of reaching a final state in n transitions from a state q. Recall that a CE is detected whenever an automaton reaches a final state. Figure 3 (middle and bottom left) shows an example of an automaton and the waiting-time distributions learnt from a training dataset. Wayeb then performs CEF as follows. Given the current state q of an automaton, using W_q, we compute the probability of reaching a final state (p_CE) within the next n transitions (or, equivalently, input events). If p_CE exceeds a confidence threshold θfc ∈ [0, 1], Wayeb emits a "positive" forecast (denoting that the CE is expected to occur); otherwise, a "negative" forecast (no CE is expected) is emitted.

A forecast for a CE is characterised as a True Positive (TP) if a positive forecast (i.e., the CE will occur in the future) was emitted and the CE indeed occurred or, respectively, as a False Positive (FP) if the CE did not occur. A forecast for a CE is characterised as a True Negative (TN) if a negative forecast was emitted (i.e., the CE will not occur in the future) and the CE does not occur or, respectively, as a False Negative (FN) if the CE does occur. Note that a forecast cannot be evaluated as TP, FP, TN or FN upon its emission. It can be evaluated as such only after the next n input events have arrived, at which point we can know whether the forecast event did occur or not. Since Wayeb performs both CEF and CER, forecasts are evaluated on-the-fly. Using these classifications of forecasts, the performance of CEF may be quantified through the Matthews Correlation Coefficient (MCC), which is defined as follows:

MCC = √(Precision × Recall × Specificity × NPV) − √(FDR × FNR × FPR × FOMR)    (1)

where NPV = TN/(TN + FN), Specificity = TN/(TN + FP), FDR = 1 − Precision, FNR = 1 − Recall, FPR = 1 − Specificity and FOMR = 1 − NPV. Precision and Recall are defined as usual. Therefore, MCC ∈ [−1, 1] estimates the agreement (MCC = 1) or, respectively, disagreement (MCC = −1) between the emitted forecasts and observations.
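For concreteness, Equation (1) can be computed directly from the four forecast counts. The helper below is a plain illustration of the formula, not part of Wayeb, and assumes that none of the four ratio denominators is zero.

```python
import math

def mcc(tp, fp, tn, fn):
    """Matthews Correlation Coefficient from forecast counts, per Equation (1).

    Assumes non-degenerate counts (all denominators below are non-zero)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # a.k.a. sensitivity
    specificity = tn / (tn + fp)
    npv = tn / (tn + fn)             # negative predictive value
    return (math.sqrt(precision * recall * specificity * npv)
            - math.sqrt((1 - precision) * (1 - recall)
                        * (1 - specificity) * (1 - npv)))

# Perfect forecasting gives MCC = 1; chance-level forecasting gives MCC = 0.
print(mcc(tp=50, fp=0, tn=50, fn=0))    # 1.0
print(mcc(tp=25, fp=25, tn=25, fn=25))  # 0.0
```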
In contrast to F1-Score, which takes into account only pose, Wayeb employs Prediction Suffix Trees (PSTs) [10, 11]โ€“ positive instances, MCC takes into account both positive Performance Metric Performance Metric Performance Metric Dinit a(x) value next micro-benchmark Completed micro- benchmarks x x x x (b) Fitting GPR around micro- (c) The acquisition function (d) A GPR concludes after (a) Before ๐ท๐‘–๐‘›๐‘–๐‘ก , shaded area benchmarks in ๐ท๐‘–๐‘›๐‘–๐‘ก (red chooses next micro- shows uncertainty of per- a number of BO micro- lines). Blue line shows ex- benchmark ๐‘ฅ of high benchmarks. formance (๐‘“ (๐‘ฅ)) values. pected performance. uncertainty. Figure 2: Bayesian Optimisation Operation. Model Factory (Offline) Controller (offline) Wayeb Server BO optimiser where ๐‘š, ๐œƒfc , pMin and ๐›พ are Wayebโ€™s hyperparameters โ€ข Learn a prediction suffix tree Hyper- Acquisition function (see Section 2) with their domain empirically set as: Perform CEF under the stationarity assumption ฮต, (0.6, 0.4) parameters Acq. func. value a, (0.7, 0.3) b, (0.5, 0.5) [m, ฮธfc, next micro-benchmark aa, (0.75, 0.25) ba, (0.1, 0.9) pMin, ฮณ] โ€ข Estimate waiting time distributions ๐‘š โˆˆ [1, 5] ๐œƒfc โˆˆ [0.0, 1.0] a a a Historical pMin โˆˆ [0.0001, 0.01] ๐›พ โˆˆ [0.0001, 0.01] a b b b data C Given the infinite parameter combinations, exhaustive TRAIN st a r t 0 1 2 3 4 ยฌconvergence ? new micro-bench.:deploy search is computationally prohibitive. Furthermore, due b a opt. model โ€ข Construct forecasts b 1 state:0 Saved BO (GPR) Model to Wayebโ€™s complexity, performance for a given parameter Completion Probability interval:5,12 models set cannot be known beforehand. 
Consequently, to find 0.8 state:1 0.6 state:2 Score state:3 0.4 Report the optimal configuration ๐‘ we employ Bayesian optimisa- MCC score tion (BO) [12, 13] i.e., a stochastic method for optimising 0.2 TEST 0 1 2 3 4 5 6 7 8 9 10 11 12 Number of future events C expensive-to-evaluate objective functions that are complex or cannot be described by analytic formulae. In our work, Figure 3: Architecture of the offline CEF optimiser (offCEF). the objective function is defined as ๐‘“ (๐‘) = MCC ๐‘ , where MCC ๐‘ denotes the MCC score of Wayeb given configura- tion ๐‘. and negative instances. Since Wayeb produces both positive The goal of BO is to find the vector of Wayebโ€™s hyperpa- and negative forecasts MCC is a fitting choice. rameters that maximises CEF performance, using a minimal Given the above, the hyperparameters required for train- set of Wayeb training-test runs, termed โ€˜micro-benchmarksโ€™, ing Wayeb models, i.e., PSTs, are the following. The max- as training samples. Unlike other optimisation methods [14] imum order ๐‘š of the PST, along with the symbol retain- BO does not require a high number of micro-benchmarks ing probability threshold pMin, the symbol distribution or an analytical formula [12, 13, 6]. BO employs a proba- smoothing parameter ๐›พ and the confidence threshold ๐œƒfc . bilistic modelโ€”called surrogate modelโ€”to approximate the The naive way to train a Wayeb PST is to manually fix the unknown objective function, in our case CEF performance values of these hyperparameters and then select a training quantified by MCC , and iteratively refines this model. We dataset from which a PST may be extracted. This process employ a Gaussian Process Regressor (GPR) as the surrogate can be performed offline and Wayeb may then employ the model. Initial beliefs about the objective function must be learnt PST for online event forecasting. As we explain below, formulated before observing any data. 
In BO, priors are this is not the proper way to go. often specified for the mean and covariance functions of the Gaussian Process model. For example, a prior belief might suggest that the function is smooth and lies within a certain 3. Run-Time CEF Adaptation range of values. Priors are represented as: We first present offCEF, a baseline framework for hyperpa- ๐‘“ (๐‘) โˆผ GP(๐œ‡0 (๐‘), ๐‘˜0 (๐‘, ๐‘โ€ฒ )) rameter optimisation of CEF under the stationarity assump- where ๐œ‡0 (๐‘) and ๐‘˜0 (๐‘, ๐‘โ€ฒ ) are the prior mean and covariance tion, i.e., assuming that there are no evolutions in the input (kernel) functions, respectively. that might invalidate the CEF model. Subsequently, we Every time we observe a new micro-benchmark and col- present RTCEF, which addresses all challenges of run-time lect CEF performance metrics by training and testing Wayeb CEF. given a configuration ๐‘, we acquire a new training sample (๐‘, MCC ๐‘ ), to fit on the GPR, thereby updating our poste- 3.1. CEF Under the Stationarity Assumption rior belief in light of new evidence. The posterior distri- Under the stationarity assumption, a single PST, produced bution represents our updated knowledge about Wayebโ€™s through training on some historical, static dataset, will suf- performance and after observing ๐‘› new training samples, fice for future input. Consequently, in this setting, we may denoted by Data, the posterior is given by: use a framework for offline hyperparameter optimisation, ๐‘“ (๐‘) | Data โˆผ GP(๐œ‡๐‘› (๐‘), ๐‘˜๐‘› (๐‘, ๐‘โ€ฒ )) hereafter offCEF. The aim of offCEF is the identification of an optimal configuration ๐‘๐‘œ๐‘๐‘ก that yields the best per- ๐œ‡๐‘› (๐‘) and ๐‘˜๐‘› (๐‘, ๐‘โ€ฒ ) being the posterior mean and covari- formance for Wayeb, quantified by the MCC score (see ance functions updated through Bayesian inference [12, 13]. Equation (1)). 
A configuration ๐‘ is defined as follows: For selecting training samples, we start by randomly pick- ing points from the input parameter domain, and then exe- ๐‘ = [๐‘š, ๐œƒfc , pMin, ๐›พ] cute the respective micro-benchmarks and observe Wayebโ€™s MCC scores. We call this initial set of configurations Complex event forecasting Input Output ๐‘, paired with MCCc scores, ๐ท๐‘–๐‘›๐‘–๐‘ก . Subsequently, using stream Wayeb stream Bayesian inference, the first posteriors are calculated and the expected result is illustrated by comparing the prior in Models Reports Figure 2a against the posterior in Figure 2b. Change After ๐ท๐‘–๐‘›๐‘–๐‘ก , the next micro-benchmarks are selected us- Collector Model Factory Detector ing an acquisition function ๐‘Ž(๐‘). The acquisition function guides the selection of the next evaluation point by quan- Commands Scores tifying the utility of sampling a particular point ๐‘ฅ in the Datasets Instructs input space i.e., the domain of Wayebโ€™s configurations. ๐‘Ž(๐‘) Controller balances exploration and exploitation. Exploration involves Data Metrics collection Optimisation and re-training sampling ๐‘ configurations in the input space that are not monitoring yet well-explored or that have high uncertainty associated Figure 4: Architecture of RTCEF. Cylinders and rounded rectan- with them, while exploitation involves sampling ๐‘ points gles denote topics and services respectively. For simplicity, we that are likely to yield the best objective function values omit synchronisation topics; instead we use gray arrows. exploiting the current knowledge. For instance, in the plot of Figure 2c the acquisition function chooses the point in the input domain with the highest uncertainty. Different acqui- them. Synchronisation of the various services is denoted by sition functions introduce stochasticity in the BO process dotted arrows in Figure 4. 
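The BO loop just described can be illustrated in miniature. The fragment below is an illustration only, using a hand-rolled one-dimensional GP with an RBF kernel and a pure-exploration acquisition (pick the point of highest posterior uncertainty, as in Figure 2c); the actual RTCEF optimiser searches the four-dimensional configuration space [m, θfc, pMin, γ] with a richer acquisition function.

```python
# Toy sketch of one BO iteration: fit a GP surrogate on an initial set of
# micro-benchmarks (D_init) and pick the next point of highest uncertainty.
# Illustration only, not RTCEF's actual optimiser.
import numpy as np

def rbf(a, b, length=0.3):
    """RBF (squared-exponential) kernel between two 1-D point sets."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / length) ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-6):
    """Posterior mean/std of a zero-mean, unit-variance GP at the query points."""
    k = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    k_star = rbf(x_query, x_train)
    mean = k_star @ np.linalg.solve(k, y_train)
    var = 1.0 - np.sum(k_star * np.linalg.solve(k, k_star.T).T, axis=1)
    return mean, np.sqrt(np.clip(var, 0.0, None))

# D_init: a few micro-benchmarks (configuration -> MCC score); values are
# made up for illustration.
x_init = np.array([0.1, 0.5, 0.9])
y_init = np.array([0.40, 0.65, 0.55])

candidates = np.linspace(0.0, 1.0, 101)
mean, std = gp_posterior(x_init, y_init, candidates)

# Pure-exploration acquisition: next micro-benchmark = highest uncertainty.
next_x = candidates[np.argmax(std)]
```

Near the observed configurations the posterior uncertainty collapses, so the acquisition selects a point far from all previous micro-benchmarks, mirroring the exploration behaviour shown in Figure 2c.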
BO concludes either when a micro-benchmark budget is depleted or when the value of f(c) converges. Figure 2d illustrates a GPR with minimal uncertainty around its mean values, after the micro-benchmark budget has been depleted.

Figure 3 illustrates the architecture of offCEF, comprising a Model Factory alongside a Controller. The Model Factory includes a Wayeb Server that utilises historical training and validation datasets to construct and evaluate PSTs. The Controller includes the BO optimiser, which is executed offline on a historical dataset. The Controller initialises BO by providing a set of configurations, i.e., c vectors, to the Model Factory, which, respectively, conducts the prescribed micro-benchmarks, saves temporarily the candidate PSTs, and sends reports to the Controller. The Controller will use these reports for updating the GPR surrogate model of BO.

offCEF deploys the PST that is expected to maximise MCC based on the hyperparameter vector c_opt calculated by BO. On the other hand, offCEF suffers from several disadvantages: (i) it drives its decisions by attributing equal importance to cumulative performance metric statistics, while in a streaming setup we often need to take into consideration only a sliding window of recent measurements and disregard obsolete ones; (ii) it cannot optimise CEF hyperparameters at run-time, which is a crucial limitation, since fluctuations in the input's statistical properties are the norm rather than an infrequent situation in streaming settings; (iii) it cannot distinguish whether the hyperparameters for training PSTs should be adjusted through BO, or whether it is only Wayeb's PST that should be retrained, without changing hyperparameters. RTCEF, presented below, addresses these issues.

3.2. CEF Over Evolving Data Streams

We propose RTCEF, which is built with three major goals in mind. First, it updates PSTs at run-time according to input data evolutions; second, it performs CEF without disruptions, i.e., PST updating does not cause delays in CEF; and third, it does not overuse resources for producing new PSTs. The architecture of RTCEF consists of five main services, acting as Kafka producers and consumers, running synergistically to ensure undisrupted CEF and dynamic PST retraining or hyperparameter optimisation. Figure 4 illustrates these services and the communication links between them. Below we describe in detail the services comprising our framework.

Change Detector. In order to determine whether the MCC score of Wayeb has deteriorated, the quality of its forecasts must be monitored. This task is handled by the Change Detector service (right of Figure 4), which consumes MCC scores from the 'Reports' topic and produces 'retrain' or 'optimise' instructions, as indicated in Algorithm 1. Essentially, a retrain instruction requests a new PST for Wayeb without changing training hyperparameters. An optimisation instruction requests a new PST, produced through hyperparameter optimisation. Note that while hyperparameter optimisation will provide the best possible hyperparameters, it can be a costly procedure. On the other hand, retraining on an updated dataset is a cheaper process.

The decision process of Algorithm 1 is summarised in Figure 5, while an illustrative execution of Algorithm 1 for maritime situational awareness is presented in Figure 6. We describe Algorithm 1 following the example execution of Figure 6. The Change Detector continuously consumes MCC scores from Wayeb and retains the k most recent MCC scores to evaluate the performance trend. In the example of Figure 6, Wayeb begins with a PST, referred to as PST_w0, created using configuration c_w0. The Change Detector records the MCC score at w0; however, at this point no decision is made, since fewer than k = 3 scores have been collected. Once the Change Detector has at least k scores, it computes the first-degree polynomial z_i(x) = a_i x + b_i (a trend line), so that a_i and b_i minimise the following squared error:

E = Σ_{j=0}^{k} [ z_i(x_j) − y_j ]²

for x_j = j and y_j = score_{i−k+j}, where i is an increasing integer denoting the ID of the current score (lines 9, 10). If the slope (a_i) of z_i(x) is negative, indicating a decrease in performance, and less than a max_slope ∈ R⁻ parameter (line 11), then a 'retrain' instruction is produced (lines 15-17). In the example, by week w2 the Change Detector has MCC scores for w0, w1 and w2. Using these points, the Change Detector computes the trend line z_w2 with a slope α_w2 ≃ −0.04, which is steeper than max_slope = −0.02. To remedy this behaviour, the Change Detector issues a retrain instruction. As a result, a new PST, referred to as PST_w2, is created using the same configuration as PST_w0, i.e., c_w0. This occurs because retraining updates the PST without modifying Wayeb's hyperparameters.

Figure 5: Decision process of the Change Detector. Pink and green parallelograms correspond to input and output information respectively, rhombi correspond to conditionals, and rectangles correspond to function calls.

Figure 6: Execution sample of the Change Detector for maritime situational awareness. 'opt' and 'rt' stand for 'optimisation' and 'retraining' respectively. Dashed-dotted and dashed lines correspond to the trend lines associated with the retrain and optimisation instructions at t2 and t4 respectively, while black lines correspond to grace periods initiated at the issuance of retrain or optimisation instructions.

Intuitively, forecasting performance deterioration, i.e., a_i < max_slope, shortly after a new PST deployment indicates that the new PST failed, and hyperparameter optimisation should thus be performed. To this end, we place each newly deployed PST in a grace period (lines 14, 17). A grace period starts after a PST is deployed and ends after grace_n performance reports. If the performance of a PST under a grace period deteriorates (a_i < max_slope), then a hyperparameter optimisation instruction is produced (lines 12, 13). If, on the other hand, a_i < max_slope is satisfied after grace_n reports, then a 'retrain' instruction is produced, for which a new grace period begins.

Algorithm 1 Change Detector service
Require: k, grace_n, max_slope, min_score
 1: scores ← []
 2: grace ← −1
 3: while True do
 4:   score_i ← consume(Reports)
 5:   scores.update(score_i, k)
 6:   pit_cond ← score_i < min_score
 7:   slope_cond ← False
 8:   if grace ≥ 0 then grace ← grace − 1
 9:   if len(scores) > 2 then
10:     (a_i, b_i) ← fit_trend(scores)
11:     slope_cond ← a_i < max_slope
12:   if (slope_cond and grace ≥ 0) or pit_cond then
13:     send("instructions", "optimise")
14:     grace ← grace_n    ▷ New grace period
15:   else if slope_cond then
16:     send("instructions", "retrain")
17:     grace ← grace_n    ▷ New grace period

In the example of Figure 6, a grace period begins at week w2 and will last for grace_n = 4 reports, i.e., until w5. While PST_w2 shows improvement at w3, at w4 performance drops again.

Wayeb. The CEF part of RTCEF (top of Figure 4) contains Wayeb. In addition to reading timestamped simple events from the input stream and producing an output stream of CE forecasts, Wayeb produces a stream of CEF forecasting performance reports and continuously monitors the 'Models' topic, which contains updated PSTs. When a new PST is made available in the Models topic, Wayeb replaces its PST with the latest available version. Recall that, to produce a CE forecast, Wayeb will utilise both the automaton corresponding to the symbolic regular expression defining a CE and the PST. The automaton retains information about the current state (q), as well as the next states that can lead to an accepting run, while the PST is used for producing the next-symbol probabilities and, therefore, the waiting-time distribution W_q for state q. Notably, the process of replacing the PST with a new version can be executed in linear time with respect to the number of runs, and in practice happens in negligible time.

Collector. Training datasets evolve over time. Therefore, the data collection part of RTCEF (left of Figure 4) includes the Collector service, a data processing module organising and storing subsets of the input stream that may be used for retraining or hyperparameter optimisation. The Collector service consumes the input stream (see 'Data collection' in Figure 4), in parallel to Wayeb, and stores subsets of it in time buckets. The Collector gathers data in a sliding-window manner, emitting a new dataset version to the 'Datasets' topic as soon as the last bucket in the range is full. Old buckets that no longer serve a purpose for training are deleted for space economy.
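The trend-based policy of Algorithm 1 can be condensed into a few lines. The sketch below is a simplification for illustration, not the actual RTCEF service: scores arrive as a plain list rather than from a Kafka topic, and the parameter values are illustrative. It fits a least-squares trend line over the k most recent MCC scores and applies the slope, grace-period and min_score conditions.

```python
# Simplified sketch of the Change Detector's trend-based policy (Algorithm 1).
# Illustration only: the real service consumes Kafka topics.

def fit_trend(scores):
    """Least-squares slope/intercept of y = a*x + b over x = 0..len(scores)-1."""
    n = len(scores)
    xs = range(n)
    mx = sum(xs) / n
    my = sum(scores) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, scores))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

def change_detector(reports, k=3, grace_n=4, max_slope=-0.02, min_score=0.1):
    scores, grace, out = [], -1, []
    for score in reports:
        scores = (scores + [score])[-k:]      # keep the k most recent scores
        pit_cond = score < min_score          # sudden, very low score
        slope_cond = False
        if grace >= 0:
            grace -= 1
        if len(scores) >= k:
            a, _ = fit_trend(scores)
            slope_cond = a < max_slope        # deteriorating trend
        if (slope_cond and grace >= 0) or pit_cond:
            out.append("optimise"); grace = grace_n
        elif slope_cond:
            out.append("retrain"); grace = grace_n
        else:
            out.append("no-op")
    return out

# Declining MCC scores: first a 'retrain'; a further drop within the
# grace period triggers 'optimise'.
print(change_detector([0.80, 0.75, 0.70, 0.72, 0.60]))
```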
The tions of the Change Detector, initialises hyperparameter performance drop is also confirmed by the slope of -0.05 com- optimisation procedures, during which it also serves as the puted by the Change Detector using the MCC scores from Bayesian optimiser, or retraining procedures, where it sup- ๐‘ค2 to ๐‘ค4 . Since the slope ๐›ผ๐‘ค4 is again below max _slope, plies Wayeb configurations. When optimisation is required, but this time a grace period is active, the Change Detector the Controller initiates the following three phases. issues a hyperparameter optimisation instruction instead of Initialisation phase: The Controller sets up the Bayesian retraining. Consequently, a new PST๐‘ค4 is produced through optimiser. Similar to [15], we leverage micro-benchmarks hyperparameter optimisation, resulting in an updated con- from previous runs. Using the retain_percentage โˆˆ figuration ๐‘๐‘ค4 , and a new grace period starting at ๐‘ค4 . Fi- [0, 1] parameter, we uniformly keep โŒŠretain_percentage * nally, to avoid pitfalls whereby the score drops suddenly all _samplesโŒ‹ observations from the last executed BO very low, we employ an additional condition: if the score of run, where all _samples is the total number of micro- offCEF RTCEF rt opt 1 1 1 1 0.8 0.8 0.8 0.8 Improv. (%) MCC 0.6 0.6 0.6 0.6 0.4 0.4 0.4 0.4 0.2 0.2 0.2 0.2 0 0 0 0 200 200 2K 200 0 0 0 0 4 8 12 16 20 24 12 16 20 24 2 6 16 20 24 2 6 10 24 2 6 10 14 18 Week (MD 0 ) Week (MD 2 ) Week (MD 3 ) Week (MD 5 ) Figure 7: Experimental results for datasets MD 0/2/3/5 with ๐‘…port . โ€˜rtโ€™ and โ€˜optโ€™ stand for โ€˜retrainโ€™ and โ€˜optimisationโ€™ respectively. The plots show MCC (upper part) and MCC improvement (lower part) of โ€˜RTCEFโ€™ relative to โ€˜offCEFโ€™ over time. benchmarks. This allows us to accelerate optimisation, and 1 RTCEF offCEF 1 RTCEF offCEF retain useful information from previous runs. Avg. MCC Avg. 
Step phase: The Controller issues 'train & test' commands along with the hyperparameters suggested by the acquisition function—in our case, the acquisition function is a combination of lower confidence bound, expected improvement and probability of improvement. After each 'train & test' command, the Controller awaits the corresponding performance report, i.e., the value of the f(c) objective function. Upon receiving the performance report, the optimiser is updated with the new sample, and the hyperparameters for the next step are suggested. The step phase ends when all micro-benchmarks are completed or if convergence is achieved.

Finalisation phase: Once optimisation concludes and the best hyperparameters are acquired, the Controller sends a finalisation message containing the ID of the best PST. Additionally, the Controller updates the previously best hyperparameters with the newly acquired ones, ensuring availability of the latter for subsequent 'retrain' instructions.

Figure 8: Avg. MCC (top) per MD_i for R_port and R_fish. Dataset and CER characteristics (bottom left); 'v', 'vw/m' and 'm' stand for 'vessels', 'vessels with matches' and 'matches', respectively. MPPT, i.e., mean percentage of time spent every four weeks for production of models, per MD_i for R_port (bottom right).

Model Factory. Similar to offCEF, the primary function of the Model Factory service is to train, test and send up-to-date PSTs to Wayeb. To do this, it assembles and uses the latest dataset version produced by the Collector.
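The step phase described above is essentially an ask/tell loop between the Controller and the Model Factory. The sketch below illustrates that loop with a random-search stand-in where the actual implementation queries a Bayesian optimiser (scikit-optimize's gp_hedge acquisition, which combines LCB, EI and PI); the hyperparameter names and the toy objective are illustrative assumptions:

```python
import random

def suggest(rng, space):
    """Stand-in for the acquisition function: RTCEF instead asks a
    Bayesian optimiser (e.g. scikit-optimize, acq_func='gp_hedge')."""
    return {name: rng.choice(values) for name, values in space.items()}

def train_and_test(config):
    """Stand-in for the Model Factory's 'train & test' on the locked
    dataset; returns the performance report, i.e., the objective f(c)."""
    return 1.0 - abs(config["pst_order"] - 3) * 0.1  # toy MCC-like score

def step_phase(space, budget, seed=0):
    rng = random.Random(seed)
    samples = []                        # (hyperparameters, score) pairs
    for _ in range(budget):             # one micro-benchmark per step
        config = suggest(rng, space)    # 'ask': propose hyperparameters
        score = train_and_test(config)  # 'train & test' + await report
        samples.append((config, score)) # 'tell': update the optimiser
    return max(samples, key=lambda s: s[1])  # best config and its score

# Hypothetical PST hyperparameter ranges, for illustration only.
space = {"pst_order": [1, 2, 3, 4, 5], "max_no_states": [500, 1000, 2000]}
best_config, best_score = step_phase(space, budget=20)
```

In the real framework the loop additionally stops early on convergence, and the reported scores are MCC values computed by the Model Factory on the locked dataset.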
Upon receiving a 'train' command, the Model Factory trains a PST on the latest dataset and shares this new PST version with Wayeb.

For PST production through hyperparameter optimisation, upon receiving an 'initialisation' message, the Model Factory 'locks' the most recently assembled dataset, so that the same dataset is used throughout the optimisation procedure. Next, during the 'step' phase, the Model Factory trains, saves and tests candidate PSTs on the locked dataset and reports MCC scores to the Controller. Finally, when the BO 'finalisation' message, including the ID of the best-performing PST, is received, the Model Factory sends the best PST to Wayeb. It is only at this point that Wayeb stops momentarily for PST replacement.

4. Experimental Evaluation

We evaluate our framework on maritime situational awareness, where maritime CEs of interest are forecast over real vessel position streams.

4.1. Experimental Setup

We use a real-world, publicly available maritime dataset containing 18M spatio-temporal positional AIS (Automatic Identification System) messages, transmitted between October 1st 2015 and March 31st 2016 (6 months), from 5K vessels sailing in the Atlantic Ocean around the port of Brest, France [16]. AIS allows the transmission of information such as the current speed, heading and coordinates of vessels, as well as ancillary static information such as destination and ship type. We evaluate RTCEF on a maritime pattern which expresses the arrival of a vessel at the main port of Brest [6]. This pattern is derived after discussions with domain experts from a large, European maritime service provider [4, 17]:

R_port := (¬InPort(Brest))* · (¬InPort(Brest)) · (¬InPort(Brest)) · (InPort(Brest))    (2)

InPort(Brest) is true when a vessel is within 5 km from the port of Brest. Recall that '¬', '*' and '·' correspond to negation, iteration (Kleene star) and sequence, respectively (see Section 2). Consequently, R_port is satisfied if a sequence of at least three events occurs. At least two require the vessel to be away from the port—thus limiting false positives from noisy entrances—while the last denotes that the vessel has entered the port. This CE is important for port management and logistics reasons. We also perform experiments for a CE named R_fish, defined as follows:

R_fish := (¬InArea(Fishing))* · (¬InArea(Fishing)) · (¬InArea(Fishing)) ·
          (InArea(Fishing) ∧ ¬SpeedRange(Fishing))* · (InArea(Fishing) ∧ SpeedRange(Fishing))    (3)

InArea(Fishing) is true when a vessel is within a fishing area, while SpeedRange is a predicate satisfied when the vessel has fishing speed [4]. Therefore, R_fish is satisfied when, initially, a vessel is outside a fishing area, then the vessel enters the fishing area and, at some point while it is within the fishing area, it has fishing speed. Monitoring (illegal) fishing is important for environmental and sustainability reasons.

To cross-validate our approach, we create 6 datasets MD_i, i ∈ [0, 5], by shifting the starting month in a cyclic manner:

MD_i = ‖_{j=0..5} month_{(j+i) mod 6}

where ‖ denotes the operation of concatenating two datasets, and month_k corresponds to month k of the original dataset. We perform offline hyperparameter optimisation with offCEF on the first four weeks of each dataset MD_i, and use the resulting model, hyperparameters and micro-benchmark samples for initialisation. To showcase the benefit of RTCEF, we additionally perform CEF with static models yielded by offCEF (see Section 3.1): i.e., for each MD_i we adopt the stationarity assumption and perform CEF using the corresponding initial model of each dataset. In what follows, the experiments that utilise the run-time optimisation framework are labelled with 'RTCEF', while experiments that are performed only with offline optimised static models are labelled with 'offCEF'.

We attribute the low scores of the initial model of the MD_3 dataset to the lack of vessels passing through the monitoring area in that period (see Figure 8, bottom left). Figure 8 (top left) shows that the average MCC for each dataset MD_i (i ∈ [0, 5]) when using RTCEF is consistently higher than that achieved via a single model trained only on the first four weeks of each dataset (offCEF). In Figure 8 (top right), we report results concerning the R_fish CE. For the R_fish pattern (see Definition (3)), there are no data evolutions in the input that affect CEF performance; therefore, in this case, the results show that when data evolutions that affect model performance at run-time are not present, the use of RTCEF does not affect forecasting performance.

Concerning processing efficiency, interruptions in CEF are minimal, as retraining or optimisation procedures take place in parallel to CEF; thus, the efficiency and throughput of CEF remain unaffected. However, when a new PST request arises, new PST versions arrive with some delay. Recall that, until a new PST is available, Wayeb consumes the input stream, in parallel to the PST update procedures, with the already deployed PST. Figure 8 (bottom right) shows the mean percentage of time spent every four weeks for the production of PSTs (we denote this value as MPPT), involving the R_port pattern. The results show that, every four weeks, on average less than 0.2% of the time is spent on PST production (roughly 80 minutes in a period of four weeks) for all datasets MD_i. Consequently, RTCEF spends minimal time every four weeks on PST production, thus ensuring minimal delays and resource-friendly behaviour, as optimisation or retraining procedures are not performed more often than necessary.
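The cyclic construction of the MD_i datasets, and a sanity check on the MPPT figure reported above (0.2% of a four-week period is roughly 80 minutes), can be sketched as follows; the month labels are illustrative:

```python
def rotate_months(months, i):
    """Build MD_i by cyclically shifting the starting month:
    MD_i = month_(0+i) mod 6 || ... || month_(5+i) mod 6."""
    n = len(months)
    return [months[(j + i) % n] for j in range(n)]

months = ["oct", "nov", "dec", "jan", "feb", "mar"]  # original order
md_3 = rotate_months(months, 3)
# → ['jan', 'feb', 'mar', 'oct', 'nov', 'dec']

# MPPT sanity check: 0.2% of four weeks is roughly 80 minutes.
four_weeks_min = 4 * 7 * 24 * 60          # 40320 minutes
time_spent_min = 0.002 * four_weeks_min   # ≈ 80.6 minutes
```

Rotating the starting month yields six datasets with the same content but different temporal evolutions, so each run of the framework faces the data drifts at a different point in its lifetime.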
offCEF and RTCEF are implemented in Python 3.9.18, while the Kafka version used is 3.5.2. Messages are formatted in JSON, and serialised/deserialised using the Apache Avro format. For BO, we use the scikit-optimize library, version 0.9.0. The experiments are conducted on a server running Debian 12, with an AMD EPYC 7543 32-core processor and 400G of RAM. Each service of RTCEF runs on its own dedicated core. Our framework is open-source and our experiments are reproducible¹.

¹ Execution scripts in https://github.com/manospits/rtcef/tree/main/scripts

4.2. Experimental Results

Figure 7 shows the evolution of MCC over time for R_port (see Definition (2)), along with the score improvements when using RTCEF as opposed to offCEF, for the MD_0/2/3/5 datasets respectively. For MD_0—the dataset in its original order—RTCEF dramatically improves MCC, by up to ∼300%, following the retraining and hyperparameter optimisation procedures in weeks 5 and 6, respectively. A similar pattern is observed on the MD_5 dataset. On dataset MD_2, although the improvement is not as prominent as with MD_0/3/5, on average RTCEF improves MCC (see Figure 8, top left). In the MD_3 case, the results show that the initial PST generated by offCEF underperforms on weeks 16 to 19. However, this behaviour is immediately cured when the Change Detector requests hyperparameter optimisation in the first running week (see the orange dot on week 16 of Figure 7, MD_3)—this is due to the score being less than min_score (see Algorithm 1).

5. Related Work

The problem addressed in this paper pertains to concept drift, i.e., evolutions in the data that invalidate the deployed model [18]. Our work is the first that tackles this problem specifically for CEF. For example, the work of Stavropoulos et al. [6] allows for offline CEF optimisation, but does not allow run-time adaptation on dynamically evolving data streams; moreover, concerning CEF optimisation itself, it offers only a very restricted set of functionalities compared to our framework. EasyFlinkCEP [19], similar to RTCEF, uses BO to optimise the parallelism of FlinkCEP programs, but lacks support for forecasting, focusing only on system-oriented metrics (e.g., throughput). Herodotou et al. [20] offer a comprehensive survey of machine learning-based techniques, including BO, for tuning the performance of Big Data management systems. Existing CER optimisation techniques focus on enhancing throughput, i.e., the number of tuples processed per unit of time [21], while others focus on reducing processing latency and efficiently managing memory utilisation [21]. These approaches typically adapt traditional query optimisation techniques, such as early predicate evaluation and query rewriting, to suit the context of CER. Giatrakos et al. [1] discuss techniques for executing parallel CER efficiently in geo-distributed settings. Notably, none of the above address run-time CEF adaptation.

Forecasting covers several areas, such as time-series forecasting [22], general sequence prediction [23, 10], event sequence prediction and point-of-interest recommendations [24, 25]. However, such methods primarily focus on input event forecasting rather than CEF. Process mining, closely related to CEF [26], involves learning processes from activity logs and predicting process completions [27, 28]. However, process mining often overlooks CE patterns. CEF aims to address such challenges, as outlined in various conceptual frameworks [29, 30, 31]. In our work specifically, we address these challenges by utilising Wayeb, a CEF engine that employs high-order Markov models [5, 9]. Furthermore, none of the existing proposals (e.g., [7, 8, 32]) automate run-time adaptation in a resource-friendly way.

6. Summary and Further Work

We presented RTCEF, a novel framework for run-time optimisation of CEF. RTCEF involves several services running synergistically for undisrupted run-time CEF and improved performance via lossless dynamic model updating. We evaluated our approach on a maritime situational awareness use-case involving real-world data, and our experimental results show that there is a clear benefit in using our framework, as opposed to performing CEF with a single model in an 'offline' fashion. We release our framework publicly, in an open-source fashion.

For future work, we plan to investigate additional data collection and retrain-vs-optimisation policies. Furthermore, we aim to integrate parallel BO. Finally, we want to evaluate our framework on additional problems, such as run-time CER query optimisation.

Acknowledgments

This work was supported by the CREXDATA project, which received funding from the European Union's Horizon Europe Programme, under grant agreement No 101092749.

References

[1] N. Giatrakos, E. Alevizos, A. Artikis, A. Deligiannakis, M. N. Garofalakis, Complex event recognition in the big data era: a survey, VLDB J. 29 (2020) 313–352.
[2] A. Margara, G. Cugola, Processing flows of information: from data stream to complex event processing, in: DEBS, 2011.
[3] A. Artikis, D. Zissis (Eds.), Guide to Maritime Informatics, Springer, 2021.
[4] M. Pitsikalis, A. Artikis, R. Dreo, C. Ray, E. Camossi, A.-L. Jousselme, Composite event recognition for maritime monitoring, in: DEBS, 2019.
[5] E. Alevizos, A. Artikis, G. Paliouras, Complex event forecasting with prediction suffix trees, VLDB J. 31 (2022) 157–180.
[6] V. Stavropoulos, E. Alevizos, N. Giatrakos, A. Artikis, Optimizing complex event forecasting, in: DEBS, 2022.
[7] S. Pandey, S. Nepal, S. Chen, A test-bed for the evaluation of business process prediction techniques, in: CollaborateCom, 2011.
[8] V. Muthusamy, H. Liu, H. Jacobsen, Predictive publish/subscribe matching, in: DEBS, 2010.
[9] E. Alevizos, A. Artikis, G. Paliouras, Wayeb: a tool for complex event forecasting, in: LPAR, 2018.
[10] D. Ron, Y. Singer, N. Tishby, The power of amnesia: Learning probabilistic automata with variable memory length, Machine Learning 25 (1996) 117–149.
[11] D. Ron, Y. Singer, N. Tishby, The power of amnesia, in: NIPS, 1993.
[12] E. Brochu, V. M. Cora, N. de Freitas, A tutorial on bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning, 2010.
[13] P. I. Frazier, A tutorial on bayesian optimization, 2018.
[14] L. Yang, A. Shami, On hyperparameter optimization of machine learning algorithms: Theory and practice, Neurocomputing 415 (2020) 295–316.
[15] M. Feurer, J. Springenberg, F. Hutter, Initializing bayesian hyperparameter optimization via meta-learning, AAAI 29 (2015).
[16] C. Ray, R. Dréo, E. Camossi, A.-L. Jousselme, C. Iphar, Heterogeneous integrated dataset for maritime intelligence, surveillance, and reconnaissance, Data in Brief 25 (2019) 104141.
[17] K. Patroumpas, A. Artikis, N. Katzouris, M. Vodas, Y. Theodoridis, N. Pelekis, Event recognition for maritime surveillance, in: EDBT, 2015.
[18] J. Gama, I. Zliobaite, A. Bifet, M. Pechenizkiy, A. Bouchachia, A survey on concept drift adaptation, ACM Comput. Surv. 46 (2014) 189:1–189:38.
[19] N. Giatrakos, E. Kougioumtzi, A. Kontaxakis, A. Deligiannakis, Y. Kotidis, EasyFlinkCEP: Big event data analytics for everyone, in: CIKM, 2021.
[20] H. Herodotou, Y. Chen, J. Lu, A survey on automatic parameter tuning for big data processing systems, ACM Comput. Surv. 53 (2020) 43:1–43:37.
[21] I. Flouris, N. Giatrakos, A. Deligiannakis, M. N. Garofalakis, M. Kamp, M. Mock, Issues in complex event processing: Status and prospects in the big data era, J. Syst. Softw. 127 (2017) 217–236.
[22] D. C. Montgomery, C. L. Jennings, M. Kulahci, Introduction to time series analysis and forecasting, John Wiley & Sons, 2015.
[23] R. Begleiter, R. El-Yaniv, G. Yona, On prediction using variable order markov models, J. Artif. Intell. Res. 22 (2004) 385–421.
[24] Z. Li, X. Ding, T. Liu, Constructing narrative event evolutionary graph for script event prediction, in: IJCAI, 2018.
[25] B. Chang, Y. Park, D. Park, S. Kim, J. Kang, Content-aware hierarchical point-of-interest embedding model for successive POI recommendation, in: IJCAI, 2018.
[26] W. Van Der Aalst, Process mining: discovery, conformance and enhancement of business processes, 2011.
[27] A. E. Márquez-Chamorro, M. Resinas, A. Ruiz-Cortés, Predictive monitoring of business processes: A survey, IEEE Trans. Services Computing 11 (2018) 962–977.
[28] C. D. Francescomarino, C. Ghidini, F. M. Maggi, F. Milani, Predictive process monitoring methods: Which one suits me best?, in: BPM, 2018.
[29] L. J. Fülöp, Á. Beszédes, G. Toth, H. Demeter, L. Vidács, L. Farkas, Predictive complex event processing: a conceptual framework for combining complex event processing and predictive analytics, in: BCI, 2012.
[30] Y. Engel, O. Etzion, Towards proactive event-driven computing, in: DEBS, 2011.
[31] M. Christ, J. Krumeich, A. W. Kempa-Liehr, Integrating predictive analytics into complex event processing by using conditional density estimations, in: EDOC Workshops, 2016.
[32] Y. Li, T. Ge, C. X. Chen, Data stream event prediction based on timing knowledge and state transitions, Proc. VLDB Endow. 13 (2020) 1779–1792.