Incrementally Predictive Runtime Verification? Angelo Ferrando1[0000−0002−8711−4670] and Giorgio Delzanno1[0000−0001−7030−1050] University of Genova, Italy forename.surname@unige.it Abstract. Runtime Verification is a lightweight formal verification tech- nique used to verify the runtime behaviour of software (resp. hardware) systems. Given a formal property, one or more monitors are synthe- sised to verify the latter against a system execution. A monitor can only conclude the violation of a property when it observes such a violation. Unfortunately, in safety-critical scenarios, this might happen too late for the system to react properly. In such scenarios, it is advised to use Pre- dictive Runtime Verification, where monitors are capable of anticipating (by using a model of the system) future events before actually observ- ing them. In this work, instead of assuming such a model is given, we describe a runtime verification workflow where the model is learnt and incrementally refined by using process mining techniques. We present the approach and the resulting prototype tool. 1 Introduction Runtime Verification (RV) [7] is a kind of formal verification technique that focuses on checking the behaviour of software/hardware systems. With respect to other formal verification techniques, such as Model Checking [11] and Theorem Provers [18], RV is considered more dynamic and lightweight. This is mainly due to its being completely focused on checking how the system behaves, while the latter is currently running. This is important from a complexity perspective. RV does not need to simulate the system in order to check all possible execution scenarios; but, it only analyses what the system produces (i.e., everything that can be observed of the system). This is usually obtained through monitors, that are nothing more than validation engines which, given a trace of events generated by the system execution, conclude the satisfaction (resp. violation) of a formal property of interest. In turn, a formal property is the formal representation of how we expect the system should behave. The monitor’s job is to verify at runtime whether such a property holds. Since monitors are usually deployed together with the system under analy- sis, they are well suited for checking properties that require to be continuously monitored. This is especially true in safety-critical scenarios, where a system’s ? Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). fault can cause injuries, loss of money and even deaths. A key example is au- tonomous and robotic systems, where reliability is vital [12], and the addition of monitors ensuring a correct behaviour is a valuable feature. Nonetheless, even though monitors are lightweight components, they are still an additional work- load for the system. This is not a problem for large systems, but it might be for smaller ones, such embedded systems; where the amount of available resources can be limited. For both increasing reliability and reducing the impact of the monitors on the system, an extension of standard RV named Predictive Runtime Verification (PRV) has been proposed e.g. in [31]. As the name suggests, PRV differs from RV because it does not only consider the events observed by the system execution, but it also tries to predict future events. By predicting future events, the resulting monitors are capable of concluding the verification sooner. Indeed, since these monitors can predict how the system is going to behave in the future, they can reduce the search space of possible future continuations and, hypothetically, conclude the satisfaction (resp. violation) by analysing shorter traces of events (w.r.t. the standard counterpart). By concluding earlier, pre- dictive monitors are a good choice: (i) for improving reliability in safety-critical scenarios, since the monitors can conclude a violation of a property before such a violation has even been observed; (ii) for reducing the workload introduced by the monitor, since the monitor can conclude the satisfaction (resp. violation) of the property sooner, it can be removed from the system and its resources can be reclaimed for other purposes. The problem with PRV is that it requires additional knowledge on the sys- tem in order to predict future events. Usually, this is represented through an abstraction, the model, which is manually created by an expert of the system. The problem with this approach is that such a model is not always available, and even when it is, it might not be specified in a convenient way (e.g. wrong formalism). One possible way to avoid errors due to human intervention in the model generation step is to resort once again to observations collected at run- time. The guiding principle here is to learn the model behavior by observing real execution traces so as to create a sort of closed loop in which logs are used to adjust the candidate models, models are used to predict faults with certain confidence level, the confidence level increases with the log size, etc. For this purpose, in this paper we present an initial study on how to use Process Mining (PM) to automate the model generation phase in practical ap- plications of PRV. PM is a rich research area which consists in – but is not limited to – studying and developing automated techniques to synthesise models from log files. Specifically, we focus on the branch of PM called Process Dis- covery. We show how it can be applied in the context of PRV, and we present the updated verification workflow; where, starting from a set of log files gener- ated by the system execution, we obtain a model that can be used to predict events for the monitor. We also show how the resulting approach can be used to obtain incrementally predictive monitors, where more the monitors are used to verify the system and more they improve at predicting future events, and thus, at concluding the satisfaction (resp. violation) of the analysed property. It is important to remark that, to avoid human intervention in the model generation phase, the proposed approach has an empirical nature by construction since it always works with partial information on the entire set of possible system behaviours. In theory, a precise model of the system can be obtained only as the limit of a sequence of possible inaccurate candidate models. For this reason, our validation procedure returns truth values enriched with confidence scores that depend on the frequency of observed events applied during the PM pre- processing phase. The remainder of this paper is structured as follows. Section 2 presents the preliminary notions of RV, PRV and PM. Section 3 shows our general verifica- tion workflow without selecting any specific formalism for properties and models. Section 3.1 instantiates our workflow with the most used formalisms in formal verification. Section 4 presents the prototype tool representing the implementa- tion of our approach. Section 5 positions this contribution with respect to the state of the art. Finally, Section 6 summarises the results of this work and points out to future directions. 2 Preliminaries 2.1 Runtime Verification and Predictive Runtime Verification A system is denoted by S, and its alphabet (all of its observable events) is denoted by ΣS (or Σ where there is no confusion). Given an alphabet, Σ, a trace, σ, is a sequence of events in Σ, and tr(Σ) is the set of all possible traces (the language) over Σ. Properties are denoted by ϕ, potentially with subscripts, and ϕ denotes their negation. Given an alphabet Σ, a property ϕ is satisfied by a trace σ over Σ, written σ |= ϕ, if ϕ is true in σ. The set JϕK = {σ | σ |= ϕ} contains the set of traces satisfying ϕ, and we denote that a particular trace σ satisfies a property ϕ as σ ∈ JϕK. A property ϕ can be specified in any formalism such that for a given alphabet Σ, and for any trace σ ∈ tr(Σ), the following two conditions hold: σ ∈ JϕK is decidable (1) σ ∈ JϕK ⇐⇒ σ ∈ / JϕK (2) Condition (1) states that, given a property ϕ specified with the formalism of choice, we can always test if a trace σ satisfies ϕ, i.e. σ belongs to the set of traces satisfying ϕ. This condition is mandatory since a monitor is defined upon the notion of trace acceptance. As we are going to show in Definition 1, a monitor requires to check if a trace satisfies the property under analysis, and this can be done only when condition (1) holds. Condition (2), instead, implicitly states that the formalism of choice has to be closed under negation and a trace σ satisfies a property ϕ, if and only if, σ does not satisfy its negation ϕ. The negation of properties will be used in Definition 2 to define monitors with predictive flavour, where we will combine a model ψ with the negation of a property ϕ to check for traces satisfying ψ but not ϕ. Definition 1 (Monitor). Let S be a system with alphabet Σ, and ϕ be a property. Then, a monitor for ϕ is a function M onϕ : tr(Σ) → B3 , where B3 = {>, ⊥, ?}:  >  ∀u∈tr(Σ) .σ • u ∈ JϕK M onϕ (σ) = ⊥ ∀u∈tr(Σ) .σ • u ∈ / JϕK  ? otherwise  where • is the standard trace concatenation operator. Intuitively, a monitor returns > if all continuations of σ satisfy ϕ; ⊥ if all possible continuations of σ violate ϕ; ? otherwise. Definition 1 describes a generic monitor that does not impose constraints on the formalism used. Consequently, we collapse the definition of tr(Σ) for representing finite and infinite traces depending on what is supported by the formalism that is used to define ϕ. Thus, if ϕ accepts only traces of infinite length, then tr(Σ) = Σ ω ; if ϕ accepts only traces of finite length, then tr(Σ) = Σ ∗ ; otherwise, tr(Σ) = Σ ∗ ∪ Σ ω . Let S be a system with alphabet Σ. We denote its model by ψ, and use JψK ⊆ tr(Σ) to indicate the set of traces recognised by ψ (i.e., ψ represents a formal abstraction of how S behaves). A model, ψ, can be specified in any formalism such that for a given alphabet, Σ, for any trace, σ ∈ tr(Σ), and for any property, ϕ, the following holds: σ ∈ JψK is decidable (3) JϕK ∩ JψK is computable (4) We indicate (4) via the use of a binary relation ⊗, that is, Jϕ ⊗ ψK = JϕK ∩ JψK. Often, PRV frameworks express their properties in Linear-time Temporal Logic (LTL) [22] (for example [17,31]); however, we took our inspiration from a PRV framework where both the System Under Analysis (SUA) and property are defined using Timed Automata (TA) [21]. The reason is that in works such as [17,31], the predictive aspect is not formalised through a model of the system, but as a set of finite suffixes. These suffixes are then concatenated to the given trace σ allowing the monitor to predict the initial part of the possible contin- uations u. Instead, in our work, as in [21], we explicitly represent the model without focusing only on the first events after σ, but by applying the prediction to the entire possible continuation. This can be obtained using a model of the system as input to the monitor, alongside the property to be verified. Informally, the model generates the set of event traces that can be observed by executing the system. We follow the definition of a predictive monitor from [21], however, in this paper we remain formalism-agnostic. Definition 2 (Predictive Monitor). Let S be a system with alphabet Σ, model ψ and let ϕ be a property. A predictive monitor for ϕ given ψ is a function, M onϕ,ψ : tr(Σ) → B3 , where B3 = {>, ⊥, ?}:  >  ∀u∈tr(Σ) .σ • u ∈ / Jϕ ⊗ ψK M onϕ,ψ (σ) = ⊥ ∀u∈tr(Σ) .σ • u ∈ / Jϕ ⊗ ψK  ? otherwise  The intuitive meaning of the return values is the same as in the non-predictive case (Definition 1). Note the use ⊗ in the definition. For instance, the case for > requires all traces σ • u not to be in JϕK ∩ JψK where ϕ represents the negation of ϕ. 2.2 Process Mining Process Mining (PM) [26] is a technique used in software engineering to auto- matically synthesise a formal model which denotes the system behaviour. Such analysis is usually performed on event logs generated by multiple executions of the system. In practice, by using data mining algorithms, knowledge is extracted by these logs and corresponding formal models are generated. These models represent an abstraction of the system, and can be used to understand how the system behaves. Usually, once such models are extracted, an expert of the system can use them to study and check whether the implemented system actually meets his/her expectations. This check can be manually performed by the expert, who inspects the extracted model and searches for inconsistencies, or it can be automatically performed through conformance checking algorithms. The power of PM lies in its being flexible, customisable and automated. Thus, we decided to study and develop its integration in the context of PRV. Mostly because, one of the drawbacks of PRV is the assumption that an existing model of the system exists. Moreover, since PM completely depends on the event logs generated by the system execution, more logs are used, and better (i.e., more precise) models are extracted. Because of this, PM does not only allow PRV to be applied when a model of the system does not exist, but it also makes PRV more robust and reliable. For instance, when the system under analysis presents dynamic aspects, to rely on a static model would eventually bring to wrong predictions. Instead, if the model is automatically obtained by observing how the system behaves, it may keep up with the change and be a more reliable representation of how the system is currently behaving. 3 Incrementally Predictive Runtime Verification When applying RV with a predictive flavour, we need a model of the system to predict future events. Without such a knowledge, each event would be considered observable in the future, and the monitor would need to evaluate all possible continuations of the current analysed trace. Nonetheless, how the model of the system is generated is not defined a priori. In previous works [6,30,21,31,17,10], the model of the system is assumed to be manually created by an expert of the system. Here, we show how the standard predictive runtime verification workflow can be enhanced using process mining techniques to automatically synthesise the model of the system. (2) (1) log files Process Mining Model Predictive Model (ψ) (3) (5) (4) ϕ (4) Monitor Verdict trace of events Fig. 1: Engineering steps to obtain incrementally predictive monitors. Figure 1 shows a general overview of our approach. At this level, we do not pick any specific formalism for representing the properties and the models (as it is done in Section 2), but we focus on the necessary steps to define the workflow for obtaining an incrementally predictive RV approach. Step 1: Model extraction As it is true for all process mining algorithms, ev- erything starts from a set of log files. These log files contain information about previous executions of the system, and can be used to predict future executions as well. Given such a set of log files, a process mining algorithm of choice can be se- lected to parse the logs and generate a model representing the system behaviour (Process Mining Model ). Depending on the algorithm, the resulting model may be different; even though usually the most common formalisms used to specify such models are Petri Net (PN) [20] and Directly-Follows Graph (DFG) [5]. Step 2: Predictive Model derivation The model extracted by the process min- ing algorithm (step (1)) may not be directly usable to synthesise a predictive monitor (e.g. formalism mismatch). If that is the case, one or more additional transformation steps are required1 . These steps aim to transform the Process Mining Model into a more suitable Predictive Model, which can be used to syn- thesise predictive monitors as we show in Section 2. Again, these models can be specified using different formalisms which depend on the kind of predictive monitor we want to synthesise. 1 Naturally, if the Process Mining Model is already defined in a suitable way for the predictive monitor (i.e., the Process Mining Model and Predictive Model coincide), the entire step (2) can be removed. Step 3: Monitor synthesis Once the model of the system ψ is obtained (steps (1-2)), it can be used to synthesise a predictive monitor. Such synthesis depends on which formalism is used for defining the property ϕ to verify, and which formalism has been used at step (2) to generate ψ. In principle, the synthesised monitor has to follow Definition 2; where the model is used to predict future events and to prune impossible continuations of the current analysed trace. Step 4: Monitor execution The predictive monitor obtained at step (3) for the property ϕ and the Predictive Model ψ is used to analyse the system. More in detail, a trace representing the system execution is passed as input to the monitor. Such a trace can be retrieved incrementally while the system is running (events are passed to the monitor as soon as they are generated), or, it can be stored in a log file. In the former, we refer to online RV, while in the latter, we refer to offline RV. In both cases, the verification process ends with a verdict produced by the monitor. If the verdict is positive (>), then the system execution has satisfied ϕ. If the verdict is negative (⊥), then the system execution has violated ϕ. Finally, if the system execution neither satisfies nor violates ϕ, the verdict is inconclusive (?). Step 5: Log files update An interesting aspect of our approach is that log files can be used at two separate levels. They are used to synthesise the model of the system through process mining at step (1), and are used to verify the current system execution against a formal property ϕ at step (4). Nonetheless, in both cases they are nothing more than traces of events obtained by executing the system under analysis. When the monitor is done with a trace of events (it has completed the verification), in standard RV such a trace is discarded. Even though this is true for standard RV, it is not the case for PRV. Indeed, we can use the trace of events analysed by the monitor to increase the knowledge we have of the system. Specifically, this new trace of events can be added to the log files on which process mining is applied. In this way, more the predictive monitor is used to verify a system, and more it improves at predicting events on that system. Initially, the set of log files used at step (1) might be small, which would bring to a non informative model of the system; but, by using the predictive monitor to verify the system, additional event traces would be added to the set of log files. The addition of new traces enriches the knowledge the process mining algorithm has of the system, and consequently, it improves the quality of the synthesised model. 3.1 Incrementally Predictive Runtime Verification instantiation In Section 3, we present the general workflow, where no specific formalism is fixed. In this way, depending on the domain, the approach can be customised for obtaining better results. Nonetheless, for a better understanding, we present an instantiation of our approach, where we use DFG as formalism for representing the Process Mining Model, Probabilistic Finite-State Machine (PFSM) [24] and hev1 , 1.0i init ev1 v1,2 s0 (1) hev2 , 1.0i v4,1 (2a) hev1 , 1.0i v2,4 ev2 ev4 v2,3 v2,4 hev4 , i v2,3 + v2,4 s1 v2,3 s3 hev3 , i log files v2,3 + v2,4 ev3 v3,4 s2 Directly-Follows Graph hev4 , 1.0i Probabilistic Finite-State Machine (2b) ev1 init ϕ s0 ev2 (4) (3) ev1 (5) Verdict Monitor ev4 s1 s3 ev3 (4) s2 ev4 Buchi Automaton trace of events Fig. 2: Instantiated engineering steps to obtain predictive monitors. Büchi Automata (BA) [8] for the Predictive Model, and finally, LTL for the properties to verify at runtime with the monitor. Figure 2 shows an overview of the instantiated verification workflow; where, starting from a set of log files representing different system executions, and an LTL property ϕ to verify, we obtain a predictive monitor. The verification process consists in 6 steps (since step (2) of Figure 1 is split into (2a) and (2b)). Step 1: Model extraction First of all, using the log files generated by multi- ple system executions, a DFG [5] is created. Such a graph represents the sys- tem behaviour and can be obtained by applying state of the art process min- ing algorithms, such as Alpha Miner [2,29], Heuristics Miner [4,28], Inductive Miner [16,13], Process Skeletonization [3], and Graph-Based Miner [27,25] (a thorough review can be found in [5]). The output’s formalism of these algo- rithms can vary, but the most common representations are Petri Net [20], Causal Net [1], BPMN [9] and DFG. We decided to use DFG because their translation to automata is more direct. The DFG so generated is a graph where the states are labeled with events evi observed in the log files, and a directed edge goes from a state evi to a state evj if there is at least one trace in the log files where the event evi is followed by the event evj . Naturally, these events are domain dependent, e.g. messages, function calls, actions, etc. Moreover, each transition is labelled with the number of traces (vi,j ) where such causality relation is ob- served. For instance, in Figure 2, ev1 is followed by ev2 in v1,2 traces in the log files, ev2 is followed by ev3 in v2,3 traces and by ev4 in v2,4 traces, and so on. Step 2a: Addition of probabilities After generating a DFG of the system, the next step is to explicitly represent which events can be observed and with which prob- ability. This is achieved in step (2a), where starting from a DFG, a correspond- ing PFSM [24] is obtained. A PFSM is an extension of standard Finite-State Machine (FSM) where the transitions amongst states are labelled with tuples hevi , pi i, which denote that the event (evi ) can be observed with probability (pi ) in that state. Since the PFSM is obtained by a DFG, the events and probabili- ties are extracted accordingly. If there is an edge from evi to evj labelled with vi,j , and an edge from evi to evk labelled with vi,k , then in the corresponding vi,j PFSM, a state s is generated, with two transitions labelled hevj , vi,j +v i,k i and vi,k hevk , vi,j +vi,k i. These two transitions specify the probability of observing events evj and evk in s. For instance, in Figure 2, in the DFG, the state ev2 is followed by the states ev3 and ev4 . This is mapped to state s1 in the PFSM, in which the two events ev3 , and ev4 can be observed. The probability attached to ev3 is the probability of observing ev3 after ev1 , which can be computed as the number of times ev3 has followed ev1 in the log files (v1,3 ) divided by the total number of traces where ev1 has been observed (similar reasoning for ev4 ). Let us assume 7 v1,3 = 7 and v1,4 = 3, then the probability of observing ev3 after ev1 is 7+3 = 0.7 3 (i.e., 70%); while the probability of ev4 is 7+3 = 0.3 (i.e., 30%). Thus, the tran- sitions from state s1 to states s2 and s3 are hev3 , 0.7i and hev4 , 0.3i, respectively. It is important to note that, each state in PFSM has a transition for each pos- sible event of the system. The events that have never been observed in a state have zero probability (such transitions are omitted in Figure 2 for readability). For instance, in state s1 , the event ev1 has zero probability to be observed, since no trace has ever been observed with an event ev2 followed by an event ev1 . Nonetheless, we need to explicitly represent the events with zero probability as well because initially, when the model extracted at step (1) is not trustworthy, any event has to be considered possible. How much trust we put in the PFSM is determined by the following step (2b), where a probability threshold is used to prune (resp. keep) transitions on the basis of their probability values. Step 2b: Approximation with a threshold In a standard predictive RV approach, the model of the system does not contain probabilities. Instead, it simply speci- fies which traces can be generated by the system execution. Probabilities, even though relevant, are not usually taken in consideration when the model is used for predicting future events [30,21,31,17,10]. Nonetheless, probabilities can be used to select which events are observable in which states. One possible way to give importance to probabilities without explicitly reporting them in the model is to set a probability threshold. Given a probability threshold, all transitions with a probability less than the threshold can be removed (grey transition in Fig- ure 2), while the transitions with a probability greater than the threshold can be preserved (but deprived of the probability information). In this way, given a PFSM, we can generate a corresponding BA [8], for a certain probability thresh- old. Such a BA explicitly describes the language recognised by the model of the system. We decided to use BA because is one of the most used formalisms in for- mal verification. By generating a BA, we assure a wider use of our solution, since existing predictive RV tools already support BA. Depending on the threshold, the BA can be more or less reliable. For instance, if the threshold is 0.01 (i.e., 1%), then all transitions with probabilities less than 0.01 are pruned. This means that, the BA so generated recognises all traces generated by the log files, except for the traces that are unlikely. Naturally, higher is the threshold, and smaller is the language recognised by the BA; since more transitions are pruned in the process from PFSM to BA. Because of this, the threshold needs to be chosen carefully. In principle, more we trust the model generated at step (1), and higher we can set the threshold. Indeed, if we trust the model extracted by the process mining algorithm, then we can trust that the transitions with low probability represent outliers, and by pruning them we can have a faster2 verification. On the other hand, if we do not trust the model generated at step (1), we can select a lower threshold and prune less transitions in the transformation from PFSM to BA. For instance, this can happen when not many log files are available, and the process mining is not accurate enough. While the threshold can be increased later on, when more log files are available and the resulting process mining is more trustworthy. Note that, the BA corresponds to an over-approximation of the DFG obtained at step (1). Since the DFG only specifies the cause-effect re- lation amongst the system’s events, in presence of loops in the DFG (resp. in the BA), we cannot infer the number of times the system execution passes on such loops. Because of this, in order to not prune possible system executions from the BA model, each state in the BA is set to be final. In this way, the BA recognises a superset of the execution traces generated by the system. Step 3: Monitor synthesis Once we obtain a model of the system, a predictive monitor can be synthesised (following Definition 2). In [31], the authors present the approach for synthesising predictive monitors for LTL properties, where the model of the system is assumed being a BA. Following the same approach, at step (3), we synthesise a predictive monitor by combining an LTL property ϕ given in input, and the BA obtained at step (2b). Naturally, depending on the formalism used at step (2b), the monitor synthesis may vary. For instance, instead of BA, we could use timed automata to denote the model of the system, as it is done in [17]. Note that, this would cause a modification of only step (2b) and (3); the remaining steps of the workflow would remain unchanged. In fact, we selected LTL and BA as formalisms for representing properties and models because are widely used and well-known. Nonetheless, the workflow presented in this work is not constrained to any specific formalism. With few modifications it can be adapted to other formalisms as well. Steps 4-5: Monitor execution and Log files update These steps do not depend on the specific instantiation and are the same as the steps in Section 3. 2 Since less future continuations have to be considered. 4 Implementation The Python implementation of our approach is publicly available as a GitHub repository3 . We implemented all the engineering steps presented in this paper, when instantiated to the case with LTL properties and BA models. More in detail, the resulting tool takes in input: (i) a set of log files (expressed as a single XES file4 ), which is the standard format used in PM to represent event logs; (ii) a threshold to guide the mapping from PFSM to BA; (iii) an LTL property to verify; (iv) a trace generated by the current system execution to analyse. To implement step (1), i.e. the application of process mining to the log files, we used the PM4Py5 Python library. PM4Py is the leading open source pro- cess mining platform written in Python, and is developed by the process mining group of Fraunhofer Institute for Applied Information Technology6 . Such a li- brary allows to develop process mining applications very quickly. In fact, the step (1) of our approach has been completely implemented by calling PM4Py API. Starting from the XES file denoting the knowledge we have of the system, PM4Py generates a DFG as a Python object. The rest of the code has been fully implemented in Python from scratch. First, the Python object representing the DFG is translated into another Python object denoting the corresponding PFSM (step (2a)). Such translation is straightforward, because it is enough to unroll the transitions amongst states in the DFG; similarly to what it is usually done when translating Kripke structures [15] to BA. Once the Python object representing the PFSM is obtained, the tool goes on with its translation to BA (step (2b)). Again, this translation is direct because it simply requires to prune the tran- sitions with a probability lower than the chosen threshold, and make all states final. With the BA Python object, a predictive monitor is then generated. This, in particular, is obtained by using a Python library supporting the generation of predictive monitors for LTL properties and BA models7 . This library has been developed by one of the authors of this paper as well. Using this library, the step (3) is completed, and a predictive monitor ready to be used is synthesised. Finally, the tool terminates by using such a monitor to verify a trace of events given in input, and reporting the final verdict to the user (step (4)). The trace so analysed can then be integrated (step (5)) inside the XES file given in input to enhance future process mining phases (i.e., next times the approach is used). 5 Related Work RV is a rich research area, but PRV is still rising. One of the oldest works on PRV is presented in [31]. In this work, the authors present an extension of the standard RV approach when applied to LTL properties. Differently from the 3 https://github.com/AngeloFerrando/IncrementallyPredictiveRV 4 http://www.xes-standard.org/ 5 https://pm4py.fit.fraunhofer.de/ 6 http://fit.fraunhofer.de/process-mining 7 https://github.com/AngeloFerrando/MultiModelPredictiveRuntimeVerification rest of works in literature, the prediction of future events is not obtained using a model of the system, but is limited to a single sequence of events (called finite predictive word ) that is concatenated to the analysed trace. With respect to our solution, no explicit – nor complete – representation of the system is used. The same year, one of the authors proposed an extended version [17], where instead of considering one single predictive word, a BA model is integrated inside the standard generation of LTL monitors. The resulting approach is at the basis of the predictive library we use in our tool to synthesise predictive LTL moni- tors. The main difference is in the generation of the BA, which in [17] is given, while in our approach is automatically extracted through PM. Moreover, our approach is intrinsically general. In fact, LTL and BA are just a possible instan- tiation. A more recent work where prediction is obtained through assumptions is presented in [10]. The authors propose a different way to synthesise predic- tive monitors for LTL properties, where the model used to predict the future events, instead of being a BA, is a Fair Kripke Structure (FKS) [14]. Similarly, in [21], another PRV approach is presented, where the model is instead defined as a Timed Automaton. In both cases, the main difference with our approach is in the generality and the fact that the model is not given but automatically synthesised. On a completely different line of research, we may find [6], where probabilistic models, like Markov Models, are used. This work is very similar to ours because, differently from the others mentioned before, it does not assume the model of the system is given. In fact, the probabilistic model used in the resulting RV framework is trained on samples generated by the system execu- tion using the Baum-Welch algorithm [23]. Even though similar in principle, this work differs from our solution in three aspects: (i) it uses machine learning techniques to extract the model, while we rely on PM techniques; (ii) it does not define any notion of incrementality; (iii) it considers only finite extensions. To the best of our knowledge, the work that is closest to ours is presented in [19]; where the authors use PM to extract the model used for synthesising predic- tive monitors as well. With respect to our approach, in [19] the objective of the predictive monitor is slightly different. Instead of being purely focused on the verification of the system under analysis, they are more interested in using the monitors to recommend the system on how to proceed. To do so, they focus on the probabilistic aspects and present a framework to suggest the best actions to perform in order to increase the probability of satisfying a given LTL formula. Moreover, their solution is less general than ours. Their approach is hard-coded in the Business Process scenario, while our verification workflow is general and tackles all the engineering steps necessary to integrate PM inside the creation of predictive monitors. 6 Conclusions and Future Work In this paper, we present a general verification workflow for integrating PM in the generation of predictive monitors. We show all the engineering steps that, from a set of log files, bring to the extraction of a model which can be used to predict future events and speed up the RV process. We present an overview of the verification workload where no formalism is enforced. This choice increases the impact of the work in the verification community, since these engineering steps can be customised w.r.t. the user’s needs. Nonetheless, to help better un- derstanding the approach, we also show an instantiation with LTL properties and BA models. The choice of these two formalisms is due to their being largely used in the verification community. Finally, a Python prototype tool is presented. We briefly show its features and how it implements the engineering steps presented in the paper. With respect to future directions; this is an initial work on the topic and there are many different aspects that still need to be tackled. For instance, at the current level, the probability is not considered in the monitor and it is lost in the translation from PFSM to BA. Nonetheless, this is an interesting aspect to explore further. Indeed, the notion of threshold could be used to add more information to the monitor’s outcome. This could also bring to the generation of multiple BA, each corresponding to a different threshold. This would be interest- ing to analyse, and it could be a starting point for a more thorough comparison between our approach, and the one presented in [19]. References 1. van der Aalst, W.M.P.: Process Mining - Discovery, Conformance and Enhance- ment of Business Processes. Springer (2011). https://doi.org/10.1007/978-3-642- 19345-3, https://doi.org/10.1007/978-3-642-19345-3 2. van der Aalst, W.M.P., Weijters, T., Maruster, L.: Workflow mining: Discover- ing process models from event logs. IEEE Trans. Knowl. Data Eng. 16(9), 1128– 1142 (2004). https://doi.org/10.1109/TKDE.2004.47, https://doi.org/10.1109/ TKDE.2004.47 3. Abe, M., Kudo, M.: Business monitoring framework for process discovery with real-life logs. In: Sadiq, S.W., Soffer, P., Völzer, H. (eds.) Business Process Management - 12th International Conference, BPM 2014, Haifa, Israel, Septem- ber 7-11, 2014. Proceedings. Lecture Notes in Computer Science, vol. 8659, pp. 416–423. Springer (2014). https://doi.org/10.1007/978-3-319-10172-9 30, https: //doi.org/10.1007/978-3-319-10172-9_30 4. Augusto, A., Conforti, R., Dumas, M., Rosa, M.L., Bruno, G.: Au- tomated discovery of structured process models from event logs: The discover-and-structure approach. Data Knowl. Eng. 117, 373–392 (2018). https://doi.org/10.1016/j.datak.2018.04.007, https://doi.org/10.1016/j. datak.2018.04.007 5. Augusto, A., Conforti, R., Dumas, M., Rosa, M.L., Maggi, F.M., Marrella, A., Mecella, M., Soo, A.: Automated discovery of process models from event logs: Review and benchmark. IEEE Trans. Knowl. Data Eng. 31(4), 686– 705 (2019). https://doi.org/10.1109/TKDE.2018.2841877, https://doi.org/10. 1109/TKDE.2018.2841877 6. Babaee, R., Gurfinkel, A., Fischmeister, S.: P revent : A predictive run-time verification framework using statistical learning. In: Johnsen, E.B., Schaefer, I. (eds.) Software Engineering and Formal Methods - 16th International Confer- ence, SEFM 2018, Held as Part of STAF 2018, Toulouse, France, June 27- 29, 2018, Proceedings. Lecture Notes in Computer Science, vol. 10886, pp. 205–220. Springer (2018). https://doi.org/10.1007/978-3-319-92970-5 13, https: //doi.org/10.1007/978-3-319-92970-5_13 7. Bartocci, E., Falcone, Y., Francalanza, A., Reger, G.: Introduction to runtime verification. In: Lectures on Runtime Verification, pp. 1–33. Springer (2018) 8. Büchi, J.R.: On a Decision Method in Restricted Second Order Arithmetic, pp. 425–435. Springer New York, New York, NY (1990). https://doi.org/10.1007/978- 1-4613-8928-6 23, https://doi.org/10.1007/978-1-4613-8928-6_23 9. Business process model and notation. https://www.bpmn.org/, accessed: 2021-06- 24 10. Cimatti, A., Tian, C., Tonetta, S.: Assumption-based runtime verification with partial observability and resets. In: Finkbeiner, B., Mariani, L. (eds.) Runtime Verification - 19th International Conference, RV 2019, Porto, Portugal, Octo- ber 8-11, 2019, Proceedings. Lecture Notes in Computer Science, vol. 11757, pp. 165–184. Springer (2019). https://doi.org/10.1007/978-3-030-32079-9 10, https: //doi.org/10.1007/978-3-030-32079-9_10 11. Clarke, E.M.: Model checking. In: International Conference on Foundations of Soft- ware Technology and Theoretical Computer Science. pp. 54–56. Springer (1997) 12. Fisher, M., Mascardi, V., Rozier, K.Y., Schlingloff, B., Winikoff, M., Yorke-Smith, N.: Towards a framework for certification of reliable autonomous systems. Auton. Agents Multi Agent Syst. 35(1), 8 (2021). https://doi.org/10.1007/s10458-020- 09487-2, https://doi.org/10.1007/s10458-020-09487-2 13. Ghawi, R.: Process discovery using inductive miner and decomposition. CoRR abs/1610.07989 (2016), http://arxiv.org/abs/1610.07989 14. Kesten, Y., Pnueli, A., Raviv, L.: Algorithmic verification of linear temporal logic specifications. In: Larsen, K.G., Skyum, S., Winskel, G. (eds.) Automata, Languages and Programming, 25th International Colloquium, ICALP’98, Aal- borg, Denmark, July 13-17, 1998, Proceedings. Lecture Notes in Computer Sci- ence, vol. 1443, pp. 1–16. Springer (1998). https://doi.org/10.1007/BFb0055036, https://doi.org/10.1007/BFb0055036 15. Kripke, S.A.: Semantical considerations on modal logic. Acta Philosophica Fennica 16, 83–94 (1963) 16. Leemans, S.J.J., Fahland, D., van der Aalst, W.M.P.: Discovering block-structured process models from event logs - A constructive approach. In: Colom, J.M., Desel, J. (eds.) Application and Theory of Petri Nets and Concurrency - 34th International Conference, PETRI NETS 2013, Milan, Italy, June 24-28, 2013. Proceedings. Lecture Notes in Computer Science, vol. 7927, pp. 311– 329. Springer (2013). https://doi.org/10.1007/978-3-642-38697-8 17, https:// doi.org/10.1007/978-3-642-38697-8_17 17. Leucker, M.: Sliding between Model Checking and Runtime Verification. In: Run- time Verification. LNCS, vol. 7687, pp. 82–87. Springer (2012) 18. Loveland, D.W.: Automated theorem proving: a logical basis, Fundamental studies in computer science, vol. 6. North-Holland (1978), https://www.worldcat.org/ oclc/252520243 19. Maggi, F.M., Francescomarino, C.D., Dumas, M., Ghidini, C.: Predictive mon- itoring of business processes. In: Jarke, M., Mylopoulos, J., Quix, C., Rolland, C., Manolopoulos, Y., Mouratidis, H., Horkoff, J. (eds.) Advanced Information Systems Engineering - 26th International Conference, CAiSE 2014, Thessaloniki, Greece, June 16-20, 2014. Proceedings. Lecture Notes in Computer Science, vol. 8484, pp. 457–472. Springer (2014). https://doi.org/10.1007/978-3-319-07881- 6 31, https://doi.org/10.1007/978-3-319-07881-6_31 20. Petri, C.A.: Kommunikation mit Automaten. Ph.D. thesis, Universität Hamburg (1962) 21. Pinisetty, S., Jéron, T., Tripakis, S., Falcone, Y., Marchand, H., Preoteasa, V.: Pre- dictive Runtime Verification of Timed Properties. Journal of Systems and Software 132, 353–365 (2017) 22. Pnueli, A.: The temporal logic of programs. In: 18th Annual Sympo- sium on Foundations of Computer Science, Providence, Rhode Island, USA, 31 October - 1 November 1977. pp. 46–57. IEEE Computer Society (1977). https://doi.org/10.1109/SFCS.1977.32, https://doi.org/10.1109/SFCS. 1977.32 23. Rabiner, L.: A tutorial on hidden markov models and selected applica- tions in speech recognition. Proceedings of the IEEE 77(2), 257–286 (1989). https://doi.org/10.1109/5.18626 24. SALOMAA, A.: Chapter ii - finite non-deterministic and probabilistic automata. In: SALOMAA, A. (ed.) Theory of Automata, International Series of Monographs on Pure and Applied Mathematics, vol. 100, pp. 71–113. Pergamon (1969). https://doi.org/https://doi.org/10.1016/B978-0-08-013376-8.50008-3, https: //www.sciencedirect.com/science/article/pii/B9780080133768500083 25. Sarno, R., Sungkono, K.R., Johanes, R., Sunaryono, D.: Graph-based algorithms for discovering a process model containing invisible tasks. International Journal of Intelligent Engineering and Systems 12, 85–94 (2019) 26. Van Der Aalst, W.: Data science in action. In: Process mining, pp. 3–23. Springer (2016) 27. Waspada, I., Sarno, R., Sungkono, K.: An improved method of paral- lel model detection for graph-based process model discovery. International Journal of Intelligent Engineering and Systems 13, 127–139 (04 2020). https://doi.org/10.22266/ijies2020.0430.13 28. Weijters, A.J.M.M., Ribeiro, J.T.S.: Flexible heuristics miner (FHM). In: Pro- ceedings of the IEEE Symposium on Computational Intelligence and Data Mining, CIDM 2011, part of the IEEE Symposium Series on Computa- tional Intelligence 2011, April 11-15, 2011, Paris, France. pp. 310–317. IEEE (2011). https://doi.org/10.1109/CIDM.2011.5949453, https://doi.org/10.1109/ CIDM.2011.5949453 29. Wen, L., Wang, J., van der Aalst, W.M.P., Huang, B., Sun, J.: Mining pro- cess models with prime invisible tasks. Data Knowl. Eng. 69(10), 999–1021 (2010). https://doi.org/10.1016/j.datak.2010.06.001, https://doi.org/10.1016/ j.datak.2010.06.001 30. Yu, K., Chen, Z., Dong, W.: A Predictive Runtime Verification Framework for Cyber-Physical Systems. In: Software Security and Reliability-Companion. pp. 223–227. IEEE (2014) 31. Zhang, X., Leucker, M., Dong, W.: Runtime Verification with Predictive Semantics. In: NASA Formal Methods. LNCS, vol. 7226, pp. 418–432. Springer (2012)