Scalable Process Monitoring Through Rules and Neural Networks Alan Perotti1 , Guido Boella1 , and Artur d’Avila Garcez2 1 University of Turin, Italy 2 City University London, United Kingdom Abstract. In this paper we introduce RuleRunner, a Runtime Verification system for monitoring LTL properties over finite traces. By exploiting results from the Neural-Symbolic Integration area, a RuleRunner monitor can be encoded in a recurrent neural network. The results show that neural networks can perform real-time runtime verification and techniques of parallel computing can be applied to improve the performance in terms of scalability. Furthermore, our framework allows for property adaptation by using a standard neural network learning algorithm. 1 Introduction A trend in high-performance computation is parallel computing, and the field of neural networks has been showing impressive improvements in performance, especially with the use of GPU accelerators. For example, in 2012 Google used approximately 1,000 CPU-based servers, or 16,000 CPU cores, to develop its neural network (1.7 billion parameters), which taught itself to recognise cats in a series of YouTube videos [13]. One year later, the Stanford team created an equally large network with only three servers using NVIDIA GPUs to accelerate the processing of the big data generated by the network [7]. Our first Research Question is the following: Is it possible to use the connectionist paradigm and parallel computing techniques in order to improve the performance of a runtime verification system? Neural networks are also known to offer intrinsic learning capabilities [11], and many fields using verification also involve learning-based tasks. For instance, Business Process (BP) Management [19] attempts to continuously improve business effectiveness and efficiency, typically relying on Business Process Management Systems (BPMS). A BPMS allows to automatically execute a BP according to its process schema, i.e., to a formalised model in which the actions to be performed and the control flow relations to be respected among them are specified. However, BP optimisation may ask the enterprise to be able to flexibly change and adapt such a predefined process schema, in response to expected situations (e.g., new laws, reengineering efforts, concept drift [19]) as well as to unanticipated exceptions and problems in the operating environment (e.g., emergencies) [12]. Our second Research Question is therefore: Can one develop a single framework that combines efficient monitoring, possibly through parallel computing techniques, with property adaptation through learning? In this paper, we answer both Research Questions positively. We present our runtime 2 monitoring system: RuleRunner creates and maintains a single, detailed state, avoiding the possible worlds branching structure typical of tableaux-based formulations. Furthermore, RuleRunner is designed as a set of rules that can be fired in parallel, rather than in a strict sequence. We show how a RuleRunner system can be translated into an equivalent logic program, in order to exploit results from the Neural-Symbolic Integration area [8] to encode it in a recurrent neural network. We validate our framework by comparing three implementations based on neural networks, observing how sparse form and GPU computing improve the performance, respectively, in terms of absolute speed and scalability. We show how in our framework learning corresponds to adapt the encoded property according to the observed behaviour of a system, but the application of learning strategies and approaches is outside the scope of this paper. As an example, suppose several violations (signalled by an access control monitor) coming from a given area of a LAN turn out to be false positives [6]. The reasonable reaction is to relax the encoded property it in order to correctly identify the transactions from the given area as acceptable. In this way, the security system still encodes the given property, but reducing the number of false positives makes the system more efficient. This paper is structured as follows: Section 2 introduces background and related work and Section 3 analyses our rule-based system, RuleRunner. Section 4 discusses the translation of our rule system in a neural network; Section 5 shows experimental results and a motivational example for parallel monitoring and property adaptation; Section 6 ends the paper with closing remarks and directions for future work. 2 Background and Related Work 2.1 Runtime Verification Runtime Verification (RV) relates an observed system with a formal property φ specifying some desired behaviour. An RV module, or monitor, is defined as a device that reads a trace and yields a certain verdict [14]. A trace is a sequence of cells, which in turn are lists of observations occurring in a given discrete span of time. Runtime verification may work on finite (terminated), finite but continuously expanding, or on prefixes of infinite traces. While LTL is a standard semantic for infinite traces [17], there are many semantics for finite traces: FLTL [15], RVLTL [3], LTL3 [4], LTL± [10] just to name some. Since the standard LTL semantics is based on infinite behaviours, the issue is to close the gap between properties specifying infinite behaviours and finite traces. In particular, FLTL differs from LTL as it offers two next operators (X, X̄ in [3], X, W in this paper), called respectively strong and weak next. Intuitively, the strong (and standard) X operator is used to express with Xφ that a next state must exist and that this next state has to satisfy property φ. In contrast, the weak W operator in W φ says that if there is a next state, then this next state has to satisfy the property φ. More formally, let u = a0 ..an−1 denote a finite trace of length n. The truth value of an FLTL formula ψ (either Xφ or W φ) w.r.t. u at position i < n, denoted by [u, i  ψ], is an element of B and is defined as follows: ( ( [u, i + 1  φ], if i+1 < n [u, i + 1  φ], if i+1 < n [u, i  Xφ] = [u, i  W φ] = ⊥, otherwise >, otherwise While RVLTL and LTL3 have been proven to hold interesting properties w.r.t. FLTL (see [3]), we selected FLTL as we think it captures a more intuitive semantics when 3 dealing with finite traces. Suppose to monitor φ = a over a trace t, where a is observed in all cells: we have that [t  φ] equals, respectively, > in FLTL, ? in LTL3, and T p in RVLTL. If t is seen as a prefix of a longer trace tσ, then LTL3 and RVLTL provide valuable information about how φ could be evaluated over σ. But if t is a conclusive, self-contained trace (e.g. a daily set of transactions), then the FLTL semantics captures the intuitive positive answer to the query does a always hold in this trace? Several RV systems have been developed, and they can be clustered in three main approaches, based respectively on rewriting, automata and rules [14]. Within rule based approaches, RuleR [2] uses an original approach. It copes with the temporal dimension by introducing rules which may reactivate themselves in later stages of the reasoning. RuleRunner is inspired by this powerful idea, and in particular by the striking similarity of triggering rules and activating neurons. While various formalisms can be encoded in a RuleR monitor, we focus on FLTL and extend RuleR to allow structures of a fixed size that can be implemented in efficient networks. The next section will describe the difference in the two approaches in more detail. 2.2 Neural networks and Neural-symbolic Integration An artificial Neural Network (NN, [11]) is an information processing paradigm that is inspired by the way biological nervous systems, such as the brain, process information. The key element of this paradigm is the structure of the information processing system. It is composed of a large number of highly interconnected processing elements (neurons) working in unison to solve specific problems. Within the umbrella term of connectionist approach, many NN models have been developed, varying from those with only one or two layers of single direction logic, to complicated models with multi-input and many-directional feedback loops and layers. On the whole, these systems use algorithms in their programming to determine control and organisation of their functions. What they do have in common, however, is the principle of non-linear, distributed, parallel and local processing and adaptation. The main purpose of a neural-symbolic system is to bring together the connectionist and symbolic approaches exploiting the strengths of both paradigms and, hopefully, avoiding their drawbacks. In [18], Towell and Shavlik presented the influential neural-symbolic system KBANN (Knowledge-Based Artificial Neural Network), a system for rule insertion, refinement and extraction from feedforward neural networks. KBANN served as inspiration for the construction of the Connectionist Inductive Learning and Logic Programming (CILP) system [8] (example in Fig. 1). CILP’s Translation Algorithm maps a general logic program P into a single-hidden-layer neural network N such that N computes the least fixed-point of P . In particular, rules are mapped onto hidden neurons, the preconditions of rules onto input neurons and the conclusion of the rules onto output neurons. The weights are then adjusted to express the dependence among all these elements. The obtained network implements a massively parallel model for Logic Programming, and it can perform inductive learning from examples, by means of standard learning strategies. 4 (a) (b) (c) (d) Fig. 1: Logic program P as a network N (CILP); (a) shows a simple logic program P . Applying CILP to P generates a NN N as depicted in (c). The input-outputs behaviour in P and N is shown in (b) and (d) respectively. For instance, if P is presented with the fact A, the first two clauses will produce A, B (second line in (b)). Similarly, if the input neuron representing A is activated in N , the signal propagation will activate the first two hidden neurons and both output neurons (second line in (d)). 3 The RuleRunner Rule System RuleRunner is a rule-based online monitor observing finite but expanding traces and returning an FLTL verdict. RuleRunner accepts formulae φ generated by the grammar: φ ::= true | a | !a | φ ∨ φ | φ ∧ φ | φU φ | Xφ | W φ | ♦φ | φ | EN D a is treated as an atom and corresponds to a single observation in the trace. We assume, without loss of generality, that temporal formulae are in negation normal form (NNF), e.g., negation operators pushed inwards to propositional literals and cancellations applied. W is the weak next operator. END is a special character that is added to the last cell of a trace to mark the end of the input stream. Algorithm 1 RuleRunner monitoring (abstract) 1: function RR- MONITORING(φ,trace t) 2: Build a monitor RRφ encoding φ 3: while new cells exist in t do 4: Observe the current cell 5: Compute truth values of φ in the current cell of t . Evaluation rules 6: if φ is verified or falsified then 7: return SUCCESS or FAILURE respectively 8: end if 9: Set up the monitor for the next cell in t . Reactivation rules 10: end while 11: end function Given an FLTL formula φ and a trace t, Algorithm 1 provides an abstract description of the creation and runtime behaviour of a RuleRunner system monitoring φ over t. At first, a monitor encoding φ is computed. Second, the monitor enters the verification loop, composed by observing a new cell of the trace and computing the truth value of the property in the given cell. If the property is irrevocably satisfied or falsified in the current cell, RuleRunner outputs a binary verdict. If this is not the case (because the φ refers to cells ahead in the trace), the system shifts to the following cell and enters another monitoring iteration. The FLTL semantics guarantees that, if the trace ends, the verdict in the last cell of the trace is binary. RuleRunner is a runtime monitor, as it analyses one cell at a time and never needs to store past cells in memory nor peek future ones. 5 Definition 1. A RuleRunner system is a tuple hRE , RR , Si, where RE (evaluation rules) and RR (reactivation rules) are rule sets, and S (for state) is a set of active rules, observations and truth evaluations. Given a finite set of observations O and an FLTL formula φ over (a subset of) O, a state S is a set of observations (o ∈ O), rule names (R[ψ]) and truth evaluations ([ψ]V ); V ∈ {T, F, ?} is a truth value. A rule name R[ψ] in S means that the logical formula ψ is under scrutiny, while a truth evaluation [ψ]V means that the logical formula ψ currently has the truth value V . The third truth value, ? (undecided), means that it is impossible to give a binary verdict in the current cell. Evaluation rules follow the pattern R[φ], [ψ 1 ]V, . . . , [ψ n ]V, → [φ]V and their role is to compute the truth value of a formula φ under verification, given the truth values of its direct subformulae ψ i (line 5 in Algorithm 1). For instance, R[♦ψ], [ψ]T → [♦ψ]T reads as if ♦a is being monitored and ψ holds, then ♦ψ is true. Reactivation rules follow the pattern [φ]? → R[φ], R[ψ 1 ], . . . , R[ψ n ] and the meaning is that if one formula is evaluated to undecided, that formula (together with its subformulae) is scheduled to be monitored again in the next cell of the trace (line 9 in Algorithm 1). For instance, [♦ψ]? → R[♦ψ], R[ψ] means that if ♦ψ was not irrevocably verified nor falsified in the current cell of the trace, both ψ and ♦ψ will be monitored again in the next cell. A RuleRunner feature is that rules never involve disjunctions. In RuleR, for instance, the simple formula ♦a is mapped to the rule R♦a : −→ a | R♦a and its meaning, intuitively, is that, if ♦a has to be verified, either a is observed (thus satisfying the property) or the whole formula will be checked again (in the next cell of the trace). The same formula corresponds to the following set of rules in RuleRunner: R[♦a], [a]T → [♦a]T R[♦a], [a]?, EN D → [♦a]F R[♦a], [a]? → [♦a]? [♦a]? → R[a], R[♦a] R[♦a], [a]F → [♦a]? The disjunction in the body of the RuleR rule corresponds to the additional constraints in the head of the RuleRunner rules. Therefore, where RuleR generates a set of alternative hypotheses and later matches them with actual observations, RuleRunner maintains a detailed state of exact information. This is achieved by means of evaluation tables: three-valued truth tables (as introduced by Lukasiewitz [16]) annotated with qualifiers. Each evaluation rule for φ corresponds to a single cell of the evaluation table for the main operator of φ; a qualifier is a subscript letter providing additional information to ? truth values. Figure 2 gives the example for disjunction. Qualifiers (B, L, R in this case) are used to store and propagate detailed information about the verification status of formulae. For instance, if φ is undecided and ψ is false when monitoring φ ∨ ψ (highlighted cell in Figure 2), ?L means that the disjunction is undecided, but that its future verification state will depend on the truth value of the Left disjunct. Note, in fact, how ∨L is a unary operator. An example for this is monitoring ♦b ∨ a against a cell including only c: a is false, ♦b is undecided (as b may be observed in the future), and the whole disjunction will be verified/falsified in the following cells depending on ♦b only. 6 Fig. 2: Truth table (left) and evaluation tables (right) for ∨ !a ^B T ? F ^L ^R _B T ? F _L _R a (observation) T T ?R F T T T T T T T T T T T T a ∈ state T a ∈ state F ? ?L ? B F ? ?L ? ?R ? T ? B ?L ? ?L ? ?R a ∉ state F a ∉ state T F F F F F F F F F T ?R F F F F F ⇤ ⇤ ⌃ END ⌃ WM XM END END X T T ? F W T T END W X T T T ?K ?K T ? ? ?M ? T ? ? ?M ? F ? ? ? F ? ? F F ?M F F F ?M F F F F ? UA T ? F UB T ? F UL UR END U T T ?A ?A T T ?A ?A T T T T ?A F ? T ?A ? B ? ?L ? B ? B ? ?L ? ?R ?B F F T ?R F F F F F F F F F ?L F ?R F Fig. 3: Complete set of evaluation tables The complete set of evaluation tables is reported in Fig. 3, while the generation of evaluation and reactivation rules is summarised in Algorithm 2. The algorithm parses φ in a tree and visits the parsing tree in post-order. The system is built incrementally, starting from the system(s) returned by the recursive call(s). If φ is an observation (or its negation), an initial system is created, including two evaluation rules (as the observation may or may not occur), no reactivation rules and the single R[φ] as initial state. If φ is a conjunction or disjunction, the two systems of the subformulae are merged, and the conjunction/disjunction evaluation rules, reactivation rule and initial activation are added. The computations are the same if the main operator is U , but the reactivation rule will have to reactivate the monitoring of the two subformulae; in particular, UA denotes the standard until operator, while UB is the particular case where the ψ failed and the until operator cannot be trivially satisfied anymore. Formulae with X or W as main operator go through two phases: first, the formula is evaluated to undecided, as the truth value can’t be computed until the next cell is accessed. Special evaluation rules force the truth value to false (for X) or true (for W ) if no next cell exists. Then, at the next iteration, the reactivation rule triggers the subformula: this means that if Xφ is monitored in cell i, φ is monitored in cell i + 1. φ is then monitored independently, and the Xφ (or W φ) rule enters a ’monitoring state’ (suffix M in the table), simply mirroring φ truth value and self-reactivating. The evaluation of φ is false (undecided) when φ is false (undecided); it is also undecided when φ holds (as φ can never be true before the end of the trace), 7 but the K suffix indicates when, at the end of the trace, an undecided  can be evaluated to true. Finally, ♦φ constantly reactivates itself and its subformula φ, unless φ is verified at runtime (causing ♦φ to hold), the trace ends (♦φ fails). Algorithm 2 Generation of rules 1: function I NITIALISE(φ) 2: op ← main operator . Apply recursively to subformula(e) 3: if op ∈ {, ♦, X, W } then 1 1 4: hRE , RR , S 1 i ← Initialise(ψ 1 ) 1 5: RE ← RE ; 1 6: RR ← RR ; 7: else if op ∈ {∨, ∧, U } then 1 1 8: hRE , RR , S 1 i ← Initialise(ψ 1 ) 2 2 9: hRE , RR , S 2 i ← Initialise(ψ 2 ) 1 2 1 2 10: RE ← RE ∪ RE ; RR ← RR ∪ RR ; 11: else 12: RE ← ∅; RR ← ∅; 13: end if . Compute and add evaluation rules for main operator 14: Cells ←op’s-evaluation-tables 15: for all cell ∈ Cells do 16: Convert cell to single rule re , substituting formula names 17: R E ← R E ∪ re 18: end for 19: if φ-is-main-formula then 20: RE ← RE ∪ ([φ]T → SU CCESS) 21: RE ← RE ∪ ([φ]F → F AILU RE) 22: RE ← RE ∪ ([φ]? → REP EAT ) 23: end if . Compute initial state for this subsystem 24: if op = a then S ← R[a] 25: else if op =!a then S ← R[!a] 26: else if op ∈ {∨, ∧} then S ← S 1 ∪ S 2 ∪ R[φ]B 27: else if op = U then S ← S 1 ∪ S 2 ∪ R[φ]A 28: else if op ∈ {, ♦} then S ← S 1 ∪ R[φ] 29: else if op ∈ {X, W } then S ← R[φ] 30: end if . Compute and add reactivation rules for main operator 31: if op ∈ {∨, ∧} then RR ← RR ∪ ([φ]?Z → R[φ]?Z), for Z ∈ L, R, B 32: else if op = U then RR ← RR ∪ ([φ]?Z → R[φ]?Z, S 1 , S 2 ), for Z ∈ A, B, L, R 33: else if op ∈ {, ♦} then RR ← RR ∪ ([φ]? → R[φ], S 1 ) 34: else if op ∈ {X, W } then RR ← RR ∪ ([φ]? → R[φ]M, S 1 ) ∪ ([φ]?M → R[φ]M ) 35: end if . Return computed system 36: return hRE , RR , Si 37: end function RuleRunner generates several rules for each operator, but this number is constant, as it corresponds to the size of evaluation tables plus special rules (like the SUCCESS one). The number of rules corresponding to φ ∨ ψ, for instance, does not depend in any way on the nature of φ or ψ, as only the final truth evaluation of the two subformulae is taken into account. The preprocessing phase creates the parse tree of the property to encode and adds a constant number of rules for each node (subformula). The obtained rule set does not change at runtime nor when monitoring new traces. During the runtime verification, for each cell of the trace the system goes through all rules exactly once. This is guaranteed by the post-order visit of the parsing tree, as shown in Algorithm 2, assuring pre-emption for rules evaluating simpler formulae. Therefore, the complexity of the system is inherently linear. This is not in contrast with known exponential lower bounds 8 for the temporal logic validity problem, as RuleRunner deals with the satisfiability of a property on a trace, thus tackling a different problem from the validity one (this distinction is also mentioned in [9]). A prototype for RuleRunner can be found here. As an example, consider the formula φ = a ∨ ♦b and the trace t = [c − a − b, d − b, EN D] (dashes separate cells and commas separate observations in the same cell). If monitoring φ over t, a fails in the first cell, while b is sought until the third cell, when it is observed. Thus the monitoring yields a success even before the end of the trace. EVALUATION RULES INITIAL STATE – R[a], a is not observed → [a]F – R[a], R[b], R[♦b], R[a ∨ ♦b]B – R[b], b is observed → [b]T – R[b], b is not observed → [b]F – R[♦b], [b]T → [♦b]T EVOLUTION OVER [c − a − b, d − b, EN D] – R[♦b], [b]F → [♦b]? – R[a ∨ ♦b]B, [a]F , [♦b]? → [a ∨ ♦b]?R state R[a], R[b], R[♦b], R[a ∨ ♦b]B – R[a ∨ ♦b]R, [♦b]T → [a ∨ ♦b]T + obs R[a], R[b], R[♦b], R[a ∨ ♦b]B, c – R[a ∨ ♦b]R, [♦b]? → [a ∨ ♦b]?R eval [a]F, [b]F, [♦b]?, [a ∨ ♦b]?R – [a ∨ ♦b]T → SU CCESS react R[b], R[♦b], R[a ∨ ♦b]R state R[b], R[♦b], R[a ∨ ♦b]R + obs R[b], R[♦b], R[a ∨ ♦b]R, a REACTIVATION RULES eval [b]F, [♦b]?, [a ∨ ♦b]?R – [♦b]? → R[b], R[♦b] react R[b], R[♦b], R[a ∨ ♦b]R – [a ∨ ♦b]?R → R[a ∨ ♦b]R state R[b], R[♦b], R[a ∨ ♦b]R + obs R[b], R[♦b], R[a ∨ ♦b]R, b, d eval [b]T, [♦b]T, [a ∨ ♦b]T, SU CCESS STOP PROPERTY SATISFIED The behaviour of the runtime monitor is the following: – At the beginning, the system monitors a,b,♦b and a∨♦b (initial state = R[a], R[b], R[♦b], R[a ∨ ♦b]B). The −B in R[a ∨ ♦b]B means that both disjuncts are being monitored. – In the first cell, c is observed and added to the state S. Using the evaluation rules, new truth values are computed: a is false, b is false, ♦b is undecided. The global formula is undecided, but since the trace continues the monitoring goes on. The −R in R[a ∨ ♦b]R means that only the right disjunct is monitored: the system dropped a, since it could only be satisfied in the first cell. – In the second cell, a is observed but ignored (the rules for its monitoring are not activated); since b is false again, ♦b and a ∨ ♦b are still undecided. – In the third cell, d is ignored but observing b satisfies, in cascade, b, ♦b and a ∨ ♦b. The monitoring stops, signalling a success. The rest of the trace is ignored. 4 Encoding the monitor in a Neural Network The translation of a RuleRunner monitor in an equivalent neural network is composed by two main steps, respectively encoding a rule system in a logic program and the resulting logic program in a neural network. We hereby describe the details of the first step and explain how to use CILP for the second one. The first step of the neural encoding is the translation of a RuleRunner system into an equivalent logic program (Algorithm 3). 9 Algorithm 3 From RuleRunner to Logic Programs 1: function RR 2 LP(φ) 2: Create RRφ = hRE , RR , Si encoding φ 3: Create an empty logic program LP . CE 4: for all R[φ], [ψ 1 ]V, . . . , [ψ n ]V, → [φ]V ∈ RE do 5: LP ← LP ∪ [φ]V :-∼[U P DAT E], R[φ], [ψ 1 ]V, . . . , [ψ n ]V 6: end for . CP 7: for all o ∈ RE ∪ RR do 8: LP ← LP ∪ o :- ∼[U ], o 9: end for 10: for all R[φ] ∈ RE ∪ RR do 11: LP ← LP ∪ R[φ] :- ∼[U P DAT E], R[φ] 12: end for . CR 13: for all [φ]? → R[φ], R[ψ 1 ], . . . , R[ψ n ] ∈ RR do 14: LP ← xLP ∪ R[φ] :- [U P DAT E], [φ]? 15: for all R[ψ i ] do 16: LP ← LP ∪ R[ψ i ] :- [U P DAT E], [φ]? 17: end for 18: end for 19: LPφ = LP 20: return LPφ 21: end function The algorithm creates a single logic program LP ; however, for the sake of explanation, we distinguish three kinds of clauses: evaluation, reactivation and persistence (marked as CE , CR , CP in Algorithm 3). Intuitively, evaluation and reactivation clauses (in LP ) mirror, respectively, evaluation and reactivation rules in RuleRunner. Persistence clauses are used to remember observations and active rules by explicit re-generation: these clauses follow the pattern x :- ∼[U P DAT E], x, where x is is a rule name (R[φ]) or an observation (o ∈ O). Evaluation clauses are obtained from evaluation rules by adding one extra literal in the body, ∼[U P DAT E]. The reactivation rules are split into several reactivation clauses, one for each literal in the head of the rule; [U P DAT E] is added in the body of all these rules. Finally, for all observations and truth evaluations in the rules, a persistence clause is added. RuleRunner’s monitor loop fires the evaluation and reactivation rules in an alternate fashion. We simulate that by introducing the [U P DAT E] literal and using it as a switch: when [U P DAT E] holds, only reactivation clauses can hold, while when it does not all reactivation rules are inhibited and evaluation and persistence clauses are potentially active. RuleRunner iteratively builds a state, going through partial solutions; in the same way, some sort of memory is necessary for the partial results to be remembered by the LP. We achieve that by adding persistence clauses: as long as [U P DAT E] does not hold, all observations and rule names are re-obtained at each iteration of the logic program. Another way to achieve this persistence is to add the desired observations/rule names as facts: we opted for the former because persistence clauses have a standard structure, while adding facts to the LP correspond to clamping neurons in a neural network. The example in the previous section corresponds to the logic program LP(a∨♦b) , depicted below ([U P DAT E] shortened to [U ]). 10 EVALUATION CLAUSES REACTIVATION CLAUSES – [a]F :- ∼[U ], R[a], ∼a – R[♦b] :- [U ], [♦b]? – [b]T :- ∼[U ], R[b], b – R[b] :- [U ], [♦b]? – [b]F :- ∼[U ], R[b], ∼b – R[a ∨ ♦b]R :- [U ], [a ∨ ♦b]?R – [♦b]T :- ∼[U ], R[♦b], [b]T – [♦b]? :- ∼[U ], R[♦b], [b]F – [a ∨ ♦b]?R :- ∼[U ], R[a ∨ ♦b]B, [a]F, [♦b]? PERSISTENCE CLAUSES – [a ∨ ♦b]T :- ∼[U ], R[a ∨ ♦b]R, [♦b]T – [a ∨ ♦b]?R :- ∼[U ], R[a ∨ ♦b]R, [♦b]? – a :- ∼[U ], a – SU CCESS :- ∼[U ], [a ∨ ♦b]T – b :- ∼[U ], b – R[a] :- ∼[U ], R[a] – R[b] :- ∼[U ], R[b] – R[♦b] :- ∼[U ], R[♦b] – R[a ∨ ♦b]B :- ∼[U ], R[a ∨ ♦b]B – R[a ∨ ♦b]R :- ∼[U ], R[a ∨ ♦b]R Summarising, we start from a formal property φ expressed as an FLTL formula, we compute a RuleRunner rule system RRφ monitoring that formula, and then we encode the rule set in a logic program LPφ . The correctness of this translation can be proved by induction on the monitoring process; we omit such proof due to lack of space. We can then exploit the CILP algorithm [8] to translate the logic program LPφ into an equivalent neural network N Nφ . Algorithm 4 CILP System - Translation 1: function CIL2 P (LPφ ) 2: Create empty network N N 3: for all clause c ∈ LPφ , c = h :- b1 , . . . , bn do 4: Add a neuron C to the hidden layer 5: If not there, add a neuron H to the output layer 6: Connect C to H 7: for all clause bi ∈ c do 8: If not there, add a neuron Bi to the input layer 9: Connect Bi to C 10: end for 11: end for 12: N Nφ = N N 13: return N Nφ 14: end function Each clause (c) of LPφ is mapped from the input layer to the output layer of N Nφ through one neuron (labelled C) in the single hidden layer of N Nφ . We omitted the computation of the parameters; intuitively, the Translation Algorithm from LPφ to N Nφ has to implement the following conditions: (C1) the input potential of a hidden neuron (C) can only exceed C’s threshold, activating C, when all the positive antecedents of c are assigned the truth value true while all the negative antecedents of c are assigned false; and (C2) the input potential of an output neuron H can only exceed H’s threshold, activating H, when at least one hidden neuron C that is connected to H is activated; the translation algorithm is sound and complete [8]. LP(a∨♦b) is translated into the neural network N N(a∨♦b) in Fig. 4. The result of the neural encoding phase is a neural network N Nφ able to perform runtime verification, monitoring φ over the given trace(s). The general algorithm for 11 neural monitoring is described in Algorithm 5. Adding x to a given layer means activating the neuron corresponding to x in that layer. In terms logic programming, this corresponds to adding the fact x to the program. It is worth comparing how, from an operational point of view, a RuleRunner system (RRφ ) and its neural encoding (N Nφ ) carry out the monitoring task. In each iteration of the main loop, RuleRunner goes through the list of evaluation rules, adding the result of each active rule to the state. When the end of the evaluation rules list is reached, the reactivation rules are allowed to fire, collecting the output in a new state. In the neural network the alternation of evaluation and reactivation is achieved by means of the [U P DAT E] neuron, which acts as a switch. In the evaluation phase, all evaluation rules are fired in parallel until convergence. Moving to the next trace cell is achieved by firing all reactivation rules in parallel once. R[a] a R[b] b R[⌃b] [b]F [b]T R[a _ ⌃b]B [a]F [⌃b]? R[a _ ⌃b]R [⌃b]T [a _ ⌃b]T [U ] [⌃b]? [a _ ⌃b]?R e p p e e p p e e p e p e e p e r r r [a]F R[a] a [b]T [b]F R[b] b [⌃b]? [⌃b]T R[⌃b] [a _ ⌃b]?R R[a _ ⌃b]B [a _ ⌃b]T R[a _ ⌃b]R [S] Fig. 4: (simplified) Neural network for monitoring a ∨ ♦b: N N(a∨♦b) . The input and output layers include neurons whose labels correspond to atoms in the logic program; active neurons are filled in grey, representing the initial state R[a], R[b], R[♦b], R[a ∨ ♦b]. [U ] stands for [U P DAT E] and [S] for [SU CCESS]. Hidden neurons correspond to clauses, labelled with e (resp. p,r) to mark that it encodes an evaluation (resp. persistence, reactivation) clause. Solid lines represent connection with positive weights, dashed lines represent negative weights. To simplify the visualisation, all recurrent connections and connections from [U ] are omitted in Fig. 4. For instance, the leftmost hidden neuron in Fig. 4 has ingoing connections from a (negative weight) and R[a] (positive weight), plus the negative connection from [U ] (omitted in the figure); it has an outgoing connection towards [a]F : this corresponds to the clause [a]F :- ∼[U P DAT E], R[a], ∼a, the first evaluation clause in LP(a∨♦b) . Algorithm 5 Runtime Verification using N Nφ 1: function NN- MONITOR(φ,trace t) 2: Create RRφ = hRR , RE , Si encoding φ (Algorithm 2) 3: Rewrite RRφ into LPφ (Algorithm 3) 4: Rewrite LPφ into N Pφ (Algorithm 4: CILP ) 5: Add S to the input layer 6: while new observations exist in t do 7: Add the new observations to the input layer 8: Let the network converge 9: if S contains SUCCESS (resp.FAILURE) then 10: return return SUCCESS (resp.FAILURE) 11: end if 12: Add U P DAT E to the input layer 13: Fire the network once 14: end while 15: end function 12 5 Experiments 5.1 Parallel Monitoring The encoding of RuleRunner in a neural network allows for the monitoring of several traces in parallel. A cell in a trace is a vector of observations which is fed to the neural network. To monitor multiple traces, it is sufficient to compose all observation vectors (of different traces at the same time) as rows in a matrix. The monitoring is then carried out as explained above, with all vector-matrix steps substituted by matrix-matrix operations. Furthermore, the matrix-based nature of neural networks allows for straightforward optimisations in order to speed up the monitoring: we focused on sparse form and GPU computation, using Matlab 2013a for all experiments. A sparse matrix is a matrix primarily populated by zeros (e.g. any identity matrix). The sparse(m) function is a lossless compression that stores only the non-null elements in a matrix m. The weight matrices in our networks are sparse: the size depends on the complexity of the encoded formula (and therefore the number of rules in the monitor), but the number of non-null elements per row/column is constant, as each rule (clause) only involves truth values of direct subformulae and has a constant number of literals in the body/head. We also exploited MathWork’s Parallel Computing Toolbox to run our code on GPU. We used GPUArray, a special array type with several associated functions that allows to perform computations on CUDA-enabled NVIDIA GPUs directly from MATLAB. In the following set of experiments, the base implementation for φ is the neural-network encoding a monitor for φ as described in the previous section (N Nφ ); the sparse and GPU implementations were simply obtained, respectively, the sparse(·) and gpuArray(·) functions to all the matrices in the base implementation. (A) (B) (C) (D) Fig. 5: Scalability experiments for matrix-based monitors In terms of code, the difference between the three implementations is deliberately minimal, as our goal is to show how tiny modifications to the code allow for performance improvement. For this experiments, we generated random formulae from the FLTL grammar and sets of observations: rather than testing domain-specific properties, we 13 aimed at checking a set of unbiased formulae. The complexity of the temporal formulae is measured in terms of leaves of the parsing tree: for instance, (a ∨ b) has two leaves (a, b), ♦(Xa ∧ ((c ∨ d)U d)) has four leaves (a, b, c, d), et cetera. In Figure 5.A, we show how the length of the trace has no impact on the average time required to monitor a cell: the resources required by our system do not grow over time. In Figure 5.B, we measured how increasing the complexity of the formula impacts on the average time required to monitor a cell. The first striking feature of these curves is that, while for simple formulae base and sparse are significantly faster than gpu, for higher number of leaves base ends up having average times similar to those of gpu. Moreover, base’s curve looks steeper than the other two. The same values can be averaged over the number of leaves: that is, given the average time required to monitor a cell when monitoring a formula with n leaves, we can divide that time value by n. The chart in Figure 5.C shows how the gpu curve significantly lowers the time per cell per leaf, when the number of leaves increases. Our explanation for this behaviour is that the GPU implementation suffers from a remarkable overhead in the CPU-GPU communication: this delta becomes less influent when making the formula more complex and thus the matrices bigger. Finally, it is relevant to measure the increase rate of these curves: to do so, we divided all measured times by the first experiment in each curve (Figure 5.D). We observed that base scales linearly w.r.t. the number of leaves: for instance, the experiment with 32 leaves is 16 times slower than the experiment (with the same implementation) with 2 leaves. The curve for the sparse form is more gradual, with the last experiment being ∼ 4 times slower than the first one. But the implementation with the smoothest curve is the gpu-based one, with the 32-leaves experiment being only ∼ 1.5 times slower than the 2-leaves one. We ran other tests to study the impact of other features, such as the number of consecutive next operators or the interleaving of temporal and classical operators. All tests generated curves following the same patterns. We thus observed how the sparse form allows for a speed-up of the monitoring, while the gpu implementation shows the best scaling factor. 5.2 Learning as property adaptation Neural networks also offer intrinsic learning capabilities. In our system, observations (events) are presented to the monitor as the activation of labelled input neurons. The feedforward propagation then causes some output neurons to be activated, representing the result of the monitoring of the encoded formula on the given cell. It may be the case that a user or domain expert would like to modify the verdict of a monitoring task: for instance, consider a security system built to detect consecutive login operations from the same user with no logout in between: φ = login ⇒ X(!login ∪ logout). Now suppose the system detects a number of violations, in a given LAN, that the security manager judges as false positives. For instance, that LAN could be a lab where users log in from different devices, or it could be a trustable section of the network. It is reasonable that the security manager wants to maintain the security system, but also to relax the formal property in order to include logins from the given LAN, and therefore avoid false positives and improve efficiency. Reclassifying a trace means demoting the causality relation between the observations and the actual output while promoting the relation between observations and expected output: it is, in fact, a supervised learning task. 14 in out v1 prv v2 in out v1 prv v2 lan e1 e2 e3 e1 e2 e3 e4 0.45 0.8 0.8 0.8 0.8 0.8 0.8 0.42 -0.51 v1 v2 [F] [S] v1 v2 [F] [S] Fig. 6: Simplified network for the consecutive logins example, representing the current login (in), logout (out), the existence of a previous login (prv), the occurrence of a login without a logout before (v1, corresponding to a violation of !login ∪ logout), the occurrence of v1 when a previous login occurred (v2, corresponding to !φ) and, finally, flags for global violation ([F ] for failure) and success ([S]). The security manager tells the system to accept the consecutive logins occurring in LAN. A new input neuron, lan is added to the system. The learning task is performed as training of two perceptrons: P 1 maps {e1, e2, e3, e4} onto [F ], while P 2 maps the same hidden neurons onto [S]. The perceptrons are conceptually isolated by the rest of the network as follows: the output is the flag to be modified, and the inputs are all hidden neurons in the network. Note that, even if all hidden neurons are used as inputs for a perceptron and its learning algorithm, only the weights of the connections from active neurons can be modified, thus limiting the algorithm to the relevant (currently active) hidden neurons. P 1 is trained to accept only traces where e3, !e4 occurs, while for P 2 the positive examples involve e3, e4. Both perceptrons are trained with the standard perceptron learning rule: Figure 6 shows the results when applying the learning rule with η = 0.2 and thresholds = 0.5. Relevant connections are labeled with their weights; connections with weight close to 0 are omitted. For P 1, the weights from e1, e2, e3 are only slightly modified by the learning; the connection from e4, on the contrary, goes from being 0 to a negative value. For P 2, since the network is trained to associate the co-occurence of the violation v2 =!φ and lan, the weights from e3 and e4 towards [S] are increased. As a result, when lan is not active the network displays the nominal behaviour and signals a violation ([F ]); on the contrary, when both lan and v2 are active, the negative weight from e4 prevents [F ] from being activated, while the sum of the signals from e3 and e4 activates [S]. 6 Conclusions and Future Work Firstly, we developed a runtime monitoring system, RuleRunner, to monitor FLTL properties on finite traces. Secondly, we discussed the encoding, partially based on CILP’s Translation Algorithm, of a RuleRunner rule system in a standard feedforward neural network with recurrent connections and one hidden layer. Thirdly, we showed how matrix-based optimisations allows for major improvements in terms of performance, either concerning actual speed (matrices in sparse form) or scaling factor (GPU computation). Finally, we pointed out how in our framework learning corresponds to property adaptation. A first direction for future work is to adopt and compare more advanced techniques for 15 parallel monitoring, such as the MapReduce framework (recently applied to Runtime Verification in [1]) and the parallel algorithms in [5]. A second direction for future work is the development of ad-hoc learning strategies, mostly based on the Backpropagation algorithm [11], for specific classes of adaptation problems, such as soft norms in BPM [19] and false positives in Security [6]. References 1. B. Barre, M. Klein, M. Soucy-Boivin, P.-A. Ollivier, and S. Hall. Mapreduce for parallel trace validation of ltl properties. In Procs. of Runtime Verification 2012, pages 184–198. 2. H. Barringer, D. E. Rydeheard, and K. Havelund. Rule systems for run-time monitoring: from eagle to ruler. Journal of Logic and Computation, volume 20:pages 675–706, 2010. 3. A. Bauer, M. Leucker, and C. Schallhart. The good, the bad, and the ugly, but how ugly is ugly? In Procs. of Runtime Verification 2007, pages 126–138. 4. A. Bauer, M. Leucker, and C. Schallhart. Monitoring of real-time properties. In Procs. of Foundations of Software Technology and Theoretical Computer Science 2006, pages 260–272. 5. S. Berkovich, B. Bonakdarpour, and S. Fischmeister. Gpu-based runtime verification. In Procs. of the IEEE International Symposium on Parallel and Distributed Processing 2013, pages 1025–1036. 6. D. Breitgand, M. Goldstein, and E. H. Shehory. Efficient control of false negative and false positive errors with separate adaptive thresholds. Network and Service Management, IEEE Transactions on, 8:128–140, 2011. 7. A. Coates, B. Huval, T. Wang, D. J. Wu, B. C. Catanzaro, and A. Y. Ng. Deep learning with cots hpc systems. In Procs. of International Conference on Machine Learning 2013, pages 1337–1345. 8. A. S. d’Avila Garcez and G. Zaverucha. The connectionist inductive learning and logic programming system. Applied Intelligence, volume 11:pages 59–77, 1999. 9. D. Drusinsky. The temporal rover and the atg rover. In Procs. of the International Workshop on SPIN Model Checking and Software Verification 2000, pages 323–330. 10. C. Eisner, D. Fisman, J. Havlicek, Y. Lustig, A. McIsaac, and D. V. Campenhout. Reasoning with temporal logic on truncated paths. In Procs. of Computer-Aided Verification 2003, pages 27–39. 11. S. Haykin. Neural Networks: A Comprehensive Foundation (3rd Edition). 2007. 12. P. Heimann, G. Joeris, C.-A. Krapp, and B. Westfechtel. Dynamite: Dynamic task nets for software process management. In Procs of International Conference on Software Engineering 1996, pages 331–341. 13. Q. V. Le, M. Ranzato, R. Monga, M. Devin, G. Corrado, K. Chen, J. Dean, and A. Y. Ng. Building high-level features using large scale unsupervised learning. In Procs. of International Conference on Machine Learning 2012. 14. M. Leucker and C. Schallhart. A brief account of runtime verification. Journal of Logic and Algebraic Programming, volume 78:pages 293–303, 2009. 15. O. Lichtenstein, A. Pnueli, and L. D. Zuck. The glory of the past. In in Procs. of Logic of Programs 1985, pages 196–218. 16. J. Lukasiewicz. O logice trjwartosciowej (On Three-Valued Logic). 1920. 17. A. Pnueli. The temporal logic of programs. In Procs. of the Annual Symposium on Foundations of Computer Science 1977, pages 46–57. 18. G. G. Towell and J. W. Shavlik. Knowledge-based artificial neural networks. Artificial Intelligence, volume 70:pages 119–165, 1994. 19. W. M. P. van der Aalst et.al. Process mining manifesto. In Procs of Business Process Management Workshops 2011, pages 169–194.