Towards Net-based Formal Methods
            for Complex Event Processing

            Michael Offel, Han van der Aa, and Matthias Weidlich

        Deparment of Computer Science, Humboldt-Universität zu Berlin
      {michael.offel,han.van.der.aa,matthias.weidlich}@hu-berlin.de


      Abstract. Complex event processing (CEP) systems evaluate continuous
      queries over event streams in order to detect event patterns that charac-
      terise a situation of interest. Over the last decade, models and systems
      for CEP have been an active area of research. A plethora of languages for
      the definition of queries as well as algorithms and architectures for their
      efficient and robust evaluation have been proposed. However, we argue
      that much of this work adopted a pragmatic view, striving for technical
      solutions of problems that are rooted directly in specific applications. As
      such, formal methods that help in the design and implementation of CEP
      systems are underdeveloped. In this paper, we put forward the idea of
      exploiting well-established formalisms for concurrent systems in the anal-
      ysis of CEP systems. We discuss query verification, query parallelisation,
      and out-of-order event handling as three exemplary use cases for formal
      methods for CEP. Moreover, we outline our initial results on addressing
      these use cases through a Petri-net-based formalisation of event queries,
      event streams, and evaluation architectures.


1   Motivation
In complex event processing (CEP), predefined queries are continuously evaluated
over streams of events in order to detect situations of interest [1]. As such, CEP
serves as the foundation of applications related to monitoring, detection, and
online response in domains such as healthcare and urban transportation.
CEP Query Languages. Various languages for the definition of CEP queries have
been proposed in the literature, differing in syntax and semantics. Yet, many
languages adopt point-based event semantics (an event occurs at a particular point
in time), an attribute-based data model (the payload of an event is structured
by its type, given as a relational schema), and feature a set of common query
operators [8]. The latter includes sequencing, the definition of a list of event
types that shall occur in the respective temporal order; negation, the check for
the absence of an event of a particular type; value predicates, the definition of
attribute-based match conditions for events; and windowing, the specification of
a time interval for the occurrence of events.
    Consider, for instance, monitoring of clinical pathways in a hospital. Events
may indicate that a patient received an infusion or was examined. Evaluating the
query defined in Figure 1 over such events would then detect a situation where a
patient gets two infusions without being examined in between.
  Pattern SEQ ( infusion i1 , ! examination e , infusion i2 )
  Where i1 . patient = e . patient And i1 . patient = i2 . patient
  And e . scope = ’ full vitals ’
  Within 2 h

Fig. 1: Example query to detect a sequence of two infusions without examination.

CEP Systems. Aiming at low-latency evaluation of CEP queries over high-
velocity event streams, various algorithms and architectures for efficient query
processing have been proposed. Here, angles for optimisation include techniques
for parallelisation and distribution of query evaluation, semantic rewriting of
queries [6], or sub-pattern sharing among multiple queries [4].
    Since event streams often originate in distributed sources, techniques to make
query evaluation robust in a distributed setting have been another focal point of
research. To cope with out-of-order arrival of events at a CEP system (i.e., their
arrival order contradicts the timestamps assigned by event sources), techniques
for event buffering and speculative query evaluation have been developed [2].


Use Cases for Formal Methods in CEP. Reflecting on CEP query languages and
systems, we argue that much of the aforementioned work adopted a pragmatic
view and was primarily driven by specific applications. The focus has been on
the definition of query languages that are appropriate for a particular context.
Yet, despite some notable attempts (e.g., based on data-enriched automata or
logic-based rules), there is no commonly accepted, comprehensive model that
could serve as a formal foundation for the semantics of CEP queries. In the
same vein, techniques for optimised and robust query evaluation are mostly
built on empirical arguments (e.g., by tuning cost models for query evaluation).
Formal foundations of CEP architectures (e.g., parallelisation schemes or buffering
methods) have been largely neglected in the literature.
    These observations are despite the potential value of formal methods for
various questions related to the design and implementation of CEP systems.
Examples of such questions are:
  – Query verification: Can a query deployed in a CEP system match at all, if
    event generation follows a set of domain-specific constraints?
  – Sound workload parallelisation: If the queries of a workload are evaluated
    concurrently (e.g., through data- or task-parallelism) [3], will the result
    become non-deterministic due to interactions between different queries?
  – Robustness guarantees: If events arrive out-of-order at a CEP system [2], will
    the result of a query be affected and, if so, what kind of guarantees can still
    be given (e.g., no false positives or no false negatives)?
To enable reasoning on the above questions, we advocate the use of well-established
formalisms to model complex event processing. However, we note that this requires
a comprehensive formalisation that integrates the semantics of event streams,
event queries, and evaluation architectures. Below, we outline our initial steps
towards such a unified formal foundation of CEP.
          oinf1oinf1
                 nexh1nexh1
                         nexv1nexv1

                                                                                 oinf1oinf1oinf2oinf2 ninf1oexh2
                                                                                                              ninf2oexv2
             infusion,1
                 infusion,1                  infusion,2
                                                 infusion,2

          examination,1
             examination,1                  examination,2
                                               examination,2
             (heart)                                                                match
                                                                                       match          f1 f1 f2 f2          ...     fn
sp1 sp1         (heart)               sp2 sp2 (heart)
                                                  (heart)
                                                                  qpqp
                                                                    before
                                                                        before
                                                                                                                            qpafter     qpafter
          examination,1
              examination,1                examination,2
                                               examination,2                               matched
                                                                                             matched           failed      failed
            (full(full
                  vitals)
                       vitals)               (full(full
                                                   vitals)
                                                        vitals)


 (a) Event stream (some places omitted)                               (b) Potential query match

                         Fig. 2: Excerpts of Petri-nets modelling CEP concepts.

2         Formal Analysis of CEP Systems
To analyse CEP systems, we developed a formal model of event streams, event
queries, evaluation architectures based on Petri-nets [5]. The choice of Petri-nets
as a formalism is motivated by their truly concurrent (instead of interleaving) and
local (instead of global) semantics, which naturally captures the characteristics
of distributed event sources and concurrent query evaluation. Also, a large body
of verification techniques (including tool support) are available for Petri-nets.
    Below, we outline the general idea of our formalisation by means of an example.
We further sketch how the above questions on the design and implementation of
a CEP system can be approached based on reachability analysis.
Formalising Streams, Queries, and Evaluation Architectures. A Petri-net is
a directed graph of places (denoted by circles) and transitions (denoted by
rectangles). Its state is modelled by a marking, a distribution of tokens over
places. Transitions model state changes: If all places in the pre-set of a transition
carry a token, the transition may fire and, by doing so, consume one token from
each place in its pre-set and produce one token in each place in its post-set.
     We capture an event stream as shown in Figure 2a. If place sp1 is marked,
which indicates the start of the stream, three transitions can be fired. They
represent the generation of an event, at time unit 1, of type infusion or of type
examination, the latter having attribute scope set to heart or full vitals. Upon
firing, the transitions mark places that capture the occurrence (e.g., oinf 1 for the
infusion event) or non-occurrence (e.g., nexh1 for the heart examination event) of
an event. By extending the net at place sp2 , a complete event stream is modelled.
This way, domain constraints may be represented, e.g., that examination events
of different scope cannot follow upon each other directly.
     For event queries, the model captures potential query matches, i.e., particular
combinations of events that represent a pattern. For the example query, a potential
match would be an occurrence of two infusion events following each other directly.
The respective net, exemplified in Figure 2b, features one transition (match) for
pattern occurrence, which may only fire if the places in its pre-set (potentially
marked by the transitions representing the stream) indeed indicate the occurrence
of the respective events. In addition, it also features two transitions (f1 , f2 ) to
represent that a pattern did not occur. This particular potential match cannot
occur, if either the first or the second stream event is not of type infusion.
   The evaluation architecture is represented by how the nets of potential matches
are glued together. For instance, to capture centralised, single-threaded query
evaluation, the nets of potential matches are composed sequentially, connecting
qpafter of one potential match with qpbefore of another one. Then, the order of
composition may not only capture the temporal order of potential matches, but
can also represent evaluation priorities among queries in a workload. Parallel
composition of nets, in turn, models data- or task-parallelism in query processing.
Reasoning on CEP Systems. A formalisation as sketched above enables reasoning
on CEP systems based on reachability analysis, i.e., by exploring whether certain
states in the Petri-net can be reached from an initial marking. Specifically, for
query verification, the reachability of markings that cover places matched in
all nets of potential query matches determines whether domain constraints on
the event stream preclude the detection of a pattern through a specific query.
Soundness of a workload parallelisation scheme is established by verifying whether
the reachability of markings covering places matched changes, when varying the
composition of the nets that model potential query matches. Finally, out-of-order
arrival of events may directly be encoded when glueing together the nets of a
stream and potential query matches. Again, the reachability of markings covering
places matched and failed after incorporating these changes enables conclusions
on the possibility of generating false positives and false negatives.
    Note that the above formalisation will create Petri-nets of significant complex-
ity for realistic scenarios. However, state-of-the-art model checkers for Petri-nets
are able to handle dozens of millions of states [7], rendering such analysis feasible.

3    Conclusion
We put forward the idea to develop formal methods for CEP, based on well-
established formalisms for concurrent systems. Specifically, we presented initial
results on how to approach query verification, query parallelisation, and out-of-
order event handling through a Petri-net-based formalisation of event queries,
event streams, and evaluation architectures. Challenges are still imposed by
the expressiveness of CEP languages, featuring concepts such as Kleene closure
operators and advanced event consumption policies. Also, we are in the process
of developing tool support for the envisioned verification of CEP systems.

References
1. Cugola, G., Margara, A.: Processing flows of information: From data stream to complex event
   processing. ACM Comput. Surv. 44(3), 15:1–15:62 (2012)
2. Liu, M., et al.: Sequence pattern query processing over out-of-order event streams. In: ICDE.
   IEEE CS (2009)
3. Mayer, R., et al.: SPECTRE: Supporting consumption policies in window-based parallel complex
   event processing. In: Middleware. pp. 161–173. ACM (2017)
4. Ray, M., et al.: Scalable Pattern Sharing on Event Streams. In: SIGMOD. pp. 495–510 (2016)
5. Reisig, W.: Understanding Petri Nets. Springer (2013)
6. Weidlich, M., et al.: Optimizing Event Pattern Matching Using Business Process Models. IEEE
   TKDE 26(11), 2759–2773 (2014)
7. Wolf, K.: Petri net model checking with LoLA 2. In: PETRI NETS. LNCS 10877, pp. 351–362.
   Springer (2018)
8. Zhang, H., et al.: On complexity and optimization of expensive queries in complex event process-
   ing. In: SIGMOD. pp. 217–228. ACM (2014)