Static Analysis of Complex Event Processing Programs Adrián García-López Loli Burgueño Antonio Vallecillo Universidad de Málaga, Spain Universidad de Málaga, Spain Universidad de Málaga, Spain agl@lcc.uma.es UOC, Barcelona, Spain av@lcc.uma.es CEA-List, Paris, France loli@lcc.uma.es ABSTRACT maintain the code quality and give developers immediate feedback Complex Event Processing (CEP) provides a mechanism to effi- in the early development phases of the CEP software system. ciently correlate and infer conclusions about systems by means of The rest of the paper is structured as follows. First, Section 2 analyzing the events they process. In areas such as the Internet of presents the background of our work. Then, we present our main Things (IoT), Cyber Physical Systems (CPS), system monitoring or contribution in Section 3, and our tool in Section 4. Section 5 dis- data streaming analytics, CEP is able to read events from a data cusses a validation exercise we have used to evaluate our approach, stream and to generate complex events that represent situations using a real CEP application. Finally, Section 6 compares our ap- of interest to the system owner by means of event patterns. Every proach with other related works, and Section 7 concludes and out- time a sequence of events matches a pattern, a complex event is lines our future lines of work. created and added to the data stream. The dependencies among the rules and the possibility of non-confluent behavior of CEP rule- 2 BACKGROUND based systems may lead to unexpected outputs when executing Complex Event Processing offers a form of data processing [6] CEP programs. In this work, we show how to statically check and that aims at defining and detecting situations of interest, from the correct two particular properties of CEP systems: rule acyclicity analysis of low-level event notifications [7]. and rule race conditions. We use Esper EPL as a CEP language, and There are two types of events in a CEP system: simple events, present a tool we have developed to perform these analyses. which contain the information received by the sensors; and complex events, which are generated by the CEP system and inserted into KEYWORDS the data stream [10]. Each event has a type and a set of attributes Complex Event Processing; Esper EPL; Static Analysis; Rule Acyclic- associated. CEP programs are composed of sets of patterns that ity; Rule Race Conditions analyze and match sequences of events (both simple and complex) taking into account their content and temporal relations. Each time a match occurs, a complex event is created. 1 INTRODUCTION Although several CEP systems and languages exist, they all share Domains such as Internet of Things (IoT), Cyber-Physical Systems the same basic concepts, mechanisms and structure. In this paper (CPS), social networks, or system monitoring, generate gigabytes we write the rules in one particular EPL, called Esper EPL [8]. of data every second. In order to take advantage of that situation, approaches for data analytics have arisen and are used to extract useful information and knowledge from it. 2.1 CEP by example Complex Event Processing (CEP) [9, 15] addresses the issue of In order to explain CEP and to illustrate our proposal, we will use analyzing a specific kind of data: events about facts or situations the Smart House case study we introduced in [18]. In this system, occurring in real-time. CEP systems deal with the tasks of process- a house contains a set of devices and services that cooperate to ing streams of events and the identification of significant patterns simplify the lives of its residents. by means of techniques such as detection of relationships among For the particular case of ensuring security in situations of fire, events, event correlation, and aspects such as causality and timing. we know that when a fire starts, the temperature increases at a CEP systems are developed using Event Pattern Languages (EPL), rate higher than a given value; and carbon monoxide (CO), which which are normally rule-based languages that define rules for each weights less than air, increases and accumulates in the ceiling. For pattern. Patterns are triggered when their matching conditions are this reason, temperature and carbon monoxide (CO) sensors have met (e.g., the relevant source event occurs) and are in charge of been installed in the ceiling. They record the absolute value of the generating the resulting complex events. temperature and CO in their respective position every second and As with any other rule-based language, CEP programs are easy send signals to the central system. to develop and very efficient when they are composed of a small In case of fire, another key aspect is to determine whether or not set of rules. However, as the number of patterns (rules) grow, the someone is at home. For this purpose, our system uses the mobile complexity of CEP programs becomes unmanageable, their behav- phones of its occupants as motion detectors that inform of their ior can be unpredictable, and checking their correctness becomes a geographical coordinates at all times. very difficult task. Using the measurements produced by these sensors, and using In this work, we present a tool for the static analysis of CEP pro- Esper EPL, we have defined the set of CEP rules that we show in grams, which addresses the issues of detecting and fixing acyclicity Listing 1. Lines 1–3 show the types of the simple events, in the form and rule race conditions. This static analyses help improve and of templates that specify their name and attributes. In this example, MDE4IoT’18, October 2018, Copenhagen, Denmark Adrián García-López et al. Listing 1: Esper EPL patterns for the Smart House Pattern TempWarning detects if four events of type TempIncrease 1 c r e a t e schema Home ( id i n t , ts i n t , x i n t , y i n t , sqre i n t , occur in a five-minute window where the first one starts from a 2 temp i n t , co i n t , dopen b o o l e a n ) ; temperature equal to or greater than 33 degrees, and the rest of 3 c r e a t e schema Person ( pid i n t , ts i n t , x i n t , y i n t ) ; them detect a temperature increase of two or more degrees. 4 5 @Name( " TempIncrease " ) A FireWarning event is created every time an event of type 6 i n s e r t i n t o TempIncrease COHigh is detected, followed by another of type TempWarning, ev- 7 s e l e c t h2 . ts as ts , h1 . id as id , h2 . temp as temp , erything within less than 5 seconds. 8 h2 . temp −h1 . temp as incr 9 from p a t t e r n [ ( every ( h1 = Home ( ) −> A NobodyHome event is created from the primitive events Home 10 h2= Home ( h2 . temp −h1 . temp >=2 and h2 . id=h1 . id ) ) ) and Person when the main door of the house is closed and there is 11 where timer : within ( 1 minutes ) ] ; 12 nobody whose geographic coordinates are within the perimeter of 13 @Name( " TempWarning " ) the house. 14 i n s e r t i n t o TempWarning Finally, pattern CallFireDepartment creates an event of the 15 s e l e c t t4 . ts as ts t1 . id as id , t4 . temp 16 from p a t t e r n [ ( every ( t1 = TempIncrease ( t1 . temp >= 3 3 ) ) same name when there is nobody at home and a fire warning has 17 −> ( t2 = TempIncrease ( t2 . temp > t1 . temp and t2 . id = t1 . id ) ) been detected. In these cases, the central system must react and call 18 −> ( t3 = TempIncrease ( t3 . temp > t2 . temp and t3 . id = t1 . id ) ) the fire department. 19 −> ( t4 = TempIncrease ( t4 . temp > t3 . temp and t4 . id = t1 . id ) ) ) 20 where timer : within ( 5 minutes ) ] ; In Esper EPL, as can be inferred from previous patterns, each 21 complex event created has a type and set of attributes indicated in 22 @Name( " COHigh " ) 23 i n s e r t i n t o COHigh the select clause of the pattern. 24 s e l e c t h1 . ts as ts , h1 . id as id 25 from p a t t e r n [ ( every ( h1 = Home ( h1 . co >= 5 0 0 0 ) ) ) ] ; 26 27 @Name( " FireWarning " ) 2.2 Problems with patterns 28 i n s e r t i n t o FireWarning 29 s e l e c t tw . id as id , coh . ts as ts Due to the nature of rule-based declarative languages such as Esper 30 from p a t t e r n [ ( every ( coh = COHigh ( ) ) −> EPL, their patterns (rules) could be executed in any order. Pattern 31 every ( tw = TempWarning ( tw . id = coh . id ) ) ) 32 where timer : within ( 5 seconds ) ] ; execution of course depends on the arrival of the events and thus 33 the event arrival would constrain the order in which patterns will 34 @Name( " NobodyHome " ) execute. However, CEP systems can generate complex events that 35 i n s e r t i n t o NobodyHome 36 s e l e c t p . ts as ts , h . x as x , h . y as y , h . id as id are added to the event stream, and there are rules that depend on 37 from p a t t e r n [ ( every h = Home ( not dopen ) −> every ( p= Person ( these complex events to produce further complex events. The order 38 ( p . x <= ( h . x − Math . sqrt ( h . sqre ) / 4 ) ) or 39 ( p . x >= ( h . x + Math . sqrt ( h . sqre ) / 4 ) ) or in which these rules are triggered may have a significant effect on 40 ( p . y <= ( h . y − Math . sqrt ( h . sqre ) / 4 ) ) or the final output of the program. This makes the execution of CEP 41 ( p . y >= ( h . y + Math . sqrt ( h . sqre ) / 4 ) ) ) ) ) programs non-deterministic and hard to maintain, test and debug. 42 where timer : within ( 3 seconds ) ] ; 43 The two main source of problems we detect in this work are rule 44 @Name( " CallFireDepartment " ) acyclicity conditions and rule race conditions. Rule acyclicity takes 45 i n s e r t i n t o CallFireDepartment place, for instance, when a pattern A generates events of type a, 46 s e l e c t fw . id as id , fw . ts as ts 47 from p a t t e r n [ ( every ( nh = NobodyHome ( ) ) −> which are consumed by another pattern B, which at the same time 48 fw = FireWarning ( fw . id = nh . id ) ) ) generates events of type b that are consumed by A. 49 where timer : within ( 5 seconds ) ] ; Rule race conditions occur when there are input-output depen- dencies between rules in uncontrollable or non-deterministic sit- uations. In the case of CEP, rule race conditions happen when a pattern consumes complex events that have not been produced by there are two types of simple events that serve as inputs to the CEP the time it is being executed, but afterwards. Hence, the pattern program: Home and Person. Each Home event keeps information misses them and its match and execution never take place although of the house identifier, the event timestamp, the house location they should. For example, pattern A produces complex event a; given its geographical coordinates x and y, its size measured in pattern B produces complex event b; and pattern C requires the oc- square meters, the temperature degrees and CO accumulated in the currence of the events a and b to produce the event c. What happens environment and whether the main door is open or not. Regarding if pattern C is checked before patterns A or B are triggered? the simple events of type Person, they are represented by a tuple with information about the person identifier, her corresponding geographical location given by its x and y coordinates, and the 3 STATIC ANALYSIS timestamp in which the event is produced. In order to address the problems presented in Section 2.2, we repre- Our example contains six event patterns. The pattern called sent CEP programs as directed graphs where the patterns are nodes TempIncrease matches Home events in a time window of one minute and the dependencies between patterns are the graph edges. Note and checks whether the temperature of the house has increased that two patterns can generate the same complex event. In the case in 2 or more degrees. If so, it generates a complex event with the that that complex event is consumed by a third pattern, this third same name. Similarly, the pattern COHigh creates a complex event pattern has dependencies with the two former patterns. each time the sensors detect CO levels above 5000 ppm. Static Analysis of CEP Programs MDE4IoT’18, October 2018, Copenhagen, Denmark Person Home Home Person NobodyHome TempIncrease TempIncrease NobodyHome TempWarning COHigh COHigh TempWarning FireWarning FireWarning CallFireDepartment CallFireDepartment Figure 1: Rule dependencies Figure 2: Priority problems Once we calculate the priority that each pattern should have, we 3.1 Rule acyclicity detection compare them with the priorities assigned by the user. With the Rule acyclicity is not a problem in itself but it can cause problems, result of the analysis we generate two files. One with the program such as infinite loops that flood the system with useless events. dependency graph that, when interpreted and visualized, shows in Thus, we decided to detect those cycles and inform the engineer red every node for which the priorities do not coincide, and one about them. with the Esper EPL code containing the same code as the original First of all, we build the graph corresponding to the Esper EPL file in which the priorities have been fixed. code. As mentioned before, a node is created for each pattern. Edges As an example, Figure 2 shows the graph generated from the code are created as follows. For each pattern R, we analyze its from in Listing 1. We can see how, due to the lack of priorities (which pattern to detect the events involved in its matching phase, track the Esper engine assumes as priority 0), patterns TempWarning, the patterns in which these events are produced, and create an edge FireWarning and CallFireDepartment are suggested for review. in the graph associating the pattern R with each one with which it Listing 2 shows the excepts of the new generated Esper file in which has dependencies. In a second step, once the dependency graph is the right priorities are explicitly defined. In this context, right means built, we apply the Kosaraju-Sharir algorithm [21] to detect cycles. that the rule priorities ensure the correct order of triggering, in Imagine that due to copy-paste, the code of our example con- order to avoid rule race conditions. tains errors. Imagine that the rule TempIncrease stated in its from pattern that depends on FireWarning events, and the COHigh rule 4 THE CEPA TOOL consumed its own events. The generated graph following our ap- We have created an Eclipse plug-in called CEPA1, 2 to give support proach would be the one presented in Figure 1. Note how cycles to our approach. We have used Xtext,3 which allows the definition of are depicted in red to catch the engineer’s attention. Domain-Specific Languages (DSL) and offers a parser, typechecker and compiler for free; and GEF, 4 which permits the integration of 3.2 Rule race detection and solving the Graphviz5 tool for graph visualization within Eclipse. In Esper EPL, priorities can be assigned to patterns by using the Using Xtext, we have defined a simplified grammar that parses label @Priority(n) where n is a non-negative integer. By default, if any valid Esper EPL program, and builds an Abstract Syntax Tree the priority is not specified, the CEP engine assumes that the pattern (AST) that focuses only on the parts of interest needed to extract has the highest priority, which is 0. Users may assign priorities to the pattern dependencies. In this way, the AST only contains in- rules in order to avoid the rule race conditions described in Sect. 2.2. formation about the select and from pattern parts of the CEP The lack of priority assignation, or errors in their assignments, programs, ignoring the rest of the language syntax. can lead to race conditions between patterns—specially those that Given the AST provided by Xtext, we have built a Java program consume complex events. In order to detect these conditions, we that creates a graph containing the primitive events and pattern also rely on the graph representation of the Esper EPL programs. dependencies and perform the analyses mentioned in Section 3. Firstly, the acyclicity analysis has to be performed. If it detects If the acyclicity analysis detects cycles, we use the Xtend6 code cycles, we cannot make any conclusion about the correctness of generator to automatically generate the textual code that represent the priorities. On the contrary, if the graph does not contain cycles, 1 https://github.com/Garlo13/CEPStaticAnalysis we are able to generate the priority of each node given their depen- 2 CEPA stands for CEP Analysis. Cepa is also the Spanish word for strain. 3 https://www.eclipse.org/Xtext/ dencies. In order to do so, we make use of the graph topological 4 https://www.eclipse.org/gef/ order of their nodes. The priority of each node is the maximum of 5 http://www.graphviz.org/ the priorities of its predecessors plus one. 6 http://www.eclipse.org/xtend/ MDE4IoT’18, October 2018, Copenhagen, Denmark Adrián García-López et al. Listing 2: Esper EPL patterns with priorities 5 VALIDATION 1 @Name( " TempIncrease " ) In order to validate our approach, we used a real CEP application 2 i n s e r t i n t o TempIncrease that currently runs across the Andalusian region in southern Spain, 3 @Priority ( 0 ) 4 select . . . and controls the quality of the air in real time, using the index 5 proposed by the U.S. Environmental Protection Agency (EPA) [17]. 6 @Name( " TempWarning " ) The raw data is obtained from the Andalusian Regional Govern- 7 i n s e r t i n t o TempWarning 8 @Priority ( 1 ) ment’s sensor network, which is composed of 61 sensor stations. 9 select . . . Each station measures every 10 minutes six air pollutants: carbon 10 11 @Name( " COHigh " ) monoxide, ozone, nitrogen dioxide and sulfur dioxide. The EPA 12 i n s e r t i n t o COHigh proposes the analysis of these pollutant every 1, 8 or 24 hours and 13 @Priority ( 0 ) express the results in a 6-grade scale, from Good to Hazardous. 14 select . . . 15 The definition of this CEP program7 , which is written in Esper 16 @Name( " FireWarning " ) EPL [8], contains 43 patterns. The information about each pollutant 17 i n s e r t i n t o FireWarning measurement is aggregated by one pattern, and other six patterns 18 @Priority ( 2 ) 19 select . . . per pollutant determine the air quality grade for it. This makes 20 seven patterns for every pollutant, and therefore the CEP program 21 @Name( " NobodyHome " ) 22 i n s e r t i n t o NobodyHome uses 42 patterns to analyze the six pollutants of interest. A final 23 @Priority ( 0 ) pattern, AirQualityLevel, calculates the global air quality level by 24 select . . . computing the maximum of the grades obtained for each pollutant. 25 26 @Name( " CallFireDepartment " ) Interestingly, we discovered that, although there are no cycles in 27 i n s e r t i n t o CallFireDepartment this program, these 43 patterns suffered from rule race conditions 28 @Priority ( 3 ) due to dependencies between the patterns. These dependencies may 29 select . . . cause that some patterns are triggered before the patterns on which they depend, hence leading to erroneous results and conclusions about the quality of the air in the region. Our tool was able to detect this error and to propose the cor- Parse Program Esper EPL Create Graph rect assignment of priorities to the application CEP patterns that and create AST AST Graph Program (Java) (Xtext) permitted solving the rule-race condition problem. Generate Yes Perform Acyclicity Graphviz Graphviz Cycles? Analysis 6 RELATED WORK Graph Code (Java) (Xtend) Rabinovich et al. [19] analyze the behavior of CEP applications us- No Generate ing static and dynamic techniques using formal methods. Cugola et Perform Race Graphviz Graphviz No Race Condition Analysis al. [7] transform the property checking task into a set of constraint Graph Code Cond? (Xtend) (Java) solving problems. Although these approaches offer more kinds of Yes analyses than we do, for the ones we provide support, given the Generate Generate Esper Graphviz Graphviz EPL Program w/ Esper EPL exploratory nature of CSP solutions, their approaches are not as AND Graph Code Priorities Program efficient as ours. (Xtend) (Xtend) In the context of Active DBMSs [22] there is a group of works related to the analysis of rule-based reactive systems, including Figure 3: CEPA workflow confluence [2, 5], termination [3], and correctness [11, 14]. Our work is similar to those, applied to the context of CEP systems. Race detection in event-driven systems have been previously studied [12, 20]. The novelty of our approach is that here we deal the pattern dependency graph showing the cycles with red arrows. with race conditions between complex events generated by the CEP This graph is interpreted by Graphviz and shown to the user. rules, and therefore the order in which the rules are triggered has In case the acyclicity analysis does not report the presence of cy- effect on the final output of the program. This is not the same as cles, the race condition analysis is carried out. If rule race conditions possible race conditions among the simple events that arrive to the are detected, Xtend is used to generate two files: one containing data stream. the Graphviz graph with the result of the analysis, and another one There are other works modeling CEP systems using Petri Nets. with the Esper code with the right priority for each pattern. If no An an example, Ahmad et al. [1] model CEP applications using rule race condition is detected, only one file is generated with the Timed Net Condition Event Systems (TNCES), a formalism based Graphviz graph. on Timed Petri Nets. This approach checks whether the system For clarity, Figure 3 depicts the CEPA workflow diagram, with satisfies certain properties or not. However, a simple EPL pattern the main steps of its internal behavior. For illustration purposes, such as every (A -> B), is transformed into a vast and hardly Figure 4 presents a screenshot of our tool, showing its graphical readable TNCES model that is difficult to understand and debug. interface to the user. 7 Available from http://atenea.lcc.uma.es/projects/FormalizingCEP.html Static Analysis of CEP Programs MDE4IoT’18, October 2018, Copenhagen, Denmark Figure 4: Screenshot of the CEPA tool showing its graphical interface to the user. Finally, some of the issues discussed here were identified in our [5] Sara Comai and Letizia Tanca. 2003. Termination and Confluence by Rule Priori- previous work [4]. This paper reports on the tool we have developed tization. IEEE Trans. Knowl. Data Eng. 15, 2 (2003), 257–270. [6] Gianpaolo Cugola and Alessandro Margara. 2012. Processing flows of information: to implement the analyses hinted in that work, and how we have From data stream to complex event processing. ACM Comput. Surv. 44, 3 (2012), realized it. 15:1–15:62. [7] Gianpaolo Cugola, Alessandro Margara, Mauro Pezzè, and Matteo Pradella. 2015. Efficient analysis of event processing applications. In Proc. of DEBS’15. ACM, 7 CONCLUSIONS 10–21. [8] EsperTech. (accessed 18 November 2017). Esper - Complex Event Processing. In this contribution, we present our approach to statically analyze http://www.espertech.com/esper/. Esper EPL programs, and to automatically check two properties: [9] Opher Etzion and Peter Niblett. 2010. Event Processing in Action. Manning rule acyclicity and rule race conditions. Furthermore, when rule Publications. [10] Event Processing Technical Society. 2011. Event Processing Glossary, Version race conditions are detected, we suggest how to repair the Esper 2.0. http://www.complexevents.com/wp-content/uploads/2011/08/EPTS_Event_ code to avoid non-confluence problems. We have built an Eclipse Processing_Glossary_v2.pdf. tool, called CEPA, that offers support for conducting such tests. [11] Esa Falkenroth and Anders Törne. 1999. How to construct predictable rule sets. In Proc. of Real Time Programming’99. 33–40. In the future we plan to increase the list of analyses supported [12] Chun-Hung Hsiao, Cristiano Pereira, Jie Yu, Gilles Pokam, Satish Narayanasamy, by our tool, such as for example dead-end detection. Furthermore, Peter M. Chen, Ziyun Kong, and Jason Flinn. 2014. Race detection for event-driven mobile applications. In Proc. of PLDI’14. ACM, 326–336. we would like to extend our current scope, by integrating into our [13] Donald B. Johnson. 1976. Finding all the elementary circuits of a directed graph. tool the dynamic analyses we explored in [4]. Finally, although the SIAM J. Comput. 4, 1 (1976), 77–84. performance of the algorithm currently used for detecting cycles [14] Sin-Yeung Lee and Tok-Wang Ling. 1999. Verify Updating Trigger Correctness. Springer, 382–391. in the graph is acceptable, we would also like to evaluate other [15] David C. Luckham. 2002. The Power of Events: An Introduction to Complex Event algorithms [16], in particular the one by Johnson [13]. Processing in Distributed Enterprise Systems. Addison-Wesley. [16] Prabhaker Mateti and Narsingh Deo. 1976. On Algorithms for Enumerating All Acknowledgements. We would like to thank the reviewers of this Circuits of a Graph Related Databases. SIAM J. Comput. 5, 1 (1976), 90–99. [17] D. Mintz. 2016. Technical Assistance Document for the Reporting of Daily paper for their helpful suggestions and comments. This work has Air Quality - the Air Quality Index (AQI). Technical Report EPA-454/B-16- been funded by Spanish Research Project TIN2014-52034-R. 002. U.S. Environmental Protection Agency. http://www.epa.gov/airnow/ aqi-technical-assistance-document-may2016.pdf [18] Nathalie Moreno, Manuel F. Bertoa, Gala Barquero, Loli Burgueño, Javier Troya, REFERENCES and Antonio Vallecillo. 2018. Managing Uncertain Complex Events in Web [1] Waheed Ahmad, Andrei Lobov, and Jose L. Martinez Lastra. 2012. Formal mod- of Things Applications. In Proc. of ICWE’18 (LNCS). Springer, 349–357. http: elling of Complex Event Processing: A generic algorithm and its application to a //atenea.lcc.uma.es/projects/UCEP.html manufacturing line. In Proc. of INDIN’12. IEEE, 380–385. [19] Ella Rabinovich, Opher Etzion, Sitvanit Ruah, and Sarit Archushin. 2010. Ana- [2] Alexander Aiken, Joseph M. Hellerstein, and Jennifer Widom. 1995. Static Analy- lyzing the behavior of event processing applications. In Proc. of DEBS’10. ACM, sis Techniques for Predicting the Behavior of Active Database Rules. ACM Trans. 223–234. Database Syst. 20, 1 (1995), 3–41. [20] Veselin Raychev, Martin T. Vechev, and Manu Sridharan. 2013. Effective race [3] Elena Baralis, Stefano Ceri, and Stefano Paraboschi. 1998. Compile-Time and detection for event-driven programs. In Proc. of OOPSLA’13. ACM, 151–166. Runtime Analysis of Active Behaviors. IEEE Trans. Knowl. Data Eng. 10, 3 (1998), [21] M. Sharir. 1981. A strong-connectivity algorithm and its application in data flow 353–370. analysis. Computers and Mathematics with Applications 7 (1981), 67–72. Issue 1. [4] Loli Burgueño, Juan Boubeta-Puig, and Antonio Vallecillo. 2018. Formalizing [22] J. Widom and S. Ceri. 1996. Active database systems: Triggers and rules for advanced Complex Event Processing Systems in Maude. IEEE Access 6 (2018), 23222–23241. database processing. Morgan Kaufmann. https://doi.org/10.1109/ACCESS.2018.2831185