Complex Event Processing for the Internet of Things Ariane Ziehn Supervised by Volker Markl and Steffen Zeuch Technical University of Berlin, German Research Center for Artificial Intelligence ariane.ziehn@dfki.de ABSTRACT Cloud Environments Fog-Cloud Environments Complex Event Processing (CEP) enables autonomous and 2 cloud cloud real-time decision making in data management systems. To- day, applications leverage CEP only in cloud-based envi- ronments to provide prompt reactions, although data are 4 generated outside the cloud. In particular, the Internet of fog Things (IoT) will increase the number of data producers 1 3 that a single IoT application has to handle millions of de- vices. The centralized data collection before applying data sensor network sensor network processing introduces a critical bottleneck in current cloud- based solutions, especially for delay-sensitive IoT applica- tions. To overcome this bottleneck, fog computing emerged as a paradigm to process data close to the network edge. (a) (b) However, CEP systems are not yet ready to leverage the fog Figure 1: Data Flows in (a) Cloud Environments and (b) layer as an extension for cloud-based stream processing. In Fog-Cloud Environments. this paper, we examine how the current system has to adapt to exploit the new capabilities of the fog for CEP. To this end, we analyze principal CEP methodologies and propose devices centrally 1 . Usually, the cloud environment uses a new solutions. With this work, we lay the foundation for data broker like Kafka for buffering data. Then, the cloud- large-scale in-network CEP applications on top of the IoT. based SPE utilizes its almost unlimited resources to process the collected data in the cloud 2 . However, the central data 1. INTRODUCTION collection before processing is highly resource-intensive and causes significant delays as well as network overhead. Thus, Complex Event Processing (CEP) is a common method cloud environments introduce a critical bottleneck for future for real-time stream processing to detect sequences of events IoT data management systems that must handle millions of in data streams and triggers actions upon detection [7, 14, data streams [15, 22]. On the right-hand side, the fog-cloud 16]. User-defined rules specify both the events and the ac- environment introduces an additional fog layer with a typ- tions that enable autonomous real-time decision making in ical tree-like network topology 3 . The fog nodes M pro- a wide range of applications, e.g., traffic congestion moni- toring, live maps, intelligent transportation systems, smart cess and reduce data on the paths through the network 4 . street lamps, or vehicle pollution control [1, 8, 24]. Sev- Hence, they mitigate the central bottleneck and release net- eral cloud-based stream processing engines (SPEs) [4, 10, work capacities for low-latency real-time responses. 21] provide CEP for rule-based monitoring as the current We argue that future IoT applications require a fog layer solution for the Internet of Things (IoT) scenarios with low- to process millions of data streams efficiently. Therefore, latency real-time response requirements. Under considera- it is crucial to leverage fog environments for CEP and its tion of the constant increase of IoT devices, future IoT appli- capabilities to enable future IoT applications with millions of cations will process data from potentially millions of devices. devices and thousands of rules. However, fog environments Thus, cloud-based solutions are not capable of fulfilling the introduce new challenges for CEP, e.g., stateful in-network low-latency real-time response requirements due to the mas- processing on low-end devices, changing network topologies, sive amount of data generated in these future IoT applica- and mobile IoT devices. We investigate how to tackle these tions [12, 22]. Zeuch et al. [22] address this problem and challenges in order to leverage the fog layer efficiently for propose a fog-cloud environment that leverages a fog layer CEP. In this paper, we make the following contributions to as extension of the cloud. Figure 1 presents the data flow in enable the IoT for CEP: both (a) cloud and (b) fog-cloud environment. On the left- • We analyze state-of-the-art CEP systems and identify hand side, the cloud environment collects data of connected the major limitations to leverage the IoT as a mean for large-scale, in-network CEP. Proceedings of the VLDB 2020 PhD Workshop, August 31st, 2020. Tokyo, Japan. Copyright (C) 2020 for this paper by its authors. Copying permitted • We highlight three concrete problems and sketch pos- for private and academic purposes. sible solutions to enable the IoT for CEP. full match • We outline possible improvements to our approaches, ignore: ignore: ignore: T1,T2,T3 which extend state-of-the-art CEP processing in the T2,T3 T1,T3 T1,T2 areas of pattern evaluation mechanisms, their opti- take T1 take T2 take T3 partial match q1 q2 q3 mization, and pattern specification languages. F T1,T2 In the remainder of this paper, we analyze state-of-the-art partial match CEP concepts in Section 2. Then, we highlight what pre- full match T1 T2 T3 vents CEP from leveraging the IoT and outline three con- (a) NFA. crete problems in Section 3. In Section 4, we summarize (b) Tree. related work and conclude our findings in Section 5. Figure 2: Evaluations Models for Pattern Detection: NFA (a) and Tree Structure (b). 2. RESEARCH CONTEXT In this section, we introduce the concepts of rules and event types Tn , including a condition cTn for each type. We event patterns. Afterward, we show the state-of-the-art pat- use SASE+ [23] to specify the pattern in Listing 1. tern evaluation and optimization mechanism for CEP. Rules: Rules build the knowledge base of the CEP engine Listing 1: Vehicle Pollution Control Example and are used in active database systems for autonomous re- PATTERN SEQ (T1 e1 , T2 e2 , T3 e3 ) actions on data manipulations. For this purpose, the user WHERE ( e1 . vehicleCount > cT1 AND specifies rules with up to three components, i.e., event e, e2 . speedLevel < cT2 AND condition c, and action a, which lead to the name ECA- e3 . po llutionL evel > cT3 ) WITHIN TIME . MINUTES (30) rules. Each tuple manipulation represents an event e that might trigger the user-desired action a if e matches the Pattern Evaluation Mechanisms: To detect match- relation-specific or inter-event conditions c. Actions are ing events, SPEs create a pattern detection plan given the application-specific and can be user notifications such as internal pattern representation. Common detection plans warnings and alerts or autonomous system actions such as are tree-based plans [17], order-based plans with state ma- updating attribute values or triggering other rules [18]. chines [23], e.g., non-deterministic finite automaton (NFA), Data Streams: We want to detect those rules in data as well as event processing networks and graphs [6, 11]. We streams, where each data stream E is a continuous and un- focus on the research lines with the highest number of repre- bounded flow of data tuples. Each generated tuple is an sentatives, tree-based and order-based evaluation plans [9]. event e that represents the state of its producer, e.g., an IoT Naive approaches of both lines represent the pattern iden- device, at a specific point in time ts. Due to the increasing tical to the user-given formulation and create one instance amount of geographically distributed IoT devices, the device of the evaluation structure for each detected event. The de- location s is another essential tuple attribute. To sum up, tection of a complex pattern subsequence is denoted as a each event source produces a stream E of events en with its partial match, while the detection of a complete pattern is set of attributes E = {ts, s, a1 , ..., an }, where ts and s are called a full match [13]. The example evaluation structures spatiotemporal attributes, and an are non-spatiotemporal of an NFA in Figure 2a represents each partial match in attributes, e.g., measurements and identifiers [3]. one state qn and a full pattern match in the final state F . Event Patterns: In SPEs, users define so-called simple The tree-based mechanism creates a leaf for each simple pat- event patterns [4] as a complement to traditional ECA-rules. tern (Figure 2b). Each intermediate node presents partial A simple pattern is a he, ci pair for one event type T . Event matches and the root a full match. types are the replacement of relations in stream processing CEP belongs to the group of stateful query processing and provide a uniform schema T = {ts, s, a1 , ..., an } for a methods as SPEs need to store all partial matches either group of contributing streams En [20, 24]. For instance, all until the next partial match is detected or the window is sensors that measure the temperature at different locations expired. The number of partial matches is influenced by contribute to the type Ttemp . All arriving events are mon- many factors, e.g., query selectivity or event frequency, but itored with the event type-specific conditions cTn , and an worst-case scenarios have exponential growth [14]. There- action a is triggered if a matching event is detected. fore, pattern detection plan optimization aims to reduce the Pattern Specification Language: The user formulates number of partial matches, e.g., by rewriting of single pat- complex patterns of monitoring tasks as a composition of terns or sharing techniques for multi-pattern CEP [14]. simple patterns from different event types. The relation- An effective rewriting method is called Lazy NFA [13]. It ships between simple patterns are defined by event opera- processes the pattern out-of-order by putting rare events in tors, which are either logical, e.g., AN D, OR, or temporal, front of the pattern sequence. Frequent events are buffered e.g., the window operator W IT HIN or the sequence oper- and only analyzed after the rare event has been detected. ator SEQ, which defines the order of simple patterns [20]. A new line of research focuses on the translation of pat- A wide range of pattern languages with different sets of event terns into multi-join queries. Thereby, patterns can be eval- operators exists, e.g., SQL-like languages such as SASE+ [20, uated as stream queries and leverage existing join query op- 23] and CCL [24], or languages based on event logic, e.g., timizations [13]. For instance, we could rewrite the example CEL [6]. An example pattern for a vehicle pollution control pattern from Listing 1 (excluding W IT HIN ) by replacing application can be formulated as follows: Detect if within the SEQ operator with the AN D operator and adding ad- 30 minutes an increased amount of vehicles (T1 ) is followed ditional inter-event constraints, as shown in Listing 2 [13, by a decrease of the average speed (T2 ), which leads to an 24]. By replacing the temporal operator with a logical al- excess of the air pollution level (T3 ). The example defines ternative, the pattern can be translated into a join query a sequence of three consecutive simple patterns for different and profit from join-query optimizations. Listing 2: CEP Traffic Pattern Example where all data is centrally available, can be evaluated by PATTERN AND (T1 e1 , T2 e2 ,T3 e3 ) accuracy metrics [9]. To this end, we can identify promising WHERE ( e1 . vehicleCount > conditionT1 AND in-network evaluation mechanism for fog environments and e2 . speedLevel < cT2 AND optimize them further using our evaluation results in the e3 . pol lutionL evel > conditionT3 AND following step of our research agenda. e1 . ts < e2 . ts AND e2 . ts < e3 . ts ) 3.2 Optimization of Evaluation Mechanisms Storing and maintaining large amounts of partial matches 3. LEVERAGE IOT FOR CEP is already a significant challenge in cloud environments with As opposed to fog-cloud environments, state-of-the-art almost unlimited resources capacities. For fog environments, SPEs process a global union of all sources En as one large this challenge is even more critical as they contain low-end stream. This processing strategy causes delays for mod- devices with limited capabilities but need to deal with par- ern IoT applications because it enforces the central data tial matches of possibly hundreds of patterns. The hetero- collection from millions of sensors before processing. Fog- geneous hardware in fog environments lead to the second cloud environments allow the processing of individual sen- problem we want to tackle: sor streams E or subsets of event types T close to the data Problem II: Cloud-based optimization techniques do not producers in the fog layer. Thus, this layer allows data re- consider the limitations and challenges of unreliable and mov- duction of more than 80% for stream queries, which reduces ing low-end devices and dynamic network topology. network traffic and enables the system to handle the data Both optimization techniques, rewriting of single patterns, from millions of devices with low-latency [22]. and sharing techniques for multi patterns (Sec. 2), aim to Research Goal: We aim to leverage fog environments improve the pattern detection plan in order to reduce par- for CEP and provide a solution that fulfills the low-latency tial matches. Cloud-based implementations of these strate- and real-time response requirements of future IoT applica- gies make assumptions that do not hold in fog environ- tions. To this end, we investigate the core features of CEP: ments. First, cloud-based SPEs examine the NP-complete pattern evaluation mechanisms, their optimization, and pat- problem of calculating one optimal pattern detection plan tern specification languages. for their static network [7, 13]. In a dynamically chang- ing network, the existing solutions would cause the expen- 3.1 Pattern Evaluation Mechanisms sive re-computation of new optimal plans after each topol- Efficient CEP requires a high-performance pattern eval- ogy change. Second, existing distributable solutions rarely uation mechanism. Currently, available evaluation mecha- consider heterogeneous hardware and resource limitations of nisms optimized for cloud-based environments could be ap- low-end devices for distribution strategies [2, 19]. Third, to plied to fog-cloud environments, yet without leveraging the enable the rewriting technique out-of-order pattern detec- fog layer, bottlenecks of cloud solutions would remain. Thus, tion, we need to store event buffers for retrospective pattern the first problem we want to tackle in this work is: evaluation. Storing potentially high frequent events from Problem I: Common cloud-based pattern evaluation mech- hundred of producers challenges again the capacity limits of anisms use a central component for data processing, pattern low-end devices and requires data compression. detection monitoring, or both. This central component pre- Solution Sketch: With this work, we want to iden- vents leveraging a fog environment without additional in- tify optimization techniques for distributed pattern evalu- network distribution strategies. ation on mobile low-end devices for hundreds of concur- Opposed to the cloud paradigm, fog environments allow us rently running patterns. First, we intend to prevent the NP- to tailor the data to the relevant only on the paths through complete problem of finding one optimal solution and inves- the network. As data is only shared with nodes on the net- tigate strategies that identify sets of possible near-optimal work path, the pattern detection plan needs to be aware of pattern detection plans. These plans can be used as avail- the fog nodes that receive the relevant data to execute sub- able fall-backs in case of topology changes. Second, in a plans. Further, by running sub-plans on fog nodes, we need fog-cloud environment, we can leverage the cloud as coor- to consider that CEP is a stateful processing method that dinator for pattern maintenance and distribution. By dis- needs to store partial matches on low-end devices. tributing stateful computation tasks of pattern evaluation Solution Sketch: We intend to identify promising eval- to potentially mobile devices, an additional challenge ap- uation mechanisms for distributed pattern detection and pears: pattern evaluation might not be possible due to data bring them together with the fog paradigm and distribution producers. In this case, the user must be informed that strategies. Since no general pattern evaluation mechanism currently, either monitoring is not possible or the results with explicit performance guarantees exits, the selection of are probabilistic, e.g., derived from nearby sensors. Third, one research line for fog environments is not straight forward we want to investigate efficient compression techniques, e.g., and requires an experimental evaluation. To this end, we partial aggregations [5], for event buffers on low-end devices consider all three approaches, order-based, tree-based, and to enable out-of-order pattern detection. Furthermore, the pattern translation into multi-join queries (Sec. 2), as possi- translation of patterns into multi-join queries enables an- ble candidates. As the next step of our research agenda, we other potential optimization strategy. In essence, we can intend to implement a naive distributed solution for each of combine both stream queries and pattern detection in one the three pattern evaluation mechanisms, including the nec- engine for optimization and maximize results sharing in fog- essary adaptions to leverage the fog layer. Afterward, we cloud environments. To this end, we leverage traditional can compare our implementations using stream processing CEP optimization strategies for low-end devices in a dy- metrics, e.g., forward delays for matches. Additionally, the namic network and herewith enable efficient CEP for mil- accuracy of our result in comparison with cloud solutions, lions of devices and thousands of patterns. 3.3 Pattern Specification Language tor placement for non-hierarchical SPE. Multi-sink operator Similar to evaluation mechanisms, no general pattern spec- placement is a general stream processing problem and thus ification language exists, so no comprehensive set of event a complementing feature of our solution. operators is available. Further, some languages lack formal semantics, provide limited expressiveness, or prevent auto- 5. CONCLUSION mated optimization [6, 19, 20]. Besides, these languages are In this paper, we introduce and motivate our goal to en- designed and optimized for single machines or cloud appli- able efficient CEP for IoT data management systems in fog- cations without considering future IoT applications. Thus, cloud environments. We review the state-of-the-art solu- the third problem we want to tackle is: tions, identify their problems to leverage the fog layer, and Problem III: Existing specification languages lack essen- suggest possible solutions. Further, we propose the follow- tial event operators because they were initially not intended ing three steps to reach our goal: (I) identify and evaluate for IoT applications. Further, many of them introduce re- appropriate in-network pattern evaluation mechanisms for strictions that negatively impact efficient distributed CEP fog-cloud environments, (II) leverage and adapt optimiza- optimizations. tion techniques for these mechanisms that fit the properties We want to enable the formulation of complex patterns of low-end devices, and (III) build a pattern specification for IoT applications by identifying an easy-to-use specifica- language for the IoT with CEP operators that leverage the tion language with the necessary set of event operators. For fog layer. Next, we focus on Problem (I) and the imple- Problem I and II, we focus on the most common set of event mentation of evaluation mechanisms, including adaptions, operators, i.e., AN D, OR, N OT , SEQ [6, 13] and extend to leverage the fog layer. Then, we use this baseline to op- it in this step of our research agenda. timize our solution. Solution Sketch: Giatrakos et al. [11] reviewed the pat- tern specification languages of several CEP systems accord- 6. ACKNOWLEDGMENTS ing to their expressiveness, including the Big Data SPE Apache Flink [4]. In contrast to other SPEs such as Apache This work was partly supported by the German Federal Storm [10] or Spark [21], Flink provides built-in support for Ministry for Economic Affairs and Energy (BMWi) through CEP [11] and additional operators compared to traditional the KI-SIGS – KI-Space for intelligent health systems (grant single-machine CEP systems. However, Flink does not offer no. 01MK20012P). Furthermore, we thank Holmer Hemsen a specification language but provides an API with low-level for the valuable input and discussions. functions. To this end, we use its pattern API as a baseline for our operator set and investigate how these operators can 7. REFERENCES leverage a fog layer. Further, we build a pattern specification [1] A. Ahmed, H. Arkian, D. Battulga, and et al. Fog computing applications: Taxonomy and requirements. arXiv preprint:1907.11621, 2019. language on top of this operator set under the consideration [2] M. Akdere, U. Çetintemel, and N. Tatbul. Plan-based complex event detection across distributed sources. VLDB Endowment, pages 66–77, 2008. of other leading languages, e.g., ZStream [17]. [3] S. Akili. On the need for distributed complex event processing with multiple sinks. In DEBS, pages 248–249, 2019. [4] A. Alexandrov, R. Bergmann, S. Ewen, and et al. The stratosphere 4. RELATED WORK platform for big data analytics. The VLDB Journal, pages 939–964, 2014. [5] L. Benson, P. M. Grulich, S. Zeuch, and et al. Disco: Efficient distributed In this section, we summarize the state of the art of related window aggregation. In EDBT, 2020. [6] M. Bucchi, A. Grez, C. Riveros, and M. Ugarte. Foundations of complex work and highlight the major differences to our approach. event processing. arXiv preprint:1709.05369, 2017. [7] J. Chen, L. Ramaswamy, D. K. Lowenthal, and et al. Comet: CEP Optimization: Kolchinsky and Schuster [13] proved Decentralized complex event detection in mobile delay tolerant networks. that the pattern detection plan could be translated into In IEEE, pages 131–136, 2012. [8] W. Fengjuan, Z. Xiaoming, and et al. The research on complex event multi-join queries and leverage join-query optimization. We processing method of internet of things. In ICMTMA, pages 1219–1222. IEEE, 2013. utilize this result and consider multi-join queries as one pos- [9] I. Flouris, N. Giatrakos, and et al. Issues in complex event processing: sible evaluation mechanism for fog-cloud environments. An- Status and prospects in the big data era. JSS, pages 217–236, 2017. [10] A. S. Foundation. Apache storm, 2012. Accessed January other novel approach proposed by Kolchinsky and Schus- 2020: https://storm.apache.org/. ter [14] is the combination of rewriting and prefix sharing to [11] N. Giatrakos, E. Alevizos, A. Artikis, and et al. Complex event recognition in the big data era: a survey. The VLDB Journal, pages 313–352, 2020. optimize multi-pattern CEP. We intend to leverage a subset [12] M. Hung. Leading the iot, gartner insights on how to lead in a connected world. Gartner Research, pages 1–29, 2017. of these techniques for our approach. [13] I. Kolchinsky and A. Schuster. Join query optimization techniques for In-network CEP: Madumal et al. [16] proposed a tree- complex event processing applications. VLDB, pages 1332–1345, 2018. [14] I. Kolchinsky and A. Schuster. Real-time multi-pattern detection over based pattern evaluation approach for CEP with a rule en- event streams. In MOD, pages 589–606. ACM, 2019. gine that schedules the events either to a local CEP engine [15] J. Lin, W. Yu, N. Zhang, and et al. A survey on internet of things: Architecture, enabling technologies, security and privacy, and of a fog node or the Cloud CEP engine. As opposed to applications. IoT, pages 1125–1142, 2017. [16] M. P. Madumal and et al. Adaptive event tree-based hybrid cep their approach, we focus on pattern evaluation in multiple computational model for fog computing architecture. In ICTer. IEEE, 2016. nodes to leverage the tree-like topology of fog environments. [17] Y. Mei and S. Madden. Zstream: a cost-based query processor for adaptively detecting composite events. In SIGMOD, pages 193–206, 2009. Comet [7] is a decentralized ordered-based CEP approach for [18] N. W. Paton and O. Dı́az. Active database systems. ACM CSUR, pages 63–103, 1999. delay-tolerant networks. Akdere et al. [2] propose network- [19] N. P. Schultz-Møller, M. Migliavacca, and P. Pietzuch. Distributed aware distribution strategies managed by a central control complex event processing with query rewriting. In DEBS, pages 1–12, 2009. [20] E. Wu, Y. Diao, and S. Rizvi. High-performance complex event processing instance. Parts of both the approaches mentioned above over streams. In SIGMOD, pages 407–418, 2006. can be reused for our CEP implementation. Nevertheless, in [21] M. Zaharia, R. S. Xin, P. Wendell, and et al. Apache spark: a unified engine for big data processing. ACM, pages 56–65, 2016. contrast to both, we focus on a solution that considers the [22] S. Zeuch, A. Chaudhary, B. Del Monte, and et al. The nebulastream platform: Data and application management for the internet of things. limitations of low-end devices in a dynamic environment. CIDER, 2020. Akili [3] motivated the need for decentralized CEP and pro- [23] H. Zhang, Y. Diao, and et al. On complexity and optimization of expensive queries in complex event processing. In SIGMOD, pages 217–228, 2014. posed a tree-based approach for efficient multi-sink opera- [24] S. Zhang, H. T. Vo, and et al. Multi-query optimization for complex event processing in sap esp. In ICDE, pages 1213–1224. IEEE, 2017.