A Distributed Event Calculus for Event Recognition

Alexandros Mavrommatis (1, 2), Alexander Artikis (3, 2), Anastasios Skarlatidis (2) and Georgios Paliouras (2)
(1) School of Electronic & Computer Engineering, Technical University of Crete, Greece
(2) Institute of Informatics & Telecommunications, NCSR "Demokritos", Greece
(3) Department of Maritime Studies, University of Piraeus, Greece
{blackeye,a.artikis,anskarl,paliourg}@iit.demokritos.gr

Abstract. Events provide a fundamental abstraction for representing time-evolving information. Complex event recognition focuses on tracking and analysing streams of events, in order to detect patterns of special significance. The event streams may originate from various types of sensor, such as cameras and GPS sensors. Furthermore, the stream velocity and volume pose significant challenges to event processing systems. We propose dRTEC, an event recognition system that employs the Event Calculus formalism and operates in multiple processing threads. We evaluate dRTEC using two real-world applications and show that it is capable of real-time and scalable event recognition.

1 Introduction

Today's organisations need to act upon Big Data streams in order to support their resource management, capitalise on opportunities and detect threats. Towards this, event recognition systems have been particularly helpful, as they support the detection of complex events (CE)s of special significance, given streams of 'simple, derived events' (SDE)s arriving from various types of sensor [9]. A CE is a collection of events (SDEs and/or CEs) that satisfy a (spatio-)temporal pattern. In the maritime domain, for example, event recognition systems have been used to make sense of position streams emitted from thousands of vessels, in order to detect, in real-time, suspicious and illegal activity that may have dire effects on the maritime ecosystem and passenger safety [2].

In previous work, we developed the 'Event Calculus for Run-Time reasoning' (RTEC), a formal computational framework for event recognition [3]. RTEC is an Event Calculus dialect [18] that includes optimisation techniques supporting efficient event recognition. A form of caching stores the results of sub-computations in the computer memory to avoid unnecessary re-computations. A simple indexing mechanism makes RTEC robust to events that are irrelevant to the computations we want to perform. A set of interval manipulation constructs simplify event patterns and improve reasoning efficiency. Furthermore, a 'windowing' mechanism makes event recognition history-independent.

RTEC is a logic programming implementation of the Event Calculus. This way, event patterns have a formal, declarative semantics [3]. On the other hand, RTEC does not have built-in support for distributed processing. This is a significant limitation, as Big Data applications, such as maritime monitoring, require the processing of high velocity SDE streams. Moreover, the integration of RTEC into non-Prolog systems is not always straightforward, requiring workarounds that hinder performance.

To deal with the increasing velocity and volume demands of today's applications, we developed 'dRTEC', a distributed implementation of RTEC. dRTEC employs Spark Streaming (http://spark.apache.org/streaming/), an extension of the Apache Spark API that enables scalable, high-throughput and fault-tolerant stream processing. Reasoning in Spark Streaming may be performed exclusively in memory, where the input SDE stream is aggregated into a series of batch computations on small time intervals. dRTEC uses Spark Streaming's inherent support for distributed processing to take advantage of modern multi-core hardware for scalable event recognition.

The use of Spark Streaming additionally facilitates the integration of dRTEC, as an event recognition module, into existing (large-scale) stream processing systems. dRTEC has been evaluated in the context of two such systems. First, in the context of the SYNAISTHISI project (http://iot.synaisthisi.iit.demokritos.gr/), dRTEC is the human activity recognition module detecting 'long-term activities', such as fighting and leaving unattended objects, given 'short-term' activities detected on video frames by the underlying visual information processing components. In this application, we evaluated dRTEC using a benchmark activity recognition dataset. Second, in the datACRON project (http://www.datacron-project.eu/), dRTEC recognises suspicious and illegal vessel activities given a compressed vessel position stream produced by a trajectory processing module. To evaluate dRTEC on the maritime domain, we used a real position stream from over 6,000 vessels sailing through the Greek seas in the summer of 2009. The empirical analysis showed that dRTEC scales better than RTEC, both to increasing velocity SDE streams and larger numbers of CEs.

The remainder of the paper is organised as follows. In the following section we briefly review RTEC. Then, we introduce the key components of dRTEC. Section 4 presents our empirical analysis, while in Section 5 we summarise our approach, discuss related work and outline directions for further research.

2 Event Calculus for Run-Time Reasoning

dRTEC is a distributed implementation of RTEC (https://github.com/aartikis/RTEC), the 'Event Calculus for Run-Time reasoning' [3]. The time model of RTEC is linear, including integer time-points. Where F is a fluent—a property that is allowed to have different values at different points in time—the term F = V denotes that fluent F has value V. Table 1 presents the main RTEC predicates. Variables start with an upper-case letter, while predicates and constants start with a lower-case letter. The happensAt predicate defines the event instances, the initiatedAt and terminatedAt predicates express the effects of events, while the holdsAt and holdsFor predicates express the values of the fluents. holdsAt and holdsFor are defined in such a way that, for any fluent F, holdsAt(F = V, T) if and only if T belongs to one of the maximal intervals of I for which holdsFor(F = V, I).
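As an illustration of this representation, the fluent–value pairs and their lists of maximal intervals can be captured with a few container types. The following is a minimal Scala sketch; the names FluentValue, Interval and holdsAt below are our own and do not reflect the actual RTEC or dRTEC code:

    // A fluent-value pair, e.g. person(id1) = true.
    case class FluentValue(name: String, args: List[String], value: String)
    // A maximal interval [start, end); end = None encodes an open-ended interval.
    case class Interval(start: Long, end: Option[Long])
    // holdsFor associates each fluent-value pair with its list of maximal intervals.
    type MaximalIntervals = Map[FluentValue, List[Interval]]
    // holdsAt(F = V, T) holds iff T falls within one of the maximal intervals of F = V.
    def holdsAt(i: MaximalIntervals, fv: FluentValue, t: Long): Boolean =
      i.getOrElse(fv, Nil).exists(iv => t >= iv.start && iv.end.forall(t < _))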
We represent instantaneous SDEs and CEs by means of happensAt, while durative SDEs and CEs are represented as fluents. The majority of CEs are durative and thus, in CE recognition the task is to compute the maximal intervals for which a fluent representing a CE has a particular value continuously.

Table 1. RTEC Predicates.

Predicate | Meaning
happensAt(E, T) | Event E occurs at time T
initiatedAt(F = V, T) | At time T a period of time for which F = V is initiated
terminatedAt(F = V, T) | At time T a period of time for which F = V is terminated
holdsAt(F = V, T) | The value of fluent F is V at time T
holdsFor(F = V, I) | I is the list of the maximal intervals for which F = V holds continuously
union_all(L, I) | I is the list of maximal intervals produced by the union of the lists of maximal intervals of list L
intersect_all(L, I) | I is the list of maximal intervals produced by the intersection of the lists of maximal intervals of list L
relative_complement_all(I′, L, I) | I is the list of maximal intervals produced by the relative complement of the list of maximal intervals I′ with respect to every list of maximal intervals of list L

Fluents in RTEC are of two kinds: simple and statically determined. For a simple fluent F, F = V holds at a particular time-point T if F = V has been initiated by an event that has occurred at some time-point earlier than T, and has not been terminated in the meantime. This is an implementation of the law of inertia. To compute the intervals I for which F = V holds continuously, i.e. holdsFor(F = V, I), we compute all time-points Ts at which F = V is initiated, and then, for each Ts, we find the first time-point Tf after Ts at which F = V is terminated. Consider the following example from activity recognition:

initiatedAt(leaving_object(P, Obj) = true, T) ←
    happensAt(appear(Obj), T),
    holdsAt(inactive(Obj) = true, T),          (1)
    holdsAt(close(P, Obj) = true, T),
    holdsAt(person(P) = true, T)

terminatedAt(leaving_object(P, Obj) = true, T) ←
    happensAt(disappear(Obj), T)               (2)

The above rules are intended to capture the activity of leaving an object unattended. appear and disappear are instantaneous SDEs produced by the underlying computer vision algorithms. An entity 'appears' when it is first tracked. Similarly, an entity 'disappears' when it stops being tracked. An object carried by a person is not tracked—only the person that carries it is tracked. The object will be tracked, that is, it will 'appear', if and only if the person leaves it somewhere. inactive is a durative SDE. Objects (as opposed to persons) can exhibit only inactive activity. close(P, Obj) is a statically determined fluent indicating whether the distance between two entities P and Obj, tracked in the surveillance videos, is less than some threshold of pixel positions. person(P) is a simple fluent indicating whether there is sufficient information that entity P is a person as opposed to an object. According to rule (1), 'leaving object' is initiated when an inactive entity starts being tracked close to a person. Rule (2) dictates that 'leaving object' stops being recognised when the entity is no longer tracked. The maximal intervals during which leaving_object(P, Obj) = true holds continuously are computed using the built-in RTEC predicate holdsFor from rules (1) and (2).
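The initiation/termination pairing just described can be sketched in a few lines of Scala. This is only an illustration of the procedure, not the RTEC implementation (which is in Prolog); initiation and termination points are assumed to be given as sorted lists of time-points:

    // Pair each initiation point with the first later termination point, producing
    // the maximal intervals of a simple fluent-value pair. end = None means the
    // interval is still open (no termination has been found yet).
    def maximalIntervals(initiations: List[Long],
                         terminations: List[Long]): List[(Long, Option[Long])] =
      initiations.foldLeft(List.empty[(Long, Option[Long])]) { (acc, ts) =>
        val coveredAlready = acc.exists { case (s, e) => ts >= s && e.forall(ts < _) }
        if (coveredAlready) acc                        // re-initiation inside an existing interval
        else acc :+ (ts -> terminations.find(_ > ts))  // pair with the first later termination
      }

    // Example: initiations at 10 and 50 and a termination at 30
    // yield List((10, Some(30)), (50, None)).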
In addition to the domain-independent definition of holdsFor, RTEC supports application-dependent holdsFor rules, used to define the values of a fluent F in terms of the values of other fluents. Such a fluent F is called statically determined. holdsFor rules of this type make use of interval manipulation constructs—see the last three items of Table 1. Consider the following example:

holdsFor(greeting(P1, P2) = true, I) ←
    holdsFor(close(P1, P2) = true, I1),
    holdsFor(active(P1) = true, I2),
    holdsFor(inactive(P1) = true, I3),
    holdsFor(person(P1) = true, I4),
    intersect_all([I3, I4], I5),
    union_all([I2, I5], I6),                   (3)
    holdsFor(person(P2) = true, I7),
    holdsFor(running(P2) = true, I8),
    holdsFor(abrupt(P2) = true, I9),
    relative_complement_all(I7, [I8, I9], I10),
    intersect_all([I1, I6, I10], I)

In activity recognition, we are interested in detecting whether two people are greeting each other. A greeting distinguishes meetings from other, related types of interaction. Similar to inactive, active (mild body movement without changing location), running and abrupt are durative SDEs produced by the vision algorithms. According to rule (3), two tracked entities P1 and P2 are said to be greeting, if they are close to each other, P1 is active or an inactive person, and P2 is a person that is neither running nor moving abruptly.
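The interval manipulation constructs on which rule (3) relies are ordinary operations on lists of disjoint, sorted intervals. The following Scala sketch is our own illustration, assuming half-open [start, end) intervals; it is not the RTEC implementation. It shows the pairwise intersection, union and relative complement from which intersect_all, union_all and relative_complement_all can be built by folding over a list of interval lists:

    type Intervals = List[(Long, Long)]  // disjoint, sorted, half-open [start, end)

    // Points belonging to both interval lists.
    def intersect(a: Intervals, b: Intervals): Intervals =
      (for { (s1, e1) <- a; (s2, e2) <- b
             s = math.max(s1, s2); e = math.min(e1, e2) if s < e } yield (s, e)).sorted

    // Points belonging to at least one of the two interval lists.
    def union(a: Intervals, b: Intervals): Intervals =
      (a ++ b).sorted.foldLeft(List.empty[(Long, Long)]) {
        case ((ps, pe) :: rest, (s, e)) if s <= pe => (ps, math.max(pe, e)) :: rest
        case (acc, iv)                             => iv :: acc
      }.reverse

    // Points of a that do not belong to b (relative complement).
    def complement(a: Intervals, b: Intervals): Intervals =
      a.flatMap { case (s, e) =>
        val cuts   = b.filter { case (bs, be) => be > s && bs < e }
        val starts = s +: cuts.map(_._2)
        val ends   = cuts.map(_._1) :+ e
        starts.zip(ends).filter { case (x, y) => x < y }
      }

    // e.g. intersect_all([I1, I2, I3]) corresponds to List(i1, i2, i3).reduce(intersect).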
In this example, the window is larger than the step. To address this issue, dRTEC supports ‘dynamic’ grounding. At Hence, these SDEs are not lost but considered as part of CE recogni- each query time Qi , dRTEC scans the SDEs of the current window tion at Q138 . ω to construct the list of entities for which CE recognition should After ‘forgetting’ SDEs, RTEC computes and stores the intervals be performed. Then, it appends to this list all entities that have CE of CEs. At Qi , the CE intervals computed by RTEC are those that intervals overlapping Qi −ω. Such intervals may be extended or (par- can be derived from SDEs that occurred in the interval (Qi −ω, Qi ], tially) retracted, given the information that is available in the current as recorded at time Qi . RTEC adopts a caching technique where flu- window. In this manner, dRTEC avoids unnecessary calculations by ents are processed in a bottom-up manner; this way, the intervals of restricting attention to entities for which a CE may be recognised at the fluents that are required for the processing of a fluent of level the current query time. n will simply be fetched from the cache without the need for re- Indexing is used to convert the input SDEs into a key-value pair computation. More details about the reasoning engine of RTEC (in- format for data partitioning. The partitions are distributed among the cluding a complexity analysis), as well as its expressivity, may be available cores (processing threads) of the underlying hardware for found at [3]. parallel processing. Each SDE is indexed according to its entity. In activity recognition, for example, the index concerns the ids of the tracked entities (see ‘Dynamic Grounding & Indexing’ in Figure 2). 3 Distributed Event Calculus For each window, the SDEs concerning the same entity are grouped together and subsequently sent to the same processing thread. dRTEC is a distributed implementation of RTEC in Spark Stream- 3.2 Non-Relational Processing ing using the Scala programming language. Like RTEC, dRTEC per- forms CE recognition by means of continuous temporal projection, Indexing is followed by non-relational fluent processing performed at i.e. at each query time dRTEC computes the maximal intervals of each thread in parallel (see the ‘Non-Relational Processing’ boxes of fluents given an incoming SDE stream. Other tasks offered by other Figure 2). Non-relational processing refers to the computation of the Event Calculus implementations, such as abduction, are not sup- maximal intervals of fluents involving a single entity. (In the absence ported. In addition to the optimisation techniques of RTEC, such as of such fluents, dRTEC proceeds directly to ‘pairing’.) In activity windowing, dRTEC supports CE recognition using a structured set of recognition, for example, we want to determine whether a tracked operations for distributed reasoning. dRTEC follows a syntax-based, entity is a human or an object (see the rules presented in Section 2). application-independent approach to translate query processing into An entity is said to be a person if it has exhibited one of the ‘running’, distributed reasoning. Figure 2 illustrates the basic components of ‘active’, ‘walking’ or ‘abrupt movement’ short-term behaviours since the engine using the activity recognition application. dRTEC accepts it started being tracked. 
In other words, the classification of an en- SDE streams through MQTT8 , a lightweight publish-subscribe mes- tity as a person or an object depends only the short-term activities saging transport. Spark Streaming separates the incoming stream into of that entity. The distinction between non-relational and relational individual sets, called ‘micro-batches’. The window in dRTEC may processing allows us to trivially parallelise a significant part of the contain one or more micro-batches. Each micro-batch may contain CE recognition process (non-relational CE patterns). The processing events, expressed by happensAt, and fluents, expressed by holdsFor. threads are independent from one another, avoiding data transfers For example, according to the SDEs in the first micro-batch shown among them that are very costly. in Figure 2, the entity id0 started being tracked—‘appeared’—at Non-relational, as well as relational processing, concerns both time/video frame 80. Moreover, the entity id1 was running contin- statically determined and simple fluent processing, and windowing. uously in the interval [90,100). These tasks are discussed in Section 3.4. dRTEC performs various tasks on the incoming SDE streams. These are presented in the sections that follow. (We focus on the 3.3 Pairing & Relational Processing novel components of dRTEC, discussing only briefly the implemen- tation of the RTEC reasoning techniques in Spark Streaming.) The Relational processing concerns CE patterns that involve two or more CEs that are recognised using the incoming SDEs are streamed out entities. In activity recognition, we want to recognise whether two through MQTT (see ‘Output Stream’ in Figure 2). people are moving together or fighting. Prior to relational CE recog- nition, dRTEC produces all possible relations that may arise from the list of entities computed by the dynamic grounding process— 8 http://mqtt.org/ see ‘Pairing’ in Figure 2. Then, these relations are distributed to all 33 Input Stream Non-Relational Processing Pairing Micro-Batch M Processing Thread 1 happensAt, appear, id0, 80 Key Values Key Values holdsFor, running, id1, true, 90;100 happensAt, appear [80-81) id0 happensAt, appear [80-81) holdsFor, inactive, true [100-110) id0 (id0,id1) Micro-Batch M+1 holdsFor, inactive, true [100-110) holdsFor, running, true [90-100) holdsFor, abrupt, id1, true, 100;145 id1 holdsFor, abrupt, true [100-145) ... holdsFor, inactive, id0, true, 100;110 holdsFor, person, true [90-145) Non-Relational Processing holdsFor, running, true [90-100) Processing Thread N holdsFor, abrupt, true [100-145) id1 (id1,id0) holdsFor, person, true [90-145) Key Values Dynamic Grounding & holdsFor, running, true [90-100) happensAt, appear [80-81) id0 Indexing id1 holdsFor, abrupt, true [100-145) holdsFor, inactive, true [100-110) Key Values happensAt, appear [80-81) id0 holdsFor, inactive, true [100-110) Relational Processing Relational Processing holdsFor, running, true [90-100) Processing Thread 1 Processing Thread N id1 holdsFor, abrupt, true [100-145) Key Values Key Values ... 
3.3 Pairing & Relational Processing

Relational processing concerns CE patterns that involve two or more entities. In activity recognition, we want to recognise whether two people are moving together or fighting. Prior to relational CE recognition, dRTEC produces all possible relations that may arise from the list of entities computed by the dynamic grounding process—see 'Pairing' in Figure 2. Then, these relations are distributed to all available processing threads for parallel CE recognition. Note that, in contrast to non-relational processing, the information available to each processing thread is not disjoint. Assume, for example, that the pair (id0, id1) is processed by processing thread 1, while the pair (id1, id2) is processed by thread 2. Then both threads will have the output of non-relational processing concerning id1 (e.g. the list of maximal intervals during which id1 is said to be a person). However, there is no replication of computation, as the output of non-relational processing is cached, and the sets of relations of the processing threads are disjoint. Furthermore, similar to non-relational processing, each processing thread has all the necessary information, thus avoiding costly data transfers.
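The pairing step itself amounts to enumerating ordered pairs of grounded entities and attaching to each pair the cached non-relational results it needs. A minimal, self-contained Scala sketch (with assumed names, not the dRTEC code) is the following:

    // Cached output of non-relational processing: per entity, the maximal intervals
    // of its single-entity fluents (e.g. person = true), keyed by fluent name.
    type EntityIntervals = Map[String, List[(Long, Long)]]

    // Produce every ordered pair of distinct grounded entities; each pair carries the
    // cached intervals of both of its members, so the processing thread that receives
    // the pair needs no further data transfers.
    def pairing(entities: Set[String], cache: Map[String, EntityIntervals])
        : Seq[((String, String), (EntityIntervals, EntityIntervals))] =
      for {
        e1 <- entities.toSeq
        e2 <- entities.toSeq
        if e1 != e2
      } yield ((e1, e2), (cache.getOrElse(e1, Map.empty), cache.getOrElse(e2, Map.empty)))

    // Example: pairing(Set("id0", "id1"), cache) yields the relations (id0,id1) and
    // (id1,id0), which can then be distributed across the available processing threads.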
3.4 Fluent Processing

As mentioned earlier, both relational and non-relational processing concern the computation of the list of maximal intervals of fluents. For both types of fluent, simple and statically determined, dRTEC follows the reasoning algorithms of RTEC. For example, in the case of a simple fluent CEs, dRTEC checks, at each query time Qi, if there is a maximal interval of CEs that overlaps Qi−ω. If there is such an interval then it will be discarded, while its initiating point will be kept. Then, dRTEC computes the initiating points of CEs in (Qi−ω, Qi], and appends them to the initiating point (if any) prior to Qi−ω. If the list of initiating points is empty then the empty list of intervals is returned. Otherwise, dRTEC computes the terminating points of CEs in (Qi−ω, Qi], and pairs adjacent initiating and terminating points, as discussed in Section 2, to produce the maximal intervals.

Definitions 1 and 2 show, respectively, the initiating and terminating conditions of the 'leaving object' CE that were presented in Section 2 in the language of RTEC. Recall that 'leaving object' is a simple fluent. The G_I function (GET_INTERVAL, in full) retrieves the list of maximal intervals of a fluent. G_I has three parameters: (a) the 'collection occurrences', i.e. a map pointing to the list of maximal intervals of a fluent, (b) the list of entities/arguments of the fluent, and (c) the fluent object. dRTEC uses exclusively intervals in its patterns. The occurrence of an event (e.g. 'appear') is represented by an instantaneous interval. This way, in addition to statically determined fluents, the interval manipulation constructs can be used for specifying simple fluents. In dRTEC these constructs are supported by 'interval instances' (see e.g. the last line of Definition 1).

Definition 1. Initiation of leaving object in dRTEC.
    I1 ← G_I(occurrences, Obj, Fluent(happensAt, appear))
    I2 ← G_I(occurrences, Obj, Fluent(holdsFor, inactive, true))
    I3 ← G_I(occurrences, (P, Obj), Fluent(holdsFor, close, true))
    I4 ← G_I(occurrences, P, Fluent(holdsFor, person, true))
    I ← I1.INTERSECT_ALL(I2).INTERSECT_ALL(I3).INTERSECT_ALL(I4)

Definition 2. Termination of leaving object in dRTEC.
    I ← G_I(occurrences, Obj, Fluent(happensAt, disappear))

Statically determined fluents in dRTEC are specified in a similar manner. Definition 3, for example, shows the specification of 'greeting' (see rule (3) for the RTEC representation).

Definition 3. Statically determined fluent greeting in dRTEC.
    I1 ← G_I(occurrences, (P1, P2), Fluent(holdsFor, close, true))
    I2 ← G_I(occurrences, P1, Fluent(holdsFor, active, true))
    I3 ← G_I(occurrences, P1, Fluent(holdsFor, inactive, true))
    I4 ← G_I(occurrences, P1, Fluent(holdsFor, person, true))
    I5 ← I3.INTERSECT_ALL(I4)
    I6 ← I2.UNION_ALL(I5)
    I7 ← G_I(occurrences, P2, Fluent(holdsFor, person, true))
    I8 ← G_I(occurrences, P2, Fluent(holdsFor, running, true))
    I9 ← G_I(occurrences, P2, Fluent(holdsFor, abrupt, true))
    I10 ← I7.RELATIVE_COMPLEMENT_ALL(I8.UNION_ALL(I9))
    I ← I1.INTERSECT_ALL(I6).INTERSECT_ALL(I10)
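To illustrate how such patterns can be expressed against an interval-based engine, the following self-contained Scala sketch encodes Definition 2 and the core of Definition 1. The Intervals wrapper, the gI lookup and the occurrences map are our own assumptions that merely mimic the chained INTERSECT_ALL style; they are not the dRTEC API:

    // Assumed interval wrapper mimicking dRTEC's 'interval instances' (not the real API).
    case class Intervals(list: List[(Long, Long)]) {        // half-open [start, end)
      def intersectAll(other: Intervals): Intervals =
        Intervals(for {
          (s1, e1) <- list; (s2, e2) <- other.list
          s = math.max(s1, s2); e = math.min(e1, e2) if s < e
        } yield (s, e))
    }

    // A fluent-value pattern, e.g. Fluent("holdsFor", "close", "true").
    case class Fluent(pred: String, name: String, value: String = "")

    // The 'collection occurrences': maximal intervals per (entities, fluent) key.
    type Occurrences = Map[(List[String], Fluent), Intervals]
    def gI(occ: Occurrences, entities: List[String], f: Fluent): Intervals =
      occ.getOrElse((entities, f), Intervals(Nil))

    // Definition 2: termination of leaving_object(P, Obj).
    def leavingObjectTerm(occ: Occurrences, obj: String): Intervals =
      gI(occ, List(obj), Fluent("happensAt", "disappear"))

    // Definition 1 (core): initiation of leaving_object(P, Obj) as an intersection.
    def leavingObjectInit(occ: Occurrences, p: String, obj: String): Intervals =
      gI(occ, List(obj), Fluent("happensAt", "appear"))
        .intersectAll(gI(occ, List(obj), Fluent("holdsFor", "inactive", "true")))
        .intersectAll(gI(occ, List(p, obj), Fluent("holdsFor", "close", "true")))
        .intersectAll(gI(occ, List(p), Fluent("holdsFor", "person", "true")))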
4 Empirical Analysis

dRTEC has been evaluated in the context of two stream processing systems. In the system of the SYNAISTHISI project, dRTEC is the long-term activity recognition module operating on short-term activities detected on video frames. In the datACRON project, dRTEC recognises suspicious and illegal vessel activities given a compressed vessel position stream produced by a trajectory processing module.

The empirical analysis presented below was performed on a computer with dual Intel Xeon E5-2630 processors, amounting to 24 processing threads, and 256GB RAM, running Ubuntu 14.04 LTS 64-Bit with Linux kernel 3.13 and Java OpenJDK 1.8. dRTEC is implemented in Apache Spark Streaming 1.5.2 using Scala 2.11.7. The source code, including the CE patterns for both applications, is publicly available (https://github.com/blackeye42/dRTEC). dRTEC's warm-up period is excluded from the presented results. In all cases, dRTEC recognises the same CEs as RTEC.

4.1 Activity Recognition

The SYNAISTHISI project aims at developing customisable, distributed, low-cost security and surveillance solutions. To evaluate dRTEC, we used the CAVIAR benchmark dataset (http://groups.inf.ed.ac.uk/vision/CAVIAR/CAVIARDATA1), which consists of 28 surveillance videos of a public space. The CAVIAR videos show actors who are instructed to carry out several scenarios. Each video has been manually annotated by the CAVIAR team to provide the ground truth for activities which take place on individual video frames. These short-term activities are: entering and exiting the surveillance area, walking, running, moving abruptly, being active and being inactive. We view these activities as SDEs. The CAVIAR team has also annotated the videos with long-term activities: a person leaving an object unattended, people having a meeting, moving together, and fighting. These are the CEs that we want to recognise.

The CAVIAR dataset includes 10 tracked entities, i.e. 90 entity pairs (most CEs in this application concern a pair of entities), while the frame rate is 40 milliseconds (ms). On average, 179 SDEs are detected per second (sec). To stress test dRTEC, we constructed a larger dataset. Instead of reporting SDEs every 40 ms, the enlarged dataset provides data in every ms. The SDEs of video frame/time k of the original dataset are copied 39 times for each subsequent ms after time k. The resulting dataset has on average 3,474 SDEs per sec. Figures 3(a)–3(c) show the experimental results on this dataset. We varied the window size from 10 sec to 110 sec. The slide step Qi−Qi−1 was set to be equal to the size of the window. Figure 3(a) shows the average number of SDEs per window size. The 10 sec window corresponds to approximately 36K SDEs while the 110 sec one corresponds to 365K SDEs. Figure 3(a) also shows the number of recognised CEs; these range from 80 to 230.

[Figure 3. Activity recognition: (a) number of SDEs and CEs; (b) dRTEC vs RTEC; (c) dRTEC vs RTEC, 110 sec window; (d) number of SDEs and CEs; (e) dRTEC vs RTEC, 24 processing threads; (f) dRTEC vs RTEC, 110 sec window. Figures (a)–(c) (respectively (d)–(f)) concern the dataset with 10 (20) tracked entities.]

The average CE recognition times per window (in CPU seconds) for both dRTEC and RTEC are shown in Figure 3(b). dRTEC made use of all 24 processing threads. With the exception of the smallest window size, dRTEC outperforms RTEC. To allow for a fairer comparison, we invoked 24 instances of RTEC, each using in parallel one processing thread of the underlying hardware. Every RTEC instance was set to perform CE recognition for at most 4 entity pairs, and was provided only with the SDEs concerning the entities of these pairs (no load balancing was performed). In this setting, dRTEC outperforms RTEC for most window sizes, but only slightly.

Figure 3(c) shows the effect of increasing the number of available processing threads on the performance of dRTEC and RTEC. We varied the number of available threads from 2 to 24; the window size was set to 110 sec. RTEC achieves its best performance early—the increase of processing threads affects it only slightly. In contrast, dRTEC requires all 24 processing threads to match (slightly outperform) RTEC. The cost of data partitioning through dynamic grounding and indexing in dRTEC pays off only in the case of 24 threads.

To further stress test dRTEC, we constructed an even larger dataset by adding a copy of the previous dataset with new identifiers for the tracked entities. Thus, the resulting dataset contains a total of 20 tracked entities and 380 entity pairs, while approximately 7K SDEs take place per sec. Figures 3(d)–3(f) show the experimental results. We varied again the window size from 10 sec to 110 sec. In this case, however, the SDEs range from 72K to 730K (see Figure 3(d)). The number of recognised CEs is also much higher; it ranges from 390 to 1100.
Figure 3(e) shows the average CE recognition times per window when all 24 processing threads were available both to dRTEC and RTEC. Each RTEC instance was set to perform CE recognition for at most 16 entity pairs, having available only the SDEs concerning the entities of these pairs. Both dRTEC and RTEC remain real-time, even in the presence of 730K SDE windows. In this set of experiments, dRTEC outperforms RTEC in all window sizes, and the difference is more significant. This is an indication that dRTEC scales better to larger datasets. Figure 3(f) shows the effect of increasing the number of processing threads. We observe a similar pattern to that of the previous experiments (see Figure 3(c)).

4.2 Maritime Surveillance

The datACRON project aims to develop novel methods for detecting threats and abnormal activity in very large numbers of moving entities operating in large geographic areas. In the stream processing system of datACRON, dRTEC serves as the component recognising various types of suspicious and illegal vessel activity. We conducted experiments against a real position stream from the Automated Identification System (http://www.imo.org/OurWork/Safety/Navigation/Pages/AIS.aspx), spanning from 1 June 2009 to 31 August 2009, for 6,425 vessels sailing through the Aegean, the Ionian, and part of the Mediterranean Sea. This anonymised dataset (for privacy, each vessel id has been replaced by a sequence number) is publicly available at http://chorochronos.datastories.org/?q=content/imis-3months. The trajectory detection module of datACRON compresses the vessel position stream to a stream of critical movement events of the following types: 'low speed', 'speed change', 'gap', indicating communication gaps, 'turn', and 'stopped', indicating that a vessel has stopped in the open sea. Each such event includes the coordinates, speed and heading of the vessel at the time of critical event detection. This way, the SDE stream includes 15,884,253 events. Given this SDE stream, we recognise the following CEs: illegal shipping, suspicious vessel delay and vessel pursuit.

We varied the window size from 1 hour, including approximately 26K SDEs, to 24 hours, including 285K SDEs (see Figure 4(a)). The slide step Qi−Qi−1 is always equal to the window size. The number of recognised CEs ranges from 5K to 86K. In other words, the recognised CEs are almost two orders of magnitude more than the CEs in the activity recognition application.

[Figure 4. Event recognition for maritime surveillance: (a) number of SDEs and CEs; (b) dRTEC vs RTEC, 24 processing threads; (c) dRTEC vs RTEC, 24 hour window.]

Figure 4(b) shows the average CE recognition times per window when all processing threads were used by both implementations. Similar to the previous experiments, each RTEC instance was given only the SDEs of the vessels for which it performs CE recognition. Although RTEC matches the performance of dRTEC for small window sizes (1 hour and 2 hour windows), dRTEC scales much better to larger window sizes. In other words, dRTEC seems to perform much better in the presence of a large number of CEs. Figure 4(c) shows the effect of increasing the processing threads. Unlike the activity recognition application, dRTEC outperforms RTEC even when just a few processing threads are available. Similar to the activity recognition domain, dRTEC makes better use of the increasing number of threads.

5 Discussion

Several techniques have been proposed in the literature for complex event processing in Big Data applications, including pattern rewriting [26], rule distribution [25], data distribution [13, 4] and parallel publish-subscribe content matching [21]. See [15, 16] for two recent surveys. Moreover, Spark Streaming has been recently used for complex event processing (https://github.com/Stratio/Decision). The key difference between our work and these approaches is the use of the Event Calculus. dRTEC inherits from RTEC the ability to represent complex temporal phenomena, explicitly represent CE intervals and thus avoid the related logical problems [23], and perform reasoning over background knowledge. This is in contrast to other complex event recognition systems, such as [8, 19], the well-known SASE engine (http://sase.cs.umass.edu/) [27], and the Chronicle Recognition System [12].

Concerning the Event Calculus literature, dRTEC includes a windowing technique; in contrast, no other Event Calculus system 'forgets' or represents concisely the SDE history. Moreover, dRTEC employs a data partitioning technique using dynamic grounding and indexing. This way, dRTEC can take advantage of modern multi-core hardware. This is in contrast to Event Calculus approaches [7, 5, 24, 6, 22], where the implementations have no built-in support for distributed processing. For instance, our empirical evaluation verified that dRTEC scales better than RTEC, both to SDE streams of increasing velocity and larger numbers of CEs. Note that RTEC has proven efficient enough for a variety of real-world applications [3, 2], and already outperforms the well-known Esper engine (http://www.espertech.com/esper/) in a wide range of complex event recognition tasks [1].

Several event processing systems, such as [14, 11, 8, 10, 20], operate only under the assumption that SDEs are temporally sorted. On the contrary, dRTEC supports out-of-order SDE streams and may dynamically update the intervals of recognised CEs, or recognise new CEs, as a result of delayed SDE arrival.

The use of Spark Streaming in dRTEC facilitates the integration with other modules not implemented in Prolog, such as the computer vision modules of the SYNAISTHISI project and the trajectory compression module of datACRON. The integration of RTEC with such modules was often problematic, due to issues of the libraries integrating Prolog with other programming languages.
Moreover, dRTEC avoids the memory management issues of Prolog systems that arise from continuous query computations.

For further work, we are investigating the use of a streaming infrastructure that does not rely on micro-batching (e.g. Flink, https://flink.apache.org/). Furthermore, we aim to integrate (supervised) structure learning techniques for the automated construction of Event Calculus patterns [17].

ACKNOWLEDGEMENTS

This work was funded partly by the SYNAISTHISI project, which was co-financed by the European Fund for Regional Development and from Greek National funds, and partly by the EU-funded H2020 datACRON project. We would also like to thank Elias Alevizos for his help in the empirical analysis of dRTEC.

REFERENCES

[1] E. Alevizos and A. Artikis, 'Being logical or going with the flow? A comparison of complex event processing systems', in Proceedings of SETN, pp. 460–474, (2014).
[2] E. Alevizos, A. Artikis, K. Patroumpas, M. Vodas, Y. Theodoridis, and N. Pelekis, 'How not to drown in a sea of information: An event recognition approach', in IEEE International Conference on Big Data, pp. 984–990, (2015).
[3] A. Artikis, M. Sergot, and G. Paliouras, 'An event calculus for event recognition', IEEE Transactions on Knowledge and Data Engineering, 27(4), 895–908, (2015).
[4] C. Balkesen, N. Dindar, M. Wetter, and N. Tatbul, 'RIP: Run-based intra-query parallelism for scalable complex event processing', in Proceedings of DEBS, (2013).
[5] I. Cervesato and A. Montanari, 'A calculus of macro-events: Progress report', in Proceedings of TIME, pp. 47–58, (2000).
[6] F. Chesani, P. Mello, M. Montali, and P. Torroni, 'A logic-based, reactive calculus of events', Fundamenta Informaticae, 105(1-2), 135–161, (2010).
[7] L. Chittaro and A. Montanari, 'Efficient temporal reasoning in the cached event calculus', Computational Intelligence, 12(3), 359–382, (1996).
[8] G. Cugola and A. Margara, 'TESLA: a formally defined event specification language', in Proceedings of DEBS, pp. 50–61, (2010).
[9] G. Cugola and A. Margara, 'Processing flows of information: From data stream to complex event processing', ACM Computing Surveys, 44(3), 15:1–15:62, (2012).
[10] N. Dindar, P. M. Fischer, M. Soner, and N. Tatbul, 'Efficiently correlating complex events over live and archived data streams', in Proceedings of DEBS, pp. 243–254, (2011).
[11] L. Ding, S. Chen, E. A. Rundensteiner, J. Tatemura, W.-P. Hsiung, and K. Candan, 'Runtime semantic query optimization for event stream processing', in Proceedings of ICDE, pp. 676–685, (2008).
[12] C. Dousson and P. Le Maigat, 'Chronicle recognition improvement using temporal focusing and hierarchisation', in Proceedings of IJCAI, pp. 324–329, (2007).
[13] B. Gedik, S. Schneider, M. Hirzel, and K. Wu, 'Elastic scaling for data stream processing', IEEE Transactions on Parallel and Distributed Systems, 25(6), 1447–1463, (2014).
[14] D. Gyllstrom, E. Wu, H.-J. Chae, Y. Diao, P. Stahlberg, and G. Anderson, 'SASE: Complex event processing over streams', in Proceedings of the International Conference on Innovative Data Systems Research (CIDR), (2007).
[15] M. Hirzel, R. Soulé, S. Schneider, B. Gedik, and R. Grimm, 'A catalog of stream processing optimizations', ACM Computing Surveys, 46(4), 46:1–46:34, (2013).
[16] M. Hirzel, 'Partition and compose: parallel complex event processing', in Proceedings of ACM DEBS, pp. 191–200, (2012).
[17] N. Katzouris, A. Artikis, and G. Paliouras, 'Incremental learning of event definitions with inductive logic programming', Machine Learning, 100(2-3), 555–585, (2015).
[18] R. Kowalski and M. Sergot, 'A logic-based calculus of events', New Generation Computing, 4(1), 67–95, (1986).
[19] J. Krämer and B. Seeger, 'Semantics and implementation of continuous sliding window queries over data streams', ACM Transactions on Database Systems, 34(1), 1–49, (2009).
[20] M. Li, M. Mani, E. A. Rundensteiner, and T. Lin, 'Complex event pattern detection over streams with interval-based temporal semantics', in Proceedings of DEBS, pp. 291–302, (2011).
[21] A. Margara and G. Cugola, 'High-performance publish-subscribe matching using parallel hardware', IEEE Transactions on Parallel and Distributed Systems, 25(1), 126–135, (2014).
[22] M. Montali, F. M. Maggi, F. Chesani, P. Mello, and W. M. P. van der Aalst, 'Monitoring business constraints with the Event Calculus', ACM Transactions on Intelligent Systems and Technology, 5(1), (2014).
[23] A. Paschke, 'ECA-RuleML: An approach combining ECA rules with temporal interval-based KR event/action logics and transactional update logics', Technical Report 11, Technische Universität München, (2005).
[24] A. Paschke and M. Bichler, 'Knowledge representation concepts for automated SLA management', Decision Support Systems, 46(1), 187–205, (2008).
[25] B. Schilling, B. Koldehofe, and K. Rothermel, 'Efficient and distributed rule placement in heavy constraint-driven event systems', in Proceedings of IEEE HPCC, pp. 355–364, (2011).
[26] N. P. Schultz-Møller, M. Migliavacca, and P. Pietzuch, 'Distributed complex event processing with query rewriting', in Proceedings of DEBS, pp. 4:1–4:12, (2009).
[27] H. Zhang, Y. Diao, and N. Immerman, 'On complexity and optimization of expensive queries in complex event processing', in Proceedings of SIGMOD, pp. 217–228, (2014).