=Paper= {{Paper |id=None |storemode=property |title=Towards Efficient Semantically Enriched Complex Event Processing and Pattern Matching |pdfUrl=https://ceur-ws.org/Vol-1303/paper_2.pdf |volume=Vol-1303 |dblpUrl=https://dblp.org/rec/conf/semweb/Gillani14 }} ==Towards Efficient Semantically Enriched Complex Event Processing and Pattern Matching== https://ceur-ws.org/Vol-1303/paper_2.pdf
       Towards Efficient Semantically Enriched Complex
           Event Processing and Pattern Matching
                       (Position Paper)

    Syed Gillani12 , Gauthier Picard1 , Frédérique Laforest2 , and Antoine Zimmermann1
                      1
                       Institute Henri Fayol, EMSE, Saint Etienne, France
                            firstname.lastname@emse.fr,
             2
               Telecom Saint Etienne, Université Jean Monnet, Saint Etienne, France
                  frederique.laforest@telecom-st-etienne.fr



         Abstract. Management and recognition of event patterns is becoming thoroughly
         ingrained in many application areas of Semantically enabled Complex Event Pro-
         cessing (SCEP). However, the reliance of state-of-the-art technologies on rela-
         tional and RDF triple model without having the notion of time has severe lim-
         itations. This restricts the system to employ temporal reasoning at RDF level
         and use historical events to predict new situations. Additionally, the semantics of
         traditional query languages makes it quite challenging to implement distributed
         event processing. In our vision, SCEP needs to consider RDF as a first class cit-
         izen and should implement parallel and distributed processing to deal with large
         amount of data streams. In this paper, we discuss various challenges and limi-
         tations of state-of-the-art technologies and propose a possible solution to extend
         RDF data model for stream processing and pattern matching. Furthermore, we
         describe a high-level query design that enables efficient parallel and distributed
         pattern matching through query rewriting.


1     Introduction
Complex event processing (CEP) has proven to be the key technique for handling and
analysing complex relations over high volume of dynamic data [1]. An additional prop-
erty of data produced from diverse sources is the heterogeneity and potentially the large
number of distributed event sources. CEP systems are characterised by real-time re-
sponse and complex events are typically described by a set of temporally and spatially
distributed patterns [5]. The Relational database community has proposed various so-
lutions to capture the streaming nature of data [1, 2, 16]. But due to the limitations in
their underlying data model, the handling of heterogeneous data with variable seman-
tics and dynamic information inference is still an open issue. Thus, graph based data
models, such as RDF have emerged to be an accepted solution for such problem and
proved to be a vital component of the Semantic Web. RDF data model shows great
promise but lacks the notion of time. This constraints most of the existing Semantically
enabled Complex Event Processing (SCEP) systems (CQELS [11], C-SPARQL [6], EP-
SPARQL [4]) to base their roots on the work of relational CEP. Such models cut down
on the syntactic diversity and lack in customisable selection policies [12]. Additionally,
existing SCEP solutions commonly assume centralised availability and processing of
all relevant events, and thus incur significant overheads in distributed settings. Our vi-
sion is to extend the design of stream reasoning and pattern matching, while introducing
RDF as a first class citizen in modelling and reasoning on incoming streams.
     In this paper, we propose a new event data model for SCEP that allows temporal
reasoning at RDF level. We also propose the design of new query language that enables
parallel and distributed pattern matching, and stream reasoning by employing query
rewriting. The event data model underlying the system allows filters and predicates to
be expressed with respects to the events in specified patterns. The integration of event
model and query semantics allows the use of event detection graphs (EDG) and non-
deterministic automata (NFA) in the context of Semantic Web. In particular, Section
2 reports some challenges and limitations of traditional SCEP. Section 3 describes the
detailed event model and its advantages, Section 4 presents the semantics of our high-
level pattern matching query. Section 5 provides the details of patterns detection and
finally, Section 6 outlines some conclusive remarks.

2   Foundations
This section highlights some of the foundational challenges faced by the traditional
SCEP.
    Distributed Event Processing: The detection of complex event patterns is comput-
ing intensive and the system needs to scale in a concurrent distributed manner to detect
patterns by utilising a set of distributed machines or existing resources [7]. Tradition-
ally, streams processing for pattern matching is based on a centralised, push-based event
acquisition and processing model [2]. The limitations of such centralised model are two
folded: first, it inevitably limits SCEP scalability and performance to deal with high rate
event sources; second, it requires communicating all base events to the main process-
ing engine, while only a small fraction of all base events eventually make up complex
events. This motivates the design of a distributed SCEP system in order to utilise the
resources in an efficient way with optimised performance.
    Language for Pattern Queries: The ability to recognise complex patterns is critical
for various system settings. On one hand, most of the relational languages for event pro-
cessing systems are inspired from regular languages and therefore have automata based
semantics [1]. While on the other hand SCEP mostly relies on rule-based processing
due to the inability of reasoning capability for automata-based semantics. For instance,
ETAILS (EP-SPARQL) converts queries to prolog rules and evaluates them as a single
thread [12]. This limits the use of various operators including kleene closure, negation,
union/intersection of patterns. Thus, this requires a model that can borrow the useful
operators from relational CEP and embed them in a semantically enriched CEP.
    Predictive Event Processing: Most of the early SCEP research is primary focussed
on pattern matching techniques over real-time event streams. However, querying his-
torical events has fundamental importance in many SCEP applications and has so far
not received much attention [10]. The presence of stream archive enables a new class of
SCEP applications, which can correlate patterns matched over live and archived event
streams. This builds the foundation of forecasting, prediction of future events occur-
rence and identification of casual relationships among complex events across multiple
time scales. The analysis of historical patterns is a particularly challenging task. It in-
volves in processing of potentially large amount of historical data and requires efficient
query semantics, storage, retrieval and load shedding of time-annotated information.
Currently, no solution fully integrates stream processing and historical data manage-
ment [12].
    RDF Graph Based Streaming: Generally most of the existing SCEP assumed that
incoming data is delivered as individual RDF triples and implicitly ordered by arrival
time, or explicitly ordered by timestamps [12]. This triple based model suffers from
various limitations, as event objects generally consist of RDF graph (e.g., data enacting
from sensors), rather than individual RDF triples. When event objects are decomposed
into a list of RDF triples for streaming purpose, firstly, it becomes essential that the
boundaries of the event objects can be respected within queries, since a query that ob-
serves only a partial RDF graph may return false results. Secondly, RDF is structured as
graph. Thus, in order to distribute it into triples for the sake of allowing the consump-
tion in triples, the system has to shred the general joined structure of RDF. This results
in extra computation to rejoin the triples for graph manipulation. The use of RDF graph
for streaming purpose instead of individual triples would greatly simplify the system
design. It would enable the event producers to group the triples from the same source
with the same timestamps into one graph, while implementing fair boundaries for the
streaming queries to operate in.

3     Event and Stream Data Model
Our vision for SCEP is to use RDF as a first class citizen for continuous query process-
ing and pattern matching. The standard RDF model consists of sets of atomic statements
called triples (subject, predicate, object). In order to integrate temporal relatedness and
to deal with the problem of diachronic identity [9] (i.e., how do we logically account
for the fact that the “same” entity appears to be “different” at different times), it should
be extended to associate time as metadata. This timestamp could be a single temporal
identifier to represent the arrival time, but in order to assimilate facts, system requires
an interval or time range on which data item is valid. The temporal relations between
events can be captured through a set of operators, such as operators from Allens’s in-
terval algebra [3]. Hence, we put forward a new event and stream model, where each
incoming event is a temporally annotated RDF Named Graph and bounds a set of RDF
triples. Our data model is analogous to temporally annotated RDF [9] with the integra-
tion of Named Graphs. The serialisation of such a model can be achieved by relying on
existing proposals such as reification or TriG3 format.
    Definition: Temporally Annotated RDF Named Graph A Named Graph NG is a
pair (u, G), where u is an IRI and G is a RDF graph. A temporally annotated Named
graph is a tuple hNG, At i, Where NG is a Named Graph and At is a time interval to
describe the validity of Named Graph.
    Definition : Event An event E hNG, At i consists of a temporally annotated RDF
Named graph and temporal interval At = [t s , te ], where t s , te ∈ At is a time interval such
that x ∈ At iff x ≥ t s and x ≤ te .
    Definition : Event Stream An Event stream S is a totally ordered sequence of events
S = (E1 , E2 ..., En ). Each event instance Ei belongs to a particular event type ci ∈ , and
                                                                                        P

 3
     TriG (http://www.w3.org/TR/2014/REC-trig-20140225/)
                                                     = {ci } is the domain of all possible
                                                  P
has a unique annotated time interval At , where
event types.
     In this work we make the simplifying assumption that events in the sequence are in
strictly increasing time order, and there are no out-of-order events in the sequence.
     Definition : Complex Event A complex event is an event that result from the appli-
cation of a function that takes as input a sequence S of events, a set of query patterns Qi
of the form Qi = (p1 , p2 .., pn ) with pi ∈ being an event type (resulted from a certain
                                            P
sub-query, see section 4), and that outputs a set M = qi ∈Q M(qi , S ), where M(qi , S ) is
                                                        S
the set of all query matches of qi over S .

     Annotated RDF graph allows to capture the diachronic nature of events. It describes
temporally valid facts, where validity is indicated by specified time intervals. For in-
stance, if James was a CEO of IBM from June 2011 to July 2013 and later he was CEO
of Oracle from July 2013 to January 2014, then this generates two facts with their inter-
val and validities. There are numerous cases where streaming of valid facts is essential
for complex pattern matching. The status of these facts should be maintained by the
stream processing platform and later saved for archived query processing.
     In the context of event processing, the use of RDF Named Graphs has the following
advantages including data partitioning, stream data summarising, event body definition,
access control and provenance tracking.
     Data Partitioning: Efficiently storage and management of historical stream archives
on a single machine could be unrealistic, due to the large volume of data streams and
their heterogeneity. Traditionally, triple stores use hashing approaches to allocate triples
to different data stores. However, complex pattern queries usually involve multiple joins
and would result in heavy communication costs [15]. Hence, the use of logical parti-
tioning with RDF Named Graphs can realise a vertical partitioning of data using Named
Graph indexing. This makes it easier to identify and query subsets within an aggregation
of events.
     Summarising Stream Data: High volume and frequent sampling aspect of stream-
ing data propose a reasonable assumption, that two identical values that are reported
within some time threshold indicate unchanged state between these values. In our case,
summarisation equates the merging of two Named Graphs, where each represents an
observation. Summarisation of events can be achieved by merging two events into a
single one, whose validity spans the duration of both events. This will further decrease
the size of data stored and increase the query performance [14].
     Defining Event Boundary, Access control and Provenance Tracking: As dis-
cussed earlier, Named Graphs are essential to define the boundary to a set of triples
emanating from the same source with the same temporal information (e.g., sensor data
with values of current temperature, device ID, location). Further, Named Graphs aid
in tracking provenance and managing the access control of stream processing in a re-
stricted environment.

4   Query Model and Language Specification
Complex pattern queries cover a considerable amount of literature in the relational com-
munity. But in the context of the Semantic Web EP-SPARQL is considered to be the
 1 PREFIX sm:                                                  Sub-Query 1 (Event Pattern A)
                                                               1      Select *
 2 Select *                                                    2 From Stream S1 

 3 From Stream S1  Window From Now 10 hours                   4 (? event rdfs : subclassof owl : thing ;

 5 From Stream S2  Window From Now 2 mins                   7       sm : power ? pow ]. ) :[ T1030 , T2030 ]
 6 From Stream S3  Window From Now 10 mins                 10

 7 Where {                                                    11    }
 8 SEQ ( EVENTPATT A, (EVENTPATT B)+, (EVENTPATT C AND
                                                                                   Sub-Query 2 (Event Pattern B)
      EVENTPATT B))                                                1 Select *
 9                                                                 2 From Stream S3  Window From Now 10 mins
                                                                   3 Where {
        thing; sm:events[ sm:eventType sm:powersource; sm:         4 (? event rdfs : subclassof owl : thing ;

        id ?id; sm:power ?genpow]. GRAPH  {?id lv:name ?locName}        7      sm: temp ? temp ;
        FILTER( ?id = ’gen1’, ?pow = ’60’) }                       8      sm: pressure ? pres ].) : [T1030 , T1040 ]
                                                                   9
11
                                                               10       FILTER ( ?id = ’Wsource1 ’, ? temp = ’20 ’, ? pres =’10 ’)
12   DEFINE EVENTPATT B ON S2 { ?event rdfs:subclassof owl:    11       }
        thing; sm:events[ sm:eventType sm:weathersoruce; sm
                                                                                    Sub-Query 3 (Event Pattern C)
        :id ?id; sm:temp ?temp; sm:pressure ?pres]. FILTER(
         ?id = ’Wsource1’, ?temp = ’20’, ?pres=’10’) }              1 Select *
13                                                                  2 From Stream S2  Window From Now 5 hours
14   DEFINE EVENTPATT C ON S3 { ?event rdfs:subclassof owl:         3 Where {


        thing; sm:events[ sm:eventType sm:electricappliance         4 (? event rdfs : subclassof owl : thing ;

                                                                    5 sm : events [ sm : eventType   sm : storagesource ;
        ; sm:name ?name; sm:usagepower ?pow; sm:loadclass ?         6       sm: id ? id ;
        load]. FILTER( ?id = ’heater’, ?pow 80)
                                                                   11
16   }                                                             12   }


                        (a)                                                                           (b)


Fig. 1: Conceptual High-level Pattern Matching Query and Resulted Sub-queries after Query-
rewriting

only core language for such case. However, it suffers from various drawbacks including
reliance on triple-based data model, no support for multiple and heterogeneous event
sources, event source based filtering, use of black-box approach to delegate the process-
ing to other engines and overhead in query and data translation for underlying model
[12]. Our vision is to introduce a new core language for pattern matching. It covers
various limitations of previous approaches and includes constructs that are useful in
real-world applications. Our core language employs the event model described above
and is applicable to distributed event streams. It seeks for a series of events that occur in
required temporal order, while satisfying other constraints. Some of the constructs are
borrowed from SCEP [12], while many are new to this context. The newly introduced
constructs and overall structure of a pattern query are as follows.
     ::= SELECT | CONSTRUCT
     FROM STREAM   WINDOW