=Paper= {{Paper |id=Vol-1447/paper1 |storemode=property |title=Towards Enriching CQELS with Complex Event Processing and Path Navigation |pdfUrl=https://ceur-ws.org/Vol-1447/paper1.pdf |volume=Vol-1447 |dblpUrl=https://dblp.org/rec/conf/ki/Dao-TranP15 }} ==Towards Enriching CQELS with Complex Event Processing and Path Navigation== https://ceur-ws.org/Vol-1447/paper1.pdf

Towards Enriching CQELS with
Complex Event Processing and Path Navigation

Minh Dao-Tran1 and Danh Le-Phuoc2
1
Institute of Information Systems, Vienna University of Technology
Favoritenstraße 9-11, A-1040 Vienna, Austria
dao@kr.tuwien.ac.at
2
Insight Centre for Data Analytics, National University of Ireland, Galway
danh.lephuoc@nuigalway.ie

Abstract. The increasing popularity of RDF Stream Processing has led
to the developments of several RSP engines. Among these, CQELS has
been developed with a native and adaptive approach, which gives it a
performance advantage over other engines. However, it currently does
not support two important features, namely Complex Event Process-
ing and RDFS reasoning. We propose in this paper an extension of the
CQELS query language and semantics towards enriching CQELS with
these attractive functionalities.

Keywords: RDF Stream Processing, Complex Event Processing, Path
Navigation

1 Introduction
Following the trend of using RDF as a unified data model for integrating diverse
web data sources across systems under different controls, RDF Stream Processing
(RSP) extends RDF to addresses the challenges in querying heterogeneous data
streams coming from dynamic data sources of emerging ICT systems such as
IoT and Smart Cities. This research field has being increasingly getting more
attention with the developments of several RSP engines such as C-SPARQL [5],
CQELS [17], EP-SPARQL [3], SPARQLStream [7], and Sparkwave [13] which
mainly aim at providing stream processing functionalities.
Among these engines, CQELS engine was designed to achieve the strict per-
formance requirements of stream processing engines [20] with native physical
query operators. The current implementation of CQELS engine provides high
throughput physical operators to cover the combination of CQL operators and
SPARQL 1.1 with CQELS query language (CQELS-QL). It has been shown to
have a performance advantage over other RSP engines [18].
However, CQELS currently does not support two important features, namely
Complex Event Processing (CEP) and RDFS reasoning. The former play an im-
portant role in handling and analyzing complex relation over high volume of
dynamic data [1], while the latter allows reasoning about types and brings addi-
tional expressiveness for RSP. Having them in CQELS will enable a new range of
`2
Gulda Lane (g) Haydn Park. (h)

,m
h

h
y,

y,
y,

ay
la

la
la
Mozart C. (m) Center Sq. (c)

l
de

de
de

de
1,

1,
1,

1,
d

d
a

a
`1 Beethoven St. (b)
`3 10 12 14 16

Fig. 1: Public Transportation Scenario

applications that require not only high performance but also high expressiveness,
for example, as in the motivating scenario below.
Towards a more powerful CQELS, in this paper, we enrich the current query
language of CQELS to support the desired features. Our contributions are:

– extending the RDF stream processing model to work on time intervals in-
stead of time points;
– proposing an extension of the CQELS-QL to handle navigational paths and
complex event processing. For the former, we make use of nSPARQL [16]
and SPARQL 1.1 property paths; for the latter, we start the sequence oper-
ator SEQ, which is the most popular CEP operator in practice;
– giving a semantics of the new language in the SPARQL style, by lifting the
join and sequence-related operators to work on sets of mappings with inter-
vals, and introducing evaluation functions that make use of these operators.

The proposed new language features and semantics will be illustrated with the
following scenario.
Motivating Scenario. Suppose that the traffic center at city of Vienna wants
to improve the quality of public transportation by means of smart services. For
this purpose, the center has background datasets regarding available vehicles
such as subways, trams, buses, and connections between every pair of consecutive
stops, including the stop names, types of vehicles and the duration of time needed
to travel between them. Moreover, the center receives sensor data reporting the
arrival and delay of vehicles with respect to stops. Passengers who register to
the smart service also provide their locations as input streams.
On top of these background and streaming data, the center can offer smart
services (i) to aid traffic officers in monitoring and quickly identifying potential
problems in the transportation network or (ii) to notify passengers of potential
traffic delays and recommend alternative routes.

Example 1 To make sure smooth connections to the city center, a continuous
query can be placed to notify about recent repeated delays of subways follow-
ing by no arrival at stops that according to the network plan can be reached by
subways. When such information is instantly available to an officer, he/she can
immediately react by triggering rerouting or providing complementary vehicles
to solve the traffic.
Take a simplified public transportation map in Figure 1 where `1 and `2 are
subway lines, and `3 is a tram line. Assume that delays and arrivals of vehicles
wrt. stops are reported along the time lines. At time point 14, a repeated delay
for the subway d1 on the way to h is detected, but this is not reported as getting
from h to c needs to use the tram line `3 . At time point 16, the repeated delay for
the subway a1 at m is however reported as one can get from m to c by subways.

2 Preliminaries
This section briefly reviews the basic building blocks of this work, namely RDF,
SPARQL, RDF stream processing, the navigation language nSPARQL, and CEP.

2.1 RDF and SPARQL
RDF is a W3C recommendation for data interchange on the Web [8]. It models
data as directed labeled graphs whose nodes are resources and edges represent
relations among them. Each node can be a named resource (identified by an IRI),
an anonymous resource (a blank node), or a literal. We denote by I, B, L the sets
of IRIs, blank nodes, and literals, respectively. Let IB = I ∪ B, IBL = I ∪ B ∪ L.
A triple (s, p, o) ∈ IB × I × IBL) is an RDF triple, where s is the subject, p
the predicate, and o the object. An RDF graph is a set of RDF triples.
Example 2 (cont’d) The background dataset in Example 1 can be represented
by the following RDF graph:
 

 :conn1 :beg :m, :conn1 :end :b, :conn1 :means :subway, :conn1 :dur :3m,  
 


 :conn2 :beg :b, :conn2 :end :c, :conn2 :means :subway, :conn2 :dur :2m,  

 


 :conn 3 :beg :h, :conn3 :end :g, :conn 3 :means :subway, :conn 3 :dur :3m, 



 :conn4 :beg :g, :conn4 :end :c, :conn4 :means :tram, 
:conn4 :dur :5m, 
D= ..
.

 


 


 :a1 rdf:type :subway, 

 :d1 rdf:type :subway, 


.

 


 .. 


A triple pattern is a tuple (sp, pp, op) ∈ (IB ∪ V )×(I ∪ V )×(IBL ∪ V ), where V
is a set of variables. A basic graph pattern is a set of triple patterns.
SPARQL [12], a W3C recommendation for querying RDF graphs, is essen-
tially a graph-matching query language. A SPARQL query is of the form H ← B,
where B, the body of the query, is a complex RDF graph pattern composed of
basic graph patterns with different algebraic operators such as UNION, OPTIONAL,
etc.; and H, the head of the query, is an expression that indicates how to con-
struct the answer to the query [15].
The semantics of SPARQL is defined via mappings. A mapping is a partial
function µ : V → IBL. The result of a SELECT SPARQL query is a set of mappings
that match the query’s body, projected to the variables specified in the SELECT
clause. However, one-shot queries by themselves are not able to give answers
under dynamic input as in the running scenario. For this purpose, we need RDF
stream processing.
2.2 RDF Stream Processing

RDF Streams and Temporal RDF Graphs. In continuous query process-
ing over dynamic data, the temporal nature of the data is crucial and needs
to be captured in the data representation. This applies to both Linked Stream
Data and Linked Data, as updates in Linked Data collections are also possible.
In [17], RDF streams and instantaneous RDF datasets were defined by intro-
ducing timestamps (time points) to input triples of the former and RDF graphs
of the latter. In this paper, we adapt these notions by using time intervals as
timestamps. Thereby,
1. An RDF graph at timestamp t, denoted by G(t), is a set of RDF triples valid
at time t and called an instantaneous RDF graph. A temporal RDF graph is
a sequence G = [G(t)], t ∈ N, ordered by t.
2. An RDF stream S is a sequence of elements hg : [t1 , t2 ]i, where g is an RDF
graph and [t1 , t2 ] is a time interval.
Example 3 (cont’d) The sensor data regarding delay and arrival of vehicles
as in Figure 1 can be represented as the following RDF stream.3
h{(a1 , delay, m)}, [10, 10]i, h{(d1 , delay, h)}, [12, 12]i,
S=
h{(d1 , delay, h)}, [14, 14]i, h{(a1 , delay, m)}, [16, 16]i, . . .

Continuous Queries. Queries in CQELS are inspired by the Continuous Query
Language (CQL) [4], where a continuous query is composed from three classes
of operators, namely stream-to-relation (S2R), relation-to-relation (R2R), and
relation-to-stream (R2S) operators. In CQELS, the former are captured by ex-
tending SPARQL 1.1 grammar4 with a “stream graph” pattern, while the latter
are taken care of by SPARQL’s operators. For more details on the query syntax,
we refer the reader to [17].
Example 4 (cont’d) The following continuous query in the CQELS-QL noti-
fies stops with delays of subways during the last 10 minutes to users:
1 SELECT ?s
2 FROM ex : transportationMap
3 FROM NAMED WINDOW : W ON ex : publicTransport [ RANGE 10 m ]

4 WHERE {

5 WINDOW : W { ? v : delayAt ? s }
6 ? v rdf : type : subway .
7 }

3
Note that the notification of delay or arrival here is considered atomic event; there-
fore, the time intervals associated with this data is of the form [t, t]. On the other
hand, the output of a CQELS query containing non-atomic events (cf. Example 9)
can be fed as input stream to another query. In such situation, input events are
associated with time interval [t1 , t2 ] with t1 ≤ t2 .
4
http://www.w3.org/TR/sparql11-query/#grammar
[[self :: a]]G = {(a, a)}
[[next :: a]]G = {(x, y) | ∃z : (x, z, y) ∈ G}
[[axis −1 :: a]]G = {(x, y) | (y, x) ∈ [[axis :: a]]G }, where axis ∈ {next, node, edge}
..
.
Table 1: Formal semantics of nested regular expressions

2.3 Navigating RDF Graphs with nSPARQL
For a data model with graph structure like RDF, being able to navigate through
the graphs is fundamentally important. In [16], the authors proposed nSPARQL,
a language that incorporates navigational capabilities to a fragment of SPARQL
using nested regular expressions. nSPARQL allows to pose interesting and nat-
ural queries over RDF data, and has an attractive computational property that
nested regular expressions can be efficiently evaluated in polynomial time.
The new SPARQL 1.1 query language introduced property paths 5 that covers
a fragment of nSPARQL without nesting. In this paper, we augment CQELS-QL
with the full functionalities of nSPARQL based on the syntax of SPARQL 1.1.
We now briefly recall nSPARQL, the following grammar defines the syntax of
nested regular expressions:

exp ::= axis | axis :: a(a ∈ IBL) | axis :: [exp] | exp/exp | exp|exp | exp ∗ | exp + ,

where axis ∈ {self, next, next−1 , edge, edge−1 , node, node−1 }. The evaluation
of a nested regular expression exp in a graph G is formally defined as a binary
relation [[exp]]G , denoting the pairs of nodes (x, y) such that y is reachable from x
in G by following a path that conforms to exp. Table 1 partly shows the formal
semantics of nSPARQL on constructs that are used in our running example. For
the full semantics, we refer the reader to [16].
Example 5 (cont’d) To find all stops from which one can reach the city center
by subway connections, we can use the following expression:

?s (next−1 ::beg[next::means/self::subway]/next::end)+ :c.

Based on nSPARQL, we define an extended triple pattern as either a triple pat-
tern or a triple (sp, exp, op), where sp, op ∈ I ∪ L ∪ V and exp is a nested regular
expression. An extended graph pattern P is a set of extended triple patterns.

2.4 Complex Event Processing
Complex Event Processing [14] emerged from publish-subscribe systems [19].
While the latter refer only to single isolated events, CEP aims at timely detecting
5
http://www.w3.org/TR/sparql11-query/#propertypaths
high-level events as complex patterns of incoming single atomic events whose
order is crucial. The defined complex events can in turn be used to compose
even more complex events and so forth.
To express ordering between events, CEP languages assign time intervals
as timestamps for events and make use of operators rooted from Allen’s in-
terval algebra [2]. In this paper, we take the first step to incorporate CEP
into CQELS-QL by adopting the sequencing operator SEQ from the SASE sys-
tem [21] and applies it to time intervals. The version of SEQ with time points
in SASE can be briefly described as follows.
Let Ai be an event type. Its semantics represented as Ai (t), is that at a
given point t in time, Ai (t) is True if an Ai type event occurs at t, and is False
otherwise. The SEQ operator takes a list of n > 1 event types as its parameters,
e.g., SEQ(A1 , . . . , An ) and specifies a particular order in which the events of
interest should occur. Formally speaking:

SEQ(A1 , . . . , An ) ≡ ∃ t1 < t2 < . . . < tn : A1 (t1 ) ∧ A2 (t2 ) ∧ . . . ∧ An (tn ).

When an event type is negatively specified in the sequence, that is:

SEQ(A1 , . . . , Ai−1 , !Ai , Ai+1 , . . . , An ),

this corresponds to the SEQ WITHOUT operator (abbreviated as SEQ WO in
this paper), which intuitively detects sequences of events of types A1 , . . . , Ai−1 ,
Ai+1 , . . . , An where no event of type Ai occurs in between two events of types Ai−1
and Ai+1 . Formally:

SEQ(A1 , . . . , Ai−1 , !Ai , Ai+1 , . . . , An ) ≡
∃t1 < . . .