=Paper= {{Paper |id=None |storemode=property |title=Towards a Logic-Based Framework for Analyzing Stream Reasoning |pdfUrl=https://ceur-ws.org/Vol-1303/paper_3.pdf |volume=Vol-1303 |dblpUrl=https://dblp.org/rec/conf/semweb/Beck14 }} ==Towards a Logic-Based Framework for Analyzing Stream Reasoning== https://ceur-ws.org/Vol-1303/paper_3.pdf
               Towards a Logic-Based Framework for
                  Analyzing Stream Reasoning?

             Harald Beck, Minh Dao-Tran, Thomas Eiter, and Michael Fink

                 Institut für Informationssysteme, Technische Universität Wien
                           Favoritenstraße 9-11, A-1040 Vienna, Austria
                     {beck,dao,eiter,fink}@kr.tuwien.ac.at



        Abstract. The rise of smart applications has drawn interest to logical reason-
        ing over data streams. Recently, different query languages and stream process-
        ing/reasoning engines were proposed in different communities. However, due to
        a lack of theoretical foundations, the expressivity and semantics of these diverse
        approaches were given only informally. Towards clear specifications and means
        for analytic study, a formal framework is needed to define their semantics in pre-
        cise terms. To this end, we present ongoing work towards such a framework and
        show how a core fragment of the prominent Continuous Query Language can be
        captured.


1     Introduction
The emergence of sensors, networks, and mobile devices has generated a trend towards
pushing rather than pulling of data in information processing. In the setting of stream
processing [1] studied by the database community, input tuples dynamically arrive at
systems in form of possibly infinite streams. To deal with unboundedness of data, re-
spective systems typically apply window operators to obtain snapshots of recent data.
On these, the user runs continuous queries which are triggered either periodically or by
events, e.g., by the arrival of new input. A well-known stream processing language is
the Continuous Query Language (CQL) [2] which has an SQL-like syntax and a clear
operational semantics.
    Recently, the rise of smart applications such as smart cities, smart home, smart grid,
etc., has raised interest in the topic of stream reasoning [3], i.e., logical reasoning on
streaming data. Consider the following example from the public transport domain.
Example 1. To monitor a city’s public transportation, the city traffic center has a back-
ground data set regarding the assignment of trams to lines of the form line(ID, L),
where ID is the tram identifier and L the identifier of the line. The planned travelling
time (duration Z) between stops X and Y with line L is stored in a database by a
row plan(L, X, Y, Z). Old trams which are not convenient for travelling with wheel
chairs or baby strollers are classified with facts of the form old (ID). Moreover, sensor
data of the form tram(ID, X) reports the appearance of tram ID at stop X.
    On top of these background data and streaming tuples, one may provide smart ser-
vices that report the traffic status and suggest updates for travel routes in real time. For
?
    This research has been supported by the Austrian Science Fund (FWF) project P26471.
                                                  p1
      PLAN                LINE           `1
 L    X Y Z              ID L                                           p3
 `1   p1 p3 8            a1 `1
 `2   p2 p3 3            a2 `2                              p2
       ...                 ...           `2
                                                     (a) Public transportation map
  OLD
   a1                                                                             waiting
                                           tram(a1 , p1 )        tram(a2 , p2 )    time
   ...
                                                                                            t
                                                36                     40         43 44
                                                                 (b) Time line
                                   Fig. 1: Traffic scenario



instance, it is of interest whether the service is reliable, i.e., whether at a given stop
a tram appeared within the last 10 minutes. In addition, a user usually wants to know
the expected arrival time of the next tram. Moreover, consider the case that the user
travelling with a baby stroller has different options to change trams to reach a goal. She
prefers to have short waiting time, and to switch to trams that are convenient for the
baby stroller, which are not the old ones. Thus, we are interested in services that report
about: (s1 ) the reliability of a tram line at a stop, (s2 ) the expectation about tram arrival
times, and (s3 ) the quality of the connection between two lines at a given stop.
    Fig. 1a shows a simple map with two tram lines `1 and `2 . The background data
comprises three tables. PLAN states the planned travelling time between the stops,
LINE shows the assignments of trams to lines, and OLD identifies old trams. Consider
the scenario of Fig. 1b which depicts tram arrival. Let the time granularity be minutes,
the reading of predicate tram(a1 , p1 ) at t = 36 means that tram a1 arrived at stop p1 at
minute 36. Similarly, tram a2 arrived at stop p2 at minute 40. Based on this input stream
and the background data, services (s1 )–(s3 ) provide the following reports:
(s1 ) At minute 40, line `1 is reliable at stop p1 but not at p3 , because in the last 10
      minutes, a1 appeared at p1 but no other tram of line `1 appeared at p3 .
(s2 ) By the first row of PLAN, tram a1 is expected to arrive at p3 at minute 44. Then,
      by the second row, a2 should arrive at p3 one minute earlier, i.e., at minute 43.
(s3 ) The option to switch from line `2 to `1 at p3 satisfies the short waiting time re-
      quirement. However, since tram a1 is old, it is not a good connection.            

Different communities have contributed to different aspects of this topic. (i) The Se-
mantic Web community extends SPARQL to allow querying on streams of RDF triples.
Engines such as CQELS [4] and C-SPARQL [5] also follow the snapshot semantics ap-
proach of CQL. (ii) In Knowledge Representation and Reasoning (KRR), first attempts
towards expressive stream reasoning have been carried out by considering continuous
data in Answer Set Programming (ASP) [6, 7] or extending Datalog to sequential logic
programs [8]. However, the state of the art in either field has several shortcomings.
    Approaches in (i) face difficulties with extensions of the formalism to incorporate
the Closed World Assumption, non-monotonicity, or non-determinism. Such features
are important to deal with missing or incomplete data, which can, e.g., temporarily
happen due to unstable network connections or hardware failure. In this case, engines
like C-SPARQL and CQELS remain idle, while some output based on default reason-
ing might be useful. Moreover, e.g., in the use case of dynamic planning on live data,
multiple plans shall be generated based on previous choices and the availability of new
data. This is not possible with current deterministic approaches.
    On the other hand, advanced reasoning has extensively been investigated in (ii) but
traditionally only on static data. First attempts towards stream reasoning reveal many
problems to solve. The plain approach of [6] periodically calls the dlvhex solver [9]
without incremental reasoning and thus is not capable of handling heavy data load.
StreamLog [8] is an extension of Datalog towards stream reasoning. It always com-
putes a single model. OSMS [10] considers streams of ontologies. Both StreamLog and
OSMS do not consider window mechanisms. ETALIS [16] provides a rule-based lan-
guage for pattern matching over event streams with declarative monotonic semantics.
Windows are also not first-class citizens in ETALIS. Time-decaying logic programs [7]
are an attempt to implement time-based windows in reactive ASP [11] but its relation
to other stream processing/reasoning approaches has not yet been explored.
    Moreover, as observed in [12], conceptually identical queries may produce different
results in different engines. While such deviations may occur due to differences (i.e.,
flaws) in implementations of a common semantics, they might also arise from (correct
implementations of) different semantics. For a user it is important to know the exact
capabilities and the semantic behavior of a given approach. However, there is a lack
of theoretical underpinning or a formal framework for stream reasoning that allows
to capture different (intended) semantics in precise terms. Investigations of specific
languages, as well as comparisons between different approaches, are confined to ex-
perimental analysis [13], or informal examination on specific examples. A systematic
investigation, however, requires a formalism to rigorously describe the expressivity and
the properties of a language.
Contributions. In [14], we presented a first step towards a formal framework for stream
reasoning that provides a common ground to express concepts from different stream
processing/reasoning formalisms and engines. This allows for systematic analysis and
comparison between existing stream processing/reasoning semantics. The idea of cap-
turing related systems was informally discussed. This paper establishes the next step
towards these goals by (i) generalizing further the notion of window functions, (ii) ex-
tending the formalization by rules, and (iii) formally capturing a core language of CQL.
    We are interested in a stream reasoning semantics which is idealized in the sense
that no information needs to be dropped but can be abstracted away. In particular, we
allow to distinguish notions of truth of a formula at (i) specific time points, (ii) some
time point in a window, or (iii) all time points in a window. Second, we idealize with
respect to implementations and do not consider processing time, delays or outages in
the semantics itself.


2   Streams

In this section, we introduce streams and generalize the window functions of [14].
2.1   Streaming Data
We build upon mutually disjoint sets of predicates P and constants C. The set A of
atoms is defined as {p(c1 , . . . , cn ) | p ∈ P, c1 , . . . , cn ∈ C}. If i, j ∈ N, we call the
set [i, j] = {k ∈ N | i ≤ k ≤ j} an interval. We divide the set P into two disjoint sub-
sets, namely the extensional predicates PE and the intensional predicates PI . The for-
mer is used for input streams and background data, while the latter serves for intermedi-
ate and output streams. Additionally, we assume basic arithmetic operations (+, −, ×, ÷)
and comparisons (=, 6=, <, >, ≤, ≥) to be predefined by designated predicates B ⊆ PE .
For convenience, these predicates may be written in infix notation.
    We now present the central notion of streams.

Definition 1 (Stream). Let T be an interval and υ : N → 2A an evaluation function
such that υ(t) = ∅ for all t ∈ N \ T . Then, the pair S = (T, υ) is called a stream, T is
called the timeline of S and the elements of T are called time points.

Let S = (T, υ) and S 0 = (T 0 , υ 0 ) be two streams. We say S 0 is a substream or window
of S 0 , denoted S 0 ⊆ S, if T ⊆ T 0 and υ 0 (t0 ) ⊆ υ(t0 ) for all t0 ∈ T 0 . Moreover, S 0 is a
proper substream of S, denoted S 0 ⊂ S, if S 0 ⊆ S and S 0 6= S. The size #S of S is
defined by Σt∈T |υ(t)|. The restriction S|T 0 of S to T 0 ⊆ T , is the stream (T 0 , υ|T 0 ),
where υ|T 0 restricts the domain of υ to T 0 , i.e., υ 0 (t) = υ(t) for all t ∈ T 0 , else υ 0 (t) = ∅.
A stream is called data stream iff it contains only extensional predicates.

Example 2. The input for Example 1 can be modelled as a data stream D = (T, υ)
where T = [0, 50], υ(36) = {tram(a1 , p1 )}, υ(40) = {tram(a2 , p2 )}, and υ(t) = ∅
for all t ∈ T \ {36, 40}. The granularity of T is minutes. The evaluation function υ
can be equally represented as {36 7→ {tram(a1 , p1 )}, 40 7→ {tram(a2 , p2 )}}.    



2.2   Windows
An essential aspect of stream reasoning is to restrict the considered data to so-called
windows, i.e., recent substreams, in order to limit the amount of data and forget outdated
information. Traditionally [2], these windows take a fixed size ranging backwards from
query time. This size is determined in different ways. A time-based (or sliding) window
contains all tuples of a fixed amount of time, e.g., the last five seconds. A tuple-based (or
count-based) window, on the other hand, takes a fixed number of the most recent tuples,
regardless of when they arrived. A more generic variant of the latter is the partition-
based window, where a tuple-based window is applied on substreams obtained by the
input stream. This way, the recent values for a specified list of attributes can be selected.
    In [14] generalized notions of windows are presented. Based on a reference time
point t, window functions can also collect tuples that arrive after t. A time-based (resp.
tuple-based) window is then associated with fixed parameters ` and u that select the
previous ` and the next u time units (resp. number of tuples) relative to t. For partition-
based windows, another parameter is needed to specify how the input stream is split
into substreams.
    In order not to fix these parameters, but to be able to control the valid range of win-
dows within rules, we take a more generic approach in the present work. In addition to
a stream and a time point, window functions take a vector of further window parame-
ters x, which may contain constants (e.g. integers), time variables and functions.
Definition 2 (Window function). A window function wι of type ι takes as input a
stream S = (T, υ), a time point t ∈ T , called the reference time point, and a vector of
window parameters x for type ι and returns a substream S 0 of S.
Currently, the most common types of windows in practice are time-, tuple-, and partition-
based windows. We associate them with three respective window functions wτ , w# ,
and wp . Their difference in the window parameters x and window building can be in-
tuitively summarized as follows:
 – Time-based: x = (`, u, d), where `, u ∈ N ∪ {∞} and d ∈ N. The window func-
   tion wτ (S, t, x) returns the substream of S that contains all tuples of the last ` time
   units and the next u time units relative to a pivot time point t0 derived from t and the
   step size d. We use ` = ∞ (resp. u = ∞) to take all previous (resp. later) tuples.
 – Tuple-based: x = (`, u), where `, u ∈ N. The function w# (S, t, x) selects a sub-
   stream of S with the shortest interval [t` , tu ] ⊆ T as timeline, where t` ≤ t ≤ tu ,
   such that ` tuples are in [t` , t] and u tuples are in [t + 1, tu ]. Exactly `, resp. u tuples
   are returned. In case of multiple options due to multiple tuples at time points t` ,
   resp. tu , only tuples from there are removed at random.
 – Partition-based: x = (idx, n) where idx and n are two total functions
                            idx : A → I ⊂ N and n : I → N × N.
    Here, I is a non-empty, finite (index) set of integers. The application wp (S, t, x)
    first splits the input stream S = (T, υ) into |I| substreams Si = (T, υi ) by tak-
    ing υi (t) = {a ∈ υ(t) | idx(a) = i}. Then, a tuple-based window w# is applied on
    each respective Si with parameters taken from n(i) = (`i , ui ). The output streams
    after w# are then merged to produce the resulting window.
Here, we gave a slight generalization of window functions as presented (more formally)
in [14], using the general parameter vector x. This will be more useful when we discuss
dynamic window applications below.
    Due to space reason, we present only the adaptation of time-based windows for-
mally. The same idea can be applied to other types of windows straightforwardly.
Definition 3 (Time-based window). Let S = (T, υ) be a stream, T = [tmin , tmax ]
and t ∈ T . Moreover, let x = (`, u, d) where `, u ∈ N ∪ {∞} and d ∈ N s.t. d ≤ ` + u.
The time-based window with range (`, u) and step size d of S at time t is defined by
                                 wτ (S, t, x) = (T 0 , υ|T 0 ),
where T 0 =[t` , tu ], t` = max{tmin , t0 −`} with t0 = b dt c·d, and tu = min{t0 +u, tmax }.
Example 3 (cont’d). To express the monitoring over the stream D of Example 2, one
can use a time-based window function wτ (D, t, (10, 0, 1)) ranging over the past 10
minutes and step size of 1 minute. The result of applying this function at t = 40 is
 wτ (D, 40, (10, 0, 1)) = ([30, 40], {36 7→ {tram(a1 , p1 )}, 40 7→ {tram(a2 , p2 )}}).
To get the last tram appearance, we can use a tuple-based window w# (D, t, (1, 0)). For
example: w# (D, 40, (1, 0)) = ([40, 40], {40 7→ {tram(a2 , p2 )}}).                   
3     Logical Framework

We now introduce a logical framework for reasoning over streams. In addition to [14],
the present work allows for rule-based programs. For their evaluation, we will make
use of interpretation streams which augment data streams by intensional predicates.
The semantics of programs is then defined by an entailment relation for formulas. The
underlying logical language includes various operators to deal with reference to time.


3.1   Window operators

In what follows, we present a semantics for reasoning over streams, where we make
use of window operators x      ι,ch that work on two streams. This allows us to distin-
guish between a given stream S1 and the currently considered window S2 . Therefore,
before the application wι (S, t, x) of a window function, a stream choice function ch
determines a stream S based on two streams S1 and S2 . We use two simple stream
choices ch i (S1 , S2 ) = Si , where i ∈ {1, 2}.

Definition 4 (Window operator). Let wι be a window function of type ι, let ch be a
stream choice function and let x be a vector of window parameters for type ι. Then, x
                                                                                     ι,ch
denotes a window operator.
For more convenience, we omit writing ch2 . That is, by the window operator of form xι,
we mean x  ι,ch 2 . Moreover, we use special syntax for typical parameters x for tuple-
based windows, and for time-based windows with step size d = 1. For both types, we
write only ` for x, if u = 0, and +u, if ` = 0. Consider the following examples:
                 10,0,1                    0,5,1                      1,0
           10
            τ = τ,ch 2              +5
                                      τ = τ,ch 2              1# = #,ch 2



Here, all operators extract tuples from the second stream. The symbol 10 τ abbreviates
the time-based window operator that takes all tuples of the last 10 time points and +5
                                                                                      τ
takes all tuples of the next 5 time points. On the other hand, 1# takes the latest tuple
which arrived until the reference time point. The relationship between window func-
tions (Def. 2) and window operators (Def. 4) will be made precise in Def. 7 below.


3.2   Formulas

Syntax. In addition to the window operators x     ι,ch , we make use of further means to
refer to or abstract from time. Similarly as in modal logic, we will use operators 2 and 3
to test whether a tuple (atom) or a formula holds all the time, respectively sometime in
a window. Moreover, we use an exact operator @ to refer to specific time points.

Definition 5 (Formulas). The set F of formulas is defined by the grammar

           α ::= a | ¬α | α ∧ α | α ∨ α | α → α | 3α | 2α | @t α | x
                                                                    ι,ch α            (1)

where a is any atom in A, t ∈ N and x
                                     ι,ch is a window operator.
Intuitively, given a stream S ? and a considered window S (which initially is S ? ), each
formula α will be evaluated based on a reference time point t within S. An application
of a window operator x     ι,ch creates a new window S , which depends on S and S as
                                                          0                      ?

specified by the stream choice. Within the current window S, 3α (resp. 2α) holds, if α
holds at some time point (resp. at all time points) in S. Relative to t, the formula α
holds if α is true exactly at t, and @t0 α holds if t0 is in the timeline of S and α is true
exactly at time point t0 . That is, the operator @ allows to jump to a specific time point
within the window.
Semantics. In addition to streams, we consider background knowledge in form of a
static data set, i.e., a set of atoms which does not change over time. From a semantic
perspective, the difference to streams is that static data is always available, regardless
of window applications.
Definition 6 (Structure). Let S = (T, υ) be a stream, W be a set of window functions
and B ⊆ A a set of facts. Then, we call M = hT, υ, W, Bi a structure, S the interpre-
tation stream and B the data set or background data of M .
We now define when a formula holds in a structure.
Definition 7 (Entailment). Let M = hT ? , υ ? , W, Bi be a structure, S ? = (T ? , υ ? )
and let S = (T, υ) be a substream of S ? . Moreover, let t ∈ T . The entailment rela-
tion between (M, S, t) and formulas is defined as follows. Let a ∈ A be an atom, and
let α, β ∈ F be formulas. Then,
        M, S, t    a          iff   a ∈ υ(t) or a ∈ B,
        M, S, t    ¬α         iff   M, S, t 1 α,
        M, S, t    α∧β        iff   M, S, t α and M, S, t β,
        M, S, t    α∨β        iff   M, S, t α or M, S, t β,
        M, S, t    α→β        iff   M, S, t 1 α or M, S, t β,
        M, S, t    3α         iff   M, S, t0 α for some t0 ∈ T,
        M, S, t    2α         iff   M, S, t0 α for all t0 ∈ T,
        M, S, t    @t0 α      iff   M, S, t0 α and t0 ∈ T,
        M, S, t    xι,ch α   iff   M, S 0 , t α where S 0 = wι (ch(S ? , S), t, x).

If M, S, t    α holds, we say that (M, S, t) entails α.
Example 4 (cont’d). Let D = (T, υ) be the data stream of Ex. 3 and S ? = (T ? , υ ? ) ⊇ D
be a stream such that T ? = T and
                                                                               
          ?     36 7→ {tram(a1 , p1 )}, 40 7→ {tram(a2 , p2 ), rel (a1 , p1 )},
         υ =                                                                      .
                43 7→ {exp(a2 , p3 )}, 44 7→ {exp(a1 , p3 )}
Let M = hT ? , υ ? , W, Bi where W = {wτ , w# , wp } and let B be the background data
from Example 1. We see that M, S ? , 43 +5     τ 3tram(a2 , p3 ) holds. Indeed, the win-
dow operator +5τ    selects the substream S 0
                                               = (T 0 , υ 0 ) where T 0 = [43, 48], and
                  υ 0 = 43 7→ {exp(a2 , p3 )}, 44 7→ {exp(a1 , p3 )} .
                        

Next, because exp(a1 , p3 ) ∈ υ 0 (43) and 43 ∈ T 0 , it holds that M, S 0, 43   3exp(a2 , p3 ).
Similarly, we have M, S ? , 40 wτ10 3tram(a1 , p1 ) holds.                                    
3.3   Programs
Based on the entailment relation, we are now going to define a rule-based semantics.
Definition 8 (Rule, Program). A program is a set of rules, i.e., expressions of the form
                         α ← β1 , . . . , βj , not βj+1 , . . . , not βn ,               (2)
where α, β1 , . . . , βn ∈ F are formulas and α contains only intensional predicates.
Let D be a data stream based on which we want to evaluate a program P . Moreover,
let I=(T, υ) be a stream such that D ⊆ I. If for all time points in T , all predicates that
occur in I but not in D are intensional, then we call I an interpretation stream for D
and a structure M = hT, υ, W, Bi an interpretation (for D). We say that M satisfies P
at time t, denoted by M, t |= P , if (M, I, t) entails each rule r ∈ P viewed as classical
implication β(r) → α, where β(r) = β1 ∧ . . . ∧ βj ∧ ¬βj+1 ∧ . . . ∧ ¬βn . In this case,
we say M is a model (of P for D at time t). We call M a minimal model, if no model
M 0 = hT 0, υ 0, W, Bi of P for D at time t exists such that (T 0, υ 0 ) ⊂ (T, υ). The reduct
of a program P w.r.t. M at time t is defined by P M,t = {r ∈ P | M, t |= β(r)}.
Definition 9 (Answer). Let P be a program and D be a data stream. The interpreta-
tion M = hT, υ, W, Bi for D is called an answer of P for D at time t if M is a minimal
model of the reduct P M,t .
Note that the notion of an answer is relative to fixed functions W and static background
data B.
    Towards more conciseness, we consider schematic programs with variables of two
sorts, namely constant variables and time variables. The semantics of these non-ground
programs is given by the answers of according groundings (obtained by replacing vari-
ables with constants from C, respectively time points from T , in all possible ways).
Example 5. The requests (s1 )–(s3 ) from Example 1 can be formulated by rules (3)–(5).
             rel (L, X) ← line(ID, L), 10
                                        τ 3tram(ID, X).                                   (3)

        @T exp(ID, Y ) ← idx,n
                          p     @T1 tram(ID, X),
                              line(ID, L), plan(L, X, Y, Z), T = T1 + Z.                  (4)

      gc(ID 1 , ID 2 , X) ←   @T exp(ID 1 , X), @T +5
                                                    τ 3exp(ID 2 , X), not old (ID 2 ).    (5)
Rule (3) uses the time-based window operator that checks whether a tram of line L
appeared within the last 10 minutes to conclude whether the line is reliable. Rule (4)
calculates when a tram is expected at later stops. To get the last appearances of ev-
ery individual tram instance from the input stream, we use the partition-based window
operator idx,n
           p    , where the two functions idx and n are defined by
      idx(tram(ai , X)) = i and idx(a, X) = 0 for all a ∈ A \ {tram(ai , X)};
      n(i) = (1, 0) for i > 0 and n(0) = (0, 0).
Finally, rule (5) exploits the result from rule (4) to identify good connections where the
targeted tram is not old and the expected waiting time is at most 5 minutes. This rule
uses a time-based window that looks forward 5 mins from the time when exp(ID 1 , X)
is concluded and the operator 3 to check the existence of an expected tram instance.
One can check that the structure M in Example 4 is an answer of P .                      
4   Capturing CQL

The Continuous Query Language (CQL) [2] is an SQL based language for maintain-
ing continuous queries over streaming data. Such queries are suitable for real-time and
reactive programs. To handle input streams, CQL extends SQL with three sets of oper-
ators:

  (i) Stream-to-relation (S2R) operators apply window functions to the input stream to
      create a mapping from execution times to bags of valid tuples (w.r.t. the window)
      without timestamps. This mapping is called a relation.
 (ii) Relation-to-relation (R2R) operators allow for modification of relations similarly
      as in relational algebra, respectively SQL.
(iii) Relation-to-stream (R2S) operators convert relations to streams by directly asso-
      ciating the timestamp of the execution with each tuple (RStream). The other op-
      erators IStream/DStream, which report inserted/deleted tuples, are derived from
      RStream.

Example 6. Let PLAN(ID,X,Y,Z), LINE(ID,L), and OLD(ID) be three relations
that correspond to the background data in Example 1. Let TRAM(ID,X,T1) be an
input stream of tram appearances detected at respective tram stops. Compared to the
input stream S in Example 1, the schema of TRAM carries an additional field for explicit
timestamps of the input tuples. The following CQL queries serves the requests (s1 )
and (s2 ) in Example 1:

 q1 (rel ) = SELECT L, X FROM TRAM [RANGE 5], LINE
             WHERE TRAM.ID = LINE.ID
q2 (exp) = SELECT TRAM.ID, PLAN.Y, T2
           FROM TRAM [PARTITION BY ID ROWS 1], LINE, PLAN
           WHERE TRAM.ID = LINE.ID AND LINE.L = PLAN.L AND
                  TRAM.X = PLAN.X AND T2 = TRAM.T1 + PLAN.Z                           


We now present an approach for capturing CQL using the presented logic-based frame-
work. In this paper, we consider a core fragment of CQL without nested queries:
− The FROM clause is a sequence of only input streams with windows, or background
data table names. In that sense, only the keyword AND is used for connecting the ele-
ments of the FROM clauses. Furthermore, renaming the above input sources using the
keyword AS is also left out.
− The WHERE clause may contain only plain arithmetic operators and comparisons.
Nested expressions can always be rewritten into multiple such plain operators. For in-
stance, X, ≤, ≥} and op ∈ {+, −, ×, ÷}. Without loss of
           Element f in FROM clause                             ∆FROM (f )
           S                                                    S(X)
           S [RANGE L]                                           τ 3S(X)
                                                                L
           S [RANGE L SLIDE D]                                  L,0,D
                                                                 τ     3S(X)
           S [RANGE UNBOUNDED]                                   τ 3S(X)
                                                                ∞
           S [NOW]                                              S(X)
           S [ROWS N]                                            # 3S(X)
                                                                N
           S [PARTITION BY X1,X2,...,Xk ROWS N]                 idx,n
                                                                 p     3S(X)

                         Table 1: Translation for FROM clauses


generality, we assume for case (a) that only = is used for cmp when X1 and X2 are
identical, and for case (b) that all attribute names are different. This assumption is
to avoid going into technical details about renaming of attribute names to deal with
special cases such as comparison of the form P1.X < P2.X, or expression of the
form P1.X = P2.X + P3.X.
− Post-processing functionalities such as GROUP BY, ORDER BY are omitted.
    To capture the considered CQL core fragment, we introduce three translation func-
tions to cover the three types of operators (i)-(iii) above, with the following intuition:
  (i) For S2R operators, ∆FROM takes input streams and background data table names
      appearing in the FROM clause. For streams, ∆FROM translates them to rule bod-
      ies where window operators are applied on the input streams. For background
      data, ∆FROM simply generates a corresponding predicate for the rule body.
 (ii) For R2R operators, ∆WHERE takes the arithmetic operators and comparisons in
      the WHERE clause to create corresponding built-in or comparison predicates.
(iii) For R2S operators, ∆SELECT creates a rule head with a single atom whose predi-
      cate name is the query name. Its arguments are from the attributes specified in the
      SELECT clause.
More formally, assume that S is the name of an input stream or a background data ta-
ble whose schema is given by a vector X of parameters. Let Fq = f1 , . . . , fk be the
list of elements of the FROM clause of a CQL query q. Then, we obtain the translation
of Fq by ∆FROM (q) = {∆FROM (fi ) | fi ∈ Fq } as in Table 1. (We assume that idx mim-
ics the partitioning based on attributes X1 , . . . , Xk of X to a finite index set I ⊂ N
and n(i) = (N, 0) for all i ∈ I.)
Example 7 (cont’d). We have the following FROM clause translations for Example 6:

      ∆FROM (q1 ) = {5τ 3tram(ID, X, T1 ), line(ID, L)},
      ∆FROM (q2 ) = {idx,n
                       p    3tram(ID, X, T1 ), line(ID, L), plan(ID, X, Y, Z)}. 

For the WHERE clause of a query q, let WHq = wh 1 , . . . , wh ` be the list of conditions
of the form (a) or (b) in the WHERE clause of a query q. The translation ∆WHERE applied
to wh i , i ≤ 1 ≤ ` is as follows:
 – ∆WHERE (P1.X = P2.X) = ∅.
 – ∆WHERE (P1.X1 cmp P2.X2) = cmp(X1 , X2 ), wherethe attribute names X1
   and X2 are not identical.
 – ∆WHERE (P1.X1 = P2.X2 op P3.X3) = op(X2 , X3 , X1 ).
The translation of WHq is ∆WHERE (q) = {∆WHERE (wh j ) | wh j ∈ WHq }.
Example 8 (cont’d). Consider the WHERE clauses of Example 6. Its translations are
given by ∆WHERE (q1 ) = ∅ and ∆WHERE (q2 ) = {T2 = T1 + Z}.                     

Finally, the translation ∆SELECT of a query named q is ∆SELECT (q) = q(Y ), where Y is
the list of attribute names appearing in the SELECT clause. For SELECT *, the list of
parameters Y is the collection of all parameters from ∆FROM (q) ∪ ∆WHERE (q). Putting
all together, given a CQL query q, the function ∆ translates it into a rule of the form
                    ∆(q) = ∆SELECT (q) ← ∆FROM (q) ∪ ∆WHERE (q).                      (6)
The translation of a set Q = {q1 , . . . , qm } of CQL queries is ∆(Q) = {∆(q) | q ∈ Q}.
Example 9 (cont’d). For the queries from Example 6, we have the SELECT transla-
tions ∆SELECT (q1 ) = rel (L, X) and ∆SELECT (q2 ) = exp(ID, Y, T2 ). Thus, we get
          ∆(q1 ) =     rel (L, X) ← 10τ 3tram(ID, X, T1 ), line(ID, L).
          ∆(q2 ) = exp(ID, Y, T2 ) ← idx,n
                                       p    3tram(ID, X, T1 ), line(ID, L),
                                     plan(ID, X, Y, Z), T2 = T1 + Z.
We observe that these rules are not identical to rules (3) and (4) from Example 5 because
CQL needs to explicitly carry the timestamps of input tuples to perform operators such
as @, 3 after the application of windows.                                               

    The following proposition states that this translation faithfully captures the consid-
ered core fragment of CQL.1
Proposition 1. Let Q = {q1 , . . . , qm } be a set of CQL queries and P = ∆(Q). There
is a 1-1 correspondence between the results of Q and the answers of P .
Discussion. We now give hints on how to extend this translation to a larger fragment
of CQL. The most important operation is nesting queries. Capturing them is expected
to succeed as usual, i.e., by generalizing mappings of SQL statements to stratified pro-
grams [15] where subqueries provide lower strata for evaluating the query at a higher
stratum. Moreover, the combination of window operators with the operators 2 and 3,
preceded by the negation operator not, can already capture subqueries that check for
the non-existence of a value reading within a range defined by the window. Orthogonal
to this is aggregation in CQL queries. For this, we need to extend programs with ag-
gregations, which helps not only for capturing aggregations, but also for capturing the
post-processing statement GROUP BY. Another statement, ORDER BY, needs further
auxiliary rules for encoding the order explicitly, because in the answers of programs,
the evaluation function υ maps time points to sets of atoms which impose no order.
Thus, while technically involved, generalizing our result to capture CQL without note-
worthy restrictions seems feasible and is subject of our ongoing work.
 1
     A proof sketch is available in the extended version at http://www.kr.tuwien.ac.at/
     research/projects/dhsr/pub/2014/bdef2014-ordring-ext.pdf.
5    Conclusion
Towards a logic-based framework for analyzing stream reasoning, we extended the
work in [14] with a generalized notion of window functions and a rule-based language.
We demonstrated the capability of this framework by capturing a core fragment of CQL.
    Next steps include deeper study regarding capturing CQL fully as well as stream
processing/reasoning systems, and a systematic comparison between these systems. In
relation to complex event processing, considering time intervals is an interesting re-
search issue. To improve practicality (as a tool for formal and experimental analysis)
one might develop an operational characterization of the framework. In a longer per-
spective, along the same lines as [17], we aim at a formalism for stream reasoning in
distributed settings across heterogeneous nodes that have potentially different logical
capabilities.

References
 1. Babu, S., Widom, J.: Continuous queries over data streams. SIGMOD Record 3(30) (2001)
    109–120
 2. Arasu, A., Babu, S., Widom, J.: The CQL continuous query language: semantic foundations
    and query execution. VLDB J. 15(2) (2006) 121–142
 3. Valle, E.D., Ceri, S., van Harmelen, F., Fensel, D.: It’s a streaming world! reasoning upon
    rapidly changing information. IEEE Intelligent Systems 24 (2009) 83–89
 4. Phuoc, D.L., Dao-Tran, M., Parreira, J.X., Hauswirth, M.: A native and adaptive approach
    for unified processing of linked streams and linked data. In: ISWC (1). (2011) 370–388
 5. Barbieri, D.F., Braga, D., Ceri, S., Valle, E.D., Grossniklaus, M.: C-SPARQL: a continuous
    query language for rdf data streams. Int. J. Semantic Computing 4(1) (2010) 3–25
 6. Do, T.M., Loke, S.W., Liu, F.: Answer set programming for stream reasoning. In: AI. (2011)
    104–109
 7. Gebser, M., Grote, T., Kaminski, R., Obermeier, P., Sabuncu, O., Schaub, T.: Stream reason-
    ing with answer set programming. preliminary report. In: KR. (2012) 613–617
 8. Zaniolo, C.: Logical foundations of continuous query languages for data streams. In: Data-
    log. (2012) 177–189
 9. Eiter, T., Ianni, G., Schindlauer, R., Tompits, H.: dlvhex: A prover for semantic-web reason-
    ing under the answer-set semantics. In: Web Intelligence. (2006) 1073–1074
10. Ren, Y., Pan, J.Z.: Optimising ontology stream reasoning with truth maintenance system. In:
    CIKM. (2011) 831–836
11. Gebser, M., Kaminski, R., Kaufmann, B., Ostrowski, M., Schaub, T., Thiele, S.: Engineering
    an incremental ASP solver. In: ICLP. (2008) 190–205
12. Dindar, N., Tatbul, N., Miller, R.J., Haas, L.M., Botan, I.: Modeling the execution semantics
    of stream processing engines with secret. VLDB J. 22(4) (2013) 421–446
13. Phuoc, D.L., Dao-Tran, M., Pham, M.D., Boncz, P., Eiter, T., Fink, M.: Linked stream data
    processing engines: Facts and figures. In: ISWC - ET. (2012)
14. Beck, H., Dao-Tran, M., Eiter, T., Fink, M.: Towards Ideal Semantics for Analyzing Stream
    Reasoning. In: ReactKnow. (2014)
15. Ullman, J.D.: Principles of Database and Knowledge-Base Systems, Volume I. Computer
    Science Press (1988)
16. Anicic, D., Rudolph, S., Fodor, P., Stojanovic., N.: Stream reasoning and complex event
    processing in ETALIS. Semantic Web Journal (2012)
17. Brewka, G.: Towards reactive multi-context systems. In: LPNMR. (2013) 1–10