Stream-temporal Querying with Ontologies

Ralf Möller, Christian Neuenstadt, and Özgür L. Özçep
Institute of Information Systems (Ifis), University of Lübeck, Lübeck, Germany
{moeller,neuenstadt,oezcep}@ifis.uni-luebeck.de

Abstract. Recent years have seen theoretical and practical efforts on temporalizing and streamifying ontology-based data access (OBDA). This paper contributes to the practical efforts with a description and evaluation of a prototype implementation of the stream-temporal query language framework STARQL. STARQL serves the needs of industrially motivated scenarios, providing the same interface for querying historical data (reactive diagnostics) and for querying streamed data (continuous monitoring, predictive analytics). We show how to transform STARQL queries w.r.t. mappings into standard SQL queries, the difference between historical and continuous querying lying only in the use of a static window table vs. an incrementally updated one. Experiments with a STARQL prototype engine using the PostgreSQL DBMS show the implementability and feasibility of our approach.

Keywords: stream reasoning, OBDA, monitoring, temporal reasoning

1 Introduction

This paper contributes results to recent efforts to adapt the paradigm of ontology-based data access (OBDA) to scenarios with streaming data [12,6,22] as well as temporal data [5,4]. It extends our previous work [20,21,19]—started in the context of the FP7 EU project Optique and resulting in the query language framework STARQL (Streaming and Temporal ontology Access with a Reasoning-based Query Language)—with a proof-of-concept implementation based on PostgreSQL. STARQL serves the needs of industrially motivated scenarios, providing the same interface for querying historical data—as needed for reactive diagnostics—and for querying streamed data—as needed for continuous monitoring and predictive analytics in real-time scenarios.
Considering streams, the main challenge is that data cannot be processed as a whole. The simple but fundamental idea is to apply a (small) sliding window which is updated as new elements from the stream arrive at the query-answering system (see, e.g., [3]). The idea of previous approaches adapting OBDA to streams [12,6,22] is that answering continuous ontology-level queries on streams can be reduced to answering ontology-level queries on dynamically updated finite window contents, which are treated as single ABoxes. This approach can lead to unintended inconsistencies, as exemplified by industrial use cases such as that of Siemens, one of the industrial stakeholders in the Optique project. For example, multiple measurement values for a sensor collected at different time points lead to inconsistencies, since the value association should be functional. In STARQL, the idea of processing streams with a window operator is pushed further by defining a finite sequence of consistent ABoxes for each window.

Considering temporal reasoning for reactive diagnostics as needed for the Optique use case provided by Siemens [14], we found that a window-based approach leads to elegant solutions as well. A window is used to focus on some subset of the temporal data in a temporal query. Over time, different foci are relevant for reactive diagnosis, so changes of focus can be modelled by a window virtually "sliding" over the temporal data, although this sliding is not done in real time. Thus, STARQL is defined such that it can be used equally for temporal and stream reasoning. The semantics of STARQL does not distinguish between the real-time and the historical querying scenario. The ABox sequencing strategy for windows required to avoid inconsistencies, as argued above, makes rewriting and, more importantly, unfolding of STARQL queries a challenging task.
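The window-plus-sequencing idea can be sketched in a few lines of Python. This is a toy model for illustration only, not part of the STARQL implementation; ABox assertions are represented as plain tuples:

```python
from collections import defaultdict

def window_to_abox_sequence(stream, now, width):
    """Collect all timestamped assertions in [now - width, now] and group
    them into a sequence of ABoxes via standard sequencing: assertions
    with the same timestamp end up in the same ABox."""
    by_time = defaultdict(set)
    for assertion, t in stream:
        if now - width <= t <= now:
            by_time[t].add(assertion)
    # The sequence is ordered by timestamp; one state per timestamp.
    return [frozenset(by_time[t]) for t in sorted(by_time)]

# Toy stream: val(s0, v) assertions, one per second (cf. the example in Sect. 2)
stream = [(("val", "s0", v), t) for t, v in enumerate([90, 93, 94, 92, 93, 95])]

seq = window_to_abox_sequence(stream, now=5, width=2)
# The window [3s, 5s] contains three timestamps, hence three ABoxes
print(seq)
```
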
In particular, one may ask whether it is possible to rewrite and unfold a STARQL query into a single backend query formulated in the query language provided by the backend system. We show that such (Postgre)SQL transformations are indeed possible and describe them in this paper for the special case of one stream whose slide parameter is identical to the pulse parameter and a specific sequencing strategy (for the details see the following section).

2 The STARQL Framework

We recap the syntax and the semantics of STARQL with a simple example. For a complete formal treatment we refer the reader to [20,21]. We assume familiarity with the description logic DL-Lite [7].

For illustration purposes, our running example is a measurement scenario in which there is a (possibly virtual) stream SMsmt of ABox assertions. A stream of ABox assertions is an infinite set of timestamped ABox assertions of the form ax⟨t⟩. The timestamps stem from a flow of time (T, ≤) where T may even be a dense set and where ≤ is a linear order. The initial part of SMsmt up to time point 5s, called SMsmt≤5s here, contains timestamped ABox assertions giving the value of a temperature sensor s0 at 6 time points starting with 0s.

SMsmt≤5s = { val(s0, 90°)⟨0s⟩, val(s0, 93°)⟨1s⟩, val(s0, 94°)⟨2s⟩,
             val(s0, 92°)⟨3s⟩, val(s0, 93°)⟨4s⟩, val(s0, 95°)⟨5s⟩ }

Assume further that a static ABox contains knowledge about sensors telling, e.g., which sensor is of which type. In particular, let BurnerTipTempSens(s0) be in the static ABox. Moreover, let there be a pure DL-Lite TBox with additional information such as BurnerTipTempSens ⊑ TempSens, saying that all burner tip temperature sensors are temperature sensors. The Siemens engineer has the following information need: Starting with time point 00:00 on 1.1.2005, tell me every second those temperature sensors whose value grew monotonically in the last 2 seconds. A possible STARQL representation of this information need is illustrated in Listing 1.

PREFIX : <http://www.optique-project.eu/siemens>
CREATE STREAM S_out AS
CONSTRUCT GRAPH NOW { ?s rdf:type MonInc }
FROM STREAM S_Msmt [NOW-2s, NOW]->"1S"^^xsd:duration
WITH START = "2005-01-01T01:00:00CET"^^xsd:dateTime,
     END = "2005-01-01T02:00:00CET"^^xsd:dateTime
STATIC ABOX <http://www.optique-project.eu/siemens/ABoxstatic>,
TBOX <http://www.optique-project.eu/siemens/TBox>
USING PULSE WITH
  START = "2005-01-01T00:00:00CET"^^xsd:dateTime,
  FREQUENCY = "1S"^^xsd:duration
WHERE { ?s rdf:type :TempSens }
SEQUENCE BY StdSeq AS seq
HAVING FORALL ?i < ?j IN seq, ?x, ?y:
  IF (GRAPH ?i { ?s :val ?x } AND GRAPH ?j { ?s :val ?y }) THEN ?x <= ?y
Listing 1: Example query in STARQL

After the create expressions for the stream and the output frequency, the query's main content is captured by the CONSTRUCT expression. The construct expression describes the output format of the stream, using the named-graph notation of SPARQL to fix a basic graph pattern (BGP) and attaching a time expression, here NOW, for the evolving time. The actual result in the example (in DL notation) is a stream of ABox assertions of the form MonInc(s0)⟨t⟩.

Sout≤5s = { MonInc(s0)⟨0s⟩, MonInc(s0)⟨1s⟩, MonInc(s0)⟨2s⟩, MonInc(s0)⟨5s⟩ }

The WHERE clause binds variables w.r.t. the non-streaming sources (ABox, TBox) mentioned in the FROM clause, using (unions of) BGPs. We assume an underlying DL-Lite logic for the static ABox, the TBox and the BGPs (considered as unions of conjunctive queries, UCQs) which allows for concrete domain values, e.g., DL-LiteA [7]. In this example, instantiations of the sensor variable ?s are fixed w.r.t. a static ABox and a TBox. The semantics of the UCQs embedded into the WHERE and the HAVING clause is the certain answer semantics [7]. The heart of a STARQL query is the window operator in combination with sequencing.
The operator [NOW-2s, NOW]->"1S"^^xsd:duration used in Listing 1 describes a sliding window, which collects the timestamped ABox assertions of the last two seconds and slides 1s forward in time. Note the START and END specifications for the stream: these make sense only for temporal queries over streamed historical data (see Sect. 3.1).

Every temporal ABox produced by the window operator is converted into a sequence of (pure) ABoxes. The sequencing strategy determines how the timestamped assertions are grouped into ABoxes. The sequencing method used in the example is standard sequencing (StdSeq): assertions with the same timestamp come into the same ABox. So, in the example, the resulting sequence of ABoxes at time point 5s is trivial, as no two ABox assertions share the same timestamp: {val(s0, 92°)}⟨0⟩, {val(s0, 93°)}⟨1⟩, {val(s0, 95°)}⟨2⟩.

Now, at every time point, one has a sequence of ABoxes on which temporal (state-based) reasoning can be applied. This is realized in STARQL by a sorted first-order logic template into which state-stamped UCQ conditions are embedded. We use here again the GRAPH notation of SPARQL. In our example, the HAVING clause expresses a monotonicity condition stating that for all values ?x of sensor ?s w.r.t. the ith ABox (subgraph) and all values ?y of the same sensor ?s w.r.t. the jth ABox (subgraph), it must be the case that ?x is less than or equal to ?y.

The grammar for the HAVING clause (not shown here) exploits a safety mechanism. Without it, a HAVING clause such as ?y > 3, with the concrete domain variable ?y ranging freely over the reals, would be allowed: the set of bindings for ?y would be infinite. Moreover, the safety conditions guarantee that the evaluation of the HAVING clause on the ABox sequence depends only on the active domain and not on the whole domain, i.e., HAVING clauses are domain independent (d.i.) (see [1] for a definition of domain independence).
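Evaluated on an ABox sequence, the FORALL ?i < ?j condition of Listing 1 amounts to a pairwise check over all states. A minimal Python sketch (our own illustration with tuples standing in for ABox assertions, not the actual STARQL evaluator):

```python
from itertools import combinations

def having_monotone(abox_seq):
    """Check the HAVING clause of Listing 1 on one window's ABox sequence:
    for all states i < j and all values x, y with val(s, x) in state i and
    val(s, y) in state j, require x <= y. Returns the satisfying sensors."""
    sensors = {s for abox in abox_seq for (_, s, _) in abox}
    return {s for s in sensors
            if all(x <= y
                   for a_i, a_j in combinations(abox_seq, 2)  # all pairs i < j
                   for (_, si, x) in a_i if si == s
                   for (_, sj, y) in a_j if sj == s)}

# Sequence at time point 5s: one ABox per timestamp (standard sequencing)
seq_5s = [{("val", "s0", 92)}, {("val", "s0", 93)}, {("val", "s0", 95)}]
print(having_monotone(seq_5s))   # 92 <= 93 <= 95, so s0 qualifies

# At time point 3s the window holds 93, 94, 92, so the condition fails
seq_3s = [{("val", "s0", 93)}, {("val", "s0", 94)}, {("val", "s0", 92)}]
print(having_monotone(seq_3s))
```
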
Domain independence in turn guarantees that the HAVING clause can be smoothly transformed into queries of d.i. languages such as SQL or CQL [3]. For the details we refer again to [21].

3 OBDA Transformation of STARQL

As the HAVING clause language is d.i. (see [21]), STARQL can be used as an ontology query language in the OBDA paradigm: STARQL queries can be transformed into queries over data sources that are equipped with d.i. query languages. As (backend) data source candidates we consider any DBMS providing a declarative language such as SQL. This is not a limitation in comparison with those approaches (in particular our own [21,19]) that allow relational data stream management systems (DSMS) as data sources. In fact, the STARQL prototype in the Optique platform uses a stream extension of ADP [24], which provides a CQL-like [3] query language. For the transformation of this paper we bypass the additional abstraction of a DSMS by reconstructing the implementation of the relational window operators on top of incrementally updated window tables: the operators are the same as for an ordinary RDBMS, but the tables are dynamic.

Because our transformation does not rely on a window operator on the backend side but constructs the window contents within a window table, two different implementations become possible: 1. Preprocess the data in the backend in order to materialize the window table for the whole time interval fixed within the STARQL query. The abstract computation model for this implementation is combined rewriting: the given query is rewritten w.r.t. the TBox and the rewritten query is posed to a pre-processed ABox resulting from the original ABox by (partially) materializing TBox axioms (see [18,17,15]). 2. Generate the window contents on the fly—during query run time. The abstract computation model is that of classical OBDA, in which the query is rewritten w.r.t.
the mappings and issued to the original data—without any preprocessing of the data source. Our experiments below show that the second approach outperforms the first one. But the former approach is useful as a caching means for scenarios with multiple query processing where (many of) the queries use the same windows on the same streams.

Fig. 1: Processing pipeline for transforming a STARQL query into an SQL query. (The query string is parsed w.r.t. the STARQL grammar, normalized via normalization rules, translated into a datalog program via transformation and mapping rules, optimized via optimization rules, and finally turned into an SQL query and views via SQL transformation rules.)

As mentioned before, the aim of this paper is to give a proof-of-concept implementation of a stream-temporal engine. In particular, regarding the required OBDA transformations from the ontology layer to the backend sources, the engine is applicable to the following special case: there is only one stream, the slide is identical to the pulse, and the sequencing strategy is standard sequencing.

3.1 Temporal Reasoning by Mapping STARQL to SQL

For reactive diagnosis as investigated in Optique with the Siemens power plant use case, specific patterns, aka events, are to be found in previously recorded data. The data represents 1.) measurement values of sensors and 2.) turbines with assembly groups, mounted sensors and their properties. Reactive diagnosis deals with analyzing data in order to understand, e.g., reasons for an engine shutdown. If we consider again the information need discussed in Section 2, it becomes clear that in the Siemens use case, patterns to be detected are defined w.r.t. certain time windows in which relevant events take place (e.g., monotonic increase in a window of 10 minutes, say).
Thus, in this context, temporal queries are formulated using time windows in the same spirit as time windows are used for continuous queries in stream processing systems. The same query can be registered as a continuous query or as a historical query. For the latter, it should obviously be possible to specify a period of interest, i.e., a starting point and an end point for finding answers.

We consider the example query of Listing 1 to demonstrate the transformation of historical queries. Figure 1 shows how historical STARQL queries are processed internally. The prototype is implemented in Prolog, and the rule sets used as inputs in Fig. 1 are Prolog (DCG) rules. A query is parsed in order to produce parse tree data structures, which are then normalized. Normalization converts FORALL expressions in the HAVING clause into NOT EXISTS NOT expressions by pushing negation inside. The following listing shows the normalized HAVING part of our example query.

HAVING NOT EXISTS ?i < ?j IN seq, ?x, ?y:
  (GRAPH ?i { ?s val ?x } AND GRAPH ?j { ?s val ?y } AND ?x > ?y);

The normalized query is translated into a datalog program (see Figure 1) with fresh predicate names being generated automatically. The WHERE expression is rewritten and unfolded due to the axioms in the TBox given in the query and the mapping rules, respectively. Here we assume that sensors are given by an SQL view defined below. In the body of the first rule, datalog code for the WHERE clause is inserted (rule q0, see Listing 2). For every language element found in the parse tree (e.g., NOT EXISTS ...), we generate a new datalog rule. For every {?x rdf:type C} ({?x R ?y}) mentioned in the WHERE or HAVING clause we insert C(WID, X) (R(WID, X, Y)) into the datalog program. WID is an implicit parameter representing a so-called window ID.
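The normalization step can be mimicked with a small rewrite on an abstract syntax tree. The tuple encoding below is our own illustration, not the Prolog DCG rules of the prototype:

```python
# Negating a comparison atom flips its operator; other formulas get a NOT.
NEG = {"le": "gt", "lt": "ge", "ge": "lt", "gt": "le"}

def negate(f):
    if f[0] in NEG:
        return (NEG[f[0]],) + f[1:]
    return ("not", f)

def normalize(f):
    """FORALL vars: IF cond THEN then  ==>  NOT EXISTS vars: (cond AND neg(then)),
    with the inner negation pushed into the comparison atom."""
    if f[0] == "forall" and f[2][0] == "if":
        _, vars_, (_, cond, then) = f
        return ("not", ("exists", vars_, ("and", cond, negate(then))))
    return f

having = ("forall", ("?i<?j", "?x", "?y"),
          ("if", ("and", ("graph", "?i", "?s :val ?x"),
                         ("graph", "?j", "?s :val ?y")),
                 ("le", "?x", "?y")))
print(normalize(having))
```

Applied to the HAVING clause of Listing 1, this yields the NOT EXISTS form shown above, with the negated consequent ?x <= ?y turned into ?x > ?y.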
Correspondingly, for every GRAPH i {?x rdf:type C} (GRAPH i {?x R ?y}) mentioned in the HAVING clause we insert C(WID, X, i) (R(WID, X, Y, i)), with i representing the specific ABox in each window sequence. The datalog clauses q1 to q4 in Listing 2 are generated automatically from the HAVING clause of the query above.

q0(WID, S) :- sensor(WID, S).
q1(WID, S) :- q0(WID, S), not q2(WID, S).
q2(WID, S) :- seq(WID, I), seq(WID, J), q3(WID, I, J, S), I < J.
q3(WID, I, J, S) :- q4(WID, I, J, X, Y, S).
q4(WID, I, J, X, Y, S) :- val(WID, I, X, S), val(WID, J, Y, S), X > Y.
Listing 2: Generated datalog rules

The mapping specifications for the stream s_msmt and the relations val and sensor are predefined using datalog clauses and added to the datalog rules from a file. The additional mapping rules are shown in Listing 3.

val(WID, Index, Sensor, Value) :-
    window(WID, Index, Timestamp),
    measurement(Timestamp, Sensor, Value).
sensor(WID, Sensor) :- val(WID, _Index, Sensor, _Value).
seq(WID, Index) :- window(WID, Index, _Timestamp).
Listing 3: Mapping specifications using datalog clauses

During clause generation, SQL transformation rules are generated (see Figure 1). A transformation rule records the name of the SQL relation, the number of arguments, and the type of each parameter. SQL transformation rules are also statically specified for the relations val, window, and seq. For the CREATE STREAM and CONSTRUCT expressions in the query, a clause

s_out(T, S) :- q1(WID, S), window_range(WID, T).

is added to the datalog program.

The datalog program generated by the module Datalog Transformer (see Figure 1) is then optimized. Optimization eliminates wrapper clauses. An according optimization of Listing 2 is shown below.

q1(WID, S) :- sensor(WID, S), not q3(WID, S).
q3(WID, I, J, S) :- seq(WID, I), seq(WID, J), val(WID, I, X, S),
                    val(WID, J, Y, S), X > Y, I < J.

Here, the datalog rules q0 and q1 have been merged into q1, and the rules q2 to q4 from Listing 2 have been collapsed into q3 by eliminating wrapper clauses. Moreover, body atoms are removed if their bindings are already generated by other atoms. In our example we see that each seq atom is subsumed by the corresponding val atom (Listing 3). Using semantic query optimization [8,9], both seq atoms can therefore be eliminated from q3. As a consequence, the clause for seq is eliminated as well.

s_out(Right, S) :- q1(WID, S), window_range(WID, _Left, Right).
q1(WID, S) :- sensor(WID, S), not q3(WID, S).
q3(WID, S) :- val(WID, I, X, S), val(WID, J, Y, S), I < J, X > Y.
sensor(WID, Sensor) :- val(WID, _Index, Sensor, _Value).
val(WID, Index, Sensor, Value) :-
    window(WID, Index, Timestamp),
    measurement(Timestamp, Sensor, Value).
Listing 4: Optimized datalog rules

The datalog program is non-recursive and safe, so it can be translated to SQL as shown in Listing 5. The column names for the relations are given by the SQL transformation rules generated by the Datalog Transformer and by the mapping rules given as input to the processing pipeline (see Figure 1).

The translation to SQL relies on the tables window and window_range (Listing 6). These are based on the stream specification(s) in the query. Given start ($start$) and end ($end$) points for accessing the temporal data as well as a window size ($window_size$) and a window slide ($slide$), a representation of all possible windows (with specific time points for the window range) is computed, together with all indexes for the states that are built for each window by the specified sequencing method. For standard sequencing, window and window_range are generated using PostgreSQL functions such as generate_series.
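The effect of these window tables can be reproduced outside the database. The following Python sketch (integer timestamps for brevity, inclusive window bounds) builds window_range and window for the running example:

```python
def generate_series(start, end, step):
    """Integer stand-in for PostgreSQL's generate_series."""
    t = start
    while t <= end:
        yield t
        t += step

def build_window_tables(measurement, start, end, window_size, slide):
    """One row (wid, left, right) per window, plus per-window state indexes
    for the timestamps falling into that window (standard sequencing)."""
    window_range = [(wid, left, left + window_size)
                    for wid, left in enumerate(generate_series(start, end, slide))]
    window = [(wid, idx, ts)
              for wid, left, right in window_range
              for idx, ts in enumerate(
                  sorted({t for t, _, _ in measurement if left <= t <= right}),
                  start=1)]
    return window_range, window

measurement = [(t, "s0", v) for t, v in enumerate([90, 93, 94, 92, 93, 95])]
window_range, window = build_window_tables(measurement, start=0, end=3,
                                           window_size=2, slide=1)
print(window_range)
print(window)
```
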
3.2 Transformations for Continuous STARQL Querying

The transformation above applies to temporal queries on historical data stored in an RDBMS, so the window table generation is a one-step generation. In order to cope with streaming data, the transformation process has to be adapted slightly: the window table is now assumed to be incrementally updated by some function. Apart from that, the same transformation as for temporal queries can be applied to realize continuous querying with STARQL. In fact, the second implementation of the transformation that we tested does not materialize the whole window table, and so it can be directly adapted to DBs with dynamically updated entries. Similar ideas for continuous processing are used in TelegraphCQ [10], a DSMS built on top of PostgreSQL.

CREATE VIEW val AS
  SELECT rel1.WID, rel2.SID, rel2.VALUE, rel1.INDEX
  FROM window rel1, measurement rel2
  WHERE rel2.timestamp = rel1.timestamp;

CREATE VIEW sensor AS
  SELECT rel1.WID, rel1.SID
  FROM val rel1;

CREATE VIEW q3 AS
  SELECT rel1.WID, rel1.SID AS S
  FROM val rel1, val rel2
  WHERE rel2.WID = rel1.WID AND rel2.SID = rel1.SID AND
        rel1.INDEX < rel2.INDEX AND
        rel1.VALUE > rel2.VALUE;

CREATE VIEW q1 AS
  SELECT rel1.WID, rel1.SID AS S
  FROM sensor rel1
  WHERE NOT EXISTS (SELECT *
                    FROM q3 rel2
                    WHERE rel2.WID = rel1.WID AND rel2.S = rel1.SID);

CREATE VIEW s_out AS
  SELECT rel2.right, rel1.S AS SID
  FROM q1 rel1, window_range rel2
  WHERE rel1.WID = rel2.WID;
Listing 5: SQL transformation result

CREATE TABLE window_range AS
  SELECT row_number() OVER (ORDER BY x.timestamp) - 1 AS wid,
         x.timestamp AS left,
         x.timestamp + $window_size$ AS right
  FROM (SELECT generate_series($start$, $end$, $slide$) AS timestamp) x;

CREATE VIEW wid AS
  SELECT DISTINCT r.wid, mp.timestamp
  FROM measurement mp, window_range r
  WHERE mp.timestamp BETWEEN r.left AND r.right;

CREATE TABLE window AS
  SELECT wid, rank() OVER (PARTITION BY wid ORDER BY timestamp ASC) AS ind,
         timestamp
  FROM wid;
Listing 6: Window (range) tables

4 Evaluation

The system is evaluated along two example STARQL queries for reactive diagnosis in the Siemens use case. They are representatives for queries expected to demand processing times that are quadratic (Query1) and linear (Query2) in the window size. The engine transforms the queries w.r.t. predefined mappings into PostgreSQL queries.

The datasets that we use and describe in the following are part of the Siemens use case [23] in Optique. The original data processed/produced by Siemens appliances are sensor measurements, event data, operation logs, and other data stored in tables. These data are confidential, so Siemens provided a small public dataset and two larger anonymized datasets for use inside Optique. The public dataset (approximately 100 MB) has a simplified structure.

Dataset  Total Measurements  Timespan   Sensors  Measured Values per Day/Sensor  Total Data Size
Ds1      82,080              3 days     19       1440                            5 MB
Ds2      210,000             1506 days  3        46.5                            10 MB
Ds3      515,845,000         1824 days  204      1386                            23,000 MB
Table 1: The three data sets used

For the evaluation we used three datasets: Ds1, Ds2 and Ds3. Ds1 contains public data and has a size of approximately 5 MB. Ds2 contains anonymized data and has a size of approximately 10 MB, and Ds3 contains anonymized data with a size of 23 GB. The (simplified) schema of the normalized tables is as follows:

CREATE TABLE measurement (timestamp, sensor, value);
CREATE TABLE sensor (id, assemblypart, name);

Measurements are represented with a table measurement and consist of a timestamp, a reference to the sensor, and a value.
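To check that the translated views of Listing 5 compute the intended answers, one can replay them on a toy instance of this schema. The sketch below uses SQLite instead of PostgreSQL (window tables are filled in Python; the tables window, window_range and their columns left/right are renamed to avoid SQLite keywords), so it is a model of the transformation result, not the engine itself:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
CREATE TABLE measurement (timestamp INT, sensor TEXT, value REAL);
CREATE TABLE window_range (wid INT, left_ts INT, right_ts INT);
CREATE TABLE win (wid INT, ind INT, timestamp INT);
""")
# Running example: sensor s0, values 90,93,94,92,93,95 at seconds 0..5
cur.executemany("INSERT INTO measurement VALUES (?,?,?)",
                [(t, "s0", v) for t, v in enumerate([90, 93, 94, 92, 93, 95])])
# Precomputed window tables for [NOW-2s, NOW] -> 1s (cf. Listing 6)
for wid in range(6):
    left, right = wid - 2, wid
    cur.execute("INSERT INTO window_range VALUES (?,?,?)", (wid, left, right))
    states = [t for t in range(6) if left <= t <= right]
    cur.executemany("INSERT INTO win VALUES (?,?,?)",
                    [(wid, i, t) for i, t in enumerate(states, 1)])
# The views of Listing 5, modulo the renamings noted above
cur.executescript("""
CREATE VIEW val AS
  SELECT w.wid AS wid, m.sensor AS sid, m.value AS value, w.ind AS ind
  FROM win w, measurement m WHERE m.timestamp = w.timestamp;
CREATE VIEW sensor AS SELECT DISTINCT wid, sid FROM val;
CREATE VIEW q3 AS
  SELECT r1.wid AS wid, r1.sid AS s FROM val r1, val r2
  WHERE r2.wid = r1.wid AND r2.sid = r1.sid
    AND r1.ind < r2.ind AND r1.value > r2.value;
CREATE VIEW q1 AS
  SELECT r1.wid AS wid, r1.sid AS s FROM sensor r1
  WHERE NOT EXISTS (SELECT * FROM q3 r2
                    WHERE r2.wid = r1.wid AND r2.s = r1.sid);
CREATE VIEW s_out AS
  SELECT r2.right_ts AS t, r1.s AS sid
  FROM q1 r1, window_range r2 WHERE r1.wid = r2.wid;
""")
answers = sorted(t for t, _ in cur.execute("SELECT * FROM s_out"))
print(answers)
```

The answer times reproduce the output stream Sout≤5s of Section 2: MonInc(s0) holds at 0s, 1s, 2s and 5s.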
For our evaluation we are using one dataset of about 82,000 entries with timestamps ranging over 3 days (Ds1), another dataset with about 210,000 entries with timestamps ranging over 5 years (Ds2), and a dataset (Ds3) with more than 500 million entries over 5 years. All datasets contain data referring to a number of sensors. The datasets differ in the total number of recorded values and also in the number of measured values per timeframe. In Ds1 and Ds3 a value is measured every minute; in Ds2 a value is measured only every 30 minutes on average. So we expect the calculation of a single time window for Ds1 and Ds3 to be more expensive than the calculation for Ds2.

Query1 (Listing 7) builds, within each window, a sequence of all sensor values of the last 24 hours and checks whether a sensor increased monotonically. This query is expected to run quadratically slower as the window size increases, due to the comparisons of all value pairs (?x, ?y) for all pairs of states (i, j). Query2 (Listing 8) outputs, every minute, sensors that show a value higher than 90. This query is expected to be faster because of its simple window content with at most one timestamp. Both queries can be transformed to PostgreSQL; we show only the transformation of Query1 in Listing 9.

The tests were run in a VM on a system with an i7 2.8 GHz CPU and 16 GB of RAM. Mean values of several test runs with cold caches are shown in the following tables. For our tests we used two approaches corresponding to combined rewriting and classical rewriting, respectively (cf. Sect. 3), and for each of these Query1 and Query2. In the first approach we pre-calculated all time windows in one large table, where every window has a window id, and evaluated both queries once as-is and a second time with additional window index structures added to the table.

CREATE STREAM S_out1 AS
SELECT { ?sensor rdf:type :RecentMonInc } <NOW>
FROM burner_regulator [NOW - 24 hours, NOW] -> 24 hours
SEQUENCE BY StdSeq AS seq
HAVING FORALL i, j IN seq, ?x, ?y
  IF { ?sensor :hasVal ?x } AND { :Regulator :hasVal ?y } AND i < j
  THEN ?x <= ?y ELSE TRUE
Listing 7: STARQL query Query1

CREATE STREAM S_out2 AS
SELECT { ?sens rdf:type :tooHigh } <NOW>
FROM burner_3 [NOW, NOW] -> 1 minute
SEQUENCE BY StdSeq AS seq
HAVING FORALL i IN seq, ?x IF { ?sens :hasVal ?x } THEN ?x > 90
Listing 8: STARQL query Query2

CREATE OR REPLACE VIEW window_range AS
  SELECT row_number() OVER (ORDER BY x.timestamp) - 1 AS wid,
         x.timestamp AS left, x.timestamp + '1 hour'::interval AS right
  FROM (SELECT generate_series(MIN(mp.timestamp),
               MAX(mp.timestamp), '1 hour'::interval) AS timestamp
        FROM measurement mp) x;

CREATE OR REPLACE VIEW wid AS
  SELECT DISTINCT wid, timestamp
  FROM measurement mp, window_range w
  WHERE mp.timestamp >= w.left AND mp.timestamp < w.right;

CREATE VIEW win AS
  SELECT wid, rank() OVER (PARTITION BY wid ORDER BY timestamp ASC) AS ind,
         timestamp
  FROM wid;

CREATE VIEW val AS
  SELECT DISTINCT rel1.WID, rel2.SID, rel2.VALUE, rel1.ind
  FROM win rel1, measurement rel2
  WHERE rel2.timestamp = rel1.timestamp
  ORDER BY wid, ind;

CREATE VIEW sensors AS SELECT rel1.WID, rel1.SID FROM val rel1;

CREATE VIEW q3 AS
  SELECT rel1.WID, rel1.SID AS S
  FROM val rel1, val rel2
  WHERE rel2.WID = rel1.WID AND rel2.SID = rel1.SID AND
        rel1.ind < rel2.ind AND
        rel1.VALUE > rel2.VALUE;

CREATE VIEW q1 AS
  SELECT rel1.WID, rel1.SID AS S
  FROM sensors rel1
  WHERE NOT EXISTS (SELECT *
                    FROM q3 rel2
                    WHERE rel2.WID = rel1.WID AND rel2.S = rel1.SID);

CREATE VIEW s_out AS
  SELECT rel2.right, rel1.S AS SID
  FROM q1 rel1, window_range rel2
  WHERE rel1.WID = rel2.WID;

SELECT DISTINCT s.right AS timestamp, s.SID AS Subject,
       'rdf:type' AS Predicate, ':recentMonInc' FROM s_out s;
Listing 9: Query1 in PostgreSQL

         Query1, 1 hour window       Query1, 1 day window        Query2, 1 minute window
     precalc  no index  indexed   precalc  no index  indexed   precalc  no index  indexed
Ds1  1.5s     2s        1s        2s       30s       26s       2s       2s        2s
Ds2  5s       4s        2.8s      5s       17s       8s        20s      8s        7s
Ds3  N/A      N/A       N/A       N/A      N/A       N/A       N/A      N/A       N/A
Table 2: Times for precalculated window tables

     Query1, 1 hour window  Query1, 1 day window  Query2, 1 minute window
Ds1  2s                     45s                   12s
Ds2  40s                    26s                   1500s
Ds3  3942s                  92637s                7123s
Table 3: Query times for dynamically stream-generated windows

Results for the first approach are shown in Table 2. For every query there is a column with times in seconds for the non-indexed and for the indexed table. The non-indexed times consist of generating the table and evaluating the query; in the indexed version we added a B-tree index. The precalculation column shows the additional time for generating the window table, which in every case has to be added to the following two evaluation columns for non-indexed and indexed data.

There are different trade-offs for the two queries. For Query1 the total evaluation time increases as the window size increases, while the precalculation time stays the same. The evaluation time for Ds1 increases faster, as it has more measured values per window than Ds2, which only has about one measured value per 30 minutes. Comparing the times for indexed and non-indexed data, one sees that the index pays off the more individual windows are used per query. On Ds1 we use only three windows and can decrease the time from 30 to 26 seconds. On Ds2 we have about 1800 windows and can decrease the time from 17 to 8 seconds, which is more than 50 percent. Query2 produces a lot of small windows, so the evaluation time for each window is very short.
The precalculation time for Ds2 increases a lot, as there are about 2 million potential single windows in the window table. With a timeframe of only three days, the precalculation time for Ds1 stays small. The index has nearly no influence for Query2, as each window has only one tuple entry and all windows are evaluated once. Ds3 could not be evaluated with the precalculation step, as it requires more than 50 GB of additional memory for the window table. Therefore, we implemented a second approach by means of additional PL/pgSQL functions. The idea is to generate each window dynamically, evaluate it, and discard it afterwards.

Results for the second approach are shown in Table 3. As each window is generated dynamically, there is no precalculation. On the other hand, no indexes can be added to a materialized table. The main disadvantage is that windows without values cannot be filtered out in advance. As there are potentially 2 million windows for Query2 on Ds2, the system also tries to generate the empty windows, which increases the time a lot. Nevertheless, the problem of the additionally required space is solved, and even the complete 23 GB of Ds3 can be evaluated.

5 Related Work

Much of the relevant work on stream processing has been done in the context of DSMS [3,10,13,16].

First steps towards streamified OBDA are stream extensions of SPARQL with a window operator having a multi-bag semantics where timestamps are dropped [12,6,22]. These approaches do not run into the potential inconsistency w.r.t. functional constraints on sensor values mentioned in the introduction, as they handle timestamps by reification, for example by talking about measurements. Reification, however, may lead to bulky representations of simple facts and hinders expressing simple functionality constraints (as mentioned above) in OWL.

The temporal OBDA approach of [5] uses an LTL-based language with embedded CQs and not a sorted FOL language.
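The second approach (generate a window, evaluate it, discard it) can be outlined as a streaming loop. The following Python model is our own hypothetical sketch, not the PL/pgSQL functions of the prototype:

```python
def evaluate_stream(measurements, width, pulse_times, check):
    """Per pulse time, build the window content on the fly, evaluate the
    HAVING condition per sensor, and drop the content again; no window
    table is ever materialized."""
    for now in pulse_times:
        content = [m for m in measurements if now - width <= m[0] <= now]
        if not content:   # unlike this sketch, the real system cannot
            continue      # skip empty windows in advance (cf. Query2 on Ds2)
        for s in {sensor for _, sensor, _ in content}:
            values = [v for t, sn, v in sorted(content) if sn == s]
            if check(values):
                yield (now, s)

monotone = lambda vs: all(a <= b for a, b in zip(vs, vs[1:]))
ms = [(t, "s0", v) for t, v in enumerate([90, 93, 94, 92, 93, 95])]
print(list(evaluate_stream(ms, width=2, pulse_times=range(6), check=monotone)))
```

On the running example of Section 2 this reproduces the answers of the monotonicity query at time points 0s, 1s, 2s and 5s.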
For engineering applications with information needs as in the monotonicity example, LTL is not sufficient, as it does not provide existential quantification on top of the embedded CQs. Though not directly related to OBDA, other relevant work stems from the field of complex event processing. For example, EP-SPARQL/ETALIS [2] also uses a sequencing constructor, and T-REX with the event specification language TESLA [11] uses an FOL language for identifying patterns.

6 Conclusion and Outlook

The main objective in designing a query language intended for use in industry is to find the right balance between expressivity and feasibility. OBDA goes for weak expressivity and high feasibility by choosing rewriting and unfolding as the methodology for query answering. But even in OBDA, feasibility is not a feature one gets for free; rather, it has to be achieved with optimizations. So, for engines implemented according to the OBDA paradigm one has to show that such transformations are theoretically possible and practically feasible.

In this paper we argued that STARQL provides an adequate solution for streamified and temporalized OBDA scenarios such as that of Siemens because: 1. It offers a semantics that allows a unified treatment of querying temporal and streaming data. 2. It combines high expressivity with safeness to guarantee a smooth transformation into standard domain-independent backend queries such as SQL. 3. It can be implemented in an engine that realizes the transformations such that acceptable query execution times are achievable, if run with the optimizations mentioned here.

Future work includes, amongst others, the following: 1. extensive (scalability) tests with known benchmarks for stream processing, 2. generalization of the transformation to multiple streams where the slide parameters are not equal to the pulse parameter and where non-standard sequencing strategies are used, and 3. extensive comparison with other approaches, in particular CEP approaches.
References

1. Abiteboul, S., Hull, R., Vianu, V.: Foundations of Databases. Addison-Wesley (1995)
2. Anicic, D., Rudolph, S., Fodor, P., Stojanovic, N.: Stream reasoning and complex event processing in ETALIS. Semantic Web 3(4), 397–407 (2012)
3. Arasu, A., Babu, S., Widom, J.: The CQL continuous query language: semantic foundations and query execution. The VLDB Journal 15, 121–142 (2006)
4. Artale, A., Kontchakov, R., Wolter, F., Zakharyaschev, M.: Temporal description logic for ontology-based data access. In: Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence. pp. 711–717. IJCAI'13 (2013)
5. Borgwardt, S., Lippmann, M., Thost, V.: Temporal query answering in the description logic DL-Lite. In: Fontaine, P., Ringeissen, C., Schmidt, R.A. (eds.) Frontiers of Combining Systems. LNCS, vol. 8152, pp. 165–180. Springer (2013)
6. Calbimonte, J.P., Jeung, H., Corcho, O., Aberer, K.: Enabling query technologies for the semantic sensor web. Int. J. Semant. Web Inf. Syst. 8(1), 43–63 (Jan 2012)
7. Calvanese, D., De Giacomo, G., Lembo, D., Lenzerini, M., Poggi, A., Rodríguez-Muro, M., Rosati, R.: Ontologies and databases: The DL-Lite approach. In: Tessaris, S., Franconi, E. (eds.) Semantic Technologies for Information Systems (RW 2009). LNCS, vol. 5689, pp. 255–356. Springer (2009)
8. Chakravarthy, U.S., Grant, J., Minker, J.: Foundations of semantic query optimization for deductive databases. In: Foundations of Deductive Databases and Logic Programming, pp. 243–273. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (1988)
9. Chakravarthy, U.S., Grant, J., Minker, J.: Logic-based approach to semantic query optimization. ACM Trans. Database Syst. 15(2), 162–207 (1990)
10. Chandrasekaran, S., Cooper, O., Deshpande, A., Franklin, M.J., Hellerstein, J.M., Hong, W., Krishnamurthy, S., Madden, S., Raman, V., Reiss, F., Shah, M.A.: TelegraphCQ: Continuous dataflow processing for an uncertain world. In: CIDR (2003)
11. Cugola, G., Margara, A.: TESLA: A formally defined event specification language. In: Proceedings of the Fourth ACM International Conference on Distributed Event-Based Systems. pp. 50–61. DEBS '10, ACM, New York, NY, USA (2010)
12. Della Valle, E., Ceri, S., Barbieri, D., Braga, D., Campi, A.: A first step towards stream reasoning. In: Domingue, J., Fensel, D., Traverso, P. (eds.) Future Internet – FIS 2008. LNCS, vol. 5468, pp. 72–81. Springer Berlin / Heidelberg (2009)
13. Hwang, J.H., Xing, Y., Çetintemel, U., Zdonik, S.B.: A cooperative, self-configuring high-availability solution for stream processing. In: ICDE. pp. 176–185 (2007)
14. Kharlamov, E., Solomakhina, N., Özçep, Ö.L., Zheleznyakov, D., Hubauer, T., Lamparter, S., Roshchin, M., Soylu, A.: How semantic technologies can enhance data access at Siemens Energy. In: Proceedings of the 13th International Semantic Web Conference (ISWC 2014) (2014)
15. Kontchakov, R., Lutz, C., Toman, D., Wolter, F., Zakharyaschev, M.: The combined approach to ontology-based data access. In: Walsh, T. (ed.) IJCAI. pp. 2656–2661. IJCAI/AAAI (2011)
16. Krämer, J., Seeger, B.: Semantics and implementation of continuous sliding window queries over data streams. ACM Trans. Database Syst. 34(1), 1–49 (Apr 2009)
17. Lutz, C., Toman, D., Wolter, F.: Conjunctive query answering in the description logic EL using a relational database system. In: Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI-09). AAAI Press (2009)
18. Lutz, C., Toman, D., Wolter, F.: Conjunctive query answering in EL using a database system. In: Proceedings of the 5th International Workshop on OWL: Experiences and Directions (OWLED 2008) (2008)
19. Özçep, Ö.L., Möller, R.: Ontology based data access on temporal and streaming data. In: Koubarakis, M., Stamou, G., Stoilos, G., Horrocks, I., Kolaitis, P., Lausen, G., Weikum, G. (eds.) Reasoning Web. Reasoning and the Web in the Big Data Era. LNCS, vol. 8714. Springer (2014)
20. Özçep, Ö.L., Möller, R., Neuenstadt, C., Zheleznyakov, D., Kharlamov, E.: Deliverable D5.1 – A semantics for temporal and stream-based query answering in an OBDA context. Deliverable FP7-318338, EU (October 2013)
21. Özçep, Ö.L., Möller, R., Neuenstadt, C.: A stream-temporal query language for ontology based data access. In: KI 2014. LNCS, vol. 8736, pp. 183–194. Springer International Publishing Switzerland (2014)
22. Phuoc, D.L., Dao-Tran, M., Parreira, J.X., Hauswirth, M.: A native and adaptive approach for unified processing of linked streams and linked data. In: International Semantic Web Conference (1). pp. 370–388 (2011)
23. Schlatte, R., Möller, R., Giese, C.N.M.: Deliverable D1.1 – Joint phase 1 evaluation and phase 2 requirement analysis. Deliverable FP7-318338, EU (October 2013), publicly available at http://www.optique-project.eu/results/public-deliverables/
24. Tsangaris, M.M., Kakaletris, G., Kllapi, H., Papanikos, G., Pentaris, F., Polydoras, P., Sitaridi, E., Stoumpos, V., Ioannidis, Y.E.: Dataflow processing and optimization on grid and cloud infrastructures. IEEE Data Eng. Bull. 32(1), 67–74 (2009)