OBDA for Temporal Querying and Streams Christian Neuenstadt, Ralf Möller, and Özgür L. Özçep Institute of Information Systems (Ifis) University of Lübeck Lübeck, Germany {moeller,neuenstadt,oezcep}@ifis.uni-luebeck.de Abstract. Data changes worldwide in size and over time and when new data arrives rapidly from different sources, an easy access to dynamic data becomes a keyfactor. Therefore, temporalizing and streamifying ontology-based data access (OBDA) is a very important topic today, where the industry still relies on algebraic queries. We contribute to the practical efforts in this field by showing how a specific ontology-based stream querying language can be transformed with respect to mappings into standard SQL queries. For that purpose we choose the stream and temporal reasoning query language STARQL. STARQL is motivated by industrial usecases and evaluated in the European research project Op- tique. It offers access to temporal and streaming data as well for reactive diagnosis or continous monitoring. 1 Introduction The following work contributes to research in the field of OBDA1 on tempo- ral data [4,3] and especially on streaming data. We show how we transform an ontology based stream data query language into equivalent relational alge- bra queries in SQL for timetagged data tables. Our approach extends previous work[9,11,10,8] and can be applied to the paradigm of streaming data as well as temporal data itself. Sensor data is playing an important role in the IP7 EU project Optique. Siemens, one of the industrial stakeholders, provides huge amounts of sensor data to the project [7]. Thus, in this project the idea of processing streams with a sliding window (e.g. in [2]) has been extended by defining a finite sequence of consistent ABoxes for each window, which represent a sequence of time and extends previous ideas of adopting OBDA to streams [6,5,12], where the window content was treated as a single ABox. The idea of a sliding window is used for reactive diagnostics on temporal sensor data and accessed by within the project. In addition, a query language for streams has been developed, which is called STARQL2 . STARQL is a SPARQL-like query languange with added window definition and a special clause for additional time constraints and querying a 1 Ontology-based Data Access 2 Streaming and Temporal ontology Access with a Reasoning-based Query Language sequence of ABoxes. It has been designed for advanced data access to work equally for stream and temporal reasoning. The idea of an ABox sequencing strategy makes the required steps, rewriting and unfolding for Ontology Based Data Access a challenging task. In the next section, we analyze the rules for STARQL transformations into algebraic queries like SQL and show that the rewriting and unfolding steps are indeed possible. 2 Transformation Rules for Streaming Data with STARQL As mentioned in the last section, rewriting and unfolding of the ABox sequencing strategy is the most challenging task. STARQL uses a clause especially designed for temporal data access called Having Clause. We will concentrate explicitly on transforming the STARQL Having-Clause into an algebraic query language in this paper. An example for the use of a Having Clause can be seen in Listing 1. Here we see the definition of a stream Sout (Line 1), its output is defined as sensors of the type MonInc, which stands for monotonically increasing (Line 2). In line three the input stream with sensor values and the sliding window operator is defined, which is of size three seconds and slides with exactly one second for each window evaluation. In the next part the sequence of ABoxes for each window is defined. StdSeq stands for a sequence where each data set, which is represented by another timestamp, is placed in different ABoxes by an increasing order. The important part to evaluate the content of each window is shown in lines 5 to 7. For STARQL a new so called Having Clause has been explored. Its syntax is defined in first order logic using quantors for binding variables, conjunctions, disjunctions or negations of atoms, which can be graph triples or logical and arithmetic expressions. In our specific example a filter for monotonically increasing sensor data is used. It can be read as follows: for all timepoints i smaller than j it is true that if in i a sensor has a value x and in j the same sensor has a value y, then x must be smaller than or equal i. An expression of the Having Clause from Listing 1 in first order logic is shown in formula 1. For an analysis of the STARQL syntax and semantics we refer to [9,10]. ∀i, j, x, y((R(i, s, x) ∧ S(j, s, y) ∧ i < j) → (x ≤ y)) (1) We will take this query as a running example in the further paper. 1 CREATE STREAM S_out AS 2 CONSTRUCT GRAPH NOW { ? s rdf : type MonInc } 3 FROM STREAM S_Msmt [ NOW -2 s , NOW ] - >"1 S "^^ xsd : duration 4 SEQUENCE BY StdSeq AS seq 5 HAVING FORALL ?i ,? j IN seq ,? x ,? y : 6 IF ( GRAPH ? i { ? s : val ? x } AND GRAPH ? j { ? s : val ? y } AND ? i < ? j ) 7 THEN ? x <= ? y Listing 1: Example query in STARQL Classical OBDA without querying time or time sequences has been researched for many years. In the non-temporal case the query language SPARQL is com- monly used. Query languages like SPARQL (or in our temporal case STARQL) that are transformed to relational algebra must be domain independent, which can be achieved by a syntactical criterion requiring that all variables have to be range restricted (see [1] for a definition of domain independence). An example for a non restricted variable would be in the STARQL case a Having Clause of “?y > 1” . In this case, the variable ?y could be evaluated to numbers be- tween one and infinity. We can range restrict the variable ?y by another atom like “?y > 1 AND GRAPH i { ?sens :hasVal ?y }”. Here the variable ?y is not range restricted for the atom “?y > 1”, but for the complete clause, because it is restricted to the concrete elements that actually exist in ABox i or if we stick to the example, all values that have been measured by a sensor. Thus, we see the Having Clause can be range restricted, although parts of it are not restricted itself. STARQL requires the Having Clause to be range restricted over all, which is guaranteed by its formulation rules. Thus, we can assume that the Having Clause is domain independent for our cases and can be indeed transformed into relational algebra. For reasons of space restriction, we refer for more specific information on STARQL domain independence to [10]. For our transformation of the STARQL Having Clause, we make two more assumptions. First, our database offers a view that represents time tagged data with a time sequence representation of the sliding window in two additional columns, where the schema consists of WindowID, ABoxID, Timestamp and [Datacolumns] for every temporal table. An example view for a sliding window with measured data is shown in Table 1. It includes measurements for a single sensor and describes two windows with WindowID 1 and 2, where the window width is of three minutes. Every single ABox in the sequence can be accessed by the ABoxID in the second column. Table 1: Example for a sliding window view in the measurement scenario WindowID ABoxID T imestamp Sensor Value 1 1 00:00 sens1 90 1 2 00:01 sens1 91 1 3 00:02 sens1 93 2 1 00:01 sens1 91 2 2 00:02 sens1 93 2 3 00:03 sens1 94 For the second assumption we assume additionally that the unfolding for a single graph triple “GRAPH i { subject predicate value}”, where i respresents the index variable of ABoxes, is trivial and can be unfolded with mappings to δaid→i πwid,aid,x1 ,...,xn (R), where wid represents the evaluated window, aid the ABox, x1 , ..., xn the specific data columns and R the described data view in sliding window format. A projection of the WindowID is necessary for temporal queries, where we explore several time windows in parallel. For streaming queries, where we evaluate the data of a single window per timepoint only, an index variable for comparing the ABoxes in a sequence would be enough. Making these assumptions, the Having Clause of STARQL can be trans- formed by following the algorithm below for transforming from Safe Range Nor- mal Form into Relational Algebra Normal Form (for detailed informations and proofs, see [1]). First, we normalize the Safe Range Form of the Having Clause by five steps: 1. Variable substitution: All variable names are substituted until every vari- able is bound only once and no variable occurs free and bound. 2. Remove universal quantifier: We replace all ∀xψ by ¬∃x¬ψ 3. Remove implications: We replace all ψ → ξ by ¬ψ ∨ ξ 4. Push negations: We push every negation down into the formula until every child of a negation is either an atom or an exists quantifier 5. Flattening: Flatten every and, which is the child of an and, do the same for or and exists quantifiers Second, we normalize again for the relational algebraic normal form by three additional steps: 1. Push into or: We push every or upwards until there is no or as a child of an and. 2. Push into quantifier: if ψ is of the form ψ1 ∧ ... ∧ ψn ∧ ∃xξ and not all free variables of ξ are range restricted, push a subset of ψ1 ∧ ... ∧ ψn as conjunct into ξ until all free variables of ξ are range restricted. 3. Push into negated quantifier: if ψ is of the form ψ1 ∧ ... ∧ ψn ∧ ¬∃xξ and not all free variables of ξ are range restricted, copy a subset of ψ1 ∧ ... ∧ ψn as conjunct into ξ until all free variables of ξ are range restricted. ¬∃i, j, x, y(R(i, s, x) ∧ S(j, s, y) ∧ i < j ∧ x > y) (2) After both normalization steps the Having Clause example from Listing 1 can now be written in RANF as shown in Formula 2. For transforming a formula ψ from Relational Algebraic Normal Form (RANF) into an algebraic query Eψ , we adopt the modified algorithm 1 with added win- dow and ABox information from [1]. Unfolding of the RANF formula yields to the algebraic query in Formula 3. {hi} – πwid,aid,s (σx>y (σiy (σi