A Stream-Temporal Query Language for Ontology Based Data Access*

Özgür L. Özçep, Ralf Möller, and Christian Neuenstadt

Institute for Softwaresystems (STS)
Hamburg University of Technology
Hamburg, Germany
{oezguer.oezcep,moeller,christian.neuenstadt}@tu-harburg.de

Abstract. The paper contributes to the recent efforts on temporalizing and streamifying ontology based data access (OBDA) by discussing aspects of rewritability, i.e., compilability of the TBox into ontology-level queries, and unfoldability, i.e., transformability of ontology-level queries to queries on datasource level, for the new query-language framework STARQL. The distinguishing feature of STARQL is its general stream windowing and ABox sequencing strategy, which allows it to plug in well-known query languages such as unions of conjunctive queries (UCQs) in combination with TBox languages such as DL-Lite and to do temporal reasoning with a sorted first order logic on top of them. The paper discusses safety aspects under which STARQL queries that embed UCQs over DL-Lite ontologies can be rewritten and unfolded to back-end relational stream query languages such as CQL. With these results, the adoption of description logic technology in industrially relevant application areas such as industrial monitoring is crucially fostered.

Keywords: streams, OBDA, monitoring, unfolding, safety

1 Introduction

The work described in this paper is part of recent efforts on streamifying OBDA [12,6,7,18] and, to some extent, also temporalizing OBDA [5,4]. Streams, as potentially infinite sequences of elements, cannot be processed as a whole. Hence blocking operators such as the classical grouping operator and aggregation operators cannot be applied to them. The simple but fundamental idea for circumventing this problem is to apply to the stream a (small) window, the content of which is updated as new elements from the stream arrive at the query answering system.
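The windowing idea can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions (integer timestamps in seconds, elements arriving in timestamp order, made-up sample readings); the names `sliding_windows` and `width` are hypothetical and are not part of STARQL or CQL.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List, Tuple

Element = Tuple[int, float]  # (timestamp in seconds, value); assumed encoding


def sliding_windows(stream: Iterable[Element], width: int) -> Iterator[List[Element]]:
    """Yield, for each arriving element, the elements of the last `width` seconds.

    Only the window content is kept in memory, so the stream itself may be
    unbounded. Assumes elements arrive in non-decreasing timestamp order.
    """
    window: Deque[Element] = deque()
    for timestamp, value in stream:
        window.append((timestamp, value))
        # Evict elements that fell out of the interval [timestamp - width, timestamp].
        while window[0][0] < timestamp - width:
            window.popleft()
        yield list(window)


# A blocking operator such as max becomes applicable once it ranges over a window.
readings = [(0, 90.0), (1, 93.0), (2, 94.0), (3, 92.0), (4, 93.0), (5, 95.0)]
window_maxima = [max(v for _, v in w) for w in sliding_windows(readings, width=2)]
print(window_maxima)
```

Because the generator holds only the current window, the aggregate is computed per window rather than over the whole stream, which is what makes it applicable to a potentially infinite input.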
Stream window operators also play an important role in the new query language framework STARQL (Streaming and Temporal ontology Access with a Reasoning-based Query Language, pronounced Star-Q-L, [17,16]). Its framework character lies in the fact that it can 1) embed queries of various query languages, 2) refer to ontologies in various DL languages, and 3) use a first order logic (FOL) fragment for temporal reasoning over ABox sequences constructed within the query. In this paper, we focus on the last aspect, assuming for the first two unions of conjunctive queries (UCQs) w.r.t. DL-Lite ontologies.

* This work has been supported by the European Commission as part of the FP7 project Optique.

In STARQL, the idea of processing over windows is pushed further by extending these with sequencing operators that set up at every time point a finite sequence of ABoxes on which temporal reasoning can be applied. STARQL does not assume a stream of ABoxes which hold universally but rather modifies/exploits the given ABox streams to build its own stream of finite ABox sequences. This sequencing strategy, among other things, distinguishes STARQL from the approaches in [12,6,7,18]. It is a natural addition to the window operators that sets up at every time point a context in which temporal reasoning can be applied.

In this paper, we consider an instantiation of STARQL where intra-ABox reasoning within sequences is handled by answering UCQs over DL-Lite ontologies w.r.t. the certain answer semantics. (Which logic from the DL-Lite family to choose does not matter, as long as it allows for FOL rewriting.) Within inter-ABox reasoning, certain answers from the different ABoxes are related and constrained with an outer temporal FOL formula. This is challenging if one allows negation, disjunction and all quantifiers in the FOL template in combination with concrete domains, as these, if not constrained, would immediately lead to infinite sets of answers, in particular w.r.t.
concrete domain values.

STARQL uses a new adornment technique for variables to guarantee safeness. We demonstrate the safety mechanism, which guarantees that the FOL template language is domain independent [1] and as such can be rewritten as an SQL query. This opens the door for (rewriting and) unfolding STARQL queries into queries of domain independent languages such as the relational stream query language CQL [3]. Based on CQL, practical systems have been developed. Thus, this paper provides the foundation for expressive OBDA stream querying.

2 The STARQL framework

We describe the syntax and the semantics for a fragment of STARQL (see [17] for the full version). Our running example for illustration purposes is a measurement scenario in which there is a (possibly virtual) stream $S_{Msmt}$ of ABox assertions. Its initial part, called $S_{Msmt}^{\leq 5s}$ here, contains timestamped ABox assertions giving the value of a temperature sensor $s_0$ at 6 time points starting with 0s.

$S_{Msmt}^{\leq 5s} = \{ val(s_0, 90^\circ)\langle 0s \rangle, val(s_0, 93^\circ)\langle 1s \rangle, val(s_0, 94^\circ)\langle 2s \rangle, val(s_0, 92^\circ)\langle 3s \rangle, val(s_0, 93^\circ)\langle 4s \rangle, val(s_0, 95^\circ)\langle 5s \rangle \}$

Assume further that a static ABox contains knowledge on sensors telling, e.g., which sensor is of which type. In particular, let $BurnerTipTempSens(s_0)$ be in the static ABox. Moreover, let there be a pure DL-Lite TBox with additional information such as $BurnerTipTempSens \sqsubseteq TempSens$, saying that all burner tip temperature sensors are temperature sensors.

We want to formalize the following information need: Starting with time point 0s, output every second those temperature sensors whose value grew monotonically in the last 2 seconds. A possible STARQL representation of this information need is given in the following listing.

    CREATE STREAM S_out AS
    CREATE PULSE AS START = 0s, FREQUENCE = 1s
    SELECT { ?s rdf:type MonInc }<NOW>
    FROM S_Msmt [NOW-2s, NOW]->1s
    USING STATIC ABOX <http://Astatic>, TBOX <http://TBox>
    WHERE { ?s rdf:type TempSens }
    SEQUENCE BY StdSeq AS SEQ
    HAVING FORALL i < j IN SEQ, ?x, ?y:
      IF ({ ?s val ?x }<i> AND { ?s val ?y }<j>) THEN ?x <= ?y

Syntax. The example demonstrates many of the syntactic possibilities within STARQL, whose grammar is sketched in Fig. 1. The rules for the HAVING clause are not given there but are discussed in more detail in the following sections. After the create expressions for the stream and the output frequency, the query's main content is captured by the select expression. The head of the select expression describes the output format of the stream, giving an RDF basic graph pattern (BGP) with an attached time expression, here <NOW>, for the evolving time. The general motivation for this approach is similar to that of the CONSTRUCT operator in the SPARQL query language. So the actual result in the monotonicity example is a stream of ABox assertions of the form $MonInc(s_0)\langle t \rangle$.

$S_{out}^{\leq 5s} = \{ MonInc(s_0)\langle 0s \rangle, MonInc(s_0)\langle 1s \rangle, MonInc(s_0)\langle 2s \rangle, MonInc(s_0)\langle 5s \rangle \}$

Within the WHERE clause one can bind variables w.r.t. the KBs mentioned in the USING clause by using UCQs. We assume an underlying DL-Lite logic for the static ABox, the TBox and the UCQs which allows for concrete domain values, e.g., $\text{DL-Lite}_A$ [8]. In this example, instantiations of the sensors ?s are fixed w.r.t. a static ABox and a TBox given by URIs.

The heart of STARQL queries is the window operator in combination with the sequencing mechanism. In the example, the operator [NOW-2s, NOW]->1s describes a sliding window operator, which collects the timestamped ABox assertions of the last two seconds and then slides 1s forward in time. Every temporal ABox produced by the window operator is converted to a sequence of (pure) ABoxes. At every time point, one has a sequence of ABoxes on which temporal (state-based) reasoning can be applied. This is realized in STARQL by a sorted first order language template in which timestamped UCQ conditions (notated here as BGPs) are embedded.
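To make the interplay of window, sequencing and HAVING clause concrete, the following Python sketch evaluates the monotonicity query over the initial part of the measurement stream. It is a hand-written simulation, not the STARQL engine: it assumes that the sequencing method StdSeq groups the assertions of a window by timestamp and orders the resulting ABoxes chronologically, it hard-codes the single TBox axiom instead of performing DL-Lite rewriting, and all function and constant names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Timestamped assertions val(sensor, value)<t> from the running example.
S_MSMT: List[Tuple[str, int, int]] = [
    ("s0", 90, 0), ("s0", 93, 1), ("s0", 94, 2),
    ("s0", 92, 3), ("s0", 93, 4), ("s0", 95, 5),
]
STATIC_ABOX: Set[Tuple[str, str]] = {("BurnerTipTempSens", "s0")}
# TBox axiom BurnerTipTempSens ⊑ TempSens, hard-coded as a subclass map (assumption).
SUBCLASS_OF: Dict[str, str] = {"BurnerTipTempSens": "TempSens"}


def temp_sensors() -> Set[str]:
    """WHERE clause: individuals that are TempSens, directly or via the TBox axiom."""
    return {ind for cls, ind in STATIC_ABOX
            if cls == "TempSens" or SUBCLASS_OF.get(cls) == "TempSens"}


def abox_sequence(now: int, width: int = 2) -> List[List[Tuple[str, int]]]:
    """Window [now - width, now], then one ABox per timestamp in temporal order."""
    by_time: Dict[int, List[Tuple[str, int]]] = {}
    for sensor, value, t in S_MSMT:
        if now - width <= t <= now:
            by_time.setdefault(t, []).append((sensor, value))
    return [by_time[t] for t in sorted(by_time)]


def monotonically_increasing(sensor: str, seq: List[List[Tuple[str, int]]]) -> bool:
    """HAVING clause: for all states i < j, every value at i is <= every value at j."""
    return all(x <= y
               for i in range(len(seq)) for j in range(i + 1, len(seq))
               for s1, x in seq[i] if s1 == sensor
               for s2, y in seq[j] if s2 == sensor)


def s_out(pulse: range) -> List[Tuple[str, int]]:
    """Emit MonInc(sensor)<t> for every pulse time point t at which the condition holds."""
    return [(s, t) for t in pulse for s in sorted(temp_sensors())
            if monotonically_increasing(s, abox_sequence(t))]


print(s_out(range(0, 6)))
```

Under these assumptions the sketch reports s0 at the time points 0, 1, 2 and 5, in agreement with the output stream given above: the windows ending at 3s and 4s both contain the drop from 94 to 92 and therefore violate the condition.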
In the STARQL example above, the HAVING clause expresses a monotonicity condition stating that for all values ?x that are values of sensor ?s w.r.t. the ith ABox and for all values ?y that are values of the same sensor ?s w.r.t. the jth ABox (where i < j), it must be the case that ?x is less than or equal to ?y.

createExp −→ CREATE STREAM name AS [pulseExp] selectExp | pulseExp
pulseExp −→ CREATE PULSE AS START = startTime, FREQUENCE = freq
selectExp −→ SELECT selectHead(x, y) FROM listWinStreamExp
             [USING listOfRessources] WHERE whereClause(x)
             SEQUENCE BY seqMethod [HAVING safeHavingClause(x, y)]
selectHead(x, y) −→ sparqlTriple(x, y)