=Paper= {{Paper |id=Vol-2194/paper_7 |storemode=property |title=Ontology-Mediated Query Answering for Probabilistic Temporal Data with EL Ontologies |pdfUrl=https://ceur-ws.org/Vol-2194/paper_7.pdf |volume=Vol-2194 |authors=Patrick Koopmann |dblpUrl=https://dblp.org/rec/conf/ki/Koopmann18 }} ==Ontology-Mediated Query Answering for Probabilistic Temporal Data with EL Ontologies== https://ceur-ws.org/Vol-2194/paper_7.pdf

Ontology-Mediated Query Answering for Probabilistic Temporal Data with EL Ontologiesı Patrick Koopmann Institute for Theoretical Computer Science, Technische Universität Dresden, Germany patrick.koopmann@tu-dresden.de Abstract. Especially in the field of stream reasoning, there is an in- creased interest in reasoning about temporal data in order to detect situations of interest or complex events. Ontologies have been proved a useful way to infer missing information from incomplete data, or simply to allow for a higher order vocabulary to be used in the event descriptions. Motivated by this, ontology-based temporal query answering has been proposed as a means for the recognition of situations and complex events. But often, the data to be processed do not only contain temporal informa- tion, but also probabilistic information, for example because of uncertain sensor measurements. While there has been a plethora of research on ontology-based temporal query answering, only little is known so far about querying temporal probabilistic data using ontologies. This work addresses this problem by introducing a temporal query language that extends a well-investigated temporal query language with probability operators, and investigating the complexity of answering queries using this query language together with ontologies formulated in the description logic EL. Keywords: Ontology-Based Query Answering · Temporal Query An- swering · Probabilistic Reasoning 1 Introduction Ontology-mediated query answering (OMQA) recently attracted considerable attention as a technique to query incomplete data. In OMQA, queries are evalu- ated with respect to an ontology, which specifies background knowledge about the current domain using a formal language such as a description logic (DL), so that, using reasoning procedures, also implicit information can be queried from the data. In the standard OMQA setting, the data to be queried is assumed to be both static and precise. However, a lot of applications encounter situations where this assumption fails, yet using ontologies could prove useful. The inter- net has become highly dynamic, with information being frequently added and changed, and new data being generated from a variety of sources. In addition, new technologies such as smartphones and the internet of things (IoT) frequently encounter a data environment that is constantly changing. To make use of these ı Supported by the DFG within the collaborative research center SFB 912 (HAEC). 68 data, there has been an increasing interest in investigating semantic and reasoning techniques that process not only static data, but streams of data, such as in the semantic stream reasoning paradigm [24]. As [24] illustrate, frequently, the data encountered in stream reasoning applications is not only temporal, but also probabilistic in nature. As an example, consider a health or fitness monitoring application, for which one may want to use concepts from a medical ontology such as SNOMED CT [14] or Galen [25] to describe information about the health status of a patient. Specifically, such an application could be used on a smartphone in combination with a sensor that measures the diastolic blood pressure of the patient while he is exercising [21]. As the sensor might be imprecise in its measurements, it might report information about whether the blood pressure of the patient is high with an associated probability, and provide this information to the application in regular time intervals. If a too high blood pressure was observed for several times during a short period, the app should give a warning to the patient, and advise him to take a break from his exercise. In order to properly take both the temporal and the probabilistic aspects into account when querying streams of data, we propose a query language for OMQA that comes with both temporal and probabilistic operators. For this, we assume a representation of the data in form of a sequence of probabilistic data sets, which may have been obtained using further preprocessing and windowing operations. An ontology expressed in a description logic (DL) gives additional background information about the domain to be queried, so that implicit information can be queried from incomplete data through reasoning. In the above scenario, the following query could for example be used to detect whether the patients blood pressure was at least twice recorded as high during the last 10 minutes. P>.8 (#≠10 ⌃(HighBloodPressure(x) · ⌃HighBloodPressure(x))) While there has been a lot of research on querying temporal data [1] and probabilistic data [7,19] using ontologies, we are not aware of any research were both aspects are combined in the specific setting we described. In this work, we focus on the setting where the ontology is formulated in EL, a DL that is known for its good computational properties, such as polynomial decidability for most common reasoning problems. This DL, which underlies the OWL EL profile of the web ontology language standard OWL, is used for many large scale ontologies, especially in the bio-medical domain and for the semantic web, such as for the ontologies SNOMED CT and Galen mentioned above. However, our hardness results already apply for simpler description logics such as DL-Lite, as well as for the case where no ontology is used. Related Work Our language is an extension of the temporal query language investigated in [3,8], which extends conjunctive queries with LTL operators. Other authors considered using these operators also as part of the DL, either to describe temporal concepts [16], or to make the axioms of the ontology itself temporal [4]. Recently, this work has been extended also to metric temporal logics, 69 in which temporal operators are annotated with numerical time intervals [2,9,18]. Temporal reasoning for streams of data has recently also been considered in the context of datalog [26]. Surveys on temporal reasoning and query answering with ontologies can be found in [1,23]. Our probabilistic query-answering framework is based on the OMQA frame- work for probabilistic data presented in [19]. Since this publication, several authors investigated OMQA in similar settings [7,6,12]. To our knowledge, the only work that combines both temporal and probabilistic query answering in the presence of description logic ontologies is [11]. Albeit, the authors consider a different setting, in which the flow of time is modelled by a Markov-process. In contrast, we consider temporal data that are provided as a sequence of probabilistic ABoxes. In addition to settings based on probabilistic databases, there is also research on extending DLs with probability operators, such as in P-SHIF(D)/P-SHOIN (D) [22] or Prob-ALC/Prob-EL [17]. While our DL does not support probability operators, the probability operator used in our query language syntactically and semantically corresponds to the probability operator in Prob-ALC and Prob-EL. Formal details of the proofs can be found in the extended version of the paper [20]. 2 Preliminaries We recall the DL EL [5] studied here. Let NC ,NR and NI be countably infinite and pair-wise disjoint sets of respectively concept names, role names and individual names. An EL concept is of one of the forms € | A | C1 Ù C2 | ÷r.C where A œ NC , r œ NR , and C1 , C2 and C are EL concepts. An EL axiom is of the form C1 ı C2 , where C1 and C2 are EL concepts. An EL TBox is a set of EL axioms. An ABox A is a set of assertions of the form A(a) and r(a, b), where A œ NC , r œ NR and a, b œ NI . An EL knowledge base (KB) is a tuple K = ÈT , AÍ, where A is an ABox and T an EL TBox. The semantics of KBs is defined in terms of interpretations, which are tuples I = È∆I , ·I Í, ∆I being a set of domain elements, and ·I an interpretation function that maps each a œ NI to an element aI œ ∆I s.t. for a ”= b œ NI , aI ”= bI (unique name assumption, UNA), each A œ NC to a subset AI ™ ∆I , each r œ NR to a subset rI ™ ∆I ◊ ∆I . The interpretation function ·I is extended to concepts and roles as follows: €I = ∆I (C1 Ù C2 )I = C1I ﬂ C2I (÷r.C1 )I = {d œ ∆I | ÷(d, e) œ rI , e œ C1I }, where C1 , C2 are concepts and r œ NR . An interpretation is a model of a KB ÈT , AÍ (of an TBox) if for every C ı D œ T , C I ™ DI , for every A(a) œ A, aI œ AI , and for every r(a, b) œ A, (aI , bI ) œ rI . 70 A conjunctive query (CQ) takes the form q = ÷y.„(x, y), where x, y are vectors of variables and „(x, y) is a conjunction over atoms of the forms A(t1 ) and r(t1 , t2 ), where A œ NC , r œ NR , and t1 and t2 are terms taken from NI , x or y. x are the answer variables of q. Given an interpretation I and a CQ q with answer variables x1 , . . . , xn , the vector a1 . . . an ™ NI n is an answer of q in I if there exists a mapping ﬁ : term(q) æ ∆I s.t. ﬁ(xi ) = ai for i œ J1, nK ﬁ(b) = bI for b œ NI , ﬁ(t) œ AI for every A(t) in q, and Èﬁ(t1 ), ﬁ(t2 )Í œ rI for every r(t1 , t2 ) in q. a1 . . . an is a certain answer of q in a KB K if it is an answer in every model of K. If a query does not contain any answer variables, it is a Boolean CQ, and we say it is entailed by a KB K (an interpretation I) if it has the empty vector as answer. 3 Temporal Probabilistic Knowledge Bases and Queries We introduce temporal probabilistic knowledge bases (TPKBs) and temporal probabilistic queries (TPQs). Temporal Probabilistic Knowledge Bases. Probabilistic information about a single time point is represented using a probabilistic ABox as introduced in [19]. For simplicity, we focus on assertion-independent probabilistic ABoxes (ipAboxes), though all results should easily extend to the more general case. ipABoxes are the ABox equivalent of tuple-independent probabilistic databases [13]. An ipABox is a set of probabilistic ABox assertions of the form – : p, where – is an ABox assertion and p œ [0, 1]. Intuitively, – : p describes that the assertion – holds with a probability of at least p. Instead of – : 1, we may just write – if the meaning is clear from the context. ipABoxes only specify a lower bound on the probability, to conform with the open-world semantics common in ontology-based representations.1 An EL TPKB is now a tuple ÈT , (Ai )iœJ1,nK Í, where T is an EL TBox and (Ai )iœJ1,nK is a sequence of n ipABoxes. Given a TPKB K = ÈT , (Ai )iœJ1,nK Í, the set œK of possible worlds of K contains all sequences w = (AÕi )iœJ1,nK of classical ABoxes such that for every i œ J1, nK and – œ AÕi , Ai contains an axiom of the form – : p. Each TPKB uniquely defines a probability space ÈœK , µK Í, where µK : 2œK æ [0, 1] satisfies Ÿ Ÿ µK ({(AÕi )iœJ1,nK }) = p· (1 ≠ p) iœJ1,nK iœJ1,nK –:pœAi –:pœAi –œAÕi –”œAÕi q and for W ™ œK , µK (W ) = wœW µ({w}). Intuitively, µK (W ) gives the proba- bility of being in one of the possible worlds in W , by summing up the probabilities of each possible world. The definition of µK (W ) reflects the assumption that all probabilities in the TPKB are statistically independent. 1 Note that this is different to the open-world semantics for probabilistic databases suggested in [10], which assumes a fixed upper probability for facts absent in the data. 71 œK AÕ1 AÕ2 AÕ3 AÕ4 AÕ5 µK w1 {hasBP(p, b), HighBP(b)} ÿ {HighBP(b)} {HighBP(b)} ÿ 0.378 w2 {hasBP(p, b)} ÿ {HighBP(b)} {HighBP(b)} ÿ 0.162 w3 {hasBP(p, b), HighBP(b)} ÿ ÿ {HighBP(b)} ÿ 0.042 w4 {hasBP(p, b)} ÿ ÿ {HighBP(b)} ÿ 0.018 w5 {hasBP(p, b), HighBP(b)} ÿ {HighBP(b)} ÿ ÿ 0.252 w6 {hasBP(p, b)} ÿ {HighBP(b)} ÿ ÿ 0.108 w7 {hasBP(p, b), HighBP(b)} ÿ ÿ ÿ ÿ 0.028 w8 {hasBP(p, b)} ÿ ÿ ÿ ÿ 0.012 Table 1. Probability space of example TPKB. Example 1. We define the TPKB K = ÈT , (Ai )iœJ1,5K Í where T contains the GCI HighBloodPressurePatient © ÷hasBloodPressure.HighBloodPressure and the ABoxes A1 = {hasBP(p, b), HighBP(b) : 0.7}, A2 = ÿ, A3 = {HighBP(b) : 0.9}, A4 = {HighBP(b) : 0.6} and A5 = ÿ, where BP is short for BloodPressure. Every possible world w = (AÕi )iœJ1,5K with hasBP(p, b) ”œ AÕ1 has probability µK (w) = 0. The remaining possible worlds are shown in Table 1, with the probability measure µK shown in the last column. A model of a TPKB K = ÈT , (Ai )iœJ1,nK Í is a mapping ÿ from possible worlds w = (AÕi )iœJ1,nK œ œK to sequences (ÿ(w)i )i>0 of (classical) models of T s.t. for all i œ J1, nK, ÿ(w)i is a model of the classical knowledge base ÈT , AÕi Í, and all ÿ(w)i have the same set ∆ÿ of domain elements (constant domain assumption). Rigid Names. As typical for temporal knowledge bases, we may assume in addition a set Nrig of rigid names, containing the set NCrig ™ NC of rigid concept names and the set NRrig ™ NR of rigid role names. Rigid names denote names whose interpretation is independent of the flow of time. We say that a model ÿ of a TPKB K = ÈT , (Ai )iœJ1,nK Í respects rigid names iff for all w œ œK , i, j œ J1, nK and X œ Nrig , X ÿ(w)i = X ÿ(w)j . Allowing for rigid names often has a direct impact on complexity and decidability of common reasoning problems, which is why typically different cases based on whether NCrig = ÿ or NRrig = ÿ are studied for complexity. Example 2. In the above example, the relation hasBP is rigid, as its interpretation should be independent of time, while the concept HighBP is not rigid, as the blood pressure of a patient can change from high to not high. As a consequence, the individual p will be related to the blood pressure b at all time points, even though the assertion hasBP(p, b) is only placed in the ipABox A1 . Temporal Probabilistic Queries. A temporal probabilistic query (TPQ) is of one of the following forms, where q is a CQ, „1 and „2 are a TPQs and p œ [0, 1]. q | ¬„1 | „1 · „2 | „1 ‚ „2 | #„1 | ⌃„1 | 2„1 | „1 U„2 #≠ „1 | ⌃≠ „1 | 2≠ „1 | „1 S„2 | P>p „1 | P=p „1 | P

0 ⌃≠ „1 ÿ, w, j |= „1 for some j Æ i 2≠ „1 ÿ, w, j |= „1 for all j Æ i „1 S„2 ÿ, w, j |= „2 for some j Æ i and ÿ, w, k |= „1 for all k œ Jj + 1, iK P≥p „ µK ({w œ œK | ÿ, w, i |= „}) ≥ p, where ≥ œ {<, =, >} Table 2. Entailment of Boolean TPQs in the possible world w at time point i under interpretation ÿ. The operators # (next), ⌃ (eventually), U (until) and their inverses are temporal operators of LTL, while P> , P= and P< are the probability operators that we add to this language. TPQs without probability operators corresponds to temporal queries (TQs) investigated in [8]. Note that due the disjunction operator, we can also express unions of conjunctive queries (UCQs), which are simply disjunctions of CQs. The answer variables of a TPQ „ are the answer variables of the CQs occurring in „. A TPQ „ is Boolean if every variable in „ is bound by an existential quantifier. We define the semantics of TPQs. Note that each possible world w œ œK has its own time line, while a model of K contains a sequence of models for every possible world. For a given model, we define the semantics of temporal operators with respect to a single time line, that is, with respect to a current possible world. Probabilistic expressions P≥p „ are the only expressions that are interpreted with respect to other possible worlds. Let ÿ be a model of K, and „ a Boolean TPQ. For a single possible world w œ œK and a time point i, we say that „ is satisfied at w, i under ÿ, in symbols ÿ, w, i |= „ iff the conditions in Table 2 are satisfied. Note that the temporal operators refer to the time line of a single possible world, for which they are defined as in [8]. In contrast, the probabilistic operator refers to the current time point in multiple possible worlds, and is defined similar to the probabilistic concept constructor in the DL Prob-ALC [17]. A Boolean TPQ „ is satisfied in an interpretation ÿ at i, in symbols ÿ, i |= „, iff ÿ, w, i |= „ for all w œ œK . It is entailed by the TPKB K at i iff ÿ, i |= „ for all models ÿ of K. „ is satisfiable in K at i iff there exists a model ÿ of K s.t. ÿ, i |= „. Now given a TPKB K, a TPQ „ with answer variables x, a time point i > 0, and a mapping ‡ : x æ NI , ‡ is a certain answer for „ in K at i iff K, i |= „Õ , where „Õ is the result of applying ‡ on „. As common, since computing answers for TPQs can be seen as a search problem that uses Boolean TPQ entailment, 73 we focus on the decision problem of query entailment, and may refer to Boolean TPQs simply as TPQs. Example 3. If we consider a slight variation of the query from the introduction. P>.8 (#≠5 ⌃(HighBPPatient(x) · ⌃HighBPPatient(x))) For x = p and time point 5, the query below the probability operator is entailed in every model of the possible worlds w1 , w2 , w3 and w5 , which together have a probability of 0.834. Consequently, b is an answer to the query at time point 5. Now consider the variation where the probability operators are moved inside: #≠5 ⌃(P>.8 (HighBPPatient(x)) · ⌃P>.8 (HighBPPatient(x))) This corresponds to the situation where at least twice in the last 5 minutes, the probability of having a high blood pressure was above 0.8. As this probability is only once above this bound, this query is not entailed. 4 Lower Complexity Bound Temporal query answering without probabilities is PSpace-complete in combined complexity if NRrig = ÿ, and otherwise coNExpTime-complete [8]. On the other hand, computing the probability of a CQ from an ipABox is PPNP -complete (see extended version of the paper), and thus also in PSpace [27]. It turns out that, if both the temporal and the probabilistic dimension are combined, we obtain an increase to ExpSpace in complexity. This complexity increase already happens without any rigid symbols, and for TPKBs without TBox and with only one ABox, so that the DL is in fact irrelevant for this result. A query „ is entailed by a TPKB K iff ¬„ is not satisfiable in K. As the complexity class ExpSpace is closed under complement, we can therefore focus on the problem of query satisfiability. We obtain ExpSpace-hardness by reduction of the exponential variant of the corridor tiling problem [15]. In this problem, we are given a set T of tile types, two special tile types ts ,te œ T , a natural number n, and two functions v and h of compatibility constraints v : T æ 2T (vertical) and h : T æ 2T (horizontal). The input is an instance of the exponential corridor tiling problem if there exists a number m œ N and a tiling f : J0, mK ◊ J0, 2n ≠ 1Kæ T such that f (0, 0) = ts , f (m, 0) = te , and for all x œ J0, mK and y œ J0, 2n ≠ 1K, if x < m, f (x + 1, y) œ h(f (x, y)) and if y < 2n ≠ 1, f (x, y + 1) œ v(f (x, y)). We only sketch the idea of the construction here, and leave the details to the long version of the paper. We use n concept names Ai to mark the different possible worlds w œ œK with a counter, such that in interpretations ÿ that satisfy both the TPQ and the TPKB, ÿ, w, j |= Ai (a) iff the ith bit of the counter is 1 at time point j, and ÿ, w, j ”|= Ai (a) iff the ith bit is 0 at time point j. Furthermore, we make sure that each possible world is at each time point uniquely determined by its counter value. For this, we use the ipABox A1 = {Ai (a) : 0.5 | i œ J1, nK}. The query then makes sure that the counter values are increased for each time 74 0 1 2 3 0 1 2 3 1 2 3 0 1 2 3 0 2 3 0 1 2 3 0 1 ··· 3 0 1 2 3 0 1 2 Fig. 1. Illustration of the tilings represented by the possible worlds. point. Figure 1 illustrates this idea. Intuitively, each possible world corresponds to a row in the tiling, with its counter value at time point 1 denoting the row number. At each time point, there are two possible worlds that can be most easily recognised w by a query: the one whose counter value is 0 (which satisfies the query w i (a)), and the one whose counter value is 2 ≠ 1 (which satisfies the n 1ÆiÆn ¬A query 1ÆiÆn Ai (a)). Unless the latter one represents the last row, both these possible worlds correspond to neighbouring rows, which means at each time point we can recognise the vertical neighbour relation for two rows easily, and thus enforce tiling conditions in that direction with the following query, where L(a) is an assertion that marks the last row, and for a tile type t œ T , Bt (a) expresses that the current cell has a tile of type t. QQ R ﬁ ﬁ 2 aaBt1 (a) · Ai (a) · ¬L(a)b t1 œT iœJ1,nK QQ R RR ﬂ ﬁ æ P=1 aa ¬Ai (a)b æ Bt2 (a)bb t2 œv(t1 ) iœJ1,nK As we can only check the vertical tiling conditions for one pair of rows at a time, we represent each cell by up to 2n succeeding time points in each possible world, performing a switch only when the counter reaches 2n ≠ 1. The remaining reduction is described in the extended version of the paper. Lemma 1. Entailment of TPQs is ExpSpace-hard in combined complexity, even for TKBs ÈT , (Ai )iœJ1,nK Í where T = ÿ, n = 1 and NCrig = NRrig = ÿ. 5 Upper Complexity Bound We show that the complexity result presented in the last section is indeed tight, even if NRrig ”= ÿ. We sketch here only the case without rigid symbols. How rigid symbols are integrated is then discussed in the extended version of the paper. Our construction is based on an abstraction of a temporal probabilistic model, which we call quasi-model, which collects for each time point and possible world the CQs occurring in the input query that are entailed, as well as the CQs that are not entailed. We focus on satisfiability of a TPQ „ in a TPKB K = ÈT , (Ai )iœJ1,nK Í, 75 where we say „ is satisfiable in K iff „ is satisfiable in K at 1. In other words, we ignore the time point to make things simpler. Since „ is satisfiable in K at i iff #i≠1 „ is satisfiable, this is sufficient for our complexity analysis. We can assume without loss of generality that „ contains only the operators ·, ¬, U, S and P≥p , since the remaining operators can be linearly encoded using known equivalences. Denote by sub(„) the sub-queries of „ and set T („) = {Â, ¬Â | Â œ sub(„)}. A quasi-state is a mapping Q : œK æ T („) that satisfies the following conditions: S1 ¬Â œ Qi (w) iff Â ”œ Qi (w), S2 for all Â1 · Â2 œ T („): Â1 · Â2 œ Qi (w) iff Â1 œ Qi (w) and Â2 œ Qi (w), and S3 for all P≥p (Â) œ T („): P≥p (Â) œ Qi (w) iff µK ({w | Â œ Qi (w)}) ≥ p. The quasi-state abstracts probabilistic interpretations at a single time point by assigning queries to each possible world according to the semantics of the atempo- ral operators in our query language. To incorporate the temporal dimension, we consider unbounded sequences of quasi-states (Qi )iØ1 , which we call quasi-models for „ in K, and which have to satisfy the following conditions for i Ø 1 and w = (AÕi )iœJ1,nK œ œK . Q1 „ œ Q1 (w), w Q2 if i œ J1, nK, AÕi |= ÂœX Â, where X = {Â œ Qi (w) | Â is a CQ or a negated CQ}. Q3 for all #Â œ T („), #Â œ Qi (w) iff Â œ Qi+1 (w), Q4 for all #≠ Â œ T („), #≠ Â œ Qi+1 (w) iff Â œ Qi (w), Q5 for all Â1 UÂ2 œ T („), Â1 UÂ2 œ Qi iff there exists j Ø i s.t. Â2 œ Qj (w) and for all k œ Ji, j ≠ 1K, Â1 œ Qk (w), and Q6 for all Â1 SÂ2 œ T („), Â1 SÂ2 œ Qi iff there exists j Æ i s.t. Â2 œ Qj (w) and for all k œ Jj ≠ 1, iK, Â1 œ Qk (w). Again, the intuition behind these conditions follows directly from the semantics of the temporal operators. As we show in the extended version of the paper, quasi-models are indeed sufficient to witness the satisfiability of a TPQ in a TPKB. Moreover, it is sufficient to consider quasi-models that are of a certain regular form, which is the crucial element for our complexity bound. Lemma 2. „ is satisfiable in K with NCrig = NRrig = ÿ iff there exists a quasi- model for „ in K wrt. S and a which is of the form Q1 , . . . Qm (Qm+1 , . . . Qm+o )Ê , where m and o are both double-exponentially bounded in the size of K and „. Exploiting the fact that ExpSpace = NExpSpace, we obtain our space bounds by a non-deterministic decision procedure that can be roughly sketched as follows. We first guess the numbers m and o. While m and o are double- exponentially bounded, they can be stored in exponential space using binary encoding. We now guess the quasi-states Q1 , . . ., Qm+o one after the other, 76 where we carefully make sure that all conditions of quasi-models are satisfied. In particular, we keep track of U- and S-formulae that have to be satisfied, and we keep the state Qm+1 in memory to test that it is compatible to Qm+o , and that all U-formulae in Qm+1 are satisfied before we reach Qm+o . In the extended version of the paper, we present a refined version of quasi- models, which also have the above regularity property, but additionally take into consideration rigid predicates. The main idea is to use for each possible world an additional structure that determines which sets of CQs and their negations can be entailed at any time point under the rigidity constraints. This structure takes exponential space per possible world, and can be computed in non-deterministic exponential time. Theorem 1. Entailment of TPQs from EL-TPKBs can be decided in ExpSpace, even if NRrig ”= ÿ and NCrig ”= ÿ. 6 Removing Negation The complexity increase discussed in the last sections can be avoided if we restrict ourselves to positive TPQs, which are TPQs that do not use the operators ¬, P

p „ only expresses the positive entailment of „ in possible worlds. The example queries shown in this paper are all positive queries. In the absence of negation, it is possible to evaluate the probabilities of sub- queries “inside-out”, starting from queries of the form P>p „ where „ contains no probabilistic operators. For non-probabilistic temporal queries, it can be decided in P data and NP combined complexity whether they are entailed. This allows to decide the entailment of P>p „ at any time point in PP, by using a probabilistic Turing machine that guesses all possible worlds of the TPKB. Using closure properties of the complexity class PP, we can thus obtain tight complexity bounds for the case where the nesting depth of probability operators is bounded, and otherwise inclusion in PPP , a complexity class that is still contained in PSpace. Theorem 2. Entailment of positive TPQs from EL-TPKBs is PP-complete wrt. data complexity. Regarding combined complexity, it is PPNP -complete if the nesting depth of probability-operators in the query is bounded, and otherwise in PPP . The results already hold for NRrig ”= ÿ. 7 Conclusion We investigated the complexity of querying temporal probabilistic data using a combination of LTL and conjunctive queries with probability operators. While pure temporal and pure probabilistic query answering are both in PSpace for most cases, combining both dimensions yields completeness for ExpSpace. This increase in complexity already happens without TBoxes and just with a single 77 ABox, so that the hardness result is in fact independent of DL reasoning. This increase of complexity can be avoided if we restrict ourselves to positive TPQs, in which case the temporal dimension comes at no cost or almost no cost compared to pure probabilistic query answering. While this paper presented a theoretical study of the setting of temporal probabilistic query answering, the methods presented give no clear idea how a practical implementation would look like. For description logics that enjoy first-order rewritability such as DL-Lite, a solution could be to rewrite temporal queries into SQL and use a probabilistic database system to compute their probabilities. However, this approach would only work for queries that do not use negation, and it is not clear whether it can be used with rigid symbols [8]. Another open question is how the data complexity looks like in the case where we allow for negation, and whether the complexities further change if we admit more expressive DLs, or even DLs that support temporal and probabilistic operators themselves. References 1. Artale, A., Kontchakov, R., Kovtunova, A., Ryzhikov, V., Wolter, F., Za- kharyaschev, M.: Ontology-mediated query answering over temporal data: A survey (invited talk). In: Proceedings of TIME 2017. pp. 1:1–1:37 (2017). https://doi.org/10.4230/LIPIcs.TIME.2017.1 2. Baader, F., Borgwardt, S., Koopmann, P., Ozaki, A., Thost, V.: Metric temporal description logics with interval-rigid names. In: Proceedings of FroCoS 2017. pp. 60–76 (2017). https://doi.org/10.1007/978-3-319-66167-4 4 3. Baader, F., Borgwardt, S., Lippmann, M.: Temporal query entail- ment in the description logic SHQ. J. Web Sem. 33, 71–93 (2015). https://doi.org/10.1016/j.websem.2014.11.008 4. Baader, F., Ghilardi, S., Lutz, C.: LTL over description logic axioms. ACM Trans. Comput. Log. 13(3), 21:1–21:32 (2012). https://doi.org/10.1145/2287718.2287721 5. Baader, F., Horrocks, I., Lutz, C., Sattler, U.: An Introduction to Description Logic. Cambridge University Press (2017) 6. Baader, F., Koopmann, P., Turhan, A.: Using ontologies to query proba- bilistic numerical data. In: Proceedings of FroCoS 2017. pp. 77–94 (2017). https://doi.org/10.1007/978-3-319-66167-4 5 7. Borgwardt, S., Ceylan, İ.İ., Lukasiewicz, T.: Ontology-mediated queries for probabilistic databases. In: Proceedings of AAAI 2017. pp. 1063–1069 (2017), http://aaai.org/ocs/index.php/AAAI/AAAI17/paper/view/14365 8. Borgwardt, S., Lippmann, M., Thost, V.: Temporalizing rewritable query languages over knowledge bases. J. Web Sem. 33, 50–70 (2015). https://doi.org/10.1016/j.websem.2014.11.007 9. Brandt, S., Kalayci, E.G., Kontchakov, R., Ryzhikov, V., Xiao, G., Zakharyaschev, M.: Ontology-based data access with a Horn fragment of metric temporal logic. In: Proceedings of AAAI 2017. pp. 1070–1076 (2017), http://aaai.org/ocs/index.php/ AAAI/AAAI17/paper/view/14881 10. Ceylan, İ.İ., Darwiche, A., den Broeck, G.V.: Open-world probabilistic databases: An abridged report. In: Proceedings of IJCAI 2017. pp. 4796–4800 (2017). https://doi.org/10.24963/ijcai.2017/669 78 11. Ceylan, İ.İ., Peñaloza, R.: Dynamic Bayesian description logics. In: Proceedings of DL 2015 (2015), http://ceur-ws.org/Vol-1350/paper-48.pdf 12. Ceylan, İ.İ., Peñaloza, R.: The Bayesian ontology language BEL. J. Autom. Rea- soning 58(1), 67–95 (2017). https://doi.org/10.1007/s10817-016-9386-0 13. Dalvi, N.N., Schnaitter, K., Suciu, D.: Computing query probability with incidence algebras. In: Proceedings of PODS 2010. pp. 203–214 (2010). https://doi.org/10.1145/1807085.1807113 14. Elkin, P.L., Brown, S.H., Husser, C.S., Bauer, B.A., Wahner-Roedler, D., Rosen- bloom, S.T., Speroff, T.: Evaluation of the content coverage of SNOMED CT: ability of SNOMED clinical terms to represent clinical problem lists. Mayo Clin. Proc. 81(6), 741–748 (2006) 15. van Emde Boas, P.: The convenience of tilings. Lecture Notes in Pure and Applied Mathematics pp. 331–363 (1997) 16. Gabbay, D.M., Kurucz, A., Wolter, F., Zakharyaschev, M.: Many-dimensional modal logics: Theory and applications. Elsevier (2003) 17. Gutiérrez-Basulto, V., Jung, J.C., Lutz, C., Schröder, L.: Probabilistic descrip- tion logics for subjective uncertainty. J. Artif. Intell. Res. 58, 1–66 (2017). https://doi.org/10.1613/jair.5222 18. Gutiérrez-Basulto, V., Jung, J.C., Ozaki, A.: On metric temporal description logics. In: Proceedings of ECAI 2016. pp. 837–845 (2016). https://doi.org/10.3233/978-1- 61499-672-9-837 19. Jung, J.C., Lutz, C.: Ontology-based access to probabilistic data with OWL QL. In: The Semantic Web - ISWC 2012. Lecture Notes in Computer Science, vol. 7649, pp. 182–197. Springer (2012). https://doi.org/10.1007/978-3-642-35176-1 12 20. Koopmann, P.: Ontology-mediated query answering for probabilistic temporal data with EL ontologies (extended version). LTCS-Report 18-07, Chair of Automata Theory, Institute for Theoretical Computer Science, Technische Universität Dresden, Dresden, Germany (2018), see http://lat.inf.tu-dresden.de/research/reports.html 21. Kumar, N., Khunger, M., Gupta, A., Garg, N.: A content analysis of smartphone– based applications for hypertension management. Journal of the American Society of Hypertension 9(2), 130–136 (2015) 22. Lukasiewicz, T.: Expressive probabilistic description logics. Artif. Intell. 172(6-7), 852–883 (2008). https://doi.org/10.1016/j.artint.2007.10.017 23. Lutz, C., Wolter, F., Zakharyaschev, M.: Temporal description logics: A survey. In: Proceedings of TIME 2008. pp. 3–14. IEEE Press (2008). https://doi.org/10.1109/TIME.2008.14 24. Margara, A., Urbani, J., van Harmelen, F., Bal, H.: Streaming the web: Reasoning over dynamic data. Web Semantics: Science, Ser- vices and Agents on the World Wide Web 25, 24 – 44 (2014). https://doi.org/https://doi.org/10.1016/j.websem.2014.02.001 25. Rector, A., Gangemi, A., Galeazzi, E., Glowinski, A., Rossi-Mori, A.: The GALEN CORE model schemata for anatomy: Towards a re-usable application-independent model of medical concepts. In: Proc. MIE’94. pp. 229–233 (1994) 26. Ronca, A., Kaminski, M., Grau, B.C., Motik, B., Horrocks, I.: Stream reasoning in temporal datalog. CoRR abs/1711.04013 (2017), http://arxiv.org/abs/1711. 04013 27. Toda, S.: PP is as hard as the polynomial-time hierarchy. SIAM J. Comput. 20(5), 865–877 (1991). https://doi.org/10.1137/0220053 79