        Using Ontologies to Query Probabilistic
        Numerical Data (Extended Abstract)⋆

           Franz Baader, Patrick Koopmann, and Anni-Yasmin Turhan

                      Theoretical Computer Science, TU Dresden



        Abstract. We consider ontology-based query answering in a setting
        where some of the data are numerical and of a probabilistic nature,
        such as data obtained from uncertain sensor readings. The uncertainty
        for such numerical values can be more precisely represented by continu-
        ous probability distributions than by discrete probabilities for numerical
        facts concerning exact values. For this reason, we extend existing ap-
        proaches using discrete probability distributions over facts by continuous
        probability distributions over numerical values. We determine the exact
        (data and combined) complexity of query answering in extensions of the
        well-known description logics EL and ALC with numerical comparison
        operators in this probabilistic setting.


1     Introduction
Ontology-based query answering (OBQA) has recently attracted considerable
attention since it dispenses with the closed world assumption of classical query
answering in databases and thus can deal with incomplete data. In addition,
background information stated in an appropriate ontology can be used to de-
duce more answers. OBQA is usually investigated in a setting where queries
are (unions of) conjunctive queries and ontologies are expressed using an ap-
propriate Description Logic (DL). Depending on the expressiveness of the DL,
the complexity of query answering may vary considerably, ranging from a data
complexity (i.e., complexity measured in the size of the data only) of AC⁰ for
members of the DL-Lite family [10, 2] through P for DLs of the EL family [30],
all the way up to intractable data complexity for expressive DLs such as ALC and
beyond [19].
    In many application scenarios for OBQA, however, querying just symbolic
data is not sufficient. One also wants to be able to query numerical data. For
example, in a health or fitness monitoring application, one may want to use con-
cepts from a medical ontology such as SNOMED CT [18] or Galen [31] to express
information about the health status of a patient, but also needs to store and re-
fer to numerical values such as the blood pressure or heart rate of this patient.
As an example, let us consider hypertension management using a smartphone
app [25]. What constitutes dangerously high blood pressure (HBP) depends on
⋆ Supported by the DFG within the collaborative research center SFB 912 (HAEC)
  and the research unit FOR 1513 (HYBRIS).


          [Figure: plot of probability density against diastolic blood pressure (70–100).]

                  Fig. 1. Measured blood pressure as normal distribution.


the measured values of the diastolic pressure, but also on other factors. For ex-
ample, if a patient suffers from diabetes, a diastolic blood pressure above 85
may already be classified as too high, whereas under normal circumstances it is
only considered to be too high above 90. This could, for example, be modelled
as follows by an ontology:

                  ∃diastolicBloodPressure.>_90 ⊑ PatientWithHBP                      (1)
     ∃finding.Diabetes ⊓ ∃diastolicBloodPressure.>_85 ⊑ PatientWithHBP               (2)

Note that we have used a DL with concrete domains [6] to refer to numerical
values and predicates on these values within concepts. While there has been quite
some work on traditional reasoning (satisfiability, subsumption, instance) in DLs
with concrete domains [27], there is scant work on OBQA for such DLs. To the
best of our knowledge, the only work in this direction considers concrete domain
extensions of members of the DL-Lite family [3, 33, 4, 21]. In contrast, we consider
concrete domain extensions of EL and ALC and determine the (combined and
data) complexity of query answering.
    However, the main difference to previous work is that we do not assume the
numerical values in the data to be exact. In fact, a value of 84.5 for the dias-
tolic pressure given by a blood pressure sensor does not really mean that the
pressure is precisely 84.5, but rather that it is around 84.5. The actual value
follows a probability distribution—for example a normal distribution with ex-
pected value 84.5 and a variance of 2 as shown in Figure 1—which is determined
by the measured value and some known variance that is a characteristic of the
employed sensor. We can represent this in the knowledge base for example as
follows:

    finding(otto, f1)   Diabetes(f1)                        diastolicBloodPressure(otto) ∼ norm(84.5, 2)

From this information, we can derive that the minimal probability for the patient
Otto to have high blood pressure is slightly above 36%, which might be enough
to issue a warning. In contrast, if instead of using a probability distribution we
had asserted 84.5 as the exact value for Otto’s diastolic blood pressure, we could
not have inferred that Otto is in any danger.
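This bound can be reproduced with a few lines of Python (an illustrative computation,
assuming SciPy and interpreting the variance of 2 as a standard deviation of √2); it
simply evaluates the probability mass of norm(84.5, 2) above the threshold 85 from
axiom (2):

    # Illustrative check of the ~36% figure; not part of the paper's formalism.
    from math import sqrt

    from scipy.stats import norm

    # Measured diastolic blood pressure modelled as norm(84.5, 2): mean 84.5, variance 2.
    bp = norm(loc=84.5, scale=sqrt(2.0))

    # Otto has a diabetes finding, so the relevant threshold from axiom (2) is 85.
    p_hbp = bp.sf(85.0)          # P(diastolicBloodPressure > 85) = 1 - CDF(85)
    print(f"P(PatientWithHBP(otto)) >= {p_hbp:.3f}")   # ~0.362, slightly above 36%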

    Continuous probability distributions as used in this example also emerge in
other potential applications of OBQA such as in robotics [36], tracking of object
positions in video analytics [37], and mobile applications using probabilistic sen-
sor data [16], to name a few. The interest in continuous probability distributions
is also reflected in the development of database systems that support these [35].
    In addition to using continuous probability distributions for sensor values, we
also consider discrete probability distributions for facts. For example, it might
be that the finding f1 for Otto is diabetes only with a certain probability. While
OBQA for probabilistic data with discrete probability distributions has been con-
sidered before for DL-Lite and EL without concrete domains [15, 23, 13], as well as
for datalog [12], OBQA for probabilistic data with both discrete and continuous
probability distributions is investigated here for the first time. A rather expres-
sive combination we consider is the DL ALC extended with a concrete domain in
which real numbers can be compared using the (binary) predicates > and =. A
less expressive combination we consider is the DL EL extended with a concrete
domain in which real numbers can be compared with a fixed number using the
(unary) predicates >_n for n ∈ R. Since OBQA for classical knowledge bases (i.e.,
without probabilities) in these two DLs has not been investigated before, we first
determine their (data and combined) complexity of query answering. When con-
sidering probabilistic KBs with continuous probability distributions (modelled
as real-valued functions), the resulting probabilities may be numbers without a
finite representation. To overcome this problem, we define probabilistic query
entailment with respect to a given precision parameter. To allow a reasonable
complexity analysis, we define a set of feasibility conditions for probability dis-
tributions, based on the complexity theory of real functions [24], which capture
most typical probability distributions that appear in practical applications. For
probabilistic KBs that satisfy these conditions, we give tight bounds on the com-
plexity of probabilistic query answering w.r.t. data and combined complexity for
all considered DLs.
    A more detailed version of this paper is published in [7]. Proofs of all
results can be found in [8].


2   Description Logics with Numerical Domains

We focus on the description logics ALC(R) and EL(R> ), which extend the de-
scription logics ALC and EL with a concrete domain over the real numbers as
in [6]. We assume notions of ALC/EL TBox, ABox, and knowledge base, as well
as entailment, as in [34, 5]. ALC(R) and EL(R>) allow assigning real numbers to
individuals using concrete features, which are interpreted as partial functions.
This is done using ABox statements of the form g(a, r), where a is an individual,
g a concrete feature, and r ∈ R a real number.
    EL(R>) further extends EL by concepts of the form ∃g.>_r, where g is a
concrete feature and r ∈ R, to describe objects whose concrete feature g has a
value larger than r. ALC(R) allows comparing different concrete features reach-
able via paths along abstract features, which are just functional roles. Namely,

                               EL(R>)                ALC(R)
                             AQs    UCQs          AQs      UCQs
          Data complexity     P       P          coNP      coNP
          Combined complexity P      NP         ExpTime   ExpTime

                    Table 1. Complexity of classical query entailment.


ALC(R) extends ALC with concepts of the form ∃g.⊕_r and ∃(u1, u2).⊕, where g
is a concrete feature, ⊕ ∈ {<, =, >}, r ∈ R, and u1, u2 are paths of the form
s1 · · · sn g, where the si are abstract features and g is a concrete feature. For exam-
ple, in ALC(R), we can give a characterisation of patients with HBP such as the
following:

    ∃(diastolicBP, belongsToAgeGroup maxDiastolicBP).> ⊑ PatientWithHBP.

This definition assumes patients to be related to an age group via the abstract
feature belongsToAgeGroup, which has an assigned maximal diastolic blood pres-
sure. If the patient has a diastolic blood pressure that is above this value, he is a
patient with HBP. The more expressive concrete domain R used by ALC(R) ad-
mits ExpTime reasoning, while even small extensions lead to undecidability [26].
In contrast, the concrete domain R> used in EL(R>) is a convex domain, which
allows standard reasoning tasks to be performed in polynomial time [5]. For more
details on EL(R>) and ALC(R), we refer to [6, 26, 5] or the extended version
of this paper. We assume both ABoxes and TBoxes to contain only a finite set
of axioms and ABoxes to contain no complex concepts, and we do not impose a
unique name assumption.
    As main reasoning task, we consider entailment of atomic queries (AQs),
conjunctive queries (CQs) and unions of conjunctive queries (UCQs), defined as
for description logics without concrete domains [19]. We allow for constants and
complex concepts in the query, but for concrete features only inside concepts.
This means that our queries can only express relations between concrete features
that can be captured by a concept in our language. For example, the FOL formula

      ∃y1 , y2 , z1 , z2 : s1 (x, y1 ) ∧ g1 (y1 , z1 ) ∧ s2 (x, y2 ) ∧ g2 (y2 , z2 ) ∧ z1 < z2 .

can be captured by the query ∃(s1 g1, s2 g2).<(x), but only provided that s1, s2 are
abstract features, g1, g2 are concrete features, and < is a predicate of the concrete domain.
As a foundation for our results on probabilistic knowledge bases, we analyse in
the technical report the data and combined complexities of query entailment
for the logics considered. Here, we assume all numbers to be represented in
binary. Our complexity analysis only concerns knowledge bases that have a finite
representation, which by this assumption are those in which each number can
be represented with a finite number of bits.
    An overview of the complexities is shown in Table 1. The results are based
on and match the corresponding results on DLs without concrete domains [34,
32, 11, 28]. Since the corresponding lower bounds are the same for CQs as for
UCQs, we do not include the results for CQs in the table.

3   Probabilistic Knowledge Bases with Continuous
    Probability Distributions
We want to represent both discrete probabilities of assertions and continuous
probability distributions of values of concrete features. As we can simply as-
sign a probability of 1 to assertions that are certain, there is no need to handle
certain assertions separately. A discrete probability assertion assigns a minimal
probability to a classical assertion. This corresponds to the approach taken by
tuple-independent probabilistic database systems [14], where probabilities are as-
signed to database tuples, and to the ipABoxes introduced in [23]. For example, the fact
that “Otto has a finding that is Diabetes with a probability of at least 0.7” is
expressed by the two assertions finding(otto, f1) : 1 and Diabetes(f1) : 0.7.
    Note that discrete probability assertions state a lower bound on the probabil-
ity, rather than the actual probability, and that statistical independence is only
assumed on this lower bound. This way, it is consistent to have the assertions
A(a) : 0.5, B(a) : 0.5 together with the axiom A ⊑ B in the knowledge base.
Under our semantics, the probability of B(a) is then higher than 0.5, since this
assertion can be entailed due to two different, statistically independent state-
ments in the ABox. Namely, we would infer that the probability of B(a) is at
least 0.75 (compare also with [23]).
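A small illustrative computation (not the paper's reasoning procedure) confirms this
bound by enumerating the four possible worlds induced by the two assertions, closing
each under A ⊑ B, and summing the probabilities of those that entail B(a):

    # Minimal sketch: treat A(a):0.5 and B(a):0.5 as independent events and sum
    # the probabilities of the worlds whose closure under A ⊑ B contains B(a).
    from itertools import product

    P_A, P_B = 0.5, 0.5
    prob_B_of_a = 0.0

    for has_A, has_B in product([True, False], repeat=2):
        weight = (P_A if has_A else 1 - P_A) * (P_B if has_B else 1 - P_B)
        if has_B or has_A:          # A ⊑ B: a world containing A(a) also entails B(a)
            prob_B_of_a += weight

    print(prob_B_of_a)   # 0.75, matching the lower bound derived in the text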
    While for symbolic facts, assigning discrete probabilities is sufficient, for nu-
merical values this is not necessarily the case. For example, if the blood pressure
of a patient follows a continuous probability distribution, the probability that it
takes any specific value is 0. For this reason, in a continuous probability assertion,
we connect the value of a concrete feature with a probability density function.
This way, the fact that “the diastolic blood pressure of Otto follows a normal dis-
tribution with an expected value of 84.5 and a variance of 2” can be expressed
by the assertion diastolicBloodPressure(otto) ∼ norm(84.5, 2). In addition to a
concrete domain D, the DLs introduced in this section are parametrised with a
set P of probability density functions (pdfs), i.e., Lebesgue-integrable functions
f : A → R+, with A ⊆ R being Lebesgue-measurable, such that ∫_A f(x) dx = 1 [1].

Example 1. As a typical set of probability density functions [1], we define the
set Pex that contains the following functions, which are parametrised with the
numerical constants µ, ω, λ, a, b ∈ Q, with ω, λ > 0 and a < b:
normal distribution with mean µ and variance ω:
   norm(µ, ω) : R → R+, x ↦ (1/√(2πω)) · e^(−(x−µ)²/(2ω)),
exponential distribution with rate λ:
   exp(λ) : R+ → R+, x ↦ λe^(−λx),
uniform distribution between a and b:
   uniform(a, b) : [a, b] → R+, x ↦ 1/(b − a).
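Written out in Python (an illustrative sketch; the function names are chosen here, and
each density returns 0 outside its domain, matching the extension to the full real line
used in Section 3.1), these three families look as follows:

    # Illustrative implementations of the densities in Pex; not taken from the paper.
    from math import exp, pi, sqrt

    def norm_pdf(mu: float, omega: float):
        """Density of the normal distribution with mean mu and variance omega."""
        return lambda x: exp(-(x - mu) ** 2 / (2 * omega)) / sqrt(2 * pi * omega)

    def exp_pdf(lam: float):
        """Density of the exponential distribution with rate lam, 0 on the negatives."""
        return lambda x: lam * exp(-lam * x) if x >= 0 else 0.0

    def uniform_pdf(a: float, b: float):
        """Density of the uniform distribution on [a, b], 0 outside."""
        return lambda x: 1.0 / (b - a) if a <= x <= b else 0.0

    bp_density = norm_pdf(84.5, 2.0)
    print(bp_density(84.5))   # density at the measured value, roughly 0.282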
Next, we define probabilistic KBs, which consist of a classical TBox and a set of
probability assertions.
Definition 1. Let L ∈ {EL(R>), ALC(R)} and P be a set of pdfs. A probabilistic
L_P ABox is a finite set of expressions of the form α : p and g(a) ∼ f, where α is
an L assertion, p ∈ [0, 1] ∩ D,¹ g ∈ N_cF, a ∈ N_I, and f ∈ P. A probabilistic L_P
KB is a tuple K = (T, A), where T is an L TBox and A is a probabilistic L_P
ABox. If P = ∅, K and A are called discrete, and if P ≠ ∅, they are called
continuous.

3.1     Semantics of Probabilistic Knowledge Bases
As typical for probabilistic DLs and databases, we define the semantics using a
possible worlds semantics. In probabilistic systems that only use discrete prob-
abilities, the possible world semantics can be defined based on finite sets of
non-probabilistic data sets, the possible worlds, each of which is assigned a prob-
ability [14, 23, 29]. The probability that a query q is entailed then corresponds
to the sum of the probabilities of the possible worlds that entail q. If continuous
probability distributions are used, this approach is insufficient. For example, if
the KB contains the assertion diastolicBP(p) ∼ norm(84.5, 2), the probability of
diastolicBP(p, x) should be 0 for every x ∈ R. Therefore, we cannot obtain the
probability of diastolicBP(p) > 85 by just adding the probabilities of the possible
worlds that entail diastolicBP(p, x) for some x > 85. To overcome this problem,
we assign probabilities to (possibly uncountable) sets of possible worlds, rather
than to single possible worlds. Specifically, we define the semantics using contin-
uous probability measure spaces [1]. A measure space is a tuple M = (Ω, Σ, µ)
with Σ ⊆ 2^Ω and µ : Σ → R such that
 1. Ω ∈ Σ and Σ is closed under complementation, countable unions and count-
    able intersections,
 2. µ(∅) = 0, and
 3. µ(⋃_{E∈Σ′} E) = ∑_{E∈Σ′} µ(E) for every countable set Σ′ ⊆ Σ of pairwise
    disjoint sets.
If additionally µ(Ω) = 1, M is a probability measure space.
    We define a probability measure space MA = (ΩA , ΣA , µA ) that captures the
relevant probabilities in a probabilistic ABox A, similar to how it is done in [23]
for discrete probabilistic ABoxes. For this, we introduce the three components
ΩA , ΣA and µA one after another. For simplicity, we assume all pdfs f : A →
R ∈ P to be extended to the full real line by setting f (x) = 0 for all x ∈ R \ A.
    Given a probabilistic ABox A, the set of possible worlds for A, in symbols ΩA ,
consists of all classical ABoxes w such that for every g(a) ∼ f ∈ A, w contains
g(a, x) for some x ∈ R, and for every axiom α ∈ w, either α : p ∈ A, or α is
of the form g(a, x) and g(a) ∼ f ∈ A. For w ∈ ΩA , we write w |= g(a)⊕x,
x ∈ R, ⊕ ∈ {<, ≤, =, ≥, >}, iff w |= g(a, y) and y⊕x. We write w |= g(a)⊕h(b)
iff w |= g(a, y), h(b, z) and y⊕z. We abbreviate w |= g(a) ≥ x, g(a) ≤ y by
w |= g(a) ∈ [x, y]. The event space over ΩA , in symbols ΣA , is now the smallest
subset ΣA ⊆ 2ΩA that satisfies the following conditions:
 1. ΩA ∈ ΣA ,
¹ Here, the set D ⊆ R denotes the dyadic rationals, that is, the set of all real numbers
  that have a finite number of bits after the binary point.

 2. for every α : p ∈ A, {w ∈ ΩA | α ∈ w} ∈ ΣA ,
 3. for every g(a) ∼ f ∈ A, x ∈ R, {w ∈ ΩA | w |= g(a) < x} ∈ ΣA ,
 4. for every g1(a) ∼ f1, g2(b) ∼ f2 ∈ A, {w ∈ ΩA | w |= g1(a) < g2(b)} ∈ ΣA,
    and
 5. ΣA is closed under complementation, countable unions and countable inter-
    sections.
The conditions ensure that for every query q and TBox T , the set of possible
worlds w such that (T , w) |= q is included in ΣA . To complete the definition
of the measure space, we now assign probabilities to these sets via the measure
function µA . This function has to respect the probabilities expressed by the
discrete and continuous probability assertions in A, as well as the assumption
that these probabilities are statistically independent. We define µA explicitly for
sets of possible worlds that are selected by the assertions in them, and by upper
bounds on the concrete features occurring in continuous probability assertions.
By additionally requiring that Condition 3 in the definition of measure spaces is
satisfied for µA , this is sufficient to fix the probability for any set in ΣA .
    Given a probabilistic ABox A, we denote by cl-ass(A) = {α | α : p ∈ A} the
classical assertions occurring in A. A bound set for A is a set B of inequations of
the form g(a) < x, x ∈ R, where g(a) ∼ f ∈ A and every concrete feature g(a)
occurs at most once in B. Given a set E ⊆ cl-ass(A) of assertions from A and
a bound set B for A, we define the corresponding set Ω_A^{E,B} of possible worlds
in Ω_A as

                   Ω_A^{E,B} = {w ∈ Ω_A | w ∩ cl-ass(A) = E, w |= B}.
The probability measure space for A is now the probability measure space MA =
(ΩA , ΣA , µA ), such that for every E ⊆ cl-ass(A) and every bound set B for A,


      µ_A(Ω_A^{E,B}) = ∏_{α:p ∈ A, α∈E} p · ∏_{α:p ∈ A, α∉E} (1 − p) · ∏_{g(a)∼f ∈ A, (g(a)<x) ∈ B} ∫_{−∞}^{x} f(y) dy.

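The following sketch evaluates this product measure for the Otto ABox from Sections 1
and 3 (illustrative only; the dictionaries and function names are chosen here, and
SciPy's normal CDF stands in for the integral of norm(84.5, 2)):

    # Sketch of mu_A(Omega_A^{E,B}) for the Otto ABox; illustrative, not the paper's code.
    from math import sqrt

    from scipy.stats import norm

    discrete = {"finding(otto,f1)": 1.0, "Diabetes(f1)": 0.7}
    continuous_cdf = {"diastolicBloodPressure(otto)": norm(loc=84.5, scale=sqrt(2.0)).cdf}

    def measure(E, B):
        """Product of p for assertions in E, (1-p) for the rest, and CDF values for B."""
        result = 1.0
        for assertion, p in discrete.items():
            result *= p if assertion in E else (1.0 - p)
        for feature, bound in B.items():
            result *= continuous_cdf[feature](bound)   # integral of f from -infinity to bound
        return result

    # Worlds containing both discrete assertions and with diastolic blood pressure below 85:
    print(measure({"finding(otto,f1)", "Diabetes(f1)"},
                  {"diastolicBloodPressure(otto)": 85.0}))   # roughly 0.447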
In particular, computing the probability of a query such as (∃g.>_r)(a) for a
continuous probability assertion g(a) ∼ f involves evaluating integrals of the form
∫_r^∞ f(x) dx, r ∈ R. Our feasibility conditions ensure that the complexity of
approximating such integrals does not dominate the overall complexity of proba-
bilistic query entailment.
    We first recall some notions from the complexity theory of real functions by
Ker-I Ko [24], which identifies computability of real numbers x ∈ R and functions
f : A → R, A ⊆ R, with the computability of n-bit approximations ⟨x⟩_n and
⟨f(x)⟩_n, where n is given in unary encoding. Since real function arguments have
no finite representation in general, computable real functions are modelled as
function oracle Turing machines T^{φ(x)}, where the oracle φ(x) represents the
function argument x and can be queried for n-bit approximations ⟨x⟩_n in time
linear in c + n, where c is the number of bits in x before the binary point.
Given a precision n in unary encoding on the input tape, T^{φ(x)} then writes a
number ⟨f(x)⟩_n on the output tape. This formalism leads to a natural definition
of computability and complexity of real numbers and real functions. Namely, a
real number x ∈ R is P-computable iff there is a polynomial time Turing machine
that computes a function φ : N → D s.t. φ(n) = ⟨x⟩_n. A function f : A → R,
A ⊆ R, is P-computable iff there is a function oracle Turing machine T^{φ(x)} as
above that computes for all x ∈ A a function ψ : N → D with ψ(n) = ⟨f(x)⟩_n in
time polynomial in n and the number of bits in x before the binary point.
    An important property of P-computable functions f that we use in the next
section is that they have a monotone and polynomial modulus of continuity
(modulus), that is, a monotone, polynomial function ω_f : N → N s.t. for all
n ∈ N and x, y ∈ [2^{−n}, 2^n], |x − y| < 2^{−ω_f(n)} implies |f(x) − f(y)| < 2^{−n} [22, 24,
Chapter 3].
    Approximating integrals ∫_0^1 f(x) dx of P-computable functions f : [0, 1] → R
is #·P-complete [24, Chapter 5]. To be able to integrate over unbounded intervals
in #·P, we introduce an additional condition.
Definition 3. A probability density function f is #·P-admissible iff it satisfies
the following conditions:
1. f is P-computable, and
2. there is a monotone polynomial function δ_f : N → N such that for all n ∈ N:

                        1 − ∫_{−2^{δ_f(n)}}^{2^{δ_f(n)}} f(x) dx < 2^{−n}.

Condition 2 allows us to reduce integration over unbounded intervals to inte-
gration over bounded intervals: to obtain a precision of n bits, it is sufficient to
integrate inside the interval [−2^{δ_f(n)}, 2^{δ_f(n)}]. Note that as a consequence of Con-
dition 1, there is also a polynomial ρ_f : N → N s.t. for all x ∈ [−2^{δ_f(n)}, 2^{δ_f(n)}],
f(x) < 2^{ρ_f(n)}. Otherwise, approximations of f(x) would require a number of bits
that is not polynomially bounded by the number of bits in x before the binary
point, and could thus not be computed in polynomial time. We call δ_f and ρ_f
in the definition above the bounding function and range function of f, respectively.
In the following, we assume that for any set P of #·P-admissible pdfs, their
moduli, bounding functions and range functions are known.
    The above properties are general enough to be satisfied by most common pdfs.
Specifically, we have the following lemma for the set Pex defined in Example 1:
Lemma 1. Every function in Pex is #·P-admissible.
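For instance, for exp(λ) from Pex (extended by 0 to the negative reals), a suitable
bounding function can be read off directly, since

    1 − ∫_{−2^δ}^{2^δ} exp(λ)(x) dx = ∫_{2^δ}^{∞} λe^{−λx} dx = e^{−λ·2^δ},

which is below 2^{−n} whenever 2^δ ≥ n/λ + 1 (using ln 2 < 1). Hence δ_f(n) =
⌈log₂(n/λ + 1)⌉ is a monotone bounding function for exp(λ), and it is even
logarithmic in n; the full proof of the lemma is given in [8].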


5    Complexity of Probabilistic Query Answering
We study the complexity of probabilistic query answering for KBs with #·P-
admissible pdfs. As often in probabilistic reasoning, counting complexity classes
play a central role in our study. However, strictly speaking, these are defined
for computation problems for natural numbers. To get a characterisation for
probabilistic query answering, which computes real numbers, we consider corre-
sponding counting problems. Their solutions are obtained by, intuitively, shifting
the binary point of the query probability to the right to obtain a natural number.
We first recall counting complexity classes following [20].

Definition 4. Let C be a class of decision problems. Then, #·C describes the
class of functions f : A → N such that

                       f(x) = |{y | R(x, y) ∧ |y| < p(|x|)}|

for some C-decidable relation R and polynomial function p.

Relevant for us are the counting complexity classes #·P, #·NP and #·coNP.
The class #·P is also known as #P. The following inclusions are known: #·P ⊆
#·NP ⊆ #·coNP ⊆ FPSpace [20]. In order to characterise the complexity of
probabilistic query answering using counting classes, we consider corresponding
counting problems, inspired by [24, Chapter 5] and [14]. For a function f : A →
D, we call g : A → N a corresponding counting problem if g(x) = 2^{p(x)} f(x) for all
x ∈ A, where p : A → N and p can be computed in unary in polynomial time.²
    For discrete probabilistic KBs (T , A), we can define a counting problem cor-
responding to probabilistic query entailment that is solved by counting possible
worlds in the probability measure space for A. To obtain the probability of a
query q, we count those possible worlds w ∈ ΩA for which (T , w) |= q, and we
count each possible world several times depending on its probability. If C is the
respective complexity of classical query entailment, this counting problem can
be easily established as being in #·C.
    For continuous probabilistic KBs, this approach cannot be directly employed,
because here the set of possible worlds is uncountable. Namely, for every con-
tinuous probability assertion g(a) ∼ f ∈ A and every real number r ∈ R, there
is a possible world w ∈ ΩA s.t. g(a, r) ∈ w. To overcome this, we define an
approximated measure space, based on the precision parameter n, in which every
assertion g(a, r) uses at most a polynomial number of bits. As a result, the num-
ber of possible worlds becomes finite, and each possible world can be assigned
a positive probability. We can now compute the approximated probability of a
query by summing up the probabilities of each possible world that entails this
query. As shown in more detail in [7, 8], by carefully taking into account the
precision parameter n and properties of #·P-admissible pdfs, it is possible to
define such an approximated measure space so that the overall approximation
error on every query is bounded by 2−n , and that the probability of each pos-
sible world can be computed in polynomial time. Using this, one can define a
corresponding counting problem for probabilistic query answering that is in #·C,
where C is the respective complexity of classical query answering, and obtain the
complexity upper bounds shown in Table 2. The ExpTime-bounds follow from
the fact that the number of possible worlds in the approximated measure space
is exponentially bounded.
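To give a flavour of such a discretisation (a purely illustrative sketch, not the
approximated measure space construction of [7, 8]; it assumes SciPy and re-uses the
blood-pressure distribution from the introduction), one can split a bounded range into
intervals of width 2^{−n} and sum the probability mass of those intervals whose values
satisfy the query:

    # Hypothetical illustration of approximating a query probability by a finite
    # discretisation of norm(84.5, 2); NOT the approximated measure space of [7, 8].
    from math import sqrt

    from scipy.stats import norm

    def approx_probability(dist, threshold, n, lo, hi):
        """Sum the probability mass of width-2^-n intervals lying strictly above threshold."""
        width = 2.0 ** -n
        total, x = 0.0, lo
        while x < hi:
            upper = min(x + width, hi)
            if x > threshold:                   # all values in [x, upper) then exceed threshold
                total += dist.cdf(upper) - dist.cdf(x)
            x = upper
        return total

    bp = norm(loc=84.5, scale=sqrt(2.0))
    for n in (1, 3, 6):
        print(n, approx_probability(bp, threshold=85.0, n=n, lo=70.0, hi=100.0))
    # under-approximates P(diastolicBloodPressure(otto) > 85) and tends to ~0.362 as n grows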
    Hardness for all complexities already holds for discrete probabilistic KBs,
so that continuous, #·P-admissible probability distributions do not increase the
complexity of probabilistic query answering. A general #·P-lower bound follows
² Note that the counting complexity classes considered here are all closed under this
  operation. To see this, consider f and g characterized by the relations R and R′ s.t.
  R′ = {(x, y#z) | R(x, y), z ∈ {0, 1}*, |z| = p(x)}. Clearly, g(x) = 2^{p(x)} f(x).

                              EL(R>)_P               ALC(R)_P
                            AQs     UCQs          AQs       UCQs
     Data complexity        #·P      #·P        #·coNP     #·coNP
     Combined complexity    #·P     #·NP        ExpTime    ExpTime

Table 2. Complexities of counting problems corresponding to prob. query entailment.


from the corresponding complexity of probabilistic query entailment in prob-
abilistic databases [14], while for the combined complexities in ALC(R)_P, the
lower bound follows from the non-probabilistic case. For the remaining complex-
ities, we provide matching lower bounds in the technical report using appropriate
reductions. Specifically, we show #·NP-hardness under subtractive reductions
for EL w.r.t. combined complexity, and #·coNP-hardness under parsimonious
reductions for ALC w.r.t. data complexity [17].


6    Conclusion

When numerical data are of an uncertain nature, such as data obtained by sen-
sor readings or video tracking, they can often be more precisely represented us-
ing continuous probability distributions than using discrete distributions. While
there is work on OBQA for discrete probabilistic KBs in DL-Lite and EL [23], this
is the first work that considers KBs with concrete domains and continuous prob-
ability distributions. For our complexity analysis, we devised a set of feasibility
conditions for probability distributions based on the complexity theory of real
functions, which captures most typical distributions one might encounter in re-
alistic applications. We show that under these conditions, continuous probability
distributions do not increase the complexity of probabilistic query entailment.
Using a similar technique as in [24, Chapter 5], our results can likely be ex-
tended to a wider class of probability distributions, where the requirement of
P-computability is weakened to polynomial approximability.
    For light-weight description logics, it is often possible to rewrite queries w.r.t.
the ontology, so that they can be answered directly by a corresponding database
system. As there are probabilistic database systems like Orion 2.0 that support
continuous probability distributions [35], query rewriting techniques for con-
tinuous probabilistic KBs could be employed in our setting as well. For more
expressive DLs, a practical implementation could be based on a less fine-grained
representation of measure spaces, for which relevant intervals for each concrete
feature value are determined based on the concrete domain predicates in the
TBox. Probabilities could then be computed using standard algorithms for nu-
merical integration. It might also be worth investigating whether Monte-Carlo
approximations can be used for practical implementations. However, as observed
in [23], this might be hard to accomplish already for discrete probabilistic EL
KBs. Another basis for practical implementations could be approximation tech-
niques developed for other logical frameworks involving continuous probability
distributions, such as the one presented in [9].

References

 1. Adams, M.R., Guillemin, V.: Measure theory and probability. Springer (1996)
 2. Artale, A., Calvanese, D., Kontchakov, R., Zakharyaschev, M.: The DL-Lite family
    and relations. J. Artif. Intell. Res. 36, 1–69 (2009)
 3. Artale, A., Ryzhikov, V., Kontchakov, R.: DL-Lite with attributes and datatypes.
    In: Proc. ECAI’12. pp. 61–66. IOS Press (2012)
 4. Baader, F., Borgwardt, S., Lippmann, M.: Query rewriting for DL-Lite with n-ary
    concrete domains (2017), to appear in Proc. IJCAI’17.
 5. Baader, F., Brandt, S., Lutz, C.: Pushing the EL envelope. In: Proc. IJCAI’05. pp.
    364–369. Professional Book Center (2005)
 6. Baader, F., Hanschke, P.: A scheme for integrating concrete domains into concept
    languages. In: Proc. IJCAI’91. pp. 452–457 (1991)
 7. Baader, F., Koopmann, P., Turhan, A.Y.: Using ontologies to query probabilistic
    numerical data. In: Proc. FroCoS’17. Springer (2017), to appear
 8. Baader, F., Koopmann, P., Turhan, A.Y.: Using ontologies to query probabilis-
    tic numerical data (extended version). LTCS-Report 17-05, Chair for Automata
    Theory, Technische Universität Dresden, Germany (2017), see https://lat.inf.tu-
    dresden.de/research/reports.html
 9. Belle, V., Van den Broeck, G., Passerini, A.: Hashing-based approximate proba-
    bilistic inference in hybrid domains: An abridged report. In: Proc. IJCAI’16. pp.
    4115–4119 (2016)
10. Calvanese, D., De Giacomo, G., Lembo, D., Lenzerini, M., Rosati, R.: Tractable
    reasoning and efficient query answering in description logics: The DL-Lite family.
    J. Autom. Reas. 39(3), 385–429 (2007)
11. Calvanese, D., Giacomo, G.D., Lembo, D., Lenzerini, M., Rosati, R.: Data com-
    plexity of query answering in description logics. Artificial Intelligence 195, 335 –
    360 (2013)
12. Ceylan, İ.İ., Lukasiewicz, T., Peñaloza, R.: Complexity results for probabilistic
    datalog±. In: Proc. ECAI’16. pp. 1414–1422. IOS Press (2016)
13. Ceylan, İ.İ., Peñaloza, R.: Probabilistic query answering in the Bayesian description
    logic BEL. In: Proc. SUM’15. pp. 21–35. Springer (2015)
14. Dalvi, N., Suciu, D.: Management of probabilistic data: foundations and challenges.
    In: Proc. SIGMOD’07. pp. 1–12. ACM (2007)
15. D’Amato, C., Fanizzi, N., Lukasiewicz, T.: Tractable reasoning with Bayesian de-
    scription logics. In: Proc. SUM’08. pp. 146–159. Springer (2008)
16. Dargie, W.: The role of probabilistic schemes in multisensor context-awareness. In:
    Proc. PerCom’07. pp. 27–32. IEEE (2007)
17. Durand, A., Hermann, M., Kolaitis, P.G.: Subtractive reductions and complete
    problems for counting complexity classes. Theoretical Computer Science 340(3),
    496–513 (2005)
18. Elkin, P.L., Brown, S.H., Husser, C.S., Bauer, B.A., Wahner-Roedler, D., Rosen-
    bloom, S.T., Speroff, T.: Evaluation of the content coverage of SNOMED CT:
    ability of SNOMED clinical terms to represent clinical problem lists. Mayo Clin.
    Proc. 81(6), 741–748 (2006)
19. Glimm, B., Lutz, C., Horrocks, I., Sattler, U.: Conjunctive query answering for the
    description logic SHIQ. J. Artif. Intell. Res. (JAIR) 31, 157–204 (2008)
20. Hemaspaandra, L.A., Vollmer, H.: The satanic notations: counting classes beyond
    #P and other definitional adventures. ACM SIGACT News 26(1), 2–13 (1995)

21. Hernich, A., Lemos, J., Wolter, F.: Query answering in DL-Lite with datatypes: A
    non-uniform approach. In: Proc. AAAI’17 (2017)
22. Hoover, H.J.: Feasible real functions and arithmetic circuits. SIAM Journal on
    Computing 19(1), 182–204 (1990)
23. Jung, J.C., Lutz, C.: Ontology-based access to probabilistic data with OWL QL.
    In: Proc. ISWC’12. pp. 182–197. Springer (2012)
24. Ko, K.I.: Complexity Theory of Real Functions. Birkhäuser (1991)
25. Kumar, N., Khunger, M., Gupta, A., Garg, N.: A content analysis of smartphone–
    based applications for hypertension management. Journal of the American Society
    of Hypertension 9(2), 130–136 (2015)
26. Lutz, C.: Adding numbers to the SHIQ description logic—first results. In: Proc.
    KR’01. pp. 191–202. Citeseer (2001)
27. Lutz, C.: Description logics with concrete domains—a survey. In: Advances in
    Modal Logic 4. pp. 265–296. King’s College Publications (2002)
28. Lutz, C.: The complexity of conjunctive query answering in expressive description
    logics. In: Proc. IJCAR’08. pp. 179–193. Springer (2008)
29. Lutz, C., Schröder, L.: Probabilistic description logics for subjective uncertainty.
    In: Proc. KR’10. pp. 393–403. AAAI Press (2010)
30. Lutz, C., Toman, D., Wolter, F.: Conjunctive query answering in the description
    logic EL using a relational database system. In: Proc. IJCAI’09. pp. 2070–2075.
    IJCAI/AAAI (2009)
31. Rector, A., Gangemi, A., Galeazzi, E., Glowinski, A., Rossi-Mori, A.: The GALEN
    CORE model schemata for anatomy: Towards a re-usable application-independent
    model of medical concepts. In: Proc. MIE’94. pp. 229–233 (1994)
32. Rosati, R.: On conjunctive query answering in EL. In: Proc. DL’07. pp. 451–458.
    CEUR-WS.org (2007)
33. Savković, O., Calvanese, D.: Introducing datatypes in DL-Lite. In: Proc. ECAI’12.
    pp. 720–725 (2012)
34. Schild, K.: A correspondence theory for terminological logics: Preliminary report.
    In: Mylopoulos, J., Reiter, R. (eds.) Proc. IJCAI’91. pp. 466–471. Morgan Kauf-
    mann (1991)
35. Singh, S., Mayfield, C., Mittal, S., Prabhakar, S., Hambrusch, S., Shah, R.: Orion
    2.0: native support for uncertain data. In: Proc. SIGMOD’08. pp. 1239–1242. ACM
    (2008)
36. Thrun, S., Burgard, W., Fox, D.: A probabilistic approach to concurrent mapping
    and localization for mobile robots. Autonomous Robots 5(3-4), 253–271 (1998)
37. Yilmaz, A., Javed, O., Shah, M.: Object tracking: A survey. ACM computing
    surveys (CSUR) 38(4), 13 (2006)