Probabilistic Ontologies in Datalog+/-

Fabrizio Riguzzi, Elena Bellodi, and Evelina Lamma

ENDIF – University of Ferrara, Via Saragat 1, I-44122, Ferrara, Italy
{fabrizio.riguzzi,elena.bellodi,evelina.lamma}@unife.it

Abstract. In logic programming the distribution semantics is one of the most popular approaches for dealing with uncertain information. In this paper we apply the distribution semantics to the Datalog+/- language, which is grounded in logic programming and allows tractable ontology querying. In the resulting semantics, called DISPONTE, formulas of a probabilistic ontology can be annotated with an epistemic or a statistical probability. The epistemic probability represents a degree of confidence in the formula, while the statistical probability considers the populations to which the formula is applied. The probability of a query is defined in terms of a finite set of finite explanations for the query. We also compare the DISPONTE approach for Datalog+/- ontologies with that of Probabilistic Datalog+/-, where an ontology is composed of a Datalog+/- theory whose formulas are associated with assignments of values to the random variables of a companion Markov Logic Network.

1 Introduction

Representing uncertain information is fundamental for ensuring that the Semantic Web achieves its full potential [36, 29, 25]. Ontologies are a decisive component of the Semantic Web, and recently Datalog was extended for modeling ontologies [5, 6]. Answering conjunctive queries in the resulting language, Datalog+/-, has polynomial data complexity, thus allowing tractable query answering.

Probabilistic Datalog+/- [15, 14] has been proposed for representing uncertainty in Datalog+/-. In this approach an ontology is composed of a Datalog+/- theory and a Markov Logic Network (MLN) [30], and each Datalog+/- formula is associated with an assignment of values to (a subset of) the random variables that are modeled by the MLN. This assignment, called a scenario, controls the activation of the formulas: they hold only in worlds where the scenario is satisfied.

In the field of logic programming, the distribution semantics [35] has emerged as one of the most effective approaches for integrating logic and probability and underlies many languages such as PRISM [35], ICL [28], Logic Programs with Annotated Disjunctions [37] and ProbLog [11]. In this semantics the clauses of a probabilistic logic program contain alternative choices annotated with probabilities. Each grounding of a probabilistic clause represents a random variable that can assume a value from the finite set of alternatives. In order to compute the probability of a query, its explanations have to be found, where an explanation is a set of choices that ensure the entailment of the query. The set of explanations must be covering, i.e., it must represent all possible ways of entailing the query. The probability is computed from a covering set of explanations by solving a disjoint sum problem, either using an iterative splitting algorithm [28] or Binary Decision Diagrams [19, 31].

In Bellodi et al. [1] we applied the distribution semantics to ontology languages based on Description Logics. We called the approach DISPONTE, for "DIstribution Semantics for Probabilistic ONTologiEs" (Spanish for "get ready"). In this paper we apply DISPONTE to Datalog+/-. The idea is to annotate formulas of a theory with a probability and assume that each formula is independent of the others.
Moreover, we extend [1] by allowing two types of probabilistic annotations: an epistemic type, which represents a degree of belief in the formula as a whole, and a statistical type, which considers the populations to which the formula is applied. While in the first case the choice is whether or not to include a formula in an explanation, in the latter case the choice is whether to include instantiations of the formula for specific individuals. The probability of a query is again computed from a covering set of explanations by solving the disjoint sum problem. We call the resulting language DISPONTE Datalog+/-.

The paper is organized as follows. Section 2 provides some preliminaries on Datalog+/-. Section 3 presents DISPONTE Datalog+/-. Section 4 describes related work and Section 5 concludes the paper.

2 Datalog+/-

Datalog+/- extends Datalog by allowing existential quantifiers, the equality predicate and the truth constant false in rule heads. Datalog+/- can be used for representing lightweight ontologies and is able to express the DL-Lite family of ontology languages [5]. By suitably restricting the language syntax, Datalog+/- achieves tractability [4].

In order to describe Datalog+/-, let us assume (i) an infinite set of data constants ∆, (ii) an infinite set of labeled nulls ∆N (used as "fresh" Skolem terms), and (iii) an infinite set of variables ∆V. Different constants represent different values (unique name assumption), while different nulls may represent the same value. We assume a lexicographic order on ∆ ∪ ∆N, with every symbol in ∆N following all symbols in ∆. We denote by X vectors of variables X1, ..., Xk with k ≥ 0.

A relational schema R is a finite set of relation names (or predicates). A term t is a constant, null or variable. An atomic formula (or atom) has the form p(t1, ..., tn), where p is an n-ary predicate and t1, ..., tn are terms. A database D for R is a possibly infinite set of atoms with predicates from R and arguments from ∆ ∪ ∆N.

A conjunctive query (CQ) over R has the form q(X) = ∃Y Φ(X, Y), where Φ(X, Y) is a conjunction of atoms having as arguments variables X and Y and constants (but no nulls). A Boolean CQ (BCQ) over R is a CQ having head predicate q of arity 0 (i.e., no variables in X). We often write a BCQ as the set of all its atoms, having constants and variables as arguments, and omitting the quantifiers. Answers to CQs and BCQs are defined via homomorphisms, which are mappings µ : ∆ ∪ ∆N ∪ ∆V → ∆ ∪ ∆N ∪ ∆V such that (i) c ∈ ∆ implies µ(c) = c, (ii) c ∈ ∆N implies µ(c) ∈ ∆ ∪ ∆N, and (iii) µ is naturally extended to term vectors, atoms, sets of atoms, and conjunctions of atoms. The set of all answers to a CQ q(X) = ∃Y Φ(X, Y) over a database D, denoted q(D), is the set of all tuples t over ∆ for which there exists a homomorphism µ : X ∪ Y → ∆ ∪ ∆N such that µ(Φ(X, Y)) ⊆ D and µ(X) = t. The answer to a BCQ q = ∃Y Φ(Y) over a database D, denoted q(D), is Yes, denoted D |= q, iff there exists a homomorphism µ : Y → ∆ ∪ ∆N such that µ(Φ(Y)) ⊆ D, i.e., if q(D) ≠ ∅.

Given a relational schema R, a tuple-generating dependency (or TGD) F is a first-order formula of the form ∀X∀Y Φ(X, Y) → ∃Z Ψ(X, Z), where Φ(X, Y) and Ψ(X, Z) are conjunctions of atoms over R, called the body and the head of F, respectively. Such F is satisfied in a database D for R iff, whenever there exists a homomorphism h such that h(Φ(X, Y)) ⊆ D, there exists an extension h′ of h such that h′(Ψ(X, Z)) ⊆ D. We usually omit the universal quantifiers in TGDs.
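As an illustration of these definitions, the following is a minimal sketch, in Python, of BCQ answering by backtracking search for a homomorphism of the query atoms into a finite database. The encoding (atoms as tuples, variables as strings prefixed with '?') is our own and is not part of Datalog+/-; it is only meant to make the condition µ(Φ(Y)) ⊆ D concrete.

```python
# Minimal sketch (not from the paper): BCQ answering by searching for a
# homomorphism of the query atoms into a finite database.
# Atoms are tuples like ("ann", "e1", "price"); variables are strings
# starting with '?'.

def is_var(term):
    return isinstance(term, str) and term.startswith("?")

def extend(mapping, query_atom, db_atom):
    """Try to extend a partial homomorphism so that it maps query_atom
    onto db_atom; return the extended mapping, or None on a clash."""
    if query_atom[0] != db_atom[0] or len(query_atom) != len(db_atom):
        return None
    new = dict(mapping)
    for qt, dt in zip(query_atom[1:], db_atom[1:]):
        if is_var(qt):
            if qt in new and new[qt] != dt:
                return None          # conflicting binding for the variable
            new[qt] = dt
        elif qt != dt:               # a constant must map to itself
            return None
    return new

def bcq_holds(query, database, mapping=None):
    """True iff some homomorphism maps every query atom into the database."""
    mapping = {} if mapping is None else mapping
    if not query:
        return True                  # all atoms mapped: homomorphism found
    first, rest = query[0], query[1:]
    for db_atom in database:
        new = extend(mapping, first, db_atom)
        if new is not None and bcq_holds(rest, database, new):
            return True
    return False

# The BCQ  exists X: ann(X, price), visible(X)  over a toy database:
db = [("ann", "e1", "price"), ("ann", "e1", "label"), ("visible", "e1")]
print(bcq_holds([("ann", "?X", "price"), ("visible", "?X")], db))  # True (X -> e1)
```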
A TGD is guarded iff it contains an atom in its body that involves all variables appearing in the body.

Query answering under TGDs is defined as follows. For a set of TGDs T on R and a database D for R, the set of models of D given T, denoted mods(D, T), is the set of all (possibly infinite) databases B such that D ⊆ B and every F ∈ T is satisfied in B. The set of answers to a CQ q on D given T, denoted ans(q, D, T), is the set of all tuples t such that t ∈ q(B) for all B ∈ mods(D, T). The answer to a BCQ q over D given T is Yes, denoted D ∪ T |= q, iff B |= q for all B ∈ mods(D, T).

A Datalog+/- theory may also contain negative constraints (or NCs), which are first-order formulas of the form ∀X Φ(X) → ⊥, where Φ(X) is a conjunction of atoms (not necessarily guarded). The universal quantifiers are usually left implicit.

Equality-generating dependencies (or EGDs) are the third component of a Datalog+/- theory. An EGD F is a first-order formula of the form ∀X Φ(X) → Xi = Xj, where Φ(X), called the body of F, is a conjunction of atoms, and Xi and Xj are variables from X. We call Xi = Xj the head of F. Such F is satisfied in a database D for R iff, whenever there exists a homomorphism h such that h(Φ(X)) ⊆ D, it holds that h(Xi) = h(Xj). We usually omit the universal quantifiers in EGDs.

The chase is a bottom-up procedure for deriving atoms entailed by a database and a Datalog+/- theory. The chase works on a database through the so-called TGD and EGD chase rules. Given a relational database D for a schema R and a TGD F on R of the form ∀X∀Y Φ(X, Y) → ∃Z Ψ(X, Z), F is applicable to D if there is a homomorphism h that maps the atoms of Φ(X, Y) to atoms of D. Let F be applicable and h1 be a homomorphism that extends h as follows: for each Xi ∈ X, h1(Xi) = h(Xi); for each Zj ∈ Z, h1(Zj) = zj, where zj is a "fresh" null, i.e., zj ∈ ∆N, zj does not occur in D, and zj lexicographically follows all other labeled nulls already introduced. The result of the application of the TGD chase rule for F is the addition to D of all the atomic formulas in h1(Ψ(X, Z)) that are not already in D.

The EGD chase rule is defined as follows. An EGD F on R of the form Φ(X) → Xi = Xj is applicable to a database D for R iff there exists a homomorphism h : Φ(X) → D such that h(Xi) and h(Xj) are different and not both constants. If h(Xi) and h(Xj) are different constants in ∆, then there is a hard violation of F. Otherwise, the result of the application of F to D is the database h(D) obtained from D by replacing every occurrence of a non-constant element e ∈ {h(Xi), h(Xj)} in D by the other element e′ (if e and e′ are both nulls, then e precedes e′ in the lexicographic order).

The chase algorithm consists of an exhaustive application of the TGD and EGD chase rules that may lead to an infinite result. The chase rules are applied iteratively: in each iteration (1) a single TGD is applied once and then (2) the EGDs are applied until a fixpoint is reached. EGDs are assumed to be separable [7]. Intuitively, separability holds whenever: (i) if there is a hard violation of an EGD in the chase, then there is also one on the database w.r.t. the set of EGDs alone (i.e., without considering the TGDs); and (ii) if there is no hard violation, then the answers to a BCQ w.r.t. the entire set of dependencies equal those w.r.t. the TGDs alone (i.e., without the EGDs).

The two problems of CQ and BCQ evaluation under TGDs and EGDs are logspace-equivalent [6].
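To make the TGD chase rule concrete, here is a minimal sketch of one application round for single-head TGDs, under the same hypothetical atom encoding as the sketch above (it reuses is_var and extend from there). Fresh labeled nulls are drawn from a counter; the EGD chase rule, applicability checks that prevent redundant firing, the lexicographic order on nulls, and termination issues are all omitted.

```python
# Minimal sketch: one round of the TGD chase rule for a single-head TGD
# body -> exists Z. head  (assumed encoding; reuses is_var and extend).
import itertools

fresh = itertools.count()            # source of fresh labeled nulls z0, z1, ...

def all_homomorphisms(body, database, mapping=None):
    """Enumerate every homomorphism of the body atoms into the database."""
    mapping = {} if mapping is None else mapping
    if not body:
        yield mapping
        return
    first, rest = body[0], body[1:]
    for db_atom in database:
        new = extend(mapping, first, db_atom)
        if new is not None:
            yield from all_homomorphisms(rest, database, new)

def apply_tgd(body, head, database):
    """For each homomorphism h of the body, extend h by mapping every
    existential head variable to a fresh null, and collect the head
    atoms that are not already in the database."""
    added = set()
    for h in all_homomorphisms(body, database):
        h1 = dict(h)
        for term in head[1:]:
            if is_var(term) and term not in h1:   # existential variable
                h1[term] = "z%d" % next(fresh)    # fresh labeled null
        atom = (head[0],) + tuple(h1.get(t, t) for t in head[1:])
        if atom not in database:
            added.add(atom)
    return added

# F4: forSale(X) -> exists P. price(X, P)
db = {("forSale", "prop1")}
db |= apply_tgd([("forSale", "?X")], ("price", "?X", "?P"), db)
print(db)  # now also contains price(prop1, z0) with z0 a fresh null
```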
Moreover, query answering under TGDs is equivalent to query answering under TGDs with only single atoms in their heads [4]. Henceforth, we focus only on the BCQ evaluation problem and we assume that every TGD has a single atom in its head. A BCQ q on a database D, a set TT of TGDs and a set TE of EGDs can be answered by performing the chase and checking whether the query is entailed by the extended database that is obtained. In this case we write D ∪ TT ∪ TE |= q.

Example 1. Let us consider the following ontology for a real estate information extraction system, a slight modification of the one presented in Gottlob et al. [15]:

F1 = ann(X, label), ann(X, price), visible(X) → priceElem(X)
  If X is annotated as a label and as a price and is visible, then it is a price element.
F2 = ann(X, label), ann(X, priceRange), visible(X) → priceElem(X)
  If X is annotated as a label and as a price range and is visible, then it is a price element.
F3 = priceElem(E), group(E, X) → forSale(X)
  If E is a price element and is grouped with X, then X is for sale.
F4 = forSale(X) → ∃P price(X, P)
  If X is for sale, then there exists a price for X.
F5 = hasCode(X, C), codeLoc(C, L) → loc(X, L)
  If X has postal code C, and C's location is L, then X's location is L.
F6 = hasCode(X, C) → ∃L codeLoc(C, L), loc(X, L)
  If X has postal code C, then there exists an L such that C has location L and so does X.
F7 = loc(X, L1), loc(X, L2) → L1 = L2
  If X has the locations L1 and L2, then L1 and L2 are the same.
F8 = loc(X, L) → advertised(X)
  If X has a location L, then X is advertised.

Suppose we are given the database

codeLoc(ox1, central), codeLoc(ox1, south), codeLoc(ox2, summertown),
hasCode(prop1, ox2), ann(e1, price), ann(e1, label), visible(e1), group(e1, prop1)

The atomic BCQs priceElem(e1), forSale(prop1) and advertised(prop1) evaluate to true, while the CQ loc(prop1, L) has answers q(L) = {summertown}. In fact, even if loc(prop1, z1) with z1 ∈ ∆N is entailed by formula F6, formula F7 imposes that summertown = z1. If F7 were absent then q(L) = {summertown, z1}.

Answering BCQs q over databases and ontologies containing NCs can be performed by first checking whether the BCQ Φ(X) evaluates to false for each NC of the form ∀X Φ(X) → ⊥. If one of these checks fails, then the answer to the original BCQ q is positive; otherwise the negative constraints can simply be ignored when answering the original BCQ q.

A guarded Datalog+/- ontology is a quadruple (D, TT, TC, TE) consisting of a database D, a finite set of guarded TGDs TT, a finite set of negative constraints TC and a finite set of EGDs TE that are separable from TT. The data complexity (i.e., the complexity where both the query and the theory are fixed) of evaluating BCQs relative to a guarded Datalog+/- theory is polynomial [4].

In the case in which the EGDs are key dependencies and the TGDs are inclusion dependencies, Calì et al. [8] proposed a backward chaining algorithm for answering BCQs. A key dependency κ is a set of EGDs of the form

  {r(X, Y1, ..., Ym), r(X, Y1′, ..., Ym′) → Yi = Yi′}  for 1 ≤ i ≤ m

A TGD of the form r1(X, Y) → ∃Z r2(X, Z), where r1 and r2 are predicate names and no variable appears more than once in the body nor in the head, is called an inclusion dependency. The key dependencies must not interact with the inclusion dependencies, similarly to the semantic separability condition mentioned above for TGDs and EGDs.
In this case, once it is known that no hard violation occurs, queries can be answered by considering the inclusion dependencies only, ignoring the key dependencies. A necessary and sufficient syntactic condition for non-interaction is based on the construction of CD-graphs [8].

3 The DISPONTE Semantics for Probabilistic Ontologies

The distribution semantics [35] is one of the most widely used semantics for probabilistic logic programming. In this semantics a probabilistic logic program defines a probability distribution over a set of normal logic programs (called worlds). The distribution is extended to a joint distribution over worlds and a query, and the probability of the query is obtained from this distribution by marginalization. In this section we discuss how we applied this approach to give a semantics to a probabilistic version of Datalog+/- that we call DISPONTE Datalog+/-.

A DISPONTE probabilistic Datalog+/- ontology (or simply probabilistic ontology) consists of a database D and a set of certain formulas, which take the form of a Datalog+/- TGD, NC or EGD, of epistemic probabilistic formulas of the form

  pi ::e Fi     (1)

where pi is a real number in [0, 1] and Fi is a TGD, NC or EGD, and of statistical probabilistic formulas of the form

  pi ::s Fi     (2)

where pi is a real number in [0, 1] and Fi is a TGD. Let us call TT the set of all TGD formulas (certain, epistemic or statistical), TC the set of NC formulas (certain or epistemic) and TE the set of EGD formulas (certain or epistemic). Thus a probabilistic ontology O is a quadruple O = (D, TT, TC, TE). Let us indicate with T the union TT ∪ TC ∪ TE.

In formulas of the form (1), pi is interpreted as an epistemic probability, i.e., as the degree of our belief in formula Fi, while in formulas of the form (2), pi is interpreted as a statistical probability, i.e., as information regarding random individuals from certain populations. These two types of statements can be related to the work of Halpern [16]: in his terminology, an epistemic statement is a Type 2 statement and a statistical statement is a Type 1 statement.

For example, an epistemic probabilistic concept inclusion TGD of the form

  p ::e c(X) → d(X)     (3)

represents the fact that we believe in the truth of c ⊆ d, where c and d are interpreted as sets of individuals, with probability p. A statistical probabilistic concept inclusion TGD of the form

  p ::s c(X) → d(X)     (4)

instead means that a random individual of class c has probability p of belonging to d, thus representing the statistical information that a fraction p of the individuals of c belongs to d. In this way the overlap between c and d is quantified. The difference between the two formulas is that, if two individuals belong to class c, the probability that they both belong to d is p according to (3), since p represents the truth of the formula as a whole, while it is p × p according to (4), since we now consider instantiations of the formula for the specific individuals, each one having the same probability p of belonging to class d.

The idea of DISPONTE Datalog+/- is to associate independent Boolean random variables to (instantiations of) the formulas. By assigning values to every random variable we obtain a world, the set of logic formulas whose random variable is assigned value 1. To clarify what we mean by instantiations, we now define substitutions. Given a formula F, a substitution θ is a set of pairs X/x where X is a variable universally quantified in the outermost quantifier in F and x ∈ ∆ ∪ ∆N.
The application of θ to F, indicated by Fθ, is obtained by replacing X with x in F and by removing X from the external quantification, for every pair X/x in θ. An instantiation of a formula F is the result of applying a substitution to F.

To obtain a world w of a probabilistic ontology O, we include every certain formula in w. For each axiom of the form (1), we decide whether or not to include it in w. For each axiom of the form (2), we generate all the substitutions for the variables universally quantified in the outermost quantifier. There may be an infinite number of instantiations. For each instantiated formula, we decide whether or not to include it in w. In this way we obtain a Datalog+/- theory which can be assigned a semantics as seen in Section 2.

To formally define the semantics of a probabilistic ontology we follow the approach of Poole [28]. An atomic choice in this context is a triple (Fi, θj, k) where Fi is the i-th formula, θj is a substitution and k ∈ {0, 1}; k indicates whether Fiθj is chosen to be included in a world (k = 1) or not (k = 0). If Fi is obtained from a certain formula, then θj = ∅ and k = 1. If Fi is obtained from a formula of the form (1), then θj = ∅. If Fi is obtained from a formula of the form (2), then θj instantiates the variables universally quantified in the outermost quantifier. A composite choice κ is a consistent set of atomic choices, i.e., (Fi, θj, k) ∈ κ, (Fi, θj, m) ∈ κ ⇒ k = m (only one decision for each instantiated formula). The probability of a composite choice κ is

  P(κ) = ∏_{(Fi,θj,1)∈κ} pi · ∏_{(Fi,θj,0)∈κ} (1 − pi).

A selection σ is a total composite choice, i.e., it contains an atomic choice (Fi, θj, k) for every instantiation Fiθj of the formulas of the theory. Since the domain is infinite, every selection is infinite, too. Let us indicate with SO the set of all selections. SO is infinite as well. A selection σ identifies a theory wσ, called a world, in this way: wσ = {Fiθj | (Fi, θj, 1) ∈ σ}. Let us indicate with WO the set of all worlds. A composite choice κ identifies a set of worlds ωκ = {wσ | σ ∈ SO, σ ⊇ κ}. We define the set of worlds identified by a set of composite choices K as ωK = ⋃_{κ∈K} ωκ.

A composite choice κ is an explanation for a BCQ q if q is entailed by the database and every world of ωκ. A set of composite choices K is covering with respect to q if every world wσ in which q is entailed is such that wσ ∈ ωK. Two composite choices κ1 and κ2 are incompatible if their union is inconsistent. A set K of composite choices is mutually incompatible if for all κ1 ∈ K, κ2 ∈ K, κ1 ≠ κ2 implies that κ1 and κ2 are incompatible.

Kolmogorov defined probability functions (or measures) as real-valued functions over an algebra Ω of subsets of a set W called the sample space. The set Ω is an algebra over W iff (1) W ∈ Ω, (2) Ω is closed under complementation, i.e., ω ∈ Ω → (W \ ω) ∈ Ω, and (3) Ω is closed under finite union, i.e., ω1 ∈ Ω, ω2 ∈ Ω → (ω1 ∪ ω2) ∈ Ω. The elements of Ω are called measurable sets. Not every subset of W need be present in Ω. Given a sample space W and an algebra Ω of subsets of W, a probability measure is a function µ : Ω → R that satisfies the following axioms: (1) µ(ω) ≥ 0 for all ω ∈ Ω, (2) µ(W) = 1, (3) ω1 ∩ ω2 = ∅ → µ(ω1 ∪ ω2) = µ(ω1) + µ(ω2) for all ω1 ∈ Ω, ω2 ∈ Ω.
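To illustrate the definitions above, the following minimal sketch computes P(κ) for a composite choice as a product over independent Bernoulli variables. The tuple encoding of atomic choices and the prob map from formula names to their annotations are our own illustration, anticipating the annotated formulas of Example 2 below.

```python
# Minimal sketch (our own encoding, not from the paper): the probability
# of a composite choice kappa as a product over its atomic choices.
# prob maps each probabilistic formula to its annotation p_i; atomic
# choices are triples (formula, substitution, k).

def prob_composite_choice(kappa, prob):
    """P(kappa) = product of p_i over (F_i, theta_j, 1) in kappa
    times product of (1 - p_i) over (F_i, theta_j, 0) in kappa."""
    p = 1.0
    for formula, theta, k in kappa:
        p *= prob[formula] if k == 1 else 1.0 - prob[formula]
    return p

# Anticipating Example 2: the composite choice
# {(F1, {X/e1}, 1), (F3, {E/e1, X/prop1}, 1), (F2, {X/e1}, 0)}
prob = {"F1": 0.4, "F2": 0.5, "F3": 0.6}
kappa = [("F1", "{X/e1}", 1), ("F3", "{E/e1, X/prop1}", 1), ("F2", "{X/e1}", 0)]
print(prob_composite_choice(kappa, prob))  # 0.4 * 0.6 * 0.5 = 0.12
```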
Poole [28] proved the following results:

– Given a finite set K of finite composite choices, there exists a finite set K′ of mutually incompatible finite composite choices such that ωK = ωK′;
– If K1 and K2 are both mutually incompatible finite sets of finite composite choices such that ωK1 = ωK2, then ∑_{κ∈K1} P(κ) = ∑_{κ∈K2} P(κ).

These results also hold for the probabilistic ontologies we consider, so we can define a unique probability measure µ : ΩO → [0, 1] where ΩO is defined as the set of sets of worlds identified by finite sets of finite composite choices: ΩO = {ωK | K is a finite set of finite composite choices}. It is easy to see that ΩO is an algebra over WO. Then µ is defined by µ(ωK) = ∑_{κ∈K′} P(κ), where K′ is a finite mutually incompatible set of finite composite choices such that ωK = ωK′. ⟨WO, ΩO, µ⟩ is a probability space according to Kolmogorov's definition.

The probability of a BCQ q = ∃Y Φ(Y), where Φ(Y) is a conjunction of atoms having as arguments variables Y and constants (but no nulls), is P(q) = µ({w | w ∈ WO ∧ D ∪ w |= q}). If q has a finite set K of finite explanations such that K is covering, then {w | w ∈ WO ∧ D ∪ w |= q} ∈ ΩO and P(q) is well-defined. Let H(q) be the set of all homomorphisms of the form h : Y → ∆ ∪ ∆N such that {w | w ∈ WO ∧ D ∪ w |= h(Φ(Y))} is not empty. Then

  {w | w ∈ WO ∧ D ∪ w |= q} = ⋃_{h∈H(q)} {w | w ∈ WO ∧ D ∪ w |= h(Φ(Y))}.

Poole [28] also proposed an algorithm, called the splitting algorithm, to obtain a set of mutually incompatible composite choices from any set of composite choices.

Example 2. Let us consider the following probabilistic ontology, obtained from the one presented in Example 1 by adding probabilistic annotations:

0.4 ::s F1 = ann(X, label), ann(X, price), visible(X) → priceElem(X)
0.5 ::s F2 = ann(X, label), ann(X, priceRange), visible(X) → priceElem(X)
0.6 ::s F3 = priceElem(E), group(E, X) → forSale(X)
F4 = forSale(X) → ∃P price(X, P)
F5 = hasCode(X, C), codeLoc(C, L) → loc(X, L)
F6 = hasCode(X, C) → ∃L codeLoc(C, L), loc(X, L)
0.8 ::e F7 = loc(X, L1), loc(X, L2) → L1 = L2
0.7 ::s F8 = loc(X, L) → advertised(X)

and the database of Example 1:

codeLoc(ox1, central), codeLoc(ox1, south), codeLoc(ox2, summertown),
hasCode(prop1, ox2), ann(e1, price), ann(e1, label), visible(e1), group(e1, prop1)

A covering set of explanations for the query q = priceElem(e1) is K = {κ1} where κ1 = {(F1, {X/e1}, 1)}. K is also mutually exclusive, so P(q) = 0.4.

A covering set of explanations for the query q = forSale(prop1) is K = {κ1, κ2} where κ1 = {(F1, {X/e1}, 1), (F3, {E/e1, X/prop1}, 1)} and κ2 = {(F2, {X/e1}, 1), (F3, {E/e1, X/prop1}, 1)}. An equivalent mutually exclusive set of explanations, obtained by applying the splitting algorithm, is K′ = {κ′1, κ′2} where

κ′1 = {(F1, {X/e1}, 1), (F3, {E/e1, X/prop1}, 1), (F2, {X/e1}, 0)}
κ′2 = {(F2, {X/e1}, 1), (F3, {E/e1, X/prop1}, 1)}

so P(q) = 0.4 · 0.6 · 0.5 + 0.5 · 0.6 = 0.42.

A covering set of explanations for the query q = advertised(prop1) is K = {κ1, κ2, κ3} with

κ1 = {(F8, {X/prop1, L/summertown}, 1), (F7, ∅, 1)}
κ2 = {(F8, {X/prop1, L/summertown}, 1), (F7, ∅, 0)}
κ3 = {(F8, {X/prop1, L/z1}, 1), (F7, ∅, 0)}

where z1 ∈ ∆N and certain formulas have been omitted. A mutually exclusive set of explanations is K′ = {κ′1, κ′2, κ′3} where

κ′1 = {(F8, {X/prop1, L/summertown}, 1), (F7, ∅, 1)}
κ′2 = {(F8, {X/prop1, L/summertown}, 1), (F7, ∅, 0), (F8, {X/prop1, L/z1}, 0)}
κ′3 = {(F8, {X/prop1, L/z1}, 1), (F7, ∅, 0)}

so P(q) = 0.7 · 0.8 + 0.7 · 0.2 · 0.3 + 0.7 · 0.2 = 0.742.
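The probability of a query can thus be computed from a covering set of explanations either with the splitting algorithm, as in the example above, or, naively, by enumerating all assignments to the Boolean random variables the explanations mention. The following sketch takes the second route (a brute-force stand-in for splitting or Binary Decision Diagrams, using the same hypothetical encoding as the earlier sketches) and reproduces the value P(forSale(prop1)) = 0.42 of Example 2.

```python
# Minimal sketch: probability of a covering set of explanations by
# brute-force enumeration of the truth values of the involved random
# variables (one independent Bernoulli variable per formula instantiation).
from itertools import product

def prob_query(explanations, prob):
    """Sum the probabilities of all assignments under which at least
    one explanation is satisfied."""
    rvs = sorted({(f, th) for kappa in explanations for (f, th, _) in kappa})
    total = 0.0
    for values in product([0, 1], repeat=len(rvs)):
        world = dict(zip(rvs, values))
        if any(all(world[(f, th)] == k for (f, th, k) in kappa)
               for kappa in explanations):
            p = 1.0
            for (f, _), v in world.items():
                p *= prob[f] if v == 1 else 1.0 - prob[f]
            total += p
    return total

# Covering explanations for q = forSale(prop1) in Example 2:
prob = {"F1": 0.4, "F2": 0.5, "F3": 0.6}
K = [{("F1", "{X/e1}", 1), ("F3", "{E/e1, X/prop1}", 1)},
     {("F2", "{X/e1}", 1), ("F3", "{E/e1, X/prop1}", 1)}]
print(round(prob_query(K, prob), 3))  # 0.42, as in Example 2
```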
Example 3. Let us consider the following ontology, inspired by the people+pets ontology proposed in Patel-Schneider et al. [27]:

F1 = hasAnimal(X, Y), pet(Y) → petOwner(X)
0.6 ::e F2 = cat(X) → pet(X)

and the database hasAnimal(kevin, fluffy), hasAnimal(kevin, tom), cat(fluffy), cat(tom). A covering set of explanations for the query q = petOwner(kevin) is K = {κ1} where κ1 = {(F2, ∅, 1)} and certain formulas have been omitted. This is also a mutually exclusive set of explanations, so P(q) = 0.6.

Example 4. If the axiom 0.6 ::e F2 = cat(X) → pet(X) in Example 3 is replaced by 0.6 ::s F2 = cat(X) → pet(X), then the query has the covering set of explanations K = {κ1, κ2} where κ1 = {(F2, {X/fluffy}, 1)} and κ2 = {(F2, {X/tom}, 1)} which, after splitting, becomes K′ = {κ′1, κ′2} with

κ′1 = {(F2, {X/fluffy}, 1), (F2, {X/tom}, 0)}
κ′2 = {(F2, {X/tom}, 1)}

where certain formulas have been omitted, so P(q) = 0.6 · 0.4 + 0.6 = 0.84.

Example 5. Let us consider a slightly different ontology:

0.5 ::s F1 = hasAnimal(X, Y), pet(Y) → petOwner(X)
0.6 ::s F2 = cat(X) → pet(X)

and the database of Example 3. A covering set of explanations for the query q = petOwner(kevin) is K = {κ1, κ2} where:

κ1 = {(F1, {X/kevin}, 1), (F2, {X/fluffy}, 1)}
κ2 = {(F1, {X/kevin}, 1), (F2, {X/tom}, 1)}

An equivalent mutually exclusive set of explanations is K′ = {κ′1, κ′2} where:

κ′1 = {(F1, {X/kevin}, 1), (F2, {X/fluffy}, 1), (F2, {X/tom}, 0)}
κ′2 = {(F1, {X/kevin}, 1), (F2, {X/tom}, 1)}

so P(q) = 0.5 · 0.6 · 0.4 + 0.5 · 0.6 = 0.42.

Example 6. Let us consider the following ontology:

F1 = hasAnimal(X, Y), pet(Y) → petOwner(X)
0.6 ::s F2 = cat(X) → pet(X)
0.4 ::e F3 = cat(fluffy)
0.3 ::e F4 = cat(tom)

and the database hasAnimal(kevin, fluffy), hasAnimal(kevin, tom). A covering set of explanations for the query q = petOwner(kevin) is K = {κ1, κ2} where

κ1 = {(F3, ∅, 1), (F2, {X/fluffy}, 1)}
κ2 = {(F4, ∅, 1), (F2, {X/tom}, 1)}

and certain formulas have been omitted. After splitting, K becomes K′ = {κ′1, κ′2, κ′3} with:

κ′1 = {(F3, ∅, 1), (F2, {X/fluffy}, 1), (F4, ∅, 1), (F2, {X/tom}, 0)}
κ′2 = {(F3, ∅, 1), (F2, {X/fluffy}, 1), (F4, ∅, 0)}
κ′3 = {(F4, ∅, 1), (F2, {X/tom}, 1)}

so P(q) = 0.4 · 0.6 · 0.3 · 0.4 + 0.4 · 0.6 · 0.7 + 0.3 · 0.6 = 0.3768.

Example 7. Let us consider a further ontology:

0.7 ::s F1 = schoolchild(X) → european(X)
0.3 ::s F2 = schoolchild(X) → onlyChild(X)
0.6 ::s F3 = european(X) → goodInMath(X)
0.5 ::s F4 = onlyChild(X) → goodInMath(X)

and the database schoolchild(anna). A covering set of explanations for the query q = goodInMath(anna) is K = {κ1, κ2} where:

κ1 = {(F1, {X/anna}, 1), (F3, {X/anna}, 1)}
κ2 = {(F2, {X/anna}, 1), (F4, {X/anna}, 1)}

After splitting we get K′ = {κ′1, κ′2, κ′3} where:

κ′1 = {(F1, {X/anna}, 1), (F3, {X/anna}, 1), (F2, {X/anna}, 1), (F4, {X/anna}, 0)}
κ′2 = {(F1, {X/anna}, 1), (F3, {X/anna}, 1), (F2, {X/anna}, 0)}
κ′3 = {(F2, {X/anna}, 1), (F4, {X/anna}, 1)}

So P(q) = 0.7 · 0.6 · 0.3 · 0.5 + 0.7 · 0.6 · 0.7 + 0.3 · 0.5 = 0.507.

4 Related Work

This work builds on Bellodi et al. [1], which presented a version of DISPONTE applied to Description Logics where only epistemic probabilities were present.

Gottlob et al. [15, 14] presented Probabilistic Datalog+/-, a version of Datalog+/- that allows the representation of probabilistic information by combining Markov Logic Networks with Datalog+/-.
Each Probabilistic Datalog+/- formula F is annotated with a probabilistic scenario λ, an assignment of values to a set of random variables from the MLN associated to the ontology. A full probabilistic scenario assigns a value to all the random variables of the MLN. A probabilistic scenario represents an event that happens when the random variables described by the MLN assume the values indicated in the scenario. Probabilistic formulas then take the form F : λ. A Probabilistic Datalog+/- ontology is of the form Φ = (O, M) where O is a set of annotated formulas and M is an MLN. An annotated formula holds when the event associated with its probabilistic annotation holds. If a is a ground atom, its probability in a Probabilistic Datalog+/- ontology Φ = (O, M), denoted Pr(a), is obtained by summing the probabilities according to M of all full scenarios such that the atom is entailed by the annotated formulas that hold in the scenario.

Example 8. Let us consider the following Probabilistic Datalog+/- ontology from [14]:

F1 = visible(X) → priceElem(X) : {ann(X, label), ann(X, price)}
F2 = visible(X) → priceElem(X) : {ann(X, label), ann(X, priceRange)}
F3 = priceElem(E), group(E, X) → forSale(X) : {sale}
F4 = forSale(X) → ∃P price(X, P)
F5 = hasCode(X, C), codeLoc(C, L) → loc(X, L)
F6 = hasCode(X, C) → ∃L codeLoc(C, L), loc(X, L)
F7 = loc(X, L1), loc(X, L2) → L1 = L2 : {uniqueLoc}

and the MLN

0.3 ann(X, label) ∧ ann(X, price)
0.4 ann(X, label) ∧ ann(X, priceRange)
0.8 sale
1.1 uniqueLoc

Suppose that this network is grounded with respect to the only constant e1. The resulting ground network has 5 Boolean random variables, each corresponding to a logical atom. Therefore, there are 2⁵ full scenarios. In this theory P(priceElem(e1)) = 0.492 and P(forSale(prop1)) = 0.339.

DISPONTE differs from Probabilistic Datalog+/- because the probabilistic interactions among the atoms are modeled directly by means of Datalog+/- formulas rather than by a separate entity. The parameters of DISPONTE Datalog+/- are easier to interpret, as they are probabilities (statistical or epistemic), while MLN parameters are weights that are not directly interpretable as probabilities. Moreover, DISPONTE does not require the prior grounding of the probabilistic atoms, for which the set of constants has to be defined by the user, but allows an on-demand grounding on the basis of the terms that are used for inference.

Heinsohn [17] proposed an extension of the description logic ALC that is able to express statistical information on the terminological knowledge, such as partial concept overlapping. Similarly, Koller et al. [21] presented a probabilistic description logic based on Bayesian networks that deals with statistical terminological knowledge. [17, 21] do not allow probabilistic assertional knowledge about concept and role instances. Jaeger [18] allows assertional knowledge about concept and role instances together with statistical terminological knowledge and combines the resulting probability distributions using cross-entropy minimization, but does not allow epistemic statements.

Ding and Peng [12] proposed a probabilistic extension of OWL that admits a translation into Bayesian networks. The proposed semantics assigns a probability distribution P(i) over individuals, i.e., ∑_i P(i) = 1, and assigns a probability to a class C as P(C) = ∑_{i∈C} P(i), while we assign a probability measure to sets of worlds. PR-OWL [10, 9] is an upper ontology that provides a framework for building probabilistic ontologies.
It allows the use of the first-order probabilistic logic MEBN [22] for representing uncertainty in ontologies. The use of a full-fledged first-order probabilistic logic distinguishes this work from ours, in which we tried to provide a minimal extension to Datalog+/-.

A different approach to the combination of ontologies with probability is taken by Giugno and Lukasiewicz [13] and Lukasiewicz [23, 24], who use probabilistic lexicographic entailment from probabilistic default reasoning. The description logics proposed in these papers allow both terminological probabilistic knowledge and assertional probabilistic knowledge about instances of concepts and roles. PRONTO [20] is one of the systems that perform inference under this semantics. Similarly to Jaeger [18], the terminological knowledge is interpreted statistically while the assertional knowledge is interpreted in an epistemic way, by assigning degrees of belief to assertions. Moreover, this approach also allows the expression of default knowledge about concepts, which can be overridden in subconcepts and whose semantics is given by Lehmann's lexicographic default entailment.

These works are based on Nilsson's probabilistic logic [26], where a probabilistic interpretation Pr defines a probability distribution over the set of interpretations Int. The probability of a logic formula F according to Pr, denoted Pr(F), is the sum of all Pr(I) such that I ∈ Int and I |= F. A probabilistic knowledge base K in Nilsson's logic is a set of probabilistic formulas of the form F ≥ p. A probabilistic interpretation Pr satisfies F ≥ p iff Pr(F) ≥ p. Pr satisfies K, or Pr is a model of K, iff Pr satisfies all F ≥ p ∈ K. We say that F ≥ p is a tight logical consequence of K iff p is the infimum of Pr(F) subject to all models Pr of K. Computing tight logical consequences from probabilistic knowledge bases can be done by solving a linear optimization problem.

In fact, Nilsson's logic allows weaker conclusions than the distribution semantics. For example, consider a probabilistic ProbLog [11] program composed of 0.4 :: a. and 0.5 :: b. and a probabilistic knowledge base composed of a ≥ 0.4 and b ≥ 0.5. The distribution semantics allows us to conclude that P(a ∨ b) = 0.7, while with Nilsson's logic the lowest p such that Pr(a ∨ b) ≥ p holds is 0.5. This is due to the fact that in the distribution semantics the probabilistic atoms are considered independent, which allows stronger conclusions. However, note that this does not restrict expressiveness, as one can specify any joint probability distribution over the atoms interpreted as Boolean random variables, possibly introducing new random facts if needed.

Alternative approaches to modeling imperfect and incomplete knowledge in ontologies are based on fuzzy logic. A good survey of these approaches is presented in [25].

5 Conclusions

We have presented the DISPONTE semantics for probabilistic ontologies in Datalog+/-, which is inspired by the distribution semantics of probabilistic logic programming and is a minimal extension of the underlying ontology semantics that allows representing and reasoning with uncertain knowledge.

In the future we plan to develop inference algorithms for DISPONTE Datalog+/-, both bottom-up, on the basis of the chase procedure, and top-down, by applying PITA [34] to non-interacting inclusion and key dependencies. Moreover, we will investigate the possibility of also annotating NCs and EGDs with a statistical probability.
Finally, we will work towards the automatic learning of DISPONTE Datalog+/- ontologies, exploiting the techniques developed in Riguzzi [32], Riguzzi et al. [33] and Bellodi et al. [3, 2].

References

1. Bellodi, E., Lamma, E., Riguzzi, F., Albani, S.: A distribution semantics for probabilistic ontologies. In: International Workshop on Uncertainty Reasoning for the Semantic Web. CEUR Workshop Proceedings, vol. 778. CEUR-WS.org (2011)
2. Bellodi, E., Riguzzi, F.: Learning the structure of probabilistic logic programs. In: International Conference on Inductive Logic Programming (2011)
3. Bellodi, E., Riguzzi, F.: Expectation Maximization over binary decision diagrams for probabilistic logic programs. Intelligent Data Analysis 16(6) (2012)
4. Calì, A., Gottlob, G., Kifer, M.: Taming the infinite chase: Query answering under expressive relational constraints. In: International Conference on Principles of Knowledge Representation and Reasoning. pp. 70–80. AAAI Press (2008)
5. Calì, A., Gottlob, G., Lukasiewicz, T.: A general Datalog-based framework for tractable query answering over ontologies. In: Symposium on Principles of Database Systems. pp. 77–86. ACM (2009)
6. Calì, A., Gottlob, G., Lukasiewicz, T.: Tractable query answering over ontologies with Datalog+/-. In: International Workshop on Description Logics. CEUR Workshop Proceedings, vol. 477. CEUR-WS.org (2009)
7. Calì, A., Gottlob, G., Lukasiewicz, T., Marnette, B., Pieris, A.: Datalog+/-: A family of logical knowledge representation and query languages for new applications. In: IEEE Symposium on Logic in Computer Science. pp. 228–242. IEEE Computer Society (2010)
8. Calì, A., Gottlob, G., Pieris, A.: Tractable query answering over conceptual schemata. In: International Conference on Conceptual Modeling. LNCS, vol. 5829, pp. 175–190. Springer (2009)
9. Carvalho, R.N., Laskey, K.B., Costa, P.C.G.: PR-OWL 2.0 – bridging the gap to OWL semantics. In: International Workshops on Uncertainty Reasoning for the Semantic Web (2010)
10. Costa, P.C.G., Laskey, K.B., Laskey, K.J.: PR-OWL: A Bayesian ontology language for the Semantic Web. In: International Workshops on Uncertainty Reasoning for the Semantic Web. vol. 5327, pp. 88–107. Springer (2008)
11. De Raedt, L., Kimmig, A., Toivonen, H.: ProbLog: A probabilistic Prolog and its application in link discovery. In: International Joint Conference on Artificial Intelligence. pp. 2462–2467 (2007)
12. Ding, Z., Peng, Y.: A probabilistic extension to ontology language OWL. In: Hawaii International Conference on System Sciences. IEEE (2004)
13. Giugno, R., Lukasiewicz, T.: P-SHOQ(D): A probabilistic extension of SHOQ(D) for probabilistic ontologies in the Semantic Web. In: European Conference on Logics in Artificial Intelligence. LNCS, vol. 2424, pp. 86–97. Springer (2002)
14. Gottlob, G., Lukasiewicz, T., Simari, G.I.: Answering threshold queries in probabilistic Datalog+/- ontologies. In: International Conference on Scalable Uncertainty Management. LNCS, vol. 6929, pp. 401–414. Springer (2011)
15. Gottlob, G., Lukasiewicz, T., Simari, G.I.: Conjunctive query answering in probabilistic Datalog+/- ontologies. In: International Conference on Web Reasoning and Rule Systems. LNCS, vol. 6902, pp. 77–92. Springer (2011)
16. Halpern, J.Y.: An analysis of first-order logics of probability. Artificial Intelligence 46(3), 311–350 (1990)
17. Heinsohn, J.: Probabilistic description logics. In: Conference on Uncertainty in Artificial Intelligence. pp. 311–318. Morgan Kaufmann (1994)
18. Jaeger, M.: Probabilistic reasoning in terminological logics. In: International Conference on Principles of Knowledge Representation and Reasoning. pp. 305–316 (1994)
19. Kimmig, A., Demoen, B., De Raedt, L., Santos Costa, V., Rocha, R.: On the implementation of the probabilistic logic programming language ProbLog. Theory and Practice of Logic Programming 11(2-3), 235–262 (2011)
20. Klinov, P.: Pronto: A non-monotonic probabilistic description logic reasoner. In: European Semantic Web Conference. LNCS, vol. 5021, pp. 822–826. Springer (2008)
21. Koller, D., Levy, A.Y., Pfeffer, A.: P-CLASSIC: A tractable probabilistic description logic. In: National Conference on Artificial Intelligence. pp. 390–397 (1997)
22. Laskey, K.B., Costa, P.C.G.: Of starships and Klingons: Bayesian logic for the 23rd century. In: Conference in Uncertainty in Artificial Intelligence. pp. 346–353. AUAI Press (2005)
23. Lukasiewicz, T.: Probabilistic default reasoning with conditional constraints. Annals of Mathematics and Artificial Intelligence 34(1-3), 35–88 (2002)
24. Lukasiewicz, T.: Expressive probabilistic description logics. Artificial Intelligence 172(6-7), 852–883 (2008)
25. Lukasiewicz, T., Straccia, U.: Managing uncertainty and vagueness in description logics for the Semantic Web. Journal of Web Semantics 6(4), 291–308 (2008)
26. Nilsson, N.J.: Probabilistic logic. Artificial Intelligence 28(1), 71–87 (1986)
27. Patel-Schneider, P.F., Horrocks, I., Bechhofer, S.: Tutorial on OWL (2003), http://www.cs.man.ac.uk/~horrocks/ISWC2003/Tutorial/
28. Poole, D.: Abducing through negation as failure: stable models within the independent choice logic. Journal of Logic Programming 44(1-3), 5–35 (2000)
29. Predoiu, L., Stuckenschmidt, H.: Probabilistic extensions of Semantic Web languages – a survey. In: The Semantic Web for Knowledge and Data Management: Technologies and Practices. Idea Group Inc (2008)
30. Richardson, M., Domingos, P.: Markov logic networks. Machine Learning 62(1-2), 107–136 (2006)
31. Riguzzi, F.: Extended semantics and inference for the Independent Choice Logic. Logic Journal of the IGPL 17(6), 589–629 (2009)
32. Riguzzi, F.: ALLPAD: Approximate learning of logic programs with annotated disjunctions. Machine Learning 70(2-3), 207–223 (2008)
33. Riguzzi, F., Di Mauro, N.: Applying the information bottleneck to statistical relational learning. Machine Learning 86(1), 89–114 (2012)
34. Riguzzi, F., Swift, T.: The PITA system: Tabling and answer subsumption for reasoning under uncertainty. Theory and Practice of Logic Programming, International Conference on Logic Programming Special Issue 11(4-5), 433–449 (2011)
35. Sato, T.: A statistical learning method for logic programs with distribution semantics. In: International Conference on Logic Programming. pp. 715–729. MIT Press (1995)
36. URW3-XG: Uncertainty reasoning for the World Wide Web, final report (2005)
37. Vennekens, J., Verbaeten, S., Bruynooghe, M.: Logic programs with annotated disjunctions. In: International Conference on Logic Programming. LNCS, vol. 3131, pp. 195–209. Springer (2004)