        The Structure and Complexity of Credal
                      Semantics

                       Fabio G. Cozman and Denis D. Mauá

                           Universidade de São Paulo, Brazil



        Abstract. A flexible semantics has been proposed by Lukasiewicz for
        probabilistic logic programs where we have a normal logic program aug-
        mented with a set of independent probabilistic facts. That semantics,
        which we call credal semantics, is the set of all probability measures
        (over stable models) that are consistent with a total choice of proba-
        bilistic facts. When each total choice produces a definite program, credal
        semantics is identical to Sato’s distribution semantics. However, credal
        semantics is also defined for programs with cycles and negations. We
        show that the credal semantics always defines a set containing the prob-
ability measures that dominate an infinitely monotone Choquet capacity
        (also known as a belief function). We also show how this result leads to
        inference algorithms and to an analysis of the complexity of inferences.

        Keywords: Probabilistic logic programming, answer sets, stable model
        semantics, complexity theory.


1     Introduction
The term “probabilistic logic program” encompasses a significant number of
distinct syntactic proposals [9,17,22,23]. One particularly successful proposal
is jointly represented by Poole’s probabilistic Horn abduction [25] and Sato’s
distribution semantics [28]. There the idea is that a logic program is enlarged
with probabilistic facts that are associated with probabilities and that are usually
assumed stochastically independent. For instance, we may have a rule:

    up :− actionUp, not disturbanceUp.         with P(disturbanceUp = true) = 0.1.

Depending on disturbanceUp, actionUp may succeed or not in leading to up.
    Poole and Sato emphasized acyclic logic programs [25,26,28,29], even though
Sato did handle cyclic ones. Since then, there has been work on cyclic proba-
bilistic logic programs under variants of the distribution semantics [15,18,27,30].
In particular, a semantics for (acyclic and cyclic) probabilistic logic programs
can be extracted from the work of Lukasiewicz on probabilistic description log-
ics [18,19]. His proposal is that a probabilistic logic program defines a set of
probability measures: for a fixed truth assignment over probabilistic facts, we
take the set of all probabilities over stable models of the resulting normal logic
program. This approach may seem a little complex, but it stays within two-
valued logic and is directly related to answer set programming languages.

    Lukasiewicz refers to his approach as the “answer set semantics” for proba-
bilistic logic programs. However, as the approach is certainly distinct from the
usual answer set semantics (which does not employ probabilities), and as the
approach certainly applies in the presence of functions (usually not allowed in
answer set programming), we adopt the name credal semantics to refer to it.
    In this paper we study the structure and complexity of the credal seman-
tics. We show that the semantics of a consistent probabilistic logic program is,
surprisingly, the set of dominating measures of an infinitely monotone Choquet
capacity. Such capacities are relatively simple objects that appear in several
fields [21,31]. We obtain a straightforward derivation for inference algorithms,
and we study the complexity of inferences, obtaining interesting novel results.
    The paper is organized as follows. Basic terminology is reviewed in Section 2,
and the credal semantics is presented in Section 3. The structure of the credal
semantics is derived in Section 4, and its consequences concerning inference
algorithms and complexity are discussed in Section 5.


2    A very short review: some syntax, some semantics

An atom is written as r(t1 , . . . , tk ), where r is a predicate and each ti is a term;
that is, each ti is a constant or a logical variable. A normal logic program is a
finite set of rules such as A0 :− A1 , . . . , Am , not Am+1 , . . . , not An ., where each
Ai is an atom. Atom A0 is the head of the rule; the right hand side is the body.
The body consists of a set of subgoals separated by commas (A1 is a subgoal,
not An is a subgoal). If there are no subgoals, the rule is called a fact and
written simply as A0 .. Programs without not are definite. An atom is ground if
it does not contain any logical variable; a program without logical variables is
propositional. The Herbrand base of a program is the set of all ground atoms built
from constants and predicates mentioned in the program. By grounding every
rule with atoms in the Herbrand base, we obtain the (propositional) grounding
of a program. The grounded dependency graph of a normal logic program is a
directed graph where each grounded predicate is a node, and where there is an
edge from node B to node C if there is a grounded rule where C appears in
the head and B appears in the body; if B is preceded by not in the grounded
rule, then the edge between B and C is negative. A program is acyclic when
its grounded dependency graph is acyclic. An interpretation for a normal logic
program P is a subset of the Herbrand base of P; the idea is that atoms in the
interpretation are true, and atoms not in the interpretation are false. A model is
an interpretation such that every grounded rule is satisfied (that is, either the
head is true or there is a subgoal that is false in the interpretation).
    In this paper we do not allow functions to be used in programs, so as to stay
within finite Herbrand bases. We leave the study of functions to future work.
    One strategy to define the semantics of normal logic programs is to translate
programs into some well-known logic, using a completion, and to take the models
of the completion as the semantics. Another strategy is to select some “canonical”
models as “the” semantics. Two canonical models have attracted most attention

due to their favorable properties: the well-founded [33] and the stable model
semantics [14]. In this paper we are concerned with the latter (stable model)
semantics. To define it, take a normal logic program P. For an interpretation I,
define the reduct P^I to be the definite program obtained by (i) grounding P,
(ii) removing all rules that contain a subgoal not A in the body such that A is in
I, and (iii) removing all remaining subgoals of the form not A from the remaining
rules. An interpretation I is a stable model if I is a minimal model of P^I
(minimal according to set inclusion). Note that a definite program always has a
single minimal model, so the whole construction is well defined. An answer set
is exactly a stable model.
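The reduct-based definition can be turned directly into a brute-force checker. The sketch below (hypothetical helper names; exponential in the size of the Herbrand base, so only usable on tiny propositional programs) tests every interpretation against the least model of its reduct:

```python
from itertools import product

def least_model(definite_rules):
    """Least model of a definite program: apply rules to a fixpoint."""
    model, changed = set(), True
    while changed:
        changed = False
        for head, pos in definite_rules:
            if pos <= model and head not in model:
                model.add(head)
                changed = True
    return model

def stable_models(rules, herbrand_base):
    """All stable models of a propositional normal program.

    A rule is (head, positive_body, negative_body); we try every
    interpretation, so this is exponential and for tiny programs only.
    """
    hb = sorted(herbrand_base)
    models = []
    for bits in product([False, True], repeat=len(hb)):
        interp = {a for a, b in zip(hb, bits) if b}
        # Reduct: drop rules whose negative body meets interp,
        # then drop the remaining "not" subgoals.
        reduct = [(h, set(p)) for h, p, n in rules if not (set(n) & interp)]
        if least_model(reduct) == interp:
            models.append(interp)
    return models

# The classic two-model pattern: a :- not b.   b :- not a.
rules = [("a", [], ["b"]), ("b", [], ["a"])]
print(sorted(sorted(m) for m in stable_models(rules, {"a", "b"})))  # → [['a'], ['b']]
```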


3     Probabilistic logic programs

A probabilistic logic program is a pair ⟨P, PF⟩ where P is a normal logic program
and PF is a set of probabilistic facts (we abbreviate “probabilistic logic program”
by plp). A probabilistic fact is an atom A that does not unify with any rule head
(hence with any fact either), and that is associated with a probability assessment
P(A) = α. We write a probabilistic fact as α :: A., where we have adopted the
syntax of the ProbLog system¹ [13]. If A is a ground atom, then α :: A is a
ground probabilistic fact. A probabilistic fact with logical variables is interpreted
through its groundings (for instance, α :: r(X1 , . . . , Xm ) is interpreted as the set
of α :: r(a1 , . . . , am ) for all groundings of r). A truth assignment for all grounded
probabilistic facts in PF is a total choice. If θ is a total choice for PF, we denote
by PF↓θ the set of atoms assigned true by θ.
    We assume that all probabilistic facts are independent, so the probability of
total choice θ = {A1 = t1 , . . . , Ak = tk }, where each ti is true or false, is

        P(θ) = P(A1 = t1 , . . . , Ak = tk ) = P(A1 = t1 ) × · · · × P(Ak = tk ) ,   (1)

where P(Ai = true) = αi and P(Ai = false) = 1 − αi given αi :: Ai .
    Once a total choice is fixed, we can form a normal logic program consisting
of P ∪ PF↓θ . Poole’s and Sato’s original work focused respectively on acyclic
and definite programs; for these programs the semantics of P ∪ PF↓θ is
rather uncontroversial, given by the unique stable model. Hence, every
such probabilistic logic program induces a unique probability distribution over
all atoms. For cyclic programs, Sato, Kameya and Zhou later adopted Fitting’s
three-valued semantics for each fixed total choice [30], while Hadjichristodoulou
and Warren [15] and Riguzzi [27] have explored the well-founded semantics.
    An alternative semantics can be extracted from the work on probabilistic
description logic programs by Lukasiewicz [19], as noted in Section 1. To describe
that proposal, a few definitions are needed. A plp ⟨P, PF⟩ is consistent if there
is at least one stable model for each fixed total choice of PF. A probability model
for a consistent plp ⟨P, PF⟩ is a probability measure P over interpretations of
P, such that:
¹ Available at https://dtai.cs.kuleuven.be/problog/index.html.

(i) every interpretation I with P(I) > 0 is a stable model of P ∪ PF↓θ for the
total choice θ that agrees with I on the probabilistic facts; and
(ii) for every total choice θ = {A1 = t1 , . . . , Ak = tk }, where each ti is true or
false, we have P(θ) = P(A1 = t1 ) × · · · × P(Ak = tk ).
The set of all probability models for a plp is the semantics of the program.
     Lukasiewicz calls his proposed semantics the answer set semantics for prob-
abilistic description logic programs. As we mentioned before, we prefer the term
credal semantics, which we adopt. The reason for this latter name is that a set
of probability measures is often called a credal set [2].
     Now, given a consistent plp, we may be interested in the smallest possible
value of P(A) for some ground atom A, with respect to the set K of all probability
models of the plp. This is conveyed by the lower probability of A, P̲(A) =
inf P∈K P(A). Similarly, we have the upper probability of A, P̄(A) = supP∈K P(A).
     It is perhaps time for a few much needed examples. We start with two ex-
amples where each total choice produces a single stable model, and then move
to two examples where some total choices lead to several stable models.
Example 1. First, consider an acyclic probabilistic logic program ⟨P, PF⟩. When
P is acyclic, each total choice for PF yields an acyclic program (with a unique
stable model that is also its well-founded semantics [1]), hence a single probability
model is obtained (a singleton credal set).
    For example, consider a fragment of the famous Bayesian network Asia
[16] with three atoms, smoking, cancer, and bronchitis, with P(smoking) = 1/2,
P(cancer|smoking) = 1/10, P(cancer|¬smoking) = 1/100, P(bronchitis|smoking) =
3/5, and P(bronchitis|¬smoking) = 3/10. These probabilities can be expressed
through the following plp:
      0.5 :: smoking.      0.1 :: a1.   0.01 :: a2.    0.6 :: a3.  0.3 :: a4.
               cancer :− smoking, a1.     cancer :− not smoking, a2.
             bronchitis :− smoking, a3. bronchitis :− not smoking, a4.
This plp has a single probability model, yielding for instance P̲(smoking|cancer) =
P̄(smoking|cancer) = 10/11. In fact, any Bayesian network over binary variables
can be encoded using an acyclic probabilistic logic program [25,26]. □
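As a sanity check, the single probability model of this plp can be recovered by enumerating the 2^5 total choices of the probabilistic facts and forward-chaining through the acyclic rules. The helper below is an illustrative sketch, not ProbLog's actual machinery:

```python
from itertools import product

# Probabilities of the probabilistic facts in the Asia fragment.
pfacts = {"smoking": 0.5, "a1": 0.1, "a2": 0.01, "a3": 0.6, "a4": 0.3}

def model_of(choice):
    """Unique stable model for a total choice (the program is acyclic)."""
    m = {f for f, v in choice.items() if v}
    if "smoking" in m and "a1" in m:
        m.add("cancer")
    if "smoking" not in m and "a2" in m:
        m.add("cancer")
    if "smoking" in m and "a3" in m:
        m.add("bronchitis")
    if "smoking" not in m and "a4" in m:
        m.add("bronchitis")
    return m

num = den = 0.0
for vals in product([True, False], repeat=len(pfacts)):
    choice = dict(zip(pfacts, vals))
    p = 1.0
    for f, v in choice.items():
        p *= pfacts[f] if v else 1.0 - pfacts[f]
    m = model_of(choice)
    if "cancer" in m:
        den += p                 # accumulates P(cancer)
        if "smoking" in m:
            num += p             # accumulates P(smoking, cancer)
print(num / den)  # → 0.9090... = 10/11
```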
Example 2. Sometimes a cyclic probabilistic logic program ⟨P, PF⟩ still has a
single probability model. This happens for instance if P is locally stratified; that
is, if its grounded dependency graph has no cycles containing a negative edge.
If P is a locally stratified program, then any total choice of probabilistic facts
produces a locally stratified normal logic program. And for locally stratified
programs, most existing semantics, and particularly the well-founded and the
stable model semantics, coincide and produce a single stable model.
     For example, consider a plp where p means “path” and e means “edge”
(based on an example in the ProbLog distribution):
            p(X, Y ) :− e(X, Y ).             p(X, Y ) :− e(X, Z), p(Z, Y ).
             0.6 :: e(1, 2). 0.1 :: e(1, 3). 0.4 :: e(2, 5). 0.3 :: e(2, 6).        (2)
                      0.3 :: e(3, 4). 0.8 :: e(4, 5). 0.2 :: e(5, 6).


         Fig. 1. Graph described in Example 4 (left), and the stable models.


That is, we have a random graph with nodes 1, . . . , 6, and probabilities attached
to edges. The query P(p(1, 6) = true) yields the probability that there is a path
between nodes 1 and 6. Using ProbLog one obtains P(p(1, 6) = true) = 0.217. □

Example 3. A common cyclic pattern in the literature is the pair of rules a :−
not b. and b :− not a. [33]. Here is a more elaborate version, adapted from
Ref. [12]. Take the probabilistic fact 0.9 :: man(dilbert). and the rules:

single(X) :− man(X), not husband(X). husband(X) :− man(X), not single(X).

When man(dilbert) is false, there is a single stable model s1 = {man(dilbert) =
false, husband(dilbert) = false, single(dilbert) = false}. When man(dilbert) is true,
there are two stable models, s2 = {man(dilbert) = true, husband(dilbert) =
true, single(dilbert) = false}, and s3 = {man(dilbert) = true, husband(dilbert) =
false, single(dilbert) = true}. Now, there is a probability model (that is, a prob-
ability measure over the stable models) such that P(s1 ) = 0.1 and P(s2 ) = 0.9
(hence P(s3 ) = 0), and another probability model such that P(s1 ) = 0.1 and
P(s3 ) = 0.9. In fact, any probability measure such that P(s1 ) = 0.1, P(s2 ) = 0.9γ,
and P(s3 ) = 0.9(1 − γ), for γ ∈ [0, 1], is a probability model. □


Example 4. Consider a graph coloring problem consisting of the rules:

coloredBy(V, red) :− not coloredBy(V, yellow), not coloredBy(V, green), vertex(V ).
coloredBy(V, yellow) :− not coloredBy(V, red), not coloredBy(V, green), vertex(V ).
coloredBy(V, green) :− not coloredBy(V, red), not coloredBy(V, yellow), vertex(V ).
      noClash :− not noClash, edge(V, U ), coloredBy(V, C), coloredBy(U, C).

and the facts: for i ∈ {1, . . . , 5}, vertex(i)., and

             coloredBy(2, red). coloredBy(5, green). 0.5 :: edge(4, 5).
     edge(1, 3). edge(1, 4). edge(2, 1). edge(2, 4). edge(3, 5). edge(4, 3).

The facts mentioning vertex and edge encode the graph in Figure 1 (left); the
probabilistic fact is indicated as a dashed edge. For a fixed total choice, the sta-
ble models of the program are the 3-colorings of the graph. Thus, if edge(4, 5)
is true, there is a single stable model; otherwise, there are two stable mod-
els. We obtain P(coloredBy(1, yellow)) = 0 and P(coloredBy(1, yellow)) = 1/2,
while P(coloredBy(4, yellow)) = 1/2 and P(coloredBy(4, yellow)) = 1, and finally
P(coloredBy(3, yellow)) = P(coloredBy(3, yellow)) = 1. 2

Example 5. Consider a game where one wins whenever the opponent has no
moves [33], with rule wins(X) :− move(X, Y ), not wins(Y ). and the following
moves, one of which is probabilistic: move(a, b). move(b, a). move(b, c). 0.3 ::
move(c, d).. If move(c, d) is false, there is a single stable model (where wins(b) is
the only winning position); otherwise, there are two stable models (wins(c) is true
in both of them; wins(a) is true in one, while wins(b) is true in the other). Thus
P̲(wins(a)) = 0.0 and P̄(wins(a)) = 0.3; P̲(wins(b)) = 0.7 and P̄(wins(b)) = 1.0;
and P̲(wins(c)) = P̄(wins(c)) = 0.3. □

    Finally, consider a program without credal semantics:

Example 6. If the barber shaves every villager who does not shave himself, does
the barber shave himself? The program

            shaves(X, Y ) :− barber(X), villager(Y ), not shaves(X, Y ).
                    0.5 :: barber(bob).     0.5 :: villager(bob).

does not have a stable model when both probabilistic facts are true (we obtain
the pattern p :− not p). Note that even in this case the resulting normal logic
program has a well-founded semantics (assigning undefined to shaves(bob, bob)). □
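The inconsistency can be confirmed by a small script: under the total choice where both probabilistic facts are true, the grounded program (with shaves(bob, bob) as the single relevant shaves atom) has no stable model. The brute-force test below uses illustrative names and is only meant for this tiny grounding:

```python
from itertools import product

def least_model(rules):
    """Least model of a definite program (rules are (head, positive_body))."""
    m, changed = set(), True
    while changed:
        changed = False
        for h, pos in rules:
            if pos <= m and h not in m:
                m.add(h)
                changed = True
    return m

def has_stable_model(rules, hb):
    """Brute-force stable-model existence test for a tiny propositional program."""
    for bits in product([False, True], repeat=len(hb)):
        interp = {a for a, b in zip(sorted(hb), bits) if b}
        reduct = [(h, set(p)) for h, p, n in rules if not (set(n) & interp)]
        if least_model(reduct) == interp:
            return True
    return False

# Grounding for X = Y = bob: shaves :- barber, villager, not shaves.
hb = {"barber", "villager", "shaves"}
base = [("shaves", ["barber", "villager"], ["shaves"])]
for b, v in product([False, True], repeat=2):
    facts = [(a, [], []) for a, on in (("barber", b), ("villager", v)) if on]
    print((b, v), has_stable_model(base + facts, hb))
# Only the total choice (True, True) has no stable model.
```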


4    The structure of credal semantics
Given the generality of plps, one might think that credal sets generated by the
credal semantics can have an arbitrarily complex structure. Surprisingly, the
structure of the credal semantics of a plp is a relatively simple object:

Theorem 1. Given a consistent probabilistic logic program, its credal semantics
is a set of probability measures that dominate an infinitely monotone Choquet
capacity.

    Before we present a proof of this theorem, let us pause and define a few
terms. An infinitely monotone Choquet capacity is a set function P̲ from an
algebra A on a set Ω to the real interval [0, 1] such that [2, Definition 4.2]:
P̲(Ω) = 1 − P̲(∅) = 1 and, for any A1 , . . . , An in the algebra,

      P̲(∪i Ai ) ≥ Σ_{∅≠J⊆{1,...,n}} (−1)^{|J|+1} P̲(∩_{j∈J} Aj ).

Infinitely monotone Choquet capacities appear in several formalisms; for instance,
they are the belief functions of Dempster-Shafer theory [31], and summaries of
random sets [21]. Given a set function P̲, we can construct the set of measures
that dominate P̲ as {P : ∀A ∈ A : P(A) ≥ P̲(A)}. We abuse language and say
that this set itself is infinitely monotone. If a credal set K is infinitely monotone,
then the lower probability P̲, defined as P̲(A) = inf P∈K P(A), is the generating
infinitely monotone Choquet capacity. We also have the upper probability
P̄(A) = supP∈K P(A) = 1 − P̲(A^c ).
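The n-monotonicity inequality can be verified numerically on a small random set. The sketch below (illustrative only) builds a mass assignment over focal sets, derives the induced lower probability, and checks the inequality for n = 3 over all triples of events:

```python
from itertools import combinations, product

omega = frozenset({1, 2, 3})
subsets = [frozenset(s) for r in range(4) for s in combinations(sorted(omega), r)]

# A tiny random set: focal sets and their masses (they sum to one).
mass = {frozenset({1}): 0.5, frozenset({2, 3}): 0.3, frozenset({1, 2, 3}): 0.2}

def lower(event):
    """Belief/lower probability induced by the mass assignment."""
    return sum(p for focal, p in mass.items() if focal <= event)

def intersection(sets):
    out = omega
    for s in sets:
        out = out & s
    return out

# Check the monotonicity inequality for n = 3 over every triple of events.
ok = True
for events in product(subsets, repeat=3):
    lhs = lower(events[0] | events[1] | events[2])
    rhs = sum(
        (-1) ** (r + 1) * lower(intersection(js))
        for r in (1, 2, 3)
        for js in combinations(events, r)
    )
    ok = ok and lhs >= rhs - 1e-12
print(ok)  # → True
```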

Proof (Theorem 1). Consider a set Θ containing as states the possible total
choices of the probabilistic facts. Over this space we have a product measure that
is completely specified by the probabilities attached to the probabilistic facts.
Now consider a multi-valued mapping Γ between Θ and the space Ω of all possible
models of our probabilistic logic program. For each element θ ∈ Θ, define Γ(θ) to
be the set of stable models associated with the total choice θ of the probabilistic
facts. Now we use the fact that a probability space and a multi-valued mapping
induce an infinitely monotone Choquet capacity over the range of the mapping
(that is, over Ω) [21].                                                          □
    Infinitely monotone credal sets have several useful properties; for one thing
they are closed and convex. Convexity here means that if P1 and P2 are in the
credal set, then αP1 + (1 − α)P2 is also in the credal set for α ∈ [0, 1]. Thus, as
illustrated by Example 3:
Corollary 1. Given a consistent probabilistic logic program, its credal semantics
is a closed and convex set of probability measures.
    There are several additional results concerning the representation of infinitely
monotone capacities using their Möbius transforms [2,31]; we refrain from men-
tioning every possible corollary we might produce by rehashing those results.
Instead, we focus on a few important results that can be used to great effect
in future applications. First, as we have a finite Herbrand base, we can use the
symbols in the proof of Theorem 1 to write, for any event A [2, Section 5.3.2]:
      P̲(A) = Σ_{θ∈Θ: Γ(θ)⊆A} P(θ) ,        P̄(A) = Σ_{θ∈Θ: Γ(θ)∩A≠∅} P(θ) .      (3)
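Expression (3) is straightforward to implement once the mapping Γ is available. Using the total choices and stable models of Example 3 (with hypothetical encodings of the models), the lower and upper probabilities of single(dilbert) come out as 0 and 0.9:

```python
# Total choices of 0.9 :: man(dilbert) with their stable models (Example 3):
s1 = frozenset()                        # man(dilbert) false
s2 = frozenset({"man", "husband"})
s3 = frozenset({"man", "single"})
gamma = [(0.1, {s1}), (0.9, {s2, s3})]  # pairs (P(θ), Γ(θ))

def lower(event):
    """Sum P(θ) over total choices with Γ(θ) inside the event."""
    return sum(p for p, g in gamma if g <= event)

def upper(event):
    """Sum P(θ) over total choices with Γ(θ) meeting the event."""
    return sum(p for p, g in gamma if g & event)

single = {m for m in (s1, s2, s3) if "single" in m}
print(lower(single), upper(single))  # → 0 0.9
```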
This property directly leads to an inference algorithm, described later. In fact,
for infinitely monotone credal sets we can find simple expressions for lower and
upper conditional probabilities (that is, the infimum and supremum of conditional
probabilities): if A and B are events, then the lower and upper probabilities of A
given B are (where the superscript c denotes complement) [2]:

   P̲(A|B) = P̲(A ∩ B) / (P̲(A ∩ B) + P̄(A^c ∩ B))   and
   P̄(A|B) = P̄(A ∩ B) / (P̄(A ∩ B) + P̲(A^c ∩ B)) .                               (4)
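Expression (4) can likewise be applied to Example 3: conditioning on man(dilbert), the probability of single(dilbert) remains completely indeterminate. The sketch below (illustrative names; it assumes the denominators in Expression (4) are nonzero) computes the conditional bounds:

```python
# Example 3 encoded as a random set: total choices θ, masses P(θ), focal sets Γ(θ).
s1 = frozenset()                        # man(dilbert) false
s2 = frozenset({"man", "husband"})
s3 = frozenset({"man", "single"})
gamma = [(0.1, {s1}), (0.9, {s2, s3})]
universe = {s1, s2, s3}

def low(e):
    return sum(p for p, g in gamma if g <= e)

def up(e):
    return sum(p for p, g in gamma if g & e)

def cond_bounds(a, b):
    """Lower/upper conditional probability of event a given event b.

    Hypothetical helper; assumes both denominators are nonzero.
    """
    ac = universe - a
    lo = low(a & b) / (low(a & b) + up(ac & b))
    hi = up(a & b) / (up(a & b) + low(ac & b))
    return lo, hi

a = {s3}        # single(dilbert) is true
b = {s2, s3}    # man(dilbert) is true
print(cond_bounds(a, b))  # → (0.0, 1.0)
```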
We also note that the computation of lower and upper expected values, and
even conditional expected values, with respect to infinitely monotone Choquet
capacities admits relatively simple expressions [34].
    It is convenient to pause here and mention the constraint logic programming
language of Michels et al. [20], a significant contribution that is based on credal
sets (even though it uses a different syntax and semantics, for instance allowing
continuous variables but not allowing multiple stable models per total choice).
In that work, expressions for (conditional) lower and upper probabilities are
derived; those expressions directly mirror the ones presented in this section.

5     The complexity of inferences with credal semantics
In this section we focus on the computation of P̲(Q) and P̄(Q), where Q is a
conjunction of truth-value assignments to ground atoms in the plp's Herbrand
base. Recall that we preclude functions, so as to stay with finite Herbrand bases.
Hence a direct translation of Expression (3) into an algorithm leads to:

 • Given a plp ⟨P, PF⟩ and a query Q, initialize a and b with 0.
 • For each total choice θ of the probabilistic facts, compute the set S of all stable
   models of P ∪ PF↓θ , and:
          if Q is true in every stable model in S, then a ← a + P(θ);
          if Q is true in some stable model in S, then b ← b + P(θ).
 • Return a and b (the value of a is P̲(Q); the value of b is P̄(Q)).
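The three steps above can be sketched end-to-end. Applied to a hand-grounded version of Example 5, the procedure recovers the bounds on wins(b) reported earlier; the enumeration is brute force, so this is an illustration rather than a practical solver:

```python
from itertools import product

def least_model(rules):
    m, changed = set(), True
    while changed:
        changed = False
        for h, pos in rules:
            if pos <= m and h not in m:
                m.add(h)
                changed = True
    return m

def stable_models(rules, hb):
    out = []
    for bits in product([False, True], repeat=len(hb)):
        interp = {a for a, b in zip(sorted(hb), bits) if b}
        reduct = [(h, set(p)) for h, p, n in rules if not (set(n) & interp)]
        if least_model(reduct) == interp:
            out.append(interp)
    return out

# Example 5, hand-grounded for the moves that actually occur.
base = [
    ("wins(a)", ["move(a,b)"], ["wins(b)"]),
    ("wins(b)", ["move(b,a)"], ["wins(a)"]),
    ("wins(b)", ["move(b,c)"], ["wins(c)"]),
    ("wins(c)", ["move(c,d)"], ["wins(d)"]),
]
facts = ["move(a,b)", "move(b,a)", "move(b,c)"]
pfact, alpha = "move(c,d)", 0.3
hb = {h for h, _, _ in base} | {"wins(d)", pfact} | set(facts)

def bounds(query):
    """Lower/upper probability of a query atom by total-choice enumeration."""
    lo = hi = 0.0
    for chosen in (False, True):
        p = alpha if chosen else 1.0 - alpha
        rules = base + [(f, [], []) for f in facts + ([pfact] if chosen else [])]
        sms = stable_models(rules, hb)
        if all(query in m for m in sms):  # cautious: true in every stable model
            lo += p
        if any(query in m for m in sms):  # brave: true in some stable model
            hi += p
    return lo, hi

print(bounds("wins(b)"))  # lower 0.7, upper 1.0 (up to float rounding)
```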

Note that to find whether Q is true in every stable model of a program, we
must run cautious inference, and to find whether Q is true in some stable model
of a program, we must run brave inference; these logical procedures have been
studied in depth in the literature [11].
    In fact the algorithm above has already been derived by Calì et al. [4], using
clever optimization techniques (Calì et al. actually obtain lower and upper con-
ditional probabilities by combining Expressions (4) and (3)). The advantage of
our approach is that the algorithm is a transparent consequence of known facts
about capacities; as Calì et al. have already presented the algorithm, we do not
dwell on it. Rather, we wish to focus on the specific question of how complex
the computation of the lower probability P̲(Q) is.
    To be precise: as input, we have a plp whose probabilities are rational num-
bers, and an assignment Q to some ground atoms in the (finite) Herbrand base;
as output, we want (the rational number) P̲(Q). We refer to this complexity as
the inferential complexity of plps.
    One may also be interested in the complexity of inferences when the plp
is fixed, and the only input is the query Q. This is the query complexity of
the program: the input is Q, and the output is P̲(Q). This concept is based on analogous
concepts found in database theory [5]: one may face situations where the program
may be small compared to the query, and query complexity is the concept of
practical interest.
    So, we are ready to state our main results on complexity.

Theorem 2. In this theorem, assume that input plps are consistent. Inferential
complexity is #NP-equivalent for propositional plps, and #P-equivalent when
restricted to stratified propositional plps. For plps where all predicates have a
bound on arity, inferential complexity is #NP^NP -equivalent, and #NP-equivalent
when restricted to stratified programs. Query complexity is #P-hard; it is in
#NP (up to multiplication by a polynomial number), and is #P-equivalent when
restricted to stratified programs.

    Before we prove this theorem, we must define a number of terms that appear
in it. We deal with usual complexity classes such as NP, and with the concept of
oracles [24]. The polynomial hierarchy is the class of languages ∪i ∆iP = ∪i ΠiP =
∪i ΣiP , where ∆iP is P with a Σ(i−1)P oracle, ΣiP is NP with a Σ(i−1)P oracle,
ΠiP = coΣiP , and Σ0P = P. The com-
plexity class #P is the class of integer-valued functions computed by a counting
Turing machine in polynomial time; a counting Turing machine is a standard
non-deterministic Turing machine that prints in binary notation, on a separate
tape, the number of accepting computations induced by the input [32]. Similarly

to the polynomial hierarchy, the counting hierarchy is defined by oracles. The
class #ΣkP contains the integer-valued functions computed by a counting Turing
machine with access to a ΣkP oracle. The counting hierarchy is the union of all
#ΣkP (over all integers k). The problems with which we are concerned involve
the computation of rational numbers and do not precisely fit into the class of #P
problems and its extensions, so we resort to weighted reductions [3], which are
parsimonious reductions (Karp reductions that preserve the number of accepting
paths) scaled by a positive rational number. The canonical complete problem
for the class #ΣkP via parsimonious reductions is #Πk SAT [10], which gets, as
input, a formula ϕ(Y1 , . . . , Yn ) = ∀ X1 ∃ X2 . . . Qk Xk ψ(X1 , . . . , Xk , Y1 , . . . , Yn ),
where ψ is a CNF formula over variables X1 , . . . , Xk , Y1 , . . . , Yn , and produces,
as output, the number of assignments to the variables Y1 , . . . , Yn that make the
formula ϕ evaluate to true. For k odd, the problem remains complete even if the
formula is specified in disjunctive normal form [10]. For a complexity class #C
of integer-valued functions computed by some (oracle) counting Turing machine,
we say that a problem is #C-hard if any problem in #C can be reduced to it
by a weighted reduction. If a problem is #C-hard and can be solved with one
call to a #C oracle and a multiplication by a rational obtained with polynomial
effort, then the problem is said to be #C-equivalent (as inspired by [8]).

Proof (Theorem 2). All results for stratified programs are available in a compan-
ion publication on the complexity of distribution semantics [6]. So the remainder
of this proof focuses on the general (possibly cyclic) case.
For propositional programs: Membership follows as cautious logical reasoning is
coNP-complete [11, Table 2]; inference is obtained by counting the results of
cautious inference, so we obtain membership in #NP. Hardness is shown by a
reduction from #Π1 SAT: count the number of assignments to x1 , . . . , xn such
that ∀y1 , . . . , ym : ϕ(x1 , . . . , xn , y1 , . . . , ym ), for ϕ a CNF formula with clauses
c1 , . . . , ck . Write rules ti :− not fi . and fi :− not ti . for every variable yi (thus
there are 2^m stable models running through the assignments of y1 , . . . , ym ).
For each xi introduce an atom xi and a probabilistic fact 0.5 :: xi . For every
clause cj write cj :− ℓ. for each of the literals in cj , where ℓ is xi or not xi or
ti or fi respectively if the literal is xi or ¬xi or yi or ¬yi . Finally, write
sat :− c1 , . . . , ck . The probability of a total choice of x1 , . . . , xn adds to P̲(sat)
iff every stable model satisfies all clauses. Hence the desired count is 2^n P̲(sat).
For programs where predicates have bounded arity: Membership follows as cau-
tious logical reasoning is Π2P -complete [11, Table 5]; inference is obtained by
counting the results of cautious inference, so we obtain membership in #NP^NP .
Hardness is shown by a reduction from #Π2 SAT: count the number of assignments
to x1 , . . . , xn such that ∀y1 , . . . , ym ∃z1 , . . . , zs : ϕ(x1 , . . . , xn , y1 , . . . , ym , z1 , . . . , zs ),
where ϕ is a 3-CNF formula with clauses c1 , . . . , ck . The reduction combines the
previous reduction with the one used by Eiter et al. [11]. Introduce, as before,
predicates xi , ti and fi , probabilistic facts 0.5 :: xi , and rules ti :− not fi . and
fi :− not ti .. Introduce predicates cj , but now the arity of cj equals the num-
ber of variables zi in cj . Denote by Zi a logical variable that corresponds to
the propositional variable zi , and by Zj the logical variables corresponding to
propositional variables zi appearing in clause cj . For each clause cj , go over the
possible assignments zj of the propositional variables associated with Zj . If the
assignment makes the clause true, add the fact cj (zj ).. If instead the assignment
leaves the clause dependent on some propositional variables xi or yi , then for
every such literal introduce the rule cj (zj ) :− ℓ., where ℓ is xi or not xi or ti
or fi exactly as in the previous reduction. Finally, introduce
sat :− c1 (Z1 ), . . . , ck (Zk ).. As before, the desired count is 2^n P̲(sat).
Concerning query complexity, membership follows from the fact that the data
complexity of cautious logical reasoning is coNP [7, Theorem 5.8].                □
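The propositional reduction in the proof can be exercised on a toy #Π1 SAT instance. For the formula ∀y1 : (x1 ∨ y1), only the assignment x1 = true satisfies it, and the constructed plp indeed yields 2^1 · P̲(sat) = 1. The sketch below follows the reduction, with brute-force stable-model enumeration:

```python
from itertools import product

def least_model(rules):
    m, changed = set(), True
    while changed:
        changed = False
        for h, pos in rules:
            if pos <= m and h not in m:
                m.add(h)
                changed = True
    return m

def stable_models(rules, hb):
    out = []
    for bits in product([False, True], repeat=len(hb)):
        interp = {a for a, b in zip(sorted(hb), bits) if b}
        reduct = [(h, set(p)) for h, p, n in rules if not (set(n) & interp)]
        if least_model(reduct) == interp:
            out.append(interp)
    return out

# Reduction for the #Pi_1 SAT instance "forall y1: (x1 or y1)":
#   t1/f1 guess y1; c1 holds if the clause is satisfied; sat needs all clauses.
base = [
    ("t1", [], ["f1"]), ("f1", [], ["t1"]),
    ("c1", ["x1"], []), ("c1", ["t1"], []),
    ("sat", ["c1"], []),
]
hb = {"x1", "t1", "f1", "c1", "sat"}

lower_sat = 0.0
for x1 in (False, True):
    rules = base + ([("x1", [], [])] if x1 else [])
    sms = stable_models(rules, hb)
    if all("sat" in m for m in sms):   # cautious reasoning
        lower_sat += 0.5               # each total choice has probability 0.5
print(2 ** 1 * lower_sat)  # the number of satisfying x-assignments, here 1
```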
    We leave to future work a number of questions. First, we conjecture that
the complexity of computing P̄(Q) is the same as that of computing P̲(Q) (as we
only need to change from cautious to brave reasoning). Also, we did not present
results on the complexity of probabilistic logic programs without a bound on
arity (without such a bound, logical cautious reasoning is coNEXP-complete,
so we conjecture that exponentially bounded counting Turing machines will be
needed here). Finally, our complexity results were obtained assuming that plps
were consistent: of course, in practice one must consider the problem of checking
consistency. We just indicate a few facts about consistency checking:
Proposition 1. Consistency checking is in Π2P for propositional plps and in
Π3P for plps where predicates have a bound on arity.
Proof. Consistency checking of a propositional plp amounts to verifying that
every total choice of the probabilistic facts yields a normal logic program with
at least one stable model; because logical consistency checking for this language
is NP-complete [11, Table 1], we obtain membership in Π2P . Analogous reasoning
leads to Π3P , as logical consistency checking with bounded arity is Σ2P -complete
[11, Table 4]. □

6     Discussion: moving to answer set programming
We have studied the credal semantics for probabilistic logic programs, as pro-
posed by Lukasiewicz [19]. This semantics is quite attractive and is not as well
known as it deserves to be. We have shown that the credal semantics of a plp is an
infinitely monotone credal set, and we have derived novel complexity results (using
non-trivial classes in the counting hierarchy). In future work we plan to look at
consistency checking, and also at the consequences of allowing functions. More-
over, we must include in the analysis a number of useful constructs adopted
by answer set programming [12]. There, classic negation, such as ¬wins(X), is
allowed on top of not. Also, constraints, such as :− φ, are allowed to mean that
φ is false. More substantial is the presence, in answer set programming, of dis-
junctive heads. With such a machinery, we can for instance rewrite the rules in
Example 3 as a single rule single(X) ∨ husband(X) :− man(X)., and the rules in
Example 4 as the pair:
     coloredBy(V, red) ∨ coloredBy(V, yellow) ∨ coloredBy(V, green) :− vertex(V ).
                   :− edge(V, U ), coloredBy(V, C), coloredBy(U, C).

   Now the point to be made is this. Suppose we have a probabilistic logic
program ⟨P, PF⟩, where as before we have independent probabilistic facts, but
where P is now a logic program with classical negation, constraints, and disjunctive
heads, and P is consistent in that it has stable models for every total choice of
probabilistic facts. The proof of Theorem 1 can be reproduced in this setting,
and hence the credal semantics (the set of measures over stable models) of these
probabilistic answer set programs is again an infinite monotone credal set. The
complexity of inference with these constructs is left for future investigation.


Acknowledgements
The first author is partially supported by CNPq, grant 308433/2014-9. The sec-
ond author received financial support from the São Paulo Research Foundation
(FAPESP), grant 2016/01055-1.


References
 1. K. R. Apt and M. Bezem. Acyclic programs. New Generation Computing, 9:335–
    363, 1991.
 2. T. Augustin, F. P. A. Coolen, G. de Cooman, and M. C. M. Troffaes. Introduction
    to Imprecise Probabilities. Wiley, 2014.
 3. A. Bulatov, M. Dyer, L. A. Goldberg, M. Jalsenius, M. Jerrum, and D. Richerby.
    The complexity of weighted and unweighted #CSP. J. of Computer and System
    Sciences, 78:681–688, 2012.
 4. A. Calì, T. Lukasiewicz, L. Predoiu, and H. Stuckenschmidt. Tightly coupled
    probabilistic description logic programs for the semantic web. In J. on Data Semantics
    XII, pages 95–130. Springer, 2009.
 5. F. G. Cozman and D. D. Mauá. Bayesian networks specified using propositional
    and relational constructs: Combined, data, and domain complexity. In AAAI Conf.
    on Artificial Intelligence, 2015.
 6. F. G. Cozman and D. D. Mauá. Probabilistic graphical models specified by prob-
    abilistic logic programs: semantics and complexity. In Int. Conf. on Probabilistic
    Graphical Models, 2016.
 7. E. Dantsin, T. Eiter, and A. Voronkov. Complexity and expressive power of logic
    programming. ACM Computing Surveys, 33(3):374–425, 2001.
 8. C. P. de Campos, G. Stamoulis, and D. Weyland. A structured view on weighted
    counting with relations to quantum computation and applications. Technical Re-
    port 133, Electronic Colloquium on Computational Complexity, 2013.
 9. L. de Raedt, P. Frasconi, K. Kersting, and S. H. Muggleton. Probabilistic Inductive
    Logic Programming. Springer, 2010.
10. A. Durand, M. Hermann, and P. G. Kolaitis. Subtractive reductions and com-
    plete problems for counting complexity classes. Theoretical Computer Science,
    340(3):496–513, 2005.
11. T. Eiter, W. Faber, M. Fink, and S. Woltran. Complexity results for answer set
    programming with bounded predicate arities and implications. Annals of Mathe-
    matics and Artificial Intelligence, 51:123–165, 2007.
12. T. Eiter, G. Ianni, and T. Krennwallner. Answer set programming: a primer. In
    Reasoning Web, pages 40–110. Springer-Verlag, 2009.

13. D. Fierens, G. van den Broeck, J. Renkens, D. Shterionov, B. Gutmann, G.
    Janssens, and L. de Raedt. Inference and learning in probabilistic logic programs
    using weighted Boolean formulas. Theory and Practice of Logic Programming,
    15(3):358–401, 2015.
14. M. Gelfond and V. Lifschitz. The stable model semantics for logic programming.
    In Int. Logic Programming Conf. and Symp., pages 1070–1080, 1988.
15. S. Hadjichristodoulou and D. S. Warren. Probabilistic logic programming with
    well-founded negation. In Symp. on Multiple-Valued Logic, pages 232–237, 2012.
16. S. L. Lauritzen and D. J. Spiegelhalter. Local computations with probabilities on
    graphical structures and their application to expert systems. J. Royal Statistical
    Society B, 50(2):157–224, 1988.
17. T. Lukasiewicz. Probabilistic logic programming. In European Conf. on Artificial
    Intelligence, pages 388–392, 1998.
18. T. Lukasiewicz. Probabilistic description logic programs. In European Conf. on
    Symbolic and Quantitative Approaches to Reasoning with Uncertainty, pages 737–
    749, Barcelona, Spain, July 2005. Springer.
19. T. Lukasiewicz. Probabilistic description logic programs. Int. J. of Approximate
    Reasoning, 45(2):288–307, 2007.
20. S. Michels, A. Hommersom, P. J. F. Lucas, and M. Velikova. A new probabilistic
    constraint logic programming language based on a generalised distribution semantics.
    Artificial Intelligence Journal, 228:1–44, 2015.
21. I. Molchanov. Theory of Random Sets. Springer, 2005.
22. R. Ng and V. S. Subrahmanian. Probabilistic logic programming. Information and
    Computation, 101(2):150–201, 1992.
23. L. Ngo and P. Haddawy. Answering queries from context-sensitive probabilistic
    knowledge bases. Theoretical Computer Science, 171(1–2):147–177, 1997.
24. C. H. Papadimitriou. Computational Complexity. Addison-Wesley, 1994.
25. D. Poole. Probabilistic Horn abduction and Bayesian networks. Artificial Intelli-
    gence, 64:81–129, 1993.
26. D. Poole. The Independent Choice Logic and beyond. In Probabilistic Inductive
    Logic Programming, pages 222–243. Springer, 2008.
27. F. Riguzzi. The distribution semantics is well-defined for all normal programs. In
    Int. Workshop on Probabilistic Logic Programming, pages 69–84, 2015.
28. T. Sato. A statistical learning method for logic programs with distribution seman-
    tics. In Int. Conf. on Logic Programming, pages 715–729, 1995.
29. T. Sato and Y. Kameya. Parameter learning of logic programs for symbolic-
    statistical modeling. J. of Artificial Intelligence Research, 15:391–454, 2001.
30. T. Sato, Y. Kameya, and N.-F. Zhou. Generative modeling with failure in PRISM.
    In Int. Joint Conf. on Artificial Intelligence, pages 847–852, 2005.
31. G. Shafer. A Mathematical Theory of Evidence. Princeton University Press, 1976.
32. L. G. Valiant. The complexity of enumeration and reliability problems. SIAM J.
    of Computing, 8(3):410–421, 1979.
33. A. van Gelder, K. A. Ross, and J. S. Schlipf. The well-founded semantics for general
    logic programs. Journal of the ACM, 38(3):620–650, 1991.
34. L. Wasserman and J. B. Kadane. Computing bounds on expectations. J. of the
    American Statistical Association, 87(418):516–522, 1992.