=Paper=
{{Paper
|id=Vol-1661/paper-01
|storemode=property
|title=The Structure and Complexity of Credal Semantics
|pdfUrl=https://ceur-ws.org/Vol-1661/paper-01.pdf
|volume=Vol-1661
|authors=Fabio Cozman,Denis Mauá
|dblpUrl=https://dblp.org/rec/conf/ilp/CozmanM16
}}
==The Structure and Complexity of Credal Semantics==
The Structure and Complexity of Credal Semantics Fabio G. Cozman and Denis D. Mauá Universidade de São Paulo, Brazil Abstract. A flexible semantics has been proposed by Lukasiewicz for probabilistic logic programs where we have a normal logic program aug- mented with a set of independent probabilistic facts. That semantics, which we call credal semantics, is the set of all probability measures (over stable models) that are consistent with a total choice of proba- bilistic facts. When each total choice produces a definite program, credal semantics is identical to Sato’s distribution semantics. However, credal semantics is also defined for programs with cycles and negations. We show that the credal semantics always defines a set containing the prob- ability measures that dominate an infinite monotone Choquet capacity (also known as a belief function). We also show how this result leads to inference algorithms and to an analysis of the complexity of inferences. Keywords: Probabilistic logic programming, answer sets, stable model semantics, complexity theory. 1 Introduction The term “probabilistic logic program” encompasses a significant number of distinct syntactic proposals [9,17,22,23]. One particularly successful proposal is jointly represented by Poole’s probabilistic Horn abduction [25] and Sato’s distribution semantics [28]. There the idea is that a logic program is enlarged with probabilistic facts that are associated with probabilities and that are usually assumed stochastically independent. For instance, we may have a rule: up :− actionUp, not disturbanceUp. with P(disturbanceUp = true) = 0.1. Depending on disturbanceUp, actionUp may succeed or not in leading to up. Poole and Sato emphasized acyclic logic programs [25,26,28,29], even though Sato did handle cyclic ones. Since then, there has been work on cyclic proba- bilistic logic programs under variants of the distribution semantics [15,18,27,30]. In particular, a semantics for (acyclic and cyclic) probabilistic logic programs can be extracted from the work of Lukasiewicz on probabilistic description log- ics [18,19]. His proposal is that a probabilistic logic program defines a set of probability measures: for a fixed truth assignment over probabilistic facts, we take the set of all probabilities over stable models of the resulting normal logic program. This approach may seem a little complex, but it stays within two- valued logic and is directly related to answer set programming languages. 4 Lukasiewicz refers to his approach as the “answer set semantics” for proba- bilistic logic programs. However, as the approach is certainly distinct from the usual answer set semantics (which does not employ probabilities), and as the approach certainly applies in the presence of functions (usually not allowed in answer set programming), we adopt the name credal semantics to refer to it. In this paper we study the structure and complexity of the credal seman- tics. We show that the semantics of a consistent probabilistic logic program is, surprisingly, the set of dominating measures of an infinitely monotone Choquet capacity. Such capacities are relatively simple objects that appear in several fields [21,31]. We obtain a straightforward derivation for inference algorithms, and we study the complexity of inferences, obtaining interesting novel results. The paper is organized as follows. Basic terminology is reviewed in Section 2, and the credal semantics is presented in Section 3. The structure of the credal semantics is derived in Section 4, and its consequences concerning inference algorithms and complexity are discussed in Section 5. 2 A very short review: some syntax, some semantics An atom is written as r(t1 , . . . , tk ), where r is a predicate and each ti is a term; that is, each ti is a constant or a logical variable. A normal logic program is a finite set of rules such as A0 :− A1 , . . . , Am , not Am+1 , . . . , not An ., where each Ai is an atom. Atom A0 is the head of the rule; the right hand side is the body. The body consists of a set of subgoals separated by commas (A1 is a subgoal, not An is a subgoal). If there are no subgoals, the rule is called a fact and written simply as A0 .. Programs without not are definite. An atom is ground if it does not contain any logical variable; a program without logical variables is propositional. The Herbrand base of a program is the set of all ground atoms built from constants and predicates mentioned in the program. By grounding every rule with atoms in the Herbrand base, we obtain the (propositional) grounding of a program. The grounded dependency graph of a normal logic program is a directed graph where each grounded predicate is a node, and where there is an edge from node B to node C if there is a grounded rule where C appears in the head and B appears in the body; if B is preceded by not in the grounded rule, then the edge between B and C is negative. A program is acyclic when its grounded dependency graph is acyclic. An interpretation for a normal logic program P is a subset of the Herbrand base of P; the idea is that atoms in the interpretation are true, and atoms not in the interpretation are false. A model is an interpretation such that every grounded rule is satisfied (that is, either the head is true or there is a subgoal that is false in the interpretation). In this paper we do not allow functions to be used in programs, so as to stay within finite Herbrand bases. We leave the study of functions to future work. One strategy to define the semantics of normal logic programs is to translate programs into some well-known logic, using a completion, and to take the models of the completion as the semantics. Another strategy is to select some “canonical” models as “the” semantics. Two canonical models have attracted most attention 5 due to their favorable properties: the well-founded [33] and the stable model semantics [14]. In this paper we are concerned with the latter (stable model) semantics. To define it, take normal logic program P. For interpretation I, define the reduct PI to be the definite program obtained by (i) grounding P, (ii) removing all rules that contain a subgoal not A in the body such that A is in I, (iii) removing all remaining subgoals of the form not A from the remaining rules. An interpretation I is a stable model if I is a minimal model for PI (minimal according to set inclusion). Note that a definite program always has a single minimal model, so the whole construction is well defined. An answer set is exactly a stable model. 3 Probabilistic logic programs A probabilistic logic program is a pair hP, PFi where P is a normal logic program and PF is a set of probabilistic facts (we abbreviate “probabilistic logic program” by plp). A probabilistic fact is an atom A that does not unify with any rule head (hence with any fact either), and that is associated with a probability assessment P(A) = α. We write a probabilistic fact as α :: A., where we have adopted the syntax of the ProbLog system1 [13]. If A is a ground atom, then α :: A is a ground probabilistic fact. A probabilistic fact with logical variables is interpreted through its groundings (for instance, α :: r(X1 , . . . , Xm ) is interpreted as the set of α :: r(a1 , . . . , ak ) for all groundings of r). A truth assignment for all grounded probabilistic facts in PF is a total choice. If θ is a total choice for PF, we denote by PF↓θ the set of atoms assigned true by θ. We assume that all probabilistic facts are independent, so the probability of total choice θ = {A1 = t1 , . . . , Ak = tk }, where each ti is true or false, is P(θ) = P(A1 = t1 , . . . , Ak = tk ) = P(A1 = t1 ) × · · · × P(An = tn ) , (1) where P(Ai = true) = αi and P(Ai = false) = 1 − αi given αi :: Ai . Once a total choice is fixed, we can form a normal logic program consisting of P ∪ PF↓θ . Poole’s and Sato’s original work focused respectively on acyclic and definite programs, and for these programs the semantics of P ∪ PF↓θ is rather uncontroversial, and is given by their unique stable model. Hence, every such probabilistic logic program induces a unique probability distribution over all atoms. For cyclic programs, Sato, Kameya and Zhou later adopted Fitting’s three-valued semantics for each fixed total choice [30], while Hadjichristodoulou and Warren [15] and Riguzzi [27] have explored the well-founded semantics. An alternative semantics can be extracted from the work on probabilistic description logic programs by Lukasiewicz [19], as noted in Section 1. To describe that proposal, a few definitions are needed. A plp hP, PFi is consistent if there is at least one stable model for each fixed total choice of PF. A probability model for a consistent plp hP, PFi is a probability measure P over interpretations of P, such that: 1 Available at https://dtai.cs.kuleuven.be/problog/index.html. 6 (i) every interpretation I with P(I) > 0 is a stable model of P ∪ PF↓θ for the total choice θ that agrees with I on the probabilistic facts; and (ii) for every total choice θ = {A1 = t1 , . . . , Ak = tk }, where ti is true or false, Qk we have P(θ) = i=1 P(Ai = ti ). The set of all probability models for a plp is the semantics of the program. Lukasiewicz calls his proposed semantics the answer set semantics for prob- abilistic description logic programs. As we mentioned before, we prefer the term credal semantics, which we adopt. The reason for this latter name is that a set of probability measures is often called a credal set [2]. Now given a consistent plp, we may be interested in the smallest possible value of P(A) for some ground atom A, with respect to the set K of all probability models of the plp. This is conveyed by the lower probability of A, P(A) = inf P∈K P(A). Similarly, we have the upper probability of A, P(A) = supP∈K P(A). It is perhaps time for a few much needed examples. We start with two ex- amples where each total choice produces a single stable model, and then move to two examples where some total choices lead to several stable models. Example 1. First, consider an acyclic probabilistic logic program hP, PFi. When P is acyclic, each total choice for PF yields an acyclic program (with a unique stable model that is also its well-founded semantics [1]), hence a single probability model is obtained (a singleton credal set). For example, consider a fragment of the famous Bayesian network Asia [16] with three atoms, smoking, cancer, and bronchitis, with P(smoking) = 1/2, P(cancer|smoking) = 1/10, P(cancer|¬smoking) = 1/100, P(bronchitis|smoking) = 3/5, and P(bronchitis|¬smoking) = 3/10. These probabilities can be expressed through the following plp: 0.5 :: smoking. 0.1 :: a1. 0.01 :: a2. 0.6 :: a3. 0.3 :: a4. cancer :− smoking, a1. cancer :− not smoking, a2. bronchitis :− smoking, a3. bronchitis :− not smoking, a4. This plp has a single probability model, yielding for instance P(smoking|cancer) = P(smoking|cancer) = 1/11. In fact, any Bayesian network over binary variables can be encoded using an acyclic probabilistic logic program [25,26]. 2 Example 2. Sometimes a cyclic probabilistic logic program hP, PFi still has a single probability model. This happens for instance if P is locally stratified; that is, if its grounded dependency graph has no cycles containing a negative edge. If P is a locally stratified program, then any total choice of probabilistic facts produces a locally stratified normal logic program. And for locally stratified programs, most existing semantics, and particularly the well-founded and the stable model semantics, coincide and produce a single stable model. For example, consider a plp where p means “path” and e means “edge” (based on an example in the ProbLog distribution): p(X, Y ) :− e(X, Y ). p(X, Y ) :− e(X, Y ), p(X, Y ). 0.6 :: e(1, 2). 0.1 :: e(1, 3). 0.4 :: e(2, 5). 0.3 :: e(2, 6). (2) 0.3 :: e(3, 4). 0.8 :: e(4, 5). 0.2 :: e(5, 6). 7 1 3 5 1 3 5 1 3 5 1 3 5 2 4 2 4 2 4 2 4 Fig. 1. Graph described in Example 4 (left), and the stable models. That is, we have a random graph with nodes 1, . . . , 6, and probabilities attached to edges. The query P(p(1, 6) = true) yields the probability that there is a path between nodes 1 and 6. Using ProbLog one obtains P(p(1, 6) = true) = 0.217. 2 Example 3. A common cyclic pattern in the literature is the pair of rules a :− not b. and b :− not a. [33]. Here is a more elaborated version, adapted from Ref. [12]. Take probabilistic fact 0.9 :: man(dilbert). and rules: single(X) :− man(X), not husband(X). husband(X) :− man(X), not single(X). When man(dilbert) is false, there is a single stable model s1 = {man(dilbert) = false, husband(dilbert) = false, single(dilbert) = false}. When man(dilbert) is true, there are two stable models, s2 = {man(dilbert) = true, husband(dilbert) = true, single(dilbert) = false}, and s3 = {man(dilbert) = true, husband(dilbert) = false, single(dilbert) = true}. Now, there is a probability model (that is, a prob- ability measure over the stable models) such that P(s1 ) = 0.1 and P(s2 ) = 0.9 (hence P(s3 ) = 0), and a probability set model such that P(s1 ) = 0.1 and P(s3 ) = 0.9. In fact, any probability measure such that P(s1 ) = 0.1, P(s2 ) = 0.9γ, and P(s3 ) = 0.9(1 − γ), for γ ∈ [0, 1], is also a probability model. 2 Example 4. Consider a graph coloring problem consisting of the rules: coloredBy(V, red) :− not coloredBy(V, yellow), not coloredBy(V, green), vertex(V ). coloredBy(V, yellow) :− not coloredBy(V, red), not coloredBy(V, green), vertex(V ). coloredBy(V, green) :− not coloredBy(V, red), not coloredBy(V, yellow), vertex(V ). noClash :− not noClash, edge(V, U ), coloredBy(V, C), coloredBy(U, C). and the facts: for i ∈ {1, . . . , 5}, vertex(i)., and coloredBy(2, red). coloredBy(5, green). 0.5 :: edge(4, 5). edge(1, 3). edge(1, 4). edge(2, 1). edge(2, 4). edge(3, 5). edge(4, 3). The facts mentioning vertex and edge encode the graph in Figure 1 (left); the probabilistic fact is indicated as a dashed edge. For a fixed total choice, the sta- ble models of the program are the 3-colorings of the graph. Thus, if edge(4, 5) is true, there is a single stable model; otherwise, there are two stable mod- els. We obtain P(coloredBy(1, yellow)) = 0 and P(coloredBy(1, yellow)) = 1/2, while P(coloredBy(4, yellow)) = 1/2 and P(coloredBy(4, yellow)) = 1, and finally P(coloredBy(3, yellow)) = P(coloredBy(3, yellow)) = 1. 2 8 Example 5. Consider a game where one wins whenever the opponent has no moves [33], with rule wins(X) :− move(X, Y ), not wins(Y ). and the following moves, one of which is probabilistic: move(a, b). move(b, a). move(b, c). 0.3 :: move(c, d).. If move(c, d) is false, there is a single stable model (where wins(b) is the only winning position); otherwise, there are two stable models (wins(c) is true in both of them; wins(a) is true in one, while wins(b) is true in the other). Thus P(wins(a)) = 0.0 and P(wins(a)) = 0.3; P(wins(b)) = 0.7 and P(wins(b)) = 1.0; and P(wins(c)) = 0.3 and P(wins(c)) = 0.3. 2 Finally, consider a program without credal semantics: Example 6. If the barber shaves every villager who does not shave himself, does the barber shave himself? The program shaves(X, Y ) :− barber(X), villager(Y ), not shaves(X, Y ). 0.5 :: barber(bob). 0.5 :: villager(bob). does not have a stable model when both probabilistic facts are true (we obtain the pattern p :− not p). Note that even in this case the resulting normal logic program has well-founded semantics (assigning undefined to shaves(bob, bob)). 2 4 The structure of credal semantics Given the generality of plps, one might think that credal sets generated by the credal semantics can have an arbitrarily complex structure. Surprisingly, the structure of the credal semantics of a plp is a relatively simple object: Theorem 1. Given a consistent probabilistic logic program, its credal semantics is a set of probability measures that dominate an infinitely monotone Choquet capacity. Before we present a proof of this theorem, let us pause and define a few terms. An infinitely monotone Choquet capacity is a set function P from an algebra A on a set Ω to the real interval [0, 1] such that [2, Definition 4.2]: P(Ω) = 1 − P(∅) = 1 and, for any A1 , . . . , An in the algebra, P(∪i Ai ) ≥ |J|+1 P J⊆{1,...,n} (−1) P(∩j∈J Aj ). Infinitely monotone Choquet capacities appear in several formalisms; for instance, they are the belief functions of Dempster- Shafer theory [31], and summaries of random sets [21]. Given a set-function P, we can construct a set of measures that dominate P as {P : ∀A ∈ A : P(A) ≥ P(A)}. We abuse language and say that this set itself is infinitely monotone. If a credal set K is infinitely monotone, then the lower probability P, defined as P(A) = inf P∈K P(A), is the generating infinitely monotone Choquet capacity. We also have the upper probability P(A) = supP∈K P(A) = 1 − P(Ac ). Proof (Theorem 1). Consider a set Θ containing as states the posssible total choices of the probabilistic facts. Over this space we have a product measure that is completely specified by the probabilities attached to probabilistic facts. Now 9 consider a multi-valued mapping Γ between Θ and the space Ω of all possible models of our probabilistic logic program. For each element θ ∈ Θ, define Γ (θ) to be the set of stable models associated with the total choice θ of the probabilistic facts. Now we use the fact that a probability space and a multi-valued mapping induce an infinite monotone Choquet capacity over the range of the mapping (that is, over Ω) [21]. 2 Infinitely monotone credal sets have several useful properties; for one thing they are closed and convex. Convexity here means that if P1 and P2 are in the credal set, then αP1 + (1 − α)P2 is also in the credal set for α ∈ [0, 1]. Thus, as illustrated by Example 3: Corollary 1. Given a consistent probabilistic logic program, its credal semantics is a closed and convex set of probability measures. There are several additional results concerning the representation of infinitely monotone capacities using their Möbius transforms [2,31]; we refrain from men- tioning every possible corollary we might produce by rehashing those results. Instead, we focus on a few important results that can be used to great effect in future applications. First, as we have a finite Herbrand base, we can use the symbols in the proof of Theorem 1 to write, for any event A [2, Section 5.3.2]: P P P(A) = θ∈Θ:Γ (θ)⊆A P(θ) , P(A) = θ∈Θ:Γ (θ)∩A6=∅ P(θ) . (3) This property directly leads to an inference algorithm described later. In fact, for infinitely monotone credal sets we can find easy expressions for lower and up- per conditional probabilities (that is, the infimum and supremum of conditional probabilities): if A and B are events, then lower and upper probabilities of A given B are (where the superscript c denotes complement) [2]: P(A ∩ B) P(A ∩ B) P(A|B) = c and P(A|B) = . (4) P(A ∩ B) + P(A ∩ B) P(A ∩ B) + P(Ac ∩ B) We also note that the computation of lower and upper expected values, and even conditional expected values, with respect to infinitely monotone Choquet capacities admits relatively simple expressions [34]. It is convenient to pause here and mention the constraint logic programming language of Michels et el. [20], a significant contribution that is based on credal sets (even though it uses a different syntax and semantics, for instance allowing continuous variables but not allowing multiple stable models per total choice). In that work expressions are (conditional) lower and upper probabilities are derived; those expressions directly mirror the ones presented in this section. 5 The complexity of inferences with credal semantics In this section we focus on the computation of P(Q) and P(Q), where Q is a conjunction of assignments to some ground atoms in its Herbrand base. Recall that we preclude functions, so as to stay with finite Herbrand bases. Hence a direct translation of Expression (3) into an algorithm leads to: 10 • Given a plp hP, PFi and Q, initialize a and b with 0. • For each total choice θ of probabilistic facts, compute the set S of all stable models of P ∪ PF↓θ , and: if Q is true in every stable model in S, then a ← a + P(C); if Q is true in some stable model of S, then b ← b + P(C). • Return a and b (the value of a is P(Q), the value of b is P(Q)). Note that to find whether Q is true in every stable model of a program, we must run cautious inference, and to find whether Q is true in some stable model of a program, we must run brave inference; these logical procedures have been studied in depth in the literature [11]. In fact the algorithm above has already been derived by Cali et al. [4], using clever optimization techniques (Cali et al. actually obtain lower and upper con- ditional probabilities by combining Expressions (4) with (3)). The advantage of our approach is that the algorithm is a transparent consequence of known facts about capacities; other than that, Cali et al. have already presented the algo- rithm so we do not need to dwell on it. Rather, we wish to focus on the specific question of how complex is the computation of the lower probability P(Q). To be precise: as input, we have a plp whose probabilities are rational num- bers, and an assignment Q to some ground atoms in the (finite) Herbrand base; as output, we want (the rational number) P(Q). We refer to this complexity as the inferential complexity of plps. One may also be interested in the complexity of inferences when the plp is fixed, and the only input is the query Q. This is the query complexity of the program: input is Q, output is P(Q). This concept is based on analogous concepts found in database theory [5]: one may face situations where the program may be small compared to the query, and query complexity is the concept of practical interest. So, we are ready to state our main results on complexity. Theorem 2. In this theorem, assume that input plps are consistent. Inferential complexity is #NP-equivalent for propositional plps, and #P-equivalent when restricted to stratified propositional plps. For plps where all predicates have a bound on arity, inferential complexity is #NPNP -equivalent, and #NP-equivalent when restricted to stratified programs. Query complexity is #P-hard; it is in #NP (up to multiplication by a polynomial number), and is #P-equivalent when restricted to stratified programs. Before we prove this theorem, we must define a number of terms that appear in it. We deal with usual complexity classes such as NP, and with S the concept of oracles [24]. The polynomial hierarchy is the class of languages i ∆Pi = i ΠiP = S P P Σi−1 , ΠiP = coΣiP , ΣiP = NPΣi−1 and Σ0P = P. The com- S P P i Σi , where ∆i = P plexity class #P is the class of integer-valued functions computed by a counting Turing machine in polynomial time; a counting Turing machine is a standard non-deterministic Turing machine that prints in binary notation, on a separate tape, the number of accepting computations induced by the input [32]. Similarly 11 to the polynomial hierarchy, the counting hierarchy is defined by oracles. The class #ΣkP contains the integer-valued functions computed by a counting Turing machine with access to a ΣkP oracle. The counting hierarchy is the union of all #ΣkP (over all integers k). The problems with which we are concerned involve the computation of rational numbers and do not precisely fit into the class of #P problems and its extensions, so we resort to weighted reductions [3], which are parsimonious reductions (Karp reductions that preserve the number of accepting paths) scaled by a positive rational number. The canonical complete problem for the class #ΣkP via parsimonious reductions is #Πk sat [10], that gets, as input, a formula ϕ(Y1 , . . . , Yn ) = ∀ X1 ∃ X2 . . . Qk Xk ψ(X1 , . . . , Xk , Y1 , . . . , Yn ), where ψ is a CNF formula over variables X1 , . . . , Xk , Y1 , . . . , Yn , and produces, as output, the number of assignments to the variables Y1 , . . . , Yn that make the formula ϕ evaluate to true. For k odd, the problem remains complete even if the formula is specified in disjunctive normal form [10]. For a complexity class #C of integer-valued functions computed by some (oracle) counting Turing machine, we say that a problem is #C-hard if any problem in #C can be reduced to it by a weighted reduction. If a problem is #C-hard and can be solved with one call to a #C oracle and a multiplication by a rational obtained with polynomial effort, then the problem is said to be #C-equivalent (as inspired by [8]). Proof (Theorem 2). All results for stratified programs are available in a compan- ion publication on the complexity of distribution semantics [6]. So the remainder of this proof focuses on the general (possibly cyclic) case. For propositional programs: Membership follows as cautious logical reasoning is coNP-complete [11, Table 2]; inference obtains by counting the result of cautious inference, so we obtain membership in #NP. Hardness is shown by a reduction from #Π1 SAT: Count the number of assignments of x1 , . . . , xn such that, for ϕ a CNF formula with clauses c1 , . . . , ck , ∀y1 , . . . , ym ϕ(x1 , . . . , xn , y1 , . . . , ym ). Write rules ti :− not fi . and fi :− not ti . for every variable yi (thus there are 2m stable models running through assignments of y1 , . . . , ym ). For each xi introduce xi and probabilistic fact 0.5 :: xi . For every clause cj write cj :− `. for each one of the literals in cj , where ` is xi or not xi or ti or fi respectively if the literal is xi or ¬xi or yi or ¬yi . Finally, write sat :− c1 , . . . , ck . The probability of a total choice of x1 , . . . , xn adds to P(sat) iff every stable model satisfies all clauses. Hence the desired counting is 2n P(sat). For programs where predicates have bounded arity: Membership follows as cau- tious logical reasoning is Π2P -complete [11, Table 5]; inference obtains by count- ing the result of cautious inference, so we obtain membership in #NPNP . Hard- ness is shown by a reduction from #Π2 SAT: Count the number of x1 , . . . , xn s.t. ∀y1 , . . . , ym ∃z1 , . . . , zs ϕ(x1 , . . . , xm , y1 , . . . , ym , z1 , . . . , zs ), where ϕ is a 3- CNF formula with clauses c1 , . . . , ck . The reduction is a combination of the previous reduction with the one used by Eiter et al. [11]. Introduce, as before, predicates xi , ti and fi , and probabilistic facts 0.5 :: xi and rules ti :− not fi . and fi :− not ti .. Introduce predicates cj but now the arity of cj equals the num- ber of variables zi in cj . Denote by Zi a logical variable that corresponds to the propositional variable zi , and by Zj the logical variables corresponding to 12 propositional variables zi appearing in clause cj . For each clause cj , go over the possible assignments zj of the propositional variables associated with Zj . If this assignment makes the clause true, add the fact cj (zj ).. If instead the assignment leaves the clause dependent on some propositional variables xi or yi , then for every literal li (over a variable xi or yi ) introduce the rule: cj (zj ) :− `., where ` is xi or not xi or ti or fi exactly as in the previous reduction. Finally, introduce sat :− c1 (Z1 ), . . . , ck (Zk ).. As before, the desired count is obtained by 2n P(sat). Concerning query complexity, membership follows from the fact that data com- plexity of logical cautious reasoning is coNP [7, Theorem 5.8]. 2 We leave to future work a number of questions. First, we conjecture that the complexity of computing P(Q) is the same as computing P(Q) (as we only need to change from cautious to brave reasoning). Also, we did not present results on the complexity of probabilistic logic programs without a bound on arity (without such a bound, logical cautious reasoning is coNEXP-complete, so we conjecture that exponentially bounded counting Turing machines will be needed here). Finally, our complexity results were obtained assuming that plps were consistent: of course, in practice one must consider the problem of checking consistency. We just indicate a few facts about consistency checking: Proposition 1. Consistency checking is in Π2P for propositional plps and in Π3P for plps where predicates have a bound on arity. Proof. Consistency checking of a propositional plp obtains by verifying whether logical consistency holds for each total choice of probabilistic facts, and this can be accomplished by deciding whether all total choices satisfy consistency; because logical consistency checking for this language is NP-complete [11, Table 1], we obtain membership to Π2P . An analogue reasoning leads to Π3P as logical consistency checking with bounded arity is Σ2P -complete [11, Table 4]. 6 Discussion: moving to answer set programming We have studied the credal semantics for probabilistic logic program, as pro- posed by Lukasiewicz [19]. This semantics is quite attractive and is not as well known as it deserves. We have shown that the credal semantics of a plp is an infinite monotone credal set, and we have derived novel complexity results (using non-trivial classes in the counting hierarchy). In future work we plan to look at consistency checking, and also at the consequences of allowing functions. More- over, we must include in the analysis a number of useful constructs adopted by answer set programming [12]. There, classic negation, such as ¬wins(X), is allowed on top of not. Also, constraints, such as :− φ, are allowed to mean that φ is false. More substantial is the presence, in answer set programming, of dis- junctive heads. With such a machinery, we can for instance rewrite the rules in Example 3 as a single rule single(X) ∨ husband(X) :− man(X)., and the rules in Example 4 as the pair: coloredBy(V, red) ∨ coloredBy(V, yellow) ∨ coloredBy(V, green) :− vertex(V ). :− edge(V, U ), coloredBy(V, C), coloredBy(U, C). 13 Now the point to be made is this. Suppose we have a probabilistic logic program hP, PFi, where as before we have independent probabilistic facts, but where P is now a logic program with classic negation, constraints, disjuctive heads, and P is consistent in that it has stable models for every total choice of probabilistic facts. The proof of Theorem 1 can be reproduced in this setting, and hence the credal semantics (the set of measures over stable models) of these probabilistic answer set programs is again an infinite monotone credal set. The complexity of inference with these constructs is left for future investigation. Acknowledgements The first author is partially supported by CNPq, grant 308433/2014-9. The sec- ond author received financial support from the São Paulo Research Foundation (FAPESP), grant 2016/01055-1. References 1. K. R. Apt and M. Bezem. Acyclic programs. New Generation Computing, 9:335– 363, 1991. 2. T. Augustin, F. P. A. Coolen, G. de Cooman, and M. C. M. Troffaes. Introduction to Imprecise Probabilities. Wiley, 2014. 3. A. Bulatov, M. Dyer, L. Ann Goldberg, M. Jalsenius, M. Jerrum, and D. Richerby. The complexity of weighted and unweighted #CSP. J. of Computer and System Sciences, 78:681–688, 2012. 4. A, Calı̀, T. Lukasiewicz, L. Predoiu, and H. Stuckenschmidt. Tightly coupled prob- abilistic description logic programs for the semantic web. In J. on Data Semantics XII, pages 95–130. Springer, 2009. 5. F. G. Cozman and D. D. Mauá. Bayesian networks specified using propositional and relational constructs: Combined, data, and domain complexity. In AAAI Conf. on Artificial Intelligence, 2015. 6. F. G. Cozman and D. D. Mauá. Probabilistic graphical models specified by prob- abilistic logic programs: semantics and complexity. In Int. Conf. on Probabilistic Graphical Models, 2016. 7. E. Dantsin, T. Eiter, and A. Voronkov. Complexity and expressive power of logic programming. ACM Computing Surveys, 33(3):374–425, 2001. 8. C. P. de Campos, G. Stamoulis, and D. Weyland. A structured view on weighted counting with relations to quantum computation and applications. Technical Re- port 133, Electronic Colloquium on Computational Complexity, 2013. 9. L. de Raedt, P. Frasconi, K. Kersting, and S. H. Muggleton. Probabilistic Inductive Logic Programming. Springer, 2010. 10. A. Durand, M. Hermann, and P. G. Kolaitis. Subtractive reductions and com- plete problems for counting complexity classes. Theoretical Computer Science, 340(3):496–513, 2005. 11. T. Eiter, W. Faber, M. Fink, and S. Woltran. Complexity results for answer set programming with bounded predicate arities and implications. Annals of Mathe- matics and Artificial Intelligence, 5:123–165, 2007. 12. T. Eiter, G. Ianni, and T. Krennwalner. Answer set programming: a primer. In Reasoning Web, pages 40–110. Springer-Verlag, 2009. 14 13. D. Fierens, G. van den Broeck, J. Renkens, D. Shrerionov, B. Gutmann, Gerda Janssens, and Luc de Raedt. Inference and learning in probabilistic logic programs using weighted Boolean formulas. Theory and Practice of Logic Programming, 15(3):358–401, 2014. 14. M. Gelfond and V. Lifschitz. The stable model semantics for logic programming. In Int. Logic Programming Conf. and Symp., pages 1070–1080, 1988. 15. S. Hadjichristodoulou and D. S. Warren. Probabilistic logic programming with well-founded negation. In Symp. on Multiple-Valued Logic, pages 232–237, 2012. 16. S. L. Lauritzen and D. J. Spiegelhalter. Local computations with probabilities on graphical structures and their application to expert systems. J. Royal Statistical Society B, 50(2):157–224, 1988. 17. T. Lukasiewicz. Probabilistic logic programming. In European Conf. on Artificial Intelligence, pages 388–392, 1998. 18. T. Lukasiewicz. Probabilistic description logic programs. In European Conf. on Symbolic and Quantitative Approaches to Reasoning with Uncertainty, pages 737– 749, Barcelona, Spain, July 2005. Springer. 19. T. Lukasiewicz. Probabilistic description logic programs. Int. J. of Approximate Reasoning, 45(2):288–307, 2007. 20. S. Michels, A. Hommersom, P. J. F. Lucas, M. Velikova. A new probabilistic con- straint logic programming language based on a generalised distribution semantics. Artificial Intelligence Journal, 228:1–44, 2015. 21. I. Molchanov. Theory of Random Sets. Springer, 2005. 22. R. Ng and V. S. Subrahmanian. Probabilistic logic programming. Information and Computation, 101(2):150–201, 1992. 23. L. Ngo and P. Haddawy. Answering queries from context-sensitive probabilistic knowledge bases. Theoretical Computer Science, 171(1–2):147–177, 1997. 24. C. H. Papadimitriou. Computational Complexity. Addison-Wesley, 1994. 25. D. Poole. Probabilistic Horn abduction and Bayesian networks. Artificial Intelli- gence, 64:81–129, 1993. 26. D. Poole. The Independent Choice Logic and beyond. In Probabilistic Inductive Logic Programming, pages 222–243. Springer, 2008. 27. F. Riguzzi. The distribution semantics is well-defined for all normal programs. Int. Workshop on Probabilistic Logic Programming, pages 69–84, 2015. 28. T. Sato. A statistical learning method for logic programs with distribution seman- tics. In Int. Conf. on Logic Programming, pages 715–729, 1995. 29. T. Sato and Y. Kameya. Parameter learning of logic programs for symbolic- statistical modeling. J. of Artificial Intelligence Research, 15:391–454, 2001. 30. T. Sato, Y. Kameya, and N.-Fa Zhou. Generative modeling with failure in PRISM. In Int. Joint Conf. on Artificial Intelligence, pages 847–852, 2005. 31. G. Shafer. A Mathematical Theory of Evidence. Princeton University Press, 1976. 32. L. G. Valiant. The complexity of enumeration and reliability problems. SIAM J. of Computing, 8(3):410–421, 1979. 33. A. van Gelder, K. A. Ross, and J. S. Schlipf. The well-founded semantics for general logic programs. Association for Computing Machinery, 38(3):620–650, 1991. 34. L. Wasserman and J. B. Kadane. Computing bounds on expectations. J. of the American Statistical Association, 87(418):516–522, 1992.