=Paper=
{{Paper
|id=Vol-3799/paper2PLP24
|storemode=property
|title=A First Journey into the Complexity of Statistical Statements in Probabilistic Answer Set Programming
|pdfUrl=https://ceur-ws.org/Vol-3799/paper2PLP24.pdf
|volume=Vol-3799
|authors=Damiano Azzolini,Markus Hecher
|dblpUrl=https://dblp.org/rec/conf/iclp/AzzoliniH24
}}
==A First Journey into the Complexity of Statistical Statements in Probabilistic Answer Set Programming==
Damiano Azzolini (University of Ferrara, Ferrara, Italy) and Markus Hecher (Massachusetts Institute of Technology, Cambridge, United States)
Abstract
Probabilistic Answer Set Programming (PASP) is an effective formalism to express uncertain information with an answer set program. Recently, within the PASTA framework, this formalism has been extended with statistical statements, i.e., statements that encode a statistical property of the considered domain. To perform inference, these statements are converted into answer set rules and constraints with aggregates. The complexity of PASP has been studied in depth, with results regarding both membership and completeness. However, a complexity analysis of programs with statistical statements is missing. In this paper, we close this gap by studying the complexity of PASTA statements.
Keywords
Probabilistic Answer Set Programming, Computational Complexity, Statistical Statements
1. Introduction
Representing uncertain information with an interpretable language is currently one of the main goals of Statistical Relational Artificial Intelligence [1]. Among the available languages we have ProbLog [2], Logic Programs with Annotated Disjunctions [3], and Stochastic Logic Programs [4], while some of the possible semantics are the Distribution Semantics [5] and the Credal Semantics [6]. Answer Set Programming [7] is a logic-based language that has been successfully applied to model combinatorial domains. When the information is uncertain, we can adopt the Credal Semantics, which gives a meaning to answer set programs extended with probabilistic facts [2]. These programs are called Probabilistic Answer Set Programs (PASP).
More than 30 years ago, Halpern [8] introduced the so-called “Type 1 statements”, also known as “probabilistic conditionals” or “statistical statements”, which allow expressing statistical information about a domain of interest. An example of such a statement is “X% of the elements of a domain have the property Y”. Recently, the authors of [9] introduced the possibility of expressing Type 1 statements in Probabilistic Answer Set Programming, where each statement is converted into a disjunctive rule and a constraint with aggregates, and provided a framework called PASTA.
Inference in PASP has been proved to be hard [6, 10, 11]. For example, the inference task (i.e., the computation of the probability of a query) for propositional programs lies in the PP^NP complexity class for programs allowing disjunctive rules only, where PP represents the class of decision problems that can be solved in polynomial time with a probabilistic Turing machine [12]. If we add stratified negation, the complexity jumps to the PP^{Σ₂ᵖ} class of the polynomial hierarchy [13]. Also in practice, inference in
such programs is very expensive [9, 14].
In this paper, we focus on PASP composed only of a set of probabilistic facts and a set of statistical statements, and we study the complexity of inference in such programs. Theoretical results show that, in such simple programs, inference can be computed efficiently.
The paper is structured as follows: Section 2 discusses some background knowledge related to Answer
Set Programming, Probabilistic Answer Set Programming and statistical statements, and Computational
Complexity. In Section 3 we study the complexity of programs with statistical statements. Section 4
surveys related work and Section 5 concludes the paper.
PLP 2024: 11th Workshop on Probabilistic Logic Programming, October, 2024, Dallas, USA.
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073
2. Background
2.1. Answer Set Programming
Answer Set Programming [7] (ASP) is one of the possible logic-based languages suitable to model
combinatorial domains. An answer set program is composed of a set of disjunctive rules of the form
ℎ0 ; . . . ; ℎ𝑚 : − 𝑏0 , . . . , 𝑏𝑛
where ℎ0 ; . . . ; ℎ𝑚 is a disjunction of atoms called head and 𝑏0 , . . . , 𝑏𝑛 is a conjunction of literals called
body. We use 𝑛𝑜𝑡 to denote negation. Head and body may be empty: if the head is empty (but the
body is not), the rule is called constraint; conversely, if the head has only one atom and the body is
empty, it is called fact. An example of disjunctive rule is 𝑎; 𝑏 : − 𝑐, where 𝑎; 𝑏 is the head and 𝑐 is the
body. Another construct often adopted in answer set programs is the aggregate [15]. An aggregate has the form 𝑡0 𝜓0 #𝑓 {𝑒0 ; . . . ; 𝑒𝑛 } 𝜓1 𝑡1 , where the 𝑒𝑖 s are aggregate elements of the form 𝑠0 , . . . , 𝑠𝑘 : 𝜙, in which each 𝑠𝑗 is a term and 𝜙 is a conjunction of literals, 𝑓 is an aggregation function such as 𝑐𝑜𝑢𝑛𝑡 or 𝑠𝑢𝑚, 𝜓0 and 𝜓1 are arithmetic comparison operators, and 𝑡0 and 𝑡1 are constants acting as bounds. An example of aggregate
is #𝑐𝑜𝑢𝑛𝑡{𝑋 : 𝑎(𝑋)} > 3.
We consider the Stable Model Semantics [16] as semantics for answer set programs. Under this
semantics, each program is associated with zero (in this case the program is often termed unsatisfiable)
or more stable models. A program is called ground when there are no variables in it. A non ground
program can be transformed into a ground program with a process called grounding. The Herbrand base
of a ground program 𝑃 is the set of all possible ground atoms that can be constructed using the symbols
defined in 𝑃 , often indicated with 𝐵𝑃 . An interpretation is a subset of 𝐵𝑃 . The reduct of a ground
program 𝑃 w.r.t. an interpretation 𝐼, denoted with 𝑃 𝐼 , is computed by removing from 𝑃 the rules whose body contains at least one literal that is false in 𝐼. An interpretation 𝐼 is
called stable model or answer set of a program 𝑃 if it is a minimal model under set inclusion of 𝑃 𝐼 . We
use 𝐴𝑆(𝑃 ) to denote the set of answer sets of 𝑃 . Let us clarify these definitions with an example.
Example 1. The following answer set program has two facts and three rules.
a.
b.
qa:- a.
qb:- b, not nqb.
nqb:- b, not qb.
It has two answer sets: {𝑎, 𝑞𝑎, 𝑏, 𝑛𝑞𝑏} and {𝑎, 𝑞𝑎, 𝑏, 𝑞𝑏}.
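These definitions can be made concrete with a brute-force Python sketch (purely illustrative; real ASP solvers are far more sophisticated) that enumerates the interpretations of the program of Example 1, computes the reduct of each, and checks minimality:

```python
from itertools import chain, combinations

# The program of Example 1: rules as (head, positive body, negative body).
rules = [
    ({"a"}, set(), set()),
    ({"b"}, set(), set()),
    ({"qa"}, {"a"}, set()),
    ({"qb"}, {"b"}, {"nqb"}),
    ({"nqb"}, {"b"}, {"qb"}),
]
herbrand = {"a", "b", "qa", "qb", "nqb"}

def powerset(atoms):
    atoms = sorted(atoms)
    return chain.from_iterable(combinations(atoms, r) for r in range(len(atoms) + 1))

def is_model(interp, prog):
    # A rule is satisfied when its body is false or some head atom is true.
    return all(head & interp or not (pos <= interp and neg.isdisjoint(interp))
               for head, pos, neg in prog)

def reduct(prog, interp):
    # Gelfond-Lifschitz reduct: drop rules whose negative body intersects
    # the interpretation, then strip the (satisfied) negative bodies.
    return [(h, p, set()) for h, p, n in prog if n.isdisjoint(interp)]

def stable_models(prog, base):
    models = []
    for cand in map(set, powerset(base)):
        red = reduct(prog, cand)
        if is_model(cand, red) and not any(
                set(sub) < cand and is_model(set(sub), red)
                for sub in powerset(cand)):
            models.append(cand)
    return models

print(sorted(sorted(m) for m in stable_models(rules, herbrand)))
```

It prints `[['a', 'b', 'nqb', 'qa'], ['a', 'b', 'qa', 'qb']]`, i.e., the two answer sets reported above.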
2.2. Probabilistic Answer Set Programming
To describe uncertain scenarios with Answer Set Programming, several semantics have been proposed,
such as LPMLN [17], P-log [18], and the credal semantics [11]. Here we focus on the credal semantics,
which extends an answer set program with probabilistic facts [2] of the form
Π :: 𝑎.
where Π ∈]0, 1[ and 𝑎 is an atom. Intuitively, Π is the probability that 𝑎 is included in the program
and 1 − Π that it is excluded. In the remaining part of the paper, we use the acronym PASP to indicate
both Probabilistic Answer Set Programming under the credal semantics and a probabilistic answer set
program interpreted under the credal semantics. Each PASP with 𝑛 probabilistic facts defines 2ⁿ worlds,
obtained by considering each possible selection of probabilistic facts. Let us call 𝑊 the set of all the
worlds. The probability of a world 𝑤 ∈ 𝑊 is computed as [5]
𝑃 (𝑤) = ∏_{𝑎𝑖 ∈𝑤} 𝑃 (𝑎𝑖 ) · ∏_{𝑎𝑖 ∉𝑤} (1 − 𝑃 (𝑎𝑖 ))
that is, by considering the probabilistic facts 𝑎𝑖 present or absent in 𝑤. In each world 𝑤 the set of
probabilistic facts is fixed, so each world is an answer set program. Thus, 𝑤 may have 0 or more answer sets, but the credal semantics requires that it has at least one. In this paper, we will always assume that
every PASP has at least one answer set per world. Given a conjunction of ground literals 𝑞, often called
query, its probability 𝑃 (𝑞) is described by a range [P(𝑞), P(𝑞)] whose bounds depend on how many answer sets of each world contain the query. A world 𝑤 contributes to both the lower (P(𝑞)) and upper (P(𝑞)) bound for a query 𝑞 if 𝑞 is present in every answer set of 𝑤. Otherwise, if the query is true only in some answer sets, 𝑤 contributes only to the upper probability bound. If the query is present in
none of the answer sets of 𝑤, 𝑤 contributes to none of the bounds. In formulas,
𝑃 (𝑞) = [P(𝑞), P(𝑞)] = [ ∑_{𝑤𝑖 ∈𝑊 s.t. ∀𝑚∈𝐴𝑆(𝑤𝑖 ), 𝑚|=𝑞} 𝑃 (𝑤𝑖 ), ∑_{𝑤𝑖 ∈𝑊 s.t. ∃𝑚∈𝐴𝑆(𝑤𝑖 ), 𝑚|=𝑞} 𝑃 (𝑤𝑖 ) ]
Let us discuss a clarifying example.
Example 2. The following program has two probabilistic facts, 𝑎 and 𝑏.
0.4::a.
0.3::b.
qa:- a.
qb:- b, not nqb.
nqb:- b, not qb.
q:- qa.
q:- qb.
It has 2² = 4 worlds: 𝑤0 , where neither 𝑎 nor 𝑏 are present, whose probability is (1 − 0.4) · (1 − 0.3) =
0.6 · 0.7 = 0.42 and 𝐴𝑆(𝑤0 ) = {{}}; 𝑤1 , where 𝑎 is present and 𝑏 is absent, whose probability is
0.4 · (1 − 0.3) = 0.4 · 0.7 = 0.28 and 𝐴𝑆(𝑤1 ) = {{𝑎, 𝑞𝑎, 𝑞}}; 𝑤2 , where 𝑎 is absent and 𝑏 is present,
whose probability is (1 − 0.4) · 0.3 = 0.6 · 0.3 = 0.18 and 𝐴𝑆(𝑤2 ) = {{𝑏, 𝑞𝑏, 𝑞}, {𝑏, 𝑛𝑞𝑏}}; and
𝑤3 , where both 𝑎 and 𝑏 are present, whose probability is 0.4 · 0.3 = 0.12 and 𝐴𝑆(𝑤3 ) =
{{𝑎, 𝑞𝑎, 𝑏, 𝑞𝑏, 𝑞}, {𝑎, 𝑞𝑎, 𝑏, 𝑛𝑞𝑏, 𝑞}}. If we are interested in computing the probability of 𝑞, we have that
𝑤0 contributes to none of the bounds since 𝑞 is present in none of its answer sets, 𝑤1 contributes to both the
lower and upper bound since 𝑞 is present in every answer set (there is only one with 𝑞 in it), 𝑤2 contributes
to only the upper bound since 𝑞 is present in only one of the two answer sets, and 𝑤3 contributes to both the
lower and upper bound since 𝑞 is present in every answer set (both have 𝑞 in it). Overall, [P(𝑞), P(𝑞)] =
[𝑃 (𝑤1 ) + 𝑃 (𝑤3 ), 𝑃 (𝑤1 ) + 𝑃 (𝑤2 ) + 𝑃 (𝑤3 )] = [0.28 + 0.12, 0.28 + 0.18 + 0.12] = [0.4, 0.58].
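The bound computation above can be reproduced with a small Python sketch. This is illustrative only: the answer sets of each world are hardcoded exactly as derived in Example 2, and the script just applies the contribution rules of the credal semantics.

```python
# Probabilistic facts of Example 2 with their probabilities.
pf = {"a": 0.4, "b": 0.3}

# The answer sets of each world, as derived in Example 2
# (a world is identified by the set of probabilistic facts chosen true).
answer_sets = {
    frozenset(): [set()],
    frozenset({"a"}): [{"a", "qa", "q"}],
    frozenset({"b"}): [{"b", "qb", "q"}, {"b", "nqb"}],
    frozenset({"a", "b"}): [{"a", "qa", "b", "qb", "q"},
                            {"a", "qa", "b", "nqb", "q"}],
}

def world_prob(world):
    """Product of P(f) for the facts in the world and 1 - P(f) for the others."""
    p = 1.0
    for fact, prob in pf.items():
        p *= prob if fact in world else 1.0 - prob
    return p

query = "q"
lower = upper = 0.0
for world, models in answer_sets.items():
    holds = [query in m for m in models]
    if all(holds):       # q in every answer set: both bounds
        lower += world_prob(world)
        upper += world_prob(world)
    elif any(holds):     # q in some answer set: upper bound only
        upper += world_prob(world)

print(round(lower, 10), round(upper, 10))  # 0.4 0.58
```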
Halpern’s statistical statements [8], also known as probabilistic conditionals, allow representing
statistical information about a domain, such as “X% of domain elements have the property Y”. The
authors of [9, 19] introduced the possibility of encoding such statements within a PASP with the syntax:
(𝐶 | 𝐴)[Π𝑙 , Π𝑢 ]
where 𝐶 and 𝐴 are (conjunctions of) atoms and Π𝑙 , Π𝑢 ∈ [0, 1], Π𝑙 ≤ Π𝑢 . We call 𝐶 the consequent and 𝐴 the antecedent. The meaning is “the fraction of 𝐴s that have the property 𝐶 is between Π𝑙 and Π𝑢 ”. To
perform inference, a statement is converted into a disjunctive rule with the structure 𝐶; 𝑛𝑜𝑡_𝐶 : − 𝐴, describing that the property 𝐶 may or may not hold for each element satisfying 𝐴, and a constraint with counting aggregates imposing the specified probability bounds, i.e., imposing
Π𝑙 ≤ #𝑐𝑜𝑢𝑛𝑡{X : 𝐴(X), 𝐶(X)} / #𝑐𝑜𝑢𝑛𝑡{X : 𝐴(X)} ≤ Π𝑢 .
Let us clarify it by discussing the following example from [9].
Example 3 (from [9]). The following program has 3 probabilistic facts and one statistical statement.
0.4::bird(1). 0.4::bird(2). 0.4::bird(3).
(fly(X) | bird(X))[0.6,1].
The probabilistic facts indicate that the animals indexed with 1 up to 3 are birds with a probability of 0.4. The statement describes that at least 60% of the birds fly (and at most 100%, i.e., all of them). To perform inference, this program is converted into:
0.4::bird(1). 0.4::bird(2). 0.4::bird(3).
fly(X) ; not_fly(X) :- bird(X).
:- #count{X:fly(X),bird(X)} = FB, #count{X:bird(X)} = B, 10*FB < 6*B.
If we want to compute the probability of the query fly(1), we get 0.336 for the lower and 0.4 for the upper
probability.
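As a sanity check, the following Python sketch (an illustrative brute force, not the PASTA inference algorithm) enumerates the 2³ worlds and, for each world, the fly/not_fly assignments allowed by the aggregate constraint, reproducing the bounds [0.336, 0.4] for the query fly(1):

```python
from itertools import product
import math

p_bird = 0.4
birds = [1, 2, 3]

def answer_sets(world):
    """All fly/not_fly choices over the birds in `world` that satisfy
    the aggregate constraint 10*FB >= 6*B (at least 60% of birds fly)."""
    models = []
    for flies in product([True, False], repeat=len(world)):
        flying = {b for b, f in zip(world, flies) if f}
        if 10 * len(flying) >= 6 * len(world):
            models.append(flying)
    return models

lower = upper = 0.0
for chosen in product([True, False], repeat=len(birds)):
    world = [b for b, c in zip(birds, chosen) if c]
    prob = math.prod(p_bird if c else 1 - p_bird for c in chosen)
    models = answer_sets(world)
    holds = [1 in m for m in models]           # the query fly(1)
    if models and all(holds):                  # query in every answer set
        lower += prob
        upper += prob
    elif any(holds):                           # query in some answer set
        upper += prob

print(round(lower, 10), round(upper, 10))  # 0.336 0.4
```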
We can convert the program above into a program without counting aggregates by enumerating all the possible results of the aggregates. For example, the program shown in Example 3 can be converted into:
0.4::bird(1). 0.4::bird(2). 0.4::bird(3).
fly(X) ; not_fly(X) :- bird(X).
b1 :- bird(1).
b1 :- bird(2).
b1 :- bird(3).
:- b1, not fb1.
b2 :- bird(1), bird(2).
b2 :- bird(1), bird(3).
b2 :- bird(2), bird(3).
:- b2, not fb2.
b3 :- bird(1), bird(2), bird(3).
:- b3, not fb2.
fb1 :- fly(1).
fb1 :- fly(2).
fb1 :- fly(3).
fb2 :- fly(1), fly(2).
fb2 :- fly(1), fly(3).
fb2 :- fly(2), fly(3).
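The pairing between the bᵢ and fbⱼ atoms follows from the threshold FB ≥ ⌈0.6·B⌉ imposed by the aggregate constraint; the following snippet (illustrative) computes it for B ∈ {1, 2, 3} and shows why both b2 and b3 are paired with fb2:

```python
import math

# Minimum number of flying birds required by the 60% statement
# for each possible number B of birds: FB >= ceil(0.6 * B).
thresholds = {b: math.ceil(0.6 * b) for b in range(1, 4)}
print(thresholds)  # {1: 1, 2: 2, 3: 2}
```

With 1 bird at least 1 must fly (fb1), with 2 birds both must fly (fb2), and with 3 birds at least ⌈1.8⌉ = 2 must fly, hence again fb2.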
In the remaining part of the paper, we use “PASTA programs” to refer to programs comprising a set of probabilistic facts as well as a set of disjunctive rules and constraints with counting aggregates.
2.3. Computational Complexity
We assume familiarity with computational complexity and use complexity classes as usual, see, e.g., [20,
21]. In particular, we use the complexity class PP for probabilistic polynomial-time problems [22] and
we rely on oracle notations as usual. A complete canonical problem for PP is to decide whether a
propositional formula has at least 𝑞 ∈ ℕ satisfying assignments. More formally, the goal is to decide whether the formula 𝜓 = #≥𝑞 𝑋.𝜙(𝑋) evaluates to true, where 𝜙(𝑋) is a 3-CNF formula over variables in 𝑋. This is the case if there are at least 𝑞 satisfying assignments of 𝜙. Analogously, evaluating formulas of the form 𝜓 = #≥𝑞 𝑋.∀𝑌.𝜙(𝑋, 𝑌 ), with 𝜙 being a Boolean formula in 3-DNF over variables in 𝑋 ∪ 𝑌 , is PP^NP-complete. Intuitively, this requires witnessing at least 𝑞 assignments over 𝑋 such that any extension of these assignments over the variables in 𝑌 ensures that 𝜙 evaluates to true. For a detailed introduction to this formalism and a survey on complexity results for probabilistic reasoning, see [10].
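Both canonical problems admit straightforward (exponential-time, by construction) brute-force checks. The Python sketch below is illustrative; the formulas and the variable/literal encoding are hypothetical examples, not taken from the paper.

```python
from itertools import product

def holds_cnf(cnf, assign):
    # A CNF holds if every clause has a literal (var, polarity) that matches.
    return all(any(assign[i] == pol for i, pol in clause) for clause in cnf)

def holds_dnf(dnf, assign):
    # A DNF holds if some term has all of its literals matching.
    return any(all(assign[i] == pol for i, pol in term) for term in dnf)

def count_sat(cnf, n):
    """Canonical PP problem: #assignments over n variables satisfying a CNF."""
    return sum(holds_cnf(cnf, a) for a in product([True, False], repeat=n))

def count_forall(dnf, nx, ny):
    """Canonical PP^NP problem: #X-assignments such that EVERY Y-extension
    satisfies the DNF; variables 0..nx-1 are X, nx..nx+ny-1 are Y."""
    return sum(
        all(holds_dnf(dnf, x + y) for y in product([True, False], repeat=ny))
        for x in product([True, False], repeat=nx)
    )

# (x0 or x1 or x2) and (not x0 or x1 or x2): 6 of 8 assignments satisfy it,
# so "#>=q X.phi" is true exactly for q <= 6.
cnf = [[(0, True), (1, True), (2, True)],
       [(0, False), (1, True), (2, True)]]
print(count_sat(cnf, 3))  # 6

# DNF (x0 and y0) or (x0 and not y0): holds for all Y iff x0 is true.
dnf = [[(0, True), (1, True)], [(0, True), (1, False)]]
print(count_forall(dnf, nx=1, ny=1))  # 1
```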
3. The Complexity of PASTA Inference
We obtain the following complexity results, which underline the simplicity of inference for PASTA
programs. As in [10], we focus here on the complexity of cautious reasoning, i.e., given a set of
ground literals 𝑄 (query), a set of ground atoms 𝐸 (evidence), and a real number 𝛾, deciding whether
P(𝑄 | 𝐸) ≥ 𝛾. As also noted in [10], the computation of the upper probability reduces to the
computation of the lower probability on the complement set of 𝑄. To lighten the notation, we simply refer to the cautious reasoning task as the inference task.
Theorem 1 (Complexity). Inference in PASTA programs under the credal semantics is PP^NP-complete.
Proof. Hardness: We reduce from #≥𝑞 𝑋.∀𝑌.𝜙(𝑋, 𝑌 ) with 𝜙 being a Boolean formula in 3-DNF over
variables in 𝑋 = {𝑥1 , . . . , 𝑥𝑛 } and 𝑌 = {𝑦1 , . . . , 𝑦𝑚 }. We construct a PASTA program Π, where for
every variable 𝑥 occurring in 𝑋, we add probabilistic facts
0.5::t(𝑥).
We care about which worlds satisfy 𝜙, so we guess the satisfiability state of every DNF term 𝑐 (and of the whole formula):
sat(𝑐) ; nsat(𝑐). sat ; nsat.
We guess the truth assignments for every universally quantified variable 𝑦 ∈ 𝑌 :
t(y) ; f(y).
For every DNF term 𝑐, every variable 𝑣𝑖 occurring positively in 𝑐, and every variable 𝑣𝑗 occurring negatively in 𝑐, we add:
pos(𝑐, 𝑣𝑖 ). neg(𝑐, 𝑣𝑗 ).
We have satisfiability if and only if at least one DNF term is satisfied:
:- #count {C : sat(C)}=T, #count {1 : sat}=S, T<S.
:- #count {C : sat(C)}=T, #count {1 : nsat}=S, T*S>0.
For the satisfied DNF terms 𝑐 we add the following constraint:
:- #count {V : t(V), pos(𝑐,V); V : not t(V), neg(𝑐,V)}=T,
#count {1 : sat(𝑐)}=S, T<3S.
For the unsatisfied DNF terms 𝑐 we add the following constraint:
:- #count {V : t(V), pos(𝑐,V); V : not t(V), neg(𝑐,V)}=T,
#count {1 : nsat(𝑐)}=S, T=3S.
It is easy to see that #≥𝑞 {𝑥1 , . . . , 𝑥𝑛 }.∀𝑌.𝜙 holds, i.e., there are at least 𝑞 satisfying assignments of 𝜙, if and only if P({𝑠𝑎𝑡}) ≥ 𝑞/2ⁿ.
Note that since 𝜙 is in 3-DNF, all constraint sizes are constant. Indeed, the aggregate body sizes are at most 3, and we could even slightly adapt the proof to only use ground rules (i.e., without first-order variables).
Membership: Follows directly from previous work [10, Table 1], which shows PP^NP membership for aggregates and default negation. Indeed, disjunction in PASTA programs can be directly translated to default negation [23], as by definition there cannot be a cyclic dependency between disjunctive head atoms.
Note that disjunction or, alternatively, default negation is crucial in PASTA programs. Indeed, if we did not allow disjunction, such programs could only capture co-NP decision complexity.
Theorem 2 (Complexity without disjunction). Inference in PASTA programs under the credal semantics without disjunction is co-NP-complete.
Proof. Hardness: We reduce from ∀𝑋.𝜙(𝑋), where 𝜙(𝑋) is in 3-DNF over variables in 𝑋 =
{𝑥1 , . . . , 𝑥𝑛 }. We construct a PASTA program Π, where for every variable 𝑥 occurring in 𝑋, we
add probabilistic facts
0.5::t(𝑥).
For every 3-DNF term 𝑐 we assume an arbitrary but fixed order among its literals 𝑙1 ∧ 𝑙2 ∧ 𝑙3 . Then,
for the 𝑖-th variable 𝑣𝑖 such that 𝑙𝑖 = 𝑣𝑖 (positive occurrence in 𝑐) and for the 𝑗-th variable 𝑣𝑗 such that
𝑙𝑗 = ¬𝑣𝑗 , we add the facts:
pos𝑖 (𝑐, 𝑣𝑖 ). neg𝑗 (𝑐, 𝑣𝑗 ). sat.
For every 3-DNF term 𝑐, we add the following constraint which counts all satisfied 3-DNF terms
(going over all 2³ cases):
:- #count {
C: t(V1), t(V2), t(V3), pos1(𝑐,V1), pos2(𝑐,V2), pos3(𝑐,V3);
C: not t(V1), t(V2), t(V3), neg1(𝑐,V1), pos2(𝑐,V2), pos3(𝑐,V3);
C: t(V1), not t(V2), t(V3), pos1(𝑐,V1), neg2(𝑐,V2), pos3(𝑐,V3);
C: t(V1), t(V2), not t(V3), pos1(𝑐,V1), pos2(𝑐,V2), neg3(𝑐,V3);
C: not t(V1), not t(V2), t(V3), neg1(𝑐,V1), neg2(𝑐,V2), pos3(𝑐,V3);
C: not t(V1), t(V2), not t(V3), neg1(𝑐,V1), pos2(𝑐,V2), neg3(𝑐,V3);
C: t(V1), not t(V2), not t(V3), pos1(𝑐,V1), neg2(𝑐,V2), neg3(𝑐,V3);
C: not t(V1), not t(V2), not t(V3), neg1(𝑐,V1), neg2(𝑐,V2), neg3(𝑐,V3)
}=T, #count {1 : sat}=S, T<S.