<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>A Semantics For Probabilistic Answer Set Programs With Incomplete Stochastic Knowledge</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>David Tuckey</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Krysia Broda</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessandra Russo</string-name>
        </contrib>
      </contrib-group>
      <aff>Imperial College London, London, UK</aff>
      <pub-date>
        <year>2013</year>
      </pub-date>
      <volume>0</volume>
      <issue>2017</issue>
      <fpage>91</fpage>
      <lpage>101</lpage>
      <abstract>
        <p>Some probabilistic answer set program (PASP) semantics assign probabilities to sets of answer sets and implicitly assume these answer sets to be equiprobable. While this is a common choice in probability theory, it leads to unnatural behaviours with PASPs. We argue that the user should have a level of control over what assumption is used to obtain a probability distribution when the stochastic knowledge is incomplete. To this end, we introduce the Incomplete Knowledge Semantics (IKS) for probabilistic answer set programs. We take inspiration from the field of decision making under ignorance. Given a cost function, represented by a user-defined ordering over answer sets through weak constraints, we use the notion of Ordered Weighted Averaging (OWA) operator to distribute the probability over a set of answer sets according to the user's level of optimism. The more optimistic (or pessimistic) a user is, the more (or less) probability is assigned to the more optimal answer sets. We present an implementation and showcase the behaviour of this semantics on simple examples. We also highlight the impact that different OWA operators have on weight learning, showing that the equiprobability assumption is not always the best option.</p>
      </abstract>
      <kwd-group>
        <kwd>Answer Set Programming</kwd>
        <kwd>Probabilistic</kwd>
        <kwd>Weight learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Probabilistic logic programming is a seamless method to incorporate structured knowledge
together with probabilistic inference. It provides an easy formalism to represent complex
relational models [
        <xref ref-type="bibr" rid="ref1 ref2 ref3">1, 2, 3</xref>
        ] and is now the foundation of a class of neural-symbolic models [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ].
In the family of probabilistic logic programs (PLP) presented in this paper, a PLP can be defined
by a logic program and a set of independent probabilistic facts which help assign probabilities to
sets of models (also referred to as possible worlds). When the logic program in a PLP is stratified,
meaning it admits exactly one minimal model, each such set of models is a
singleton and we assign the probability of the set to its only member. This corresponds to the
setting of the Distribution Semantics [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. In this paper we will focus on the case where the logic
programs are not stratified and can therefore have multiple (minimal) models. We consider
PLPs composed of probabilistic facts and answer set programs, thus we talk of probabilistic
answer set programming. Two examples of PASPs are given in Listing 1 and Listing 2.
      </p>
      <p>Listing 1: Run/Walk example: two probabilistic facts and a non-stratified ASP</p>
      <p>In Listing 1, the PASP includes the independent probabilistic facts rain and wind, with
probability 0.12 and 0.65 of being true, respectively. This program represents four different
scenarios: rain and wind both true, rain true and wind false, rain false and wind true and, finally,
rain and wind both false. For the first two scenarios, the corresponding ASP (obtained by adding
the true facts to the non-probabilistic part of Listing 1) has only one answer set. However,
in the other two cases, the corresponding ASP programs have multiple answer sets. For
instance, in the fourth scenario we have the following three answer sets: {sun, run}, {sun,
walk, listen(rock)} and {sun, walk, listen(classical)}. This scenario associates a probability of
P(C_4) = (1 − 0.12) * (1 − 0.65) = 0.308 to these three answer sets. However, to obtain a probability
distribution over answer sets, we need to somehow distribute the probability P(C_4) among them.
Unfortunately the program itself does not provide any information about how to proceed.
Informally, the stochastic knowledge contained in the program is insufficient to naturally obtain
a probability distribution over answer sets.</p>
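      <p>The scenario probabilities above can be reproduced with a few lines of code. The following Python sketch (our own illustration, not the paper's implementation; the fact names and probabilities come from Listing 1) enumerates the four total choices and their joint probabilities:</p>

```python
from itertools import product

# Probabilities of the two independent probabilistic facts from Listing 1.
facts = {"rain": 0.12, "wind": 0.65}

# Enumerate the four scenarios and their joint probabilities.
scenarios = {}
for values in product([True, False], repeat=len(facts)):
    assignment = dict(zip(facts, values))
    p = 1.0
    for name, prob in facts.items():
        p *= prob if assignment[name] else 1.0 - prob
    scenarios[frozenset(n for n in facts if assignment[n])] = p

# The fourth scenario (rain and wind both false) has probability 0.308.
print(round(scenarios[frozenset()], 3))  # 0.308
```

      <p>The four probabilities sum to one; the open question the semantics must answer is how to split each scenario's mass among its answer sets.</p>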
      <p>
        An arbitrary choice must be introduced to choose how to distribute this probability to the
multiple answer sets. A method used by some PASP semantics consists of equally dividing the
probability among the answer sets of a specific scenario [
        <xref ref-type="bibr" rid="ref4 ref7">4, 7</xref>
        ], in the example above we would
assign P(C_4)/3 to each of the three answer sets. The Credal Semantics [8, 9], however, accepts all
possible redistributions of the probability, meaning that there exists an infinite number of probability
distributions in our example. The equiprobability assumption, while seemingly natural, can lead
to unexpected behaviours even on simple programs (as we show in Section 6) and does not
offer any flexibility or information about the underlying uncertainty. In contrast, the Credal
Semantics highlights the full range of possible probability distributions but is less intuitive to
use in learning settings, where the data needs to come in a very specific shape [10]. A PASP can
represent an infinity of probability distributions and choosing the appropriate one for a given
application has to be context dependent, guided by expert knowledge on the application domain.
The same program, depending on the way chosen to distribute the probability in scenarios
where there are multiple answer sets, might yield any probability between 0 and 1 for a given
query (an example is given in Section 6). As such, tools are required to allow users to apply
their own chosen assumption on how to distribute probabilities when faced with multiple
answer sets for a specific scenario.
      </p>
      <p>In this paper we propose a novel method that allows users to encode the arbitrary assumptions
they want to use to compute the end probability. As such the user controls which probability
distribution, out of the potentially infinite set of probability distributions described by the
program, will be used to answer their queries. We take inspiration from the field of decision
making under ignorance which deals with this exact problem: how to assign a probability
distribution over a set of elements based on a user-defined preference. We propose a novel
semantics, called the Incomplete Knowledge Semantics (IKS), that applies the notion of Ordered
Weighted Averaging (OWA) operators [11], which are user-defined distributions over an ordered
set of elements, to assign probabilities to multiple answer sets. The IKS uses weak constraints
added to the PASP to order the answer sets in a given scenario, thus declaring the ordering for
the OWA operator to work on. We show on simple examples that this new semantics allows
the exploration, in an intuitive way, of the different probability distributions that can come from
a single program, of which the semantics based on equiprobability are a special case.</p>
      <p>
        The paper makes the following contributions:
• Define a novel semantics for PASPs by using user-defined OWA operators in place of a
single arbitrary assumption.
• Present an implementation1 of the IKS which allows the user to declare OWA operators
using a single number in [0, 1].
• Demonstrate the behaviour of this semantics, using simple examples, when predicting
marginal distribution and learning parameters.
      </p>
      <p>We start by explaining in Section 2 the field of decision making under ignorance and
introducing the notion of OWA operator. We then present in Section 3 the Incomplete Knowledge
Semantics, and explain in Section 4 how we approached the problem of generating OWA operators
in our implementation. In Section 5, we explain the differences between our proposed IKS
and other probabilistic answer set programming semantics, and show, in Sections 6 and 7, the
different behaviours of the IKS both in predicting marginal likelihood and in parameter learning.
We conclude the paper with a summary and directions for future work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Decision making under ignorance</title>
      <p>A decision making problem can be represented as shown in equation (1) (from [11]), where a
decision maker is faced with q choices A_1, ..., A_q but the "state of nature" could be any of the n worlds
W_1, ..., W_n. C_ij corresponds to the reward of choosing A_i in the state of nature W_j.
A decision making problem consists in choosing the best action to maximise the reward. In
the framework of decision making under uncertainty, we are given a probability distribution
over the worlds W_j. A solution is to calculate the expected reward for each action A_i using
E_i = Σ_j P(W_j) C_ij and select the action with the highest expected reward.</p>
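      <p>As a concrete illustration of expected-reward selection (the reward matrix and distribution below are invented for illustration, not taken from the paper):</p>

```python
# Hypothetical reward matrix C[i][j]: reward of action A_i in world W_j.
C = [
    [10.0, 0.0, 5.0],  # action A_1: high reward in some worlds only
    [4.0, 4.0, 4.0],   # action A_2: safe constant reward
]
# Given probability distribution over the three worlds W_j.
P = [0.2, 0.5, 0.3]

# Expected reward E_i = sum_j P(W_j) * C[i][j].
expected = [sum(p * c for p, c in zip(P, row)) for row in C]

# Select the action with the highest expected reward.
best = max(range(len(C)), key=lambda i: expected[i])
print([round(e, 3) for e in expected], best)  # [3.5, 4.0] 1
```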
      <p>1The library and binary are available at https://github.com/David-Tuc/IKS</p>
      <p>In the context of decision making under ignorance, no distribution over the possible worlds
is given. Thus the decision maker needs to assume a specific decision attitude in order to
associate to each action the equivalent of an expected reward. For example, with a pessimistic
attitude the decision maker associates to each action A_i its worst possible outcome min_j C_ij and
then chooses the action with the best one (the best worst outcome). In the same way, a decision
maker with an optimistic attitude associates the best possible outcome to each action, and
then selects the best action. Decision strategies can also be more nuanced and in between the
optimistic and pessimistic attitudes, where the outcome associated to an action is a function
of its different rewards. Mathematically, the decision maker defines an aggregate function
F : R^n → R, calculates V_i = F(C_i1, ..., C_in) for each action A_i and then selects the best
action A* = argmax_i V_i.</p>
      <p>A formulation for aggregate functions has been given by Yager [11] in the form of Ordered
Weighted Averaging (OWA) operators.</p>
      <p>
        Definition 2.1 (OWA operator). An OWA operator of dimension n is a function F : R^n → R with n
parameters w_i such that Σ_{i=1}^n w_i = 1 and ∀i, w_i ∈ [0, 1]. Given u ∈ R^n, F(u) = Σ_{i=1}^n Φ(u)_i w_i, where
Φ : R^n → R^n is a function that orders a vector in decreasing order.
      </p>
      <p>For example, given u = (10, 30, 20, 15) and w = [0.5, 0.3, 0.05, 0.15], the result of the
corresponding OWA operator is F(10, 30, 20, 15) = 30 * 0.5 + 20 * 0.3 + 15 * 0.05 + 10 * 0.15 = 23.25.
Here the numbers 10, 30, 20, 15 are the rewards C_ij for a certain choice A_i. Given the
correspondence between an OWA operator F and its weight vector w, we will refer to OWA operators
by their weight vector.</p>
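      <p>A minimal Python sketch of Definition 2.1 (a generic implementation, not the paper's code), reproducing the worked example:</p>

```python
def owa(u, w):
    """Apply the OWA operator with weight vector w to the rewards u:
    sort u in decreasing order, then take the weighted sum."""
    assert abs(sum(w) - 1.0) < 1e-9, "OWA weights must sum to one"
    ordered = sorted(u, reverse=True)
    return sum(wi * ui for wi, ui in zip(w, ordered))

# The worked example from the text.
print(round(owa([10, 30, 20, 15], [0.5, 0.3, 0.05, 0.15]), 2))  # 23.25
```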
      <p>OWA operators are a simple way of computing a preference over a set of worlds of which we
know only an ordering: w_i is the importance to give to the world ranked i-th by its reward in
u. Some particular decision strategies have specific OWA operators: the pessimistic attitude,
optimistic attitude and normative attitude are respectively represented using the following
vectors w_pess, w_opt and w_norm:</p>
      <p>w_pess = (0, ..., 0, 1)^T, w_opt = (1, 0, ..., 0)^T, w_norm = (1/n, ..., 1/n)^T.</p>
      <p>The pessimistic attitude attributes all the weight to the last world in the ordering, the optimistic
attitude assigns it to the first world in the ordering and the normative attitude attributes the
same importance to all worlds.</p>
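      <p>The three attitude vectors are easily constructed; a small sketch (function names are ours, reusing the generic OWA application from Definition 2.1):</p>

```python
def pessimistic(n):
    # All weight on the worst (last) world in the ordering.
    return [0.0] * (n - 1) + [1.0]

def optimistic(n):
    # All weight on the best (first) world in the ordering.
    return [1.0] + [0.0] * (n - 1)

def normative(n):
    # The same importance for every world.
    return [1.0 / n] * n

def owa(u, w):
    return sum(wi * ui for wi, ui in zip(w, sorted(u, reverse=True)))

rewards = [10, 30, 20, 15]
print(owa(rewards, pessimistic(4)))  # 10.0 (worst outcome)
print(owa(rewards, optimistic(4)))   # 30.0 (best outcome)
print(owa(rewards, normative(4)))    # 18.75 (plain average)
```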
      <p>OWA operators can be interpreted as a human-induced probability distribution over possible
worlds, where w_i is the probability of the world with the i-th best outcome. We will make use
of this interpretation to define the Incomplete Knowledge Semantics (IKS). While in decision
making the ordering is given by a notion of cost or reward (C_ij), we will use weak constraints
to obtain such an ordering, and we assume these to be domain specific and human defined. For a
given PASP, the IKS uses an OWA operator in each of the scenarios for which it obtains multiple
answer sets, which are the possible worlds obtained from a specific scenario.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Incomplete Knowledge Semantics</title>
      <p>
        We assume the syntax and semantics of Answer Set Programs as defined by the ASP-Core-2
standard [12]. We consider a probabilistic answer set program P to be composed of an ASP L and a
set of independent annotated disjunctions D. An annotated disjunction (AD) is of the form
p_1 :: h_1 ; ... ; p_m :: h_m :- b_1, ..., b_k
where h_1, ..., h_m are atoms, p_i ∈ [0, 1] with Σ_{i=1}^m p_i ≤ 1, the body b_1, ..., b_k is a set of
literals and d is the index of the AD in D². Informally, when the body condition of an AD is fulfilled, its head
represents a random variable that takes value h_i with probability p_i. For the following definitions,
we will note P_d(h_i) = p_i.
      </p>
      <p>Definition 3.1 (Total Choice). Let N = |D| be the number of ADs; a total choice C =
{c_1, c_2, ..., c_N} ∈ T_D is a set containing one ground atom from each AD's head. Each total choice
C corresponds to a possible world in the probabilistic sense and has an associated probability
P(C) = Π_{d=1}^N P_d(c_d), as ADs are independent.</p>
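      <p>Definition 3.1 can be sketched as follows (hypothetical ADs, each head completed so that its probabilities sum to one, as described in footnote 2):</p>

```python
from itertools import product

# Two hypothetical annotated disjunctions, given as atom -> probability maps.
ads = [
    {"rain": 0.12, "no_rain": 0.88},
    {"wind": 0.65, "no_wind": 0.35},
]

def total_choices(ads):
    """Yield every total choice C (one atom per AD head) together with P(C),
    the product of the chosen outcomes' probabilities."""
    for combo in product(*(ad.items() for ad in ads)):
        atoms = frozenset(atom for atom, _ in combo)
        prob = 1.0
        for _, p in combo:
            prob *= p
        yield atoms, prob

choices = dict(total_choices(ads))
print(len(choices))  # 4
print(round(choices[frozenset({"no_rain", "no_wind"})], 3))  # 0.308
```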
      <p>
        Each total choice corresponds to a deterministic setting, where an outcome has been chosen
for each random variable represented by the ADs. There exists one total choice for each possible
combination of outcomes of all the ADs. This deterministic setting (the total choice) is given
the joint probability of observing all the random variables in their given state, which is the
product of the probabilities of each individual outcome, as the random variables represented by
the ADs are assumed independent. We consider the answer sets of the ASP L ∪ C (we
add to the ASP the atoms in C as facts) for each total choice C. We denote the set of answer
sets of L ∪ C with AS(C) and informally refer to its elements as the answer sets for the
total choice C. We also denote the full set of answer sets of P as AS(P) = ∪_{C ∈ T_D} AS(C).
Similarly to Cozman and Mauá [8] we say that a program is consistent if it has at least one
answer set for each total choice, i.e. AS(C) ≠ ∅. We define a probability distribution over the
sets AS(C) of answer sets of P as P(AS(C)) = P(C). Thus if all the total choices have only
one single answer set, this directly yields a probability distribution over the answer sets, which
coincides with the Distribution Semantics [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. However if some total choices have multiple
answer sets, we must redistribute the probability of the total choice to the answer sets. We
make use of weak constraints, which define a relation of dominance between answer sets, the
answer set dominating all others being the most optimal. Given a set of weak constraints WC,
we note A ◁_WC B if answer set A dominates answer set B under the weak constraints WC.
Definition 3.2 (Ordering of Answer Sets). Given weak constraints WC and a set of answer
sets S of cardinality n, we define an ordering Φ_WC(S) such that ∀i = 1..n, ∀j = 1..i,
Φ_WC(S)_i ⋫_WC Φ_WC(S)_j. That is, an answer set in position i does not dominate any of
the preceding answer sets in the ordering.
      </p>
      <p>2In practice, for each AD where Σ_{i=1}^m p_i &lt; 1 we add an extra annotated atom p_{m+1} :: h_{m+1} to it, where
p_{m+1} = 1 − Σ_{i=1}^m p_i and h_{m+1} is an atom that does not appear in P. Also, if the d-th AD in D has a non-empty
body, we add to L the clauses h_i :- b_1, ..., b_k, ad(d, i). for i = 1...m and replace the AD in D with
p_1 :: ad(d, 1); ...; p_m :: ad(d, m). If the ADs are not grounded, we first take all their groundings before doing the
previous step.</p>
      <p>Given weak constraints WC and for a given answer set A in an ordered set of answer
sets Φ_WC(S), we define eq(A, S, WC) = {j | B_j ∈ Φ_WC(S), A ⋫_WC B_j and
B_j ⋫_WC A}, the indexes of the answer sets in Φ_WC(S) which neither are dominated by
nor dominate A; we will informally say that they share the same optimality as A. Note that
the index of A is in eq(A, S, WC) and that the answer sets whose indexes are in eq(A, S, WC) might
be placed interchangeably in an ordering of S, as they do not dominate each other. We now
make use of this notion of ordering of answer sets in tandem with OWA operators.
Definition 3.3 (Incomplete Knowledge Semantics). Given a consistent PASP P with weak
constraints WC and a user-defined family of OWA operators {w^n ∈ R^n, ∀n ∈ N}, the incomplete
knowledge semantics of P is a probability distribution over AS(P) defined as follows:
∀C ∈ T_D, P(A_i) = (Σ_{j ∈ eq(A_i, AS(C), WC)} w^n_j / |eq(A_i, AS(C), WC)|) * P(AS(C))
(2)
with A_i = Φ_WC(AS(C))_i the i-th answer set in the ordering of AS(C) and n = |AS(C)|.</p>
      <p>Definition 3.3 defines the IKS by attributing to each answer set in a total choice C a fraction
of P(C). This is done by ordering the answer sets from the most optimal to the least optimal
using the weak constraints and using a defined OWA operator to assign to each answer set
its own probability, which we then multiply by P(C) = P(AS(C)). However, as the ordering
Φ_WC might not be unique, we average in Equation 2 over the answer sets that share the same
optimality.</p>
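      <p>A minimal sketch of how Equation 2 distributes a total choice's probability (our own illustration, with answer sets identified by labels and weak-constraint costs given directly; lower cost means more optimal):</p>

```python
def iks_distribution(answer_sets, costs, w, p_total):
    """Distribute p_total = P(AS(C)) over the answer sets of one total choice.
    Answer sets with equal cost share an optimality class, so their OWA
    weights are averaged, as in Equation 2."""
    order = sorted(range(len(answer_sets)), key=lambda i: costs[i])
    probs = {}
    for idx in order:
        # positions in the ordering that share this answer set's cost
        group = [r for r, j in enumerate(order) if costs[j] == costs[idx]]
        share = sum(w[r] for r in group) / len(group)
        probs[answer_sets[idx]] = share * p_total
    return probs

# Total choice C_4 of Listing 1 with ":~ run. [-1]": the run answer set has
# cost -1, the two walk answer sets tie at cost 0. Optimistic operator.
p = iks_distribution(["run", "walk_rock", "walk_classical"],
                     [-1, 0, 0], [1.0, 0.0, 0.0], 0.308)
print(p)  # {'run': 0.308, 'walk_rock': 0.0, 'walk_classical': 0.0}
```

      <p>With the normative operator [1/3, 1/3, 1/3] the same call splits the mass evenly, recovering the equiprobability case.</p>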
      <p>Example. Let us consider the ASP program given in Listing 1 augmented with the user-defined
weak constraint :∼ run. [-1]. This makes the answer sets that contain the atom run
have a lower weight (and be more optimal). There are four total choices C_1...C_4 and we give
the details of the answer sets for each in Table 1. C_1 and C_2 both have only one answer set,
so we directly assign their probabilities to the element in the singleton. For C_3 and C_4, we
need to distribute the probability of the total choice to the answer sets using OWA operators.
Studying the case of C_4, the answer set {sun, run} is the most optimal while the two others
have the same weight. As such, they will both always get the same probability, whatever the
OWA operator. We give examples of redistribution of the probability P(C_4) = 0.308 to the
three answer sets given different OWA operators in Table 2.</p>
      <p>
        A query in the IKS has the form q/λ, where q is a set of literals and λ is the family of
OWA operators to use. In our implementation, λ is a real number in [0, 1] (or a keyword as
explained in the next section). The probability of q/λ is the sum of the probabilities of the
answer sets satisfying q, the probability being calculated using the OWA operators defined by λ.
We denote this probability as P(q/λ). Queries may also include evidence e, written q|e/λ, in which
case P(q|e/λ) = P(q, e/λ)/P(e/λ).
      </p>
    </sec>
    <sec id="sec-4">
      <title>4. Computing Ordered Weighted Averaging operators</title>
      <p>In order to compute the probabilities in the IKS, one needs to define OWA operators w^n for
all the possible numbers of worlds n by setting the weights w^n_i. Indeed, an OWA operator is
required for each possible number of answer sets for a total choice. While this is simple for
specific decision strategies like the optimistic, pessimistic and normative attitudes, it is also
possible to generate more complex operators depending on a "level of optimism". Yager [11]
defines the optimism metric for OWA operators as:</p>
      <p>
        orness(w) = Σ_{i=1}^n w_i * (n − i)/(n − 1)
(3)
For any OWA operator w, orness(w) ∈ [0, 1]. We have that orness(w_pess) = 0, orness(w_opt) = 1 and
orness(w_norm) = 0.5. It is a criterion that measures how "optimistic" an OWA operator is. A more
optimistic OWA operator (orness(w) close to 1) will assign more weight to the best worlds, while
a less optimistic operator (orness(w) close to 0) will assign more weight to the worst worlds. It
indicates at a high level the decision strategy adopted by the user. Given an optimism level, it is
possible to generate an OWA operator such that it reflects this decision strategy.
      </p>
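      <p>Equation 3 is straightforward to compute; a sketch (generic code, not the paper's implementation):</p>

```python
def orness(w):
    """Yager's optimism metric for an OWA weight vector w of size n:
    orness(w) = sum_{i=1..n} w_i * (n - i) / (n - 1)."""
    n = len(w)
    return sum(wi * (n - i) / (n - 1) for i, wi in enumerate(w, start=1))

print(orness([1.0, 0.0, 0.0]))            # 1.0 (optimistic attitude)
print(orness([0.0, 0.0, 1.0]))            # 0.0 (pessimistic attitude)
print(round(orness([1/3, 1/3, 1/3]), 3))  # 0.5 (normative attitude)
```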
      <p>In our implementation we make use of the method described by Chen et al. [13], adopting the
normal distribution as a base. To generate an OWA operator w of size n with an optimism of λ,
we first generate a vector v of size n using the following formula:
v_i = e^{−(i − μ_n)² / (2σ_n²)} / Σ_{j=1}^n e^{−(j − μ_n)² / (2σ_n²)}
(4)
with μ_n = (1 + n)/2 and σ_n = sqrt((1/n) Σ_{i=1}^n (i − μ_n)²). This intermediate operator has an optimism of
0.5, so we need to modify it to match our target optimism of λ. We generate a specific adjustment
matrix U such that w = Uv gives orness(w) = λ. For details on how to generate the adjustment
matrix U refer to Algorithm 1 in Chen et al. [13]. We give examples of generated OWA operators
using this algorithm in Table 3. As for λ = 0.5 we do not need to modify v, we have that
v = w^0.5 in Table 3.</p>
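      <p>The intermediate Gaussian-shaped vector of equation (4) can be sketched as follows (the adjustment matrix U of Chen et al. [13] is not reproduced here):</p>

```python
import math

def gaussian_owa(n):
    """Intermediate OWA vector from equation (4): Gaussian-shaped weights
    centred at mu_n = (1 + n) / 2; by symmetry its orness is 0.5."""
    mu = (1 + n) / 2
    sigma = math.sqrt(sum((i - mu) ** 2 for i in range(1, n + 1)) / n)
    raw = [math.exp(-((i - mu) ** 2) / (2 * sigma ** 2)) for i in range(1, n + 1)]
    total = sum(raw)
    return [r / total for r in raw]

v = gaussian_owa(3)
print([round(x, 3) for x in v])  # [0.243, 0.514, 0.243]
```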
      <p>The method chosen to generate the OWA operator has an impact on the shape of the latter,
as different operators can have the same optimism value. As an example, both w^0.5 and w_norm
given in Table 3 have an optimism value of 0.5. We use this specific generation method for
computational reasons when n is big and also because the shape of the operator allows us to
select a range of answer sets in total choices.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Related Works</title>
      <p>
        We compare here the IKS with other semantics for probabilistic answer set programs. But firstly,
one can note that when the logic program is stratified, the IKS is equivalent to the Distribution
Semantics [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. The IKS uses probabilities on independent facts and takes its syntax from
Problog [14]. It is similar in its mechanism to P-log [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], though P-log doesn’t expressly define
total choices; however, it gives a single probability distribution by distributing the probability
evenly when necessary. Neural-ASP [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] uses a very similar semantics to P-log by assuming
equiprobability in each total choice, but has the particularity of not always defining a full proper
probability distribution: the sum of probabilities of all the answer sets does not need to sum to
one. The IKS allows the user to decide the assumption used to redistribute probabilities instead of
using equiprobability by default. The equiprobability assumption is a special case of the IKS,
obtained when we restrict ourselves to using w_norm exclusively. On the other hand, the Credal Semantics
[8, 9] avoids the issue of using assumptions by accepting all possible probability distributions,
giving a lower and upper probability to queries. Compared to the IKS, these bounds are the
minimum and maximum probabilities that can be obtained using diferent weak constraints
and OWA operators.
      </p>
      <p>Other semantics for PASP differ in the way the probabilities are declared. PASP [15] defines
probabilities directly on atoms in the program without assuming their independence. PrASP [16]
defines probabilities on first-order-logic sentences that are translated to ASP. When the logic
program is not stratified, both accept an infinite number of probability distributions as correct
under their semantics, but both systems define a specific optimization algorithm to obtain a
particular distribution. Finally, LPMLN [17] defines weights on ASP clauses that are used to
compute probabilities over answer sets by a normalization procedure. This semantics does not
use the notion of total choices; however, two answer sets that satisfy the same set of rules will
be given an equal probability. Compared to these other semantics for PASPs, the IKS focuses on
allowing the user the degree of freedom to explore the different probability distributions that
can be obtained from the program.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Behaviour of the IKS</title>
      <p>We implement the IKS using the OWA operator generation algorithm mentioned in Section
4. The user selects the family of OWA operators to use by setting their optimism value. We
show here how the different hyper-parameters (weak constraints and optimism) influence the
downstream probabilities.</p>
      <p>Example 1. We highlight here the fact that assuming equiprobability between answer sets in
the same total choice is not always the most natural assumption. We consider the program in
Listing 1. In the total choice C_4 there are three answer sets (see Table 1). Assigning P(C_4)/3 ≈
0.102 to each answer set seems to be the easiest option, assuming equiprobability between
all three. But looking at the program, it would be more natural to assume equiprobability on
the disjunctive events: run/walk and listen(rock)/listen(classical). In this case, we would assign
0.154 to {run, sun} and 0.077 to each of the two other answer sets.</p>
      <p>The IKS avoids the issue of making assumptions in place of the user when faced with this kind
of situation. Focusing again on C_4, we add the weak constraint ":∼ run. [-1]" to the program.
The answer set with the atom run is more optimal than the two others, which are equally optimal.
An optimism of 1 gives all the probability to the {run, sun} answer set while an optimism of 0
gives it none. Consider now an optimism of 0.5. Because we generate a Gaussian-shaped OWA
operator, its weights are close to [0.25, 0.5, 0.25], so that P({run, sun}) = 0.074. Using
an optimism of 0.697 allows us to have P({run, sun}) = 0.154 and be in the case described above
where equiprobability was assumed on the individual disjunctive events. Finally, if instead of a
generated OWA operator we use w_norm, we obtain the equiprobability case where all three answer
sets are assigned 0.102.</p>
      <p>We give other examples of possible redistributions in Table 2. By doing the same process
in all the total choices, we can compute the probability of queries over the entire program:
P({run}/0) = 0.572, P({run}/1) = 0.880, P({run}/0.697) = 0.725, P({run}/w_norm) =
0.674. Depending on the hyperparameters (OWA operator, optimism, weak constraint) chosen
for running this program, we can obtain very different probabilities.</p>
      <p>While the goal of the IKS is not to allow the user to match one or another equiprobability
assumption, these methods are subsumed by the IKS, which offers more granularity. Choosing a
level of optimism is a simple way to define rather complex decision strategies across all total
choices.</p>
      <p>Example 2. We now give a more complex setting which shows a case where, given the same
program, the probability of a query might range anywhere between 0 and 1 depending on the
assumption used to distribute the probabilities. This highlights the importance of choosing the
appropriate assumption given the application domain. Consider a probabilistic version of the
graph colouring problem where edges exist with a given probability. Each node can be coloured
in one of three colours and adjacent nodes (connected by an edge) cannot be of the same colour.
One application is radio frequency assignment: nodes are emitters, edges represent possible
interference between two emitters and the colours are different radio frequencies. Consider that
some emitters may be turned off some of the time, and that we consider worlds where more
emitters are running to be "better". We formalise this problem in Listing 2 as a PASP. Nodes 1
to 4 are always on and nodes 5 to 8 can be on or off. The colours assigned are red, yellow or
blue. The weak constraint ranks as more optimal the answer sets with more nodes present. In
this program, a total choice corresponds to a combination of links between nodes being active.
In each total choice there are multiple answer sets, each corresponding to a different colouring
and differing as well in the number of nodes present.</p>
      <p>Take as an example the following query q: "what is the probability that emitters 1 and 3
share the same frequency/colour?". Depending on the context, the user might want to adopt a
more optimistic or pessimistic decision attitude when querying the program given in Listing 2:
if the user believes more emitters are functioning, the optimism level chosen will be closer to 1.
With an optimism of 1, the predicted probability P(q/1) is 0.289 while with an optimism of 0
we have P(q/0) = 0.333. Finally, using w_norm we obtain P(q/w_norm) = 0.317.</p>
      <p>The weak constraints chosen also have an important impact on the predicted probability. For
example, consider replacing the weak constraint in Listing 2 with
:∼ colour(1, red), colour(3, red). [-1]
:∼ colour(1, yellow), colour(3, yellow). [-1]
:∼ colour(1, blue), colour(3, blue). [-1]
These order the answer sets such that the most optimal are the ones satisfying the query q. Thus
we obtain, with an optimism of 0, P(q/0) = 0 and, with an optimism of 1, P(q/1) = 1, which
are the lower and upper bounds for our query under the Credal Semantics. This is due to the fact
that when the optimism is 0, we give the probability of each total choice to the answer sets that
do not satisfy q, so q is false in all the answer sets with a non-zero probability. Inversely, when
the optimism chosen is 1, the probability of each total choice is exclusively given to answer sets
which satisfy q, so that the only answer sets with non-zero probability satisfy q.
Modifying the weak constraints has, of course, no effect when using w_norm as the OWA operator.</p>
      <p>It is important to distinguish between the contribution of the optimism value and that of the
weak constraints: the weak constraints define the ranking of the answer sets, meaning that
they define a relative cost between possible answer sets. The optimism value defines which
answer sets in the ordering are favoured by the user.</p>
    </sec>
    <sec id="sec-7">
      <title>7. The impact of optimism on learning</title>
      <p>
        We have shown how probabilities can be manipulated under the IKS. We show here
its impact on parameter learning, which consists of learning the probabilities of some probabilistic
facts in the program. We reuse the program from Listing 1 and generate custom datasets. The
data is defined as follows: a dataset is a collection of data points of the form (q, o),
where q is a query and o ∈ [0, 1] is the optimism to use for that particular data point; o may
also be a keyword that designates a special OWA operator such as the equiprobable one. We train the program
by maximising the log-likelihood L = ∑ log(P(q)) over the data points. We implement this using gradient
ascent, pre-computing the OWA operators and answer sets to obtain a differentiable graph. We
randomly initialize the probabilities of the probabilistic facts in the program, compute L over
the dataset and differentiate the loss with respect to the learnable probabilities to make a step
(by mini-batch).</p>
      <p>Data generation We generate synthetic datasets based on the program from Listing 1. We
consider the observed atoms to be the following: run, walk, listen(rock), listen(classical).
There are then four possible labels: {}, {run}, {walk, listen(rock)} and {walk, listen(classical)}. We
generate the datasets by sampling these four labels at the frequencies described by the following
mechanism. We first sample the atoms wind and rain using their respective probabilities 0.65
and 0.12. If both are true then the label is {}, and if only rain is true then the label is {run}. If
only wind is true then we have two possibilities: we sample another random variable which is
true with probability 0.5; if it is true the label is {walk, listen(rock)} and if false the label is
{walk, listen(classical)}. In the case where both wind and rain are false, we sample the atom run
with probability p_run, which if true gives the label {run}. If false, we again have two options
({walk, listen(rock)} and {walk, listen(classical)}), which we sample with probability 0.5 each.
      </p>
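<p>The generation mechanism above can be sketched as follows (a minimal reconstruction; the function names, the p_run parameter name and the use of Python's random module are ours):</p>

```python
import random

def sample_label(p_run, rng):
    """Sample one label following the generative process from the text."""
    # wind and rain are sampled with their Listing 1 probabilities.
    wind = rng.random() < 0.65
    rain = rng.random() < 0.12
    if wind and rain:
        return frozenset()
    if rain:  # rain only
        return frozenset({"run"})
    if wind:  # wind only: pick the music genre with probability 0.5 each
        genre = "rock" if rng.random() < 0.5 else "classical"
        return frozenset({"walk", f"listen({genre})"})
    # neither wind nor rain: run with probability p_run, otherwise walk
    if rng.random() < p_run:
        return frozenset({"run"})
    genre = "rock" if rng.random() < 0.5 else "classical"
    return frozenset({"walk", f"listen({genre})"})

def generate_dataset(n, p_run, seed=0):
    """Generate n labels; a fixed seed makes the dataset reproducible."""
    rng = random.Random(seed)
    return [sample_label(p_run, rng) for _ in range(n)]
```

Every sampled label is one of the four possibilities listed above; p_run = 0.5 and p_run = 1/3 yield the two datasets discussed next.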
      <p>We generate two datasets using this process. The first one, D1, uses p_run = 0.5, which
represents the case where someone faced with the choice between running and walking would have
chosen to run half of the time. The second dataset, D2, uses p_run = 1/3, which represents the
distribution that would be obtained if we were using the program in Listing 1 under a semantics
that assumes equiprobability of answer sets.</p>
      <p>Learning results We delete from Listing 1 the probabilities of the two probabilistic facts,
add the weak constraint ":∼ run. [-1]" and attempt to learn from data. To do so, we randomly
initialize the probabilities and use gradient ascent to maximize L for 50 epochs. Before learning,
we need to consider which optimism value to use for each example in the dataset. We simplify
here by assuming that all examples during a run have the same optimism and so use the same OWA
operators. We perform two runs for each dataset: one run where the optimism is 0.697 and
one where the equiprobable OWA operator is used. The first setting semantically corresponds to the
distribution used to generate D1, while the second setting corresponds to D2. We give the
results of these experiments in Table 4. As we can see, the first setting allows us to learn the correct
probabilities on D1, while the second setting works well for D2. However, using the first setting with
D2 or the second setting with D1, we obtain probabilities that differ from the targets.</p>
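<p>The learning loop can be sketched as follows (an illustrative reconstruction, not the paper's implementation: the closed-form label distribution mirrors the data-generation mechanism above, run_share stands in for the share the OWA operator gives to the {run} answer set of the "no wind, no rain" total choice, and finite differences replace the differentiable graph):</p>

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def label_probs(p_wind, p_rain, run_share):
    # Label distribution implied by the generative process: run_share is the
    # share of the 'no wind, no rain' total choice given to {run} (0.5 in the
    # first setting, 1/3 under equiprobability); the remainder is split
    # evenly between the two walking answer sets.
    none = (1 - p_wind) * (1 - p_rain)
    wind_only = p_wind * (1 - p_rain)
    walk = (wind_only + none * (1 - run_share)) / 2
    return {"empty": p_wind * p_rain,
            "run": (1 - p_wind) * p_rain + none * run_share,
            "walk_rock": walk,
            "walk_classical": walk}

def log_likelihood(logits, data, run_share):
    p = label_probs(sigmoid(logits[0]), sigmoid(logits[1]), run_share)
    return sum(math.log(p[label]) for label in data)

def train(data, run_share, lr=1.0, epochs=2000, eps=1e-5):
    # Gradient ascent on the mean log-likelihood; forward finite differences
    # stand in for automatic differentiation.
    logits = [0.0, 0.0]  # logits of the two learnable fact probabilities
    for _ in range(epochs):
        base = log_likelihood(logits, data, run_share)
        grads = []
        for i in range(len(logits)):
            bumped = list(logits)
            bumped[i] += eps
            grads.append(
                (log_likelihood(bumped, data, run_share) - base) / eps)
        logits = [t + lr * g / len(data) for t, g in zip(logits, grads)]
    return sigmoid(logits[0]), sigmoid(logits[1])
```

On data whose label frequencies match p_run = 0.5, training with run_share = 0.5 recovers probabilities close to 0.65 and 0.12, while run_share = 1/3 pulls the estimates away from those targets, mirroring the matched and mismatched rows of Table 4.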
      <p>This sanity check confirms that the choice of OWA operator impacts the learned probabilities.
The IKS allows each data point to have its own OWA operators, avoiding a "one assumption
fits all" solution. In a realistic setting, the optimism, OWA operators and weak constraints
would either be given by expert knowledge during data collection or be treated as
hyper-parameters to tune during training.</p>
    </sec>
    <sec id="sec-8">
      <title>8. Conclusion</title>
      <p>We have introduced in this paper a novel semantics for probabilistic answer set programs with
incomplete knowledge, where the stochastic information contained in a program may be compatible with a number
of different probability distributions. It allows the user to define the set of assumptions to use
through ordered weighted averaging operators, a user-defined weighting taken from
the field of decision making under ignorance. Using weak constraints to define an ordering over
answer sets, OWA operators are used to assign probabilities when none are provided by the
program itself. We present our implementation, which uses a specific OWA operator generation
algorithm based on the user's level of optimism. We show that, even for simple PASPs, using
different OWA operators has a big impact on the downstream probability distribution, and
that some simple assumptions (like equiprobability) can lead to unnatural behaviours. Finally,
the impact of OWA operators is shown when performing parameter estimation, arguing that
the choice of weak constraints, OWA operator and optimism can fall within the realm of model
selection. As future work, we will study the possibility of learning the structure of PASPs under
the IKS and the impact of OWA operators in neural-symbolic settings.</p>
    </sec>
  </body>
  <back>
  </back>
</article>