A Hybrid Approach to Inference in Probabilistic Non-Monotonic Logic Programming

Matthias Nickles and Alessandra Mileo
Insight Centre for Data Analytics
National University of Ireland, Galway
{matthias.nickles,alessandra.mileo}@deri.org

Abstract. We present a probabilistic inductive logic programming framework which integrates non-monotonic reasoning, probabilistic inference and parameter learning. In contrast to traditional approaches to probabilistic Answer Set Programming (ASP), our framework imposes comparatively few restrictions on probabilistic logic programs - in particular, it allows for ASP as well as FOL syntax, and for precise as well as imprecise (interval-valued) probabilities. User-configurable sampling and inference algorithms, which can be combined in a pipeline-like fashion, provide for general as well as specialized, more scalable approaches to uncertainty reasoning, allowing for adaptability with regard to different reasoning and learning tasks.

1 Introduction

With this paper, we present the probabilistic logic framework PrASP. PrASP is both a probabilistic logic programming language and a software system for probabilistic inference and inductive weight learning based on Answer Set Programming (ASP). Compared to previous works on this framework [11, 10], we introduce another inference algorithm and additional evaluation results.
Reasoning in the presence of uncertainty and relational structures such as social networks or Linked Data is an important aspect of knowledge discovery and representation for the Web, the Internet of Things, and other heterogeneous and complex domains. Probabilistic logic programming, and the ability to learn probabilistic logic programs from data, can provide an attractive approach to uncertainty reasoning and statistical relational learning, since it combines the deductive power and declarative nature of logic programming with probabilistic inference abilities traditionally known from graphical models, such as Bayesian and Markov networks. We build upon existing approaches in the area of probabilistic (inductive) logic programming in order to provide a new ASP-based probabilistic logic programming language and inference tool which combines the benefits of non-monotonic reasoning using state-of-the-art ASP solvers with probabilistic inference and machine learning. The main enhancement provided by PrASP over (non-probabilistic) ASP as well as existing probabilistic approaches to ASP is the possibility to annotate any formula with point or interval (i.e., imprecise) probabilities (including formulas in full FOL syntax, albeit over finite domains of discourse only), while providing a hybrid set of inference approaches: in addition to general inference algorithms, this includes specialized, more scalable inference algorithms for cases where certain optional assumptions hold (in particular, mutual independence of probabilistic events). As we will show later, it can even make sense to combine different such algorithms (each of which can on its own obtain valid results if its specific prerequisites are fulfilled).
The remainder of this paper is organized as follows: the next section presents related work. Section 3 describes the syntax and semantics of the formal framework. Section 4 describes approximate inference algorithms, and Section 5 provides initial evaluation results. Section 6 concludes.
2 Related Work

Approaches related to PrASP include [18, 8, 3-6, 14, 13, 15], which support probabilistic inference based on monotonic reasoning, and [9, 1, 17, 2], which are based on non-monotonic logic programming. Like P-log [1], our approach computes probability distributions over answer sets (that is, possible worlds are identified with answer sets). However, P-log as well as [17] do not allow for annotating arbitrary formulas (including FOL formulas) with probabilities. [2] allows probabilities to be associated with abducibles (only) and both rules and probabilistic weights to be learned from given data (in the form of literals). Again, PrASP does not impose such restrictions on probabilistic annotations or example data. On the other hand, PrASP cannot make use of abduction for learning. Various less closely related approaches to probabilistic reasoning exist (either not based on logic programming at all, or not in the realm of non-monotonic logic programming): Stochastic Logic Programs (SLP) [8] are an influential approach where sets of rules in the form of range-restricted clauses can be labeled with probabilities. Parameter learning for SLPs is approached in [3] using the EM algorithm. Approaches which combine concepts from Bayesian network theory with relational modeling and learning are, e.g., [4-6]. Probabilistic Relational Models (PRM) [4] can be seen as relational counterparts to Bayesian networks. In contrast to these, our approach does not directly relate to graphical models such as Bayesian or Markov networks but works on arbitrary possible worlds which are generated by ASP solvers in the form of stable models (answer sets). ProbLog [14] allows for probabilistic facts, annotated disjunctions and definite clauses, and approaches to probabilistic rule and parameter learning (from interpretations) also exist for ProbLog. ProbLog builds upon the Distribution Semantics approach introduced for PRISM [18], which is also used by other influential approaches, such as Independent Choice Logic (ICL) [13]. Another important approach outside the area of ASP are Markov Logic Networks (MLN) [15]. A Markov Logic Network consists of first-order formulas annotated with weights (which are, in contrast to PrASP, not in general probabilities). MLNs are used as templates for the construction of Markov networks. The (ground) Markov network generated from the MLN then determines a probability distribution over possible worlds, with inference performed using weighted SAT solving (which is related to but different from ASP). MLNs are syntactically roughly similar to the logic programs in our framework (where weighted formulas can also be seen as soft or hard constraints for possible worlds).

3 Syntax and Semantics

In this section, we briefly describe the formal language and its semantics. Compared to [10], the syntax of PrASP programs has been extended (in particular by allowing interval and non-ground weights) and a variety of approximate inference algorithms have been added (see next section) to the default inference approach which is described below and which still underlies the formal semantics of PrASP programs.
PrASP is a Nilsson-style [12] probabilistic logic language. Let Φ be a set of function, predicate and object symbols and L(Φ) a first-order language over Φ with the usual connectives (including both strong negation "-" and default negation "not") and first-order quantifiers.
It can be assumed that this language covers both ASP and FOL syntax (ASP "specialties" such as choice constructs can be seen as syntactic sugar which we omit here in order to keep things simple). A PrASP program (background knowledge) is a non-empty finite set $\Lambda = \{[l_i;u_i]\,f_i\} \cup \{[l_i;u_i|c_i]\,f_i\} \cup \{\mathit{indep}(\{f_1^i,\dots,f_n^i\})\}$ of annotated formulas (each concluded by a dot) and optional independence constraints (PrASP does not require an independence assumption but optionally makes use of declared or automatically discovered independence). $[l;u]\,f$ asserts that the imprecise probability of f lies within the interval [l, u] (i.e., $l \le Pr(f) \le u$), whereas $[l;u|c]\,f$ states that the probability of f conditioned on formula c lies within the interval [l, u] ($l \le Pr(f|c) \le u$).
Formulas can be non-ground (including existentially or universally quantified variables in FOL formulas). For the purpose of this paper, weights need to be ground (real numbers); however, the prototype implementation also allows for certain non-ground weights. An independence constraint $\mathit{indep}(\{f_1^i,\dots,f_n^i\})$ specifies that the set of formulas $\{f_1^i,\dots,f_n^i\}$ is mutually independent in the probabilistic sense (independence can also be discovered by PrASP by analyzing the background knowledge, but this is computationally more costly).
If the weight of a formula is omitted, [1; 1] is assumed. Point probability weights [p] are translated into weights of the form [p; p] (analogously for conditional probabilities). Weighted formulas can intuitively be seen as constraints which specify which possible worlds (in the form of answer sets) are indeed possible, and with which probability. w(f) denotes the weight of formula f. The f_i and c_i are formulas either in FOL syntax (supported by means of a transformation into ASP syntax described in [7]) or in plain AnsProlog syntax, e.g., [0.5] win :- coin(heads). Informally, every FOL formula or program with FOL formulas results in a set of ASP formulas. The precise AnsProlog syntax depends on the external ASP grounder employed by PrASP - in principle, any grounder could be used. The current prototype implementation has been tested with Gringo/Clingo 3 and 4 (http://potassco.sourceforge.net).
The semantics of PrASP is defined in terms of probability distributions over possible worlds which are identified with answer sets (models) - an assumption inspired by P-log [1]. Let $M = (D, \Theta, \pi, \mu)$ be a probability structure, where D is a finite discrete domain of objects, Θ is a non-empty set of possible worlds, π is a function which assigns to the symbols in Φ predicates, functions and objects over/from D, and $\mu = (\mu^l, \mu^u)$ is a pair of discrete probability functions over Θ, a PrASP program and a query formula, as defined further below. Each possible world is a Herbrand interpretation over Φ. Since we will use answer sets (i.e., stable models of a (disjunctive) answer set program) as possible worlds, it will come in handy to define Γ(a) as the set of all answer sets of answer set program a.
We define a (non-probabilistic) satisfaction relation between possible worlds and unannotated programs as follows: let Λ⁻ be an unannotated program and lp a transformation which transforms such a program (which might contain formulas in first-order logic syntax in addition to formulas in ASP syntax) into a disjunctive program. The details of this transformation are outside the scope of this paper and can be found in [7].
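For illustration, a hypothetical program fragment combining these constructs could read as follows (the atoms and weights here are invented for this example and are not part of the benchmark programs discussed later):

[0.7] rain.
[0.2;0.4] sprinkler.
[0.9|rain] wet.
indep({rain, sprinkler}).

Here, rain has the precise probability 0.7, the probability of sprinkler is only known to lie within [0.2, 0.4], wet has probability 0.9 conditioned on rain, and the indep declaration states that rain and sprinkler are mutually independent.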
Given this transformation, $(M, \theta) \models_\Theta \Lambda^-$ iff $\theta \in \Gamma(lp(\Lambda^-))$ and $\theta \in \Theta$. For a disjunctive program ψ, we define $(M, \theta) \models_\Theta \psi$ iff $\theta \in \Gamma(\psi)$ and $\theta \in \Theta$.
To do groundwork for the computation of a probability distribution over possible worlds Θ from a given PrASP program, we define a (non-probabilistic) satisfaction relation between possible worlds and unannotated formulas: let φ be a PrASP formula (without weight) and θ be a possible world. Furthermore, let $(M, \theta) \models_\Lambda \phi$ iff $(M, \theta) \models_\Theta \rho(\Lambda) \cup lp(\phi)$ and $\Theta = \Gamma(\rho(\Lambda))$ (we say formula φ is true in possible world θ). We abbreviate $(M, \theta) \models_\Lambda \phi$ as $\theta \models_\Lambda \phi$ when M is given by the context. At this, the spanning program ρ(Λ) of PrASP program Λ is a non-probabilistic disjunctive program (without independence constraints) generated by removing all weights and transforming each formerly weighted formula f into a disjunction f | not f, where "not" stands for default negation. Informally, the spanning program represents the uncertain but unweighted beliefs of the knowledge engineer or agent. With Γ(a) as defined above, the set of possible worlds deemed possible according to existing belief ρ(Λ) is denoted as Γ(ρ(Λ)).
We define the minimizing parameterized probability distribution $\mu^l(\Lambda, \Theta, q)$ over a set $\Theta = \{\theta_1, \dots, \theta_m\} = \Gamma(\rho(\Lambda))$ of answer sets (possible worlds), a PrASP program $\Lambda = \{[p_i]f_i, i = 1..n\} \cup \{[p_i|c_i]f_i^c\} \cup \{\mathit{indep}(\{f_1^i, \dots, f_k^i\})\}$ and a query formula q as $\{\theta_i \mapsto Pr(\theta_i) : \theta_i \in \Theta\}$, where $(Pr(\theta_1), \dots, Pr(\theta_m))$ is any solution of the following system of inequalities (constraints) such that 1) $Pr^l(q) = \sum_{\theta_i \in \Theta:\, \theta_i \models_\Lambda q} Pr(\theta_i)$ is minimized and 2) the distribution has maximum entropy [19] among all solutions which minimize the said sum. Analogously, $\mu^u$ denotes a maximum entropy probability distribution such that $Pr(\theta_1), \dots, Pr(\theta_m)$ maximize $Pr^u(q) = \sum_{\theta_i \in \Theta:\, \theta_i \models_\Lambda q} Pr(\theta_i)$.

$$l(f_1) \le \sum_{\theta_i \in \Theta:\, \theta_i \models_\Lambda f_1} Pr(\theta_i) \le u(f_1) \quad \cdots \quad l(f_n) \le \sum_{\theta_i \in \Theta:\, \theta_i \models_\Lambda f_n} Pr(\theta_i) \le u(f_n) \quad (1)$$

$$\sum_{\theta_i \in \Theta} Pr(\theta_i) = 1 \quad (2)$$

$$\forall \theta_i \in \Theta : 0 \le Pr(\theta_i) \le 1 \quad (3)$$

At this, $l(f_i)$ and $u(f_i)$ denote the lower and upper endpoints of the probability interval (imprecise probability) of unconditional formula $f_i$ (analogously for the interval endpoints $l(f_i^c|c_i)$ and $u(f_i^c|c_i)$ of conditional probabilities). In addition, any indep-declaration $\mathit{indep}(F^i)$ in the program induces, for every subset $\{f_1^i, \dots, f_r^i\} \subseteq F^i$, $r > 1$, constraints of the following form:

$$\prod_{k=1..r} l(f_k^i) \;\le \sum_{\theta_j \in \Theta:\, \theta_j \models_\Lambda \bigwedge_{k=1..r} f_k^i} Pr(\theta_j) \;\le\; \prod_{k=1..r} u(f_k^i)$$

In the case of point (i.e., precise) probabilities, these encode $Pr(\bigwedge_{k=1..r} f_k^i) = \prod_{k=1..r} Pr(f_k^i)$.
Furthermore, any conditional probability formula $[p_i|c_i]f_i^c$ in the program induces constraints ensuring $l(f_i^c|c_i) \le Pr(f_i^c|c_i) \le u(f_i^c|c_i)$ (with $p_i = [l(f_i^c|c_i); u(f_i^c|c_i)]$), namely

$$\sum_{\theta_j \in \Theta} Pr(\theta_j)\,\nu(\theta_j, f_i^c \wedge c_i) - \sum_{\theta_j \in \Theta} l(f_i^c|c_i)\,Pr(\theta_j)\,\nu(\theta_j, c_i) \ge 0$$
$$\sum_{\theta_j \in \Theta} Pr(\theta_j)\,\nu(\theta_j, f_i^c \wedge c_i) - \sum_{\theta_j \in \Theta} u(f_i^c|c_i)\,Pr(\theta_j)\,\nu(\theta_j, c_i) \le 0$$

At this, we define $\nu(\theta, f) = 1$ if $\theta \models_\Lambda f$ and $\nu(\theta, f) = 0$ otherwise.
For small systems, PrASP can compute minimizing and maximizing probability distributions directly from the inequalities above using linear programming, and a maximum entropy solution amongst a number of candidate distributions (solutions of an underdetermined system) can be discovered using gradient descent. However, to make distribution finding tractable, we need to use different algorithms, as described in the next section.
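To make the linear system concrete, the following minimal Python sketch (ours, not PrASP's actual implementation) sets up constraints (1)-(3) for the tiny program [0.4;0.6] heads., whose spanning program heads | not heads has the two possible worlds θ1 = {heads} and θ2 = ∅, and computes the query interval for heads with an off-the-shelf LP solver:

# Minimal sketch of the LP encoding (1)-(3); not PrASP code.
import numpy as np
from scipy.optimize import linprog

worlds = [{"heads"}, set()]      # Gamma(rho(Lambda)): the two answer sets

def nu(world, atom):             # nu(theta, f) for an atomic formula f
    return 1.0 if atom in world else 0.0

# Constraint (1) for the single weighted formula: 0.4 <= Pr(heads) <= 0.6,
# written as two "<=" rows for linprog (A_ub @ x <= b_ub).
a = np.array([nu(w, "heads") for w in worlds])
A_ub, b_ub = np.vstack([a, -a]), np.array([0.6, -0.4])
A_eq, b_eq = np.ones((1, 2)), np.array([1.0])    # constraint (2): sum = 1

# Constraint (3) via bounds; minimize/maximize Pr(q) = a @ x for q = heads.
lo = linprog(a,  A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=[(0, 1)] * 2)
hi = linprog(-a, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=[(0, 1)] * 2)
print(lo.fun, -hi.fun)           # -> 0.4 0.6, i.e., [Pr_l(heads), Pr_u(heads)]

Minimizing and maximizing the same objective over the feasible distributions recovers exactly the query interval [0.4, 0.6]; selecting a maximum entropy solution among the optima would require an additional gradient-based step, as described above.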
In practice, the inequalities system above thus serves mainly as a means to define the semantics of PrASP formulas. Finally, marginal inference results are obtained as follows: the result of a query of the form [?] q is defined as the interval $[Pr^l(q), Pr^u(q)]$, and the result of conditional queries of the form [?|c] f (which stands for Pr(f|c), where c is some evidence) is computed using $Pr(f \wedge c)/Pr(c)$.
An example PrASP program:

coin(1..10).
[0.4;0.6] coin_out(1,heads).
[[0.5]] coin_out(N,heads) :- coin(N), N != 1.
1{coin_out(N,heads), coin_out(N,tails)}1 :- coin(N).
n_win :- coin_out(N,tails), coin(N).
win :- not n_win.
[0.8|win] happy.
:- happy, not win.

The line starting with [[0.5]] is syntactic sugar for a set of weighted rules in which variable N is instantiated with all its possible values (i.e., [0.5] coin_out(2,heads) :- coin(2), 2 != 1 and [0.5] coin_out(3,heads) :- coin(3), 3 != 1, etc.). It would also be possible to use [0.5] as the annotation of this rule, in which case the weight 0.5 would specify the probability of the entire non-ground formula instead. 1{coin_out(N,heads), coin_out(N,tails)}1 (Gringo AnsProlog syntax) denotes that a coin comes up with either heads or tails, but not both.
Our system accepts query formulas in the format [?] a, which asks PrASP for the marginal probability of a, and [?|b] a, which computes the conditional probability Pr(a|b). E.g., the query [?|coin_out(2,tails)] happy results in [0;0].

4 Sampling and Inference Algorithms

PrASP (as a software system) contains a variety of exact and approximate inference algorithms which can be partially combined in a hybrid fashion. Using command line options, the user selects a pipeline of alternative pruning (simplification), sampling and inference steps (depending on the nature and complexity of the respective problem). E.g., the user might choose to sample possible worlds from a near-uniform distribution and to pass on the resulting models to a simulated annealing algorithm which computes a probability distribution over the sampled possible worlds. Finally, this distribution is used to compute the conditional or marginal probabilities of the query formulas. The inference algorithms available in the current prototype (version 0.7) of PrASP are:

Linear programming Direct solution of the linear inequalities system described before. Precise and very fast for very small systems, intractable otherwise.
Various answer set sampling algorithms for so-called initial sampling These can in some cases be used directly for inference, by computing a distribution which complies with the constraints (linear system) described before. An exemplary such algorithm is Algorithm 1. Alternatively, they can be followed by another inference algorithm (simulated annealing or iterative refinement, see below) which corrects the initial distribution computed by initial sampling.
Parallel simulated annealing This approach (Algorithm 2) performs simulated annealing for inference problems where no assumptions can be made about independence or other properties of the program (except consistency). It can be used either stand-alone or in a hybrid combination with an initial sampling stage (e.g., Algorithm 1).
Iterative refinement An adaptation of the inference algorithm described in [16], with guaranteed minimal Kullback-Leibler divergence to the uniform distribution (i.e., maximum entropy).
Direct counting Weights are transformed into unweighted formulas, and queries are then solved by mere counting of models (see [10] for details).

Most of our algorithms rely heavily on near-uniform sampling, either using the randomization provided by the respective external ASP solver (fast but typically of rather low quality, i.e., weakly uniform) or using so-called XOR constraints as described in [10] (which provides higher sampling quality at the expense of speed). Of PrASP's inference algorithms, we describe one of the initial sampling algorithms (Algorithm 1) and parallel simulated annealing (Algorithm 2). An interesting property of the first algorithm is its ability to provide a suitable distribution over possible worlds directly if all weighted formulas in the PrASP program are mutually independent (analogously to the independence assumption typically made by distribution semantics-based approaches). Algo. 2 can be used stand-alone or subsequently to Algo. 1: in the latter case, the probability distribution computed by initial sampling (with replacement) is used as the initial distribution, which is then refined by simulated annealing until all constraints (given probabilities) are fulfilled. The benefit of this pipelined approach to inference is that the user (knowledge engineer) doesn't need to know about event independence - if the uncertain formulas in the program are independent, initial sampling already provides a valid distribution and the subsequent simulated annealing stage completes almost immediately. Otherwise, simulated annealing "repairs" the insufficient distribution computed by the initial sampling stage.
Concretely, Algo. 1 samples answer sets and computes a probability distribution over these models which reflects the weights provided in the PrASP program, provided that all uncertain formulas in the program describe a mutually independent set of events. Other user-provided constraints (such as conditional probabilities in the PrASP program) are ignored here. Also, Algo. 1 does not guarantee that the solution has maximum entropy.

Algorithm 1 Sampling from models of the spanning program (point probabilities only)
Require: max number of samples n; set of uncertain formulas uf = {[w(uf_i)] uf_i with 0 < w(uf_i) < 1}; set of certain formulas cf = {cf_i : w(cf_i) = 1} (i.e., with probability 1)
1: i ← 1
2: for i ≤ |uf| do
3:   r_i ← random element of Sym({1, ..., n}) (the permutations of {1, ..., n})
4:   i ← i + 1
5: end for
6: m ← ∅, j ← 1
7: parfor j ∈ {1, ..., n} do
8:   p ← ∅, k ← 1
9:   for k ≤ |uf| do
10:     if r_k(j) ≤ n · w(uf_k) then p ← p ∪ {uf_k} else p ← p ∪ {¬uf_k} end if
11:     k ← k + 1
12:   end for
13:   s ← model sampled uniformly from the models of program cf ∪ p (∅ if UNSAT)
14:   m ← m ⊎ {s}
15: end parfor
Ensure: Multiset m contains samples from all answer sets of the spanning program such that
16: ∀uf_i : w(uf_i) ≈ |{s ∈ m : s ⊨ uf_i}| / |m| iff the set uf is mutually independent.
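To convey the core idea of Algorithm 1 outside of pseudocode, here is a minimal Python sketch (ours; the helper answer_sets stands in for the call to the external ASP solver and is assumed to return the list of answer sets of the given program, with formulas represented as plain strings):

import random

def initial_samples(n, uf, cf, answer_sets):
    # uf: dict mapping each uncertain formula to its weight w (0 < w < 1);
    # cf: list of certain formulas; returns the multiset m of Algorithm 1.
    # One random permutation of {0,...,n-1} per uncertain formula, so that
    # each formula f gets asserted in roughly n*w(f) of the n samples.
    perm = {f: random.sample(range(n), n) for f in uf}
    models = []
    for j in range(n):
        p = [f if perm[f][j] < n * w else "not " + f for f, w in uf.items()]
        candidates = answer_sets(list(cf) + p)   # models of cf united with p
        if candidates:                           # skip UNSAT combinations
            models.append(random.choice(candidates))  # uniform pick
    return models

The weight of each uncertain formula is then approximated by its frequency in the returned multiset, which, as stated above, only yields a valid distribution if the uncertain formulas are mutually independent.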
Algorithm 2 presents the approach PrASP uses for approximate inference using a parallel form of simulated annealing (which does not require event independence). The initial list of samples initSamples is computed by an initial sampling approach such as Algo. 1.

Algorithm 2 Inference by parallel simulated annealing, Part 1. We show only the basic variant for non-conditional formulas with point weights. The extension to conditional probabilities and interval probabilities (imprecise probabilities) is straightforward.
Require: maxTime, maxEnergy, initTemp, initSamples (from, e.g., Algo. 1; initSamples is a multiset which encodes a probability distribution via frequencies of models), frozen, degreeOfParallelization, α, F (set of weighted formulas), Λ (PrASP program)
1: s ← initSamples, k ← 0, temp ← initTemp
2: e ← energy(s)
3: while k ≤ maxTime ∧ temp ≥ frozen do
4:   parfor i ← 1, degreeOfParallelization do
5:     s″_i ← s ⊎ sampleStep(samplingMethod)
6:   end parfor
7:   s′ ← argmin_{s″_i} (energy(s″_1), ..., energy(s″_n)), with n = degreeOfParallelization
8:   e′ ← energy(s′)
9:   if e′ < e ∨ rand[0,1) < e^(−(e′−e)/temp) then
10:     s ← s′
11:     e ← e′
12:   end if
13:   temp ← temp · α
14:   k ← k + 1
15: end while
Ensure: Multiset s = (pw, frq) = µ_approx(Λ) approximates the probability distribution µ(Λ) = Pr(Γ(ρ(Λ))) over the set pw = {pw_i} = Γ(ρ(Λ)) of possible worlds by {Pr(pw_i) ≈ frq(pw_i)/|s|}.
16: function energy(s)
17:   parfor f_i ∈ F do
18:     freq_{f_i} ← |{{s′ ∈ s : s′ ⊨_Λ f_i}}| / |s|
19:   end parfor
    return √(Σ_{f_i ∈ F} (freq_{f_i} − weight_{f_i})²)
20: end function
21: ▷ (Continued in Part 2 below.)

In addition to the actual inference algorithms, it is often beneficial to let PrASP remove (prune) all parts of the spanning program which cannot influence the probabilities of the query formulas. The approach to this is straightforward (program dependency analysis) and is therefore omitted here. We will show in the evaluation section how such simplification affects inference performance.

Algorithm 2 Inference by parallel simulated annealing, Part 2.
22: function sampleStep(samplingMethod) ▷ The framework provides various configurable methods for the simulated annealing sampling step, of which we show here only one.
23:   F′ ← ∅
24:   for f_i ∈ F do
25:     if rand[0,1) < weight_{f_i} then
26:       F′ ← F′ ∪ {f_i}
27:     else
28:       F′ ← F′ ∪ {¬f_i}
29:     end if
30:   end for
    return answerSets(F′) (might be ∅)
31: end function

While for space-related reasons this paper covers deductive inference only, PrASP also supports induction (learning of weights of hypotheses from example data); please refer to [10] for details.

5 Experiments

The main goal of PrASP is not to outperform existing approaches in terms of speed but to provide a flexible, scalable and highly configurable framework which puts as few restrictions as possible on what users can express in terms of (non-monotonic) certain and uncertain beliefs, while being competitive with more specialized inference approaches if the respective conditions (like event independence) are met.
For our first experiment, we model a coin game (a slightly simplified variant of the example code shown before): a number of coins are tossed, and the game is won if a certain subset of all coins comes up with "heads". The inference task is the approximation of the winning probability. In addition, another random subset of coins are magically dependent on each other, and one of the coins is biased (the probability of "heads" is 0.6). Despite its simplicity, this scenario shows how inference copes with independent as well as dependent uncertain facts, and how effective the pruning approach of the respective framework is (since winning depends only on a subset of coins). Also, inference complexity clearly scales with the number of coins. In PrASP syntax, such a partially randomly generated program looks, e.g., as follows (adaptation to MLN or ProbLog syntax is straightforward):

coin(1..8).
[0.6] coin_out(1,heads).
[[0.5]] coin_out(N,heads) :- coin(N), N != 1.
1{coin_out(N,heads), coin_out(N,tails)}1 :- coin(N).
win :- 2{coin_out(3,heads),coin_out(4,heads)}2.
coin_out(4,heads) :- coin_out(6,heads).

The inference algorithm used is initial sampling (Algo. 1) followed by simulated annealing (Algo. 2). Using Algo. 1, we computed 100 random models (number of samples n), which is sufficient to obtain a precision of +/-0.01 for the query probabilities. The winning subset of coins and the subset of mutually dependent coins (from which a rule of the form coin_out(a,heads) :- coin_out(b,heads), coin_out(c,heads), ... is generated) are each a random set with 25% of the size of the respective full set of coins.
"PrASP 0.7.2 simp" in Fig. 1 stands for results obtained with pruning (i.e., parts of the program on which the query result cannot depend have been automatically removed). We also report the results obtained solely using Algorithm 2 ("noinit" in Fig. 1), in order to see whether the initial sampling stage provides any benefit here. The simulated annealing parameters were maxEnergy = 0.15, initTemp = 5, frozen = 10^-150, α = 0.85. We compared the performance (duration as a function of the number of coins (x-axis), with a minimum of 18 coins) of the current prototype of PrASP with that of Tuffy 0.3 (http://i.stanford.edu/hazy/hazy/tuffy/), a recent implementation of Markov Logic Networks which uses a database system in order to increase scalability, and ProbLog2 2.1 (https://dtai.cs.kuleuven.be/problog/) (despite the random dependencies). Times are in milliseconds, obtained using an i7 4-core processor at 3.4 GHz over five trials.

[Fig. 1. Biased coins game: duration (ms) vs. number of coins for ProbLog2 2.1, MLN (Tuffy 0.3), PrASP 0.7.2, PrASP 0.7.2 simp and PrASP 0.7.2 noinit.]

ProbLog2 and Tuffy scale very well here, with some additional time required by Tuffy, probably due to the overhead introduced by external database operations. With PrASP, we observe that priming simulated annealing with an initial sampling step (which would give inaccurate results if used standalone) improves performance massively, whereas pruning appears to suffer from the additional overhead introduced by the required dependency analysis of the logic program. Our hypothesis regarding this behavior is that the initial distribution over possible worlds is, albeit not perfect, quite close to the accurate distribution, so that the subsequent simulated annealing task takes off a lot faster compared to starting from scratch (i.e., from the uniform distribution).
The next experiment shows how PrASP copes with a more realistic benchmark task - a form of the well-known friends-and-smokers problem [15] - which can be tractably approached using Algorithm 1 alone, since the independence assumption is met (which also makes it suitable for ProbLog). On the other hand, the rules are more complex. In this benchmark scenario, a randomly chosen number of persons are friends, a randomly chosen subset of all people smoke, there is a certain probability of being stressed ([[0.3]] stress(X)), it is assumed that stress leads to smoking (smokes(X) :- stress(X)) and that some friends influence each other with a certain probability ([[0.2]] influences(X,Y)), in particular with regard to their smoking behavior (smokes(X) :- friend(X,Y), influences(Y,X), smokes(Y)). With a certain probability, smoking leads to asthma ([[0.4]] h(X) and asthma(X) :- smokes(X), h(X)).
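Assembled into a single program, a hypothetical small instance of this scenario could look as follows in PrASP syntax (the person/1 guards and the concrete friend/2 facts are our own additions for illustration; the actual benchmark instances are generated randomly):

person(1..3).
friend(1,2). friend(2,3).
[[0.3]] stress(X) :- person(X).
smokes(X) :- stress(X).
[[0.2]] influences(X,Y) :- friend(X,Y).
smokes(X) :- friend(X,Y), influences(Y,X), smokes(Y).
[[0.4]] h(X) :- person(X).
asthma(X) :- smokes(X), h(X).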
The query consists of [?] asthma(X) for each person X.

[Fig. 2. Smokers social network: duration (ms) vs. number of persons for ProbLog2, MLN (Tuffy 0.3), PrASP 0.7.2 (10 models) and PrASP 0.7.2 (100 models).]

The results (Fig. 2) have been averaged over five trials. Again, ProbLog2 scores best in this scenario. PrASP, using Algorithm 1 (since all uncertain facts in this scenario are mutually independent), does quite well for most episodes but loses to ProbLog2. Tuffy does very well below 212 persons, then performance breaks down massively for unknown reasons (possibly due to some internal cache overflow). For technical reasons, we couldn't get the smokers scenario working with the publicly available current implementation of P-log (we received segmentation faults which couldn't be resolved), but experiments with the examples coming with this software seem to indicate that this approach also scales fine.

6 Conclusion

We have presented a new software framework for uncertainty reasoning and parameter estimation based on Answer Set Programming. In contrast to most other approaches to probabilistic logic programming, the philosophy of PrASP is to provide a very expressive formal language (ASP or full FOL syntax over finite domains for formulas annotated with precise as well as imprecise probabilities) on the one hand, and a variety of inference algorithms which are able to take advantage of certain problem domains that facilitate "fast track" reasoning and learning (in particular, inference in the presence of formula independence) on the other. We see the main benefit of our framework, besides its support for non-monotonic reasoning, thus in its semantically rich and configurable uncertainty reasoning approach, which allows various sampling and inference approaches to be combined in a pipeline-like fashion. Ongoing work focuses on additional experiments, the integration of further inference algorithms, and the direct integration of an ASP solver into PrASP, in order to avoid expensive calls of external reasoning tools. Another area of ongoing work is the support for so-called annotated disjunctions [20].

Sponsored by SFI grant no. SFI/12/RC/2289.

References

1. Baral, C., Gelfond, M., Rushton, N.: Probabilistic reasoning with answer sets. Theory Pract. Log. Program. 9(1), 57-144 (2009)
2. Corapi, D., Sykes, D., Inoue, K., Russo, A.: Probabilistic rule learning in nonmonotonic domains. In: Procs. 12th International Conference on Computational Logic in Multi-Agent Systems (CLIMA'11), pp. 243-258. Springer-Verlag, Berlin, Heidelberg (2011), http://dl.acm.org/citation.cfm?id=2044543.2044565
3. Cussens, J.: Parameter estimation in stochastic logic programs. Machine Learning 44(3) (2001)
4. Friedman, N., Getoor, L., Koller, D., Pfeffer, A.: Learning probabilistic relational models. In: IJCAI, pp. 1300-1309. Springer-Verlag (1999)
5. Kersting, K., Raedt, L.D.: Bayesian logic programs. In: Proceedings of the 10th International Conference on Inductive Logic Programming (2000)
6. Laskey, K.B., Costa, P.C.: Of klingons and starships: Bayesian logic for the 23rd century. In: Procs. of the 21st Conf. on Uncertainty in Artificial Intelligence (2005)
7. Lee, J., Palla, R.: System f2lp - computing answer sets of first-order formulas. In: Erdem, E., Lin, F., Schaub, T. (eds.) LPNMR. Lecture Notes in Computer Science, vol. 5753, pp. 515-521. Springer (2009)
8.
Muggleton, S.: Learning stochastic logic programs. Electron. Trans. Artif. Intell. 4(B), 141-153 (2000)
9. Ng, R.T., Subrahmanian, V.S.: Stable semantics for probabilistic deductive databases. Inf. Comput. 110(1), 42-83 (1994)
10. Nickles, M., Mileo, A.: Probabilistic inductive logic programming based on answer set programming. In: 15th Int'l Workshop on Non-Monotonic Reasoning (NMR'14) (2014)
11. Nickles, M., Mileo, A.: A system for probabilistic inductive answer set programming. In: 9th International Conference on Scalable Uncertainty Management (SUM'15) (2015, to appear)
12. Nilsson, N.J.: Probabilistic logic. Artificial Intelligence 28(1), 71-87 (1986)
13. Poole, D.: The independent choice logic for modelling multiple agents under uncertainty. Artificial Intelligence 94, 7-56 (1997)
14. Raedt, L.D., Kimmig, A., Toivonen, H.: ProbLog: A probabilistic Prolog and its application in link discovery. In: IJCAI, pp. 2462-2467 (2007)
15. Richardson, M., Domingos, P.: Markov logic networks. Mach. Learning 62(1-2), 107-136 (February 2006), http://dx.doi.org/10.1007/s10994-006-5833-1
16. Rödder, W., Meyer, C.: Coherent knowledge processing at maximum entropy by SPIRIT. In: Proceedings of the Twelfth Conference on Uncertainty in Artificial Intelligence (UAI'96) (1996)
17. Saad, E., Pontelli, E.: Hybrid probabilistic logic programming with non-monotonic negation. In: Twenty-First International Conference on Logic Programming. Springer-Verlag (2005)
18. Sato, T., Kameya, Y.: PRISM: a language for symbolic-statistical modeling. In: Proceedings of the 15th International Joint Conference on Artificial Intelligence (IJCAI'97), pp. 1330-1335 (1997)
19. Thimm, M., Kern-Isberner, G.: On probabilistic inference in relational conditional logics. Logic Journal of the IGPL 20(5), 872-908 (2012)
20. Vennekens, J., Verbaeten, S., Bruynooghe, M.: Logic programs with annotated disjunctions. In: Demoen, B., Lifschitz, V. (eds.) Logic Programming. Lecture Notes in Computer Science, vol. 3132, pp. 431-445. Springer Berlin Heidelberg (2004)