<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Differentiable SAT/ASP</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>National University of Ireland, Galway School of Engineering and Informatics</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>We propose Differentiable SAT and Differentiable Answer Set Programming for multi-model optimization through gradient-controlled answer set or satisfying assignment computation. As a use case, we also show how our approach can be used for expressive probabilistic inference constrained by logical background knowledge. In addition to presenting an enhancement of the CDNL/CDCL algorithm as primary implementation approach, we introduce alternative algorithms which use an unmodified ASP solver and map the optimization task to conventional answer set optimization or use so-called propagators.</p>
      </abstract>
      <kwd-group>
        <kwd>SAT</kwd>
        <kwd>ASP</kwd>
        <kwd>Probabilistic Programming</kwd>
        <kwd>Gradient Descent proximate Probabilistic Inference</kwd>
        <kwd>Relational Artificial Intelligence</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Modern SAT and Answer Set solvers are, like their closely related cousins constraint
processing and Satisfiability Modulo Theories (SMT), powerful and fast tools for
logical reasoning. We present an approach which utilizes and enhances current SAT/ASP
solving algorithms for multi-model optimization, with probabilistic inference as the
focused - but not the only - use case. We build on previous work [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] and add new
algorithms, in particular methods which require only an existing, unmodified ASP solver
or map the task to a regular answer set optimization problem.
      </p>
      <p>With Multi-model optimization we mean the search for a multi-set of models
(satisfying Boolean assignments or answer sets) which indirectly represents an (approximate)
minimum of a user-provided cost function. We consider cost functions defined over
certain statistical properties of the model multi-set, namely frequencies of atoms (with cost
functions over fact, rule, clause or model weights as straightforward instances), and we
incrementally sample models until a cost minimum is reached, with partial derivatives
of the cost function guiding the assignment of decision literals in the SAT or ASP
solving process.</p>
      <p>In the use case of probabilistic logic programming (a form of declarative probabilistic
programming), the resulting multi-model optimum approximates a probability
distribution over possible worlds (models) induced by probabilistic constraints (encoded as
the cost function) and non-probabilistic rules and clauses (a regular ASP program or
Boolean formula in CNF of which all sampled models are answer sets respectively
satisfying assignments). By sampling only as many models as required for cost
minimization, we reduce the number of expensive conventional deductive inference steps
and avoid the combinatorial explosion of materialized possible worlds with increasing
number of nondeterministic atoms which typically precludes the use of straightforward
optimization techniques (such as linear programming) in probabilistic logic
programming.</p>
      <p>
        In principle, arbitrary differentiable cost function can be used (although obviously not
all cost functions lead to convergence of the optimization process) and there are no
restrictions on the background knowledge rules or clauses, or the random variables (such
as independence assumptions), except consistency. The expressiveness of the
framework is thus quite high, and, despite the use of sampling, computation times remain a
challenge (in particular in comparison with approaches which deliberately put
restrictions into place, such as frameworks based on the Distribution Semantics). We tackle
this concern with our first concrete algorithm (a variant of the algorithm presented in
[
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]) by integrating the cost minimization steps directly into a state-of-the-art ASP/SAT
solving approach [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], and we compare its performance experimentally with that of
several alternative methods introduced in this paper.
      </p>
      <p>The remainder of this paper is organized as follows: The following section introduces
basic concepts and notation. Sect. 3 presents our general approach and proposes several
concrete computation methods. Sect. 4 presents results from preliminary experiments,
and Sect. 5 discusses related approaches. Sect. 6 concludes.
2</p>
    </sec>
    <sec id="sec-2">
      <title>Preliminaries</title>
      <p>
        We consider ground normal logic programs under stable model semantics and SAT
problems in form of propositional formulas in Conjunctive Normal Form (CNF).
Recall that a normal logic program is a finite set of rules of the form
h : b1; :::; bm; not bm+1; :::; not bn (with 0 m n). h and the bi are atoms without
variables. not represents default negation. Rules without body literals are called facts.
Most other syntactic constructs supported by contemporary ASP solvers (like integrity
constraints, choice rules or classical negation) can be translated into (sets of) normal
rules. We consider only ground programs in this work. The answer sets (stable
models) of a normal logic program are as defined in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Throughout the paper, we use the
term “answer set program” to mean a ground normal logic program and “model” in
the sense of answer set or, in the SAT case, a complete truth assignment such that the
formula evaluates to true, however, to use the same model notation with both ASP and
SAT, we generally do not show false atoms in models, i.e., a model is represented as
a set of atoms which hold in the model. As common in probabilistic ASP, we identify
possible worlds with models. denotes the set of all answer sets or satisfying
assignments of answer set program or propositional formula . Sometimes we use only logic
programming terminology where the translation to SAT terminology is obvious (e.g.,
“program” instead of set of clauses). The set of all atoms respectively propositional
variables in a program or formula is denoted as atoms( ). We write S to denote a
set of negative literals fsi : si 2 Sg.
      </p>
      <p>
        A partial assignment denotes an incomplete “model under construction”: a sequence of
literals which have been iteratively added (assigned) to the assignment (and sometimes
retracted from the assignment in backtracking steps) by the SAT or ASP solver until the
assignment is complete or the procedure aborts in case of unsatisfiability.
We use a unified approach to SAT and ASP solving based on nogoods [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]
(corresponding to clauses in CNF, but with all literals negated; a concept originally introduced for
constraint solving). Clauses and rules are translated into nogoods in a preprocessing
step (covering Clark’s completion in the ASP case). Additional nogoods are learned
from conflicts and loops (in non-tight ASP programs).
2.1
      </p>
      <sec id="sec-2-1">
        <title>Cost Functions and Parameter Atoms</title>
        <p>The cost functions considered in this work are user-provided functions of several
variables. Each variable corresponds to a so-called parameter atom. When evaluating the
cost function, we instantiate each variable with the normalized count (frequency) of the
respective parameter atom in a possibly incomplete sample. The sample is a multi-set
of models sampled with replacement from the complete set of answers sets of the
given program, respectively the set of all satisfying assignments (SAT case). Where
a parameter atom or its negation can occur we speak of parameter literals.
Parameter atoms can occur in rules and clauses of the given answer set program or Boolean
formula without limitations, and may even be conditioned on the truth values of other
atoms, including other parameter atoms. The set of parameter atoms is denoted as or
f ig, and i is the i-th parameter atom under some arbitrary but fixed order. We denote
the individual cost function variables corresponding to the parameter atoms as iv or av
(where a is a parameter atom). The parameter atoms need to be chosen by the user from
the overall set of atoms in the program or set of clauses (theoretically, all atoms could
be declared parameter atoms, but this might be inefficient).</p>
        <p>With each newly sampled model, the frequencies of the parameter atoms are updated
as follows. (sample) = h sample ( 1); sample ( 2); :::i is defined as the (parameter)
frequencies vector h j[mj : mj 2jsasammpplelej; 1 2 mj]j ; :::; j[mj : mj 2jsasammpplelej; n 2 mj]j i
of parameter atom frequencies in the model multi-set sample. sample ( i) denotes the
frequency of parameter atom i in sample. We omit sample and simply write ( i)
where it is clear from the context what the sample is. cost ( ( 1); ( 2); :::) evaluates
the cost function cost over parameter frequencies vector (sample). We sometimes
write cost (sample) in place of cost ( ( 1); ( 2); :::).</p>
        <p>
          It makes sense to allow only parameter atoms whose truth values are not fully fixed by
the input program or formula. To ensure this in the ASP case, we can give the logic
program the shape of a so-called spanning program [
          <xref ref-type="bibr" rid="ref13 ref9">13, 9</xref>
          ] where uncertain atoms a
are defined by spanning formulas: choice rules or analogous constructs amounting to
nondeterministic facts 0fag1 or (informally) a _ not a. However, our framework does
not require any particular form of the input program or formula.
        </p>
        <p>Informal examples for cost functions are “In 30% of all models, atom a should hold
and in 40% of all models, atom b should hold” or “Atom a should hold more often than
the square root of the frequency of atom b minus 10%”. In principle, other types of cost
functions which refer to other properties of the current sample multi-set are conceivable
too, provided the model sampler is able to minimize such a cost.
2.2</p>
      </sec>
      <sec id="sec-2-2">
        <title>Cost Functions and Parameter Atoms for Probabilistic Logic Programming</title>
        <p>
          For the application case of deductive probabilistic inference, cost functions specify
probabilistic constraints, whereas the plain ASP rules or SAT clauses serve as hard
constraints. More concretely, we consider the case that the probabilistic constraints are
provided by the user in form of weights associated with individual parameter atoms i.
Weights directly represent probabilities. Weights can also be attached to arbitrary rules,
by introducing weighted fresh auxiliary atoms as “shortcuts” for these rules, using the
following scheme: h : b1; :::; bn; not aux and aux : b1; :::; bn; not h [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ].
As one (but not the only) suitable cost function which can be derived directly from a set
of weighted parameter atoms, we propose the use of the Mean Squared Error (MSE)
cost( 1v; 2v; :::) := n1 Pin=1( ( i) i)2 (squared Euclidean distance normalized with
the number n of parameter atoms). The i are the user-defined weights of the parameter
atoms i.
        </p>
        <p>
          With this cost function, appending models to the sample until the cost reaches zero
(i.e., maximum accuracy) corresponds to finding a solution of a linear equation system
with additional conditions to ensure that the solutions form a probability distribution,
with the probabilities of all possible worlds as the unknowns and the actual model
frequencies in sample as approximate solution vector (details are provided in [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]).
Queries can then be performed as usual, by adding up the approximated probabilities
of those possible worlds where the query holds (that is, the frequencies of those models
in sample which positively contain the query atoms).
        </p>
        <p>As an example for how to assign probabilities to atoms using cost functions, consider
function cost (av; bv) = ((0:2 av)2 + (0:6 bv)2)=2) which specifies that the weight
of parameter atom a should be 0.2 and the weight of parameter atoms b should be 0.6.
As another example, this time for a cost function different from MSE (but also
solvable using Differentiable ASP), consider cost (aux v; qv) = (0:4 aux v=qv)2 which
specifies that the conditional probability P r(pjq) is 0.4. The accompanying answer set
program needs to include rules aux : p; q and p : aux and q : aux .
3</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Differentiable SAT/ASP</title>
      <p>To find a sample of models which minimizes the cost function, our approach iteratively
appends models to multi-set sample until the cost falls below a specified threshold
(allowing the user to trade speed against accuracy). All models are answer sets or
satisfying truth assignment of the given answer set program or Boolean formula, ensuring
that they adhere to the given “hard” logical constraints. Partial cost function derivatives
guide, during the generation of each individual model, the SAT/ASP solving process on
the level of branching decisions, namely truth value assignments to parameter atoms.</p>
      <p>Fig. 1(a) shows a high level view of this approach, named Differentiable SAT/ASP
or @SAT/ASP for short. An outer loop (left side of Fig. 1(a)) samples models and adds
them to an initially empty multi-set sample until the termination criterion is reached
(there are several specific possibility for checking termination, depending on the nature
of the cost function and the use case: if the cost expression is not too large, we can
check if it is equal or below a given threshold (accuracy), and/or we could perform
a stagnation check on the cost or the parameter atom frequencies. In some applications
it might also be sensible to demand a minimum sample size n in addition to
reaching threshold , e.g., to increase the sample entropy). In our experiments, we simply
stopped sampling when the cost reached or fell below .</p>
      <p>The models are sampled with the aim of reducing the multi-model cost, using a form
of discretized gradient descent. We approach this with a special branching rule for
selecting decision literals (literals not enforced by propagation) for inclusion in the
partial assignment: Each time the solver nondeterministically (i.e., not forced by rules or
clauses) extends the partial assignment with a not yet assigned literal, the literal is
selected from all unassigned parameter literals (if any) if the value (or its negation in case
of negative literals) of the cost function’s partial derivative with respect to this literal
is minimal (compared with the values obtained for the other unassigned parameter
literals). Since the parameter search space is discrete, we could theoretically measure the
cost impacts of hypothetically assignments of candidate literals directly. But taking the
partial derivatives with respect to the parameter atoms splits the overall cost calculation
(which might be complex) into typically simpler calculations whose results can even
be pre-computed and ranked after each new model according to their current values,
after updating the frequencies vector with the new model. Finally, for branching
decisions on non-parameter decision literals, some conventional branching heuristics (e.g.,
VSIDS) can be used.</p>
      <p>
        Fig. 1(b) outlines a variant where each new model is explicitly required to lower the
cost, without specifying how to achieve this. A specific approach to this variant,
proposed in [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], uses a so-called cost backtracking mechanism which in case a model
candidate fails to improve the cost, jumps back to the most recent literal decision point
with untried parameter literals and tries a new parameter literal. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] also indicates how
cost backtracking and a split of the set of parameter atoms into measured atoms and
actual parameter atoms can be utilized for inductive weight learning and abductive
reasoning (which are not possible using the approach in Fig. 1(a)).
      </p>
      <p>In the following subsections, we propose concrete approaches to put the general
approach in Fig. 1(a) into practice.</p>
      <sec id="sec-3-1">
        <title>Implementing Differentiable SAT/ASP based on CDNL</title>
        <p>
          A concrete approach to the rather general scheme described above is to enhance the
current state-of-the-art approach to Answer Set Programming (Conflict-Driven
Nogood Learning (CDNL) [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]) or a similar approach (CDCL (Conflict-Driven Clause
Learning)- or DPLL-style solving) with a new branching rule which selects free
parameter literals for inclusion into the current partial assignment according to their
negative “impact” on the cost, determined using partial derivatives wrt. parameter atoms.
This approach, which we call Diff-CDNL-ASP/SAT (as it covers both SAT and
Answer Set solving), is shown as Algo. 1. The SAT solving path through the algorithm,
enabled by parameter SATmode , is largely identical to the more complex ASP path,
with stable model checks and loop handling omitted. The algorithm is a variant of the
approach presented in [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ], with a somewhat more general branching approach using
partial derivatives but omitting the optional cost backtracking.
        </p>
        <p>
          Algo. 1 iteratively adds literals to a partial assignment until all literals are covered and
no nogood is violated. Important steps are unit propagation, i.e., the deterministic
expansion of the partial assignment (procedure PROPAGATE) with literals “fired” by
nogoods, and conflict analysis (conflict means at least one nogood is subset of the partial
assignment) and handling, including the deriving of further nogoods from the analysis
of conflicts and backjumping (undoing recent literal assignments) in line 21 (details
follow [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] and are omitted here for lack of space).
        </p>
        <p>The procedure for generating a single model (the while-loop from line 6) is thus guided
by the following factors: 1) the initial set of nogoods obtained from the CNF clauses or
Clark’s completion of the answer set program, 2) further nogoods added to this set by
conflict handling or due to the presence of loops (ASP mode), 3) the given cost function
and set of parameter atoms, and 4) our new branching approach for assigning parameter
literals (lines 9 to 13). The inner while loop ends once all atoms are covered as
positive or negative literals (or UNSAT). Afterwards (line 25), the stable model check takes
place (unless in SAT mode), and the new model is appended to the multi-set sample .
The outer loop (from line 3) ends when a convergence criterion is met (e.g., when the
cost falls below the given accuracy threshold (line 32) and we have obtained the
requested number of models, or if there is no more progress).</p>
        <p>The decision branching rule is the main different to regular CDNL: In lines 11ff., we
select the next parameter literal according to the previously described approach using
partial derivatives. At this, it was in all our initial tests sufficient for reaching
convergence to fix a new ranking of parameter literals by the values of the respective partial
derivatives only after a new model was generated (ignoring the current partial
assignment in the computation of the parameter atom frequencies) and to use this ranking to
determine the next decision literal in line 11.</p>
        <p>
          For further details on the non-probabilistic aspects of the algorithm, we need to refer to
[
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] for lack of space. Note that for loop handling, Algo. 1 uses the older and simpler
ASSAT approach [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], just to simplify our initial implementation.
3.2
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>Differentiable ASP using propagators (Diff-ASP-ThProp)</title>
        <p>While the approach from Sect. 3.1 is quite fast (see Sect. 4), it has the shortcoming that
it cannot be realized using an unmodified existing ASP or SAT solver. As an alternative
5:
approach to @SAT/ASP, we can make use of Clingo’s1 propagators. We show how to
do this using preliminary code2 (file propdiff_1.py.lp), instantiated with an
MSEshaped example cost function and two parameter atoms a (with given probability 0.6)
and b (probability 0.2). It requires only Clingo (tested with version 5.2) with Clingo’s
Python scripting interface and Python 2.7.</p>
        <p>While custom propagators cannot directly implement a branching heuristics, they can
be used to add (parameter) literals in form of singleton clauses to the ongoing solving
1 https://github.com/potassco/clingo
2 https://github.com/MatthiasNickles/Diff-ASP-Propagators
process, intercepting propagation. We compute the parameter literal we would like to
assign next (again using the approach in Sect. 3) and then pass this literal
(branch_param_lit) on to the propagator (lines 56ff. in the code) to add it to the
partial assignment.</p>
        <p>The rest of the Python code is straightforward: The loop from line 155 corresponds to
the outer loop in Algo. 1: it iteratively calls the solver to compute new models, adds
each newly sampled model to the model list sample (in callback on_model), updates
frequencies and evaluates the cost (method update_cost), and checks for convergence
against threshold psi.</p>
        <p>The code supports both numerical and automatic differentiation (the latter using the ad
package3). In both cases, the search for the parameter literal which gives the minimum
partial derivative (i.e., steepest descent) is performed in lines 124ff. Automatic
differentiation wrt. parameter atoms takes place in method __cost_ad. For numerical
differentiation, we use a simple approximation which just adds a small value h to the frequency
of each respective parameter atom to estimate the slope (method__cost_upd).
The actual cost function (including the given weights of the two parameter atoms) is in
line 84 (we use the expression format of the ad-package for Python to represent the cost
expression, also with numerical differentiation).
(Remark: We have also experimented with domain heuristics using Clingo’s designated
predicate _heuristic/3, but found no way yet to make this reliably working for our
use case. Also, this attempt appears to be much slower than all other approaches
presented in this paper.)
An example for a simple associated background theory (file propdiff_bgk_1.lp) is
0{a}1. % spanning rule for parameter atom a
0{b}1.
:- a, b. % an example for a hard rule
The overall program is called with clingo-python propdiff_1.py.lp propdiff_bgk_1.lp
3.3</p>
      </sec>
      <sec id="sec-3-3">
        <title>Direct cost minimization using model reification (Diff-ASP-Reification)</title>
        <p>As the final approach proposed in this paper, we use Clingo with reified predicates and
models to solve the cost function directly (without derivatives involved), alternatively
by mapping it to a conventional single answer set optimization task or by mapping the
problem to ASP-encoded equation solving.</p>
        <p>Here, each predicate in the original answer set program (not only the parameter atoms)
whose truth value is not fully fixed is enhanced with a model number as extra argument.
This way, we can let a single actual model (returned by Clingo) represent multiple
reified models where each reified model consists exactly of those atoms in the actual
model which share the same model number as their extra argument.</p>
        <p>We distinguish two flavors of this approach: 1) Computation of one or more individually
optimal answer set(s) using optimization statements or weak constraints (...: ...), as
supported by several ASP solvers, including smodels, DLV and Clingo/clasp, and 2)
using ASP directly for constrained linear equations solving.</p>
        <p>We consider variant 1) first, shown in the code below for the same example as before
(two uncertain atoms a and b with weights 0.2 and 0.6), and MSE as cost function
format (adaptations to some other types of cost function should be straightforward). The
number of reified models nmodels does not need to be known precisely but should
be at least 10n where n is the number of decimal places which should be accurate
3 https://pypi.org/project/ad/
when using the overall result to query Pr (a) and Pr (b). We found this approach very
slow in our preliminary experiments with default settings (more efficient encodings or
optimization strategies might exist), but in any case it is useful to exemplify how our
multi-model optimization task can be mapped to conventional answer set optimization
using reification.
#const nmodels = 10.
model(1..nmodels).
mcount(0..nmodels).
{a(M)} :- model(M). % spanning formulas
{b(M)} :- model(M).
:- a(M), b(M), model(M). % an example for a background knowledge rule (hard constraint)
wa(nmodels * 2 / 10). % weight a = 0.2
wb(nmodels * 6 / 10). % weight b = 0.6
fa(F) :- F { a(M): model(M) } F, mcount(F).
fb(F) :- F { b(M): model(M) } F, mcount(F).
diffa(D) :- D = (W - F)**2, wa(W), fa(F). % alternatively: D = |F - W|
diffb(D) :- D = (W - F)**2, wb(W), fb(F).
#minimize { DA : diffa(DA) }. % minimize the distances betw. weights and frequencies
#minimize { DB : diffb(DB) }.
#show a/1.
#show b/1.</p>
        <p>In variant 2) of our reification-based approach, we map the problem to a set of
linear equations (or inequalities, if error tolerance bounds are considered) as outlined
in Sect. 2.2, and encode it as a plain answer set program. It is immediately clear that
here the number of reified models introduces a bottleneck: Every predicate whose truth
value is not fully fixed across all reified models needs to be reified, which multiplies
the number of its instances with the overall number of reified models (nmodels in the
code below). Nevertheless, as detailed in the next section, this simple approach fares
surprisingly well for relatively small problem sizes, and significantly better than the
approach using propagation (Sect. 3.2). The following plain ASP program shows how
to implement this for the example problem above.
#const tol = 3. % NB: tol has a different semantics than \psi
#const multiplier = 100. % to map float numbers to integers; limits precision
#const nmodels = 400.
model(1..nmodels).
wa(nmodels * 2 * multiplier / (10 * multiplier)). % 2 represents given weight 0.2
W-tol &lt; { a(M): model(M) } &lt; W+tol :- wa(W).
wb(nmodels * 6 * multiplier / (10 * multiplier)). % 6 represents given weight 0.6
W-tol &lt; { b(M): model(M) } &lt; W+tol :- wb(W).
1{__aux_1(M);a(M)}1 :- model(M). % spanning formulas
1{__aux_2(M);b(M)}1 :- model(M).
:- a(M), b(M). % example for a hard background knowledge rule
#show a/1.
#show b/1.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4 Preliminary Experiments</title>
      <p>To provide initial insight into the performance characteristics of the presented approaches,
we have performed two preliminary experiments. Approaches considered were
DiffCDNL-ASP/SAT (Sect. 3.1), Diff-ASP-ThProp (Sect. 3.2) and Diff-ASP-Reification
(Sect. 3.3), the latter in the equation solving variant (the version using #minimize
proved too slow with these experiments to be considered).</p>
      <p>We performed two synthetic experiments (Figs. 2a and 2b): a coin tossing game with
background rules and a variant of the well-known “Friends &amp; Smokers” scenario (which
exists in several variants in the literature). Times have been averaged over three trials
per experiment on a i7-4810MQ machine (4 cores, 2.8GHz) with 16GB RAM. The
Diff-CDNL variant of the @SAT/ASP approach (Sect. 3.1) has been experimentally
implemented in Scala 2.12 (i.e., running in a Java VM). Experiments have been
performed with different accuracy thresholds . All times are end-to-end times, so they
include some overhead for file operations, parsing, etc. For the Clingo-based tasks we
have used Clingo 5.2.2 with Python 2.7.10 for scripting. The tolerance 20 specified with
the reification-based tasks with 400 models corresponds roughly to accuracy :001
for coins and 0:05 for smokers.</p>
      <p>In the coin game, a number of coins are tossed and the game is won if a certain subset
of all coins comes up with “heads”. The inference task is the approximation of the
winning probability, calculated by counting the models with the winning coin combinations
within the resulting sample and normalizing this count with the size of the sample. In
addition, another random subset of coins are magically dependent from each other and
one of the coins is biased (probability of “heads” is 0.6). This scenario contains
probabilistically independent as well as mutually dependent uncertain facts. Also, inference
difficulty clearly scales with the number of coins. In pseudo-syntax, such a randomly
generated program looks, e.g., as follows:
coin(1..8).
0.6: coin_out(1,heads).
0.5: coin_out(N,heads) :- coin(N), N != 1.
1{coin_out(N,heads), coin_out(N,tails)}1 :- coin(N).
win :- 2{coin_out(3,heads),coin_out(4,heads)}2.
coin_out(4,heads) :- coin_out(6,heads).</p>
      <p>The proposed methods cannot directly work with non-ground rules, so weighted
non-ground rules anywhere have been translated into sets of ground rules, each
(respectively, their corresponding auxiliary “shortcut” parameter atoms) annotated with
the respective weights.</p>
      <p>In “Friends &amp; Smokers”, a randomly chosen number of persons are friends, a
randomly chosen subset of all people smoke, there is a certain probability for being stressed
(0.3: stress(X)), it is assumed that stress leads to smoking (smokes(X) :- stress(X)),
and that friends influence each other with a certain probability (0.2: influences(X,Y)),
in particular with regard to smoking: smokes(X) :- friend(X,Y), influences(Y,X), smokes(Y).
Smoking may leads to asthma (0.4: h(X). asthma(X) :- smokes(X), h(X)). The query
atoms are asthma(X) per each person X.</p>
      <p>For both experiments, it should be kept in mind that @SAT/ASP does not include any
special treatment of probabilistic independence.</p>
      <p>While the solution using Clingo with reified models cannot compete with the native
approach Diff-CDNL-ASP/SAT, it is still surprisingly fast, which, together with the
fact that there is virtually no difference between the 400 and 800 model variants (the
curves almost cover each other), exemplifies the strong solving performance of Clingo.
However, these results also seem to indicate (since both Clingo/clasp and
Diff-CDNLASP/SAT are based on CDNL) that the prototypical Diff-CDNL-ASP/SAT
implementation could be made significantly faster by further optimizing its code or by using, e.g.,
C/C++ or Rust. The alternative approach using propagators is clearly usable only for
tiny problems, at least with the current implementation (the graph for Diff-ASP-ThProp
with = 0:001 is not visible in Fig. 2a because its task immediately timed out).
(a) Coin Game</p>
      <p>
        (b) Friends &amp; Smokers
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] proposes distribution-aware sampling for SAT with weight functions over entire
models (which might be seen as an instance of our MSE-style variant of cost
optimization, albeit technically approached very differently). Also related is the weighted
satisfiability problem which aims at maximizing the sum of the given weights of satisfied
clauses (e.g., using MaxWalkSAT [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]), which can be used for inference in Bayesian
networks. PSAT (Probabilistic Boolean Satisfiability) [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] and SSAT problem [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]
tackle related but different problems compared to ours. The envisaged main
application case for our multi-model sampling approach is probabilistic logic (programming)
and probabilistic deductive databases, in particular Probabilistic ASP (of which
existing approaches include, e.g., [
        <xref ref-type="bibr" rid="ref1 ref11 ref13 ref9">1, 11, 9, 13</xref>
        ]), but we expect other uses cases too, such
as working with combinatorial and search problems with uncertain facts and rules. In
the context of nonmonotonic Probabilistic Inductive Logic Programming, gradient
descent has been used for weight estimation. E.g., [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] uses gradient descent to estimate
the probabilities of abducibles. In contrast to the common MC-SAT sampling approach
to inference with Markov Logic Networks (MLNs) [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] (a form of slice sampling),
our task has a different semantics and our sampler is not wrapped around a uniform
sampler, and we allow to specify rule or clause probabilities directly. An interesting
approach with combines MLN with logic programming under the stable model semantics
and compiles programs directly into ASP (using weak constraints) is [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Other than our
approach, existing approaches to machine learning-based SAT solving (such as
Learning Rate Branching Heuristic (LRB)), or the combination of gradient descent / neural
network-based techniques or numerical optimization with SAT (such as [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]) aim at an
improvement of SAT solving itself (which is not our concern in this work), or enable
single model optimization.
      </p>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>
        We have presented differentiation-based approaches to SAT and ASP multi-model
computation which sample models in order to minimizes a custom cost function. Using
customized cost functions and parameter atoms, our overall approach can be instantiated in
order to provide a tool for probabilistic logic programming. Building on existing work
[
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], we have presented an approach using a steepest descent method, an algorithm
which utilizes Clingo’s propagators, and finally an approach (not differentiation-based)
which maps the problem to plain answer set optimization. Planned work comprises
further experiments and the determination of formal convergence criteria.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Baral</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gelfond</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rushton</surname>
          </string-name>
          , N.:
          <article-title>Probabilistic reasoning with answer sets</article-title>
          .
          <source>Theory Pract. Log. Program</source>
          .
          <volume>9</volume>
          (
          <issue>1</issue>
          ),
          <fpage>57</fpage>
          -
          <lpage>144</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Chakraborty</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fremont</surname>
            ,
            <given-names>D.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Meel</surname>
            ,
            <given-names>K.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Seshia</surname>
            ,
            <given-names>S.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vardi</surname>
          </string-name>
          , M.Y.:
          <article-title>Distribution-aware sampling and weighted model counting for SAT</article-title>
          .
          <source>In: Procs. 28th AAAI Conference on Artificial Intelligence (AAAI)</source>
          . pp.
          <fpage>1722</fpage>
          -
          <lpage>1730</lpage>
          (
          <year>July 2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Corapi</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sykes</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Inoue</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Russo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Probabilistic rule learning in nonmonotonic domains</article-title>
          .
          <source>In: Computational Logic in Multi-Agent Systems</source>
          . pp.
          <fpage>243</fpage>
          -
          <lpage>258</lpage>
          . Springer (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Gebser</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kaufmann</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neumann</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schaub</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Conflict-driven answer set solving</article-title>
          .
          <source>In: Procs. of the 20th Int'l Joint Conf. on Artificial Intelligence (IJCAI'07)</source>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Gelfond</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lifschitz</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>The stable model semantics for logic programming</article-title>
          .
          <source>In: Proc. of the 5th Int'l Conference on Logic Programming</source>
          . vol.
          <volume>161</volume>
          (
          <year>1988</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Gomes</surname>
            ,
            <given-names>C.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sabharwal</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Selman</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Near-uniform sampling of combinatorial spaces using xor constraints</article-title>
          .
          <source>In: NIPS</source>
          . pp.
          <fpage>481</fpage>
          -
          <lpage>488</lpage>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Gressler</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Oetsch</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tompits</surname>
          </string-name>
          , H.:
          <article-title>Harvey: A system for random testing in asp</article-title>
          .
          <source>In: Logic Programming and Nonmonotonic Reasoning</source>
          . Springer (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Kautz</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Selman</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jiang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>A general stochastic approach to solving problems with hard and soft constraints</article-title>
          .
          <source>In: The Satisfiability Problem: Theory and Applications</source>
          . pp.
          <fpage>573</fpage>
          -
          <lpage>586</lpage>
          . American Mathematical Society (
          <year>1996</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Talsania</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Computing lp using asp and mln solvers</article-title>
          .
          <source>Theory and Practice of Logic Programming</source>
          <volume>17</volume>
          (
          <issue>5-6</issue>
          ),
          <fpage>942</fpage>
          -
          <lpage>960</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Assat: computing answer sets of a logic program by sat solvers</article-title>
          .
          <source>Artificial Intelligence</source>
          <volume>157</volume>
          (
          <issue>1</issue>
          ),
          <fpage>115</fpage>
          -
          <lpage>137</lpage>
          (
          <year>2004</year>
          ), nonmonotonic Reasoning
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Ng</surname>
          </string-name>
          , R.T.,
          <string-name>
            <surname>Subrahmanian</surname>
            ,
            <given-names>V.S.:</given-names>
          </string-name>
          <article-title>Stable semantics for probabilistic deductive databases</article-title>
          .
          <source>Inf. Comput</source>
          .
          <volume>110</volume>
          (
          <issue>1</issue>
          ),
          <fpage>42</fpage>
          -
          <lpage>83</lpage>
          (
          <year>1994</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Nickles</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Sampling-based sat/asp multi-model optimization as a framework for probabilistic inference</article-title>
          .
          <source>In: Procs. 28th International Conference on Inductive Logic Programming (ILP'18)</source>
          . Springer (
          <year>2018</year>
          (to appear))
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Nickles</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mileo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>A hybrid approach to inference in probabilistic non-monotonic logic programming</article-title>
          .
          <source>In: 2nd Int'l Workshop on Probabilistic Logic Programming (PLP'15)</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Papadimitriou</surname>
            ,
            <given-names>C.H.</given-names>
          </string-name>
          :
          <article-title>Games against nature</article-title>
          .
          <source>J. Comput. Syst. Sci</source>
          .
          <volume>31</volume>
          (
          <issue>2</issue>
          ),
          <fpage>288</fpage>
          -
          <lpage>301</lpage>
          (
          <year>1985</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Poon</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Domingos</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Sound and efficient inference with probabilistic and deterministic dependencies</article-title>
          .
          <source>In: Procs. of AAAI'06</source>
          . pp.
          <fpage>458</fpage>
          -
          <lpage>463</lpage>
          . AAAI Press (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Pretolani</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Probability logic and optimization sat: The psat and cpa models</article-title>
          .
          <source>Annals of Mathematics and Artificial Intelligence</source>
          <volume>43</volume>
          (
          <issue>1</issue>
          ),
          <fpage>211</fpage>
          -
          <lpage>221</lpage>
          (
          <year>Jan 2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Selsam</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lamm</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , Bu¨nz,
          <string-name>
            <given-names>B.</given-names>
            ,
            <surname>Liang</surname>
          </string-name>
          , P.,
          <string-name>
            <surname>de Moura</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dill</surname>
            ,
            <given-names>D.L.</given-names>
          </string-name>
          :
          <article-title>Learning a SAT solver from single-bit supervision</article-title>
          . CoRR abs/
          <year>1802</year>
          .03685 (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>