Probabilistic Constraint Logic Theories

Marco Alberti¹, Elena Bellodi², Giuseppe Cota², Evelina Lamma², Fabrizio Riguzzi¹, and Riccardo Zese²

¹ Dipartimento di Matematica e Informatica – University of Ferrara, Via Saragat 1, I-44122, Ferrara, Italy
² Dipartimento di Ingegneria – University of Ferrara, Via Saragat 1, I-44122, Ferrara, Italy
name.surname@unife.it

Abstract. Probabilistic logic models are used ever more often to deal with the uncertain relations typical of the real world. However, these models usually require expensive inference procedures. Very recently the problem of identifying tractable languages has come to the fore. In this paper we consider the models used by the learning from interpretations ILP setting, namely sets of integrity constraints, and propose a probabilistic version of them. A semantics in the style of the distribution semantics is adopted, where each integrity constraint is annotated with a probability. These probabilistic constraint logic models assign a probability of being positive to interpretations. This probability can be computed in a time that is logarithmic in the number of ground instantiations of violated constraints. This formalism can be used as the target language in learning systems and for declaratively specifying the behavior of a system. In the latter case, inference corresponds to computing the probability of compliance of a system's behavior with the model.

1 Introduction

Probabilistic logic models are gaining popularity due to their successful application in a variety of fields, such as natural language processing, information extraction, bioinformatics, the semantic web, robotics and computer vision. However, these models usually require expensive inference procedures. Very recently the problem of identifying tractable languages has come to the fore. Proposals such as Tractable Markov Logic [9], Tractable Probabilistic Knowledge Bases [25,17] and fragments of probabilistic logics [24,16] strive to achieve tractability by limiting the form of sentences.

In the ILP field, the learning from interpretations setting [8,2,7] offers advantages in terms of tractability with respect to the learning from entailment setting. In learning from interpretations, the logic theories are sets of integrity constraints and the examples are interpretations. The coverage problem there consists in verifying whether the constraints are satisfied in the interpretations. This problem is simpler than checking whether an atom follows from a logic program, because the constraints can be considered in isolation: the interpretation satisfies the theory iff it satisfies each constraint individually. A first attempt at this problem was presented in [11,10], where the authors described the algorithm LFI-ProbLog for learning ProbLog programs from partial interpretations.

Our aim is to consider a probabilistic version of sets of integrity constraints with a semantics in the style of the distribution semantics [23]. Each integrity constraint is annotated with a probability, and a model assigns a probability of being positive to interpretations. This probability can be computed in a time that is logarithmic in the number of groundings of the constraints that are violated.

The formalism we propose, Probabilistic Constraint Logic Theories (PCLT), has a variety of applications. It can be used as the target language of a learning system, thus lifting the learning from interpretations ILP setting to the probabilistic case.
It is also useful for system verification, i.e., the problem of checking whether a system's behaviour is compliant with a specification [14]. The system's specification can be given in a number of ways. Specifications based on logic are appropriate for many applications since they provide a non-ambiguous semantics. Additionally, Computational Logic frameworks come with operational semantics with formal correctness properties, which can be used for verification; this feature has motivated the mapping of heterogeneous specification formalisms onto ones based on computational logic [15]. For instance, the SCIFF framework [1], applied to the verification of multi-agent systems, medical guidelines and electronic commerce, is composed of a specification language with an abductive logic programming semantics and of a sound and complete proof procedure, so that behaviours found correct by the proof procedure are indeed correct according to the declarative semantics, and vice versa.

The essence of many logic-based specification languages (including SCIFF) is to provide a form of integrity constraints: logic formulas that represent protocols, rules or guidelines, and are required to be satisfied by the system's behaviour. A binary notion of compliance (where the behaviour is compliant if it satisfies all the integrity constraints, and simply non-compliant otherwise) may not be suited to applications where some amount of non-compliance is inevitable, and quantifying non-compliance is more important than just detecting it. One form of flexibility is to allow the designer to specify the integrity constraints as optional, and to express how important each constraint is. PCLT allows precisely this.

The paper is organized as follows. In Sect. 2 we recall the notion of Constraint Logic Theory, and in Sect. 3 we define PCLT. In Sect. 4 we show how to compute the probability of compliance given the compliance with each integrity constraint. Sect. 5 discusses PCLTs in more detail, while Sect. 6 introduces an extended version of CLTs. Sect. 7 discusses related work. Finally, Sect. 8 concludes the paper with some remarks on future work.

2 Constraint Logic Theories

A Constraint Logic Theory (CLT) [7] T is a set of integrity constraints (ICs) C of the form

L1, ..., Lb → A1; ...; Ah    (1)

where the Li are logical literals. Their conjunction L1, ..., Lb is called the body of the IC and is indicated with Body(C). The Aj are logical atoms; the semicolon stands for disjunction, thus A1; ...; Ah is a disjunction of atoms called the head of the IC and indicated with Head(C).

Together with a CLT T, we may have a background knowledge B on the domain, which is a normal logic program that can be used to represent domain-specific knowledge.

CLTs can be used to classify Herbrand interpretations, i.e., sets of ground facts, that represent for example the behaviour of the system undergoing verification. Given a Herbrand interpretation I (in the following simply called an interpretation), the given background knowledge B is used to complete the information in I. Basically, instead of simply considering I, we consider a model M(B ∪ I) which follows the Prolog semantics (i.e., Clark's completion [4]), where I is interpreted as a set of ground facts. In this way, all the facts of I are true in M(B ∪ I); moreover, M(B ∪ I) can contain new facts derived from I using B.
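To make the role of B concrete, the following is a minimal Python sketch (ours, not part of the paper) of how M(B ∪ I) can be obtained for a function-free, definite background knowledge by naive bottom-up evaluation to a fixpoint; in practice a Prolog system would be used, as described below. The two rules encode the in/2 background knowledge of Example 1; the tuple encoding of atoms is our own convention.

```python
# Naive forward chaining: saturate the facts of I under the rules of B
# until no new fact is derivable. Facts are tuples (predicate, args...).

def m_of(rules, interpretation):
    """Least model of B ∪ I for definite rules (a sketch)."""
    model = set(interpretation)
    while True:
        new = {fact for rule in rules for fact in rule(model)} - model
        if not new:
            return model
        model |= new

# in(A, B) <- inside(A, B).
def rule1(model):
    return {("in",) + f[1:] for f in model if f[0] == "inside"}

# in(A, D) <- inside(A, C), in(C, D).
def rule2(model):
    insides = {f[1:] for f in model if f[0] == "inside"}
    ins = {f[1:] for f in model if f[0] == "in"}
    return {("in", a, d) for (a, c) in insides for (c2, d) in ins if c2 == c}

I_l = {("triangle", 0), ("large", 0), ("square", 1), ("small", 1),
       ("inside", 1, 0), ("triangle", 2), ("inside", 2, 1)}
print(m_of([rule1, rule2], I_l) - I_l)  # {('in',1,0), ('in',2,1), ('in',2,0)}
```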
Given an interpretation I, a background knowledge B and a CLT T, we can ask whether T is true in I given B. Formally, an IC C is true in an interpretation I given a background knowledge B, written M(B ∪ I) |= C, if for every substitution θ for which Body(C) is true in M(B ∪ I), there exists a disjunct in Head(C) that is true in M(B ∪ I). If M(B ∪ I) |= C we say that I satisfies the constraint C given B; if M(B ∪ I) ⊭ C we say that I does not satisfy C. A CLT T is true in an interpretation I given B if every IC of T is true in I given B, and we write M(B ∪ I) |= T. We also say that I satisfies T given B, or that I is positive given T and B.

If all the variables that appear in the head also appear in the body, the IC is range-restricted. As shown in [5], the truth of a range-restricted IC in an interpretation I with a range-restricted background knowledge B can be tested by asking the goal

?- Body(C), ¬Head(C).

against a Prolog database containing the atoms of I as facts together with the rules of the normal program B. By ¬Head(C) we mean ¬A1, ..., ¬Ah, so the query is

?- L1, ..., Lb, ¬A1, ..., ¬Ah.    (2)

If the query fails, C is true in I given B; otherwise C is false in I given B. If B is range-restricted, every answer to a query Q against B ∪ I completely instantiates Q, i.e., it produces an element of M(B ∪ I). So the queries ¬Aj are ground when they are called and no floundering occurs.

Example 1 (Bongard Problems). Introduced by the Russian scientist M. Bongard in his book [3], the Bongard Problems consist of a number of pictures, some positive and some negative, and are usually used in learning problems aimed at learning a description that classifies as many pictures as possible correctly, i.e., at discriminating between the two classes. The pictures contain different shapes with different properties, such as small, large, pointing down, ..., and different relationships between them, such as inside, above, .... Figure 1 shows some of these pictures.

Fig. 1. Bongard pictures.

Each picture can be described by an interpretation. Consider the left picture. It consists of a large triangle that includes a small square that, in turn, includes a small triangle. This picture can be described using the interpretation

Il = {triangle(0), large(0), square(1), small(1), inside(1, 0), triangle(2), inside(2, 1)}

Moreover, suppose you are given the background knowledge B:

in(A, B) ← inside(A, B).
in(A, D) ← inside(A, C), in(C, D).

Thus M(B ∪ Il) will contain the atoms in(1, 0), in(2, 1) and in(2, 0). Given the IC

C1 = triangle(T), square(S), in(T, S) → false

stating that a picture satisfying the IC cannot contain a triangle inside a square, C1 is false in Il given B because triangle 2 is inside square 1. In the central picture instead C1 is true given B, because the only triangle is outside any square, while in the rightmost picture C1 is false again because of the presence of triangles 3, 4 and 5 inside square 0.
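The test behind query (2) can be replayed procedurally. The following Python sketch (our illustration, with our own tuple encoding of atoms; the model is M(B ∪ Il) from Example 1, written out) enumerates the substitutions that make the body of C1 true; since the head of C1 is false, C1 is true in the interpretation iff no such substitution exists.

```python
# Ground the body of C1 = triangle(T), square(S), in(T, S) -> false
# against M(B ∪ I_l) and collect the substitutions under which it holds.

model_l = {("triangle", 0), ("large", 0), ("square", 1), ("small", 1),
           ("inside", 1, 0), ("triangle", 2), ("inside", 2, 1),
           ("in", 1, 0), ("in", 2, 1), ("in", 2, 0)}

def violating_substitutions(model):
    """Substitutions {T/t, S/s} under which the body of C1 is true."""
    triangles = {f[1] for f in model if f[0] == "triangle"}
    squares = {f[1] for f in model if f[0] == "square"}
    return [(t, s) for t in triangles for s in squares
            if ("in", t, s) in model]

print(violating_substitutions(model_l))  # [(2, 1)]: C1 is false in I_l
```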
3 Probabilistic Constraint Logic Theories

A Probabilistic Constraint Logic Theory (PCLT) is a set of probabilistic integrity constraints (PICs) of the form

pi :: L1, ..., Lb → A1; ...; Ah    (3)

Each constraint Ci is associated with a real value pi ∈ [0, 1] that defines its probability. A PCLT T can thus also be written as a set {(C1, p1), ..., (Cn, pn)}.

A PCLT T defines a probability distribution on ground constraint logic theories, called worlds, in this way: for each grounding of each IC, we include the grounding in a world with probability pi, and we assume all groundings to be independent. The notion of world as a theory is similar to the notion of world in ProbLog [6], where a world is a normal logic program. Let us assume that constraint Ci has ni groundings, called Ci1, ..., Cini. Let us call the ICs Cij instantiations of Ci. The probability of a world w is then given by the product

P(w) = ∏_{i=1}^{m} ∏_{Cij ∈ w} pi ∏_{Cij ∉ w} (1 − pi)

where m is the number of PICs. P(w) so defined is a probability distribution over the set of worlds W.

The probability P(⊕|w, I) of the positive class given an interpretation I, a background knowledge B and a world w is defined as the probability that I satisfies w given B (B is omitted from the formulas for the sake of brevity). Its value is P(⊕|w, I) = 1 if M(B ∪ I) |= w and 0 otherwise. The probability P(⊕|I) of the positive class given an interpretation I and a background knowledge B is the probability of I satisfying a PCLT T given B. From now on we always assume B as given and we do not mention it again. P(⊕|I) is given by

P(⊕|I) = ∑_{w ∈ W} P(⊕, w|I) = ∑_{w ∈ W} P(⊕|w, I) P(w|I) = ∑_{w ∈ W, M(B∪I) |= w} P(w)    (4)

The probability P(⊖|I) of the negative class given an interpretation I is the probability of I not satisfying T and is given by 1 − P(⊕|I).

Example 2 (Example 1 continued). Consider the PCLT

{C1 = 0.5 :: triangle(T), square(S), in(T, S) → false}

In the left picture of Figure 1, considering the interpretation

Il = {triangle(0), large(0), square(1), small(1), inside(1, 0), triangle(2), inside(2, 1)}

there are two different instantiations of the IC C1:

C11 = (C1, {T/0, S/1})
C12 = (C1, {T/2, S/1})

Under Il there are thus four possible worlds,

{∅, {C11}, {C12}, {C11, C12}}

and for the first two of them M(B ∪ Il) |= wi, thus P(⊕|Il) = P(w1) + P(w2) = 0.25 + 0.25 = 0.5. In the central picture there are four different instantiations of C1, thus we can build 16 worlds. The interpretation Ic satisfies all of them, since the constraint is never violated irrespective of the instantiation, thus the probability is P(⊕|Ic) = 1. Finally, the third picture has 8 different instantiations of IC C1 and so 256 different worlds. Ir satisfies only 32 of them, and the probability is P(⊕|Ir) = 0.125.
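Formula (4) can be rendered directly as a brute-force enumeration of worlds. The sketch below (ours; names are illustrative) represents each instantiation by its probability and a flag saying whether it is violated in the interpretation, and reproduces the values of Example 2 for the left and right pictures. It is exponential in the number of instantiations, so it serves only to make the semantics concrete.

```python
from itertools import product

# Enumerate all worlds (subsets of instantiations) and sum P(w) over
# the worlds satisfied by the interpretation, as in Formula (4).
# instantiations: list of (p_i, violated), where violated says whether
# the ground IC is falsified by the interpretation.

def prob_positive_naive(instantiations):
    total = 0.0
    for world in product([True, False], repeat=len(instantiations)):
        p_w = 1.0
        satisfied = True
        for included, (p_i, violated) in zip(world, instantiations):
            p_w *= p_i if included else 1.0 - p_i
            if included and violated:
                satisfied = False  # an enforced, violated IC falsifies w
        if satisfied:
            total += p_w
    return total

left = [(0.5, False), (0.5, True)]           # C11 satisfied, C12 violated
print(prob_positive_naive(left))             # 0.5
right = [(0.5, False)] * 5 + [(0.5, True)] * 3
print(prob_positive_naive(right))            # 0.125
```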
4 Inference with Probabilistic Constraint Logic Theories

Computing P(⊕|I) with Formula (4) is impractical, as there is an exponential number of worlds. In fact, as seen in Example 2, the number of worlds is exponential in the number of instantiations because, in order to build each world, we must decide whether to include each instantiation of every IC in the world.

In practice, we can associate a Boolean random variable Xij with each instantiated constraint Cij: if Cij is included in the world, Xij takes on value 1. Moreover, P(Xij = 1) = pi and P(Xij = 0) = 1 − pi. Let X be the set of the Xij variables. These variables are all mutually independent.

A valuation ν is an assignment of a truth value to all variables in X. There is clearly a one-to-one correspondence between worlds and valuations. A valuation can be represented as a set containing, for each Xij, either Xij (if Cij is included in the corresponding world) or ¬Xij (if Cij is not included in the corresponding world), and corresponds to the Boolean formula

φν = ∧_{i=1}^{m} ( ∧_{Xij ∈ ν} Xij ∧ ∧_{¬Xij ∈ ν} ¬Xij )

Since all the Xij variables are independent, the probability of φν being true is

P(φν) = ∏_{i=1}^{m} ∏_{Xij ∈ ν} pi ∏_{¬Xij ∈ ν} (1 − pi)

As seen above, we can assign to each world w a valuation νw of X in this way: Xij ∈ νw iff Cij ∈ w, and ¬Xij ∈ νw iff Cij ∉ w.

Suppose a ground IC Cij is violated in I. The worlds where Xij holds in the respective valuation are thus excluded from the summation in Formula (4). We must keep only the worlds where ¬Xij holds in the respective valuation, for all ground constraints Cij violated in I. So I satisfies exactly the worlds where the formula

φ = ∧_{i=1}^{m} ∧_{M(B∪I) ⊭ Cij} ¬Xij

is true in the respective valuations. Since the random variables are all mutually independent,

P(⊕|I) = P(φ) = ∏_{i=1}^{m} (1 − pi)^{ni}    (5)

where ni is here the number of instantiations of Ci that are not satisfied in I (overloading the symbol ni used above for the total number of groundings). Since computing (1 − pi)^{ni} requires O(log ni) multiplications by iterated squaring, P(⊕|I) can be computed in a time that is logarithmic in the number of groundings of constraints that are violated.

Each constraint may have a different number of violated groundings, which may result in a larger weight being associated with constraints that have many groundings. However, since the parameters are learned from data in order to maximize the likelihood, they are adjusted to take the number of groundings into account.

Example 3 (Example 2 continued). Consider the PCLT of Example 2. In the left picture of Figure 1 the body of C1 is true for the single substitution {T/2, S/1}, thus n1 = 1 and P(⊕|Il) = 0.5. In the right picture of Figure 1 the body of C1 is true for three (triangle, square) pairs, thus n1 = 3 and P(⊕|Ir) = 0.125. These results clearly correspond to those seen in Example 2.
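Formula (5) turns the exponential sum into a product of at most m factors. A minimal sketch (ours) follows; Python's built-in `**` is used for brevity, while each factor (1 − pi)^{ni} can in principle be computed with O(log ni) multiplications by repeated squaring, as noted above.

```python
# Direct implementation of Formula (5): P(+|I) depends only on the
# number n_i of violated groundings of each constraint C_i.

def prob_positive(constraints):
    """constraints: list of (p_i, n_i), n_i = violated groundings of C_i."""
    result = 1.0
    for p_i, n_i in constraints:
        result *= (1.0 - p_i) ** n_i
    return result

print(prob_positive([(0.5, 1)]))  # left picture of Fig. 1:  0.5
print(prob_positive([(0.5, 3)]))  # right picture of Fig. 1: 0.125
```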
5 Discussion

A PCLT can be seen as defining a conditional probability distribution over a random variable C representing the class, positive (+) or negative (−), given the value of the random variables A1, ..., An representing the Herbrand base. In other words, we do not want to model the dependence among atoms of the Herbrand base but only the conditional dependence of the class given the value of the atoms, i.e., given an interpretation. Our aim is to build a discriminative model, rather than a generative model, similarly to what is done with conditional random fields [12], which focus on the relationship between class variables and input variables and do not model the relationship among input variables.

A PCLT defines a Bayesian network of the form shown in Figure 2, with the variables associated with ground atoms all being parents of the class variable.

Fig. 2. Bayesian network representing the dependence between the class of an interpretation and the Herbrand base H.

This model differs from a naive Bayes model, because there the input variables (ground atoms) are all children of the class variable. This is a significant difference because the model in Figure 2 can have up to 2^n parameters if n is the number of ground atoms.

The assumption of independence of the constraints may seem restrictive, but PCLT can model any conditional probabilistic relationship between the class variable and the ground atoms. For example, suppose you want to model a general conditional dependence between the class atom and a Herbrand base containing two atoms, a and b. This dependence can be represented with the Bayesian network of Figure 3, whose conditional probability table (CPT) has four parameters, p1, ..., p4, so it is the most general. Let us call P′ the distribution defined by this network.

Fig. 3. Bayesian network representing the dependence between class C and a, b. Its CPT P′(C|a, b) is:

a b | C = −    | C = +
0 0 | 1 − p1 | p1
0 1 | 1 − p2 | p2
1 0 | 1 − p3 | p3
1 1 | 1 − p4 | p4

This model can be represented with the following PCLT:

C1 = 1 − p1 :: ¬a, ¬b → false    (6)
C2 = 1 − p2 :: ¬a, b → false    (7)
C3 = 1 − p3 :: a, ¬b → false    (8)
C4 = 1 − p4 :: a, b → false    (9)

In fact, consider the interpretation {}, which assigns value false to each atom of the Herbrand base. The probability that the class variable takes value + is P(C = +|¬a, ¬b) = 1 − (1 − p1) = p1, since only constraint C1 is violated, so P(C = +|¬a, ¬b) = P′(C = +|¬a, ¬b). Similarly, we can show that for the other possible interpretations the probability assigned to the positive class by the above PCLT coincides with the one assigned by the Bayesian network of Figure 3.

Modeling the dependence between C and a, b with the above PCLT is equivalent to representing the Bayesian network of Figure 3 with the Bayesian network of Figure 4, where a Boolean variable Xi represents whether constraint Ci is included in the world (i.e., whether it is enforced) and a Boolean variable Yi whether constraint Ci is violated. Let us call P″ the distribution defined by this network.

Fig. 4. Bayesian network modeling the distribution P″ over C, a, b, X1, ..., X4, Y1, ..., Y4.

The conditional probability tables for the nodes Xi are P″(Xi = 1) = 1 − pi; those for the nodes Yi encode the deterministic functions

Y1 = X1 ∧ ¬a ∧ ¬b
Y2 = X2 ∧ ¬a ∧ b
Y3 = X3 ∧ a ∧ ¬b
Y4 = X4 ∧ a ∧ b

and that for C encodes the deterministic function

C = ¬Y1 ∧ ¬Y2 ∧ ¬Y3 ∧ ¬Y4

where C is interpreted as a Boolean variable with 1 corresponding to + and 0 to −. If we want to compute P″(C|¬a, ¬b) we get

P″(C|¬a, ¬b) = ∑_{X,Y} P″(X1) ... P″(X4) P″(Y1|X1, ¬a, ¬b) ... P″(Y4|X4, ¬a, ¬b) P″(C|Y1, Y2, Y3, Y4)
= p1 ∑_{X2,X3,X4,Y2,Y3,Y4} P″(X2) ... P″(X4) P″(Y2|X2, ¬a, ¬b) ... P″(Y4|X4, ¬a, ¬b) P″(C|Y1 = 0, Y2, Y3, Y4)
= p1 ∑_{X2,X3,X4} P″(X2) ... P″(X4) P″(Y2 = 0|X2, ¬a, ¬b) ... P″(Y4 = 0|X4, ¬a, ¬b) P″(C|Y1 = 0, Y2 = 0, Y3 = 0, Y4 = 0)
= p1 ∑_{X2,X3,X4} P″(X2) ... P″(X4)
= p1

where X = {X1, ..., X4} and Y = {Y1, ..., Y4}. (In the first step, the only value of X1 that allows Y1 = 0, and hence C = 1, is X1 = 0, which has probability p1.) Similarly, it is possible to show that P and P″ coincide for the other possible interpretations. If we look at the network in Figure 4, we see that the X variables are mutually unconditionally independent, showing that it is possible to represent any conditional dependence of C on the Herbrand base by using independent random variables. Of course, not assuming independence may result in a finer modeling of the domain. However, this would preclude PCLTs' nice computational properties. Achieving tractability requires approximations, and we think that constraint independence is a reasonable assumption, similar to the independence among probabilistic choices in the distribution semantics for PLP.

Moreover, PCLT can encode the dependence compactly because they can take advantage of context-specific independence [19]. For example, in the CPT in Table 1 the probability of C = + does not depend on b when a is true.

a b | C = −    | C = +
0 0 | 1 − p1 | p1
0 1 | 1 − p2 | p2
1 0 | 1 − p3 | p3
1 1 | 1 − p3 | p3

Table 1. A CPT with context-specific independence.

This dependence can be encoded with

C1 = 1 − p1 :: ¬a, ¬b → false
C2 = 1 − p2 :: ¬a, b → false
C3 = 1 − p3 :: a → false
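The construction used above generalizes: every row of a CPT for P′(C = +|A1, ..., Ak) becomes a constraint with probability one minus the row's value, whose body selects exactly that row, so that Formula (5) returns the row's value for the corresponding interpretation. The following sketch (our illustration, with arbitrary example parameters) performs this translation and checks the equivalence numerically.

```python
from itertools import product

# Translate a CPT P'(C = + | row) into PCLT constraints and verify that
# Formula (5) reproduces the CPT: for each interpretation, exactly one
# constraint body fires, so P(+|I) = 1 - (1 - P'(C = + | row)) = P'.

def cpt_to_pclt(cpt):
    """cpt: dict mapping a tuple of atom truth values to P'(C = + | row)."""
    return [(row, 1.0 - q) for row, q in cpt.items()]

def prob_positive(pclt, interpretation):
    result = 1.0
    for row, p_i in pclt:
        if row == interpretation:  # the body of this constraint is true
            result *= 1.0 - p_i    # Formula (5) with n_i = 1
    return result

p1, p2, p3, p4 = 0.9, 0.2, 0.4, 0.7  # arbitrary example parameters
cpt = {(0, 0): p1, (0, 1): p2, (1, 0): p3, (1, 1): p4}
pclt = cpt_to_pclt(cpt)
for row in product([0, 1], repeat=2):
    assert abs(prob_positive(pclt, row) - cpt[row]) < 1e-12
```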
PCLT are also related to Markov Logic Networks (MLNs) [20]: similarly to MLNs, PCLT encode constraints on the possible interpretations, and the probability of an interpretation depends on the number of violated constraints. However, MLNs encode the joint distribution of the ground atoms and the class, while we concentrate on the conditional distribution of the class given the ground atoms.

Given a PCLT, it is possible to obtain an equivalent MLN. For example, the MLN equivalent to the PCLT (6)-(9) is

ln(1 − p1)  ¬a ∧ ¬b ∧ ¬C
ln(p1)      ¬a ∧ ¬b ∧ C
ln(1 − p2)  ¬a ∧ b ∧ ¬C
ln(p2)      ¬a ∧ b ∧ C
ln(1 − p3)  a ∧ ¬b ∧ ¬C
ln(p3)      a ∧ ¬b ∧ C
ln(1 − p4)  a ∧ b ∧ ¬C
ln(p4)      a ∧ b ∧ C

where C is an atom representing the class. If we compute the conditional probability of C given an interpretation I, we get the same results as with the PCLT. In fact, consider the empty interpretation and call P‴ the distribution defined by the MLN. We get

P‴(C = +|¬a, ¬b) = P‴(C = +, ¬a, ¬b) / P‴(¬a, ¬b)
= (e^{ln(p1)}/Z) / ((e^{ln(1−p1)} + e^{ln(p1)})/Z)
= p1 / ((1 − p1) + p1)
= p1

where Z is the partition function. Similarly for the other interpretations. So PCLT are a specialization of MLNs that, by focusing on a simpler problem, allows better performance of inference algorithms.

6 Extensions of CLTs

Integrity constraints can be extended to logical formulas of the form

L1, ..., Lb → ∃(ConjP1); ...; ∃(ConjPn); ∀¬(ConjN1); ...; ∀¬(ConjNm)    (10)

where the Li are logical literals and their conjunction L1, ..., Lb represents the body of the IC (as in Section 2), while ConjPi (i = 1, ..., n) and ConjNj (j = 1, ..., m) are conjunctions of literals of the form A1, ..., Ak [13]. The semicolon stands for disjunction, thus ∃(ConjP1); ...; ∃(ConjPn); ∀¬(ConjN1); ...; ∀¬(ConjNm) is a disjunction of conjunctions of literals.

We will use Body(C) to indicate the body of the IC and Head(C) to indicate the formula ∃(ConjP1); ...; ∃(ConjPn); ∀¬(ConjN1); ...; ∀¬(ConjNm), and call them respectively the body and the head of C. The formulas ConjPi in Head(C) are referred to as P disjuncts and the formulas ConjNj as N disjuncts. An IC C is true in an interpretation I given a background knowledge B, written M(B ∪ I) |= C, if for every substitution θ for which Body(C) is true in M(B ∪ I), there exists a disjunct in Head(C) that is true in M(B ∪ I). Variables in the body are implicitly universally quantified with scope the entire formula; the quantifiers in the head apply to all the variables not appearing in the body.

The truth of an extended IC C in an interpretation I can be tested by running the query

?- Body(C), ¬ConjP1, ..., ¬ConjPn, ConjN1, ..., ConjNm.

against a Prolog database containing the clauses of B and the atoms of I as facts. If B is range-restricted, every answer to an atomic query Q against B ∪ I completely instantiates Q, i.e., it produces an element of M(B ∪ I). If the query finitely fails, the IC is true in I; if the query succeeds, the IC is false in I.

This language extends clausal logic by allowing more complex formulas as disjuncts in the head of clauses. The ICs are more expressive than logical clauses, as can be seen from the query used to test them: for ICs we have the negation of conjunctions, while for clauses we have only the negation of atoms.
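For ground (already instantiated) constraints of form (10), the satisfaction test reduces to simple checks over the disjuncts, mirroring the query above: a P disjunct is satisfied if all its literals hold, an N disjunct if at least one literal of ConjN fails, so that its negation holds. The sketch below is ours and sidesteps quantifier handling by assuming the head disjuncts are ground; literals are encoded as (atom, sign) pairs, a convention of this illustration.

```python
# Satisfaction of a ground extended IC over a finite Herbrand model.

def holds(literal, model):
    atom, positive = literal
    return (atom in model) == positive

def ic_true(body, p_disjuncts, n_disjuncts, model):
    if not all(holds(l, model) for l in body):
        return True                      # body false: IC vacuously true
    for conj in p_disjuncts:             # ∃(ConjP): all literals true
        if all(holds(l, model) for l in conj):
            return True
    for conj in n_disjuncts:             # ∀¬(ConjN): some literal false
        if not all(holds(l, model) for l in conj):
            return True
    return False

model = {"a", "b"}
body = [("a", True)]
p_disjuncts = [[("c", True)]]               # ∃ c
n_disjuncts = [[("b", True), ("c", True)]]  # ∀ ¬(b ∧ c)
print(ic_true(body, p_disjuncts, n_disjuncts, model))  # True: c is false
```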
7 Related Work

The approach presented here refers to the distribution semantics [23]: a probabilistic theory defines a distribution over non-probabilistic theories by assuming independence among the choices in probabilistic constructs. The distribution semantics has emerged as one of the most successful approaches in Probabilistic Logic Programming and underlies many languages, such as Probabilistic Horn Abduction, the Independent Choice Logic, PRISM, Logic Programs with Annotated Disjunctions and ProbLog.

In the distribution semantics, the aim is to compute the probability that a ground atom is true. However, performing such inference requires an expensive procedure that is usually based on knowledge compilation. For example, ProbLog [6] and PITA [21,22] build a Boolean formula and compile it into a Binary Decision Diagram, from which the computation of the probability is linear in the size of the diagram; the compilation procedure, however, is #P-hard in the number of variables. On the contrary, computing the probability of the positive class given an interpretation in a PCLT is logarithmic in the number of violated constraint groundings. This places PCLTs in the recent line of research committed to identifying tractable probabilistic languages.

In addition, a probabilistic program in one of the languages under the distribution semantics defines a probability distribution over normal logic programs, called worlds. The distribution is extended to queries, and the probability of a query is obtained by marginalizing the joint distribution of the query and the programs. Instead, PCLTs define a conditional probability distribution over a random variable C representing the class, given the value of a set of atoms (an interpretation).

8 Conclusions

We have proposed a probabilistic extension of constraint logic theories for which the computation of the probability of an interpretation being positive is logarithmic in the number of falsified constraint groundings. In the future we are going to develop a system for learning such probabilistic integrity constraints. A possible way is to exploit Limited-memory BFGS (L-BFGS) [18] for tuning the parameters and constraint refinement for finding good structures. L-BFGS is an optimization algorithm in the family of quasi-Newton methods that approximates the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm using a limited amount of computer memory.

Acknowledgement

This work was supported by GNCS-INdAM.

References

1. Alberti, M., Chesani, F., Gavanelli, M., Lamma, E., Mello, P., Torroni, P.: Verifiable agent interaction in abductive logic programming: the SCIFF framework. ACM T. Comput. Log. 9(4) (2008)
2. Blockeel, H., De Raedt, L., Jacobs, N., Demoen, B.: Scaling up inductive logic programming by learning from interpretations. Data Min. Knowl. Discov. 3(1), 59–93 (1999)
3. Bongard, M.M.: Pattern Recognition. Hayden Book Co., Spartan Books (1970)
4. Clark, K.L.: Negation as failure. In: Logic and Data Bases, pp. 293–322 (1977)
5. De Raedt, L., Dehaspe, L.: Clausal discovery. Machine Learning 26(2-3), 99–146 (1997)
6. De Raedt, L., Kimmig, A., Toivonen, H.: ProbLog: A probabilistic Prolog and its application in link discovery. In: 20th International Joint Conference on Artificial Intelligence (IJCAI-07), Hyderabad, India, vol. 7, pp. 2462–2467. AAAI Press, Palo Alto, California, USA (2007)
7. De Raedt, L., Van Laer, W.: Inductive constraint logic. In: Proceedings of the 6th Conference on Algorithmic Learning Theory (ALT 1995). LNAI, vol. 997, pp. 80–94. Springer, Fukuoka, Japan (1995)
8. De Raedt, L., Dzeroski, S.: First-order jk-clausal theories are PAC-learnable. Artif. Intell. 70(1-2), 375–392 (1994)
9. Domingos, P., Webb, W.A.: A tractable first-order probabilistic logic. In: Hoffmann, J., Selman, B. (eds.) 26th National Conference on Artificial Intelligence, AAAI'12, Toronto, Ontario, Canada. AAAI Press (2012)
10. Fierens, D., Van den Broeck, G., Renkens, J., Shterionov, D.S., Gutmann, B., Thon, I., Janssens, G., De Raedt, L.: Inference and learning in probabilistic logic programs using weighted Boolean formulas. Theor. Pract. Log. Prog. 15(3), 358–401 (2015)
11. Gutmann, B., Thon, I., De Raedt, L.: Learning the parameters of probabilistic logic programs from interpretations. In: Gunopulos, D., Hofmann, T., Malerba, D., Vazirgiannis, M. (eds.) European Conference on Machine Learning and Knowledge Discovery in Databases. LNCS, vol. 6911, pp. 581–596. Springer (2011)
12. Lafferty, J., McCallum, A., Pereira, F.: Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In: 18th International Conference on Machine Learning, vol. 1, pp. 282–289 (2001)
13. Lamma, E., Mello, P., Riguzzi, F., Storari, S.: Applying inductive logic programming to process mining. In: Proceedings of the 17th International Conference on Inductive Logic Programming, ILP 2007, pp. 132–146. No. 4894 in Lecture Notes in Artificial Intelligence, Springer, Heidelberg, Germany (2008), http://dx.doi.org/10.1007/978-3-540-78469-2_16
14. Ly, L.T., Maggi, F.M., Montali, M., Rinderle-Ma, S., van der Aalst, W.M.: Compliance monitoring in business processes: Functionalities, application, and tool-support. Inform. Syst. 54, 209–234 (2015)
15. Montali, M.: Specification and Verification of Declarative Open Interaction Models: a Logic-Based Approach. LNBIP, vol. 56. Springer (2010)
16. Niepert, M., Van den Broeck, G.: Tractability through exchangeability: A new perspective on efficient probabilistic inference. In: Brodley, C.E., Stone, P. (eds.) 28th National Conference on Artificial Intelligence, AAAI'14, Québec City, Québec, Canada, pp. 2467–2475. AAAI Press (2014)
17. Niepert, M., Domingos, P.: Tractable probabilistic knowledge bases: Wikipedia and beyond. In: AAAI-14 Workshop on Statistical Relational Artificial Intelligence. AAAI Workshops, vol. WS-14-13. AAAI Press (2014)
18. Nocedal, J.: Updating quasi-Newton matrices with limited storage. Mathematics of Computation 35(151), 773–782 (1980)
19. Poole, D., Zhang, N.L.: Exploiting contextual independence in probabilistic inference. J. Artif. Intell. Res. 18, 263–313 (2003)
20. Richardson, M., Domingos, P.: Markov logic networks. Mach. Learn. 62(1-2), 107–136 (2006)
21. Riguzzi, F., Swift, T.: The PITA system: Tabling and answer subsumption for reasoning under uncertainty. Theor. Pract. Log. Prog. 11(4–5), 433–449 (2011)
22. Riguzzi, F., Swift, T.: Well-definedness and efficient inference for probabilistic logic programming under the distribution semantics. Theor. Pract. Log. Prog. 13(Special Issue 02 - 25th Annual GULP Conference), 279–302 (2013)
23. Sato, T.: A statistical learning method for logic programs with distribution semantics. In: Sterling, L. (ed.) 12th International Conference on Logic Programming, Tokyo, Japan, pp. 715–729. MIT Press, Cambridge, Massachusetts (1995)
24. Van den Broeck, G.: On the completeness of first-order knowledge compilation for lifted probabilistic inference. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems, pp. 1386–1394 (2011)
25. Webb, W.A., Domingos, P.: Tractable probabilistic knowledge bases with existence uncertainty. In: AAAI-13 Workshop on Statistical Relational Artificial Intelligence. AAAI Workshops, vol. WS-13-16. AAAI Press (2013)