Learning Disjunctive Logic Programs from
                   Interpretation Transition

            Yi Huang1,3 , Yisong Wang1,? , Ying Zhang2 , and Mingyi Zhang2
                      1
                        Guizhou University, Guiyang 550025, P. R. China
                 2
                  Guizhou Academy of sciences, Guiyang 550001, P. R. China
          3
            Chongqing University of Arts and Sciences, Chongqing 402160, P. R. China


        Abstract. We present a new framework for learning disjunctive logic programs
        from interpretation transitions, called LFDT. It is a nontrivial extension to Inoue,
        Ribeiro and Sakama’s LF1T learning framework, which learns normal logic pro-
        grams from interpretation transitions. Two resolutions for disjunctive rules are
        also presented and used in LFDT to simplify learned disjunctive rules.


1     Introduction

In machine learning and specifically inductive logic programming [1, 2], it is an im-
portant task to learn the dynamics of complex systems, such as Boolean networks. A
Boolean network consist of a set of Boolean variables each of which has a Boolean
function. It is a simple yet a powerful mathematical tool to describe dynamics of com-
plex systems [3, 4].
    Since a seminal work on representing Boolean networks by logic programs [5],
there is increasing interest in approaching the task from the perspective of logic pro-
gramming [6–8]. In particular, Inoue, Ribeiro and Sakama proposed a novel frame-
work, named LF1T, for learning normal logic programs from interpretation transitions
that are pairs hI, Ji of interpretations with TP (I) = J, where TP is the immediate
consequence operator for a normal logic program P [6].
    It is well-known that disjunctive logic programs [9, 10] are substantially more ex-
pressive than normal logic programs at many aspects. To extend LF1T for learning
disjunctive logic programs from interpretation transitions, we present a new immediate
consequence operator TPd for a disjunctive logic program P . Informally, TPd (I) consists
of all minimal models of the heads of rules in P whose bodies are satisfied by I.
    Comparing with LF1T framework for learning normal logic programs, a nontrivial
work in the paper is to handle with nondeterministic interpretation transitions, i.e., it
is possible there are two interpretation transitions hI, Ji and hI, J 0 i in an observation
with J 6= J 0 . Combining with two new proposed resolutions for disjunctive rules, we
achieve the new framework for learning disjunctive logic programs from interpretation
transitions, called LFDT. It is proved being both sound and complete.
?
    Corresponding author, yswang@gzu.edu.cn.


                                               81
2    Disjunctive Logic Programs
We assume a underlying first-order language without proper function symbols L and
denote its Herbrand base (the set of all ground atoms) by A. We assume A is finite for
our learning purpose.
    A (disjunctive) logic program is a finite set of (disjunctive) rules of the form

                  A1 ∨ · · · ∨ Ak ← B1 ∧ · · · ∧ Bm ∧ ¬C1 ∧ · · · ∧ ¬Cn ,                        (1)

where k ≥ 1, m ≥ 0, n ≥ 0 and Ai (1 ≤ i ≤ k), Bj (1 ≤ j ≤ m), and Cs (1 ≤ s ≤ n)
are atoms of L.
    For a rule r of the form (1), the head of r, written hd(r), is A1 ∨ · · · ∨ Ak ; the
body of r, written bd(r), is the conjunction B1 ∧ · · · ∧ Bm ∧ ¬C1 ∧ · · · ∧ ¬Cn ; the
atoms occurring in the body of r positively (resp. negatively) is denoted by bd+ (r) =
{B1 , . . . , Bm } (resp. bd− (r) = {C1 , . . . , Cn }). If k = 1 then r is normal. A nor-
mal logic S   program is a finite set of normal
                                          S       rules. Given a logic program P , we denote
hd(P ) = r∈P hd(r) and bd(P ) = r∈P bd(r). For convenience, we also write hd(r)
as the set {A1 , . . . , Ak }, and bd(r) as the set {B1 , . . . , Bm , ¬C1 , . . . , ¬Cn } when there
is no confusion. In this sense, a rule r of the form (1) can be alternatively written as

                     {A1 , . . . , Ak } ← {B1 , . . . , Bm , ¬C1 , . . . , ¬Cn }.                (2)

   A substitution θ is a function from variables to terms, which is written in the fol-
lowing form
                                 {x1 /t1 , . . . , xn /tn }                         (3)
where xi s (1 ≤ i ≤ n) are pair-wise distinct variables and ti s (1 ≤ i ≤ n) are terms
(of the language L). The application eθ of θ to an expression e is obtained from e by
simultaneously replacing all occurrences of each variable xi in e with the same term
ti , and eθ is called an instance of e [11]. The substitution θ is ground if ti contains no
variables for every xi /ti in θ. If the instance eθ of e contains no variable then it is a
ground
S        instance of e. The ground of a logic program P , written ground(P ), is the set
    r∈P ground(r),  where

                      ground(r) = {rθ | rθ is a ground instance of r}.                           (4)

    For simplicity, we assume logic programs are always ground in the following of the
paper, unless explicitly stated otherwise.
    Let r1 , r2 be two rules. We say that r1 subsumes r2 , written r1  r2 , if there exists a
substitution θ such that hd(r1 )θ ⊆ hd(r2 ) and bd(r1 )θ ⊆ bd(r2 ). In this sense, we say
that r1 is more (or equally) general than r2 , and r2 is less (or equally) general than r1 .
By r1 ≺ r2 we mean r1 subsumes r2 but r2 does not subsume r1 . For a logic program
P , we denote SR(P ) the logic program obtained from P by removing all rules that are
properly subsumed by some other rules in P , i.e.,

                          SR(P ) = {r ∈ P |6 ∃r0 ∈ P s.t. r0 ≺ r}.                               (5)

    A (Herbrand) interpretation I is a set of ground atoms. An interpretation I satisfies
a ground rule r if I satisfies bd(r) implies I satisfies hd(r). It satisfies a rule r if I


                                                82
satisfies every ground rule in ground(r). It satisfies a logic program P if it satisfies
every rule in the logic program P . In this case we call I a model of a (ground) rule
(resp., logic program). In the following we use “|=” to denote the satisfaction relation,
and “≡” to denote the classical equivalence relation.
    A rule r is applicable w.r.t. an interpretation I if I |= bd(r). Let P be a logic
program. We denote app(P, I) the set of rules in P that are applicable w.r.t. I.

Definition 1. Let P be a disjunctive logic program. The immediate consequence oper-
                  A
ator TPd : 2A → 22 is defined as follows, for I ⊆ A,

    TPd (I) = {S|S is a minimal (under set inclusion) model of hd(app(P, I))}.          (6)

    Please note that the operator TPd is a generalization to the operator TP for normal
logic programs [12], and it is similar to the operator TPnd for logic programs with ab-
stract constraint atoms [13].


3   Two resolutions

To extend the learning algorithm for normal logic programs in [6] to disjunctive logic
programs, we extend its ground resolution for disjunctive rules and present a combined
resolution. Recall that a literal l is either an atom or its classical negation. The comple-
ment of l, written l, is defined as A = ¬A and ¬A = A where A is an atom. For a set
S of atoms, we denote ¬S = {¬A|A ∈ S}.

Definition 2 (disjunctively ground resolution). Let r and r0 be two ground rules. The
rule r is disjunctively ground resolvable w.r.t. r0 on a literal l whenever

(a) l ∈ bd(r) and l ∈ bd(r0 ),
(b) bd(r0 ) \ {l} ⊆ bd(r) \ {l}, and
(c) hd(r0 ) ⊆ hd(r).

The disjunctive ground resolvent of r w.r.t. r0 on l is the rule hd(r) ← bd(r) \ {l}. We
denote it by gr(r, r0 ). In particular, if the above condition (b) is strengthen to

                                bd(r0 ) \ {l} = bd(r) \ {l}                             (7)

then we say that r is disjunctively naive resolvable w.r.t. r0 on l.

     The following proposition shows that the disjunctive ground resolution preserves
the equivalence of the TPd operator in terms of TPd (I) = TPd 0 (I) for every I ⊆ A where
P 0 is obtained from P by adding some disjunctive ground resolvent.

Proposition 1. Let P be a ground logic program containing two disjunctive rules r
and r0 such that r is disjunctively ground resolvable w.r.t. r0 on a literal l, and Q =
P ∪ {gr(r, r0 )}. Then hd(app(P, I)) ≡ hd(app(Q, I)) for every I ⊆ A.


                                           83
Definition 3 (combined resolution). Let r1 , . . . , rk and r be the following rules:

                        r1 : hd(r1 ) ← bd+ ∪ ¬(bd− ∪ {b1 }),
                          ..
                           .
                        rk : hd(rk ) ← bd+ ∪ ¬(bd− ∪ {bk }),
                                                                             0
                         r : hd(r) ← bd+ ∪ {bi |1 ≤ i ≤ k} ∪ ¬bd−

such that

    – bd− ∩ {bi |1 ≤ i ≤ k} = ∅, and
    – hd(ri ) ⊆ hd(r) for every i (1 ≤ i ≤ k).

Then the combined resolvent of r, r1 , . . . , rk , written cr(r, r1 , . . . , rk ), is the rule
                                                                      0
                             r∗ : hd(r) ← bd+ ∪ ¬(bd− ∪ bd− ).                                     (8)

In this case we say that the rules r, r1 , . . . , rk are combined resolvable.

   The next proposition shows that, similar to the disjunctive ground resolution, the
combined resolution preserves the equivalence of the TPd operator as well.

Proposition 2. Let P be a logic program containing rules r, r1 , . . . , rk such that r,
r1 , . . ., rk are combined resolvable, and Q = P ∪ {cr(r, r1 , . . . , rk )}. It holds that
hd(app(P, I)) ≡ hd(app(Q, I)) for any I ⊆ A.


4     Learning from 1-step Transitions

In the section we present our inductive learning task for disjunctive logic programs and
its learning algorithm. Properties of the algorithm are investigated as well.


4.1    The Learning Task

A background theory is a logic program. An example (or observation) is a state tran-
sition (or interpretation transition), i.e., a tuple hI, Ji with I ⊆ A and J ⊆ A, which
means that the state J is a candidate successor of the state I in a Boolean network, or
J ∈ TPd (I) for a disjunctive logic program P . Let E be a set of examples. We denote
E i = {I | hI, Ji ∈ E}, E o = {J | hI, Ji ∈ S} and E(I) = {J | hI, Ji ∈ E} for
I ⊆ A. The set E is total whenever E i = 2A .

Definition 4 (the learning task). An inductive learning task from nondeterministic
state transitions is, given a background theory B and a set E of examples (state transi-
tions), to find a hypothesis (a logic program) H such that, for every example hI, Ji ∈ E,
       d
J ∈ TB∪H     (I) holds.


                                                84
    The above inductive learning task is written as ILT(B, E). Such a hypothesis H to
the inductive learning task is called a solution to ILT(B, E). For our learning purpose,
the given examples have to be restricted. For instance, let E = {h∅, ∅i, h∅, {p}i} and
                                                                         d
B = ∅. There will be no logic program H satisfying {∅, {p}} ⊆ TB∪H           (∅), since the
                         d
sets in the collection TB∪H (∅) are incomparable under set inclusion, while ∅ and {p}
are comparable under set inclusion.
    A set E of state transitions is coherent if J and J 0 are incomparable under set
inclusion for every hI, Ji and hI, J 0 i in E, i.e., J and J 0 are all minimal under set
inclusion. A set E of state transitions is consistent w.r.t. a logic program P , if for each
hI, Ji ∈ E, I |= bd(r) implies J |= hd(r) holds for every rule r in P .
    The following property identifies the sufficient and necessary condition for the ex-
istence of a solution to an inductive learning task.
Proposition 3. Given an inductive learning task ILT(B, E) where B is a background
theory and E is a set of observations, there exists a solution H to ILT(B, E) if and only
if E is coherent and E is consistent w.r.t. B.

4.2   An Inductive Learning Algorithm
In the following we present a bottom-up method to compute a logic program for our
inductive learning tasks. This method generates hypothesis by generalization from the
most specific rules until all examples are covered.
    Firstly, let q ∈ A and I ⊆ A. We denote rqI the following rule:
                                       q ← I ∪ ¬I                                       (9)
which is the most specific normal rule such that q belongs to a candidate successor
of the state I. Now the LFDT algorithm is showed in Algorithm 1. Intuitively, this
algorithm is to construct the following rules for these examples with the same first state
in the state transitions hI, J1 i, . . . , hI, Jm i of E:
              H ← I ∪ ¬I,        H is a minimal hitting set of J1 , . . . , Jm .       (10)
    The algorithm AddRule, shown in Algorithm 2, adds these rules into the result. It
also simplifies the result by removing being subsumed rules through disjunctive ground
resolution and combined resolution. Since disjunctive ground resolution is a generaliza-
tion of ground resolution, this algorithm is also a generalization of the algorithm LF1T
in [6], which learns normal logic programs from (deterministic) state transitions, i.e.,
I1 6= I2 for any two distinct state transitions hI1 , J1 i and hI2 , J2 i in E.
    Let P be a logic program, and E be a coherent set of state transitions which is
consistent w.r.t. a background theory B. The logic program P is complete for E w.r.t.
                               d
B if {J | hI, Ji ∈ E} ⊆ TB∪P        (I) for any I ∈ E i , it is sound for E if TB∪P
                                                                                d
                                                                                    (I) ⊆
                                  i
{J | hI, Ji ∈ E} for any I ∈ E . A learning algorithm is complete (resp. sound) for E
w.r.t. B if its output is complete (resp. sound) for E w.r.t. B. In the following we show
the correctness of the LFDT algorithm according to its soundness and completeness.
Theorem 1. The algorithm LFDT is sound and complete (with disjunctive ground res-
olution, combined resolution, and/or subsumption reduction). Namely, if E is coherent
and B is consistent w.r.t. E then the output P by the algorithm LFDT is sound and
complete for E w.r.t. B.


                                           85
    Algorithm 1: LFDT(E, B)
   Input: A set E of state transitions and a background theory B such that E is coherent and
          it is consistent w.r.t. B
   Output: A logic program P
 1 P ← B;
 2 foreach hI, Ji ∈ E do
 3     Q ← {rqI |q ∈ J};
 4     E ← E \ {hI, Ji};
 5     foreach hI 0 , J 0 i ∈ E with I 0 = I do
 6          E ← E \ {hI 0 , J 0 i};
 7          foreach p ∈ J 0 and r ∈ Q do Q ← Q ∪ {hd(r) ∪ {p} ← bd(r)};
 8          ;
 9     end
10     foreach r ∈ Q do AddRule(r, P );
11     ;
12 end
13 P ← P \ B;
14 return P ;


5     Concluding Remarks and Future Work

In this paper we proposed a new framework LFDT for learning disjunctive programs
from interpretation transitions. It is a nontrivial and substantial extension to the LF1T
framework. One remaining challenge work is to apply the learning approach to practical
domains, such as bio-informatics for which LF1T is successfully applied.
    Acknowledgement. We thank reviewers for their helpful comments. This work
is partially supported by NSFC under grant 63170161, Stadholder Fund of Guizhou
Province under grant (2012)62, Outstanding Young Talent Training Fund of Guizhou
Province under grant (2015)01 and Science and Technology Fund of Guizhou Province
under grant [2014]7640. Yi Huang’s work is partially supported by Scientific and Tech-
nological Research Program of Chongqing Municipal Education Commission under
Grant No. KJ1601129.


References

 1. Stephen Muggleton and Luc De Raedt. Inductive logic programming: Theory and methods.
    Journal of Logic Programming, 19/20:629–679, 1994.
 2. Stephen Muggleton, Luc De Raedt, David Poole, Ivan Bratko, Peter A. Flach, Katsumi Inoue,
    and Ashwin Srinivasan. ILP turns 20 - biography and future challenges. Machine Learning,
    86(1):3–23, 2012.
 3. S.A. Kauffman. Metabolic stability and epigenesis in randomly constructed genetic nets.
    Journal of Theoretical Biology, 22(3):437 – 467, 1969.
 4. Stuart A. Kauffman. The Origins of Order: Self-Organization and Selection in Evolution.
    Oxford: Oxford University Press, 1993.


                                           86
  Algorithm 2: AddRule(r, P )
    Input: A rule r and a logic program P
 1  if ∃r0 ∈ P s.t. r0 ≺ r then return;
  2 ;
                0                   0                   0
  3 foreach r ∈ P do if r ≺ r then P ← P \ {r };
  4 ;
  5 ;
  6 P ← P ∪ {r};
  7 while r, r1 , . . . , rk ∈ P are combined resolvable do AddRule(cr(r, r1 , . . . , rk ));
  8 ;
                0
  9 foreach r ∈ P do
 10      if r is disjunctively ground resolvable w.r.t. r0 then
 11           AddRule(gr(r, r0 ), P );
 12      else if r0 is disjunctively ground resolvable w.r.t. r then
 13           AddRule(gr(r0 , r), P );
 14      end
 15 end
 16 return P ;


 5. Katsumi Inoue. Logic programming for boolean networks. In Toby Walsh, editor, IJCAI
    2011, Proceedings of the 22nd International Joint Conference on Artificial Intelligence,
    Barcelona, Catalonia, Spain, July 16-22, 2011, pages 924–930. IJCAI/AAAI, 2011.
 6. Katsumi Inoue, Tony Ribeiro, and Chiaki Sakama. Learning from interpretation transition.
    Machine Learning, 94(1):51–79, 2014.
 7. Maxime Folschette, Loı̈c Paulevé, Katsumi Inoue, Morgan Magnin, and Olivier H. Roux.
    Identification of biological regulatory networks from process hitting models. Theoretical
    Computer Science, 568:49–71, 2015.
 8. David Martı́nez, Tony Ribeiro, Katsumi Inoue, Guillem Alenyà, and Carme Torras. Learning
    probabilistic action models from interpretation transitions. In Marina De Vos, Thomas Eiter,
    Yuliya Lierler, and Francesca Toni, editors, Proceedings of the Technical Communications
    of the 31st International Conference on Logic Programming (ICLP 2015), Cork, Ireland,
    August 31 - September 4, 2015., volume 1433 of CEUR Workshop Proceedings. CEUR-
    WS.org, 2015.
 9. Michael Gelfond and Vladimir Lifschitz. Classical negation in logic programs and disjunc-
    tive databases. New Generation Computing, 9:365–385, 1991.
10. Jorge Lobo, Jack Minker, and Arcot Rajasekar. Foundations of disjunctive logic program-
    ming. Logic Programming. MIT Press, 1992.
11. J. W. Lloyd. Foundations of Logic Programming. Springer-Verlag, 2nd edition, 1987.
12. Maarten H. van Emden and Robert A. Kowalski. The semantics of predicate logic as a
    programming language. Journal of the ACM, 23(4):733–742, 1976.
13. Victor W. Marek and Miroslaw Truszczynski. Logic programs with abstract constraint atoms.
    In Proceedings of the Nineteenth National Conference on Artificial Intelligence, Sixteenth
    Conference on Innovative Applications of Artificial Intelligence (AAAI 2004), pages 86–91,
    San Jose, California, USA, 2004. AAAI Press.
14. Gordon D. Plotkin. A note on inductive generalisation. In B. Meltzer and D. Michie, editors,
    Machine Intelligence, volume 5, chapter 8, pages 153–163. Edingburgh Unviersity Press,
    1970.


                                                87