Learning Disjunctive Logic Programs from Interpretation Transition Yi Huang1,3 , Yisong Wang1,? , Ying Zhang2 , and Mingyi Zhang2 1 Guizhou University, Guiyang 550025, P. R. China 2 Guizhou Academy of sciences, Guiyang 550001, P. R. China 3 Chongqing University of Arts and Sciences, Chongqing 402160, P. R. China Abstract. We present a new framework for learning disjunctive logic programs from interpretation transitions, called LFDT. It is a nontrivial extension to Inoue, Ribeiro and Sakama’s LF1T learning framework, which learns normal logic pro- grams from interpretation transitions. Two resolutions for disjunctive rules are also presented and used in LFDT to simplify learned disjunctive rules. 1 Introduction In machine learning and specifically inductive logic programming [1, 2], it is an im- portant task to learn the dynamics of complex systems, such as Boolean networks. A Boolean network consist of a set of Boolean variables each of which has a Boolean function. It is a simple yet a powerful mathematical tool to describe dynamics of com- plex systems [3, 4]. Since a seminal work on representing Boolean networks by logic programs [5], there is increasing interest in approaching the task from the perspective of logic pro- gramming [6–8]. In particular, Inoue, Ribeiro and Sakama proposed a novel frame- work, named LF1T, for learning normal logic programs from interpretation transitions that are pairs hI, Ji of interpretations with TP (I) = J, where TP is the immediate consequence operator for a normal logic program P [6]. It is well-known that disjunctive logic programs [9, 10] are substantially more ex- pressive than normal logic programs at many aspects. To extend LF1T for learning disjunctive logic programs from interpretation transitions, we present a new immediate consequence operator TPd for a disjunctive logic program P . Informally, TPd (I) consists of all minimal models of the heads of rules in P whose bodies are satisfied by I. Comparing with LF1T framework for learning normal logic programs, a nontrivial work in the paper is to handle with nondeterministic interpretation transitions, i.e., it is possible there are two interpretation transitions hI, Ji and hI, J 0 i in an observation with J 6= J 0 . Combining with two new proposed resolutions for disjunctive rules, we achieve the new framework for learning disjunctive logic programs from interpretation transitions, called LFDT. It is proved being both sound and complete. ? Corresponding author, yswang@gzu.edu.cn. 81 2 Disjunctive Logic Programs We assume a underlying first-order language without proper function symbols L and denote its Herbrand base (the set of all ground atoms) by A. We assume A is finite for our learning purpose. A (disjunctive) logic program is a finite set of (disjunctive) rules of the form A1 ∨ · · · ∨ Ak ← B1 ∧ · · · ∧ Bm ∧ ¬C1 ∧ · · · ∧ ¬Cn , (1) where k ≥ 1, m ≥ 0, n ≥ 0 and Ai (1 ≤ i ≤ k), Bj (1 ≤ j ≤ m), and Cs (1 ≤ s ≤ n) are atoms of L. For a rule r of the form (1), the head of r, written hd(r), is A1 ∨ · · · ∨ Ak ; the body of r, written bd(r), is the conjunction B1 ∧ · · · ∧ Bm ∧ ¬C1 ∧ · · · ∧ ¬Cn ; the atoms occurring in the body of r positively (resp. negatively) is denoted by bd+ (r) = {B1 , . . . , Bm } (resp. bd− (r) = {C1 , . . . , Cn }). If k = 1 then r is normal. A nor- mal logic S program is a finite set of normal S rules. Given a logic program P , we denote hd(P ) = r∈P hd(r) and bd(P ) = r∈P bd(r). For convenience, we also write hd(r) as the set {A1 , . . . , Ak }, and bd(r) as the set {B1 , . . . , Bm , ¬C1 , . . . , ¬Cn } when there is no confusion. In this sense, a rule r of the form (1) can be alternatively written as {A1 , . . . , Ak } ← {B1 , . . . , Bm , ¬C1 , . . . , ¬Cn }. (2) A substitution θ is a function from variables to terms, which is written in the fol- lowing form {x1 /t1 , . . . , xn /tn } (3) where xi s (1 ≤ i ≤ n) are pair-wise distinct variables and ti s (1 ≤ i ≤ n) are terms (of the language L). The application eθ of θ to an expression e is obtained from e by simultaneously replacing all occurrences of each variable xi in e with the same term ti , and eθ is called an instance of e [11]. The substitution θ is ground if ti contains no variables for every xi /ti in θ. If the instance eθ of e contains no variable then it is a ground S instance of e. The ground of a logic program P , written ground(P ), is the set r∈P ground(r), where ground(r) = {rθ | rθ is a ground instance of r}. (4) For simplicity, we assume logic programs are always ground in the following of the paper, unless explicitly stated otherwise. Let r1 , r2 be two rules. We say that r1 subsumes r2 , written r1  r2 , if there exists a substitution θ such that hd(r1 )θ ⊆ hd(r2 ) and bd(r1 )θ ⊆ bd(r2 ). In this sense, we say that r1 is more (or equally) general than r2 , and r2 is less (or equally) general than r1 . By r1 ≺ r2 we mean r1 subsumes r2 but r2 does not subsume r1 . For a logic program P , we denote SR(P ) the logic program obtained from P by removing all rules that are properly subsumed by some other rules in P , i.e., SR(P ) = {r ∈ P |6 ∃r0 ∈ P s.t. r0 ≺ r}. (5) A (Herbrand) interpretation I is a set of ground atoms. An interpretation I satisfies a ground rule r if I satisfies bd(r) implies I satisfies hd(r). It satisfies a rule r if I 82 satisfies every ground rule in ground(r). It satisfies a logic program P if it satisfies every rule in the logic program P . In this case we call I a model of a (ground) rule (resp., logic program). In the following we use “|=” to denote the satisfaction relation, and “≡” to denote the classical equivalence relation. A rule r is applicable w.r.t. an interpretation I if I |= bd(r). Let P be a logic program. We denote app(P, I) the set of rules in P that are applicable w.r.t. I. Definition 1. Let P be a disjunctive logic program. The immediate consequence oper- A ator TPd : 2A → 22 is defined as follows, for I ⊆ A, TPd (I) = {S|S is a minimal (under set inclusion) model of hd(app(P, I))}. (6) Please note that the operator TPd is a generalization to the operator TP for normal logic programs [12], and it is similar to the operator TPnd for logic programs with ab- stract constraint atoms [13]. 3 Two resolutions To extend the learning algorithm for normal logic programs in [6] to disjunctive logic programs, we extend its ground resolution for disjunctive rules and present a combined resolution. Recall that a literal l is either an atom or its classical negation. The comple- ment of l, written l, is defined as A = ¬A and ¬A = A where A is an atom. For a set S of atoms, we denote ¬S = {¬A|A ∈ S}. Definition 2 (disjunctively ground resolution). Let r and r0 be two ground rules. The rule r is disjunctively ground resolvable w.r.t. r0 on a literal l whenever (a) l ∈ bd(r) and l ∈ bd(r0 ), (b) bd(r0 ) \ {l} ⊆ bd(r) \ {l}, and (c) hd(r0 ) ⊆ hd(r). The disjunctive ground resolvent of r w.r.t. r0 on l is the rule hd(r) ← bd(r) \ {l}. We denote it by gr(r, r0 ). In particular, if the above condition (b) is strengthen to bd(r0 ) \ {l} = bd(r) \ {l} (7) then we say that r is disjunctively naive resolvable w.r.t. r0 on l. The following proposition shows that the disjunctive ground resolution preserves the equivalence of the TPd operator in terms of TPd (I) = TPd 0 (I) for every I ⊆ A where P 0 is obtained from P by adding some disjunctive ground resolvent. Proposition 1. Let P be a ground logic program containing two disjunctive rules r and r0 such that r is disjunctively ground resolvable w.r.t. r0 on a literal l, and Q = P ∪ {gr(r, r0 )}. Then hd(app(P, I)) ≡ hd(app(Q, I)) for every I ⊆ A. 83 Definition 3 (combined resolution). Let r1 , . . . , rk and r be the following rules: r1 : hd(r1 ) ← bd+ ∪ ¬(bd− ∪ {b1 }), .. . rk : hd(rk ) ← bd+ ∪ ¬(bd− ∪ {bk }), 0 r : hd(r) ← bd+ ∪ {bi |1 ≤ i ≤ k} ∪ ¬bd− such that – bd− ∩ {bi |1 ≤ i ≤ k} = ∅, and – hd(ri ) ⊆ hd(r) for every i (1 ≤ i ≤ k). Then the combined resolvent of r, r1 , . . . , rk , written cr(r, r1 , . . . , rk ), is the rule 0 r∗ : hd(r) ← bd+ ∪ ¬(bd− ∪ bd− ). (8) In this case we say that the rules r, r1 , . . . , rk are combined resolvable. The next proposition shows that, similar to the disjunctive ground resolution, the combined resolution preserves the equivalence of the TPd operator as well. Proposition 2. Let P be a logic program containing rules r, r1 , . . . , rk such that r, r1 , . . ., rk are combined resolvable, and Q = P ∪ {cr(r, r1 , . . . , rk )}. It holds that hd(app(P, I)) ≡ hd(app(Q, I)) for any I ⊆ A. 4 Learning from 1-step Transitions In the section we present our inductive learning task for disjunctive logic programs and its learning algorithm. Properties of the algorithm are investigated as well. 4.1 The Learning Task A background theory is a logic program. An example (or observation) is a state tran- sition (or interpretation transition), i.e., a tuple hI, Ji with I ⊆ A and J ⊆ A, which means that the state J is a candidate successor of the state I in a Boolean network, or J ∈ TPd (I) for a disjunctive logic program P . Let E be a set of examples. We denote E i = {I | hI, Ji ∈ E}, E o = {J | hI, Ji ∈ S} and E(I) = {J | hI, Ji ∈ E} for I ⊆ A. The set E is total whenever E i = 2A . Definition 4 (the learning task). An inductive learning task from nondeterministic state transitions is, given a background theory B and a set E of examples (state transi- tions), to find a hypothesis (a logic program) H such that, for every example hI, Ji ∈ E, d J ∈ TB∪H (I) holds. 84 The above inductive learning task is written as ILT(B, E). Such a hypothesis H to the inductive learning task is called a solution to ILT(B, E). For our learning purpose, the given examples have to be restricted. For instance, let E = {h∅, ∅i, h∅, {p}i} and d B = ∅. There will be no logic program H satisfying {∅, {p}} ⊆ TB∪H (∅), since the d sets in the collection TB∪H (∅) are incomparable under set inclusion, while ∅ and {p} are comparable under set inclusion. A set E of state transitions is coherent if J and J 0 are incomparable under set inclusion for every hI, Ji and hI, J 0 i in E, i.e., J and J 0 are all minimal under set inclusion. A set E of state transitions is consistent w.r.t. a logic program P , if for each hI, Ji ∈ E, I |= bd(r) implies J |= hd(r) holds for every rule r in P . The following property identifies the sufficient and necessary condition for the ex- istence of a solution to an inductive learning task. Proposition 3. Given an inductive learning task ILT(B, E) where B is a background theory and E is a set of observations, there exists a solution H to ILT(B, E) if and only if E is coherent and E is consistent w.r.t. B. 4.2 An Inductive Learning Algorithm In the following we present a bottom-up method to compute a logic program for our inductive learning tasks. This method generates hypothesis by generalization from the most specific rules until all examples are covered. Firstly, let q ∈ A and I ⊆ A. We denote rqI the following rule: q ← I ∪ ¬I (9) which is the most specific normal rule such that q belongs to a candidate successor of the state I. Now the LFDT algorithm is showed in Algorithm 1. Intuitively, this algorithm is to construct the following rules for these examples with the same first state in the state transitions hI, J1 i, . . . , hI, Jm i of E: H ← I ∪ ¬I, H is a minimal hitting set of J1 , . . . , Jm . (10) The algorithm AddRule, shown in Algorithm 2, adds these rules into the result. It also simplifies the result by removing being subsumed rules through disjunctive ground resolution and combined resolution. Since disjunctive ground resolution is a generaliza- tion of ground resolution, this algorithm is also a generalization of the algorithm LF1T in [6], which learns normal logic programs from (deterministic) state transitions, i.e., I1 6= I2 for any two distinct state transitions hI1 , J1 i and hI2 , J2 i in E. Let P be a logic program, and E be a coherent set of state transitions which is consistent w.r.t. a background theory B. The logic program P is complete for E w.r.t. d B if {J | hI, Ji ∈ E} ⊆ TB∪P (I) for any I ∈ E i , it is sound for E if TB∪P d (I) ⊆ i {J | hI, Ji ∈ E} for any I ∈ E . A learning algorithm is complete (resp. sound) for E w.r.t. B if its output is complete (resp. sound) for E w.r.t. B. In the following we show the correctness of the LFDT algorithm according to its soundness and completeness. Theorem 1. The algorithm LFDT is sound and complete (with disjunctive ground res- olution, combined resolution, and/or subsumption reduction). Namely, if E is coherent and B is consistent w.r.t. E then the output P by the algorithm LFDT is sound and complete for E w.r.t. B. 85 Algorithm 1: LFDT(E, B) Input: A set E of state transitions and a background theory B such that E is coherent and it is consistent w.r.t. B Output: A logic program P 1 P ← B; 2 foreach hI, Ji ∈ E do 3 Q ← {rqI |q ∈ J}; 4 E ← E \ {hI, Ji}; 5 foreach hI 0 , J 0 i ∈ E with I 0 = I do 6 E ← E \ {hI 0 , J 0 i}; 7 foreach p ∈ J 0 and r ∈ Q do Q ← Q ∪ {hd(r) ∪ {p} ← bd(r)}; 8 ; 9 end 10 foreach r ∈ Q do AddRule(r, P ); 11 ; 12 end 13 P ← P \ B; 14 return P ; 5 Concluding Remarks and Future Work In this paper we proposed a new framework LFDT for learning disjunctive programs from interpretation transitions. It is a nontrivial and substantial extension to the LF1T framework. One remaining challenge work is to apply the learning approach to practical domains, such as bio-informatics for which LF1T is successfully applied. Acknowledgement. We thank reviewers for their helpful comments. This work is partially supported by NSFC under grant 63170161, Stadholder Fund of Guizhou Province under grant (2012)62, Outstanding Young Talent Training Fund of Guizhou Province under grant (2015)01 and Science and Technology Fund of Guizhou Province under grant [2014]7640. Yi Huang’s work is partially supported by Scientific and Tech- nological Research Program of Chongqing Municipal Education Commission under Grant No. KJ1601129. References 1. Stephen Muggleton and Luc De Raedt. Inductive logic programming: Theory and methods. Journal of Logic Programming, 19/20:629–679, 1994. 2. Stephen Muggleton, Luc De Raedt, David Poole, Ivan Bratko, Peter A. Flach, Katsumi Inoue, and Ashwin Srinivasan. ILP turns 20 - biography and future challenges. Machine Learning, 86(1):3–23, 2012. 3. S.A. Kauffman. Metabolic stability and epigenesis in randomly constructed genetic nets. Journal of Theoretical Biology, 22(3):437 – 467, 1969. 4. Stuart A. Kauffman. The Origins of Order: Self-Organization and Selection in Evolution. Oxford: Oxford University Press, 1993. 86 Algorithm 2: AddRule(r, P ) Input: A rule r and a logic program P 1 if ∃r0 ∈ P s.t. r0 ≺ r then return; 2 ; 0 0 0 3 foreach r ∈ P do if r ≺ r then P ← P \ {r }; 4 ; 5 ; 6 P ← P ∪ {r}; 7 while r, r1 , . . . , rk ∈ P are combined resolvable do AddRule(cr(r, r1 , . . . , rk )); 8 ; 0 9 foreach r ∈ P do 10 if r is disjunctively ground resolvable w.r.t. r0 then 11 AddRule(gr(r, r0 ), P ); 12 else if r0 is disjunctively ground resolvable w.r.t. r then 13 AddRule(gr(r0 , r), P ); 14 end 15 end 16 return P ; 5. Katsumi Inoue. Logic programming for boolean networks. In Toby Walsh, editor, IJCAI 2011, Proceedings of the 22nd International Joint Conference on Artificial Intelligence, Barcelona, Catalonia, Spain, July 16-22, 2011, pages 924–930. IJCAI/AAAI, 2011. 6. Katsumi Inoue, Tony Ribeiro, and Chiaki Sakama. Learning from interpretation transition. Machine Learning, 94(1):51–79, 2014. 7. Maxime Folschette, Loı̈c Paulevé, Katsumi Inoue, Morgan Magnin, and Olivier H. Roux. Identification of biological regulatory networks from process hitting models. Theoretical Computer Science, 568:49–71, 2015. 8. David Martı́nez, Tony Ribeiro, Katsumi Inoue, Guillem Alenyà, and Carme Torras. Learning probabilistic action models from interpretation transitions. In Marina De Vos, Thomas Eiter, Yuliya Lierler, and Francesca Toni, editors, Proceedings of the Technical Communications of the 31st International Conference on Logic Programming (ICLP 2015), Cork, Ireland, August 31 - September 4, 2015., volume 1433 of CEUR Workshop Proceedings. CEUR- WS.org, 2015. 9. Michael Gelfond and Vladimir Lifschitz. Classical negation in logic programs and disjunc- tive databases. New Generation Computing, 9:365–385, 1991. 10. Jorge Lobo, Jack Minker, and Arcot Rajasekar. Foundations of disjunctive logic program- ming. Logic Programming. MIT Press, 1992. 11. J. W. Lloyd. Foundations of Logic Programming. Springer-Verlag, 2nd edition, 1987. 12. Maarten H. van Emden and Robert A. Kowalski. The semantics of predicate logic as a programming language. Journal of the ACM, 23(4):733–742, 1976. 13. Victor W. Marek and Miroslaw Truszczynski. Logic programs with abstract constraint atoms. In Proceedings of the Nineteenth National Conference on Artificial Intelligence, Sixteenth Conference on Innovative Applications of Artificial Intelligence (AAAI 2004), pages 86–91, San Jose, California, USA, 2004. AAAI Press. 14. Gordon D. Plotkin. A note on inductive generalisation. In B. Meltzer and D. Michie, editors, Machine Intelligence, volume 5, chapter 8, pages 153–163. Edingburgh Unviersity Press, 1970. 87