On Datasets for Evaluating Architectures for Learning to Reason

Naveen Sundar Govindarajulu,¹ Jean-Claude Paquin, Shreya Banerjee, Paul Mayol, Selmer Bringsjord∗
Rensselaer Polytechnic Institute
110 8th Street, Troy, NY 12180, USA
¹ naveensundarg@gmail.com

∗ ONR. Copyright held by the author(s). In A. Martin, K. Hinkelmann, A. Gerber, D. Lenat, F. van Harmelen, P. Clark (Eds.), Proceedings of the AAAI 2019 Spring Symposium on Combining Machine Learning with Knowledge Engineering (AAAI-MAKE 2019). Stanford University, Palo Alto, California, USA, March 25-27, 2019.

Abstract

In our poster, we will introduce new datasets in propositional logic and first-order logic that can be used for learning to reason, and present some initial results on systems that use this data.

Introduction

There is growing research interest in incorporating learning in reasoning systems. Such efforts fall largely into two different areas that we term Area I and Area II. We give a quick overview of what a reasoning system does before describing these two areas.

In general, a reasoning system can be modeled as a search through some space. This search usually relies on a number of hand-written heuristics. Theorem provers make this quite explicit, as one can specify these heuristics as an end user. For instance, in first-order resolution theorem provers, the goal is to find a sequence of resolution operations, starting from an initial set of clauses C, that results in an empty clause. At any point in the search, the prover has to choose a set of clauses from the overall set of clauses it has derived. Theorem provers use heuristics such as the size of a clause, the complexity of a clause, the age of a clause, etc. to choose a clause.

Efforts in Area I, such as (Kaliszyk, Chollet, and Szegedy 2017), revolve around selecting or computing an appropriate set of heuristics using some form of learning, while not tampering with the rest of the search process. Efforts in Area II aim to learn, from scratch, a function that does the entire search. While there has been quite significant progress in Area I, there has been very little progress in Area II. We feel that one of the main reasons for this state of affairs is that there is no standard dataset that can be leveraged for Area II. Datasets for Area I are built with learning heuristics as a goal and are not ideal for Area II systems.

In our poster, we will discuss new datasets in propositional logic and first-order logic that can be used for learning to reason, and present some initial results based on this data.

Why New Datasets?

Existing datasets that can be used for learning to reason are either too complex or do not show much variation in the samples. For example, the Mizar repository (Naumowicz and Kornilowicz 2009) has more than 50,000 reasoning problems (theorems in a first-order logic) and has been used in Area I, but these problems are too complex to be useful in bootstrapping a learning system from scratch. On the other hand, simpler toy datasets such as the deduction task in bAbI (Weston et al. 2015) do not show that much variation. For example, Figures 1a and 1b show answers and full proofs for two different questions in the bAbI dataset, represented in the Slate proof assistant system (Bringsjord et al. 2008). Both proofs can be obtained by applying the resolution inference rule twice. Moreover, the proofs are structurally similar and can be generated by switching constant symbols in a first-order proof. In fact, the entire dataset in the bAbI deduction task follows this one single proof pattern. Ideally, we want a dataset that has problems of varying levels of complexity.

(a) bAbI 1: Emily is a cat. Cats are afraid of wolves. What is emily afraid of?
(b) bAbI 2: Winona is a sheep. Sheep are afraid of mice. What is winona afraid of?

Figure 1: bAbI problems

The Problem

We present an abstract version of the problem we seek to solve. Assume that we have a formal logic F ≡ ⟨L, I, ⊥⟩, where L is the language of the formal logic, I is the inference system, and ⊥ ∈ L. Any set of formulae Γ in the language L can be consistent, Γ ⊬_I ⊥, or inconsistent, Γ ⊢_I ⊥. Any reasoning problem Γ ⊢ φ can be posed as a consistency problem Γ + ¬φ ⊢_I ⊥ if certain conditions C1, C2 are satisfied by I (Boolos, Burgess, and Jeffrey 2003).

C1 The deduction theorem: Γ + φ ⊢ ψ ⇒ Γ ⊢ φ → ψ.
C2 The law of excluded middle: {} ⊢ φ ∨ ¬φ.

In fact, resolution-based theorem provers such as Vampire (Kovács and Voronkov 2013) function in this manner, searching for a proof of ⊥ from Γ + ¬φ in order to prove Γ ⊢ φ. Therefore, reasoning in logics satisfying C1 and C2 can be reduced to consistency checking.

We pose the learning problem as a standard classification task. Let con(Γ) ∈ {0, 1} denote whether Γ is consistent, con(Γ) = 1, or not, con(Γ) = 0. Given training data D_train of sentences and their consistency information, {⟨Γ_i, con(Γ_i)⟩ | 1 ≤ i ≤ n}, the goal is to learn a function that approximates con and is evaluated on a test set D_test.

Preview: Data and Data Generation

We look at two different formal logics: the propositional calculus and first-order logic. We randomly generate sentences and their consistency labels. The generation process is slightly different for the two logics.

Propositional Logic

For propositional logic, we generate formulae in conjunctive normal form. Each formula φ is a disjunction l_1 ∨ … ∨ l_n, where each literal l is an atom P or its negation ¬P. We generate a random set of formulae Γ by randomly generating u clauses c_1, c_2, …, c_u, where each clause has v random literals drawn from a set of w atomic propositions. For each such random set of sentences, we run a theorem prover to check whether the set is consistent or not. Using this method, we have three distinct datasets for propositional logic, with u, v, w = 3; u, v, w = 4; and u, v, w = 5.

First-order Logic

Due to the expressivity of first-order logic, naïve random generation of formulae can quickly lead to very difficult problems, or to degenerate problems that do not have real-world analogs but are also difficult to solve. For instance, biconditionals such as φ ↔ φ, where φ has two or more nested quantifiers, can cause some state-of-the-art theorem provers to run for an unbounded amount of time. To address this, we use a sorted first-order logic with a given set of relation, function, and constant symbols, along with certain complexity constraints. The sorts prevent us from generating nonsensical formulae. Given a sorted signature σ, we generate a certain number of unique formulae Γ and apply first-order natural deduction inference rules until we have a proof ρ of a certain complexity. One such problem and a corresponding proof in an imaginary mechanical domain is given below. See Figure 2.

Sample FOL Problem

1. If gear x is connected to gear y and gear y is connected to gear z, then gear x is connected to gear z.
2. If gear x and gear y are connected, and gear x is broken, gear y is broken too.
3. Gear 1 is connected to gear 2.
4. Gear 2 is connected to gear 3.
5. Gear 3 is enclosed by box 1.
6. All boxes are connected.
7. Lever 1 either fixes or breaks all gears.
8. Lever 1 breaks gear 1.
9. If x encloses y and y is broken, x is broken.

Question: Is box 20 broken? Answer: Yes

Figure 2: Proof for the Sample FOL Problem. A resolution proof for the sample FOL problem given above.
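The reduction in The Problem section, from an entailment question Γ ⊢ φ to the consistency question for Γ + ¬φ, can be illustrated for the propositional case. The following is a minimal sketch only: the function names are illustrative, formulae are modeled as Python predicates over truth assignments, and consistency is decided by brute-force enumeration rather than by a theorem prover.

```python
from itertools import product

def inconsistent(formulas, atoms):
    """Gamma |- bottom iff no truth assignment over `atoms`
    satisfies every formula in `formulas`."""
    return not any(
        all(f(dict(zip(atoms, vals))) for f in formulas)
        for vals in product([True, False], repeat=len(atoms))
    )

def entails(gamma, phi, atoms):
    """Gamma |- phi, reduced to: Gamma + (not phi) is inconsistent."""
    return inconsistent(gamma + [lambda a: not phi(a)], atoms)

# Example: {P -> Q, P} |- Q holds; {} |- P does not.
modus_ponens = entails(
    [lambda a: (not a["P"]) or a["Q"], lambda a: a["P"]],
    lambda a: a["Q"],
    ["P", "Q"],
)
```

For small atom sets this enumeration suffices; the paper's framing applies the same reduction with a real prover standing in for the consistency check.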
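The propositional generation procedure, sampling u clauses of v random literals over w atoms and labeling each set with con(Γ), can be sketched as below. This is not the generator used for the datasets: the function names are assumptions, and for the small w used here a brute-force check over all 2^w assignments stands in for the theorem prover that labels each sample.

```python
import itertools
import random

def random_clause(v, w, rng):
    """One clause: v random literals over atoms P_0..P_{w-1}.
    A literal is a pair (atom_index, polarity)."""
    return [(rng.randrange(w), rng.choice([True, False])) for _ in range(v)]

def consistent(clauses, w):
    """con(Gamma): 1 if some truth assignment satisfies every clause."""
    for assignment in itertools.product([True, False], repeat=w):
        if all(any(assignment[a] == pol for a, pol in clause)
               for clause in clauses):
            return 1
    return 0

def generate_dataset(n, u, v, w, seed=0):
    """n labeled samples: pairs (Gamma as u clauses, con(Gamma))."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        gamma = [random_clause(v, w, rng) for _ in range(u)]
        data.append((gamma, consistent(gamma, w)))
    return data

# e.g. the u, v, w = 3 setting described above:
samples = generate_dataset(100, 3, 3, 3, seed=0)
```

With u, v, w this small, both consistent and inconsistent sets occur with non-trivial frequency, which is what makes the labels usable for the classification task posed earlier.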
References

Boolos, G. S.; Burgess, J. P.; and Jeffrey, R. C. 2003. Computability and Logic (Fifth Edition). Cambridge, UK: Cambridge University Press.

Bringsjord, S.; Taylor, J.; Shilliday, A.; Clark, M.; and Arkoudas, K. 2008. Slate: An Argument-Centered Intelligent Assistant to Human Reasoners. In Grasso, F.; Green, N.; Kibble, R.; and Reed, C., eds., Proceedings of the 8th International Workshop on Computational Models of Natural Argument (CMNA 8), 1–10. Patras, Greece: University of Patras.

Kaliszyk, C.; Chollet, F.; and Szegedy, C. 2017. HolStep: A Machine Learning Dataset for Higher-Order Logic Theorem Proving. arXiv preprint arXiv:1703.00426.

Kovács, L., and Voronkov, A. 2013. First-Order Theorem Proving and Vampire. In International Conference on Computer Aided Verification, 1–35. Springer.

Naumowicz, A., and Kornilowicz, A. 2009. A Brief Overview of Mizar. In Berghofer, S.; Nipkow, T.; Urban, C.; and Wenzel, M., eds., Theorem Proving in Higher Order Logics, volume 5674 of Lecture Notes in Computer Science (LNCS), 67–72. Berlin: Springer.

Weston, J.; Bordes, A.; Chopra, S.; Rush, A. M.; van Merriënboer, B.; Joulin, A.; and Mikolov, T. 2015. Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks. arXiv preprint arXiv:1502.05698.