An Overview of Methods for Large-Theory Automated Theorem Proving (Invited Paper)

Josef Urban
Radboud University Nijmegen
josef.urban@gmail.com

Abstract

This is an attempt at a brief initial overview of the state of the art in the young field of first-order automated reasoning in large theories (ARLT). It is necessarily biased by the author's imperfect knowledge, and hopefully will serve as material provoking further corrections and completions.

1 Why Large Theories?

Why should we want to (automatically) reason in large theories and develop them, instead of small theories? Here are several answers:

• Mathematicians work in large theories. They know a lot of concepts, facts, examples and counter-examples, proofs, heuristics, and theory-development methods.

• Other scientists (and humans in general) work with large theories. Consider physics, chemistry, biology, law, politics, large software libraries, Wikipedia, etc. Our current knowledge about the world is large.

• In recent years, more and more knowledge has become formally available through all kinds of human efforts (interactive theorem proving, common-sense reasoning, knowledge bases for various sciences, the Semantic Web, etc.). This is an opportunity for automated reasoning to help with the sciences and tasks mentioned above.

• Existing resolution/superposition automated reasoning systems often derive large numbers of facts, even from a small initial number of premises. Managing such large numbers of derived facts can profit from specialized large-theory techniques.

Automated reasoning in large theories is today often about increasing the comfort of users of automated reasoning methods: it is typically possible to manually select the premises from which some conjecture should follow. Often this selection is even a significant part of one's formal reasoning wisdom. But ultimately, manual is the opposite of automated.

1.1 Large Formal Theories Are Not Our Enemy

However, automated selection of relevant facts is only the very first step that recently made existing ATP methods usable and useful in large theories. This premise-selection view treats large theories to a large extent only as an ATP person's enemy: we need to select the few right facts from the large pile of less relevant facts before we get down to the "real science" of "doing ATP". This is, in the author's opinion, a very limited view of the large-theory field. The bigger reason for making large complex theories and knowledge bases available to the automated reasoning world is that they can contain a large amount of domain-specific problem-solving knowledge, and likely (in less explicit form) also a large amount of general problem-solving knowledge that the automated reasoning field should reveal and integrate into its pool of methods.

For this, however, another limited view needs to be overcome: large theories (and theories in general) are not just random collections of usable facts. Mathematical theories in particular have been developed by smart people over centuries, and quite likely such theories are the best deeply computer-understandable corpus of abstract human thinking that we currently have. It seems negligent to ignore the internal theory structure and the problem-solving and theory-engineering knowledge developed by mathematicians so far, especially when we know that first-order ATP is an undecidable problem, and that the current ATP methods are on average far behind what trained mathematicians can do.
Thus, large complex formal theories and knowledge bases are not an enemy, but an opportunity. Not just an opportunity to reason with the knowledge of many already established facts, but also an opportunity to analyze and learn how smart people reason and prove difficult theorems, develop their conceptual space, and find surprising connections and solutions. In short, large formal theories are a great new playground for developing general AI. But because general AI (and theorem-proving oriented AI in particular) was labeled as unproductive in the second half of the 20th century, general AI research in this field should go hand in hand with practical applications and usability testing. So far, this has fortunately often been the case in this young field.

2 Corpora

Several large formal knowledge bases have recently become available for experiments with first-order automated reasoning tools. To name the major ones (in alphabetic order):

• The CYC (OpenCyc, ResearchCyc) common-sense knowledge base [16]

• The Isabelle/HOL mathematical library [10]

• The Mizar/MML mathematical library [25]

• The SUMO (and related ontologies) common-sense knowledge base [13]

It is likely that more will follow (or are already available). For example, the HOL Light/Flyspeck [4, 5] large mathematical library should benefit from first-order translation techniques similar to those used for the Isabelle/HOL library. More common-sense knowledge bases like YAGO [21] might be produced by semi-automated methods, and bridges to all kinds of specialized scientific databases are being built, spearheaded by systems like Biodeducta [19]. The LogAnswer project [3] has already started to reason over the first-order export of the full text of the German Wikipedia.

The corpora differ in their purpose/origin, size, complexity, consistency, completeness, and the extent to which they cover various large-theory aspects. The common-sense ontologies contain a lot of classification/hierarchical knowledge, resulting typically in simple Horn clauses, and also a lot of concept definitions with relatively few facts proved about them. Storing and maintaining proofs has so far been a secondary aspect: their primary emphasis has not (so far) been on building up libraries of more and more advanced proved theorems about the world, but rather on covering as many concepts as possible by suitable definitions.

On the other hand, the mathematical theories contain a much larger number of nontrivial mathematical theorems, and their formal content typically follows some established informal theory developments based on well-known and fixed mathematical foundations. There is more concept/fact re-use in mathematics, and nontrivial proofs of many facts exist and (at least in theory) can be made available in common formats and used by large-theory techniques based on inspection of previous proofs and theory developments.

3 Automated Methods for Reasoning in Large Theories

The existing large-theory reasoning methods can be divided into several groups, using various criteria. One criterion is the method used for knowledge selection. The methods developed so far include syntactic heuristics, heuristics using semantic information, methods that learn from previous solutions, and combinations thereof.
Systems and methods that make use mainly of syntactic criteria for premise selection include the following (a small sketch of the SInE selection loop is given after the list):

• The SInE (SUMO Inference Engine) algorithm by Kryštof Hoder [6], and its E implementation by Stephan Schulz (see http://www.mpi-inf.mpg.de/departments/rg1/conferences/deduction10/slides/stephan-schulz.pdf). The basic idea is to use global frequencies of symbols to define their global generality, and to build a relation linking each symbol S with all formulas F in which S has the lowest global generality among the symbols of F. In common-sense ontologies, such formulas typically define the symbols linked to them, which is the reason for calling this relation a D-relation. Premise selection for a conjecture is then done by recursively following the D-relation, starting with the conjecture's symbols. Various parameters can be used; e.g., limiting the recursion depth significantly helps on the Mizar library [26], and preliminary experiments suggest the same for the Isabelle/HOL library.

• The default premise selection heuristic used by the Isabelle/Sledgehammer export [11] seems to be quite similar to SInE; however, it works internally in Isabelle and uses additional mechanisms like blacklisting. The D-relation is not used there: formulas are linked to all symbols they contain.

• The Conjecture Symbol Weight clause selection heuristics in E prover [18] give lower weights to symbols contained in the conjecture, thus preferring during the inference steps those clauses that share symbols with the conjecture. This is remotely similar to general goal-oriented ATP techniques, such as the Set of Support (SoS) strategy in resolution/superposition provers (in particular, SPASS [30] has been used successfully on the Isabelle data). Note that the majority of tableau calculi are also goal-oriented in practice, and the leanCoP [12] prover in particular performs surprisingly well on the MPTP Challenge large-theory benchmark.
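To make the D-relation and the selection loop concrete, here is a minimal sketch of SInE-style premise selection. It assumes formulas are given simply as a map from names to symbol sets; the `tolerance` and `max_depth` parameters are simplified stand-ins for the actual options of SInE and its E implementation, not a description of either system's exact behavior.

```python
from collections import defaultdict

def sine_select(formulas, conjecture_symbols, tolerance=1.0, max_depth=None):
    """Minimal SInE-style premise selection sketch.

    formulas: dict mapping a formula name to the set of its symbols.
    A symbol's generality is the number of formulas it occurs in; a
    formula is triggered (D-related) by those of its symbols whose
    generality is within `tolerance` times its least general symbol.
    """
    # Global generality: in how many formulas each symbol occurs.
    generality = defaultdict(int)
    for symbols in formulas.values():
        for s in symbols:
            generality[s] += 1

    # D-relation: symbol -> formulas it triggers.
    triggers = defaultdict(list)
    for name, symbols in formulas.items():
        if not symbols:
            continue  # symbol-free formulas are ignored in this sketch
        least = min(generality[s] for s in symbols)
        for s in symbols:
            if generality[s] <= tolerance * least:
                triggers[s].append(name)

    # Recursively follow the D-relation from the conjecture's symbols.
    selected = set()
    seen = set(conjecture_symbols)
    frontier = set(conjecture_symbols)
    depth = 0
    while frontier and (max_depth is None or depth < max_depth):
        new_symbols = set()
        for s in frontier:
            for name in triggers.get(s, ()):
                if name not in selected:
                    selected.add(name)
                    new_symbols |= formulas[name]
        frontier = new_symbols - seen
        seen |= new_symbols
        depth += 1
    return selected
```

For example, with the (hypothetical) axioms {"subset_def": {"subset", "in"}, "union_def": {"union", "in"}} and a conjecture mentioning only subset, the loop selects subset_def but not union_def: the very general symbol "in" triggers neither definition, which is exactly the frequency-based pruning that makes SInE effective on large ontologies.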
A method which is purely signature-based, even though the word "semantics" appears in its name, is latent semantics. Latent semantic analysis is a machine learning method that has been successfully used, for example, in the Netflix Challenge and in web search. Its principle is to automatically derive "semantic" equivalence classes of words (like car, vehicle, automobile) from their co-occurrences in documents, and to use such equivalence classes (also called synsets in the WordNet ontology) instead of the original words for searching and related tasks. This technique has so far been used in:

• Paul Cairns' Alcor system [1] for searching and advice over the Mizar library.

• Yuri Puzis' initial relevance ordering of premises used in the SRASS ATP metasystem [22].

Semantics (in the original logical sense) has been used for a relatively long time in various ways for guiding ATP inference processes. An older system worth mentioning with respect to the current efforts is John Slaney's SCOTT system [20], which constrains Otter inferences by validity in models. A similar idea has recently been revived by Jiří Vyskočil at the Prague ATP seminar: his observation was that mathematicians have very fast conjecture-rejection methods based on a (relatively small) pool of (often imprecise) models in their heads, similar to some fast heuristic software-testing methods. This motivated Petr Pudlák's semantic axiom selection system for large theories [15], implemented later also by Geoff Sutcliffe in SRASS. The basic idea is to use finite model finders like MACE [9] and Paradox [2] to find counter-models of the conjecture, and to gradually select axioms that exclude such counter-models. The models can differentiate between a formula and its negation, which is typically beyond the reach of heuristic symbolic means.

This idea was also used later in the MaLARea system [27], however in the context of many problems solved simultaneously and many models kept in the pool, and using the models found also as classification features for machine learning. MaLARea is also an example of a system that uses learning from previous proofs to guide premise selection for new conjectures. The idea of this approach is to define suitable features characterizing conjectures (symbolic, semantic, structural, etc.), and to use machine learning methods on the available proofs to learn the function that associates the conjecture features with the relevant premises, as sketched below.
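As an illustration of this learning setup, here is a toy k-nearest-neighbor sketch of premise selection from previous proofs. The symbol-set features, the Jaccard similarity, and the additive scoring are simplified stand-ins chosen for brevity; they are not the actual features or learners used by MaLARea.

```python
def jaccard(a, b):
    """Similarity of two feature sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def knn_premises(new_features, solved, k=3, limit=10):
    """Rank premises for a new conjecture by k-NN over previous proofs.

    solved: list of (feature_set, premises_used_in_proof) pairs.
    Premises used in proofs of similar conjectures score higher.
    """
    neighbours = sorted(solved,
                        key=lambda fp: jaccard(new_features, fp[0]),
                        reverse=True)[:k]
    scores = {}
    for features, premises in neighbours:
        weight = jaccard(new_features, features)
        for p in premises:
            scores[p] = scores.get(p, 0.0) + weight
    return sorted(scores, key=scores.get, reverse=True)[:limit]

# Hypothetical training data: two solved problems, then a new conjecture.
solved = [({"subset", "in"}, ["subset_def", "in_transitivity"]),
          ({"union", "in"}, ["union_def"])]
print(knn_premises({"subset", "union"}, solved, k=2))
```

The same scheme extends directly to the richer features mentioned above: evaluations of the conjecture in the pool of models, for instance, can simply be added to the feature sets, which is one way the semantic and learning-based selection methods combine.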
A sophisticated learning approach was suggested and implemented in E prover by Stephan Schulz in his PhD work [17], which unfortunately preceded the appearance of large theories by several years. (The author and Stephan Schulz briefly tried to revive this old E code and test it on the MPTP Challenge benchmark in 2007, however without any significant results; so this advanced code is still waiting to be properly revived and tested.) In this approach, proofs are abstracted into proof traces consisting of clause patterns in which symbol names are abstracted into higher-order variables. Such proof traces from many proofs are collected into a common knowledge base, which is loaded when a new problem is being solved and used for guiding clause selection. This is probably quite similar to the hints technique in Prover9 [8], which however seems to be used more in a single-problem proof-shortening scenario.

Note that such techniques already move the large-theory methods towards smart general-purpose ATP techniques for proof guidance. A recent attempt in this direction is the MaLeCoP system [28]. There, clause relevance is learned from all closed tableau branches, and the tableau extension steps are guided by a trained machine learner that takes as input features a suitable encoding of the literals on the current tableau branch. In some sense, this tries to transfer the promising premise selection techniques deeper into the core of ATP systems. Unlike the above-mentioned technique used in E prover, the advising is however left to external systems, which communicate with the prover over a sufficiently fast link.

4 More Systems and Metasystems

Not all systems do premise selection; they may nevertheless be worth mentioning. One way to reason with full large theories is to significantly limit the reasoning power. At the extreme, such methods become the many search methods available for the corpora mentioned above. A somewhat more involved memorization/reasoning technique is subsumption, implemented in various ATP systems. A type-aware extension of subsumption is implemented for the Mizar library in the MoMM system [24]. Extending such limited systems further in a controlled and restricted way might be quite rewarding.

Another interesting family of large-theory techniques is lemmatization and concept creation. An example lemmatization system was implemented by Petr Pudlák in his PhD thesis [14]: the system uses lemmas found in successful proofs to enrich the whole theory, find new proofs, and shorten existing ones. Concept creation is a long-standing line of AI research, going back to Lenat's seminal work on AM [7]. Recently, concept creation has been used to shorten long, automatically produced proofs [29].

Refactoring of proofs into human-digestible form seems to be a very interesting task that we face more and more as the automated methods become more and more usable. As computers get better at solving hard and large problems, we should also make them better at explaining their solutions to us.
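As a toy illustration of the lemmatization idea, the following sketch ranks the intermediate facts occurring in a set of successful proofs by how many distinct proofs reuse them, promoting the most reused ones as lemma candidates. Real systems such as Pudlák's evaluate lemmas by their effect on proof search and proof length; this simplification ignores that and uses plain strings instead of clauses considered up to renaming.

```python
from collections import Counter

def candidate_lemmas(proofs, min_reuse=2, limit=5):
    """Pick intermediate facts reused across many proofs as lemma candidates.

    proofs: list of proofs, each a list of intermediate facts.
    A fact appearing in at least `min_reuse` distinct proofs is a
    candidate; candidates are ranked by the number of proofs using them.
    """
    reuse = Counter()
    for proof in proofs:
        for fact in set(proof):      # count each proof at most once
            reuse[fact] += 1
    ranked = [f for f, n in reuse.most_common() if n >= min_reuse]
    return ranked[:limit]

# Hypothetical proofs sharing two intermediate facts.
proofs = [["a<=b & b<=c -> a<=c", "x in A&B -> x in A"],
          ["x in A&B -> x in A", "A&B <= A"],
          ["A&B <= A"]]
print(candidate_lemmas(proofs))  # the two reused facts qualify
```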
References

[1] P. A. Cairns. Informalising formal mathematics: Searching the Mizar library with latent semantics. In A. Asperti, G. Bancerek, and A. Trybulec, editors, MKM, volume 3119 of Lecture Notes in Computer Science, pages 58–72. Springer, 2004.

[2] K. Claessen and N. Sörensson. New techniques that improve MACE-style finite model finding. In P. Baumgartner and C. Fermüller, editors, Proceedings of the CADE-19 Workshop: Model Computation – Principles, Algorithms, Applications, 2003.

[3] U. Furbach, I. Glöckner, and B. Pelzer. An application of automated reasoning in natural language question answering. AI Communications, 23(2-3):241–265, 2010.

[4] T. C. Hales. Introduction to the Flyspeck project. In T. Coquand, H. Lombardi, and M.-F. Roy, editors, Mathematics, Algorithms, Proofs, volume 05021 of Dagstuhl Seminar Proceedings. Internationales Begegnungs- und Forschungszentrum für Informatik (IBFI), Schloss Dagstuhl, Germany, 2005.

[5] T. C. Hales, J. Harrison, S. McLaughlin, T. Nipkow, S. Obua, and R. Zumkeller. A revision of the proof of the Kepler conjecture. Discrete & Computational Geometry, 44(1):1–34, 2010.

[6] K. Hoder and A. Voronkov. Sine qua non for large theory reasoning. In CADE, 2011. To appear.

[7] D. Lenat. An Artificial Intelligence Approach to Discovery in Mathematics. PhD thesis, Stanford University, Stanford, USA, 1976.

[8] W. W. McCune. Prover9. http://www.mcs.anl.gov/~mccune/prover9/.

[9] W. W. McCune. Mace4 Reference Manual and Guide. Technical Report ANL/MCS-TM-264, Argonne National Laboratory, Argonne, USA, 2003.

[10] J. Meng and L. C. Paulson. Translating higher-order clauses to first-order clauses. Journal of Automated Reasoning, 40(1):35–60, 2008.

[11] J. Meng and L. C. Paulson. Lightweight relevance filtering for machine-generated resolution problems. Journal of Applied Logic, 7(1):41–57, 2009.

[12] J. Otten and W. Bibel. leanCoP: Lean connection-based theorem proving. Journal of Symbolic Computation, 36(1-2):139–161, 2003.

[13] A. Pease and G. Sutcliffe. First order reasoning on a large ontology. In Sutcliffe et al. [23].

[14] P. Pudlák. Search for faster and shorter proofs using machine generated lemmas. In G. Sutcliffe, R. Schmidt, and S. Schulz, editors, Proceedings of the FLoC'06 Workshop on Empirically Successful Computerized Reasoning, 3rd International Joint Conference on Automated Reasoning, volume 192 of CEUR Workshop Proceedings, pages 34–52, 2006.

[15] P. Pudlák. Semantic selection of premisses for automated theorem proving. In Sutcliffe et al. [23].

[16] D. Ramachandran, P. Reagan, and K. Goolsbey. First-orderized ResearchCyc: Expressiveness and efficiency in a common sense knowledge base. In P. Shvaiko, editor, Proceedings of the Workshop on Contexts and Ontologies: Theory, Practice and Applications, 2005.

[17] S. Schulz. Learning Search Control Knowledge for Equational Deduction. PhD thesis, Technische Universität München, Munich, Germany, 2000.

[18] S. Schulz. E: A brainiac theorem prover. AI Communications, 15(2-3):111–126, 2002.

[19] J. Shrager, R. Waldinger, M. Stickel, and J. P. Massar. Deductive biocomputing. PLoS ONE, 2(4):e339, 2007.

[20] J. K. Slaney, E. Lusk, and W. W. McCune. SCOTT: Semantically Constrained Otter, system description. In A. Bundy, editor, Proceedings of the 12th International Conference on Automated Deduction, number 814 in Lecture Notes in Artificial Intelligence, pages 764–768. Springer, 1994.

[21] F. M. Suchanek, G. Kasneci, and G. Weikum. YAGO: A large ontology from Wikipedia and WordNet. Journal of Web Semantics, 6(3):203–217, 2008.

[22] G. Sutcliffe and Y. Puzis. SRASS – a semantic relevance axiom selection system. In F. Pfenning, editor, CADE, volume 4603 of Lecture Notes in Computer Science, pages 295–310. Springer, 2007.

[23] G. Sutcliffe, J. Urban, and S. Schulz, editors. Proceedings of the CADE-21 Workshop on Empirically Successful Automated Reasoning in Large Theories, volume 257 of CEUR Workshop Proceedings. CEUR-WS.org, 2007.

[24] J. Urban. MoMM – fast interreduction and retrieval in large libraries of formalized mathematics. International Journal on Artificial Intelligence Tools, 15(1):109–130, 2006.

[25] J. Urban. MPTP 0.2: Design, implementation, and initial experiments. Journal of Automated Reasoning, 37(1-2):21–43, 2006.

[26] J. Urban, K. Hoder, and A. Voronkov. Evaluation of automated theorem proving on the Mizar Mathematical Library. In K. Fukuda, J. van der Hoeven, M. Joswig, and N. Takayama, editors, ICMS, volume 6327 of Lecture Notes in Computer Science, pages 155–166. Springer, 2010.

[27] J. Urban, G. Sutcliffe, P. Pudlák, and J. Vyskočil. MaLARea SG1 – machine learner for automated reasoning with semantic guidance. In A. Armando, P. Baumgartner, and G. Dowek, editors, IJCAR, volume 5195 of Lecture Notes in Computer Science, pages 441–456. Springer, 2008.

[28] J. Urban, J. Vyskočil, and P. Štěpánek. MaLeCoP: Machine learning connection prover. In K. Brünnler and G. Metcalfe, editors, TABLEAUX, volume 6793 of Lecture Notes in Computer Science, pages 263–277. Springer, 2011.

[29] J. Vyskočil, D. Stanovský, and J. Urban. Automated proof compression by invention of new definitions. In E. M. Clarke and A. Voronkov, editors, LPAR (Dakar), volume 6355 of Lecture Notes in Computer Science, pages 447–462. Springer, 2010.

[30] C. Weidenbach, D. Dimova, A. Fietzke, R. Kumar, M. Suda, and P. Wischnewski. SPASS version 3.5. In R. A. Schmidt, editor, CADE, volume 5663 of Lecture Notes in Computer Science, pages 140–145. Springer, 2009.