Automating “human-like” example-use in mathematics Alison Pease1 , Ursula Martin2 1 University of Dundee 2 University of Oxford A.Pease@dundee.ac.uk, Ursula.Martin@cs.ox.ac.uk Abstract 3 being supporting examples of the conjecture that all integers have an even number of divisors, and 4 being We describe two studies into ways in which a counterexample. The study found that examples are human mathematicians use mathematical ex- used for different reasons at different points in a con- amples in their research. In the first study versation, for instance to understand a conjecture, to we bring together theoretical and empirical ap- test it, or extend it. For instance, consider the following proaches to studying ways in which examples conversation, taken from an online forum for solving a are used in mathematical research, concluding conjecture [3]: that examples are used for conjecture inven- tion, understanding, plausibility-testing, dis- “If the points form a convex polygon, it is easy.” proof and modification. Where possible we [Anonymous July 19, 2011 @ 8:08 pm] describe corresponding efforts in automating these aspects of reasoning. In our second study “Yes. Can we do it if there is a single point not we present an investigation based on grounded on the convex hull of the points?” [Thomas H theory into example-use during an online math- July 19, 2011 @ 8:09 pm] ematical conversation. These studies suggest “Say there are four points: an equilateral tri- ways in which “human-like” example-use in angle, and then one point in the center of the mathematics could be further automated. triangle. No three points are collinear. It seems to me that the windmill can not use the cen- ter point more than once! As soon as it hits 1 Introduction one of the corner points, it will cycle indefi- nitely through the corners and never return to At an event that we organised a few years ago [1], lead- the center point. I must be missing something ing mathematicians flagged the importance of collabo- here...” [Jerzy July 19, 2011 @ 8:17 pm] rative systems that “think like a mathematician”, han- dle unstructured approaches such as the use of “sloppy” “This isn’t true it will alternate between the natural language and the exchange of informal knowl- centre and each vertex of the triangle.” [Joe edge and intuition not recorded in papers, and engage July 19, 2011 @ 8:21 pm] diverse researchers in creative problem solving. This accords with work of cognitive scientists, sociologists, Here we see people raising simple examples to under- philosophers and the narrative accounts of mathemati- stand a conjecture, and proposing and discussing other cians themselves, which highlight the paradoxical nature examples in order to explore and test the conjecture. of mathematical practice — while the goal of mathemat- ics is to discover mathematical truths justified by rig- In this paper we bring together work on example-use in orous argument, mathematical discovery involves “soft” mathematics research and relate it to automated rea- aspects such as creativity, informal argument, error and soning. In the first study we consider a course-grained analogy. empirical study (§2.1) and review theoretical ideas on example-use (§2.2), describing resulting roles for exam- We followed the above event with an empirical study of ples and corresponding automated systems (§2.3). In the what mathematicians talk about [22], and found that ex- second we conduct a fine-grained analysis of a mathemat- amples form the biggest single category. These may be ical research conversation (§3) and show the importance examples of a concept, such as the set of natural num- of context, language and social pleasantries for talking bers being an example of a group, and the numbers 3, 4, about different kinds of example. We conclude by con- and 5 an example of a Pythagorean triple, or support- sidering next steps in building collaborative systems for ing or counterexamples to a conjecture, such as 2 and contributing examples in mathematical research (§4). 1.1 Motivation concepts or conjectures: mathematical objects, such as specific numbers, graphs or groups, which have particu- The simulation of mathematical reasoning has been a lar properties. For instance, the number 2 is an example driving force throughout the history of Artificial In- of the concept prime number, a supporting example of telligence research, yet despite significant successes (eg the fundamental theorem of arithmetic, and a counterex- [10; 9; 13]) it has not achieved widespread adoption by ample to the conjecture that all primes are odd. Exam- mathematicians. An oft-cited reason for this is that cur- ples were used for different purposes at different stages of rent systems cannot do mathematics in the way that the discussion. One of the first comments was a simple humans do: machine proofs are thought to be unclear, supporting example of the conjecture – the only example uninspiring and untrustworthy, as opposed to human explicitly raised in this context. Other supporting ex- proofs which can be deep, elegant and explanatory. Tra- amples were raised as elaboration or as highlighting the ditionally there have been two barriers to developing sys- necessity of a condition in order to explore the condition. tems which produce “human-like” mathematics: firstly, One comment contained an argument as to why a par- it is difficult to know what this is; and secondly, it is ticular example could not exist. Some examples – both difficult to automate [6; 12]. Recent developments in support and counterexamples – were raised as clarifica- online collaborative mathematics, such as the Polymath tion. 83% of the comments we categorised as concerning and MathOverflow projects [4; 5] provide a remarkable examples were about counterexamples (or examples of opportunity for overcoming the first barrier: by provid- undetermined status). ing a working record of the backstage of mathematics, complete with mistakes, deadends and unfinished work, In a study of a sample mathoverflow conversations we these constitute a rich evidence base for understanding found that in a third of the responses explicit exam- live mathematical practice. At the same time, we be- ples were given, as evidence for, or counterexamples to, lieve that we can start to overcome the second barrier conjectures. Requests for examples of a phenomenon by focusing in on a specific and particularly prominent or an object with particular properties also featured in aspect of mathematical practice – example-use in math- our breakdown of questions as one of three predominant ematics. This will allow us to (a) build on the increas- kinds [16]. These analyses suggest that examples play a ing sophistication of model generators in automated rea- fundamental role in mathematical collaboration – a con- soning, and (b) formulate constrained, measurable and servative estimate puts it at a third of all mathematical achievable goals for automated “human-like” example- conversations that we analysed. use in mathematics. 2.2 Theoretical work 2 Study 1: Theories of example-use in The small amount of theoretical work on the use of ex- mathematical research amples in mathematical research – itself based on real- world case studies – supports our findings. Polya wrote 2.1 An empirical study extensively about the value of examples in conjecture generation and testing [24], while Lakatos followed up The mathematical community have developed systems Polya’s ideas with a demonstration of the role that coun- for “crowdsourcing” (albeit among a highly specialised terexamples play, driving negotiation and development crowd) the production of mathematics through collab- of concepts, conjectures and proofs [15]. oration and sharing [20]. The Polymath and MiniPoly- math collaborative proofs, a new idea led by Gowers and In his “Induction in solid geometry” [25, Part III, pp Tao, are of particular interest to us: these use a blog 35- ] Polya details how examples are invoked at differ- and wiki for collaboration among mathematicians from ent points to suggest, evaluate, develop and prove the different backgrounds and have led to major advances Descartes-Euler conjecture that for any polyhedra, the [11]. Also of interest are discussion fora which allow number of vertices (V) minus the number of edges (E) rapid informal interaction and problem solving; in seven plus the number of faces (F) is equal to 2. He starts years the community question answering system for re- with simple examples of polyhedra to find regularities search mathematicians MathOverflow has around 70,000 and formulating initial conjectures. Once a conjecture users and has hosted over 90,000 conversations. These has been found, plausibility testing is performed with systems provide substantial and unprecedented evidence examples getting gradually more complex, looking for for studying mathematical practice, allowing the aug- examples which support rather than refute the conjec- mentation of traditional ethnography with a variety of ture. If it stands up to these then more severe exam- empirical techniques for analysing the texts and network ples are looked for - with the focus going from find- structures of the interactions. ing supporting examples to looking for counterexamples, leading to “very difficult” cases. If a conjecture sur- In an ethnographic analysis of such a record [23], we clas- vives this test, then a proof is sought. Lakatos takes sified each conversation turn as relating to different as- over at the this point, describing a rational reconstruc- pects of mathematical activity and found that the largest tion in which counterexamples drive conjecture and con- single category was examples. Here we mean examples of cept change and are used to strengthen a proof via var- ious responses. These include three main methods of keeping with the dialectical aspect described by Lakatos, theorem formation – all triggered by counterexamples: HRL is implemented in an agent architecture. Each monster-barring (concerned with concept development); agent has a copy of the HR system, and starts with a dif- exception-barring (concerned with conjecture develop- ferent database of examples to work with, and different ment), and the method of proofs and refutations (con- interestingness measures. Agents send conjectures, con- cerned with proof development). cepts, counterexamples, or requests to a central agent, who choreographs a discussion, using the example-based 2.3 Examples in Automated Reasoning methods prescribed by Lakatos to modify faulty conjec- tures. Example, or model, generation is one of the successes of formal verification, interactive theorem proving, and automated reasoning, with substantial research under- 3 Study 2: A fine-grained study of lying these achievements. Techniques underlying such example-use in mathematics research systems include methods based on first order reasoning, constraint satisfaction, and propositional logic: see [27] 3.1 Source material for an overview. To apply empirical methods to the study of mathe- We summarise our empirical studies in §2.1 and theo- matical explanation we looked for a suitable source of retical work presented in §2.2 as using examples for the data which, ideally, would capture the live production of following purposes: (i) conjecture invention; (ii) under- mathematics rather than the finished outcome in text- standing a conjecture; (iii) plausibility-testing; (iv) dis- book or journal paper; would exhibit examples in prac- proving or modifying a conjecture. Simulations of (i) tice through capturing mathematical collaboration; and and (iv) can found in automated reasoning: to the best could be argued to represent the activities of a reasonable of our knowledge there is little or no automated work subset of the mathematical community. The dataset we on (ii) or (iii). The first purpose is scientific induc- chose was the first Mini-Polymath project, in 2009, an tion, which underlies the machine learning paradigm. online collaboration on a blog to solve problems drawn Colton’s theory formation system HR [7] also uses ex- from International Mathematical Olympiads. amples, or objects of interest (such as specific groups or numbers), to drive theory development. The sys- The first Mini-Polymath project started on July 20th, tem uses production rules to form new concepts from 2009 @ 6:02 am and ended August 15th, 2010 @ 3:30 old ones; measures of interestingness to drive a heuristic pm. The problem statement came from Problem 6 of search; empirical pattern-based conjecture making tech- the Math Olympiad that year: niques to find relationships between concepts, and third Let a1 , a2 , . . . , an be distinct positive integers party logic systems such as the Otter theorem prover[19], and let M be a set of n − 1 positive integers not the Mace model generator[17]. Otter is a first order res- containing s = a1 +a2 +. . .+an . A grasshopper olution theorem prover which has been used for many is to jump along the real axis, starting at the discovery tasks in algebraic domains, e.g., [18]. Mace point 0 and making n jumps to the right with is a model generator which employs the Davis-Putnam lengths a1 , a2 , . . . , an in some order. Prove that method for generating models to first order sentences. the order can be chosen in such a way that the The IsaCosy system by Johansson et al. is another in- grasshopper never lands on any point in M . stances of example-driven theory formation software [14] which performs inductive theory formation by synthe- The solution was found on 21st July, 2009@ 11:16 am af- sising conjectures from the available constants and free ter 201 comments. The total conversation contained 356 variables. comments and 32,430 words, and there were between 81 and 100 participants (some participants were anony- Along with colleagues, Pease has developed two soft- mous). We analysed the first 160 comments, which is ware systems based on the fourth purpose above. 80% of the conversation leading up to the solution.1 The Theorem Modifier system (TM) [8] uses Lakatos’s exception-barring methods to provide a flexible auto- mated theorem-proving system. This is able to take in 3.2 Method a conjecture, try to prove it and if unsuccessful (either because the conjecture is too hard to prove or because it We used an approach based on grounded theory [26] is false), use supporting or counter-examples to produce to conduct a fine-grained study. This is a data-driven modified versions of the conjecture which it can prove. method to systematically build integrated sets of con- For instance, given the non-theorem that all groups are cepts in a topic where little is known. Researchers keep Abelian, TM states that it cannot prove the original re- an open mind in order to build a theory which is purely sult, but it has discovered that all self-inverse groups are 1 The reason for 80% rather than the full 100% was purely Abelian. The HRL system [21] is a computational rep- pragmatic: such close analysis is extremely time-consuming resentation of Lakatos’s theory, in which all of his main to perform and it was felt that results were sufficiently robust methods for theory development are implemented. In after 80%. grounded in data rather than influenced by prior work. The open coding process created a total of 98 loosely As is the standard in grounded theory, we followed four connected codes and descriptions. Since we coded the stages: conversation into any appropriate size chunk of data and allowed multiple codings, we could get more code appli- 1. Open coding: Use the raw data to suggest code def- cations than number of comments: in fact we got signif- initions (anchors that help to identify key points in icantly more, with 646 applications of the codes across the data). our 160 comments. 2. Axial coding: Development of concepts by combin- ing codes into collections of similar content. Axial coding 3. Selective coding: Grouping the concepts into cate- gories - put the data back together by making con- In the second stage, we identified interrelationships be- nections across codes, categories, and concepts. tween our codes and formed concepts by combining codes, to describe repeated patterns of interactions and 4. Theory building: Compare the central phenomenon problem solving strategies in the conversation. We found across several dimensions, and formulate the major 23 concepts, including examples, and also error, expla- themes which have emerged. nation, question, re-representation, references to other Here codes, concepts and categories are different levels mathematics, metaphors and requests for help (we show of abstraction and are the building blocks for a grounded interrelationships for our examples category in Figure 2). theory. We used a software tool for grounded theory and mixed Selective coding/Theory building methods research, dedoose [2], shown in Figure 1. Here we see the codification of a comment. It can be seen In the third and fourth stages, we grouped concepts into that we sometimes applied multiple codes within a single categories, making connections across codes, categories, comment, or sentence; even applying overlapping codes and concepts. The following main categories emerged: if appropriate. In this example for instance, the words context (often a mathematical object such as a conjec- “ugle”, “elusive” and “hope” (which occurs twice) are ture or a concept), the language used (for instance, emo- coded as emotion or value words, which has been cate- tive or values language, or metaphors), and the social gorised under language. pleasantries needed to keep the conversation flowing (for instance, thanking a participant for their contribution or The coding was mainly done by the first author, who has introducing a new comment in a very humble way). a first degree in Mathematics, a PhD in a related disci- pline and more than 10 years experience studying math- We show a visual representation of how these categories ematical reasoning. Her analysis was discussed at length relate to each other and to different kinds of examples and during different stages of analysis with the second in Figure 3. author, who has a PhD in Mathematics and over 10 years experience as a professional research mathemati- Context: cian. Any differences of opinion were resolved, allowing Hypothesized cases to explore a conjecture were very us to align our ideas. common, and these arose in the context of introduc- ing notation, re-representation, meta-comments about 3.3 Findings proof, explanation, justification, induction and using them to asking questions. Simple supporting examples In order to build a substantive theory of example-use were used to explore a conjecture. in mathematical research, we followed the stages of grounded theory as described below. Counterexamples were largely used in the context of explanation, asking questions and proof development, Open coding including induction, case splits and constructive argu- ments. They were also used in the context of errors In the first stage, we identified and coded code definitions being pointed out. of interest via open coding. In the example below, a participant gives a local conjecture about the shortest Examples were also suggested as useful cases, rather route and points out a problem example straightaway. than directly supporting or refuting a given conjecture. This comment excerpt is coded as “Counterexample to For instance, worst case scenarios were brought up in the a local conjecture” context of proof strategies. I had hoped that the shortest would always There were no examples found being used in the context be (mk , mk+1 ), but this doesn’t seem to be of rhetorical questions or requests for help, and very few true: consider A = (9, 10, 11, 30) and M = in the context of directly considering the plausibility of (11, 30, 49). a conjecture. Figure 1: Example showing the codification of a comment using dedoose [2] Figure 2: Interrelationships identified in Stages 2 and 3 between codes relevant to our examples category, where arcs represent subset relationships. value words reference to a hypothesised cases previous comment hypothesised cases social hypothesised cases language pleasantries simple supporting case metaphors humble language Profile of example worst-case scenarios simple context supporting examples proof strategies conjecture exploration hypothesised cases counterexamples explanation, notation, re-representation, asking questions, meta, explanation, justification, proof development, errors induction, questions Figure 3: A diagrammatic model of our findings, showing the three central themes that emerged and how they were exhibited. Specific types of example in each case are shown in italics. Language: either by number, by the content, or using the name of the participant who had said it. Value words were used describing some examples, for instance, examples were described as degenerate or as Humble language was used occasionally when exploring particularly useful (value words) in a given context. The a simple case which supports a sub-conjecture: “(I may quote below discusses an example, breaking it down into make mistakes here.)” various scenarios and using them to discuss proof strat- egy and in particular when induction was “a bit of a sledgehammer” (metaphor), and when it was “more ap- 4 Further work and conclusions propriate” (each time showing why). Emotive words such as “cheating” are used in a lighthearted way: The new records of mathematical reasoning, our ethno- graphic studies highlighting the importance of the ex- Let’s take ai = i for i = 1, 2, 3, 4. So we’re ample in such reasoning, alongside the development of trying to get to 10 in steps of 1,2,3,4 and there sophisticated model generation systems, mean that we are three landmines. are now in a position to build on our insight into the use of examples in mathematics and extend it in a compu- If there’s a landmine on any of 1,2,3,4, then by tational way. We plan to automate those roles (ii) and 47 (@liuxiaochuan) they must be on 4, or 4 and (iii) for which we found no corresponding system in §2, 3, o r 4 and 3 and 2. In the third case we can and to build on results from §3 to guide us in construct- go to 1 and then to 5, and then were done by ing a system which can useful contribute examples to a induction (two steps and zero obstacles, so per- mathematics research conversation. haps induction was a bit of a sledgehammer). If there are obstacles on 4 and 3, then induction Building a system which can do even a limited aspect of is more appropriate – we can either get to 5 in “human-like” mathematics will open the way for a new two steps and are then done, or theres an ob- form of mixed-initiative, collaborative reasoning between stacle at 5, in which case we can go 2,6,7,10. If human and software participants in interactions which theres just an obstacle at 4, things get harder, are both naturalistic but formally constrained and well- since then we need to know what goes on after defined. This has the potential to impact both the pro- 4. But then we can cheat and say that at least fessional practice and pedagogy of mathematics. one number between 6 and 9 is an obstacle so we can run things in reverse. The only case not covered is then when the obstacles are at 4,5,6. Acknowledgments Social pleasantries: This work was supported by EPSRC grants EP/P017320/1 and EP/K040251/2. We are grate- When people were discussing a hypothesized case to ex- ful to our anonymous reviewers for their thoughtful plore, they frequently referred to a previous comment – comments. References [20] M. Nielsen. Reinventing Discovery: The New Era of Networked Science. Princeton University Press, [1] http://events.inf.ed.ac.uk/sicsa-mcp/. USA, 2011. [2] Dedoose. www.dedoose.com. [21] A. Pease. A computational model of lakatos-style [3] Minipolymath3 project. reasoning. Philosophy of Mathematics Education http://polymathprojects.org/2011/07/19/minipolymath3- Journal, ISSN 1465-2978 (Online)(27), April 2013. project-2011-imo/. [22] A. Pease and U. Martin. Seventy four minutes [4] The polymath blog. http://polymathprojects.org/. of mathematics: An analysis of the third mini- polymath project. In Proceedings of the AISB Sym- [5] Mathoverflow. http://mathoverflow.net, September posium on Mathematical Practice and Cognition II, 2009. pages 19–29, 2012. [6] A Bundy. Automated theorem provers: a practical [23] A. Pease and U. Martin. Summary of an ethno- tool? Ann Math Artif Intell, 61:3–14, 2011. graphic study of the third mini-polymath project. [7] S. Colton. Automated Theory Formation in Pure In Computability in Europe. Informal presentation, Mathematics. Springer-Verlag, 2002. 2012. [8] S. Colton and A. Pease. The TM system for re- [ ] 24 G. Pólya. Mathematics and plausible reasoning: pairing non-theorems. In Selected papers from the Induction and analogy in mathematics, volume I. IJCAR’04 disproving workshop, Electronic Notes in Princeton University Press, 1954. Theoretical Computer Science, volume 125(3). Else- [25] G. Pólya. Mathematical Discovery. John Wiley and vier, 2005. Sons, New York, 1962. [9] T Hales et al. A revision of the proof of the kepler [26] A. Strauss and C. Juliet. Grounded theory method- conjecture. Discrete & Comp Geom, 44, 2010. ology: An overview. In N. Denzin and Y. Lincoln, [10] G Gonthier. Advances in the formalization of the editors, Handbook of Qualitative Research (1st ed), odd order theorem. Proc. of Interactive Theorem pages 273–284. Sage Publications, Thousand Oaks, Proving, 6898, 2011. CA., 1994. [11] T. Gowers and M. Nielsen. Massively collaborative [27] H. Zhang and J. Zhang. Mace4 and sem: A compar- mathematics. Nature, 461(7266):879–881, 2009. ison of finite model generators. In M. P. Bonacina and M. E. Stickel, editors, Automated Reasoning [12] W. T. Gowers. Rough structure and classification. and Mathematics: Essays in Memory of William GAFA (Geometric And Functional Analysis), Spe- W. McCune, pages 101–130. cial volume – GAFA2000(1–0), 2000. [13] J Harrison. andbook of practical logic and auto- mated reasoning. 2009. [14] M. Johansson, L. Dixon, and A. Bundy. Conjecture synthesis for inductive theories. Journal of Auto- mated Reasoning, 47(3):251–289, 2011. [15] I. Lakatos. Proofs and Refutations. Cambridge Uni- versity Press, Cambridge, 1976. [16] Ursula Martin and Alison Pease. Mathematical practice, crowdsourcing, and social machines. In Jacques Carette, David Aspinall, Christoph Lange, Petr Sojka, and Wolfgang Windsteiger, editors, In- telligent Computer Mathematics, volume 7961 of Lecture Notes in Computer Science, pages 98–119. Springer Berlin Heidelberg, 2013. [17] W. McCune. MACE 2 reference manual. Techni- cal Report ANL/MCS-TM-249,, Argonne National Laboratories, 2001. [18] W. McCune and R. Padmanabhan. Automated Deduction in Equational Logic and Cubic Curves, LNAI, 1095. Springer-Verlag, 1996. [19] W.W. McCune. Otter 3.0 Reference Manual and Guide. Technical Report ANL-94/6, Argonne Na- tional Laboratory, Argonne, USA, 1994.