Automating “human-like” example-use in mathematics

                                   Alison Pease1 , Ursula Martin2
                                        1
                                          University of Dundee
                                        2
                                          University of Oxford
                            A.Pease@dundee.ac.uk, Ursula.Martin@cs.ox.ac.uk


                      Abstract                             3 being supporting examples of the conjecture that all
                                                           integers have an even number of divisors, and 4 being
    We describe two studies into ways in which             a counterexample. The study found that examples are
    human mathematicians use mathematical ex-              used for different reasons at different points in a con-
    amples in their research. In the first study           versation, for instance to understand a conjecture, to
    we bring together theoretical and empirical ap-        test it, or extend it. For instance, consider the following
    proaches to studying ways in which examples            conversation, taken from an online forum for solving a
    are used in mathematical research, concluding          conjecture [3]:
    that examples are used for conjecture inven-
    tion, understanding, plausibility-testing, dis-            “If the points form a convex polygon, it is easy.”
    proof and modification. Where possible we                  [Anonymous July 19, 2011 @ 8:08 pm]
    describe corresponding efforts in automating
    these aspects of reasoning. In our second study            “Yes. Can we do it if there is a single point not
    we present an investigation based on grounded              on the convex hull of the points?” [Thomas H
    theory into example-use during an online math-             July 19, 2011 @ 8:09 pm]
    ematical conversation. These studies suggest               “Say there are four points: an equilateral tri-
    ways in which “human-like” example-use in                  angle, and then one point in the center of the
    mathematics could be further automated.                    triangle. No three points are collinear. It seems
                                                               to me that the windmill can not use the cen-
                                                               ter point more than once! As soon as it hits
1   Introduction                                               one of the corner points, it will cycle indefi-
                                                               nitely through the corners and never return to
At an event that we organised a few years ago [1], lead-       the center point. I must be missing something
ing mathematicians flagged the importance of collabo-          here...” [Jerzy July 19, 2011 @ 8:17 pm]
rative systems that “think like a mathematician”, han-
dle unstructured approaches such as the use of “sloppy”        “This isn’t true it will alternate between the
natural language and the exchange of informal knowl-           centre and each vertex of the triangle.” [Joe
edge and intuition not recorded in papers, and engage          July 19, 2011 @ 8:21 pm]
diverse researchers in creative problem solving. This
accords with work of cognitive scientists, sociologists,   Here we see people raising simple examples to under-
philosophers and the narrative accounts of mathemati-      stand a conjecture, and proposing and discussing other
cians themselves, which highlight the paradoxical nature   examples in order to explore and test the conjecture.
of mathematical practice — while the goal of mathemat-
ics is to discover mathematical truths justified by rig-   In this paper we bring together work on example-use in
orous argument, mathematical discovery involves “soft”     mathematics research and relate it to automated rea-
aspects such as creativity, informal argument, error and   soning. In the first study we consider a course-grained
analogy.                                                   empirical study (§2.1) and review theoretical ideas on
                                                           example-use (§2.2), describing resulting roles for exam-
We followed the above event with an empirical study of     ples and corresponding automated systems (§2.3). In the
what mathematicians talk about [22], and found that ex-    second we conduct a fine-grained analysis of a mathemat-
amples form the biggest single category. These may be      ical research conversation (§3) and show the importance
examples of a concept, such as the set of natural num-     of context, language and social pleasantries for talking
bers being an example of a group, and the numbers 3, 4,    about different kinds of example. We conclude by con-
and 5 an example of a Pythagorean triple, or support-      sidering next steps in building collaborative systems for
ing or counterexamples to a conjecture, such as 2 and      contributing examples in mathematical research (§4).
1.1    Motivation                                             concepts or conjectures: mathematical objects, such as
                                                              specific numbers, graphs or groups, which have particu-
The simulation of mathematical reasoning has been a           lar properties. For instance, the number 2 is an example
driving force throughout the history of Artificial In-        of the concept prime number, a supporting example of
telligence research, yet despite significant successes (eg    the fundamental theorem of arithmetic, and a counterex-
[10; 9; 13]) it has not achieved widespread adoption by       ample to the conjecture that all primes are odd. Exam-
mathematicians. An oft-cited reason for this is that cur-     ples were used for different purposes at different stages of
rent systems cannot do mathematics in the way that            the discussion. One of the first comments was a simple
humans do: machine proofs are thought to be unclear,          supporting example of the conjecture – the only example
uninspiring and untrustworthy, as opposed to human            explicitly raised in this context. Other supporting ex-
proofs which can be deep, elegant and explanatory. Tra-       amples were raised as elaboration or as highlighting the
ditionally there have been two barriers to developing sys-    necessity of a condition in order to explore the condition.
tems which produce “human-like” mathematics: firstly,         One comment contained an argument as to why a par-
it is difficult to know what this is; and secondly, it is     ticular example could not exist. Some examples – both
difficult to automate [6; 12]. Recent developments in         support and counterexamples – were raised as clarifica-
online collaborative mathematics, such as the Polymath        tion. 83% of the comments we categorised as concerning
and MathOverflow projects [4; 5] provide a remarkable         examples were about counterexamples (or examples of
opportunity for overcoming the first barrier: by provid-      undetermined status).
ing a working record of the backstage of mathematics,
complete with mistakes, deadends and unfinished work,         In a study of a sample mathoverflow conversations we
these constitute a rich evidence base for understanding       found that in a third of the responses explicit exam-
live mathematical practice. At the same time, we be-          ples were given, as evidence for, or counterexamples to,
lieve that we can start to overcome the second barrier        conjectures. Requests for examples of a phenomenon
by focusing in on a specific and particularly prominent       or an object with particular properties also featured in
aspect of mathematical practice – example-use in math-        our breakdown of questions as one of three predominant
ematics. This will allow us to (a) build on the increas-      kinds [16]. These analyses suggest that examples play a
ing sophistication of model generators in automated rea-      fundamental role in mathematical collaboration – a con-
soning, and (b) formulate constrained, measurable and         servative estimate puts it at a third of all mathematical
achievable goals for automated “human-like” example-          conversations that we analysed.
use in mathematics.
                                                              2.2    Theoretical work
2     Study 1: Theories of example-use in                     The small amount of theoretical work on the use of ex-
      mathematical research                                   amples in mathematical research – itself based on real-
                                                              world case studies – supports our findings. Polya wrote
2.1    An empirical study                                     extensively about the value of examples in conjecture
                                                              generation and testing [24], while Lakatos followed up
The mathematical community have developed systems             Polya’s ideas with a demonstration of the role that coun-
for “crowdsourcing” (albeit among a highly specialised        terexamples play, driving negotiation and development
crowd) the production of mathematics through collab-          of concepts, conjectures and proofs [15].
oration and sharing [20]. The Polymath and MiniPoly-
math collaborative proofs, a new idea led by Gowers and       In his “Induction in solid geometry” [25, Part III, pp
Tao, are of particular interest to us: these use a blog       35- ] Polya details how examples are invoked at differ-
and wiki for collaboration among mathematicians from          ent points to suggest, evaluate, develop and prove the
different backgrounds and have led to major advances          Descartes-Euler conjecture that for any polyhedra, the
[11]. Also of interest are discussion fora which allow        number of vertices (V) minus the number of edges (E)
rapid informal interaction and problem solving; in seven      plus the number of faces (F) is equal to 2. He starts
years the community question answering system for re-         with simple examples of polyhedra to find regularities
search mathematicians MathOverflow has around 70,000          and formulating initial conjectures. Once a conjecture
users and has hosted over 90,000 conversations. These         has been found, plausibility testing is performed with
systems provide substantial and unprecedented evidence        examples getting gradually more complex, looking for
for studying mathematical practice, allowing the aug-         examples which support rather than refute the conjec-
mentation of traditional ethnography with a variety of        ture. If it stands up to these then more severe exam-
empirical techniques for analysing the texts and network      ples are looked for - with the focus going from find-
structures of the interactions.                               ing supporting examples to looking for counterexamples,
                                                              leading to “very difficult” cases. If a conjecture sur-
In an ethnographic analysis of such a record [23], we clas-   vives this test, then a proof is sought. Lakatos takes
sified each conversation turn as relating to different as-    over at the this point, describing a rational reconstruc-
pects of mathematical activity and found that the largest     tion in which counterexamples drive conjecture and con-
single category was examples. Here we mean examples of        cept change and are used to strengthen a proof via var-
ious responses. These include three main methods of            keeping with the dialectical aspect described by Lakatos,
theorem formation – all triggered by counterexamples:          HRL is implemented in an agent architecture. Each
monster-barring (concerned with concept development);          agent has a copy of the HR system, and starts with a dif-
exception-barring (concerned with conjecture develop-          ferent database of examples to work with, and different
ment), and the method of proofs and refutations (con-          interestingness measures. Agents send conjectures, con-
cerned with proof development).                                cepts, counterexamples, or requests to a central agent,
                                                               who choreographs a discussion, using the example-based
2.3    Examples in Automated Reasoning                         methods prescribed by Lakatos to modify faulty conjec-
                                                               tures.
Example, or model, generation is one of the successes
of formal verification, interactive theorem proving, and
automated reasoning, with substantial research under-          3     Study 2: A fine-grained study of
lying these achievements. Techniques underlying such                 example-use in mathematics research
systems include methods based on first order reasoning,
constraint satisfaction, and propositional logic: see [27]     3.1     Source material
for an overview.
                                                               To apply empirical methods to the study of mathe-
We summarise our empirical studies in §2.1 and theo-           matical explanation we looked for a suitable source of
retical work presented in §2.2 as using examples for the       data which, ideally, would capture the live production of
following purposes: (i) conjecture invention; (ii) under-      mathematics rather than the finished outcome in text-
standing a conjecture; (iii) plausibility-testing; (iv) dis-   book or journal paper; would exhibit examples in prac-
proving or modifying a conjecture. Simulations of (i)          tice through capturing mathematical collaboration; and
and (iv) can found in automated reasoning: to the best         could be argued to represent the activities of a reasonable
of our knowledge there is little or no automated work          subset of the mathematical community. The dataset we
on (ii) or (iii). The first purpose is scientific induc-       chose was the first Mini-Polymath project, in 2009, an
tion, which underlies the machine learning paradigm.           online collaboration on a blog to solve problems drawn
Colton’s theory formation system HR [7] also uses ex-          from International Mathematical Olympiads.
amples, or objects of interest (such as specific groups
or numbers), to drive theory development. The sys-             The first Mini-Polymath project started on July 20th,
tem uses production rules to form new concepts from            2009 @ 6:02 am and ended August 15th, 2010 @ 3:30
old ones; measures of interestingness to drive a heuristic     pm. The problem statement came from Problem 6 of
search; empirical pattern-based conjecture making tech-        the Math Olympiad that year:
niques to find relationships between concepts, and third             Let a1 , a2 , . . . , an be distinct positive integers
party logic systems such as the Otter theorem prover[19],            and let M be a set of n − 1 positive integers not
the Mace model generator[17]. Otter is a first order res-            containing s = a1 +a2 +. . .+an . A grasshopper
olution theorem prover which has been used for many                  is to jump along the real axis, starting at the
discovery tasks in algebraic domains, e.g., [18]. Mace               point 0 and making n jumps to the right with
is a model generator which employs the Davis-Putnam                  lengths a1 , a2 , . . . , an in some order. Prove that
method for generating models to first order sentences.               the order can be chosen in such a way that the
The IsaCosy system by Johansson et al. is another in-                grasshopper never lands on any point in M .
stances of example-driven theory formation software [14]
which performs inductive theory formation by synthe-           The solution was found on 21st July, 2009@ 11:16 am af-
sising conjectures from the available constants and free       ter 201 comments. The total conversation contained 356
variables.                                                     comments and 32,430 words, and there were between
                                                               81 and 100 participants (some participants were anony-
Along with colleagues, Pease has developed two soft-           mous). We analysed the first 160 comments, which is
ware systems based on the fourth purpose above.                80% of the conversation leading up to the solution.1
The Theorem Modifier system (TM) [8] uses Lakatos’s
exception-barring methods to provide a flexible auto-
mated theorem-proving system. This is able to take in          3.2     Method
a conjecture, try to prove it and if unsuccessful (either
because the conjecture is too hard to prove or because it      We used an approach based on grounded theory [26]
is false), use supporting or counter-examples to produce       to conduct a fine-grained study. This is a data-driven
modified versions of the conjecture which it can prove.        method to systematically build integrated sets of con-
For instance, given the non-theorem that all groups are        cepts in a topic where little is known. Researchers keep
Abelian, TM states that it cannot prove the original re-       an open mind in order to build a theory which is purely
sult, but it has discovered that all self-inverse groups are      1
                                                                    The reason for 80% rather than the full 100% was purely
Abelian. The HRL system [21] is a computational rep-           pragmatic: such close analysis is extremely time-consuming
resentation of Lakatos’s theory, in which all of his main      to perform and it was felt that results were sufficiently robust
methods for theory development are implemented. In             after 80%.
grounded in data rather than influenced by prior work.         The open coding process created a total of 98 loosely
As is the standard in grounded theory, we followed four        connected codes and descriptions. Since we coded the
stages:                                                        conversation into any appropriate size chunk of data and
                                                               allowed multiple codings, we could get more code appli-
 1. Open coding: Use the raw data to suggest code def-         cations than number of comments: in fact we got signif-
    initions (anchors that help to identify key points in      icantly more, with 646 applications of the codes across
    the data).                                                 our 160 comments.
 2. Axial coding: Development of concepts by combin-
    ing codes into collections of similar content.             Axial coding
 3. Selective coding: Grouping the concepts into cate-
    gories - put the data back together by making con-         In the second stage, we identified interrelationships be-
    nections across codes, categories, and concepts.           tween our codes and formed concepts by combining
                                                               codes, to describe repeated patterns of interactions and
 4. Theory building: Compare the central phenomenon            problem solving strategies in the conversation. We found
    across several dimensions, and formulate the major         23 concepts, including examples, and also error, expla-
    themes which have emerged.                                 nation, question, re-representation, references to other
Here codes, concepts and categories are different levels       mathematics, metaphors and requests for help (we show
of abstraction and are the building blocks for a grounded      interrelationships for our examples category in Figure 2).
theory.
We used a software tool for grounded theory and mixed          Selective coding/Theory building
methods research, dedoose [2], shown in Figure 1. Here
we see the codification of a comment. It can be seen           In the third and fourth stages, we grouped concepts into
that we sometimes applied multiple codes within a single       categories, making connections across codes, categories,
comment, or sentence; even applying overlapping codes          and concepts. The following main categories emerged:
if appropriate. In this example for instance, the words        context (often a mathematical object such as a conjec-
“ugle”, “elusive” and “hope” (which occurs twice) are          ture or a concept), the language used (for instance, emo-
coded as emotion or value words, which has been cate-          tive or values language, or metaphors), and the social
gorised under language.                                        pleasantries needed to keep the conversation flowing (for
                                                               instance, thanking a participant for their contribution or
The coding was mainly done by the first author, who has        introducing a new comment in a very humble way).
a first degree in Mathematics, a PhD in a related disci-
pline and more than 10 years experience studying math-         We show a visual representation of how these categories
ematical reasoning. Her analysis was discussed at length       relate to each other and to different kinds of examples
and during different stages of analysis with the second        in Figure 3.
author, who has a PhD in Mathematics and over 10
years experience as a professional research mathemati-         Context:
cian. Any differences of opinion were resolved, allowing
                                                               Hypothesized cases to explore a conjecture were very
us to align our ideas.
                                                               common, and these arose in the context of introduc-
                                                               ing notation, re-representation, meta-comments about
3.3    Findings                                                proof, explanation, justification, induction and using
                                                               them to asking questions. Simple supporting examples
In order to build a substantive theory of example-use          were used to explore a conjecture.
in mathematical research, we followed the stages of
grounded theory as described below.                            Counterexamples were largely used in the context of
                                                               explanation, asking questions and proof development,
Open coding                                                    including induction, case splits and constructive argu-
                                                               ments. They were also used in the context of errors
In the first stage, we identified and coded code definitions   being pointed out.
of interest via open coding. In the example below, a
participant gives a local conjecture about the shortest        Examples were also suggested as useful cases, rather
route and points out a problem example straightaway.           than directly supporting or refuting a given conjecture.
This comment excerpt is coded as “Counterexample to            For instance, worst case scenarios were brought up in the
a local conjecture”                                            context of proof strategies.
      I had hoped that the shortest would always               There were no examples found being used in the context
      be (mk , mk+1 ), but this doesn’t seem to be             of rhetorical questions or requests for help, and very few
      true: consider A = (9, 10, 11, 30) and M =               in the context of directly considering the plausibility of
      (11, 30, 49).                                            a conjecture.
                    Figure 1: Example showing the codification of a comment using dedoose [2]


Figure 2: Interrelationships identified in Stages 2 and 3 between codes relevant to our examples category, where arcs
represent subset relationships.
                                 value words                                                                                      reference to a
                                                       hypothesised cases                                                       previous comment

                                                                                                                                  hypothesised cases
                                                                                                               social
                                hypothesised cases        language
                                                                                                            pleasantries          simple supporting case


                                metaphors                                                                                       humble language
                                                                                           Profile of
                                                                                           example


                                                                                                         worst-case scenarios
                                                              simple                        context
                                                        supporting examples
                                                                                                                      proof strategies
                                         conjecture
                                         exploration                        hypothesised cases           counterexamples


                                                                                                                  explanation,
                                                                notation, re-representation,                   asking questions,
                                                               meta, explanation, justification,           proof development, errors
                                                                    induction, questions


Figure 3: A diagrammatic model of our findings, showing the three central themes that emerged and how they were
exhibited. Specific types of example in each case are shown in italics.

Language:                                                                                        either by number, by the content, or using the name of
                                                                                                 the participant who had said it.
Value words were used describing some examples, for
instance, examples were described as degenerate or as                                            Humble language was used occasionally when exploring
particularly useful (value words) in a given context. The                                        a simple case which supports a sub-conjecture: “(I may
quote below discusses an example, breaking it down into                                          make mistakes here.)”
various scenarios and using them to discuss proof strat-
egy and in particular when induction was “a bit of a
sledgehammer” (metaphor), and when it was “more ap-                                              4      Further work and conclusions
propriate” (each time showing why). Emotive words
such as “cheating” are used in a lighthearted way:                                               The new records of mathematical reasoning, our ethno-
                                                                                                 graphic studies highlighting the importance of the ex-
    Let’s take ai = i for i = 1, 2, 3, 4. So we’re                                               ample in such reasoning, alongside the development of
    trying to get to 10 in steps of 1,2,3,4 and there                                            sophisticated model generation systems, mean that we
    are three landmines.                                                                         are now in a position to build on our insight into the use
                                                                                                 of examples in mathematics and extend it in a compu-
    If there’s a landmine on any of 1,2,3,4, then by                                             tational way. We plan to automate those roles (ii) and
    47 (@liuxiaochuan) they must be on 4, or 4 and                                               (iii) for which we found no corresponding system in §2,
    3, o r 4 and 3 and 2. In the third case we can                                               and to build on results from §3 to guide us in construct-
    go to 1 and then to 5, and then were done by                                                 ing a system which can useful contribute examples to a
    induction (two steps and zero obstacles, so per-                                             mathematics research conversation.
    haps induction was a bit of a sledgehammer).
    If there are obstacles on 4 and 3, then induction                                            Building a system which can do even a limited aspect of
    is more appropriate – we can either get to 5 in                                              “human-like” mathematics will open the way for a new
    two steps and are then done, or theres an ob-                                                form of mixed-initiative, collaborative reasoning between
    stacle at 5, in which case we can go 2,6,7,10. If                                            human and software participants in interactions which
    theres just an obstacle at 4, things get harder,                                             are both naturalistic but formally constrained and well-
    since then we need to know what goes on after                                                defined. This has the potential to impact both the pro-
    4. But then we can cheat and say that at least                                               fessional practice and pedagogy of mathematics.
    one number between 6 and 9 is an obstacle so
    we can run things in reverse. The only case not
    covered is then when the obstacles are at 4,5,6.
                                                                                                 Acknowledgments

Social pleasantries:                                                                             This work was supported by EPSRC grants
                                                                                                 EP/P017320/1 and EP/K040251/2. We are grate-
When people were discussing a hypothesized case to ex-                                           ful to our anonymous reviewers for their thoughtful
plore, they frequently referred to a previous comment –                                          comments.
References                                               [20] M. Nielsen. Reinventing Discovery: The New Era
                                                              of Networked Science. Princeton University Press,
[1] http://events.inf.ed.ac.uk/sicsa-mcp/.
                                                              USA, 2011.
[2] Dedoose. www.dedoose.com.                            [21] A. Pease. A computational model of lakatos-style
[3] Minipolymath3                               project.      reasoning. Philosophy of Mathematics Education
     http://polymathprojects.org/2011/07/19/minipolymath3- Journal, ISSN 1465-2978 (Online)(27), April 2013.
     project-2011-imo/.                                  [22] A. Pease and U. Martin. Seventy four minutes
[4] The polymath blog. http://polymathprojects.org/.          of mathematics: An analysis of the third mini-
                                                              polymath project. In Proceedings of the AISB Sym-
[5] Mathoverflow. http://mathoverflow.net, September
                                                              posium on Mathematical Practice and Cognition II,
     2009.                                                    pages 19–29, 2012.
[6] A Bundy. Automated theorem provers: a practical      [23] A. Pease and U. Martin. Summary of an ethno-
     tool? Ann Math Artif Intell, 61:3–14, 2011.              graphic study of the third mini-polymath project.
[7] S. Colton. Automated Theory Formation in Pure             In Computability in Europe. Informal presentation,
     Mathematics. Springer-Verlag, 2002.                      2012.
[8] S. Colton and A. Pease. The TM system for re-        [  ]
                                                          24 G. Pólya. Mathematics and plausible reasoning:
     pairing non-theorems. In Selected papers from the        Induction and analogy in mathematics, volume I.
     IJCAR’04 disproving workshop, Electronic Notes in        Princeton University Press, 1954.
     Theoretical Computer Science, volume 125(3). Else-  [25] G. Pólya. Mathematical Discovery. John Wiley and
     vier, 2005.                                              Sons, New York, 1962.
[9] T Hales et al. A revision of the proof of the kepler [26] A. Strauss and C. Juliet. Grounded theory method-
     conjecture. Discrete & Comp Geom, 44, 2010.              ology: An overview. In N. Denzin and Y. Lincoln,
[10] G Gonthier. Advances in the formalization of the         editors, Handbook of Qualitative Research (1st ed),
     odd order theorem. Proc. of Interactive Theorem          pages 273–284. Sage Publications, Thousand Oaks,
     Proving, 6898, 2011.                                     CA., 1994.
[11] T. Gowers and M. Nielsen. Massively collaborative   [27] H. Zhang and J. Zhang. Mace4 and sem: A compar-
     mathematics. Nature, 461(7266):879–881, 2009.            ison of finite model generators. In M. P. Bonacina
                                                              and M. E. Stickel, editors, Automated Reasoning
[12] W. T. Gowers. Rough structure and classification.        and Mathematics: Essays in Memory of William
     GAFA (Geometric And Functional Analysis), Spe-           W. McCune, pages 101–130.
     cial volume – GAFA2000(1–0), 2000.
[13] J Harrison. andbook of practical logic and auto-
     mated reasoning. 2009.
[14] M. Johansson, L. Dixon, and A. Bundy. Conjecture
     synthesis for inductive theories. Journal of Auto-
     mated Reasoning, 47(3):251–289, 2011.
[15] I. Lakatos. Proofs and Refutations. Cambridge Uni-
     versity Press, Cambridge, 1976.
[16] Ursula Martin and Alison Pease. Mathematical
     practice, crowdsourcing, and social machines. In
     Jacques Carette, David Aspinall, Christoph Lange,
     Petr Sojka, and Wolfgang Windsteiger, editors, In-
     telligent Computer Mathematics, volume 7961 of
     Lecture Notes in Computer Science, pages 98–119.
     Springer Berlin Heidelberg, 2013.
[17] W. McCune. MACE 2 reference manual. Techni-
     cal Report ANL/MCS-TM-249,, Argonne National
     Laboratories, 2001.
[18] W. McCune and R. Padmanabhan. Automated
     Deduction in Equational Logic and Cubic Curves,
     LNAI, 1095. Springer-Verlag, 1996.
[19] W.W. McCune. Otter 3.0 Reference Manual and
     Guide. Technical Report ANL-94/6, Argonne Na-
     tional Laboratory, Argonne, USA, 1994.