Logical and Psychological Analysis of Deductive Mastermind

Nina Gierasimczuk¹, Han van der Maas², and Maartje Raijmakers²

¹ Institute for Logic, Language and Computation, University of Amsterdam, Nina.Gierasimczuk@gmail.com
² Department of Psychology, University of Amsterdam

Abstract. The paper proposes a way to analyze logical reasoning in a deductive version of the Mastermind game implemented within the Math Garden educational system. Our main goal is to derive predictions about the cognitive difficulty of game-plays, e.g., the number of steps needed for solving the logical tasks or the working memory load. Our model is based on the analytic tableaux method, known from proof theory. We associate the difficulty of Deductive Mastermind game-items with the size of the corresponding logical tree derived by the tableau method. We discuss possible empirical hypotheses based on this model, and preliminary results that support the relevance of our theory.

Keywords: Deductive Mastermind, Mastermind game, deductive reasoning, analytic tableaux, Math Garden, educational tools

1 Introduction and Background

Computational and logical analysis has already proven useful for investigations into the cognitive difficulty of linguistic and communicative tasks (see [1–5]). We follow this line of research by adapting formal logical tools to directly analyze the difficulty of non-linguistic logical reasoning. Our object of study is a deductive version of the Mastermind game. Although the game has been used to investigate the acquisition of complex skills and strategies in the domain of reasoning about others [6], as far as we know, it has not yet been studied for the difficulty of the accompanying deductive reasoning. Our model of reasoning is based on the proof-theoretical method of analytic tableaux for propositional logic, and it gives predictions about the empirical difficulty of game-items.
In the remaining part of this section we give the background of our work: we explain the classical and static versions of the Mastermind game, and describe the main principles of the online Math Garden system. In Section 2 we introduce Deductive Mastermind as implemented within Math Garden. Section 3 gives a logical analysis of Deductive Mastermind game-items using the tableau method. Finally, Section 4 discusses some hypotheses drawn on the basis of our model, and gives preliminary results. In the end we briefly discuss directions for further work.

1.1 Mastermind Game

Mastermind is a code-breaking game for two players. The modern game with pegs was invented in 1970 by Mordecai Meirowitz, but it resembles an earlier pen-and-paper game called Bulls and Cows. The Mastermind game, as known today, consists of a decoding board, code pegs of k colors, and feedback pegs of black and white. There are two players: the code-maker, who chooses a secret pattern of ℓ code pegs (color duplicates are allowed), and the code-breaker, who tries to guess the pattern within a given number n of rounds. Each round consists of the code-breaker making a guess by placing a row of ℓ code pegs, and of the code-maker providing feedback of zero to ℓ key pegs: a black key peg for each code peg of correct color and position, and a white key peg for each code peg of correct color but wrong position. After that another guess is made. Guesses and feedbacks alternate until either the code-breaker guesses correctly, or n incorrect guesses have been made. The code-breaker wins if he obtains the solution within n rounds; the code-maker wins otherwise. Mastermind is an inductive inquiry game that involves trials of experimentation and evaluation. As such it raises the interesting question of the underlying logical reasoning and its difficulty.
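The black/white scoring just described can be made precise in a short routine. The sketch below is our own illustration (the function name and string encoding of colors are assumptions, not part of the game's specification); it uses the usual multiset convention: exact matches are counted first, and white pegs are the remaining color overlap.

```python
from collections import Counter

def score(guess, code):
    """Score a guess against a secret code: a black key peg for each peg of
    correct color and position, a white key peg for each further peg of
    correct color but wrong position (illustrative sketch)."""
    black = sum(g == c for g, c in zip(guess, code))
    # Color overlap irrespective of position, counted with multiplicity...
    overlap = sum((Counter(guess) & Counter(code)).values())
    # ...minus the exact matches leaves the white pegs.
    return black, overlap - black

print(score("ABCD", "ABDC"))  # (2, 2): two black pegs, two white pegs
```

Note that duplicates are handled by the multiset intersection: a guessed color can earn at most as many pegs as its number of occurrences in the code.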
Existing mathematical results on Mastermind do not provide an answer to this problem: most attention has been directed at finding a strategy that allows winning the game in the smallest number of rounds (see [7–10]). Static Mastermind [11] is a version of the Mastermind game in which the goal is to find the minimum number of guesses the code-breaker can make all at once at the beginning of the game (without waiting for the individual feedbacks), such that upon receiving all the feedbacks at once he can completely determine the code in the next guess. For this game some strategy analysis has been conducted [12], but, more importantly, Static Mastermind has been given a computational complexity analysis [13]. The corresponding Mastermind Satisfiability Decision Problem has been defined in the following way.

Definition 1 (Mastermind Satisfiability Decision Problem).
Input: A set of guesses G and their corresponding scores.
Question: Is there at least one valid solution?

Theorem 1. The Mastermind Satisfiability Decision Problem is NP-complete.

This result gives an objective computational measure of the difficulty of the task. NP-complete problems are believed to be cognitively hard [14–16]; hence this result has been claimed to indicate why Mastermind is an engaging and popular game. It does not, however, give much insight into the difficulty of the reasoning that might take place while playing the game and constructing the solution.

1.2 Math Garden

This work has been triggered by the idea of introducing dedicated logical reasoning training in primary schools through the online Math Garden system (Rekentuin.nl or MathsGarden.com, see [17]). Math Garden is an adaptive training environment in which students can play various educational games especially designed to facilitate the development of their abstract thinking. Currently, it consists of 15 arithmetic games and 2 complex reasoning games. Students play the game-items suited for their level.
A difficulty level is appropriate for a student if she is able to solve 70% of the items of this level correctly. The difficulty of the tasks and the level of the students' play are continuously estimated according to the Elo rating system, which is used for calculating the relative skill levels of players in two-player games such as chess [18]. Here, the relative skill level is computed on the basis of the student vs. game-item opposition: students are rated by playing, and items are rated by being played. The rating depends on the accuracy and speed of problem solving [19]. The rating scales range from minus infinity to plus infinity. If a child has the same rating as an item, this means that the child can solve the item with probability .5. In general, the absolute values of the ratings have no straightforward meaning. Hence, one result of children playing within Math Garden is a rating of all items, which gives item difficulty parameters. At the same time every child gets a rating that reflects her or his reasoning ability. One of the goals of this project is to understand the empirically established item difficulty parameters by means of a logical analysis of the items. Figure 5 in the Appendix depicts the educational and research context of Math Garden.

2 Deductive Mastermind

Now let us turn to Deductive Mastermind, the game that we designed for the Math Garden system (its name within Math Garden is Flowercode). Figure 1 shows an example item, revealing the basic setting of the game. Each item consists of a decoding board (1), a short feedback instruction (2), the domain of flowers to choose from while constructing the solution (3), and a timer in the form of disappearing coins (4). The goal of the game is to guess the right sequence of flowers on the basis of the clues given on the decoding board. Each row of flowers forms a conjecture, accompanied by a small board on its right side.
The dots on the board code the feedback about the conjecture: one green dot for each flower of correct color and position, one orange dot for each flower of correct color but wrong position, and one red dot for each flower that does not appear in the secret sequence at all. In order to win, the player is supposed to pick the right flowers from the given domain (3), place them in the right order on the decoding board, right under the clues, and press the OK button. She must do so before the time is up, i.e., before all the coins (4) disappear. If the guess is correct, the number of coins left at the moment she made the guess is added to her overall score. If her guess is wrong, the same number is subtracted from it. In this way fast incorrect guesses are punished, while fast correct responses are rewarded [19].

Unlike the classical version of the Mastermind game, Deductive Mastermind does not require the player to come up with the trial conjectures. Instead, Flowercode gives the clues directly, ensuring that they allow exactly one correct solution.

Fig. 1. Deductive Mastermind in the Flowercode setting

Hence, Deductive Mastermind reduces the complexity of classical Mastermind by turning an inductive inference game into a very basic logical-reasoning game. Compared to Static Mastermind, on the other hand, Deductive Mastermind differs with respect to the goal. In fact, by guaranteeing the existence of exactly one correct solution, Deductive Mastermind collapses the complexity postulated in Theorem 1, since the question of the Mastermind Satisfiability Decision Problem becomes trivial. It is fair to say that Deductive Mastermind is a combination of the classical Mastermind game (the goal is the same: finding a secret code) and the Static Mastermind game (it does not involve trial-and-error inductive experimentation). Its very basic setting allows access to atomic logical steps of non-linguistic logical reasoning.
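The three-color dot feedback can be computed mechanically; the sketch below is our own illustration (function name and color strings are assumptions), following the formal reading that Section 3 makes precise: a green dot for each exact match, an orange dot for each remaining position whose flower matches the goal at some other non-green position, and a red dot for every position left over.

```python
def flower_feedback(h, goal):
    """Counts (green, orange, red) of dots for conjecture h against the goal.

    green: positions with the correct flower in the correct place;
    orange: other positions whose flower occurs at some other non-green
            goal position;
    red: all remaining positions.  (Illustrative sketch.)
    """
    n = len(h)
    green = {i for i in range(n) if h[i] == goal[i]}
    orange = {i for i in range(n) if i not in green
              and any(j != i and h[i] == goal[j]
                      for j in range(n) if j not in green)}
    return len(green), len(orange), n - len(green) - len(orange)

print(flower_feedback(("red", "blue"), ("blue", "red")))  # (0, 2, 0)
```

The example call is the 2-flower "swap" situation: no flower is in place, but both colors occur elsewhere, so the clue line gets two orange dots.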
Moreover, Deductive Mastermind is easily adaptable as a single-player game, and hence suitable for the Math Garden system. The simple setting also provides educational justification, as the game trains very basic logical skills.

The game has been running within the Math Garden system since November 2010. It includes 321 game-items, with conjectures of various lengths (1–5 flowers) and numbers of colors (from 2 to 5). By January 2012, 2,187,354 items had been played by 28,247 primary school students (grades 1–6, age: 6–12 years) in over 400 locations (schools and family homes). This extensive data-collecting process allows analyzing various aspects of training, e.g., we can access the progress of individual players on a single game, or the most frequent mistakes with respect to a game-item. Most importantly, due to the student-item rating system mentioned in Section 1.2, we can infer the relative difficulty of game-items. Within this hierarchy we observed that the present game-item domain contains certain "gaps in difficulty": it turns out that our initial difficulty estimation in terms of non-logical aspects (e.g., number of flowers, number of colors, number of lines, the rate of hypothesis elimination, etc.) is not precise, and hence the domain of items that we generated does not cover the whole difficulty space. Providing a logical apparatus that predicts and explains the difficulty of Deductive Mastermind game-items can help solve this problem and hence also enhance the training effect of Flowercode (see Appendix, Figure 7).

3 A Logical Analysis

Each Deductive Mastermind game-item consists of a sequence of conjectures.

Definition 2. A conjecture of length ℓ over k colors is any sequence given by a total assignment, h : {1, . . . , ℓ} → {c1, . . . , ck}. The goal sequence is a distinguished conjecture, goal : {1, . . . , ℓ} → {c1, . . . , ck}.
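In this representation a conjecture is simply a tuple of ℓ colors, so the candidate space of goal sequences is finite and can be enumerated directly; there are exactly k^ℓ of them. A small illustrative sketch (the helper name is ours):

```python
from itertools import product

def conjectures(length, colors):
    """All total assignments h : {1, ..., length} -> colors, each represented
    as the tuple (h(1), ..., h(length)); the goal sequence is one of them."""
    return list(product(colors, repeat=length))

space = conjectures(2, ["c1", "c2", "c3"])
print(len(space))  # 3 ** 2 = 9 candidate sequences
```

Since color duplicates are allowed, enumeration uses the full Cartesian product rather than permutations.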
Every non-goal conjecture is accompanied by a feedback that indicates how similar h is to the goal assignment. The three feedback colors, green, orange, and red, described in Section 2, will be represented by the letters g, o, r.

Definition 3. Let h be a conjecture and let goal be the goal sequence, both of length ℓ over k colors. The feedback f for h with respect to goal is a sequence

f = g . . . g o . . . o r . . . r = g^a o^b r^c,

where a, b, c ∈ {0, 1, 2, 3, . . .} and a + b + c = ℓ. The feedback consists of:
– exactly one g for each i ∈ G, where G = {i ∈ {1, . . . , ℓ} | h(i) = goal(i)};
– exactly one o for each i ∈ O, where O = {i ∈ {1, . . . , ℓ}\G | there is a j ∈ {1, . . . , ℓ}\G such that i ≠ j and h(i) = goal(j)};
– exactly one r for each i ∈ {1, . . . , ℓ}\(G ∪ O).

3.1 The Informational Content of the Feedback

How can the information carried by each pair (h, f) be expressed logically? To shape the intuitions, let us first give a second-order formula that encodes any feedback sequence g^a o^b r^c for any h with respect to any goal:

∃G ⊆ {1, . . . , ℓ} (card(G) = a ∧ ∀i ∈ G h(i) = goal(i) ∧ ∀i ∉ G h(i) ≠ goal(i) ∧
  ∃O ⊆ {1, . . . , ℓ}\G (card(O) = b ∧ ∀i ∈ O ∃j ∈ {1, . . . , ℓ}\G (j ≠ i ∧ h(i) = goal(j)) ∧
  ∀i ∈ {1, . . . , ℓ}\(G ∪ O) ∀j ∈ {1, . . . , ℓ}\G h(i) ≠ goal(j))).

Since the conjecture length ℓ is fixed for any game-item, it is sensible to give a general method for producing a less involved, propositional formula for any instance of (h, f). As literals of our Boolean formulae we take expressions of the form h(i) = goal(j), where i, j ∈ {1, . . . , ℓ} (they might be viewed as propositional variables p_{i,j}, for i, j ∈ {1, . . . , ℓ}). With respect to sets G, O, and R that induce a partition of {1, . . . , ℓ}, we define φ^g_G, φ^o_{G,O}, φ^r_{G,O}, the propositional formulae that correspond to the different parts of the feedback, in the following way:

– φ^g_G := ⋀_{i∈G} h(i) = goal(i) ∧ ⋀_{j∈{1,...,ℓ}\G} h(j) ≠ goal(j),
– φ^o_{G,O} := ⋀_{i∈O} (⋁_{j∈{1,...,ℓ}\G, i≠j} h(i) = goal(j)),
– φ^r_{G,O} := ⋀_{i∈{1,...,ℓ}\(G∪O), j∈{1,...,ℓ}\G, i≠j} h(i) ≠ goal(j).

Observe that there are as many substitutions of each of the above formula schemes as there are ways to choose the corresponding sets G and O. To fix the domain of those possibilities we set 𝒢 := {G | G ⊆ {1, . . . , ℓ} ∧ card(G) = a}, and, for G ∈ 𝒢, 𝒪_G := {O | O ⊆ {1, . . . , ℓ}\G ∧ card(O) = b}. Finally, we can set Bt(h, f), the Boolean translation of (h, f), to be given by:

Bt(h, f) := ⋁_{G∈𝒢} (φ^g_G ∧ ⋁_{O∈𝒪_G} (φ^o_{G,O} ∧ φ^r_{G,O})).

Example 1. Let us take ℓ = 2 and (h, f) such that: h(1) := c1, h(2) := c2, f := or. Then 𝒢 = {∅} and 𝒪_∅ = {{1}, {2}}. The corresponding formula Bt(h, f) is:

(goal(1) ≠ c1 ∧ goal(2) ≠ c2) ∧ ((goal(1) = c2 ∧ goal(2) ≠ c1) ∨ (goal(2) = c1 ∧ goal(1) ≠ c2)).

Each Deductive Mastermind game-item consists of a sequence of conjectures together with their respective feedbacks. Let us define it formally.

Definition 4. A Deductive Mastermind game-item over ℓ positions, k colors and n lines, DM(ℓ, k, n), is a set {(h1, f1), . . . , (hn, fn)} of pairs, each consisting of a single conjecture together with its corresponding feedback. Accordingly, Bt(DM(ℓ, k, n)) = Bt({(h1, f1), . . . , (hn, fn)}) = {Bt(h1, f1), . . . , Bt(hn, fn)}.

Hence, each Deductive Mastermind game-item can be viewed as a set of Boolean formulae. Moreover, by the construction of the game-items, this set is satisfiable, and there is a unique valuation that satisfies it. Now let us focus on a method of finding this valuation.

3.2 Analytic Tableaux for Deductive Mastermind

In proof theory, the analytic tableau is a decision procedure for propositional logic [20–22].
The tableau method determines the satisfiability of finite sets of propositional formulas by constructing an adequate valuation. The method builds a tree labeled with formulae, rooted at the given set of formulae; unfolding the tree breaks these formulae into smaller ones until contradictory pairs of literals are produced or no further reduction is possible. The rules of analytic tableaux for propositional logic that are relevant for our considerations are as follows:³

  (∧)  φ ∧ ψ        (∨)  φ ∨ ψ
       φ, ψ              φ | ψ

The (∧) rule adds both conjuncts to the current branch; the (∨) rule splits the branch, one part continuing with φ and the other with ψ.

³ We do not need the rule for negation because in our formulae only literals are negated.

By our considerations from Section 3.1 we can now conclude that applying the analytic tableau method to the Boolean translation of a Deductive Mastermind game-item yields the unique missing assignment goal. In the rest of the paper we focus on 2-pin Deductive Mastermind game-items (where ℓ = 2); in particular, we explain the tableau method in more detail on these simple examples.

2-pin Deductive Mastermind Game-items. Since the possible feedbacks consist of the letters g (green), o (orange), and r (red), in principle for the 2-pin Deductive Mastermind game-items we get six possible feedbacks: gg, oo, rr, go, gr, or. Of those, gg is excluded as non-applicable, since such a clue line would simply reveal the goal; go is excluded because there are only two positions. Let us take a pair (h, f) with h(1) = ci, h(2) = cj; then, depending on the feedback, the corresponding Boolean formulae are given in Figure 2. We can compare the complexity of those feedbacks just by looking at their tree representations created from the Boolean translations via the tableau method.
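The two reduction rules can be turned into a small recursive procedure that collects the open branches of a tableau. The encoding below is a sketch of our own (names and data representation are assumptions): a formula is either a literal (atom, polarity) or a tuple ("and"/"or", left, right), and a branch closes as soon as it contains a complementary pair of literals.

```python
def open_branches(todo, lits=frozenset()):
    """Expand a list of formulas by the (∧) and (∨) rules; return the open
    branches, each as a frozenset of (atom, polarity) literals (sketch)."""
    if not todo:
        return [lits]
    phi, rest = todo[0], todo[1:]
    if phi[0] == "and":                     # (∧): both conjuncts, same branch
        return open_branches([phi[1], phi[2]] + rest, lits)
    if phi[0] == "or":                      # (∨): the branch splits in two
        return (open_branches([phi[1]] + rest, lits)
                + open_branches([phi[2]] + rest, lits))
    atom, pol = phi                         # phi is a literal
    if (atom, not pol) in lits:             # complementary pair: branch closes
        return []
    return open_branches(rest, lits | {phi})

def lit(i, c, pol=True):
    """The literal goal(i) = c, or its negation when pol is False."""
    return (f"goal({i})={c}", pol)

# Boolean translations of the oo and gr feedbacks for h(1)=ci, h(2)=cj:
oo = ("and", lit(1, "ci", False), ("and", lit(2, "cj", False),
      ("and", lit(1, "cj"), lit(2, "ci"))))
gr = ("or", ("and", lit(1, "ci"), lit(2, "cj", False)),
            ("and", lit(2, "cj"), lit(1, "ci", False)))
print(len(open_branches([oo])), len(open_branches([gr])))  # 1 2
```

The oo translation unfolds into a single branch, while gr splits into two; feeding all the translated clue lines of a game-item in together, the branches that stay open determine the goal assignment.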
Feedback  Boolean translation
oo        goal(1) ≠ ci ∧ goal(2) ≠ cj ∧ goal(1) = cj ∧ goal(2) = ci
rr        goal(1) ≠ ci ∧ goal(2) ≠ cj ∧ goal(1) ≠ cj ∧ goal(2) ≠ ci
gr        (goal(1) = ci ∧ goal(2) ≠ cj) ∨ (goal(2) = cj ∧ goal(1) ≠ ci)
or        (goal(1) ≠ ci ∧ goal(2) ≠ cj) ∧ ((goal(1) = cj ∧ goal(2) ≠ ci) ∨ (goal(2) = ci ∧ goal(1) ≠ cj))

Fig. 2. Formulae and their trees for 2-pin Deductive Mastermind feedbacks

These tree representations clearly indicate that the order of tree-difficulty for the four possible feedbacks is: oo < rr < gr < or.

[...]

The item ratings reveal a cluster of more difficult items (item ratings > 0) and a cluster of easier items (item ratings < 0). In the cluster of easy items, there are items with 2, 3, 4, and 5 colors. The cluster of difficult items consists of items with 3, 4 and 5 colors. The two clusters could be expressed fully in terms of model-relevant features. It appeared that items are easy in the following two cases: (1) no or feedback and no gr feedback; (2) no or feedback, at least one gr feedback, and all colors included in at least one of the conjectures. Items are difficult otherwise. This shows that or steps make an item relatively difficult, but only if or is required to solve the item. A second aspect that makes an item difficult is the exclusion of one of the colors of the solution from the hypothesis rows.

5 Conclusions and Future Work

In this paper we proposed a way to use a proof-theoretical concept to analyze the cognitive difficulty of logical reasoning. The first hypotheses that we drew from the model gave a reasonably good prediction of item difficulties, but the fit is not perfect. It must be noted, however, that several non-logical factors may play a role in item difficulty as well.
For example, the motor requirements for giving an answer also introduce some variation in item difficulty, which depends not only on accuracy but also on speed: the minimal number of clicks required to give an answer varies between items. We have not taken these aspects into account so far. Importantly, however, the two difficulty clusters observed in the empirical study can be explained with the use of our method.

Our further work can be split into two main parts. On the theoretical level, we will continue this study by analyzing various complexity measures that can be obtained on the basis of the tableau system. We also plan to compare the fit to empirical data of the tableau-derived measures with that of other possible logical formalizations of level difficulty (e.g., a resolution-based model). On the empirical side, we first would like to extend our analysis to game-items that consist of conjectures of greater length. This will allow comparing the difficulty of tasks of different size and measuring the trade-off between the size and the structure of the tasks. We will also study this model in the broader context of other Math Garden games. This would allow comparing individual abilities within Deductive Mastermind with abilities within other games that have been correlated, for instance, with working memory load. Finally, it would be interesting to see whether children really use the proposed difficulty order of feedbacks in finding an optimal tableau; this could be checked in an eye-tracking experiment (see [23, 24]).

References

1. Szymanik, J.: Quantifiers in TIME and SPACE. Computational Complexity of Generalized Quantifiers in Natural Language. PhD thesis, Universiteit van Amsterdam (2009)
2. Gierasimczuk, N., Szymanik, J.: Branching quantification v. two-way quantification. Journal of Semantics 26(4) (2009) 367–392
3. Szymanik, J., Zajenkowski, M.: Comprehension of simple quantifiers. Empirical evaluation of a computational model.
Cognitive Science 34(3) (2010) 521–532
4. Zajenkowski, M., Styla, R., Szymanik, J.: A computational approach to quantifiers as an explanation for some language impairments in schizophrenia. Journal of Communication Disorders 44(6) (2011) 595–600
5. Van Rooij, I., Kwisthout, J., Blokpoel, M., Szymanik, J., Wareham, T., Toni, I.: Intentional communication: Computationally easy or difficult? Frontiers in Human Neuroscience 5 (2011) 1–18
6. Verbrugge, R., Mol, L.: Learning to apply theory of mind. Journal of Logic, Language and Information 17(4) (2008) 489–511
7. Knuth, D.E.: The computer as master mind. Journal of Recreational Mathematics 9(1) (1977) 1–6
8. Irving, R.W.: Towards an optimum Mastermind strategy. Journal of Recreational Mathematics 11 (1978–79) 81–87
9. Koyama, M., Lai, T.: An optimal Mastermind strategy. Journal of Recreational Mathematics 25 (1993) 251–256
10. Kooi, B.: Yet another Mastermind strategy. ICGA Journal 28(1) (2005) 13–20
11. Chvátal, V.: Mastermind. Combinatorica (1983) 325–329
12. Greenwell, D.L.: Mastermind. Journal of Recreational Mathematics 30 (1999–2000) 191–192
13. Stuckman, J., Zhang, G.: Mastermind is NP-complete. INFOCOMP Journal of Computer Science 5 (2006) 25–28
14. Van Rooij, I.: The tractable cognition thesis. Cognitive Science 32 (2008) 939–984
15. Szymanik, J.: Computational complexity of polyadic lifts of generalized quantifiers in natural language. Linguistics and Philosophy 33(3) (2010) 215–250
16. Ristad, E.S.: The Language Complexity Game. The MIT Press (1993)
17. Van der Maas, H.L.J., Klinkenberg, S., Straatemeier, M.: Rekentuin.nl: combinatie van oefenen en toetsen. Examens (2010) 10–14
18. Elo, A.: The Rating of Chessplayers, Past and Present. Arco (1978)
19. Klinkenberg, S., Straatemeier, M., Van der Maas, H.L.J.: Computer adaptive practice of maths ability using a new item response model for on the fly ability and difficulty estimation. Computers in Education 57 (2011) 1813–1824
20. Beth, E.W.: Semantic entailment and formal derivability. Mededelingen van de Koninklijke Nederlandse Akademie van Wetenschappen, Afdeling Letterkunde 18(13) (1955) 309–342
21. Smullyan, R.: First-Order Logic. Springer-Verlag, Berlin (1968)
22. Van Benthem, J.: Semantic tableaus. Nieuw Archief voor Wiskunde 22 (1974) 44–59
23. Ghosh, S., Meijering, B., Verbrugge, R.: Logic meets cognition: Empirical reasoning in games. In Boissier, O., Fallah-Seghrouchni, A.E., Hassas, S., Maudet, N., eds.: MALLOW, CEUR Workshop Proceedings (2010)
24. Ghosh, S., Meijering, B.: On combining cognitive and formal modeling: a case study involving strategic reasoning. In Van Eijck, J., Verbrugge, R., eds.: Proceedings of the Workshop on Reasoning About Other Minds: Logical and Cognitive Perspectives (RAOM-2011), Groningen, CEUR Workshop Proceedings (2011)

Appendix

The appendix contains three figures. The first one illustrates the educational and research context of the Math Garden system (Figure 5); the second shows the distribution of the empirical difficulty of all items in Deductive Mastermind (Figure 6); and the third gives the frequency of players and game-items with respect to the overall rating (Figure 7). We refer to those figures in the main text of the article.

Fig. 5. Math Garden educational and research context

Fig. 6. The distribution of the empirically established game-item difficulties for all 321 game-items in the Deductive Mastermind game. The item number (x-axis) is an arbitrary index of the game-item. The y-axis shows the item ratings. For example, the item presented in Figure 4 is number 23 and its rating is 4.2.

Fig. 7. The frequency of players (green, y-axis on the left) and game-items (blue, y-axis on the left) with respect to the overall rating (x-axis).