<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards Cognitive-Plausible Explanations for Board Game Agents with Genetic Programming</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Manuel Eberhardinger</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Florian Rupp</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Florian Richoux</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Johannes Maucher</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Setareh Maghsudi</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>AIST</institution>
          ,
          <addr-line>Tokyo</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Ruhr-University Bochum</institution>
          ,
          <addr-line>Bochum</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Stuttgart Media University, Institute for Applied Artificial Intelligence</institution>
          ,
          <addr-line>Stuttgart</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Technical University of Applied Sciences Mannheim</institution>
          ,
          <addr-line>Mannheim</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>Recent advances in board game agents have led to superhuman performances in various games, but their decision-making processes remain opaque even to experienced players and are not cognitive-plausible from a human perspective. Based on criteria for cognitive-plausible systems, we introduce an approach that emphasizes interpretable knowledge representation through pattern-based and hierarchical structures. Our approach uses Genetic Programming to learn feature programs that capture spatial relationships on the game board. In addition, it utilizes decision trees to model agent behavior based on these feature programs. The result is a hierarchical, pattern-based model for generating post-hoc explanations for board game agents. We show that our method generates comprehensible explanations in Tic-Tac-Toe. We then discuss the current limitations in generalizing the approach to more complex games using Connect 4 as an example.</p>
      </abstract>
      <kwd-group>
        <kwd>Explainable Reinforcement Learning</kwd>
        <kwd>Genetic Programming</kwd>
        <kwd>Board Games</kwd>
        <kwd>Human-like Game Playing</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Although board game agents have demonstrated impressive superhuman playing strength in various
domains [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ], they are not fully comprehensible in terms of why an action is taken, even for professional
players. Furthermore, from a cognitive science perspective, the decision-making process of these agents
does not resemble the way humans play games [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ]. Mańdziuk proposes multiple “facets of
cognitive-plausible playing systems” about knowledge acquisition and representation [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The method we propose
addresses both postulates from Mańdziuk [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] concerning knowledge representation: (P1) “Game-related
concepts may be effectively represented and processed with the use of pattern-based representation.”
(P2) “Game-related knowledge should be represented in a hierarchical structure with various
inter- and intra-level connections.” To that end, we present an approach for explaining board game agents by
learning feature programs, i.e., programs that represent single or multiple grid cells and their relations
to other cells, with Genetic Programming. This is in line with P1 and with the work on visual routines –
a method to extract spatial information from scenes [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. In addition, feature programs can represent
intra-level connections of P2. By learning a decision tree on the feature activation of these programs,
our method can highlight activated features for given state-action pairs, i.e., a hierarchical structure
representing the inter-level connections of P2.
      </p>
      <p>
        An important point worth discussing in advance is whether these postulates are necessary or
sufficient for human comprehensibility and cognitive-plausibility. While pattern-based and hierarchical
representations may be necessary because they correspond to human perception of recurring structures
and the organization of knowledge, they are unlikely to be sufficient [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Human comprehensibility
also depends on other aspects, such as the abstraction and combination of strategies [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], analogical
thinking [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], and the ability to convey explanations in intuitive terms. We will show in our approach
that adherence to the postulates contributes to more interpretable models, but a broader spectrum of
mechanisms is needed to better approximate cognitive-plausibility [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>We show that the proposed method generates comprehensible explanations for Tic-Tac-Toe. In
addition, we present preliminary results for Connect 4 and discuss the limitations that need to be
overcome to apply the method to more complex games.</p>
      <p>Our contributions are:
• We present first steps towards a cognitive-plausible approach to explain board game agents.
• We show a proof of concept for generating explanations using the game Tic-Tac-Toe as an example.
• We conduct a case study which outlines the improvements necessary to apply the method to
more complex games such as Connect 4.</p>
      <p>The remainder of the paper is structured as follows: In Section 2 we review related work about
genetic programming in the context of board games, as well as program synthesis approaches applied
to explainability and interpretability in games. Section 3 introduces the domain-specific language and
the representation of programs. The data collection process and methodology are described in Section 4.
Subsequently, Section 5 presents and discusses the results of our case studies on Tic-Tac-Toe and Connect
4. Finally, Section 6 summarizes our contributions and outlines directions for future work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        The majority of research that uses genetic programming (GP) in the context of board games searches for
heuristics for evaluating game states. These heuristics are evolved as mathematical functions for games such as Lose Checkers,
Reversi, or Seven Wonders [
        <xref ref-type="bibr" rid="ref10 ref11 ref12">10, 11, 12</xref>
        ]. However, searching for heuristics is undesirable, as a search
algorithm is still necessary. Besides, using heuristics in a search algorithm is not fully explainable and,
in most cases, cognitively implausible. Nevertheless, heuristics for evaluating game states are more
interpretable than a combination of deep learning and search algorithms, as in AlphaZero
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        Silver et al. learn logical programs which are extracted from decision trees for simple, game-related
tasks [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. In [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], an inductive logic programming algorithm is introduced that finds comprehensible
game strategies by providing the game rules as building blocks for logic programs for simple board
games. Currently, our method does not use game rules, as they can be derived from the state-action
pairs. However, without the game rules as building blocks, the learning problem for our method becomes more difficult.
      </p>
      <p>
        The use of programs for explainability in the context of game AI was introduced for a maze-runner
agent and two simplified Atari games: Space Invaders and Asterix [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. In contrast to generating
explanations, large language models are used to directly evolve programs as policies for a wide range
of digital games, as well as programs as heuristic functions for board games [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ].
      </p>
      <p>
        The closest work to ours is by Soemers et al. [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], in which game tactics are extracted from linear
models trained with self-play and converted into decision and regression trees. To be able to compare
game traces collected from human participants with those collected by black-box agents in future work,
we decided to generate post-hoc explanations from game traces without using the logits of trained
policies, i.e., the raw output predictions of trained machine learning models before turning them into
probabilities with the softmax function. Therefore, we only use state-action pairs collected from game
agents without logits of a trained oracle.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Program, Domain-specific Language &amp; Problem Domain</title>
      <p>
        Programs are represented by a typed domain-specific language (DSL) which closely resembles the
Lisp programming language [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. The search space is constructed by using a uniformly distributed
probabilistic grammar [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] shown in Table 1. Our DSL is inspired by the work about visual routines
[
        <xref ref-type="bibr" rid="ref13 ref5">5, 13</xref>
        ], which uses a fixed set of basic operations to assemble different perception functions to reason
about properties of shapes and spatial relations.
      </p>
      <table-wrap id="tab1">
        <label>Table 1</label>
        <caption>
          <p>The production rules of the domain-specific language.</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>Description</th>
              <th>Type</th>
            </tr>
          </thead>
          <tbody>
            <tr><td>integer values</td><td>int</td></tr>
            <tr><td>2D grid observation of the game board</td><td>board</td></tr>
            <tr><td>possible objects on the board</td><td>object</td></tr>
            <tr><td>an object on the board with coordinates</td><td>so (stateObject)</td></tr>
            <tr><td>standard if-clause</td><td>bool → func → func → func</td></tr>
            <tr><td>checks if two objects are equal</td><td>stateObject → object → bool</td></tr>
            <tr><td>cell object with x and y coordinate</td><td>int → int → cell</td></tr>
            <tr><td>get an object on the board</td><td>board → cell → so</td></tr>
            <tr><td>get the object type of a stateObject</td><td>so → object</td></tr>
            <tr><td>get a feature and its activation for a given object</td><td>board → so → object → feature</td></tr>
            <tr><td>negates a Boolean value</td><td>bool → bool</td></tr>
            <tr><td>conjunction of two Boolean values</td><td>bool → bool → bool</td></tr>
            <tr><td>disjunction of two Boolean values</td><td>bool → bool → bool</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <fig id="lst1">
        <label>Listing 1</label>
        <caption>
          <p>An example feature program generated from the DSL.</p>
        </caption>
        <preformat>(get-board-feature
  $1
  (get $1 (cell 2 0))
  (get-game-piece (get $1 (cell 1 0))))</preformat>
      </fig>
      <p>
        Since our focus is on board games with a grid-like
board and game pieces, the building blocks of our DSL consist of control flow production rules (if
clauses and Boolean operators), integers, and methods to get pieces on the board and compare them.
The primitive get-board-feature(board, stateObject, object) -&gt; feature gets one or
multiple cells and checks whether they are of the same object type as the third parameter. If they are, the
feature program is activated. A stateObject is always a cell of the current state of the board game,
while an object is always a type provided by the game or environment. The board feature can later
check whether this feature program is activated for a specific state, so board features are actually independent
of games and board states and, thus, transferable to other games in the future.
      </p>
      <p>Listing 1 shows an example feature program generated from the DSL. $1 is the input state of the
board game. This program is activated if the cells at positions (2,0) and (1,0) hold the same piece, so this
feature could be used in combination with another feature to infer whether position (0,0) leads to a win
or a loss of a game in Tic-Tac-Toe.</p>
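      <p>To make the semantics of Listing 1 concrete, the following minimal Python sketch (not the paper’s
implementation) interprets the example feature program; the board encoding as a 3×3 nested list of piece
strings and the helper names are our assumptions.</p>
      <preformat><![CDATA[
# Minimal sketch of the Listing 1 semantics. Assumption: the board is a 3x3
# nested list of the strings "X", "O", or "Empty"; (x, y) addresses column x, row y.

def get(board, cell):
    """Return the piece at the given (x, y) cell of the board."""
    x, y = cell
    return board[y][x]

def get_game_piece(state_object):
    """Return the object type of a state object (here: the piece string itself)."""
    return state_object

def get_board_feature(board, state_object, obj):
    """The feature is activated (returns 1) iff the given cell holds `obj`."""
    return 1 if state_object == obj else 0

# (get-board-feature $1 (get $1 (cell 2 0)) (get-game-piece (get $1 (cell 1 0))))
def listing_1_feature(board):
    return get_board_feature(
        board,
        get(board, (2, 0)),                  # piece at cell (2, 0)
        get_game_piece(get(board, (1, 0))),  # object type of the piece at (1, 0)
    )

board = [["Empty", "X", "X"],      # row y = 0 holds cells (0,0), (1,0), (2,0)
         ["Empty", "O", "Empty"],
         ["O", "Empty", "Empty"]]
print(listing_1_feature(board))    # prints 1: (2,0) and (1,0) both hold "X"
]]></preformat>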
      <p>Problem Domain To simplify the problem domain, we currently restrict ourselves to simple,
traditional board games that are fully observable, turn-based, two-player games, such as Tic-Tac-Toe
and Connect 4. These games allow us to verify that the created explanations are also correct, since the
decision-making processes of these games are relatively simple compared to modern games, which
depend on complex game mechanics.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Methodology</title>
      <sec id="sec-4-1">
        <title>4.1. Overview</title>
        <p>[Figure 1: Overview of the proposed approach. A deterministic agent plays in an environment; the
collected state-action pairs are passed to genetic programming; a decision tree filters the learned features
by Gini importance; the resulting feature activations are then used to explain state-action pairs with
post-hoc explanations.]</p>
        <p>The goal of this work is to generate post-hoc explanations for black-box board game agents. Figure 1
shows an overview of the proposed approach. First, we collect gameplay data by using an agent that
plays in an environment, in our case Tic-Tac-Toe or Connect 4. The collected state-action pairs are then
used to learn explanations with genetic programming in combination with a decision tree to select
important features. Once the important features have been learned, the state-action pairs can be used
to generate post-hoc explanations of the data by highlighting the activated features. Since the full decision
tree grows too large, we only show selected paths to describe the results in Table 2. In the
following, we explain the different parts of the method.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Deterministic Agent &amp; Data Collection</title>
        <p>To ensure better validation, we restricted our method to deterministic minimax agents, since their
decision-making is not based on stochasticity and is thus easier to interpret compared to MCTS agents or
neural networks. It is also possible to use the agent to validate the methodology with deterministic data,
since the same action is always taken for a given state.</p>
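        <p>As an illustration, the data collection can be organized as in the following Python sketch; the
environment and agent interfaces (reset, step, act, current_player) are hypothetical and not part of the
paper’s implementation.</p>
        <preformat><![CDATA[
# Sketch of collecting state-action pairs from a deterministic minimax agent.
# Because the agent is deterministic, a given state always maps to the same
# action, which makes the resulting dataset consistent and easy to validate.
def collect_pairs(env, minimax_agent, opponent, n_games=100):
    pairs = []
    for _ in range(n_games):
        state, done = env.reset(), False
        while not done:
            if env.current_player() == "X":    # the agent we want to explain
                action = minimax_agent.act(state)
                pairs.append((state, action))  # record only the agent's moves
            else:
                action = opponent.act(state)
            state, done = env.step(action)
    return pairs
]]></preformat>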
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Genetic Programming for Learning Features</title>
        <p>
          The implementation of our tree-based genetic programming algorithm is based on [
          <xref ref-type="bibr" rid="ref20 ref21">20, 21</xref>
          ]. To initialize
the population, we randomly sample trees with a specified return type. Then, we recursively sample
their subtrees for each parameter. To perform a mutation, we randomly select a node of a tree and
sample a new subtree matching the node’s return type. A one-point crossover is implemented, merging one
tree with a random subtree of another tree that has the same return type. Tournament selection is used
to select two candidates for crossover or one candidate for mutation from the population. For more
details on the genetic programming algorithm, please refer to [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ].
        </p>
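        <p>The following Python sketch (our illustration, not the authors’ implementation from [20, 21]) shows
these four operators on a toy fragment of the grammar in Table 1; the grammar encoding and the class
layout are assumptions.</p>
        <preformat><![CDATA[
# Sketch of typed tree-GP: each production maps a return type to
# (symbol, list of argument types), mirroring a fragment of Table 1.
import random

GRAMMAR = {
    "bool": [("eq-obj?", ["so", "object"]), ("not", ["bool"]),
             ("and", ["bool", "bool"]), ("or", ["bool", "bool"])],
    "so":     [("get", ["board", "cell"])],
    "cell":   [("cell", ["int", "int"])],
    "int":    [(str(i), []) for i in range(3)],
    "object": [("Player", []), ("Enemy", []), ("Empty", [])],
    "board":  [("$1", [])],   # the input state of the board game
}

class Node:
    def __init__(self, rtype, name, children):
        self.rtype, self.name, self.children = rtype, name, children

    def nodes(self):
        """All nodes of the subtree rooted at this node."""
        return [self] + [n for c in self.children for n in c.nodes()]

def sample_tree(rtype, depth=4):
    """Initialization: recursively sample a tree with the given return type."""
    options = GRAMMAR[rtype]
    if depth <= 1:  # near the depth limit, avoid self-recursive productions
        options = ([o for o in options if not o[1]]
                   or [o for o in options if rtype not in o[1]])
    name, arg_types = random.choice(options)
    return Node(rtype, name, [sample_tree(t, depth - 1) for t in arg_types])

def mutate(tree):
    """Mutation: replace a random node by a fresh subtree of the same type."""
    victim = random.choice(tree.nodes())
    fresh = sample_tree(victim.rtype)
    victim.name, victim.children = fresh.name, fresh.children
    return tree

def crossover(receiver, donor):
    """One-point crossover: graft a type-compatible subtree of `donor`."""
    point = random.choice(receiver.nodes())
    candidates = [n for n in donor.nodes() if n.rtype == point.rtype]
    if candidates:  # grafting by reference is enough for a sketch
        chosen = random.choice(candidates)
        point.name, point.children = chosen.name, chosen.children
    return receiver

def tournament(population, fitness, k=20):
    """Tournament selection: return the fittest of k random candidates."""
    return max(random.sample(population, k), key=fitness)
]]></preformat>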
        <p>The fitness function maximizes the number of times a feature is activated in the set of provided states S:</p>
        <disp-formula id="eq1">
          <tex-math><![CDATA[
f(\rho) = \begin{cases} \frac{\sum_{s \in S} \rho(s)}{|S|}, & \text{if } \rho \notin F, \\ 0, & \text{otherwise,} \end{cases} \tag{1}
          ]]></tex-math>
        </disp-formula>
        <p>where ρ is the currently evolved feature program, which returns 1 for a given state if the feature is
activated and 0 if it is not, and F is the feature program database. We also use a regularizer to prevent
the trees from growing too large. We implement a feature program database F that stores the feature
program with the highest fitness every n = 50 generations for a total of G generations. This results in
|F| = G/n feature programs in F.</p>
        <p>[Algorithm 1: Overview of the whole approach.]</p>
        <p>If an evolved feature program is already present in the feature database, then we always return 0
as the fitness. This ensures that these features will most likely not be selected for further evolution.
We keep evolving the same population after saving the best features in the feature database, so that we do
not have to re-initialize the population. This allows the GP algorithm to run as an open-ended evolution,
adding new features throughout all iterations. This is also possible thanks to the probabilistic grammar,
which defines programs through recursion.</p>
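        <p>A minimal Python sketch of the fitness in Eq. (1) together with the feature-database check could
look as follows; representing programs by their canonical string form, the program.size() helper, and the
exact form of the size regularizer are our assumptions.</p>
        <preformat><![CDATA[
# Sketch of Eq. (1): a feature program is a callable returning 0 or 1 per state.
def fitness(program, states, database, penalty_per_node=0.0):
    if str(program) in database:   # duplicate of a stored feature program:
        return 0.0                 # force fitness 0 so it is unlikely to be
                                   # selected for further evolution
    activation_rate = sum(program(s) for s in states) / len(states)
    # assumed regularizer: penalize tree size to prevent bloat;
    # size() is an assumed node-count method of the program tree
    return activation_rate - penalty_per_node * program.size()

# every n = 50 generations, the best program is archived in the database
def archive_best(population, states, database):
    best = max(population, key=lambda p: fitness(p, states, database))
    database[str(best)] = best
]]></preformat>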
        <p>A drawback of counting how often features occur in the states is that programs such as
(get-board-feature $1 (get $1 (cell 1 0)) (get-game-piece (get $1 (cell 1 0))))
have maximum fitness because they compare a cell with itself. However, these programs do not
contain any information for the decision-making process and are therefore filtered out in the next step.</p>
      </sec>
      <sec id="sec-4-4">
        <title>4.4. From Features To Explanations</title>
        <p>
          After collecting features for a predetermined number of generations G, we run all feature programs in F
on the given states S. Next, we train a decision tree D using a standard stochastic greedy decision tree
learner [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ] that maps the outputs of the feature programs to the corresponding actions for each state
in S. This decision tree represents the hierarchical structure of how features are connected. When we
calculate the feature importance I, i.e., the Gini importance, it becomes clear that only a small subset of
the found features F̃ ⊆ F is actually used to map the activated features to the actions. Therefore,
when training the decision tree, we filter the features and keep only those with an importance of I &gt; 0.
We train a second decision tree D̃ on the subset F̃ to obtain a concise mapping between the nodes in
the decision tree and the programs in F̃, since only the best subset F̃ of previously found programs
is used.
        </p>
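        <p>Using scikit-learn’s decision trees [22], this filtering step can be sketched as follows; the variable
names are ours and the procedure is a simplified reading of the text above.</p>
        <preformat><![CDATA[
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def filter_and_retrain(programs, states, actions, seed=0):
    # activation matrix: one row per state, one 0/1 column per feature program
    X = np.array([[p(s) for p in programs] for s in states])
    tree = DecisionTreeClassifier(random_state=seed).fit(X, actions)  # D
    keep = tree.feature_importances_ > 0                # Gini importance filter
    reduced = [p for p, k in zip(programs, keep) if k]  # the subset of F
    tree_reduced = DecisionTreeClassifier(random_state=seed).fit(
        X[:, keep], actions)                            # second tree on subset
    return reduced, tree_reduced
]]></preformat>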
        <p>We use the collected states as input to create post-hoc explanations and traverse the decision paths P̃
of the decision tree D̃ to make the activated or non-activated feature programs visible. Table 2 shows the
feature programs along selected decision paths, together with natural language explanations and their
activations. The pseudo code of this procedure is given in Algorithm 1.</p>
        <table-wrap id="tab2">
          <label>Table 2</label>
          <caption>
            <p>Feature programs along selected decision paths for Tic-Tac-Toe, with natural language explanations and their activation for the explained states.</p>
          </caption>
          <table>
            <thead>
              <tr>
                <th>Feature program</th>
                <th>Explanation</th>
                <th>Activated</th>
              </tr>
            </thead>
            <tbody>
              <tr><td>(get-board-feature $1 (get $1 (cell 1 0)) (if (eq-obj? (get $1 (cell 1 0)) Empty) Empty Enemy))</td><td>Is (1,0) occupied by enemy or empty?</td><td>No</td></tr>
              <tr><td>(get-board-feature $1 (get $1 (if (eq-obj? (get $1 (cell 2 0)) Enemy) (cell 1 0) (cell 2 0))) Empty)</td><td>If (2,0) is occupied by enemy, check if (1,0) is empty. Otherwise check if (2,0) is empty.</td><td>Yes</td></tr>
              <tr><td>(get-board-feature $1 (get $1 (cell 0 0)) Player)</td><td>Is (0,0) the player’s piece?</td><td>Yes</td></tr>
              <tr><td>(get-board-feature $1 (get $1 (cell 1 0)) (if (eq-obj? (get $1 (cell 1 0)) Empty) Empty Enemy))</td><td>Is (1,0) occupied by enemy or empty?</td><td>Yes</td></tr>
              <tr><td>(get-board-feature $1 (get $1 (cell 1 2)) (if (eq-obj? (get $1 (cell 0 2)) Empty) Empty (get-game-piece (get $1 (cell 1 2)))))</td><td>If (0,2) is empty, treat (1,2) as Empty. Is (0,2) not empty?</td><td>Yes</td></tr>
              <tr><td>(get-board-feature $1 (get $1 (if (eq-obj? (get $1 (cell 0 1)) Empty) (cell 0 1) (cell 2 1))) Empty)</td><td>Is (0,1) empty? If not, is (2,1) empty? (checks if either is)</td><td>No</td></tr>
              <tr><td>(get-board-feature $1 (get $1 (cell 0 2)) Empty)</td><td>Is (0,2) empty?</td><td>Yes</td></tr>
            </tbody>
          </table>
        </table-wrap>
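        <p>With scikit-learn, traversing a decision path to surface the activated and non-activated feature
programs can be sketched as follows, continuing the previous sketch; rendering the answer as “Yes”/“No”
mirrors the last column of Table 2.</p>
        <preformat><![CDATA[
import numpy as np

def explain_state(tree_reduced, reduced, x):
    """Print the feature-program checks along the decision path of one state.
    `x` is the 0/1 numpy activation vector of the state under `reduced`."""
    node_indicator = tree_reduced.decision_path(x.reshape(1, -1))
    for node in node_indicator.indices[:-1]:      # inner nodes; skip the leaf
        feat = tree_reduced.tree_.feature[node]   # tested feature program
        answer = "Yes" if x[feat] > tree_reduced.tree_.threshold[node] else "No"
        print(f"feature program {feat} activated: {answer}")
]]></preformat>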
        <p>The following case studies provide examples of these types of explanations. Natural language
explanations for these programs are subsequently generated with GPT-4o [<xref ref-type="bibr" rid="ref23">23</xref>].</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results &amp; Discussion</title>
      <sec id="sec-5-1">
        <title>5.1. Case Study: Tic-Tac-Toe Explanations</title>
        <p>To demonstrate the effectiveness of the proposed approach, we selected Tic-Tac-Toe as the test
environment, because its simple game logic enables us to easily interpret the actions chosen by the minimax
agent. When two minimax agents play against each other, the game always ends in a tie. Therefore, we
collected data by having a random agent play against the minimax agent. In total, both agents played
100 games, which resulted in 319 state-action pairs. We only collected data after the minimax agent
made a move with the X symbol, because we only wanted to explain the minimax agent’s decisions.</p>
        <p>Next, we ran the genetic programming (GP) algorithm on the states for G = 50,000 generations,
maintaining a population size of 200 and a tournament selection size of 20. Every n = 50 generations,
the best feature program in the population was saved in the feature database. After the GP algorithm
completed, there were a total of |F| = 1000 feature programs in the database. After filtering the features
by the calculated feature importance score of the decision tree, only |F̃| = 21 feature programs remain
for explaining the decision-making process. Table 2 shows examples of the reasoning process and
provides an explanation of the programs in natural language, as well as a short, concise description of
the reasoning process in the caption.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Case Study: Connect 4</title>
        <p>We also ran experiments on the game Connect 4. The data collection process is similar to that of
Tic-Tac-Toe. However, we do not let a minimax agent play against a random agent, as most games end after
the minimax agent has played four pieces. Instead, we let two minimax agents with different search depths play
against each other in order to collect data, which resulted in 439 state-action pairs. For the GP algorithm,
we used G = 100,000 generations, maintaining the same population and tournament size as before.
This resulted in |F| = 2000 feature programs. After training the decision tree and filtering the
programs by Gini importance, |F̃| = 379 remained. We created post-hoc explanations for a few
example states, but the decision tree and paths were too large and incomprehensible. Upon examining
the important features, we noticed that many of them only considered one or two cells, which was
sufficient for Tic-Tac-Toe since that game’s complexity is much smaller. We identified multiple key issues
that need improvement for learning features with genetic programming:
• Currently, the fitness function only counts how often a feature appears in the states. To reduce the
number of necessary features, the number of cells a feature looks at should also be maximized
(see the sketch after this list).
• The DSL should be adapted to include more high-level functions that are also plausible from a
cognitive science viewpoint, such as counting pieces.</p>
        <p>• Including the game rules in the DSL could also lead to better features and shorter runtimes.</p>
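        <p>As a first step towards the fitness change suggested in the list above, a hypothetical variant of
Eq. (1) could additionally reward the number of distinct cells a program inspects; the helper
cells_inspected, which would walk the program tree and collect (cell x y) terminals, is ours and not part
of the current implementation.</p>
        <preformat><![CDATA[
# Hypothetical Connect 4 fitness: reward features that inspect more cells.
def fitness_v2(program, states, database, alpha=0.1):
    if str(program) in database:       # duplicate check as in Eq. (1)
        return 0.0
    activation_rate = sum(program(s) for s in states) / len(states)
    # cells_inspected(program): assumed helper returning the set of (x, y)
    # cells referenced by the program tree
    return activation_rate + alpha * len(cells_inspected(program))
]]></preformat>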
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>
        In this work, we introduced an approach and showed preliminary results for creating cognitive-plausible
explanations of board game agents. Our approach is based on visual routines [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and has a hierarchical,
pattern-based structure that resembles a more human-intuitive way of playing games [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ], which we
also discussed in terms of sufficiency and necessity. We showed the branching of a decision path for explanations
of two collected state-action pairs in the game Tic-Tac-Toe and outlined how to scale the approach to
more complex games.
      </p>
    </sec>
    <sec id="sec-7">
      <title>7. Acknowledgements</title>
      <p>The research collaboration was facilitated by COST Action CA22145 - GameTable, supported by COST
(European Cooperation in Science and Technology).</p>
    </sec>
    <sec id="sec-8">
      <title>8. Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used ChatGPT in order to improve formulation,
grammar, and readability. After using this tool/service, the author(s) reviewed and edited the content as
needed and take(s) full responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>D.</given-names>
            <surname>Silver</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Hubert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Schrittwieser</surname>
          </string-name>
          , I. Antonoglou,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Guez</surname>
          </string-name>
          et al.,
          <article-title>A general reinforcement learning algorithm that masters chess, shogi, and go through self-play</article-title>
          ,
          <source>Science</source>
          <volume>362</volume>
          (
          <year>2018</year>
          )
          <fpage>1140</fpage>
          -
          <lpage>1144</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <collab>Meta Fundamental AI Research Diplomacy Team (FAIR)</collab>
          , A. Bakhtin,
          <string-name>
            <given-names>N.</given-names>
            <surname>Brown</surname>
          </string-name>
          , E. Dinan, G. Farina,
          <string-name>
            <given-names>C.</given-names>
            <surname>Flaherty</surname>
          </string-name>
          et al.,
          <article-title>Human-level play in the game of Diplomacy by combining language models with strategic reasoning</article-title>
          ,
          <source>Science</source>
          <volume>378</volume>
          (
          <year>2022</year>
          ). doi:10.1126/science.ade9097.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Mańdziuk</surname>
          </string-name>
          ,
          <article-title>Towards cognitively plausible game playing systems</article-title>
          ,
          <source>IEEE Computational Intelligence Magazine</source>
          <volume>6</volume>
          (
          <year>2011</year>
          ). doi:10.1109/MCI.2011.940626.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Mańdziuk</surname>
          </string-name>
          ,
          <article-title>Human-like intuitive playing in board games</article-title>
          ,
          <source>in: Neural Information Processing: 19th International Conference, ICONIP</source>
          <year>2012</year>
          ,
          <year>2012</year>
          , Proceedings,
          <source>Part II 19</source>
          , Springer,
          <year>2012</year>
          , pp.
          <fpage>282</fpage>
          -
          <lpage>289</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ullman</surname>
          </string-name>
          , Visual routines,
          <source>Cognition</source>
          <volume>18</volume>
          (
          <year>1984</year>
          )
          <fpage>97</fpage>
          -
          <lpage>159</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M. C.</given-names>
            <surname>Corballis</surname>
          </string-name>
          ,
          <article-title>The recursive mind: The origins of human language, thought, and civilization (updated edition)</article-title>
          , Princeton University Press,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , M. Thielscher,
          <article-title>Representing and reasoning about game strategies</article-title>
          ,
          <source>Journal of Philosophical Logic</source>
          <volume>44</volume>
          (
          <year>2015</year>
          )
          <fpage>203</fpage>
          -
          <lpage>236</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>E.</given-names>
            <surname>Hüllermeier</surname>
          </string-name>
          ,
          <article-title>Towards analogy-based explanations in machine learning</article-title>
          ,
          <source>in: International Conference on Modeling Decisions for Artificial Intelligence</source>
          , Springer,
          <year>2020</year>
          , pp.
          <fpage>205</fpage>
          -
          <lpage>217</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Fürnkranz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Kliegr</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Paulheim</surname>
          </string-name>
          ,
          <article-title>On cognitive preferences and the plausibility of rule-based models</article-title>
          ,
          <source>Machine Learning</source>
          <volume>109</volume>
          (
          <year>2020</year>
          )
          <fpage>853</fpage>
          -
          <lpage>898</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Benbassat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sipper</surname>
          </string-name>
          ,
          <article-title>Evolving lose-checkers players using genetic programming</article-title>
          ,
          <source>in: Proceedings of the 2010 IEEE Conference on Computational Intelligence and Games</source>
          , IEEE,
          <year>2010</year>
          , pp.
          <fpage>30</fpage>
          -
          <lpage>37</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Benbassat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sipper</surname>
          </string-name>
          ,
          <article-title>Evolving board-game players with genetic programming</article-title>
          ,
          <source>in: Proc. of the 13th annual conference companion on Genetic and evolutionary computation</source>
          ,
          <year>2011</year>
          , pp.
          <fpage>739</fpage>
          -
          <lpage>742</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>D.</given-names>
            <surname>Robilliard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Fonlupt</surname>
          </string-name>
          ,
          <article-title>Towards human-competitive game playing for complex board games with genetic programming</article-title>
          ,
          <source>in: Artificial Evolution: 12th International Conference</source>
          , Evolution Artificielle,
          <string-name>
            <surname>EA</surname>
          </string-name>
          <year>2015</year>
          , Lyon, France,
          <source>October 26-28</source>
          ,
          <year>2015</year>
          .
          <source>Revised Selected Papers 12</source>
          , Springer,
          <year>2016</year>
          , pp.
          <fpage>123</fpage>
          -
          <lpage>135</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>T.</given-names>
            <surname>Silver</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. R.</given-names>
            <surname>Allen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Lew</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. P.</given-names>
            <surname>Kaelbling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tenenbaum</surname>
          </string-name>
          ,
          <article-title>Few-shot bayesian imitation learning with logical program policies</article-title>
          ,
          <source>in: Proceedings of the AAAI Conference on Artificial Intelligence</source>
          , volume
          <volume>34</volume>
          ,
          <year>2020</year>
          , pp.
          <fpage>10251</fpage>
          -
          <lpage>10258</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>S. H.</given-names>
            <surname>Muggleton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Hocquette</surname>
          </string-name>
          ,
          <article-title>Machine discovery of comprehensible strategies for simple games using meta-interpretive learning</article-title>
          ,
          <source>New Generation Computing</source>
          <volume>37</volume>
          (
          <year>2019</year>
          )
          <fpage>203</fpage>
          -
          <lpage>217</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>M.</given-names>
            <surname>Eberhardinger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Maucher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Maghsudi</surname>
          </string-name>
          ,
          <article-title>Learning of generalizable and interpretable knowledge in grid-based reinforcement learning environments</article-title>
          ,
          <source>in: Proc. of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment</source>
          , volume
          <volume>19</volume>
          ,
          <year>2023</year>
          , pp.
          <fpage>203</fpage>
          -
          <lpage>214</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>M.</given-names>
            <surname>Eberhardinger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Goodman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dockhorn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Perez-Liebana</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. D.</given-names>
            <surname>Gaina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Çakmak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Maghsudi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lucas</surname>
          </string-name>
          ,
          <article-title>From code to play: Benchmarking program search for games using large language models</article-title>
          ,
          <source>IEEE Transactions on Games</source>
          (
          <year>2025</year>
          )
          <fpage>1</fpage>
          -
          <lpage>13</lpage>
          . doi:10.1109/TG.2025.3614499.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>D. J. N. J.</given-names>
            <surname>Soemers</surname>
          </string-name>
          , S. Samothrakis, É. Piette,
          <string-name>
            <given-names>M.</given-names>
            <surname>Stephenson</surname>
          </string-name>
          ,
          <article-title>Extracting tactics learned from self-play in general games</article-title>
          ,
          <source>Information Sciences 624</source>
          (
          <year>2023</year>
          )
          <fpage>277</fpage>
          -
          <lpage>298</lpage>
          . doi:10.1016/j.ins.2022.12.080.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>J.</given-names>
            <surname>McCarthy</surname>
          </string-name>
          ,
          <article-title>Recursive functions of symbolic expressions and their computation by machine, Part I</article-title>
          ,
          <source>Communications of the ACM</source>
          <volume>3</volume>
          (
          <year>1960</year>
          )
          <fpage>184</fpage>
          -
          <lpage>195</lpage>
          . doi:10.1145/367177.367199.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>C.</given-names>
            <surname>Manning</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Schutze</surname>
          </string-name>
          ,
          <source>Foundations of Statistical Natural Language Processing</source>
          , MIT Press,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>M.</given-names>
            <surname>Eberhardinger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Rupp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Maucher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Maghsudi</surname>
          </string-name>
          ,
          <article-title>Unveiling the decision-making process in reinforcement learning with genetic programming</article-title>
          ,
          <source>in: Advances in Swarm Intelligence</source>
          , Springer Nature Singapore, Singapore,
          <year>2024</year>
          , pp.
          <fpage>349</fpage>
          -
          <lpage>365</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>M. P.</given-names>
            <surname>Johnson</surname>
          </string-name>
          , P. Maes, T. Darrell, Evolving visual routines,
          <source>Artificial Life 1</source>
          (
          <year>1994</year>
          )
          <fpage>373</fpage>
          -
          <lpage>389</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>F.</given-names>
            <surname>Pedregosa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Varoquaux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gramfort</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Michel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Thirion</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Grisel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Blondel</surname>
          </string-name>
          , P. Prettenhofer, R. Weiss, V. Dubourg, et al.,
          <article-title>Scikit-learn: Machine learning in Python</article-title>
          ,
          <source>Journal of Machine Learning Research</source>
          <volume>12</volume>
          (
          <year>2011</year>
          )
          <fpage>2825</fpage>
          -
          <lpage>2830</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hurst</surname>
          </string-name>
          , A. Lerer, A. P. Goucher, A. Perelman, A. Ramesh, A. Clark, A. Ostrow, A. Welihinda, A. Hayes, A. Radford, et al.,
          <article-title>GPT-4o system card</article-title>
          , arXiv preprint arXiv:2410.21276 (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>