Towards Explainable Decision Making with Neural Program Synthesis and Library Learning

Manuel Eberhardinger1,2,∗, Johannes Maucher1 and Setareh Maghsudi2
1 Hochschule der Medien Stuttgart, Germany
2 University of Tuebingen, Germany

NeSy 2023, 17th International Workshop on Neural-Symbolic Learning and Reasoning, Certosa di Pontignano, Siena, Italy
∗ Corresponding author: eberhardinger@hdm-stuttgart.de (M. Eberhardinger); maucher@hdm-stuttgart.de (J. Maucher); setareh.maghsudi@uni-tuebingen.de (S. Maghsudi)
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract
Generating explanations for the behavior of artificial or living agents is still an underexplored problem. We propose a new framework for making agent behavior explainable based on program synthesis. Programs have the advantage that they are inherently interpretable and can be verified for correctness. In detail, we use neural program synthesis with a language model in combination with library learning. In addition, we introduce a domain-agnostic curriculum that is necessary for the system to learn at all. We compare our method with traditional and state-of-the-art program synthesis systems and justify the necessity of the curriculum and the library learning module in ablation studies, followed by an analysis of the extracted libraries.

Keywords
Neural Program Synthesis, Library Learning, Imitation Learning, Explainable Reinforcement Learning

1. Introduction

Humans can easily explain other agents' behavior, living or artificial, after observing a single demonstration. However, generating post-hoc explanations in reinforcement learning (RL), after seeing an agent interact in an environment, is still an underexplored problem. Moreover, it is unclear how to produce an informative explanation that helps to understand the agent's reasoning for selecting a specific action in a particular state. In this work, we propose using program synthesis to create post-hoc explanations of the agent's behavior after seeing a trajectory of its action sequence. We argue that programs are inherently interpretable and are therefore a good choice for generating explanations of the agent's decision making process. This is possible by constructing a program call graph, which can be traversed to follow the "reasoning process" of the agent.

Furthermore, we propose a new neural program synthesis framework for RL. This framework combines code generation with a language model and library learning, guided by a curriculum. Programs are generated by imitating state-action pairs, which serve as input-output examples for the inductive program synthesis task. Library learning extracts functions from programs synthesized for previously solved tasks. The use of library learning enables a more detailed analysis of the extracted functions and thus a better understanding of the agent's behavior. The curriculum is domain-agnostic and works in principle in all environments, as it is based only on the length of the action sequences to imitate. This curriculum is necessary for the system to build up the functions in the library; without it, the system is not able to solve tasks at all.
To the best of our knowledge, this is the first work that proposes to combine neural program synthesis with library learning for reasoning about the decision making process in reinforcement learning environments. Our contributions include:

• Introducing a framework for learning reusable and interpretable knowledge that can reason about agent behavior in reinforcement learning environments
• A case study on data collected from a partially observable maze environment (see the environment on the top left in Figure 2)
• A comparison of different program synthesis algorithms, including enumerative search, neural-guided enumerative search, and a fine-tuned language model with and without library learning
• An analysis of the extracted functions in the generated libraries

2. Related Work

Program Synthesis and Library Learning Program synthesis has a long history in the artificial intelligence research community [1, 2]. In recent years, many researchers have combined deep learning with program synthesis to make program search more feasible by reducing or guiding the search space [3, 4, 5, 6]. In contrast to heuristic-based search algorithms, one can also use language models to synthesize programs from text prompts [7, 8, 9, 10, 11]. Another promising direction is learning a library of functions from previously solved problems. These functions are then reusable in an updated domain-specific language to solve more challenging problems [12, 13, 14, 15, 16].

Explainable Reinforcement Learning There exists a variety of methods in the explainable reinforcement learning (XRL) domain. In a recent comprehensive survey [17], the authors divide XRL into four explainable categories: model, reward, state and task. Programmatic policies, where a policy is represented by a program, are part of the model-based explanations [18, 19, 20, 21, 22]. Other works in this category synthesize finite state machines to represent policies [23] or use models based on decision trees [24, 25]. Our method belongs to the same category since we explain sub-trajectories of policies, and our main goal in the future is to extract a program that can represent the full policy.

3. Background

Our work is based on several research topics, which we combine to learn structured and reusable knowledge in grid-based reinforcement learning environments. In this section, we provide a brief introduction to those topics.

Neural Program Synthesis One of the core components of this work is the neural program synthesizer. We use CodeT5, a T5 model [26] fine-tuned on multiple programming languages and code-related tasks. In [26], Raffel et al. introduced the T5 model with an encoder-decoder architecture and unified different natural language processing (NLP) tasks by converting them into a single text-to-text format. That allows the authors to treat every problem in one way, i.e., using text as input and producing text as output. In this work, CodeT5 is further trained on Lisp programs to synthesize programs in the provided domain-specific language: the agent's observation is converted into a text prompt, and the model outputs programs as text.

Library Learning The goal of library learning is to build a library of specialized concepts in a particular domain that allows programs to be expressed concisely. That is similar to software engineers using open source libraries to improve their programming efficiency, since someone else has already implemented the needed concepts for a specific domain.
We use the DreamCoder [13] library learning module to extract functions from solved tasks by analyzing the synthesized programs. These programs are refactored to minimize their description length while growing the library. Instead of only extracting syntactic structures from programs, DreamCoder refactors programs to find recurring semantic patterns in them [13].

ACCEL ACCEL is a training framework for RL agents that improves their generalization capabilities using Unsupervised Environment Design (UED) [27, 28]. UED generates a distribution of valid environments at the frontier of the agent's capabilities by providing a curriculum for the agent. Dennis et al. use an additional agent as the environment designer to increase the difficulty of the environment after each episode [27]. ACCEL improves on this method by using an evolutionary approach to adapt the current environments [28]. In contrast to UED, our curriculum does not depend on another agent or method.

4. Problem Formulation

Our goal is to make the agent's decision making process interpretable by imitating trajectories collected from black-box neural network policies with programmatic policies. Ultimately, our main goal is to extract programs that can explain decisions and solve the environment. Therefore, we intend to lay the foundation for a complete policy extraction algorithm in this paper.

Program and Domain-specific Language This work considers programs defined in a typed domain-specific language (DSL) which is based on the Lisp programming language [29]. The main components of the DSL are control flow, the actions the agent can use, and modules to perceive the agent's environment. Since we work with grid environments, the agent's perception consists of modules that determine certain positions on the grid and compare them with the objects available in the environment, such as walls or empty cells. The control flow includes if-else statements and Boolean operators to formulate more complex conditions. The full DSL is included in Appendix B.

Figure 1: An overview of the overall architecture for creating explanations for a given reinforcement learning environment. The framework can be decomposed into three components that are executed iteratively and are guided by a curriculum. We describe the method in detail in Section 5.

Imitation Learning To simplify the problem, we limit ourselves to imitation learning [30] instead of directly finding programs from rewards. Although directly finding programs from rewards is an interesting challenge for future research, our main objective is to make the agent's behavior interpretable. To achieve this, we want to build a library of functions that allows us to draw conclusions about the knowledge the black-box agent acquired during the training process. We define the problem we want to solve as the imitation of sub-trajectories of state-action pairs collected from a previously trained agent.

5. Method

Figure 1 shows a high-level overview of our approach. The framework consists of three components and a curriculum. In addition, we need an oracle for collecting the data to be imitated.

Training Component The first part is the training process of the CodeT5 model [9] with randomly generated programs from the current DSL.
The randomly generated programs are executed in a given environment for t steps to collect input-output examples, i.e., sequences of state-action pairs to be imitated, as a training dataset. t is chosen randomly for each program, so we do not overfit to a specific sequence length. In our setup, we generate 50000 random programs. We then execute them in a randomly selected environment from Figure 2 to collect data for imitation. We execute each program with a random sequence length t between 5 and 50. The programs do not have a specific target or reward since they are sampled from the DSL. Our goal in creating the training dataset is to capture the behavior of programs in a specific RL domain, i.e., how given programs control the agent in that domain. Random program generation is limited to a maximum abstract syntax tree depth of six. We train the model for five epochs in each iteration. CodeT5 is used without any modifications, as we generate text prompts from the state-action pairs, which the model maps to the random programs. The agent's partial observation, a 2D array of integers, is converted into a string representation, where each integer represents an object in the environment, such as a wall or the goal position. The action is then appended after the observation. We explain the text prompt generation in Appendix C.

Figure 2: Different environments used for generating the training data. The evaluation is always performed on the medium-sized perfect maze environment which is placed on the top left.

There are still many possible improvements to the generation of the training data. Currently, we only create one rollout of each program on a single environment. We also do not check whether programs are semantically interesting, e.g., whether an if clause has a Boolean condition that is a tautology, leading to a program that always evaluates the same branch.

Program Synthesis Component The second component converts the data collected from an oracle into the representation the model expects for the current action sequence length, and the model then synthesizes Ρ programs for the state-action sequences to be imitated. In our setup, 100 programs are synthesized for each text prompt, which are then passed to the symbolic part of the framework.

Symbolic Component In this component, the filter module evaluates the Ρ programs for syntactic and functional correctness in the provided DSL, i.e., whether they imitate the state-action sequence. The library learning module uses the correct programs to grow a library by extracting functions from the found programs and adding them to the current DSL. It extracts a function only if the corresponding program fragment occurs multiple times across the programs synthesized for the oracle data. That way, the extracted functions are beneficial for the DSL since they have been synthesized several times for different state-action sequences.

Curriculum The curriculum is based only on the action sequence length and is therefore domain-agnostic. We start with an initial sequence length of five and increment it at the end of each iteration, after the library learning module has completed the extraction phase. We repeat all three steps until no more improvement is observable and no more functions are extracted from solved tasks. Because the library is updated in each iteration, we always sample new random programs from the updated DSL and run them in the environment, so the training data represents increasingly diverse programs.
In future work, we will design a more sophisticated curriculum that also takes into account the number of functions extracted in each iteration or the percentage of the oracle data that is imitated. The curriculum strategy is based on the assumption that longer sequence lengths are more complex than shorter ones. Programs that need to imitate three actions do not need to represent as much information as programs that imitate five actions; thus, they tend to be shorter. Shorter programs are easier to synthesize than long ones because of the smaller search space. After building up a library of more complex functions, the model can also synthesize programs for longer sequence lengths. In the experiments, we validated that our method cannot build up a library without using a curriculum (see Appendix E). For a better understanding of the whole process, Algorithm 1 in the Appendix summarizes all of the steps.

The Oracle We use the grid-world domain [31] for evaluation and restrict the problem space to navigation tasks. We trained the ACCEL agent with the default hyperparameters from [28]. We then collected state-action pairs from the medium-sized perfect maze environment displayed on the top left in Figure 2. We chose the ACCEL agent as the oracle because it shows strong zero-shot generalization to a wide range of tasks [28] and can therefore be used to collect data for different problems without retraining an agent from scratch. In this work, we only conduct one case study on the perfect maze environment. We plan to evaluate our method on multiple tasks and domains in the future to validate that it is domain-agnostic.

6. Experiments

We perform different ablation studies to justify the need for a language model as the program synthesizer and for a curriculum. In Section 6.2, the method for generating explanations from programs is introduced. Finally, we perform a thorough analysis of the extracted functions in the library.

6.1. Evaluation

We compare our approach to different program synthesis methods and justify the need for a language model as a program synthesizer and a library learning module with the following baselines and ablation studies:

• Search: Program synthesis with a top-down enumerative search algorithm. We use the implementation from [13].
• DreamCoder: A neural-guided search algorithm with a library learning module [13].
• CodeT5: A language model fine-tuned on Lisp programs on our data [9].
• LibT5: Our method, which combines a CodeT5 model with library learning.

For the final evaluation, we use data collected from the same agent but on different runs to ensure that we do not evaluate and train on the same data. Additionally, we justify the curriculum by comparing our method to a setup without a curriculum. The performance is measured by

\[
\text{Accuracy} = \frac{1}{N} \sum_{\tau \in D} f(\mathrm{P}, \tau),
\qquad
f(\mathrm{P}, \tau) =
\begin{cases}
1, & \text{if } \sum_{\rho \in \mathrm{P}} g(\rho, \tau) > 0 \\
0, & \text{otherwise}
\end{cases}
\qquad
g(\rho, \tau) = \mathbb{1}\{\, \text{EXEC}(\rho, s) = a \;\; \forall (s, a) \in \tau \,\}
\]

where N is the size of the dataset D to imitate, τ is a sub-trajectory from D that consists of state-action pairs (s, a), and Ρ is the set of all programs synthesized by a given method. f(Ρ, τ) checks whether any synthesized program ρ ∈ Ρ is correct. g(ρ, τ) evaluates whether a given program ρ can imitate the full rollout τ and returns 1 if this is the case and 0 otherwise; it becomes 0 as soon as EXEC(ρ, s) ≠ a for the first pair (s, a) in τ. EXEC(ρ, s) executes the program on a given state s and returns an action a. The indicator function 𝟙 maps this condition to 0 or 1.
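For concreteness, the metric can be computed as in the following minimal sketch; exec_program is a hypothetical stand-in for EXEC and not part of the released code.

def imitates(program, trajectory, exec_program):
    # g(rho, tau): 1 if the program reproduces every action in the sub-trajectory,
    # 0 as soon as one predicted action deviates from the oracle action.
    return int(all(exec_program(program, state) == action for state, action in trajectory))

def accuracy(dataset, candidates, exec_program):
    # dataset: list of sub-trajectories tau, each a list of (state, action) pairs.
    # candidates: for each sub-trajectory, the set P of programs synthesized for it.
    solved = 0
    for tau, programs in zip(dataset, candidates):
        # f(P, tau): the task counts as solved if any synthesized program imitates it.
        if any(imitates(rho, tau, exec_program) for rho in programs):
            solved += 1
    return solved / len(dataset)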
Fairness of evaluation Considering the fundamental differences between the algorithms, a fair comparison can be challenging. We describe the hardware resources used and the main distinctions of the experimental setup in more detail in Appendix D.

Results Figure 3 shows the final evaluation on newly collected data from the same environment that was used to extract functions from the programs found for the solved test tasks. Our method imitates the most action sequences from the test data, including the longest ones. CodeT5 without a library learning module struggles to imitate longer action sequences but still outperforms enumerative search and the DreamCoder system by a large margin. That also shows the importance of the library learning module in addition to a language model for synthesizing programs.

The performance of DreamCoder and the enumerative search algorithm was similar, although we experimented with different types of encoders and hyperparameters for the neural-guided search. That was also observed by Banburski et al., who applied DreamCoder to the Abstraction and Reasoning Corpus [32]. When inspecting the found programs, it is also noticeable that both search methods do not find programs that represent the agent's decision process. The search methods always focus on the (0, 0) grid position; in contrast, the language models often look at the position in front of the agent. That is also reflected in the libraries of the search-based methods, as no extracted function uses grid positions other than (0, 0), which indicates that only programs checking the (0, 0) position were found. The libraries are shown in Figure 9 and Figure 10 in the Appendix; the functions in question are f2 and f0, and f0 and f3, respectively.

It is also evident from Figure 3 that a system cannot imitate complete action sequences without a curriculum, as our system currently imitates sequence lengths of up to 30, whereas complete trajectories are up to 100 steps long for this environment. We include the full ablation study in Appendix E.

Figure 3 (plot of accuracy over sequence length for DreamCoder, Enumerative Search, CodeT5 and LibT5): The evaluation of the different methods after no new functions were added to the library. The evaluation data was collected on new rollouts of the ACCEL agent on the perfect maze environment.

6.2. Visualization of the Decision Making Process

Since programs are inherently interpretable, we developed a method to visualize the agent's decision-making process by highlighting those grid positions responsible for choosing a particular action. Since one position is not always sufficient to select the correct action, we create step-by-step explanations of the "reasoning process" by traversing the program call graph [33] and logging all function calls and their parameters.

(lambda (lambda (#(lambda (lambda (#(lambda (lambda (if (not (eq-obj? $0 $1)) forward-action left-action))) (#(lambda (get $0 2 1)) $0) (get $0 3 $1)))) 2 $1)))

Figure 4: Left: The program for a given sub-trajectory. Right: The decision-making process when executing the program on the state-action sequence. We show explanations for the first two states of the sub-trajectory. The grid positions that are checked in the program are yellow. The agent's position is marked blue and faces to the right. Grey and black indicate walls and empty cells, respectively. The forward action moves the agent one grid cell to the right; the left action only turns the agent to the left without moving it.
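A minimal sketch of this logging step, assuming the synthesized program is executed through a small interpreter whose primitives can be wrapped; compile_program and the primitive dictionary are hypothetical helpers, not the released framework.

def make_logged_get(trace):
    # Wrap the DSL perception primitive get (map -> int -> int -> object) so that
    # every grid position a program inspects is recorded while it runs. The
    # recorded cells are the ones highlighted in yellow in the visualizations.
    def get(obs, x, y):
        obj = obs[x][y]
        trace.append((x, y, obj))  # log the call and its parameters
        return obj
    return get

# Usage sketch (hypothetical helpers):
# trace = []
# primitives = {"get": make_logged_get(trace)}  # plus the remaining DSL primitives
# action = compile_program(program_text, primitives)(observation, direction)
# checked_cells = list(trace)  # grid positions responsible for this action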
Figure 4-left shows a synthesized program for a given sub-trajectory. Numbers with a leading dollar sign are variables, which are input parameters of a function. Each lambda in a synthesized program corresponds to a function input parameter, so two consecutive lambdas represent two input parameters. The number sign with brackets #(...) denotes extracted functions discovered in a previous iteration. On the right side, we demonstrate two examples of the reasoning process by highlighting the responsible grid cells in yellow. The agent's position is in blue, which is the same in all visualizations because the agent's partial observation is aligned to the same direction. The walls are gray, and the path through the maze is black. The program finds two objects on the map and compares them. In the first row of the pictures, two empty cells are compared, and therefore the agent chooses the left action; the left action only turns the agent 90° to the left without moving it forward. In the second row, the agent compares a wall object with an empty cell and decides to go forward, i.e., one step to the right. In Appendix F, we show more found programs and their visual explanations of the decision-making process.

Figure 5: Examples of the extracted functions from LibT5 that are almost identical semantically but different syntactically. The functions f49, f113 and f45 are semantically identical. These function definitions are based on the Lisp language [29].

6.3. Inspecting the Program Library

In this section, we analyze the libraries extracted by the evaluated methods. Appendix G includes the full libraries. Table 1 shows the number of extracted functions for the different program synthesis methods. Our method extracts 114 functions after training, compared to the three and seven functions extracted by the search-based methods.

The use of a language model for program search raises a new problem similar to one previously addressed in Inductive Logic Programming, where Cropper analyzed what the ideal library size is and how to forget unnecessary programs in the library [34]. This is also necessary in our case, as we assume that LibT5 synthesizes many programs that are semantically the same but differ syntactically. As a consequence, the library learning module extracts many similar functions and adds them to the library. Figure 5 shows some example functions with the same functionality, i.e., getting the value at a grid position and comparing it to a wall object; the functions f49, f113, and f45 are semantically identical. A similar issue is addressed in the AlphaCode system, which clusters synthesized programs before selecting solutions for submission to programming competitions [7]. For enumerative search algorithms, on the other hand, it is harder to synthesize many semantically identical programs, since the search space is limited to local changes and the search stops upon finding the k most likely correct programs. We use k = 5, similar to the DreamCoder hyperparameters. Depending on the problem one wants to solve, this is advantageous, as it avoids the issue of many semantically similar functions. In our case study, however, we need a neural network to find more complex programs in the first place, which also introduces the problem above.
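A natural first step toward reducing these duplicates, not implemented in this work, is behavioral deduplication in the spirit of the AlphaCode clustering mentioned above: group candidate programs by the actions they produce on the observed states and keep one representative per group. A minimal sketch, with exec_program again a hypothetical execution helper:

from collections import defaultdict

def dedupe_by_behavior(programs, states, exec_program):
    # Programs that choose the same action in every observed state are treated as
    # semantically identical; keeping one representative per behavior (here the
    # shortest program text) would reduce near-duplicate functions in the library.
    clusters = defaultdict(list)
    for rho in programs:
        signature = tuple(exec_program(rho, s) for s in states)
        clusters[signature].append(rho)
    return [min(group, key=len) for group in clusters.values()]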
To mitigate this, we need to improve how the semantic benefit of extracted functions is evaluated before adding them to the library, since we currently use the DreamCoder library learning module without modifications.

Table 1
Number of extracted functions for the different program synthesis methods.

Method               Enumerative Search   Neural-guided Search   LibT5
Number of functions  3                    7                      114

6.4. Discussion & Limitations

In our experiments, LibT5 could learn a library of functions for a navigation task with a discrete state and action space. We could also imitate sub-trajectories of policy rollouts up to a length of 30. By traversing the program call graph of synthesized programs, we created visual explanations of the agent's decision-making process. We concluded our experiments by analyzing the generated libraries for the given domain and showed that our method extracts 107 more functions than DreamCoder.

While LibT5 showed promising performance for grid-based environments with a small observation space, we still have to evaluate its fitness for larger observation spaces. Currently, we are limited by the maximum text prompt length that the CodeT5 model can handle. We can mitigate that by using vision encoders, with which state-action pairs can be encoded without restrictions on the observation space. Additionally, for this environment it is straightforward to define the functional primitives for the agent's perceptions and actions. However, that becomes challenging for continuous state and action spaces, or when an image represents the state. For images, we could use an object detection model that parses the image into a structural representation before generating text prompts for CodeT5, similar to [35], where such a representation is then used in a program. For continuous representations, further research is needed to verify the effectiveness of our method.

7. Conclusion and Future Work

In this paper, we introduced LibT5, a new framework to learn structured and reusable knowledge in grid-based reinforcement learning environments that allows reasoning about the behavior of black-box agents. This is realized by using a language model as a program synthesizer combined with a library learning module. The main disadvantage of LibT5 is its dependence on an oracle for collecting trajectories; on the other hand, it requires little background knowledge beyond the initial functions in the DSL.

This work opens many possibilities for future work. The main focus is a policy extraction algorithm that can imitate entire state-action sequences and not only parts of them. Without imitating the complete trajectories, the created explanations can be wrong; as such, the programs found for longer action sequences are more reliable, as they explain more of the policy. Additionally, we want to evaluate our method on more domains to validate that it is domain-agnostic. Games are attractive as they enable a better understanding of the strategies discovered by artificial agents through self-play, such as in AlphaGo [36], where humans struggle to comprehend the reasoning process.

Acknowledgments

The work of S. M. was supported by Grant 01IS20051 from the German Federal Ministry of Education and Research (BMBF).

References

[1] R. Waldinger, R. C. T. Lee, PROW: A Step Toward Automatic Program Writing, 1969.
[2] Z. Manna, R. Waldinger, Knowledge and reasoning in program synthesis, in: Proceedings of the 4th International Joint Conference on Artificial Intelligence - Volume 1, IJCAI'75, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1975, pp. 288–295.
[3] M. Balog, A. L. Gaunt, M. Brockschmidt, S. Nowozin, D. Tarlow, DeepCoder: Learning to Write Programs, 2016. URL: https://openreview.net/forum?id=ByldLrqlx.
[4] M. Nye, L. Hewitt, J. Tenenbaum, A. Solar-Lezama, Learning to Infer Program Sketches, in: Proceedings of the 36th International Conference on Machine Learning, PMLR, 2019, pp. 4861–4870. URL: https://proceedings.mlr.press/v97/nye19a.html.
[5] D. Ritchie, A. Thomas, P. Hanrahan, N. D. Goodman, Neurally-guided procedural models: amortized inference for procedural graphics programs using neural networks, in: Proceedings of the 30th International Conference on Neural Information Processing Systems, NIPS'16, Curran Associates Inc., Red Hook, NY, USA, 2016, pp. 622–630.
[6] S. Chaudhuri, K. Ellis, O. Polozov, R. Singh, A. Solar-Lezama, Y. Yue, Neurosymbolic Programming, Foundations and Trends® in Programming Languages 7 (2021) 158–243. URL: https://www.nowpublishers.com/article/Details/PGL-049. doi:10.1561/2500000049.
[7] Y. Li, D. Choi, J. Chung, N. Kushman, J. Schrittwieser, R. Leblond, T. Eccles, J. Keeling, F. Gimeno, A. D. Lago, T. Hubert, P. Choy, C. d. M. d'Autume, I. Babuschkin, X. Chen, P.-S. Huang, J. Welbl, S. Gowal, A. Cherepanov, J. Molloy, D. J. Mankowitz, E. S. Robson, P. Kohli, N. d. Freitas, K. Kavukcuoglu, O. Vinyals, Competition-level code generation with AlphaCode, Science 378 (2022) 1092–1097. URL: https://www.science.org/doi/abs/10.1126/science.abq1158. doi:10.1126/science.abq1158.
[8] A. Odena, C. Sutton, D. M. Dohan, E. Jiang, H. Michalewski, J. Austin, M. P. Bosma, M. Nye, M. Terry, Q. V. Le, Program Synthesis with Large Language Models, 2021.
[9] Y. Wang, W. Wang, S. Joty, S. C. Hoi, CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation, in: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Online and Punta Cana, Dominican Republic, 2021, pp. 8696–8708. URL: https://aclanthology.org/2021.emnlp-main.685. doi:10.18653/v1/2021.emnlp-main.685.
[10] G. Poesia, A. Polozov, V. Le, A. Tiwari, G. Soares, C. Meek, S. Gulwani, Synchromesh: Reliable Code Generation from Pre-trained Language Models, 2022. URL: https://openreview.net/forum?id=KmtVD97J43e.
[11] T. Scholak, N. Schucher, D. Bahdanau, PICARD: Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models, in: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Online and Punta Cana, Dominican Republic, 2021, pp. 9895–9901. URL: https://aclanthology.org/2021.emnlp-main.779. doi:10.18653/v1/2021.emnlp-main.779.
[12] L. Hewitt, T. A. Le, J. Tenenbaum, Learning to learn generative programs with Memoised Wake-Sleep, in: Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI), PMLR, 2020, pp. 1278–1287. URL: https://proceedings.mlr.press/v124/hewitt20a.html.
[13] K. Ellis, C. Wong, M. Nye, M. Sablé-Meyer, L. Morales, L. Hewitt, L. Cary, A. Solar-Lezama, J. B. Tenenbaum, DreamCoder: bootstrapping inductive program synthesis with wake-sleep library learning, in: Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation, PLDI 2021, Association for Computing Machinery, New York, NY, USA, 2021, pp. 835–850. URL: https://doi.org/10.1145/3453483.3454080. doi:10.1145/3453483.3454080.
[14] K. Ellis, L. Morales, M. Sablé-Meyer, A. Solar-Lezama, J. Tenenbaum, Learning Libraries of Subroutines for Neurally-Guided Bayesian Program Induction, in: Advances in Neural Information Processing Systems, volume 31, Curran Associates, Inc., 2018. URL: https://papers.nips.cc/paper/2018/hash/7aa685b3b1dc1d6780bf36f7340078c9-Abstract.html.
[15] D. Cao, R. Kunkel, C. Nandi, M. Willsey, Z. Tatlock, N. Polikarpova, babble: Learning Better Abstractions with E-Graphs and Anti-Unification, Proceedings of the ACM on Programming Languages 7 (2023) 396–424. URL: http://arxiv.org/abs/2212.04596. doi:10.1145/3571207.
[16] M. Bowers, T. X. Olausson, L. Wong, G. Grand, J. B. Tenenbaum, K. Ellis, A. Solar-Lezama, Top-Down Synthesis for Library Learning, Proceedings of the ACM on Programming Languages 7 (2023) 1182–1213. URL: http://arxiv.org/abs/2211.16605. doi:10.1145/3571234.
[17] Y. Qing, S. Liu, J. Song, M. Song, A survey on explainable reinforcement learning: Concepts, algorithms, challenges, arXiv preprint arXiv:2211.06665 (2022).
[18] A. Verma, H. Le, Y. Yue, S. Chaudhuri, Imitation-Projected Programmatic Reinforcement Learning, in: Advances in Neural Information Processing Systems, volume 32, Curran Associates, Inc., 2019. URL: https://proceedings.neurips.cc/paper/2019/hash/5a44a53b7d26bb1e54c05222f186dcfb-Abstract.html.
[19] A. Verma, V. Murali, R. Singh, P. Kohli, S. Chaudhuri, Programmatically Interpretable Reinforcement Learning, 2018. URL: https://www.semanticscholar.org/paper/Programmatically-Interpretable-Reinforcement-Verma-Murali/c43bba87b4237a93d96b2a3e91da25d91fd0bb91.
[20] G. Anderson, A. Verma, I. Dillig, S. Chaudhuri, Neurosymbolic Reinforcement Learning with Formally Verified Exploration, in: Advances in Neural Information Processing Systems, volume 33, Curran Associates, Inc., 2020, pp. 6172–6183. URL: https://proceedings.neurips.cc/paper/2020/hash/448d5eda79895153938a8431919f4c9f-Abstract.html.
[21] D. Trivedi, J. Zhang, S.-H. Sun, J. J. Lim, Learning to Synthesize Programs as Interpretable and Generalizable Policies, 2022. URL: https://openreview.net/forum?id=wP9twkexC3V.
[22] W. Qiu, H. Zhu, Programmatic Reinforcement Learning without Oracles, 2022. URL: https://openreview.net/forum?id=6Tk2noBdvxt.
[23] J. P. Inala, O. Bastani, Z. Tavares, A. Solar-Lezama, Synthesizing Programmatic Policies that Inductively Generalize, 2020. URL: https://openreview.net/forum?id=S1l8oANFDH.
[24] T. Silver, K. R. Allen, A. K. Lew, L. Pack Kaelbling, J. Tenenbaum, Few-Shot Bayesian Imitation Learning with Logical Program Policies, in: Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, 2020, pp. 10251–10258. URL: https://ojs.aaai.org/index.php/AAAI/article/view/6587. doi:10.1609/aaai.v34i06.6587.
[25] O. Bastani, Y. Pu, A. Solar-Lezama, Verifiable Reinforcement Learning via Policy Extraction, in: Advances in Neural Information Processing Systems, volume 31, Curran Associates, Inc., 2018. URL: https://papers.nips.cc/paper/2018/hash/e6d8545daa42d5ced125a4bf747b3688-Abstract.html.
[26] C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, P. J. Liu, Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, Journal of Machine Learning Research 21 (2020) 1–67. URL: http://jmlr.org/papers/v21/20-074.html.
[27] M. Dennis, N. Jaques, E. Vinitsky, A. Bayen, S. Russell, A. Critch, S. Levine, Emergent complexity and zero-shot transfer via unsupervised environment design, in: Proceedings of the 34th International Conference on Neural Information Processing Systems, NIPS'20, Curran Associates Inc., Red Hook, NY, USA, 2020.
[28] J. Parker-Holder, M. Jiang, M. Dennis, M. Samvelyan, J. Foerster, E. Grefenstette, T. Rocktäschel, Evolving Curricula with Regret-Based Environment Design, in: Proceedings of the 39th International Conference on Machine Learning, PMLR, 2022, pp. 17473–17498. URL: https://proceedings.mlr.press/v162/parker-holder22a.html.
[29] J. McCarthy, Recursive functions of symbolic expressions and their computation by machine, part I, Commun. ACM 3 (1960) 184–195. URL: https://doi.org/10.1145/367177.367199. doi:10.1145/367177.367199.
[30] A. Hussein, M. M. Gaber, E. Elyan, C. Jayne, Imitation learning: A survey of learning methods, ACM Comput. Surv. 50 (2017). URL: https://doi.org/10.1145/3054912. doi:10.1145/3054912.
[31] M. Chevalier-Boisvert, L. Willems, S. Pal, Minimalistic gridworld environment for gymnasium, 2018. URL: https://github.com/Farama-Foundation/Minigrid.
[32] A. Banburski, A. Gandhi, S. Alford, S. Dandekar, S. Chin, T. Poggio, Dreaming with ARC, in: Learning Meets Combinatorial Algorithms at NeurIPS2020, 2020. URL: https://openreview.net/forum?id=-gjy2V1ko6t.
[33] B. Ryder, Constructing the call graph of a program, IEEE Transactions on Software Engineering SE-5 (1979) 216–226. doi:10.1109/TSE.1979.234183.
[34] A. Cropper, Forgetting to Learn Logic Programs, Proceedings of the AAAI Conference on Artificial Intelligence 34 (2020) 3676–3683. URL: https://ojs.aaai.org/index.php/AAAI/article/view/5776. doi:10.1609/aaai.v34i04.5776.
[35] K. Yi, J. Wu, C. Gan, A. Torralba, P. Kohli, J. B. Tenenbaum, Neural-symbolic VQA: Disentangling reasoning from vision and language understanding, in: Neural Information Processing Systems, 2018.
[36] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, D. Hassabis, Mastering the game of Go with deep neural networks and tree search, Nature 529 (2016) 484–489. doi:10.1038/nature16961.

A. Algorithm

Algorithm 1 summarizes the steps for discovering a library of functions and training the CodeT5 model. The check whether the library has improved only tests whether the new library contains more functions than the library from the last iteration. If no new functions are extracted in an iteration, we stop the training process.
Algorithm 1: The complete training algorithm for LibT5
Input: imitation data D, initial grammar G, environments E
Output: library L, model θ

improved_library = True
sequence_length = 5
number_of_programs = 50000
epochs = 5
program_ast_depth = 6
model = CodeT5()
while (improved_library) {
    train_dataset = generate_random_training_dataset(number_of_programs, E, G, program_ast_depth)
    model = train_model(model, train_dataset, epochs)
    test_tasks = generate_test_tasks_from_data(D, sequence_length)
    programs = synthesize_programs_for_test_tasks(test_tasks)
    programs = filter_correct_programs(programs, test_tasks)
    new_G = extract_library(programs)
    improved_library = compare_libraries(new_G, G)
    if (improved_library) {
        G = new_G
    }
    sequence_length++
}
return model, L

B. Domain-specific Language

Table 2 shows the initial domain-specific language, which contains only the primitives necessary to access different cells on the grid, the control flow structures and the Boolean operators. Since we use a typed DSL, we show the types for each function or value. If a primitive is a value, only one type appears in the type column. For functions, multiple types are combined with an arrow →; the last type is the return type of the function, and the types before it are the types of the input parameters. The type func represents a function: an if-clause returns a new function to execute depending on the condition, since partial programs are also functions in Lisp. To generate random programs, we can specify the type of program to be generated. In our case, we always want programs of type map → direction → action, so a random program always takes two input parameters of type map and direction and returns a value of type action.

Table 2
The domain-specific language used at the beginning. The type column shows one type for values and several types separated by arrows for functions. The type after the last arrow is the return type of the function; the types before it are the types of the input parameters.

Primitive Function/Values                          Description                                      Type
left, right, forward                               possible actions                                 action
0, 1, 2, 3 and 4                                   integer values                                   int
2D array                                           2D grid observation of the agent                 map
direction-0, direction-1, direction-2, direction-3 represents north, east, south and west           direction
empty, wall, goal                                  possible objects on the map                      object
if                                                 standard if-clause                               bool → func → func → func
eq-direction?                                      checks if two directions are equal               direction → direction → bool
eq-obj?                                            checks if two objects are equal                  object → object → bool
get                                                gets an object on the map for two coordinates    map → int → int → object
not                                                negates a Boolean value                          bool → bool
and                                                conjunction of two Boolean values                bool → bool → bool
or                                                 disjunction of two Boolean values                bool → bool → bool

C. Text Prompts

Text prompts are generated by converting the agent's partial observation into a string representation and then concatenating the string representation with the action. This is repeated for all state-action pairs until each pair in the sequence is represented as a string. Then all strings are combined into a single text prompt. Figure 6 shows an example of a state-action sequence of length five. On the right of the first line is the 2D array, followed by the string representation and the selected action. On the left is the corresponding maze represented by the 2D array. The final text prompt is then generated by iteratively concatenating all the string representations and actions for the entire sequence. The final text prompt for the state-action sequence is:

22222222221222212222122220 left 12222222221222222222222223 left 11121121221211122222222222 left 22222222221111122122111211 forward 22222222221111121222112111 forward

The 1 represents empty grid cells and the 2 represents wall objects on the map.

Figure 6: Text prompts are created by converting the 2D array into a string where all values are concatenated without spaces.
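A minimal sketch of this conversion, assuming the observation is available as a 2D array of integers and the action as a string; the single-space separator between pairs is an assumption based on the example above.

def observation_to_string(observation):
    # Flatten the 2D grid observation into a digit string without spaces.
    return "".join(str(cell) for row in observation for cell in row)

def build_prompt(state_action_pairs):
    # Concatenate "observation action" for every pair into one text prompt.
    parts = [f"{observation_to_string(obs)} {action}" for obs, action in state_action_pairs]
    return " ".join(parts)

# build_prompt([(obs_1, "left"), (obs_2, "forward")]) would yield a prompt of the
# form "2222...0 left 1222...3 forward" for grids like the ones shown above.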
D. Experimental Setup & Hardware Resources

Table 3 shows the hyperparameters for the different methods. We use the same hyperparameters in each iteration. We adapted the hyperparameters of the DreamCoder system to our problem and tried out a few timeouts; nevertheless, we did not find any improvement from increasing the search timeout. For the neural-guided search, we train an encoder-decoder neural network. The encoder for the state observations was adapted from the ACCEL agent [28]. The decoder, which predicts which functional primitives are used in the program to be synthesized, is generated automatically by the DreamCoder system [13]. We train the neural-guided search model for 5000 update steps. In addition, before training a neural network, the DreamCoder system first performs an enumerative search to obtain examples alongside the randomly generated training programs; the programs synthesized by the enumerative search are also added to the training dataset. The library learning module is restricted to an arity of three, which means that extracted functions can have up to three input parameters. We tried out different numbers of training programs and also different encoders for DreamCoder, since in Figure 3 DreamCoder and the enumerative search algorithm resulted in the same performance. Using the default DreamCoder parameters with 500 random training programs achieved the same accuracy as using 5000 or 50000 programs. We concluded that our problem is not as well-suited for DreamCoder as the hand-crafted tasks that [13] uses to evaluate the DreamCoder system.

Table 3
The hyperparameters for the different methods.

Hyperparameters     Enumerative Search   DreamCoder    CodeT5   LibT5
Search Timeout      720 seconds          720 seconds   -        -
Epochs              -                    50            5        5
Training Programs   -                    50000         50000    50000
Arity               3                    3             -        3

E. Ablation Study: Curriculum

In this section, we justify our decision to use the curriculum with an ablation study of the method without a curriculum. Table 4 shows the number of solved tasks. None of the methods could solve a task; thus, we analyzed the sequence lengths of the trajectories, which are displayed in Table 5. Since the tasks are limited and none of them were solved, no method could extract a library from solved tasks. To build up a library so that more tasks become solvable in later iterations, we introduce a curriculum based on the sequence length. Using it, we can generate 26 tasks with a sequence length of 5 from the first task alone, which has a sequence length of 131. By converting all complete trajectories into sub-trajectories, the number of tasks increases from 14 to 229, which is sufficient to extract functions from solved tasks that form a library.

Table 4
Ablation study of the method without using a curriculum.

              Enumerative Search   DreamCoder   CodeT5   LibT5
Solved Tasks  0/14                 0/14         0/14     0/14

Table 5
The sequence lengths of the collected trajectories.

Task    1    2   3   4   5    6   7    8   9   10  11  12  13  14
Length  131  75  73  61  112  63  145  80  96  80  50  92  87  18
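A minimal sketch of this splitting, assuming non-overlapping chunks with incomplete tails dropped (an assumption that is consistent with the numbers above):

def make_tasks(trajectories, sequence_length):
    # Split every trajectory (a list of (state, action) pairs) into non-overlapping
    # sub-trajectories of the given length; incomplete tails are dropped.
    tasks = []
    for trajectory in trajectories:
        for start in range(0, len(trajectory) - sequence_length + 1, sequence_length):
            tasks.append(trajectory[start:start + sequence_length])
    return tasks

# The trajectory lengths from Table 5 then yield 229 tasks of length 5:
lengths = [131, 75, 73, 61, 112, 63, 145, 80, 96, 80, 50, 92, 87, 18]
assert sum(n // 5 for n in lengths) == 229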
F. Synthesized Programs & Visualization of the Decision Making Process

We show two more examples of the agent's decision-making process. The agent's position is colored blue. Grey and black show walls and empty cells, respectively. The cells that the agent attends to are yellow. In Figure 7-left, we see a synthesized program. Numbers with a leading dollar sign are variables, which are input parameters of a function. Each lambda in a synthesized program corresponds to a function input parameter, so two consecutive lambdas represent two input parameters. The number sign with brackets #(...) denotes extracted functions discovered in a previous iteration. The program first checks the direction the agent faces and then determines a position on the grid and compares it to a wall object. The position on the grid depends on the agent's direction, so the cell the agent compares changes in the third row of pictures. Figure 8 shows another example of the agent's reasoning process. The program first checks if the position (1,0) is empty and turns 90° to the left if that is the case. If it is not, the agent checks the cell in front of it (2,1) and moves forward if that position is empty, otherwise it turns left.

(lambda (lambda (#(lambda (lambda (#(lambda (if $0 left-action forward-action)) (eq-obj? wall-obj (get $0 2 $1))))) (if (eq-direction? direction-2 $0) 1 3) $1)))

Figure 7: Left: The program for a given sub-trajectory. Right: The decision-making process when executing the program on the state-action sequence. We show explanations for the first four states of the sub-trajectory. The synthesized program first checks the direction the agent faces, then determines a position on the grid and compares it to a wall object.

(lambda (lambda (if (eq-obj? wall-obj (get $1 1 0)) (#(lambda (#(lambda (if $0 forward-action left-action)) (eq-obj? ($0 #(lambda (get $0 2 1))) empty-obj))) (lambda ($0 $2))) (#(lambda (if (eq-obj? $0 empty-obj) forward-action left-action)) wall-obj))))

Figure 8: Left: The program for a given sub-trajectory. Right: The decision-making process when executing the program on the state-action sequence. We show explanations for the first five states of the sub-trajectory. This program first checks if the position (1,0) is an empty cell. If it is empty, the agent turns 90° to the left, otherwise it checks the position (2,1) in front of it. If (2,1) is empty the agent moves forward; if there is a wall the agent turns to the left.

G. Extracted Libraries

In this section, we present the full libraries for the three different methods. The enumerative search was only able to extract three functions (Figure 9), the neural-guided search builds a library of seven functions (Figure 10), and our method was able to extract 114 functions (Figure 11).

f0=(λ (x y) (eq-obj? y (get x 0 0)))
f1=(λ (x) (if x forward-action left-action))
f2=(λ (x y z) (f1 (or (eq-obj? (get x 0 0) wall-obj) (eq-direction? z y))))

Figure 9: The extracted functions from programs found by using an enumerative search algorithm.

f0=(λ (x y) (if (y (eq-obj? (get x 0 0))) left-action forward-action))
f1=(λ (x y z) (f0 x (λ (u) (and (u empty-obj) (eq-direction? y z)))))
f2=(λ (x y) (if (x (eq-direction? y)) forward-action left-action))
f3=(λ (x y) (f2 (λ (z) (or (z y) (eq-obj? wall-obj (get x 0 0))))))
f4=(λ (x) (f2 (λ (y) (not (y x)))))
f5=(λ (x) (f0 x (λ (y) (not (y wall-obj)))))
f6=(λ (x) (f2 (λ (y) (not (not (y x))))))

Figure 10: The extracted functions from programs found by using the neural-guided search algorithm.
Figure 11: The extracted functions from programs found by using our introduced method LibT5 (114 function definitions in the Lisp-based DSL, rendered as a full-page figure; zoom in for better visibility).