=Paper=
{{Paper
|id=Vol-3092/p12
|storemode=property
|title=FEFFuL: a Few-Examples Fitness Function Learner
|pdfUrl=https://ceur-ws.org/Vol-3092/p12.pdf
|volume=Vol-3092
|authors=Nicolo’ Brandizzi,Andrea Fanti,Roberto Gallotta,Christian Napoli
|dblpUrl=https://dblp.org/rec/conf/system/BrandizziFGN21
}}
==FEFFuL: a Few-Examples Fitness Function Learner==
<pdf width="1500px">https://ceur-ws.org/Vol-3092/p12.pdf</pdf>
<pre>
FEFFuL: a Few-Examples Fitness Function Learner
Nicolo’ Brandizzi a , Andrea Fantia , Roberto Gallottaa and Christian Napolia
a
    Department of Computer, Automation and Management Engineering, Sapienza University of Rome, via Ariosto 25 Roma 00185, Italy


                                             Abstract
                                             This paper presents FEFFuL, an architecture used to estimate the fitness value of a generated artifact in any Evolution Strategy
                                             (ES) system that would otherwise require human evaluation, i.e.: Interactive Evolutionary Computation (IEC) systems. By
                                             learning directly human preferences, the FEFFuL network aims to reduce user’s fatigue to a minimum while also adapting
                                             to new emergent artifacts. We apply here FEFFuL in the context of evaluating generated structures in the popular game
                                             Minecraft.

                                             Keywords
                                             Evolution Strategy, Fitness Estimation, Human-in-the-Loop Control,


1. Introduction                                                                                                            interaction needed.
                                                                                                                              In our experiment, we tackle the generation of inter-
In Evolutionary Computation (EC) and Evolution Strate-                                                                     esting structures in the popular videogame Minecraft.
gies (ES), a global optimization problem is tackled by                                                                     Minecraft is a sandbox construction videogame set in
taking inspiration from biologic evolution: an initial pop-                                                                a voxel-based environment with various basic building
ulation of solutions is evolved by selection and mutation,                                                                 blocks, such as wood, stone, glass, water, etc. The ma-
producing new solutions with higher values of the task’s                                                                   jor advantage of Minecraft over most Artificial Life (AL-
objective function. In this context, a particular solution                                                                 ife) domains is that surprisingly complex and functional
is referred to as an individual, while its value of the ob-                                                                structures (e.g. moving robots, word processors, etc.) can
jective function is its fitness. Genetic Algorithms (GA) are                                                               all be built from the same basic building blocks, which
a family of algorithms in EC in which individuals are en-                                                                  aligns well with the few chemical building blocks that
coded by their genotype, which contains the information                                                                    produce complex biological systems [10, 11, 12].
to reconstruct the actual solution, called phenotype[1, 2].                                                                   User fatigue is a major challenge in IEC, and since the
The genotypes are mutated and combined in different                                                                        human fatigue threshold cannot be improved, the avail-
ways to produce the next generation of solutions.                                                                          able evaluations must be exploited as much as possible.
   There are several domains in which a fitness function                                                                   Our approach to this issue builds on the techniques men-
is unknown or very hard to compute, e.g. visual [3, 4, 5, 6]                                                               tioned above, by alternating human evaluation and the
or musical [7, 8] appeal. In Interactive Evolutionary Com-                                                                 training of a fitness estimator model in a single run of
putation (IEC), this problem is overcome by using manual                                                                   the genetic algorithm. This approach allows to use few
human evaluation to compute the fitnesses of individu-                                                                     human interactions while still perfoming a high number
als in each generation. One of the main issues of IEC is                                                                   of generations. Moreover, since the fitness estimator is
user fatigue, which greatly limits the amount of human                                                                     re-aligned with human preferences almost periodically, it
evaluations available; this usually poses bounds both                                                                      can also adapt to artifacts only seen in later generations.
on the number of possible generations and the number                                                                       As a convenient side-effect, more than one human (pos-
of individuals that can be evaluated at each generation.                                                                   sibly with slightly different preferences) can contribute
To mitigate this problem the interactions are often care-                                                                  to a single run, which demonstrates how this approach
fully designed so as to minimize psychological fatigue                                                                     can be extended from IEC to Collaborative Interactive
[9]. Function approximators (e.g. neural networks) are                                                                     Evolution (CIE) [13]. Unlike previous approaches such
also often used to learn the preferences of the human                                                                      as [14] our approach leverages data collected through
experimenter [9, 7], so that this information can be used                                                                  the evolution process to further reduce user fatigue even
in subsequent runs of the GA and reduce the amount of                                                                      when in an IEC setting.
SYSTEM 2021 @ Scholar’s Yearly Symposium of Technology,
Engineering and Mathematics. July 27–29, 2021, Catania, IT                                                                 2. Related work
" brandizzi@diag.uniroma1.it (N. B. );
fanti.1650746@studenti.uniroma1.it (A. Fanti);                                                                             In this section we give a technical overview of our appli-
gallotta.1890251@studenti.uniroma1.it (R. Gallotta);
cnapoli@diag.uniroma1.it (C. Napoli)                                                                                       cation of FEFFuL to the Minecraft environment.
 0000-0002-3336-5853 (C. Napoli)
                                       © 2021 Copyright for this paper by its authors. Use permitted under Creative
                                       Commons License Attribution 4.0 International (CC BY 4.0).
    CEUR
    Workshop
    Proceedings
                  http://ceur-ws.org
                  ISSN 1613-0073
                                       CEUR Workshop Proceedings (CEUR-WS.org)


                                                                                                                      75
Nicolo’ Brandizzi et al. CEUR Workshop Proceedings                                                                              75–79


2.1. Artifacts generation                                                 where dec is the decoding function. The artifact genera-
                                                                          tion is then simply
An artifact is a structure with precise width, height and
depth. These values are user defined. The artifact is the                                      𝑜 = 𝐶𝑃 𝑃 𝑁𝐺 (𝑖)
phenotype of a genome. Each genome uniquely encodes
information used to generate the artifact. We use the Neu-                where 𝑖 ∈ 𝑅3 and 𝑜 ∈ 𝑅2 are, respectively, the inputs
roEvolution of Augmenting Topologies (NEAT) [15] to                       and the outputs of the network, both constrained in the
control how these genomes evolve over the generations.                    real-valued interval [0, 1]. In order to show the artifact
Each genome is part of a population of fixed size and,                    as a structure to the user, 𝑜 is then transformed. By
thanks to the NEAT algorithm, is mutated to increase                      assigning to 𝑏 the number of admissible blocks and to 𝑟
the complexity of the resulting artifact. In fact, NEAT en-               the number of admissible block rotations, we have that
sures an ever-increasing complexification of the artifacts
and, thanks to speciation and the use of elitism, it also                                    𝑜1 = ⌊𝑜1 * (𝑏 + 1)⌋
ensures a lasting diversification of phenotypes (diver-
sity here is measured as genomic distance). Additionally                                     𝑜2 = ⌊𝑜2 * (𝑟 + 1)⌋
we set a low stagnation level to ensure low-performing                       The user has to choose the admissible blocks and val-
genomes (i.e.: genomes that generate uninstering arti-                    ues by modifying a configuration file. The resulting arti-
facts for the user) are pruned away and leave space in                    facts are then evaluated in the Minecraft game using the
the population for more interesting genomes. During                       EvoCraft [10] interface.
evolution each genome is mutated and new genomes are
created by combining two random parent genomes. Since
genomes with high fitness are more likely to reproduce                    2.2. Artifacts evaluation
and pass on their properties to their offsprings, we ensure               The generated artifacts undergo an user defined Mini-
that the entire population slowly converges to diverse,                   mum Criterion (MC) step at each generation before eval-
high-fitness solutions.                                                   uation. This step removes artifacts that don’t satisfy this
   The genomes are used to encode a Compositional Pat-                    minimum requirement. Only a predefined number of
tern Producing Networks (CPPNs) [16]. Such a network                      artifacts sampled randomly from the artifacts that pass
is composed of blocks of elementary activation func-                      the MC step are then presented to the user for direct
tions connected together by weighted connections. The                     evaluation. In our experiment, the MC consisted in re-
genomes encode the structure, weights and biases of the                   moving artifacts that didn’t contain enough air blocks
network. Mutations during evolution consist in adding or                  and enough solid blocks (both values were expressed
removing connections or blocks, perturbing the weights                    as minimum percentages). The number of maximum
and biases values and changing the activation functions.                  artifacts that could be presented to the user was set to
The CPPNs thus directly map from genotype to pheno-                       24.
type without local interaction and are applicable on a                       The user is then tasked to evaluate the generated ar-
infinite-sized input space (a typical application is, for                 tifacts. This is done by looking at the structures on the
instance, 2D image generation from pixel coordinates).                    Minecraft client applet and choosing the most interesting
Due to their architecture, CPPNs create patterned out-                    ones. During manual (human) evaluation the program
puts and can develop pure symmetry and symmetry with                      accepts a list of indexes that correspond to the selected ar-
variation, which in turn make for appealing artifacts.                    tifacts. Since the mapping from genomes to artifacts is 1:1,
   In our experiment, we use the NEAT Python [17] li-                     we can assign the fitness value to the correct correspond-
brary and the PyTorch-NEAT1 to integrate both the NEAT                    ing genome that generated the artifact. The possible
evolutionary algorithm and the CPPNs architecture. The                    fitness values are 1 for the genomes the user is interested
inputs to the networks are the scaled 𝑋, 𝑌 and 𝑍 coor-                    in and 0 for the others. We note that we automatically
dinates (∈ [0, 1]) of the artifact and the outputs are the                assign a fitness value of 0 to all the genomes that didn’t
block type and rotation (both values are output ∈ [0, 1]                  pass the MC step to discourage their traits to appear
and later scaled to the admissible values).                               again later in the generations, as these are not interest-
   The artifact generation process can be formally ex-                    ing to the user. This ensures that only the phenotypically
pressed as follows: given a population 𝑃 at the genera-                   interesting genomes survive throughout the evolution
tion 𝑔 of genomes 𝐺s, we first define the CPPN network                    process.
that the genome encodes as                                                   During human evaluation we save the artifacts and
                                                                          their fitness in a memory buffer. Once the buffer is at ca-
                    𝐶𝑃 𝑃 𝑁𝐺 = dec(𝑃𝐺𝑔 )                                   pacity, the FEFFuL network can be trained in a supervised
    1
                                                                          fashion to directly estimate the user preferences given
      The PyTorch-NEAT library is available at https://github.com/
uber-research/PyTorch-NEAT/
                                                                          the artifacts. The network itself is rather simple: a 3D


                                                                     76
Nicolo’ Brandizzi et al. CEUR Workshop Proceedings                                                                    75–79


Convolution maps the artifact to a sequence of feature           also the FEFFuL network has to reflect this behavior. We
maps that are later flattened and passed through multiple        solved this problem with two simple corrections to the
residual blocks that in the end output a value ∈ [0, 1]          network’s behavior. First, we note that the buffer is con-
that represent how likely the human would have marked            stantly updated only with user’s selected artifacts and fit-
the artifact as interesting. A graphical overview of the         nesses, effectively making it a constantly-updated dataset
FEFFuL network can be found in 1.                                we can train the network from. The corrections are the
                                                                 following:
                                                                     1. We first use the network to evaluate the artifacts
                                                                        for a given number of generations at a time. This
                                                                        value is increased as the evolution process goes
                                                                        on, thus leveraging the network more and more
                                                                        as time goes on.
                                                                     2. We enforce fine-tuning of the network after the
                                                                        aforementioned number of generations. We do
                                                                        this by prompting the user to evaluate the cur-
                                                                        rent generation, collect the preferences in the
                                                                        buffer and finetuning the network on the updated
                                                                        dataset. At this point of the process there could be
                                                                        some disalignment between user preferences and
                                                                        network estimates. This would be harmful to the
                                                                        entire experiment as it would diverge the search.
Figure 1: Architecture of the FEFFuL network                            We solve this by activating the network only if
                                                                        its accuracy over unseen artifacts is higher than a
                                                                        given threshold. Otherwise the user is prompted
   Due to the nature of the task, it is expected that the               again until the network can be activated.
buffer would be unbalanced: it is more likely that there
would be more examples of discarded artifacts than ac-
cepted artifacts. For this reason, we balance the dataset        3. Methodology and Results
before training the network. We do this by both over-
sampling accepted artifacts and then downsampling dis-           We first report the experiment settings in order to repro-
carded artifacts. The former is accomplished by augment-         duce our results at 1. These are taken from the experiment
ing artifacts by rotation: we rotate the structure along         configuration file in our repository; we are not going to
the 𝑌 -axis by 90. We can do this since the artifact must        report the settings for the NEAT algorithm which can be
be interesting regardless of the orientation. The latter         found in the neat configuration file in our repository.
is instead accomplished by removing discarded artifacts
from the dataset until a good ratio of accepted artifacts         Artifact width                                  5
                                                                  Artifact height                                 7
and discarded artifacts is reached.
                                                                  Artifact depth                                  5
   The output value of the FEFFuL module is directly as-
                                                                  Max number of artifact shown to the user        24
signed as fitness value to the corresponding genome. This         Min percentage of air blocks in an artifact     22%
gives an additional significance to the fitness value: it not     Min percentage of solid blocks in an artifact   32%
only marks the user preference but also the network’s             Admissible rotations                            NORTH, WEST, SOUTH,
confidence in its output. Thus, a low-fitness genome                                                              EAST, UP, DOWN
would have been less likely to have been picked by the            Admissible blocks                               AIR, COBBLESTONE, STONE,
user than a high-fitness one. This process is important                                                           STONE_STAIRS STONEBRICK,
during the reproduction step of the NEAT algorithm as                                                             QUARTZ_BLOCK
it heavily relies on the genome’s fitness to order the pos-       RNG seed                                        42
sible parents of new individuals.                                 Buffer capacity                                 512
                                                                  Batch size                                      32
                                                                  Number of convolution channels                  16
2.3. The alignment problem                                        Number of epochs                                100
                                                                  Interval of training                            5
Due to the ever-complexification of the genomes thanks            Activation threshold on test accuracy           0.5
to the NEAT algorithm, both the structures and the user’s
preferences change over time. An artifact that the user Table 1
deemed interesting at generation 0 may not be as in- Experiment parameter summary
teresting if found at generation 100. This implies that


                                                            77
Nicolo’ Brandizzi et al. CEUR Workshop Proceedings                                                                     75–79


  We then report the train and test accuracy and loss
evaluated at different generations at 2 and 3 respectively.
We note that, while the training metrics behave as ex-
pected, in the test we can see a rising trend for the loss
metric. The accuracy however remains well over the 80%,
which we consider a good value.


                                                                  Figure 4: Overview of the active evaluator per generation


                                                                  was not the evaluation anymore but the generation of
Figure 2: Train accuracy and loss of the FEFFuL network at        the new genomes instead.
different generations


Figure 3: Test accuracy and loss of the FEFFuL network at
different generations


   We ran the experiment for 100 generations. While this Figure 5: Artifacts produced on generation 0 (left) and on
                                                             generation 100 (right)
is a rather low number of generations, the aim of this
project was not to evolve structures to a certain level
of user satisfiability but to show that it was possible         Finally, we report a comparison between the artifacts
to use a NN to approximate user’s preferences, reduce generated on generation 0 and those generated on gen-
user fatigue and save time. The metrics prove that we eration 100 at 5. We note how the artifacts are more
succeeded in the first part of the task. For what concerns complex on generation 100 as well as in line with user’s
user fatigue, it is known that a single user can evaluate preferences during evolution.
around 20 generations worth of artifacts before starting
to feel fatigued [9]. In the entire evolution process only
29 generations worth of artifacts where evaluated by the 4. Conclusions and future work
user, whereas the remaining 71 were evaluated by the
                                                             The results shown in the previous section suggest that
FEFFuL network. We report a graph that shows which
                                                             this approach can effectively be used to mitigate the user
evaluator was used at each generation at 4.
                                                             fatigue problem in IEC tasks. More specifically, most of
   The time gain over the same amount of generations is
                                                             the interactions with the human experimenter are in the
well apparent: we estimated that the human user takes
                                                             first few generations, where the buffer is not completely
from 2 to 3 minutes to evaluate a single batch of artifacts,
                                                             full and the estimator cannot be trained. After the first
whereas the FEFFuL network only a few milliseconds. In
                                                             alignment, the human evaluations were needed very spo-
fact we noticed that the bottleneck of time consumption


                                                             78
Nicolo’ Brandizzi et al. CEUR Workshop Proceedings                                                               75–79


radically, as the estimator maintained good alignment              gorithm for individual protection applications, vol-
and usually only needed one generation to adapt to new             ume 2768, 2020, pp. 41–45.
artifacts chosen by the user. This also means that after       [6] M. Wozniak, C. Napoli, E. Tramontana, G. Capizzi,
the initial alignment, the cost to perform more genera-            G. Lo Sciuto, R. Nowicki, J. Starczewski, A mul-
tions, in terms of user evaluations needed is much lower.          tiscale image compressor with rbfnn and discrete
This can get even lower for further generations, even              wavelet decomposition, volume 2015-September,
though this descending trend would be fairly moderate              2015. doi:10.1109/IJCNN.2015.7280461.
in our specific implementation.                                [7] B. Johanson, R. Poli, Gp-music: An interactive ge-
   However, these same results also suggest that a method          netic programming system for music generation
is needed to avoid the fitness estimator from overfitting          with automated fitness raters, Genet. Program.
during the fine-tuning step, perhaps with an adaptive              (2000).
rule that either produces an optimized “expiration date” [8] J. Biles, et al., Genjam: A genetic algorithm for
for the model or that dynamically stops the fine-tuning            generating jazz solos, in: ICMC, volume 94, Ann
process when overfitting occurs.                                   Arbor, MI, 1994, pp. 131–137.
   We finally note two possible improvements to the FEF- [9] H. Takagi, Interactive evolutionary computation:
FuL network. FEFFuL could use a latent representation of           fusion of the capabilities of ec optimization and hu-
the phenotypes instead of directly mapping the artifacts           man evaluation, Proceedings of the IEEE 89 (2001)
to their fitness values. This could benefit the re-alignment       1275–1296. doi:10.1109/5.949485.
process if the generalization power of the model over [10] G. S. R. Djordje, et al., EvoCraft: A New Challenge
unseen artifact improves. This comes with the added                for Open-Endedness, arXiv (2020). URL: https://
benefit of keeping a smaller buffer, thus requiring even           arxiv.org/abs/2012.04751v1. arXiv:2012.04751.
less human evaluation at the beginning of the experi- [11] M. Woźniak, D. Połap, C. Napoli, E. Tramontana,
ment. Finally The Minimum Criterion could be enforced              Application of bio-inspired methods in intelligent
only in generations evaluated by the human, so that the            gaming systems, Information Technology and Con-
advantage of having a fitness estimator can be exploited           trol 46 (2017) 150–164. doi:10.5755/j01.itc.46.
to evaluate the entire population and not just a subset.           1.13872.
This would also ensure that no interesting artifact is dis- [12] D. Połap, M. Wózniak, C. Napoli, E. Tramontana,
carded a priori during the MC step, allowing a diverse             Is swarm intelligence able to create mazes?, In-
set of good solutions to remain in the population.                 ternational Journal of Electronics and Telecom-
                                                                   munications 61 (2015) 305–310. doi:10.1515/
                                                                   eletel-2015-0039.
References                                                    [13] S. R. Szumlanski, A. S. Wu, C. E. Hughes, Con-
                                                                   flict resolution and a framework for collaborative
  [1] D. Połap, M. Woźniak, C. Napoli, E. Tramontana,
                                                                   interactive evolution, in: AAAI, 2006, pp. 512–517.
      R. Damaševičius, Is the colony of ants able to rec-
                                                              [14] P. G. de Prado Salas, S. Risi, Collaborative inter-
      ognize graphic objects?, Communications in Com-
                                                                   active evolution in minecraft, in: Proceedings of
      puter and Information Science 538 (2015) 376–387.
                                                                   the Genetic and Evolutionary Computation Con-
      doi:10.1007/978-3-319-24770-0_33.
                                                                   ference Companion, GECCO ’18, Association for
  [2] D. Połap, M. Woźniak, C. Napoli, E. Tramontana,
                                                                   Computing Machinery, New York, NY, USA, 2018,
      Real-time cloud-based game management system
                                                                   p. 127–128. URL: https://doi.org/10.1145/3205651.
      via cuckoo search algorithm, International Journal
                                                                   3205723. doi:10.1145/3205651.3205723.
      of Electronics and Telecommunications 61 (2015)
                                                              [15] K. O. Stanley, Efficient Evolution of Neural Net-
      333–338. doi:10.1515/eletel-2015-0043.
                                                                   works Through Complexification, Ph.D. thesis, De-
  [3] J. Secretan, et al., Picbreeder: evolving pictures col-
                                                                   partment of Computer Sciences, The University of
      laboratively online, in: Proceedings of the SIGCHI
                                                                   Texas at Austin, 2004. URL: http://nn.cs.utexas.edu/
      Conference on Human Factors in Computing Sys-
                                                                   ?stanley:phd2004.
      tems, 2008, pp. 1759–1768.
                                                              [16] K. O. Stanley,        Compositional pattern pro-
  [4] G. Capizzi, G. Lo Sciuto, C. Napoli, E. Tramontana,
                                                                   ducing networks: A novel abstraction of de-
      M. Woźniak, A novel neural networks-based tex-
                                                                   velopment, Genetic Programming and Evolv-
      ture image processing algorithm for orange defects
                                                                   able Machines 8 (2007) 131–162. URL: https://
      classification, International Journal of Computer
                                                                   doi.org/10.1007/s10710-007-9028-8. doi:10.1007/
      Science and Applications 13 (2016) 45–60.
                                                                   s10710-007-9028-8.
  [5] R. Avanzato, F. Beritelli, M. Russo, S. Russo, M. Vac-
                                                              [17] S. Sarti, G. Ochoa, A neat visualisation of neuroevo-
      caro, Yolov3-based mask and face recognition al-
                                                                   lution trajectories., in: EvoApplications, 2021, pp.
                                                                   714–728.


                                                          79

</pre>