=Paper=
{{Paper
|id=Vol-3092/p12
|storemode=property
|title=FEFFuL: a Few-Examples Fitness Function Learner
|pdfUrl=https://ceur-ws.org/Vol-3092/p12.pdf
|volume=Vol-3092
|authors=Nicolo’ Brandizzi,Andrea Fanti,Roberto Gallotta,Christian Napoli
|dblpUrl=https://dblp.org/rec/conf/system/BrandizziFGN21
}}
==FEFFuL: a Few-Examples Fitness Function Learner==
FEFFuL: a Few-Examples Fitness Function Learner Nicolo’ Brandizzi a , Andrea Fantia , Roberto Gallottaa and Christian Napolia a Department of Computer, Automation and Management Engineering, Sapienza University of Rome, via Ariosto 25 Roma 00185, Italy Abstract This paper presents FEFFuL, an architecture used to estimate the fitness value of a generated artifact in any Evolution Strategy (ES) system that would otherwise require human evaluation, i.e.: Interactive Evolutionary Computation (IEC) systems. By learning directly human preferences, the FEFFuL network aims to reduce user’s fatigue to a minimum while also adapting to new emergent artifacts. We apply here FEFFuL in the context of evaluating generated structures in the popular game Minecraft. Keywords Evolution Strategy, Fitness Estimation, Human-in-the-Loop Control, 1. Introduction interaction needed. In our experiment, we tackle the generation of inter- In Evolutionary Computation (EC) and Evolution Strate- esting structures in the popular videogame Minecraft. gies (ES), a global optimization problem is tackled by Minecraft is a sandbox construction videogame set in taking inspiration from biologic evolution: an initial pop- a voxel-based environment with various basic building ulation of solutions is evolved by selection and mutation, blocks, such as wood, stone, glass, water, etc. The ma- producing new solutions with higher values of the task’s jor advantage of Minecraft over most Artificial Life (AL- objective function. In this context, a particular solution ife) domains is that surprisingly complex and functional is referred to as an individual, while its value of the ob- structures (e.g. moving robots, word processors, etc.) can jective function is its fitness. Genetic Algorithms (GA) are all be built from the same basic building blocks, which a family of algorithms in EC in which individuals are en- aligns well with the few chemical building blocks that coded by their genotype, which contains the information produce complex biological systems [10, 11, 12]. to reconstruct the actual solution, called phenotype[1, 2]. User fatigue is a major challenge in IEC, and since the The genotypes are mutated and combined in different human fatigue threshold cannot be improved, the avail- ways to produce the next generation of solutions. able evaluations must be exploited as much as possible. There are several domains in which a fitness function Our approach to this issue builds on the techniques men- is unknown or very hard to compute, e.g. visual [3, 4, 5, 6] tioned above, by alternating human evaluation and the or musical [7, 8] appeal. In Interactive Evolutionary Com- training of a fitness estimator model in a single run of putation (IEC), this problem is overcome by using manual the genetic algorithm. This approach allows to use few human evaluation to compute the fitnesses of individu- human interactions while still perfoming a high number als in each generation. One of the main issues of IEC is of generations. Moreover, since the fitness estimator is user fatigue, which greatly limits the amount of human re-aligned with human preferences almost periodically, it evaluations available; this usually poses bounds both can also adapt to artifacts only seen in later generations. on the number of possible generations and the number As a convenient side-effect, more than one human (pos- of individuals that can be evaluated at each generation. sibly with slightly different preferences) can contribute To mitigate this problem the interactions are often care- to a single run, which demonstrates how this approach fully designed so as to minimize psychological fatigue can be extended from IEC to Collaborative Interactive [9]. Function approximators (e.g. neural networks) are Evolution (CIE) [13]. Unlike previous approaches such also often used to learn the preferences of the human as [14] our approach leverages data collected through experimenter [9, 7], so that this information can be used the evolution process to further reduce user fatigue even in subsequent runs of the GA and reduce the amount of when in an IEC setting. SYSTEM 2021 @ Scholar’s Yearly Symposium of Technology, Engineering and Mathematics. July 27–29, 2021, Catania, IT 2. Related work " brandizzi@diag.uniroma1.it (N. B. ); fanti.1650746@studenti.uniroma1.it (A. Fanti); In this section we give a technical overview of our appli- gallotta.1890251@studenti.uniroma1.it (R. Gallotta); cnapoli@diag.uniroma1.it (C. Napoli) cation of FEFFuL to the Minecraft environment. 0000-0002-3336-5853 (C. Napoli) © 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings http://ceur-ws.org ISSN 1613-0073 CEUR Workshop Proceedings (CEUR-WS.org) 75 Nicolo’ Brandizzi et al. CEUR Workshop Proceedings 75–79 2.1. Artifacts generation where dec is the decoding function. The artifact genera- tion is then simply An artifact is a structure with precise width, height and depth. These values are user defined. The artifact is the 𝑜 = 𝐶𝑃 𝑃 𝑁𝐺 (𝑖) phenotype of a genome. Each genome uniquely encodes information used to generate the artifact. We use the Neu- where 𝑖 ∈ 𝑅3 and 𝑜 ∈ 𝑅2 are, respectively, the inputs roEvolution of Augmenting Topologies (NEAT) [15] to and the outputs of the network, both constrained in the control how these genomes evolve over the generations. real-valued interval [0, 1]. In order to show the artifact Each genome is part of a population of fixed size and, as a structure to the user, 𝑜 is then transformed. By thanks to the NEAT algorithm, is mutated to increase assigning to 𝑏 the number of admissible blocks and to 𝑟 the complexity of the resulting artifact. In fact, NEAT en- the number of admissible block rotations, we have that sures an ever-increasing complexification of the artifacts and, thanks to speciation and the use of elitism, it also 𝑜1 = ⌊𝑜1 * (𝑏 + 1)⌋ ensures a lasting diversification of phenotypes (diver- sity here is measured as genomic distance). Additionally 𝑜2 = ⌊𝑜2 * (𝑟 + 1)⌋ we set a low stagnation level to ensure low-performing The user has to choose the admissible blocks and val- genomes (i.e.: genomes that generate uninstering arti- ues by modifying a configuration file. The resulting arti- facts for the user) are pruned away and leave space in facts are then evaluated in the Minecraft game using the the population for more interesting genomes. During EvoCraft [10] interface. evolution each genome is mutated and new genomes are created by combining two random parent genomes. Since genomes with high fitness are more likely to reproduce 2.2. Artifacts evaluation and pass on their properties to their offsprings, we ensure The generated artifacts undergo an user defined Mini- that the entire population slowly converges to diverse, mum Criterion (MC) step at each generation before eval- high-fitness solutions. uation. This step removes artifacts that don’t satisfy this The genomes are used to encode a Compositional Pat- minimum requirement. Only a predefined number of tern Producing Networks (CPPNs) [16]. Such a network artifacts sampled randomly from the artifacts that pass is composed of blocks of elementary activation func- the MC step are then presented to the user for direct tions connected together by weighted connections. The evaluation. In our experiment, the MC consisted in re- genomes encode the structure, weights and biases of the moving artifacts that didn’t contain enough air blocks network. Mutations during evolution consist in adding or and enough solid blocks (both values were expressed removing connections or blocks, perturbing the weights as minimum percentages). The number of maximum and biases values and changing the activation functions. artifacts that could be presented to the user was set to The CPPNs thus directly map from genotype to pheno- 24. type without local interaction and are applicable on a The user is then tasked to evaluate the generated ar- infinite-sized input space (a typical application is, for tifacts. This is done by looking at the structures on the instance, 2D image generation from pixel coordinates). Minecraft client applet and choosing the most interesting Due to their architecture, CPPNs create patterned out- ones. During manual (human) evaluation the program puts and can develop pure symmetry and symmetry with accepts a list of indexes that correspond to the selected ar- variation, which in turn make for appealing artifacts. tifacts. Since the mapping from genomes to artifacts is 1:1, In our experiment, we use the NEAT Python [17] li- we can assign the fitness value to the correct correspond- brary and the PyTorch-NEAT1 to integrate both the NEAT ing genome that generated the artifact. The possible evolutionary algorithm and the CPPNs architecture. The fitness values are 1 for the genomes the user is interested inputs to the networks are the scaled 𝑋, 𝑌 and 𝑍 coor- in and 0 for the others. We note that we automatically dinates (∈ [0, 1]) of the artifact and the outputs are the assign a fitness value of 0 to all the genomes that didn’t block type and rotation (both values are output ∈ [0, 1] pass the MC step to discourage their traits to appear and later scaled to the admissible values). again later in the generations, as these are not interest- The artifact generation process can be formally ex- ing to the user. This ensures that only the phenotypically pressed as follows: given a population 𝑃 at the genera- interesting genomes survive throughout the evolution tion 𝑔 of genomes 𝐺s, we first define the CPPN network process. that the genome encodes as During human evaluation we save the artifacts and their fitness in a memory buffer. Once the buffer is at ca- 𝐶𝑃 𝑃 𝑁𝐺 = dec(𝑃𝐺𝑔 ) pacity, the FEFFuL network can be trained in a supervised 1 fashion to directly estimate the user preferences given The PyTorch-NEAT library is available at https://github.com/ uber-research/PyTorch-NEAT/ the artifacts. The network itself is rather simple: a 3D 76 Nicolo’ Brandizzi et al. CEUR Workshop Proceedings 75–79 Convolution maps the artifact to a sequence of feature also the FEFFuL network has to reflect this behavior. We maps that are later flattened and passed through multiple solved this problem with two simple corrections to the residual blocks that in the end output a value ∈ [0, 1] network’s behavior. First, we note that the buffer is con- that represent how likely the human would have marked stantly updated only with user’s selected artifacts and fit- the artifact as interesting. A graphical overview of the nesses, effectively making it a constantly-updated dataset FEFFuL network can be found in 1. we can train the network from. The corrections are the following: 1. We first use the network to evaluate the artifacts for a given number of generations at a time. This value is increased as the evolution process goes on, thus leveraging the network more and more as time goes on. 2. We enforce fine-tuning of the network after the aforementioned number of generations. We do this by prompting the user to evaluate the cur- rent generation, collect the preferences in the buffer and finetuning the network on the updated dataset. At this point of the process there could be some disalignment between user preferences and network estimates. This would be harmful to the entire experiment as it would diverge the search. Figure 1: Architecture of the FEFFuL network We solve this by activating the network only if its accuracy over unseen artifacts is higher than a given threshold. Otherwise the user is prompted Due to the nature of the task, it is expected that the again until the network can be activated. buffer would be unbalanced: it is more likely that there would be more examples of discarded artifacts than ac- cepted artifacts. For this reason, we balance the dataset 3. Methodology and Results before training the network. We do this by both over- sampling accepted artifacts and then downsampling dis- We first report the experiment settings in order to repro- carded artifacts. The former is accomplished by augment- duce our results at 1. These are taken from the experiment ing artifacts by rotation: we rotate the structure along configuration file in our repository; we are not going to the 𝑌 -axis by 90. We can do this since the artifact must report the settings for the NEAT algorithm which can be be interesting regardless of the orientation. The latter found in the neat configuration file in our repository. is instead accomplished by removing discarded artifacts from the dataset until a good ratio of accepted artifacts Artifact width 5 Artifact height 7 and discarded artifacts is reached. Artifact depth 5 The output value of the FEFFuL module is directly as- Max number of artifact shown to the user 24 signed as fitness value to the corresponding genome. This Min percentage of air blocks in an artifact 22% gives an additional significance to the fitness value: it not Min percentage of solid blocks in an artifact 32% only marks the user preference but also the network’s Admissible rotations NORTH, WEST, SOUTH, confidence in its output. Thus, a low-fitness genome EAST, UP, DOWN would have been less likely to have been picked by the Admissible blocks AIR, COBBLESTONE, STONE, user than a high-fitness one. This process is important STONE_STAIRS STONEBRICK, during the reproduction step of the NEAT algorithm as QUARTZ_BLOCK it heavily relies on the genome’s fitness to order the pos- RNG seed 42 sible parents of new individuals. Buffer capacity 512 Batch size 32 Number of convolution channels 16 2.3. The alignment problem Number of epochs 100 Interval of training 5 Due to the ever-complexification of the genomes thanks Activation threshold on test accuracy 0.5 to the NEAT algorithm, both the structures and the user’s preferences change over time. An artifact that the user Table 1 deemed interesting at generation 0 may not be as in- Experiment parameter summary teresting if found at generation 100. This implies that 77 Nicolo’ Brandizzi et al. CEUR Workshop Proceedings 75–79 We then report the train and test accuracy and loss evaluated at different generations at 2 and 3 respectively. We note that, while the training metrics behave as ex- pected, in the test we can see a rising trend for the loss metric. The accuracy however remains well over the 80%, which we consider a good value. Figure 4: Overview of the active evaluator per generation was not the evaluation anymore but the generation of Figure 2: Train accuracy and loss of the FEFFuL network at the new genomes instead. different generations Figure 3: Test accuracy and loss of the FEFFuL network at different generations We ran the experiment for 100 generations. While this Figure 5: Artifacts produced on generation 0 (left) and on generation 100 (right) is a rather low number of generations, the aim of this project was not to evolve structures to a certain level of user satisfiability but to show that it was possible Finally, we report a comparison between the artifacts to use a NN to approximate user’s preferences, reduce generated on generation 0 and those generated on gen- user fatigue and save time. The metrics prove that we eration 100 at 5. We note how the artifacts are more succeeded in the first part of the task. For what concerns complex on generation 100 as well as in line with user’s user fatigue, it is known that a single user can evaluate preferences during evolution. around 20 generations worth of artifacts before starting to feel fatigued [9]. In the entire evolution process only 29 generations worth of artifacts where evaluated by the 4. Conclusions and future work user, whereas the remaining 71 were evaluated by the The results shown in the previous section suggest that FEFFuL network. We report a graph that shows which this approach can effectively be used to mitigate the user evaluator was used at each generation at 4. fatigue problem in IEC tasks. More specifically, most of The time gain over the same amount of generations is the interactions with the human experimenter are in the well apparent: we estimated that the human user takes first few generations, where the buffer is not completely from 2 to 3 minutes to evaluate a single batch of artifacts, full and the estimator cannot be trained. After the first whereas the FEFFuL network only a few milliseconds. In alignment, the human evaluations were needed very spo- fact we noticed that the bottleneck of time consumption 78 Nicolo’ Brandizzi et al. CEUR Workshop Proceedings 75–79 radically, as the estimator maintained good alignment gorithm for individual protection applications, vol- and usually only needed one generation to adapt to new ume 2768, 2020, pp. 41–45. artifacts chosen by the user. This also means that after [6] M. Wozniak, C. Napoli, E. Tramontana, G. Capizzi, the initial alignment, the cost to perform more genera- G. Lo Sciuto, R. Nowicki, J. Starczewski, A mul- tions, in terms of user evaluations needed is much lower. tiscale image compressor with rbfnn and discrete This can get even lower for further generations, even wavelet decomposition, volume 2015-September, though this descending trend would be fairly moderate 2015. doi:10.1109/IJCNN.2015.7280461. in our specific implementation. [7] B. Johanson, R. Poli, Gp-music: An interactive ge- However, these same results also suggest that a method netic programming system for music generation is needed to avoid the fitness estimator from overfitting with automated fitness raters, Genet. Program. during the fine-tuning step, perhaps with an adaptive (2000). rule that either produces an optimized “expiration date” [8] J. Biles, et al., Genjam: A genetic algorithm for for the model or that dynamically stops the fine-tuning generating jazz solos, in: ICMC, volume 94, Ann process when overfitting occurs. Arbor, MI, 1994, pp. 131–137. We finally note two possible improvements to the FEF- [9] H. Takagi, Interactive evolutionary computation: FuL network. FEFFuL could use a latent representation of fusion of the capabilities of ec optimization and hu- the phenotypes instead of directly mapping the artifacts man evaluation, Proceedings of the IEEE 89 (2001) to their fitness values. This could benefit the re-alignment 1275–1296. doi:10.1109/5.949485. process if the generalization power of the model over [10] G. S. R. Djordje, et al., EvoCraft: A New Challenge unseen artifact improves. This comes with the added for Open-Endedness, arXiv (2020). URL: https:// benefit of keeping a smaller buffer, thus requiring even arxiv.org/abs/2012.04751v1. arXiv:2012.04751. less human evaluation at the beginning of the experi- [11] M. Woźniak, D. Połap, C. Napoli, E. Tramontana, ment. Finally The Minimum Criterion could be enforced Application of bio-inspired methods in intelligent only in generations evaluated by the human, so that the gaming systems, Information Technology and Con- advantage of having a fitness estimator can be exploited trol 46 (2017) 150–164. doi:10.5755/j01.itc.46. to evaluate the entire population and not just a subset. 1.13872. This would also ensure that no interesting artifact is dis- [12] D. Połap, M. Wózniak, C. Napoli, E. Tramontana, carded a priori during the MC step, allowing a diverse Is swarm intelligence able to create mazes?, In- set of good solutions to remain in the population. ternational Journal of Electronics and Telecom- munications 61 (2015) 305–310. doi:10.1515/ eletel-2015-0039. References [13] S. R. Szumlanski, A. S. Wu, C. E. Hughes, Con- flict resolution and a framework for collaborative [1] D. Połap, M. Woźniak, C. Napoli, E. Tramontana, interactive evolution, in: AAAI, 2006, pp. 512–517. R. Damaševičius, Is the colony of ants able to rec- [14] P. G. de Prado Salas, S. Risi, Collaborative inter- ognize graphic objects?, Communications in Com- active evolution in minecraft, in: Proceedings of puter and Information Science 538 (2015) 376–387. the Genetic and Evolutionary Computation Con- doi:10.1007/978-3-319-24770-0_33. ference Companion, GECCO ’18, Association for [2] D. Połap, M. Woźniak, C. Napoli, E. Tramontana, Computing Machinery, New York, NY, USA, 2018, Real-time cloud-based game management system p. 127–128. URL: https://doi.org/10.1145/3205651. via cuckoo search algorithm, International Journal 3205723. doi:10.1145/3205651.3205723. of Electronics and Telecommunications 61 (2015) [15] K. O. Stanley, Efficient Evolution of Neural Net- 333–338. doi:10.1515/eletel-2015-0043. works Through Complexification, Ph.D. thesis, De- [3] J. Secretan, et al., Picbreeder: evolving pictures col- partment of Computer Sciences, The University of laboratively online, in: Proceedings of the SIGCHI Texas at Austin, 2004. URL: http://nn.cs.utexas.edu/ Conference on Human Factors in Computing Sys- ?stanley:phd2004. tems, 2008, pp. 1759–1768. [16] K. O. Stanley, Compositional pattern pro- [4] G. Capizzi, G. Lo Sciuto, C. Napoli, E. Tramontana, ducing networks: A novel abstraction of de- M. Woźniak, A novel neural networks-based tex- velopment, Genetic Programming and Evolv- ture image processing algorithm for orange defects able Machines 8 (2007) 131–162. URL: https:// classification, International Journal of Computer doi.org/10.1007/s10710-007-9028-8. doi:10.1007/ Science and Applications 13 (2016) 45–60. s10710-007-9028-8. [5] R. Avanzato, F. Beritelli, M. Russo, S. Russo, M. Vac- [17] S. Sarti, G. Ochoa, A neat visualisation of neuroevo- caro, Yolov3-based mask and face recognition al- lution trajectories., in: EvoApplications, 2021, pp. 714–728. 79