On Children’s Exploration, Aha! Moments and Explanations in Model Building for Self-Regulated Problem-Solving

Vicky Charisi∗1, Natalia Díaz-Rodríguez∗2, Barbara Mawhin3 and Luis Merino4

1 Joint Research Centre, European Commission, Seville, Spain
2 DaSCI Andalusian Institute in Data Science and Computational Intelligence, University of Granada, Spain
3 Human Factors Department, EBT-Salient Aero Foundation, Spain
4 Service Robotics Laboratory, University Pablo de Olavide, Seville, Spain

Abstract
In certain problem-solving tasks that require Human-AI interactions, a mutual understanding of the reasoning behind the performed actions can benefit both humans and artificial agents. However, identifying and predicting the cognitive strategies involved in such a hybrid setting, especially in novel, self-regulated exploratory tasks, is a challenging endeavour. Our aim is to identify behavioural properties relevant to young children’s cognitive strategies that are present in problem-solving, with an emphasis on the Aha! moment as an intermediate step between exploratory actions, that typically relate to the development of tacit knowledge, and the generation of explanations that requires explicit knowledge. We use data from existing, previously published, behavioural studies with children 5 to 7 years old to explore these mechanisms in two self-regulated problem-solving tasks. In addition, we reflect on our observations of an Artificial Agent (Q-learning algorithm) that learns to solve the same task. Our findings indicate that while in current reinforcement learning practice, detecting the moment of the cognitive transformation of the problem representation normally translates into observing convergence curves of the objective functions being optimized, in young children this involves more complex behavioural properties, such as verbal metacognition. These behavioural processes can be used as a proxy for the identification of the Aha! moment.
Finally, we propose a conceptual map which integrates the observed behaviours that are used to detect, communicate and corroborate learning both in humans and machines, and we discuss the association of children’s exploratory behaviours, the Aha! moments and ultimately their explanation generation.

Keywords
Explainability, Child development, Human intelligence, Problem-solving, Behavioural indicators, Explainable AI

EBeM’22: IJCAI-ECAI Workshop on AI Evaluation Beyond Metrics, July 25, 2022, Vienna, Austria.
(∗) Equal contribution.
Email: vasiliki.charisi@ec.europa.eu (V. Charisi∗)
ORCID: 0000-0001-7677-027X (V. Charisi∗); 0000-0003-3362-9326 (N. Díaz-Rodríguez∗); 0000-0003-2857-4922 (B. Mawhin); 0000-0003-4927-8647 (L. Merino)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

1. Introduction

For effective hybrid environments where humans collaborate with Artificial Intelligence (AI) systems to make a decision, a mutual understanding of the reasoning behind certain actions or recommendations can be of catalytic importance.

Explainability is one of the features that supports mutual understanding and trust development [1], and can be considered as an interface through which machine learning models can be explained towards a customized and diverse set of audiences [2], debugged, and audited. For the generation of explanations, though, implicit knowledge should become explicit, which often includes the cognitive process known as the Aha! moment or insight. We adopt the definition of the Aha! moment in problem solving as a sudden transformation of the problem representation [3, 4]; this differs from the solution retrieval reached by an analytical, multistep strategy, through which the solver searches long-term memory for potential algorithms, mental schemas, analogies or factual knowledge.

In this paper we seek to clarify what behavioural manifestations indicate the occurrence of the Aha! moment in children performing certain problem-solving tasks and instantiate a conceptual map of strategies which are used to detect, communicate and corroborate learning both in humans and machines. The ultimate goal is providing a richer test-bed of procedural protocols and tests to more broadly assess learning in machines, beyond a single metric or loss optimization.

1.1. Inspiration by children’s problem-solving

Reverse engineering human intelligence can usefully inform AI and machine learning. The exploration of fundamental cognitive processes that can be informative for AI approaches often requires focusing on infants or young children in the context of structured or unstructured activities [5, 6, 7]. Self-regulated play, for example, which allows children to perform exploratory actions and come up with insights and discoveries in problems they generated themselves, has previously been correlated with the development of their implicit knowledge and their gradual understanding of the surrounding physical world [8].

Figure 1: Behavioural properties as a proxy for the identification of the Aha! moment in children’s problem solving process. The proposed conceptual map includes behaviours for the evaluation of the transition from tacit knowledge (appearing in the phase of exploration) toward explicit knowledge (appearing in the phase of exploitation). The properties include non-verbal behaviours and verbal metacognitive manifestations (reasoning, planning and reflection). The Aha! moment appears as part of the transition from tacit to explicit knowledge and functions as an indicator for the generation of explanations.

However, what cognitive processes are mobilized for the transformation of tacit into explicit knowledge in young children? And what behavioural properties can be used as a proxy for the identification of those processes?

Based on a series of behavioural studies with children 5 to 7 years of age, we identify behavioural properties relevant to cognitive processes that are present in problem-solving tasks, with an emphasis on identifying the Aha! moments, as an intermediate step between exploratory actions and the generation of explanations (see Fig. 1), aiming to inform current and future approaches on explainable AI (XAI).

2. Relevant Work

2.1. Problem-Solving in Young Children

To understand the fundamentals of problem-solving as a cognitive process, developmental psychologists have extensively explored the involved faculties and the ways they interact with each other. To this end, classic and contemporary work has examined various tasks that were used depending on the child’s age and areas of interest. Bruner, for example, laid out a plan for the development of skilled action [9]. First there is intention, then an assembling of “constituent acts”. They initially occur out of order but later become properly sequenced to reach the goal. Bruner emphasised the role of exploratory behaviour and play prior to achieving skilled action. Flexibility and higher order acts become possible through reorganization of component acts and modularization. Although Bruner’s examples came from infants in the first year of life, his ideas have been applied to the acquisition of more complex skills beyond infancy. Additionally, he argued that play is the best way to promote development as it can occur with any physical material or with imagination, alone or with others, and can take place in various settings [10]. The connection of play with the development of fundamental cognitive processes and human learning has been well-established [11, 12, 13]. Self-directed and intrinsically motivated goal generation and problem-solving are among children’s cognitive tools that affect their overall development [7, 14]. In free play, children set novel goals, discover unexpected information, and invent problems they would not otherwise encounter. In this context, children apply exploratory processes that allow them to progressively reduce uncertainty about their environment [14].

Here, a problem is defined as a situation in which a solver needs to change a given state to a desired one but there are obstacles. There are different types of problems, such as the routine problem vs. the non-routine problem. The first refers to a situation in which the solver knows a solution method, whereas the second is when the solver has to create a solution method. There is also the well-defined problem, where the state, goal and set of operators are clearly defined. It is opposed to the ill-defined problem, where the elements are not clearly defined.

Problem solving occurs when a person has to invent a way to solve a problem, and it follows two main stages: the problem representation and the problem solution. Solvers need to comprehend the problem and create a model of the problem situation. Then, they have to build a solution through planning and execution, and monitor it using awareness and control. This implies both cognitive and metacognitive processes. Problem solving is always domain-specific but the thinking by analogy strategy seems to be almost always successful. Thinking of a related problem that is already known and, even better, already solved helps for success. One application of this is heuristics, which allow a solver to reach an acceptable solution faster even if it is not perfectly accurate. Considering the bounded rationality of humans, heuristics allow us to make judgements and choices and to adapt our behaviours efficiently. This is closely related to the concepts of “social learning” and “adaptation” in human development.

In order to solve a problem, two mental representations are needed: one of the current state and one of the goal state. As problem solving is goal-oriented and contextualized, a plan detailing the solution step by step is required. A constant monitoring process is also required, as each move has consequences that can bring the solver closer to or further from the desired goal state. It also requires mental flexibility and thus inhibitory control [15]. If a first chosen solution seems to be inappropriate, the solver has to adapt the strategy.

2.2. Social learning

Social learning is a crucial component of human intelligence, allowing us to rapidly adapt to new scenarios, learn new tasks, and communicate knowledge that can be built on by others [16]. The work of Lev Vygotsky, who put forward this view already in the 1920s, takes into account factors such as language development and cultural influences in the cognitive development of children [17]. From his perspective, mental functioning and development rely on an interdependence between individual and social processes. When learners, whatever their age, participate in joint activities, they gain new abilities and strategies to better understand the world and adapt to it. This process is also mediated by signs and tools such as language and mnemonic techniques. Vygotsky places them in the category of semiotic means. They are considered a cornerstone for knowledge co-construction and can help independent problem-solving activity. This leads to the difference between what a learner can do with and without help, which he described under the concept of the Zone of Proximal Development (ZPD). Social interaction with the use of linguistic and cultural tools facilitates the internalization of knowledge and its transformation into cognitive tools supporting the development of new cognitive functions. The latter aspect has been considered for the design of artificial agents that are able to interact with others and internalize these interactions in a similar way as humans [18, 19].

The process named scaffolding is described as a process that enables a child or novice to solve a task or achieve a goal that would be beyond his unassisted efforts [20]. To achieve more complex tasks (like problem-solving), it is necessary to combine simpler skills in order to achieve a higher level of competence. This promotes cognitive growth. The shared space of an activity involving collaboration mechanisms between peers is also of great importance, whether the peer is a human or an artificial agent [21, 22].

2.3. Insight in Problem-Solving

Insight is most commonly called the “Aha!” experience, describing the moment when a person gets the solution to a problem that up to this point had left her puzzled. In cognitive science it is referred to as insight problem solving and it is accompanied by a feeling of satisfaction for the solver. It has been related to creative thinking [23, 24] and includes an exploratory phase where divergent thinking takes place, especially during the early stages of the problem solving process. This allows the person to produce new ideas or connect existing ideas. The second phase is the convergent thinking phase, where a solution should be selected by synthesizing, analyzing and monitoring the matching degree of the current result to the expected one. Although the experience of insight is sudden and can seem disconnected from the immediately preceding thought, recent research shows that insight is the culmination of a series of brain states and processes operating at different time scales. Elucidation of these precursors suggests interventional opportunities for the facilitation of insight [3], including concurrent verbalization [25]. As in all problem solving, most of these strategies rely on a constant restructuring of the mental representation of the problem. One way in which explicit knowledge manifests itself is through the formation of causal inferences and the generation of explanations that, in research with children, are used for detecting gaps in their causal knowledge.

2.4. Explanation generation

Regarding explanation generation, there is a large body of work in various fields, but explanation always implies an explainer and an explainee with their own respective characteristics. Of particular interest across the fields is the role of the Theory of Mind, i.e. the ability of a person to attribute mental states to the consequent behaviours of herself or others [26]. The selection and evaluation processes of explanations depend on the explainer and explainee, but also on the characteristics of the context. The nature of an interaction for explaining is different in kindergarten between the teacher and a young child than between the cockpit desk and the pilots during a flight. The role of beliefs has also been raised recently as a cornerstone. An explanation does not necessarily need to be consistent with a person’s beliefs but should help promote a revision component [27], thus allowing the evolution of the internal representations. Human explanations from social sciences became an integrated part of Artificial Intelligence (AI) through the XAI field in order to provide explanatory agents and to facilitate interactions between humans and machines.

3. Methodological approach

We aimed to identify behavioural properties in young children’s problem-solving process that are associated with the transition from tacit knowledge to the development of explicit knowledge and the generation of explanations. We used two types of problem-solving activities, an open-ended task (computer-supported music-making) and the cognitive task of the Tower of Hanoi (ToH). We considered the above-mentioned theories and we conducted three behavioural studies to explore children’s processes in various settings. In addition, we reflect on an artificial agent that learns to solve the same ToH problem. For the purpose of this paper, we take a case-study methodological approach. Case studies are in-depth investigations of a single person, group, event or community which are approached from a qualitative perspective [28]. All the included case studies meet the following criteria: (i) the sample consists of children aged 5 to 7 years old, and (ii) the setting facilitates children’s self-regulated activities. In order for us to ensure the necessary variability among the case studies, we included (i) open vs. non open-ended tasks and (ii) different types of social contexts (collaboration with two children, collaboration of one child with a robot, hybrid collaboration with two children and a robot, see Figure 5).

It should be noted that any comparison among the studies was outside the scope of this paper; rather, our goal is to make a synthesis of the results as they appeared in different settings. For this reason, we only provide the necessary overall findings for each study and we adopt a qualitative reflective approach for one representative case study per experiment. The selection of the case studies was based on their relevance to the purpose of this work and on their representativeness of children’s average behaviour in specific settings.

4. Empirical Studies: A Selection of Use Cases

This section presents a line of empirical evidence that has contributed to our identification of behavioural indicators that facilitated the transition from tacit to explicit knowledge. For each study we first describe the original goal of the study, the analysis of the data that are relevant to the current work and the corresponding findings, and we reflect on how this contributes to the purposes of this work.

4.1. Study 1: Identification of behaviours

The scope of the study was to identify behaviours that emerge spontaneously when children are involved in an open-ended problem solving activity and to observe their development over time. As such, the setting of the study was based on ethnographic methodological principles and there was no adult intervention during the activities. Open-ended tasks without adult intervention provide the space for children to pursue their goals in a self-regulated and intrinsically motivated manner. We designed a naturalistic behavioural study in a school setting with N = 16 young children (5-6 years old) who were invited to compose music in pairs with the use of two dedicated screen-based software packages on a weekly basis over a period of maximum 8 weeks (Fig. 2). The children were only asked to create music with the sounds provided by the digital tool. No other intervention was performed by the experimenter. The observations included 1795.51 min of video data which were transcribed based on an annotation scheme with a taxonomy of behaviours in relation to children’s cognitive process, social interactions and affective engagement. For the purposes of this paper we only focus on the first category. A detailed description of the study appears in [29].

Figure 2: Self-regulated music-making setting: a. The Reactable, a table-top interface for sound synthesis, and the touchscreen version of it with two participants; b. The Sibelius Groovy, a music-making software for children, and the setting with two children.

4.1.1. Data analysis

For the elaboration of the data we used the approach of microgenetic analysis [30]. The microgenetic method is defined by three properties: (a) observations span a period of rapid change in competency; (b) the density of observations is high relative to the rate of change; and (c) the observations are subjected to an intensive, trial-by-trial analysis to infer the processes that give rise to change. The annotation of the data was based on children’s verbal and non-verbal behaviours and the corpus included 7063 annotated behaviours. The taxonomy of the behaviours that related to children’s cognitive processes and the percentage of their occurrence appear in Table 1.

Code  Behaviour               Occurrence (%)
C1    Spontaneous musicking   11.05
C2    Sound exploration       15.87
C3    Assessment              27.38
C4    Reasoning               18.6
C5    Deliberate musicking    13.06
C6    Planning                14.04

Table 1: The taxonomy of behaviours that emerged during the open-ended self-regulated task of children’s collaborative music-making and the percentage of occurrence per behaviour.

The results indicate that despite the fact that the participant children were of a relatively young age, which is typically related to exploratory actions, the behaviours of deliberate musicking (C5) and planning (C6) appeared slightly more than the exploratory behaviours of spontaneous musicking (C1) and sound exploration (C2).

Furthermore, a grouping of the behaviours that correspond to reflective actions (C3 and C4) and the ones that correspond to active music-making (C1, C2, C5 and C6) reveals that the “reflecting” behaviours occurred 46.18% of the total cognitive behaviours, while the active music-making behaviours occurred 53.82% (see Fig. 3). These results indicate that despite the young age of the participants, reflecting and reasoning about the musical choices appear as an integral part in children’s cognitive engagement with music-making.

Figure 3: Average percentage of children’s behaviours in Study 1, Making (C1, C2, C5 and C6) and Reflecting (C3 and C4).

4.1.2. Reflection

Computer-supported music composition was selected as an open-ended task which does not include a predefined objective final “solution”; rather, it involves decision-making based on subjective criteria and self-regulated goal identification and provides the context for the emergence of a variety of processes and interactions. We identify two major findings relevant to the scope of this paper. First, despite the unstructured and highly exploratory nature of this task, we observed that children exhibited behaviours that correspond to “making” and to “reflecting”. Spontaneous and exploratory actions were mixed with deliberate actions and planning, while the latter were supported by assessment and reasoning. Second, the collaborative setting of this study facilitated children’s verbal interactions and negotiations during their decision-making process and consequently their reasoning and reflection on their actions. These processes correspond to the mobilization of their verbal metacognition, part of which was the generation of explanations during the negotiation of their task-related decisions. This means that, given the opportunity (in this case a collaborative setting), children as young as 5 years old actively engage in self-initiated reflection on their actions and imagine future outcomes while being able to explain their reasoning to their collaborator. However, we observed that they often lacked the verbal abilities and the terminology for accurate explanations. For this reason, they mobilised other available modalities, such as gestures, and used the affordances of the graphical user interface of the tool provided to complement their explanatory behaviours.

4.2. Study 2: An indication for the Aha! moment

The goal of this experiment was to test the impact of the type of a robot intervention on children’s problem-solving process. We used the cognitive task of the Tower of Hanoi (ToH) [31], which is used to measure children’s planning abilities and inhibitory control. To reach the optimal solution, participants must inhibit impulsive moves that superficially bring the child closer to the goal but are unhelpful for the longer-term solution [32]. We designed an experiment with three phases: a baseline (single child), an intervention (manipulation of the robot’s behaviour) and an evaluation (child’s voluntary interaction with the robot) for N = 20 children 5 to 7 years old. For the intervention phase, we had two conditions: in Condition 1, the robot and the child solved the task in a turn-taking setting, and in Condition 2 we designed a child-initiated voluntary interaction with the robot. In this paper, we focus on a single child’s problem-solving process to explore behavioural properties relevant for our understanding of the transition from exploratory actions to the transformation of the problem representation, which requires the involvement of inhibitory control and the stabilization of the optimal performance. The details of the study appear in [33].

Figure 4: A child’s performance of the Tower of Hanoi task over time (in seconds) (x-axis) with the duration of each move (in seconds), in addition to a moving average of the last three movements (y-axis). We observe that throughout the task the child exhibited a mixture of optimal (blue lines) / suboptimal (pink lines) and slow (lower) / fast (higher) moves.

4.2.1. Data analysis

We evaluated the task performance in relation to the trajectory of optimal and suboptimal movements over the course of the task. The optimal movements are defined as the ones that lead to the solution of the task with the minimum number of movements. In addition, we measured the relative speed of the movements in relation to the baseline of each participant. Given the assumption that during the task the children sustained the necessary attention, we identify point A in Fig. 4 as the point that separates the phase of mostly suboptimal moves (red peaks) from the phase of mostly optimal movements (blue peaks), which are also carried out faster.

4.2.2. Reflection

We observed exploratory behaviours that typically were characterised by an increased number of suboptimal moves. We identify as an Aha! moment the point when a transformation of the mental representation of the problem occurs which, in this task, is behaviourally manifested by the mobilization of inhibition as a strategy for the optimal solution of the task, meaning that the child inhibits the impulsive move and performs the less obvious one that will lead to the stabilization of the optimal solution of the task (see point A in Fig. 4). This is a cognitive strategy that in the age group of the present studies does not appear intuitively. As shown in Figure 1, behavioural properties that appear in the problem-solving process in the context of the given tasks include the performance instability, the incremental optimization and the performance stabilization. During the exploratory phase, the children were reinforced by the results of their actions, which eventually guided them to the restructuring of the problem representation and consequently the use of the strategy which is based on inhibitory control. After the Aha! moment, we observe a stabilisation of the optimal moves which indicates learning. One of the limitations of this study was the fact that it was not designed in a way to facilitate the child’s verbalisation of their thoughts, reasoning and reflections. For this reason, we were not able to make any inferences regarding the children’s reasoning, their verbal metacognition and the generation of possible explanations during the problem-solving process.

4.3. Study 3: Social Interaction and Explanations

The purpose of Study 3 was to explore the role of a social robot on children’s collective problem-solving and the child-child social dynamics in a setting of two children and one robot (see Fig. 5). We built upon Study 2 and we used the same task, the ToH task, and the same robot. We designed a controlled 2×2 experimental study with N = 86 children who all participated in a baseline session (without robot), an intervention (with the manipulation of robot behaviour, in terms of its cognitive reliability and expressivity) and an evaluation session (with child-initiated form of interaction) to solve the Tower of Hanoi task with an incremental difficulty level in the different experimental configurations without any expert’s intervention. For the purposes of this paper, we focus on the findings on the patterns of children’s social interactions and verbal negotiations and explanations during the collective task performance. The detailed research design, analysis and findings of the study appear in [22].

Figure 5: Setting of Study 3: Two children collectively solve the Tower of Hanoi task in a turn-taking or child-initiated voluntary interaction with the robot.

4.3.1. Data analysis

We observed that the setting of the study facilitated child-child social interaction, and that verbal reflection, reasoning and planning appeared to be an integral part of the process, which was lacking in Study 2. To measure the team disparity, we define social interaction, $S$, as the number of task-related interactions between children:

$$S = \frac{S_1 + S_2}{L}$$

where $S_n$ with $n = 1, 2$ refers to the number of times child $n$ addresses their peer with task-related verbal or non-verbal (i.e. pointing and gestures) behaviours and $L$ refers to the number of movements needed by the team to solve the task. Our analysis showed that children had a higher $S$ rate during the sessions with the robot, namely the Intervention (M = 0.16, SD = 0.14) and the Evaluation (M = 0.13, SD = 0.092), which differed significantly from the Baseline session (M = 0.06, SD = 0.09) with p = 0.08 and p = 0.015. Among the verbal manifestations we identified the utterances related to planning as one of the strategies children used to negotiate for the next movement on the ToH task. We examined the balance between children in the planning of the movements, and defined a planning disparity metric as the absolute difference in the number of interactions initiated by each child of the team. Our analysis showed that there was a significant difference in task performance (U = 297, p < 0.001), with teams with balanced planning performing better (N = 19, M = 0.51, SD = 0.40) than groups with an unbalanced planning behaviour (n = 18, M = 1.61, SD = 0.98). In this case, planning was used as part of the explanation formation, which was observed to be one of the strategies for children’s negotiations in problem-solving.

4.3.2. Reflection

Children’s social verbal and non-verbal interaction during the problem-solving process in Study 3 appeared catalytic for the facilitation of their task-related planning as part of explanation generation. This was more evident in the sessions with the robot. One possible explanation for this is the fact that one of the conditions involved a robot that suggested suboptimal movements. In that case, the children engaged in child-child negotiations and explanation generation to collectively take a decision for the next move. Our observations indicate that two cognitive strategies were involved in children’s explanations: planning as a part of an a priori explanation of their reasoning for a certain decision, and reflecting as a part of an a posteriori explanation. We have yet to analyse the association of the strategy of planning in the context of explanatory behaviours and its relation to a preceding Aha! moment. It should be noted that additional non-verbal manifestations, such as pointing and gestures, were mobilised in the cases where a child did not have the verbal maturity to formulate the planning or the explanation.

4.4. Study 4: Multi-agent setting

This study [34] uses the same non-open-ended task (ToH) and collaborative setting: one learning agent (LA) and one helping agent, with a focus on the voluntary interaction among artificial agents. In order to explore whether algorithms benefit from asking for help in collaborative problem-solving, as children do, two hypotheses are tested:

H1: Canonical interventions from an expert speed up learning.
H2: Getting help on demand from an expert accelerates finding the optimal solution compared to not on demand.

The expert intervention occurs in 2 different scenarios: 1) LA1 solves the task in collaboration with the helping agent in a “turn-taking” scenario, which results in a canonical cognitive intervention from the expert. 2) LA2 solves the task independently, having the option to ask for the help of the expert whenever (if) this is needed, resulting in an on-demand intervention. Two parameters are assessed: 1) the canonical intervention (help) rate (every 2, 3 or 4 turns), and 2) the ask-for-help threshold (from 0 to 1). The last parameter was created to simulate what happens when a child asks for help: if the best policy value is lower than the ask-for-help parameter, the expert will play instead of the LA.

Figure 6: Asking-for-help scenario with different ask-for-help values: LA2 tries to solve the game alone while being able to ask for help whenever its best action is not good enough (plot not on logarithmic scale as the agent asks for help at most 7 times).

4.4.1. Data analysis

From the reinforcement learning (RL) plots, which show the mean number of moves needed to solve the task as training episodes evolve, some interpretations can be extracted:

From scenario 1 it is observed that the LA is more efficient when it is helped by the expert in a turn-taking scenario, and that it is even more effective when helped every 4 turns rather than every 2. This showcases the importance of the agent exploring on its own rather than always being given the optimal solution.

Even when all approaches converge in both scenarios, in scenario 2 the agent that asks for help also becomes faster and more effective: help is most useful at the beginning of learning. After asking for help many times during the first episodes, it starts solving the task by itself, resulting in an increase of inefficient moves. The agent seems to gain confidence in movements which, while imperfect, allow the task resolution by exploring different states.

Compared to the LA not being helped, the asking-for-help agent is a lot more efficient, but there is not much variation among the canonical and the help-asking configurations. This is probably due to the rather simple simulation of the trigger for the request of help. Simulating the child’s behaviour is a complex task and more emphasis needs to be placed on accurately describing it, including how to simulate the “asking for help” function. Adding mechanisms such as intrinsic motivation, about the LA’s desire to solve the game on its own, could make the comparison more accurate. The agent asks for help when it considers that a movement is not good enough to be played, whereas the actual mechanisms that drive the child to ask for help are more complex.

We identify emergent issues that should be part of the explanation in the design of an LA, but that are neither part of the modelled problem nor available as a representation accessible to the agent and, thus, to its explanation:

1. The lack of a multimodal input space [35] for the agent to perform in the action-interaction loop of RL can restrain the agent from exhibiting the correct behaviour and communicating as humans would expect.
2. The lack of a natural language interactive communication interface impedes questioning the system about its confidence in real time, or uncertainty estimates.
3. The dependence on successfully solving the task constrains explanations during the learning phase. How can an agent communicate to its developer or user its struggles in solving the task when it has not yet achieved a satisfactory performance? Could common AI practices (data augmentation, fine-tuning) be accessible to the agent for it to communicate and be able to choose to change them?
4. Most deep RL models rely on baselines of other agents to assess their worth, lacking comparisons in multi-agent settings [36] with children’s learning.
5. The inability of current RL algorithms to communicate the continual progress beyond reporting a sole reward value obtained at the end of a convergence curve makes it challenging for LAs to explain their skills to solve the credit assignment problem, their difficulties or agility to complete sub-tasks, their acting self-confidence, or learned savvy behaviours.
6. The lack of alignment of explanations of LAs with meta-learning and trustworthy AI dimensions (such as the trust calibration meta-information taxonomy [1]) should be accounted for in the explanation generation process, in the same way as mechanisms to ensure the reproducibility of insight-built explanations.

Reflection

This is the only study not involving children, but it follows the same protocol as the ToH studies. While RL model developers report that a model has converged when learning curves reach a plateau, these changes should also reflect key changes in children’s behaviour. However, we showed that this mapping is not always evident. Providing the learning agent with signals such as the Aha! moment to adjust its self-learning or hyperparameter changes could be paramount to avoid blind manual engineering processes (e.g., reward function crafting) for which no common procedures exist. We believe these are the explainable dimensions that XAI for RL should work on (identifying Aha! moments, categorising problems, difficulty, environments, collaboration/competition dimensions, etc.).

The difficulty of explaining the learning process of a single LA could be reduced by involving interaction with other agents. One could attend to social interaction [16] and social influence as intrinsic motivation [37] learning metrics. Both have been shown to enhance learning in multi-agent settings.

Explanations should reflect the need for these incentives that agents depend on to progress. Once an agent has learned, it is not enough that the agent performs tasks in less time and better; it should also use other human factors or social outcome metrics such as in [38, 39].

2. Children understand but sometimes lack the cognitive and metacognitive skills to explain. Explanations are subject to both biological and artificial systems’ understanding of properties of a given task, and in young children explanations are subject to their verbal abilities. Children often use gestures such as pointing, which means that ex- 5.
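As a point of reference for the Tower of Hanoi task, the minimum-length solution for n disks can be generated recursively and always contains 2^n - 1 moves. The following is a minimal Python sketch and not part of the original studies; the function name `hanoi_moves` and the peg labels are hypothetical, and no assumption is made about the number of disks used with the children.

```python
def hanoi_moves(n, src="A", dst="C", via="B"):
    """Return the shortest move sequence for n disks as (from_peg, to_peg) pairs.

    Peg labels and the function name are illustrative assumptions.
    """
    if n == 0:
        return []
    # Move the n-1 smaller disks out of the way, move the largest disk,
    # then bring the n-1 smaller disks on top of it.
    return (
        hanoi_moves(n - 1, src, via, dst)
        + [(src, dst)]
        + hanoi_moves(n - 1, via, dst, src)
    )


moves_for_three = hanoi_moves(3)
```

A move made by a child can then be labelled optimal when it matches the next move of such a shortest sequence from the current state, which is one way to read the distinction between optimal and suboptimal moves used in the ToH studies.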
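The moving average of the last three moves shown in Figure 4, and the stabilisation of optimal moves that follows point A in Study 2, can be illustrated with a short Python sketch. The function names and the run-length rule in `first_stable_run` are illustrative assumptions: the study identified point A from the plotted trajectory, not with this heuristic.

```python
def trailing_mean(durations, window=3):
    """Trailing moving average of move durations (seconds) over `window` moves."""
    averaged = []
    for i in range(len(durations)):
        chunk = durations[max(0, i - window + 1): i + 1]
        averaged.append(sum(chunk) / len(chunk))
    return averaged


def first_stable_run(is_optimal, run=3):
    """Index where the first streak of `run` consecutive optimal moves begins.

    Hypothetical stand-in for a candidate transition point; returns None
    when no such streak exists.
    """
    streak = 0
    for i, ok in enumerate(is_optimal):
        streak = streak + 1 if ok else 0
        if streak == run:
            return i - run + 1
    return None


# Illustrative values only, not data from the study.
smoothed = trailing_mean([6.0, 3.0, 3.0, 9.0])
candidate = first_stable_run([False, True, False, True, True, True, True])
```

The run length is a free parameter; a longer run gives a more conservative candidate for the point at which performance has stabilised.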
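For the social interaction rate S used in Study 3, S = (S_1 + S_2) / L, a minimal Python sketch of the computation for one team session is given below. The class and field names are hypothetical and the numbers are illustrative, not data from the study.

```python
from dataclasses import dataclass


@dataclass
class TeamSession:
    """Counts for one team in one session (field names are hypothetical)."""
    addresses_child1: int  # S_1: task-related addresses by child 1
    addresses_child2: int  # S_2: task-related addresses by child 2
    moves: int             # L: moves the team needed to solve the task


def social_interaction_rate(session: TeamSession) -> float:
    """Return S = (S_1 + S_2) / L for one session."""
    if session.moves <= 0:
        raise ValueError("a solved task needs at least one move")
    return (session.addresses_child1 + session.addresses_child2) / session.moves


# Illustrative numbers only, not data from the study.
example = TeamSession(addresses_child1=3, addresses_child2=5, moves=50)
rate = social_interaction_rate(example)
```

Normalising by L makes sessions of different length comparable, since a team that needs more moves also has more opportunities to interact.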
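The two intervention mechanisms of Study 4 can be sketched on top of a standard tabular Q-learning update. This is a minimal Python sketch under stated assumptions, not the implementation used in [34]: the "best policy value" is read here as the largest Q-value in the current state, Q-values are assumed to lie roughly in the 0 to 1 range of the threshold, turns are counted from 1, and all function names are hypothetical.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update for one transition."""
    best_next = max((q[(next_state, a)] for a in actions), default=0.0)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])


def asks_for_help(q, state, actions, threshold):
    """On-demand scenario: the expert plays when the best Q-value is below the threshold."""
    best = max((q[(state, a)] for a in actions), default=0.0)
    return best < threshold


def canonical_help(turn, every):
    """Turn-taking scenario: the expert plays every `every` turns (turns start at 1)."""
    return turn % every == 0


# Illustrative usage with made-up states and actions.
q = defaultdict(float)
actions = ["move_a", "move_b"]
help_before = asks_for_help(q, "s0", actions, threshold=0.5)
q_update(q, "s0", "move_a", reward=1.0, next_state="s1", actions=actions, alpha=0.5)
help_after = asks_for_help(q, "s0", actions, threshold=0.5)
```

With an untrained table every state falls below any positive threshold, so the agent asks for help often in early episodes and less as its value estimates grow, which matches the qualitative pattern reported for scenario 2.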
Conclusions, Limitations and planations that can support human-AI interaction Future work are subject to tools responsible for social interac- tion. Aiming towards human-level AI requires a We presented a set of behavioural studies and an exper- broader set of key social skills for complex embod- iment with a Q-learning algorithm in order to identify ied communication in multimodal settings within behavioural properties that relate to the Aha! moment, constantly evolving social worlds [41]. i.e., the moment of the restructuring of the problem rep- 3. The hybrid use of strategies of “planning” and resentation (Fig. 1). These behavioural properties appear “insight”: Questioning the false dilemma of logi- as part of the transition from exploratory to explanatory cal reasoning vs machine learning, we argue for behaviours (tacit to explicit knowledge). They include a synergy between these two paradigms in order task-related observations such as performance instabil- to obtain hybrid AI systems. ity, incremental optimization and stabilization as well as 4. Both social interaction [16] and social influence verbal metacognitive manifestations (observed only in as intrinsic motivation [37] show to be enhancers two of our studies with children) that involve reasoning, for learning in multi-agent settings. reflection and planning. These behavioural properties seem to facilitate the Aha! moment and eventually the This paper is our first attempt to synthesise the results generation of explanations by children. of our research on children’s problem solving in different In current RL practice, detecting this moment normally settings and combine them with our research on XAI. translates into observing convergence curves of the ob- However, due to space limitations, we are limited to pro- jective functions being optimized (normally reaching a vide overviews without in-depth analysis. 
We aim to plateau in cumulative reward or optimized loss, usually tackle the latter in our future work. both). This is an external signal not usually leveraged We hope this work is useful beyond developmental by the agent. Although there are exceptions such as the robotics and AI, i.e., facilitating an effective and ethical use of artificial curiosity signals for self learning of the deployment of RL systems, e.g. from energy building agent [40]), in regular AI model development practice, management to AI for health, where evaluating single we must highlight the need for easier mechanisms to reward functions simply does not reflect nor assess the convey actions that demonstrate the difficulties of the complexity of the system nor the difficulties it has to deal agent until convergence plateaus and/or a sufficient levelwith. of an XAI metric are reached. Future work should aim to involve more tangible eval- We summarise the main points we propose to consider uation metrics that both 1) optimize technical robust- in approaches evaluating XAI, as follows: ness more broadly, and 2) reflect a human-centered view 1. The Aha! moment (or problem representation re- where machine learning factors are questioned, moni- structuring) acts as the intermediate step between tored and explained in parallel ways to how children non-explainable and explainable behaviours. In learn. Evaluation mappings across human and machine a deeper view, since the explanation acts as an learning will allow us to better assess the trade-offs be- interface between the model and a given target tween AI assisted decision making and policies. audience, the Aha! moment is a trigger signal for a model to start elaborating explanations. More effort should be put into specifying the meaning 6. Acknowledgments of the Aha! moment in various tasks that RL Charisi is supported by the HUMAINT project of the models are currently tackling. 
Defining and de- JRC; Díaz-Rodríguez by IJC2019-039152-I funded by tecting high level policies characterising an Aha! MCIN/AEI /10.13039/501100011033 by “ESF Investing in moment (e.g. in terms of key/exploratory action your future” and Google Research Scholar Program; and sequences) can be signs we should be able to not Merino by Programa Operativo FEDER Andalucia 2014- only programmatically detect, but also communi- 2020 through the project DeepBot (PY20_00817). cate. In this way, we can achieve explainable and reliable models, since Aha! signs must act as an additional proxy to attain trustworthy systems. References Learning (ICML) (2021). [17] L. S. Vygotsky, M. Cole, Mind in society: Develop- [1] G. J. Cancro, S. Pan, J. Foulds, Tell me something ment of higher psychological processes, Harvard that will help me trust you: A survey of trust cali- university press, 1978. bration in human-agent interaction, arXiv preprint [18] C. Colas, T. Karch, C. Moulin-Frier, P.-Y. Oudeyer, arXiv:2205.02987 (2022). Vygotskian autotelic artificial intelligence: Lan- [2] A. B. Arrieta, N. Díaz-Rodríguez, J. Del Ser, A. Ben- guage and culture internalization for human-like netot, S. Tabik, A. Barbado, S. García, S. Gil-López, AI, arXiv preprint arXiv:2206.01134 (2022). D. Molina, R. Benjamins, et al., Explainable artifi- [19] J. Lindblom, T. Ziemke, Social situatedness of natu- cial intelligence (xai): Concepts, taxonomies, op- ral and artificial intelligence: Vygotsky and beyond, portunities and challenges toward responsible AI, Adaptive Behavior 11 (2003) 79–96. Information fusion 58 (2020) 82–115. [20] D. Wood, J. S. Bruner, G. Ross, The role of tutoring [3] J. Kounios, M. Beeman, The aha! moment: The cog- in problem solving., Child Psychology & Psychiatry nitive neuroscience of insight, Current directions & Allied Disciplines (1976). in psychological science 18 (2009) 210–216. [21] L. Kerawalla, D. Pearce, J. O’Connor, R. Luckin, [4] H. Stuyck, A. Cleeremans, E. 
Van den Bussche, Aha! N. Yuill, A. Harris, Setting the stage for collabora- under pressure: The aha! experience is not con- tive interactions: Exploration of separate control strained by cognitive load, Cognition 219 (2022) of shared space., in: AIED, 2005, pp. 842–844. 104946. [22] V. Charisi, L. Merino, M. Escobar, F. Caballero, [5] L. B. Smith, L. K. Slone, A developmental approach R. Gomez, E. Gómez, The effects of robot cogni- to machine learning?, Frontiers in psychology 8 tive reliability and social positioning on child-robot (2017) 2124. team dynamics, in: 2021 IEEE International Con- [6] B. M. Lake, T. D. Ullman, J. B. Tenenbaum, S. J. ference on Robotics and Automation (ICRA), IEEE, Gershman, Building machines that learn and think 2021, pp. 9439–9445. like people, Behavioral and brain sciences 40 (2017). [23] T. I. Lubart, C. Mouchiroud, Creativity: A source [7] P.-Y. Oudeyer, L. B. Smith, How evolution may work of difficulty in problem solving, The psychology of through curiosity-driven developmental process, problem solving (2003) 127–148. Topics in Cognitive Science 8 (2016) 492–502. [24] W. Carpenter, The aha! moment: The science be- [8] M. M. Andersen, J. Kiverstein, M. Miller, A. Roep- hind creative insights, in: Toward Super-Creativity- storff, Play in predictive minds: A cognitive theory Improving Creativity in Humans, Machines, and of play (2021). Human-Machine Collaborations, IntechOpen, 2019. [9] J. S. Bruner, Organization of early skilled action, [25] Y. Chu, J. N. MacGregor, Human performance on Child development (1973) 1–11. insight problem solving: A review, The Journal of [10] P. Barrouillet, Theories of cognitive development: Problem Solving 3 (2011) 6. From piaget to today, Developmental Review 38 [26] T. Miller, Explanation in artificial intelligence: In- (2015) 1–12. sights from the social sciences, Artificial intelli- [11] M. H. Siegel, R. W. Magid, M. Pelz, J. B. Tenenbaum, gence 267 (2019) 1–38. L. E. 
Schulz, Children’s exploratory play tracks the [27] M. Shvo, T. Q. Klassen, S. A. McIlraith, Towards discriminability of hypotheses, Nature communi- the role of theory of mind in explanation, in: In- cations 12 (2021) 1–9. ternational Workshop on Explainable, Transpar- [12] J. Chu, L. E. Schulz, Play, curiosity, and cogni- ent Autonomous Agents and Multi-Agent Systems, tion, Annual Review of Developmental Psychology Springer, 2020, pp. 75–93. 2 (2020) 317–343. [28] R. K. Yin, et al., Design and methods, Case study [13] A. Gopnik, Childhood as a solution to explore– research 3 (2003). exploit tensions, Philosophical Transactions of the [29] V. Charisi, C. C. Liem, E. Gomez, Novelty-based Royal Society B 375 (2020) 20190502. cognitive processes in unstructured music-making [14] M. Pelz, C. Kidd, The elaboration of exploratory settings in early childhood, in: 2018 Joint IEEE play, Philosophical Transactions of the Royal Soci- 8th International Conference on Development and ety B 375 (2020) 20190503. Learning and Epigenetic Robotics (ICDL-EpiRob), [15] J. M. Unterrainer, A. M. Owen, Planning and IEEE, 2018, pp. 218–223. problem solving: from neuropsychology to func- [30] R. S. Siegler, icrogenetic analyses of learning, in: tional neuroimaging, Journal of Physiology-Paris Handbook of child psychology: Cognition, percep- 99 (2006) 308–317. tion, and language, John Wiley & Sons Inc, 2006, p. [16] K. Ndousse, D. Eck, S. Levine, N. Jaques, Learning 464–510. social learning, Internation Conference on Machine [31] H. A. Simon, The functional equivalence of prob- lem solving skills, Cognitive psychology 7 (1975) 268–288. [32] N. A. Zook, D. B. Davalos, E. L. DeLosh, H. P. Davis, Working memory, inhibition, and fluid intelligence as predictors of performance on tower of hanoi and london tasks, Brain and cognition 56 (2004) 286–292. [33] V. Charisi, E. Gomez, G. Mier, L. Merino, R. 
Gomez, Child-robot collaborative problem-solving and the importance of child’s voluntary interaction: a de- velopmental perspective, Frontiers in Robotics and AI 7 (2020) 15. [34] A. Bennetot, V. Charisi, N. Díaz-Rodríguez, Should artificial agents ask for help in human-robot collab- orative problem-solving?, Brain-PIL Workshop at ICRA, Paris/Remote (2020). [35] A. Holzinger, M. Dehmer, F. Emmert-Streib, R. Cuc- chiara, I. Augenstein, J. Del Ser, W. Samek, I. Ju- risica, N. Díaz-Rodríguez, Information fusion as an integrative cross-cutting enabler to achieve ro- bust, explainable, and trustworthy medical artificial intelligence, Information Fusion 79 (2022) 263–278. [36] A. Heuillet, F. Couthouis, N. Díaz-Rodríguez, Explainability in deep reinforcement learning, Knowledge-Based Systems 214 (2021) 106685. [37] N. Jaques, A. Lazaridou, E. Hughes, C. Gulcehre, P. Ortega, D. Strouse, J. Z. Leibo, N. De Freitas, Social influence as intrinsic motivation for multi- agent deep reinforcement learning, in: Interna- tional Conference on Machine Learning, PMLR, 2019, pp. 3040–3049. [38] J. Perolat, J. Z. Leibo, V. Zambaldi, C. Beattie, K. Tuyls, T. Graepel, A multi-agent reinforcement learning model of common-pool resource appropri- ation, Advances in Neural Information Processing Systems 30 (2017). [39] A. Heuillet, F. Couthouis, N. Díaz-Rodríguez, Col- lective eXplainable AI: Explaining Cooperative Strategies and Agent Contribution in Multiagent Reinforcement Learning With Shapley Values, IEEE Computational Intelligence Magazine 17 (2022) 59–71. [40] D. Pathak, P. Agrawal, A. A. Efros, T. Darrell, Curiosity-driven exploration by self-supervised pre- diction, in: International conference on machine learning, PMLR, 2017, pp. 2778–2787. [41] G. Kovač, R. Portelas, K. Hofmann, P.-Y. Oudeyer, SocialAI: Benchmarking socio-cognitive abilities in deep reinforcement learning agents, arXiv preprint arXiv:2107.00956 (2021).
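
As a complement to the description of Study 4 (Section 4.4), the two help mechanisms can be sketched in a few lines of Python: the on-demand rule, in which the expert plays whenever the learning agent's best value estimate falls below the ask-for-help threshold, and the canonical turn-taking schedule, in which the expert plays every 2, 3 or 4 turns. This is only an illustrative sketch, not the implementation used in [34]. The function names, the dictionary of action values, the 1-based turn counter and the choice of which turns go to the expert are assumptions introduced here, and action values are assumed to be scaled to the same 0 to 1 range as the threshold.

```python
from typing import Dict, Hashable


def should_ask_for_help(action_values: Dict[Hashable, float],
                        ask_threshold: float) -> bool:
    """On-demand help (scenario 2, hypothetical helper).

    Returns True when the learner's best value estimate for the current
    state is lower than the ask-for-help threshold, in which case the
    expert would play instead of the learning agent.
    """
    if not action_values:
        # Assumption: with no estimate at all, the learner asks for help.
        return True
    return max(action_values.values()) < ask_threshold


def expert_plays_this_turn(turn: int, help_every: int) -> bool:
    """Canonical help (scenario 1, hypothetical helper).

    Assumes turns are counted from 1 and that the expert takes every
    `help_every`-th turn (e.g. turns 4, 8, ... for help_every=4).
    """
    if help_every < 1:
        raise ValueError("help_every must be a positive integer")
    return turn % help_every == 0


if __name__ == "__main__":
    # Best estimate 0.4 is below the threshold 0.5, so the learner asks.
    print(should_ask_for_help({"move_a": 0.2, "move_b": 0.4}, 0.5))
    # Turns on which the expert plays when helping every 4 turns.
    print([t for t in range(1, 9) if expert_plays_this_turn(t, 4)])
```

Under these assumptions, a threshold of 0 means the learner never asks and a threshold above the largest possible value means the expert always plays, which matches the role of the parameter as a dial between independent and fully assisted play.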