On Children’s Exploration, Aha! Moments and Explanations
in Model Building for Self-Regulated Problem-Solving
Vicky Charisi∗1 , Natalia Díaz-Rodríguez∗2 , Barbara Mawhin3 and Luis Merino4
1 Joint Research Centre, European Commission, Seville, Spain.
2 DaSCI Andalusian Institute in Data Science and Computational Intelligence, University of Granada, Spain
3 Human Factors Department, EBT-Salient Aero Foundation, Spain
4 Service Robotics Laboratory, University Pablo de Olavide, Seville, Spain


Abstract
In certain problem-solving tasks that require Human-AI interactions, a mutual understanding of the reasoning behind the performed actions can benefit both humans and artificial agents. However, identifying and predicting the cognitive strategies involved in such a hybrid setting, especially in novel, self-regulated exploratory tasks, is a challenging endeavour. Our aim is to identify behavioural properties relevant to young children’s cognitive strategies that are present in problem-solving, with an emphasis on the Aha! moment as an intermediate step between exploratory actions, which typically relate to the development of tacit knowledge, and the generation of explanations, which requires explicit knowledge. We use data from existing, previously published, behavioural studies with children 5 to 7 years old to explore these mechanisms in two self-regulated problem-solving tasks. In addition, we reflect on our observations of an Artificial Agent (Q-learning algorithm) that learns to solve the same task. Our findings indicate that while in current reinforcement learning practice, detecting the moment of the cognitive transformation of the problem representation normally translates into observing convergence curves of the objective functions being optimized, in young children this involves more complex behavioural properties, such as verbal metacognition. These behavioural processes can be used as a proxy for the identification of the Aha! moment. Finally, we propose a conceptual map which integrates the observed behaviours that are used to detect, communicate and corroborate learning both in humans and machines and we discuss the association of children’s exploratory behaviours, the Aha! moments and ultimately their explanation generation.

Keywords
Explainability, Child development, Human intelligence, Problem-solving, Behavioural indicators, Explainable AI


EBeM’22: IJCAI-ECAI Workshop on AI Evaluation Beyond Metrics, July 25, 2022, Vienna, Austria.
(∗) Equal contribution.
Email: vasiliki.charisi@ec.europa.eu (V. Charisi∗)
ORCID: 0000-0001-7677-027X (V. Charisi∗); 0000-0003-3362-9326 (N. Díaz-Rodríguez∗); 0000-0003-2857-4922 (B. Mawhin); 0000-0003-4927-8647 (L. Merino)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

1. Introduction

For effective hybrid environments where humans collaborate with Artificial Intelligence (AI) systems to make a decision, a mutual understanding of the reasoning behind certain actions or recommendations can be of catalytic importance.

Explainability is one of the features that supports mutual understanding and trust development [1], and can be considered as an interface through which machine learning models can be explained towards a customized and diverse set of audiences [2], debugged, and audited. For the generation of explanations, though, implicit knowledge should become explicit, which often includes the cognitive process known as the Aha! moment or insight. We adopt the definition of the Aha! moment in problem solving as a sudden transformation of the problem representation [3, 4]; this differs from the solution retrieval reached by an analytical, multistep strategy, through which the solver searches long-term memory for potential algorithms, mental schemas, analogies or factual knowledge.

In this paper we seek to clarify what behavioural manifestations indicate the occurrence of the Aha! moment in children performing certain problem-solving tasks and instantiate a conceptual map of strategies which are used to detect, communicate and corroborate learning both in humans and machines. The ultimate goal is to provide a richer test-bed of procedural protocols and tests to more broadly assess learning in machines, beyond a single metric or loss optimization.

1.1. Inspiration by children’s problem-solving
Reverse engineering human intelligence can usefully inform AI and machine learning. The exploration of fundamental cognitive processes that can be informative for AI approaches often requires focusing on infants or young children in the context of structured or unstructured activities [5, 6, 7]. Self-regulated play, for example, which allows children to perform exploratory actions and come up with insights and discoveries in problems they generated, has previously been correlated with the development of their implicit knowledge and their gradual understanding of the surrounding physical world [8]. However, what cognitive processes are mobilized for the transformation of tacit into explicit knowledge in young children? And what behavioural properties can be used as a proxy for the identification of those processes?

Based on a series of behavioural studies with children 5 to 7 years of age, we identify behavioural properties relevant to cognitive processes that are present in problem-solving tasks, with an emphasis on identifying the Aha! moments, as an intermediate step between exploratory actions and the generation of explanations (see Fig. 1), aiming to inform current and future approaches on explainable AI (XAI).

Figure 1: Behavioural properties as a proxy for the identification of the Aha! moment in children’s problem solving process. The proposed conceptual map includes behaviours for the evaluation of the transition from tacit knowledge (appearing in the phase of exploration) toward explicit knowledge (appearing in the phase of exploitation). The properties include non-verbal behaviours and verbal metacognitive manifestations (reasoning, planning and reflection). The Aha! moment appears as part of the transition from tacit to explicit knowledge and functions as an indicator for the generation of explanations.

2. Relevant Work

2.1. Problem-Solving in Young Children
To understand the fundamentals of problem-solving as a cognitive process, developmental psychologists have extensively explored the involved faculties and the ways they interact with each other. To this end, classic and contemporary work has examined various tasks that were used depending on the child’s age and areas of interest. Bruner, for example, laid out a plan for the development of skilled action [9]. First there is intention, then an assembling of “constituent acts”. They initially occur out of order but later become properly sequenced to reach the goal. Bruner emphasised the role of exploratory behaviour and play prior to achieving skilled action. Flexibility and higher order acts become possible through reorganization of component acts and modularization. Although Bruner’s examples came from infants in the first year of life, his ideas have been applied to the acquisition of more complex skills beyond infancy. Additionally, he argued that play is the best way to promote development as it can occur with any physical material or with imagination, alone or with others and can take place in various settings [10]. The connection of play with the development of fundamental cognitive processes and human learning has been well-established [11, 12, 13]. Self-directed and intrinsically motivated goal generation and problem-solving are among children’s cognitive tools that affect their overall development [7, 14]. In free play, children set novel goals, discover unexpected information, and invent problems they would not otherwise encounter. In this context, children apply exploratory processes that allow them to progressively reduce uncertainty about their environment [14].

In this context, a problem is defined as a situation in which a solver needs to change a given state to a desired one but there are obstacles. There are different types of problems, such as the routine problem vs. the non-routine problem. The first one refers to a situation in which the solver knows a solution method, whereas the second is when the solver has to create a solution method. There is also the well-defined problem, where the state, goal and set of operators are clearly defined. It is opposed to the ill-defined problem, where these elements are not clearly defined.

The problem solving process occurs when a person has to invent a way to solve a problem, following two main stages: the problem representation and the problem solution. The solvers need to comprehend the problem and create a
model of the problem situation. Then, they have to build a solution through planning and execution, and monitor it using awareness and control. This involves both cognitive and metacognitive processes. Problem solving is always domain-specific, but thinking by analogy seems to be almost always successful. Thinking of a related problem that is already known and, even better, already solved helps the solver succeed. An application of this is heuristics, which allow a solver to reach an acceptable solution faster, even if it is not perfectly accurate. Considering the bounded rationality of humans, heuristics allow us to make judgements and choices and to adapt our behaviours efficiently. This is closely related to the concepts of “social learning” and “adaptation” in human development.

In order to solve a problem, two mental representations are needed: one of the current state and one of the goal state. As problem solving is goal-oriented and contextualized, a plan detailing the solution step by step is required. A constant monitoring process is also required, as each move has consequences that can bring the solver closer to or further from the desired goal state. It also requires mental flexibility and thus, inhibitory control [15]. If a first chosen solution seems to be inappropriate, the solver has to adapt the strategy.

2.2. Social learning
Social learning is a crucial component of human intelligence, allowing us to rapidly adapt to new scenarios, learn new tasks, and communicate knowledge that can be built on by others [16]. The work of Lev Vygotsky, who put forward this view as early as the 1920s, takes into account factors such as language development and cultural influences in the cognitive development of children [17]. From his perspective, mental functioning and development rely on an interdependence between individual and social processes. When learners, whatever their age, participate in joint activities, they gain new abilities and strategies to better understand the world and adapt to it. This process is also mediated by signs and tools such as language and mnemonic techniques. Vygotsky groups them under the category of semiotic means. They are considered a cornerstone for knowledge co-construction and can help independent problem-solving activity. This leads to the difference between what a learner can do with and without help, which he described under the concept of the Zone of Proximal Development (ZPD). Social interaction with the use of linguistic and cultural tools facilitates the internalization of knowledge and its transformation into cognitive tools supporting the development of new cognitive functions. The latter aspect has been considered for the design of artificial agents that are able to interact with others and internalize these interactions in a similar way as humans [18, 19].

Scaffolding is described as a process that enables a child or novice to solve a task or achieve a goal that would be beyond their unassisted efforts [20]. To achieve more complex tasks (like problem-solving), it is necessary to combine simpler skills in order to achieve a higher level of competence. This promotes cognitive growth. The shared space of an activity involving collaboration mechanisms between peers is also of great importance, whether the peer is a human or an artificial agent [21, 22].

2.3. Insight in Problem-Solving
Insight is most commonly called the “Aha!” experience, describing the moment when a person gets the solution to a problem that up to this point had left her puzzled. In cognitive science it is referred to as insight problem solving and it is accompanied by a feeling of satisfaction for the solver. It has been related to creative thinking [23, 24] and includes an exploratory phase where divergent thinking takes place, especially during the early stages of the problem solving process. This allows the person to produce new ideas or connect existing ideas. The second phase is the convergent thinking phase, where a solution should be selected by synthesizing, analyzing and monitoring the degree to which the current result matches the expected one. Although the experience of insight is sudden and can seem disconnected from the immediately preceding thought, recent research shows that insight is the culmination of a series of brain states and processes operating at different time scales. Elucidation of these precursors suggests interventional opportunities for the facilitation of insight [3], including concurrent verbalization [25]. As in all problem solving, most of these strategies rely on a constant restructuring of the mental representation of the problem. One way in which explicit knowledge manifests itself is through the formation of causal inferences and the generation of explanations that, in research with children, are used for detecting gaps in their causal knowledge.

2.4. Explanation generation
Regarding explanation generation, there is a large body of work in various fields, but it always implies an explainer and an explainee with their own respective characteristics. Of particular interest across the fields is the role of the Theory of Mind, i.e., the ability of a person to attribute mental states to the consequent behaviours of herself or others [26]. The selection and evaluation processes of explanations depend on the explainer and explainee, but also on the characteristics of the context. The nature of an interaction for explaining is different in kindergarten between the teacher and a young child than between the cockpit desk and the pilots during a
flight. The role of beliefs has also been raised recently as a cornerstone. An explanation does not necessarily need to be consistent with a person’s beliefs but should help promote a revision component [27], thus allowing the evolution of the internal representations. Human explanations from social sciences became an integrated part of Artificial Intelligence (AI) through the XAI field in order to provide explanatory agents and to facilitate interactions between humans and machines.


3. Methodological approach

We aimed to identify behavioural properties in young children’s problem-solving process that are located at the transition from tacit knowledge to the development of explicit knowledge and the generation of explanations. We used two types of problem-solving activities, an open-ended task (computer-supported music-making) and the cognitive task of the Tower of Hanoi (ToH). We considered the above-mentioned theories and we conducted three behavioural studies to explore children’s processes in various settings. In addition, we observed an artificial agent (Q-learning algorithm) learning to solve the same ToH problem. For the purpose of this paper, we take a case-study methodological approach. Case studies are in-depth investigations of a single person, group, event or community which are approached from a qualitative perspective [28]. All the included case studies meet the following criteria: (i) the sample consists of children aged 5 to 7 years old, and (ii) the setting facilitates children’s self-regulated activities. In order for us to ensure the necessary variability among the case studies, we included (i) open-ended vs. non-open-ended tasks and (ii) different types of social contexts (collaboration with two children, collaboration of one child with a robot, hybrid collaboration with two children and a robot, see Figure 5).

It should be noted that any comparison among the studies was outside the scope of this paper; rather, our goal is to make a synthesis of the results as they appeared in different settings. For this reason, we only provide the necessary overall findings for each study and we adopt a qualitative reflective approach for one representative case study per experiment. The selection of the case studies was based on their relevance to the purpose of this work and on their representativeness of children’s average behaviour in specific settings.

4. Empirical Studies: A Selection of Use Cases

This section presents a line of empirical evidence that has contributed to our identification of behavioural indicators that facilitated the transition from tacit to explicit knowledge. For each study we first describe the original goal of the study, the analysis of the data that are relevant to the current work and the corresponding findings, and we reflect on how this contributes to the purposes of this work.

4.1. Study 1: Identification of behaviours
The scope of the study was to identify behaviours that emerge spontaneously when children are involved in an open-ended problem solving activity and to observe their development over time. As such, the setting of the study was based on ethnographic methodological principles and there was no adult intervention during the activities. Open-ended tasks without adult intervention provide the space for children to pursue their goals in a self-regulated and intrinsically motivated manner. We designed a naturalistic behavioural study in a school setting with 𝑁 = 16 young children (5-6 years old) who were invited to compose music in pairs with the use of two dedicated screen-based software packages on a weekly basis over a period of maximum 8 weeks (Fig. 2). The children were only asked to create music with the sounds provided by the digital tool. No other intervention was performed by the experimenter. The observations included 1795.51 min of video data which were transcribed based on an annotation scheme with a taxonomy of behaviours in relation to children’s cognitive process, social interactions and affective engagement. For the purposes of this paper we only focus on the first category. A detailed description of the study appears in [29].

Figure 2: Self-regulated music-making setting: a. The Reactable, a table-top interface for sound synthesis and the touchscreen version of it with two participants; b. The Sibelius Groovy, a music-making software for children and the setting with two children.

4.1.1. Data analysis
For the elaboration of the data we used the approach of microgenetic analysis [30]. The microgenetic method is defined by three properties: (a) observations span a period of rapid change in competency; (b) the density of
observations is high relative to the rate of change; and (c) the observations are subjected to an intensive, trial-by-trial analysis to infer the processes that give rise to change. The annotation of the data was based on children’s verbal and non-verbal behaviours and the corpus included 7063 annotated behaviours. The taxonomy of the behaviours that related to children’s cognitive processes and the percentage of their occurrence appear in Table 1.

    Code    Behaviour                Occurrence (%)
    C1      Spontaneous musicking    11.05
    C2      Sound exploration        15.87
    C3      Assessment               27.38
    C4      Reasoning                18.6
    C5      Deliberate musicking     13.06
    C6      Planning                 14.04

Table 1
The taxonomy of behaviours that emerged during the open-ended self-regulated task of children’s collaborative music-making and the percentage of occurrence per behaviour.

The results indicate that despite the fact that the participant children were of a relatively young age, which is typically related to exploratory actions, the behaviours of deliberate musicking (C5) and planning (C6) appeared slightly more than the exploratory behaviours of spontaneous musicking (C1) and sound exploration (C2).

Furthermore, a grouping of the behaviours that correspond to reflective actions (C3 and C4) and the ones that correspond to active music-making (C1, C2, C5 and C6) reveals that the “reflecting” behaviours accounted for 46.18% of the total cognitive behaviours, while the active music-making behaviours accounted for 53.82% (see Fig. 3). These results indicate that despite the young age of the participants, reflecting and reasoning about the musical choices appear as an integral part of children’s cognitive engagement with music-making.

Figure 3: Average percentage of children’s behaviours in Study 1, Making (C1, C2, C5 and C6) and Reflecting (C3 and C4).

4.1.2. Reflection
Computer-supported music composition was selected as an open-ended task which does not include a predefined objective final “solution”; rather, it involves decision-making based on subjective criteria and self-regulated goal identification and provides the context for the emergence of a variety of processes and interactions. We identify two major findings relevant to the scope of this paper. First, despite the unstructured and highly exploratory nature of this task, we observed that children exhibited behaviours that correspond to “making” and to “reflecting”. Spontaneous and exploratory actions were mixed with deliberate actions and planning, while the latter were supported by assessment and reasoning. Second, the collaborative setting of this study facilitated children’s verbal interactions and negotiations during their decision-making process and consequently their reasoning and reflection on their actions. These processes correspond to the mobilization of their verbal metacognition, part of which was the generation of explanations during the negotiation of their task-related decisions. This means that given the opportunity (in this case a collaborative setting), children as young as 5 years old actively engage in self-initiated reflection on their actions and imagine the future outcomes while being able to explain their reasoning to the collaborator. However, we observed that they often lacked the verbal abilities and the terminology for accurate explanations. For this reason, they mobilised other available modalities, such as gestures, and used the affordances of the graphical user interface of the tool provided to complement their explanatory behaviours.

4.2. Study 2: An indication for the Aha! moment
The goal of this experiment was to test the impact of the type of a robot intervention on children’s problem-solving process. We used the cognitive task of the Tower of Hanoi (ToH) [31], which is used to measure children’s planning abilities and inhibitory control. To reach the optimal solution, participants have to inhibit impulsive moves that superficially bring the child closer to the goal, but are unhelpful for the longer-term solution [32]. We designed an experiment with three phases: a baseline (single child), an intervention (manipulation of the robot’s behaviour) and an evaluation (child’s voluntary interaction with the robot) for 𝑁 = 20 children 5 to 7 years old. For the intervention phase, we had two conditions; in Condition 1, the robot and the child solved the task in a turn-taking setting and in Condition 2 we designed a child-initiated voluntary interaction with the robot. In this paper, we focus on a single child’s
                                                               problem-solving process to explore behavioural proper-
Figure 4: A child’s performance of the Tower of Hanoi task over time (in seconds) (x-axis) with the duration of each move (in
seconds), in addition to a moving average of the last three movements (y-axis). We observe that throughout the task the child
exhibited a mixture of optimal (blue lines) / suboptimal (pink lines) and slow (lower) / fast (higher) moves.

ties relevant for our understanding of the transition from     instability, the incremental optimization and the perfor-
exploratory actions to the transformation of the problem representation, which requires the involvement of inhibitory control and the stabilization of the optimal performance. The details of the study appear in [33].

4.2.1. Data analysis

We evaluated the task performance in relation to the trajectory of optimal and suboptimal movements over the course of the task. The optimal movements are defined as the ones that lead to the solution of the task with the minimum number of movements. In addition, we measured the speed of the movements relative to the baseline of each participant. Given the assumption that during the task the children sustained the necessary attention, we identify point A in Fig. 4 as the point that separates the phase of mostly suboptimal moves (red peaks) from the phase of mostly optimal movements (blue peaks), which are also carried out faster.

4.2.2. Reflection

We observed exploratory behaviours that typically were characterised by an increased number of suboptimal moves. We identify as an Aha! moment the point when a transformation of the mental representation of the problem occurs which, in this task, is behaviourally manifested by the mobilization of inhibition as a strategy for the optimal solution of the task, meaning that the child inhibits the impulsive move and performs the less obvious one that will lead to the stabilization of the optimal solution of the task (see point A in Fig. 4). This is a cognitive strategy that in the age-group of the present studies does not appear intuitively. As shown in Figure 1, behavioural properties that appear in the problem-solving process in the context of the given tasks include the performance stabilization. During the exploratory phase, the children were reinforced by the results of their actions, which eventually guided them to the restructuring of the problem representation and consequently to the use of the strategy which is based on inhibitory control. After the Aha! moment, we observe a stabilisation of the optimal moves, which indicates learning. One of the limitations of this study was the fact that it was not designed in a way to facilitate the child’s verbalisation of their thoughts, reasoning and reflections. For this reason, we were not able to make any inferences regarding the children’s reasoning, their verbal metacognition and the generation of possible explanations during the problem-solving process.

4.3. Study 3: Social Interaction and Explanations

The purpose of study 3 was to explore the role of a social robot in children’s collective problem-solving and the child-child social dynamics in a setting of two children and one robot (see Fig. 5). We built upon study 2 and we used the same task (ToH) and the same robot. We designed a controlled 2×2 experimental study with 𝑁 = 86 children who all participated in a baseline session (without robot), an intervention (with the manipulation of robot behaviour, in terms of its cognitive reliability and expressivity) and an evaluation session (with child-initiated form of interaction) to solve the Tower of Hanoi task with an incremental difficulty level in the different experimental configurations without any expert’s intervention. For the purposes of this paper, we focus on the findings on the patterns of children’s social interactions and verbal negotiations and explanations during the collective task performance. The detailed research design, analysis and findings of the study appear in [22].
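As a hedged illustration of the Study 2 analysis (Section 4.2.1), the sketch below shows one possible way to locate a point such as A in a sequence of moves already labelled as optimal or suboptimal. It is not the procedure used in [33]: the function name `find_point_a`, the boolean encoding of moves and the scoring rule (count the suboptimal moves before the split plus the optimal moves after it) are assumptions made only for illustration.

```python
from typing import Sequence


def find_point_a(is_optimal: Sequence[bool]) -> int:
    """Return the index k that best splits the moves into a mostly
    suboptimal prefix (moves[:k]) and a mostly optimal suffix (moves[k:]).

    Hypothetical scoring rule (an assumption, not taken from the paper):
    score(k) = suboptimal moves before k + optimal moves from k onwards.
    Ties are resolved in favour of the earliest split.
    """
    total_optimal = sum(is_optimal)
    best_k, best_score = 0, total_optimal  # k = 0: every move is in the suffix
    suboptimal_before = 0
    optimal_before = 0
    for k in range(1, len(is_optimal) + 1):
        if is_optimal[k - 1]:
            optimal_before += 1
        else:
            suboptimal_before += 1
        score = suboptimal_before + (total_optimal - optimal_before)
        if score > best_score:
            best_k, best_score = k, score
    return best_k


# Illustrative (made-up) session: False = suboptimal move, True = optimal move.
moves = [False, False, True, False, False, True, True, True, False, True, True]
point_a = find_point_a(moves)
```

In this made-up session the split falls at index 5, the first move of the mostly optimal phase. A fuller analysis would presumably also use the movement speed, which this sketch ignores.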
Figure 5: Setting of Study 3: Two children collectively solve the Tower of Hanoi task in a turn-taking or child-initiated voluntary interaction with the robot.

Figure 6: Asking for help scenario with different ask for help values: the LA2 tries to solve the game alone while being able to ask for help whenever its best action is not good enough (plot not on logarithmic scale as the agent asks for help at most 7 times)

4.3.1. Data analysis

We observed that the setting of the study facilitated child-child social interaction, and that verbal reflection, reasoning and planning appeared to be an integral part of the process, which was lacking from study 2. To measure the team disparity, we define social interaction, 𝑆, as the number of task-related interactions between children:

$$S = \frac{S_1 + S_2}{L}$$

where 𝑆𝑛 with 𝑛 = 1, 2 refers to the number of times child 𝑛 addresses their peer with a task-related verbal or non-verbal (i.e. pointing and gestures) behaviour and 𝐿 refers to the number of movements needed by the team to solve the task. Our analysis showed that children had a higher 𝑆 rate during the sessions with the robot, namely the Intervention (𝑀 = 0.16, 𝑆𝐷 = 0.14) and the Evaluation (𝑀 = 0.13, 𝑆𝐷 = 0.092), which differed significantly from the Baseline session (𝑀 = 0.06, 𝑆𝐷 = 0.09) with 𝑝 = 0.08 and 𝑝 = 0.015. Among the verbal manifestations we identified the utterances related to planning as one of the strategies children used to negotiate for the next movement on the ToH task. We identified the balance between children in the planning of the movements, and defined a planning disparity metric as the absolute difference in the number of interactions initiated by each child of the team. Our analysis showed that there was a significant difference in task performance (𝑈 = 297, 𝑝 < 0.001), with teams with a balanced planning performing better (𝑁 = 19, 𝑀 = 0.51, 𝑆𝐷 = 0.40) compared to groups with an unbalanced planning behaviour (𝑁 = 18, 𝑀 = 1.61, 𝑆𝐷 = 0.98). In this case, planning was used as part of the explanation formation, which was observed to be one of the strategies for children’s negotiations in problem-solving.

4.3.2. Reflection

Children’s social verbal and non-verbal interaction during the problem-solving process in Study 3 appeared catalytic for the facilitation of their task-related planning as part of explanation generation. This was more evident in the sessions with the robot. One possible explanation for this is the fact that one of the conditions involved a robot that suggested suboptimal movements. In that case, the children engaged in child-child negotiations and explanation generation to collectively take a decision for the next move. Our observations indicate that two cognitive strategies were involved in children’s explanations: planning as a part of an a priori explanation of their reasoning for a certain decision, and reflecting as a part of an a posteriori explanation. We have yet to analyse the association of the strategy of planning in the context of explanatory behaviours and its relation to a preceding Aha! moment. It should be noted that additional non-verbal manifestations, such as pointing and gestures, were mobilised in the cases that a child did not have the verbal maturity to formulate the planning or the explanation.

4.4. Study 4: Multi-agent setting

This study in [34] consists of the same non-open-ended task (ToH) and collaborative setting: one learning agent (LA) and one helping agent, with focus on the voluntary interaction among artificial agents. In order to explore if algorithms benefit from asking for help in collaborative problem-solving, as children do, two hypotheses are tested:

H1: Canonical interventions from an expert speed up learning.

H2: Getting help on demand from an expert accelerates finding the optimal solution compared to not on demand.

The expert intervention occurs in 2 different scenarios: 1) LA1 solves the task in collaboration with the helping agent in a “turn-taking” scenario, which results in a canonical cognitive intervention from the expert. 2)
LA2 solves the task independently, having the option to ask for help of the expert whenever (if) this is needed, resulting in an on demand intervention. Two parameters are assessed: 1) Canonical intervention (help) rate (every 2, 3 or 4 turns), and 2) Ask-for-help threshold (from 0 to 1). The last parameter was created to simulate what happens when a child asks for help: if the best policy value is lower than the ask for help parameter, the expert will play instead of the LA.

4.4.1. Data analysis

From the reinforcement learning (RL) plots, which show the mean number of moves needed to solve the task as training episodes evolve, some interpretations are extracted:

From scenario 1 it is observed that the LA is more efficient when it is helped by the expert in a turn taking scenario, and that it is even more effective when helped every 4 turns rather than every 2. The agent thus showcases the importance of exploring on its own, rather than always having the optimal solution.

Even when all approaches converge in both scenarios, in scenario 2, the agent that asks for help also becomes faster and more effective: help is most useful at the beginning of learning. After asking for help many times during the first episodes it starts solving the task by itself, resulting in an increase of inefficient moves. The agent seems to gain confidence in movements which, while imperfect, allow the task resolution by exploring different states.

Compared to the LA not being helped, the asking-for-help agent is a lot more efficient, but there is not much variation among the canonical and the help-asking configurations. This is probably due to the rather simple simulation of the trigger for the request of help. Simulating the child’s behaviour is a complex task and more emphasis needs to be placed on accurately describing it, including how to simulate the “asking for help” function. Adding mechanisms such as intrinsic motivation, about the LA’s desire to solve the game on its own, could make the comparison more accurate. The agent asks for help when it considers that a movement is not good enough to be played, whereas the actual mechanisms that drive the child to ask for help are more complex.

We identify emergent issues that should be part of the explanation for the design of a LA, and that are neither part of the modelled problem nor accessible to the agent as a representation and thus, for its explanation:

    1. The lack of a multimodal input space [35] for the agent to perform in the action-interaction loop of RL can restrain the agent from exhibiting the correct behaviour and communicating as humans would expect.
    2. The lack of a natural language interactive communication interface impedes questioning the system about its confidence in real time, or uncertainty estimates.
    3. The dependence on a happy ending (solving the task) constrains explanations during the learning phase. How can an agent communicate to its developer or user its struggles in solving the task when it has not yet achieved a satisfactory performance? Could common AI practices (data augmentation, fine-tuning) be accessible to the agent for it to communicate and be able to choose to change them?
    4. Most deep RL models rely on baselines of other agents to assess their worth, lacking comparisons in multi-agent settings [36] with children learning.
    5. The inability of current RL algorithms to communicate the continual progress beyond reporting a sole reward value obtained at the end of a convergence curve makes it challenging for LAs to explain their skills to solve the credit assignment problem, their difficulties or agility to complete sub-tasks, their acting self-confidence, or learned savvy behaviours.
    6. The lack of alignment of explanations of LAs with meta-learning and trustworthy AI dimensions (such as the trust calibration meta-information taxonomy [1]) should be accounted for in the explanation generation process, in the same way as mechanisms to ensure the reproducibility of insights-built explanations.

4.4.2. Reflection

This is the only study not involving children, but it follows the same protocol as the ToH studies. While RL model developers communicate a model learning convergence due to reaching plateaus in learning curves, these changes should as well reflect key changes in children’s behaviour. However, we showed that this mapping is not always evident. Providing the learning agent with signals such as the Aha! moment to adjust its self-learning/hyperparameter changes could be paramount to avoid blind manual engineering processes (e.g., reward function crafting) where no common procedures exist. We believe these are the explainable dimensions that XAI for RL should work on (identifying Aha! moments, categorising problems, difficulty, environments, collaboration/competition dimensions, etc).

The difficulties in explaining the learning process of a single LA could be reduced by involving interaction with other agents. One could attend to social interaction [16] and social influence as intrinsic motivation [37] as learning metrics. Both have been shown to enhance learning in multi-agent settings.
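The ask-for-help rule of Study 4 (Section 4.4) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of [34]: we assume tabular Q-learning on a 3-disk ToH, a reward of 1 only on reaching the goal (so that Q-values stay in [0, 1] and are comparable with a threshold between 0 and 1), and we read "best policy value" as the highest Q-value in the current state. The expert is a hypothetical shortest-path oracle, and all names (`train`, `ask_threshold`, `expert_move`) and hyperparameter values are ours.

```python
import random
from collections import deque
from itertools import permutations

N_DISKS = 3
START = (0,) * N_DISKS  # state[d] = peg holding disk d (disk 0 is the smallest)
GOAL = (2,) * N_DISKS


def top(state, peg):
    """Smallest disk on a peg, or None if the peg is empty."""
    disks = [d for d in range(N_DISKS) if state[d] == peg]
    return min(disks) if disks else None


def legal_moves(state):
    """All (source peg, destination peg) pairs allowed by the ToH rules."""
    moves = []
    for src, dst in permutations(range(3), 2):
        s, t = top(state, src), top(state, dst)
        if s is not None and (t is None or s < t):
            moves.append((src, dst))
    return moves


def step(state, move):
    """Apply a legal move and return the resulting state."""
    src, dst = move
    nxt = list(state)
    nxt[top(state, src)] = dst
    return tuple(nxt)


def distances_to_goal():
    """Breadth-first search from the goal (ToH moves are reversible)."""
    dist, queue = {GOAL: 0}, deque([GOAL])
    while queue:
        s = queue.popleft()
        for m in legal_moves(s):
            n = step(s, m)
            if n not in dist:
                dist[n] = dist[s] + 1
                queue.append(n)
    return dist


DIST = distances_to_goal()


def expert_move(state):
    """Hypothetical expert: always plays a move on a shortest path."""
    return min(legal_moves(state), key=lambda m: DIST[step(state, m)])


def train(ask_threshold, episodes=200, alpha=0.5, gamma=0.9,
          epsilon=0.1, seed=0, max_moves=500):
    """Q-learning in which the expert plays whenever the best Q-value of the
    current state is below ask_threshold (assumed reading of the rule)."""
    rng = random.Random(seed)
    q = {}
    moves_per_episode, help_requests = [], 0
    for _ in range(episodes):
        state, n_moves = START, 0
        while state != GOAL and n_moves < max_moves:
            actions = legal_moves(state)
            values = [q.get((state, a), 0.0) for a in actions]
            if max(values) < ask_threshold:      # ask for help
                action = expert_move(state)
                help_requests += 1
            elif rng.random() < epsilon:         # explore
                action = rng.choice(actions)
            else:                                # exploit, random tie-break
                best = max(values)
                action = rng.choice(
                    [a for a, v in zip(actions, values) if v == best])
            nxt = step(state, action)
            reward = 1.0 if nxt == GOAL else 0.0
            future = 0.0 if nxt == GOAL else max(
                q.get((nxt, a), 0.0) for a in legal_moves(nxt))
            old = q.get((state, action), 0.0)
            # The LA also learns from the moves played by the expert.
            q[(state, action)] = old + alpha * (reward + gamma * future - old)
            state, n_moves = nxt, n_moves + 1
        moves_per_episode.append(n_moves)
    return moves_per_episode, help_requests
```

With these assumptions, `train(0.0)` never asks for help, because no Q-value can be negative, while `train(1.0, episodes=5)` lets the expert play every move of the first episodes. Intermediate thresholds hand control back to the LA once its estimates exceed the threshold. The canonical (turn-taking) scenario could be sketched analogously by calling `expert_move` every fixed number of turns instead of testing the threshold.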
Explanations should reflect the needs for these incentives that agents depend on to progress. Once an agent has learned, it is not enough that the agent performs tasks in less time, and better, but also that it uses other human factors or social outcome metrics such as in [38, 39].

5. Conclusions, Limitations and Future work

We presented a set of behavioural studies and an experiment with a Q-learning algorithm in order to identify behavioural properties that relate to the Aha! moment, i.e., the moment of the restructuring of the problem representation (Fig. 1). These behavioural properties appear as part of the transition from exploratory to explanatory behaviours (tacit to explicit knowledge). They include task-related observations such as performance instability, incremental optimization and stabilization as well as verbal metacognitive manifestations (observed only in two of our studies with children) that involve reasoning, reflection and planning. These behavioural properties seem to facilitate the Aha! moment and eventually the generation of explanations by children.

In current RL practice, detecting this moment normally translates into observing convergence curves of the objective functions being optimized (normally reaching a plateau in cumulative reward or optimized loss, usually both). This is an external signal not usually leveraged by the agent. Although there are exceptions (such as the use of artificial curiosity signals for self-learning of the agent [40]), in regular AI model development practice, we must highlight the need for easier mechanisms to convey actions that demonstrate the difficulties of the agent until convergence plateaus and/or a sufficient level of an XAI metric are reached.

We summarise the main points we propose to consider in approaches evaluating XAI, as follows:

    1. The Aha! moment (or problem representation restructuring) acts as the intermediate step between non-explainable and explainable behaviours. In a deeper view, since the explanation acts as an interface between the model and a given target audience, the Aha! moment is a trigger signal for a model to start elaborating explanations. More effort should be put into specifying the meaning of the Aha! moment in various tasks that RL models are currently tackling. Defining and detecting high level policies characterising an Aha! moment (e.g. in terms of key/exploratory action sequences) can be signs we should be able to not only programmatically detect, but also communicate. In this way, we can achieve explainable and reliable models, since Aha! signs must act as an additional proxy to attain trustworthy systems.
    2. Children understand but sometimes lack the cognitive and metacognitive skills to explain. Explanations are subject to both biological and artificial systems’ understanding of properties of a given task, and in young children explanations are subject to their verbal abilities. Children often use gestures such as pointing, which means that explanations that can support human-AI interaction are subject to tools responsible for social interaction. Aiming towards human-level AI requires a broader set of key social skills for complex embodied communication in multimodal settings within constantly evolving social worlds [41].
    3. The hybrid use of strategies of “planning” and “insight”: Questioning the false dilemma of logical reasoning vs machine learning, we argue for a synergy between these two paradigms in order to obtain hybrid AI systems.
    4. Both social interaction [16] and social influence as intrinsic motivation [37] have been shown to enhance learning in multi-agent settings.

This paper is our first attempt to synthesise the results of our research on children’s problem solving in different settings and combine them with our research on XAI. However, due to space limitations, we are limited to providing overviews without in-depth analysis. We aim to tackle the latter in our future work.

We hope this work is useful beyond developmental robotics and AI, i.e., facilitating an effective and ethical deployment of RL systems, e.g. from energy building management to AI for health, where evaluating single reward functions simply does not reflect nor assess the complexity of the system nor the difficulties it has to deal with.

Future work should aim to involve more tangible evaluation metrics that both 1) optimize technical robustness more broadly, and 2) reflect a human-centered view where machine learning factors are questioned, monitored and explained in parallel ways to how children learn. Evaluation mappings across human and machine learning will allow us to better assess the trade-offs between AI assisted decision making and policies.

6. Acknowledgments

Charisi is supported by the HUMAINT project of the JRC; Díaz-Rodríguez by IJC2019-039152-I funded by MCIN/AEI/10.13039/501100011033 by “ESF Investing in your future” and Google Research Scholar Program; and Merino by Programa Operativo FEDER Andalucia 2014-2020 through the project DeepBot (PY20_00817).

References

 [1] G. J. Cancro, S. Pan, J. Foulds, Tell me something that will help me trust you: A survey of trust calibration in human-agent interaction, arXiv preprint arXiv:2205.02987 (2022).
 [2] A. B. Arrieta, N. Díaz-Rodríguez, J. Del Ser, A. Bennetot, S. Tabik, A. Barbado, S. García, S. Gil-López, D. Molina, R. Benjamins, et al., Explainable artificial intelligence (xai): Concepts, taxonomies, opportunities and challenges toward responsible AI, Information fusion 58 (2020) 82–115.
 [3] J. Kounios, M. Beeman, The aha! moment: The cognitive neuroscience of insight, Current directions in psychological science 18 (2009) 210–216.
 [4] H. Stuyck, A. Cleeremans, E. Van den Bussche, Aha! under pressure: The aha! experience is not constrained by cognitive load, Cognition 219 (2022) 104946.
 [5] L. B. Smith, L. K. Slone, A developmental approach to machine learning?, Frontiers in psychology 8 (2017) 2124.
 [6] B. M. Lake, T. D. Ullman, J. B. Tenenbaum, S. J. Gershman, Building machines that learn and think like people, Behavioral and brain sciences 40 (2017).
 [7] P.-Y. Oudeyer, L. B. Smith, How evolution may work through curiosity-driven developmental process, Topics in Cognitive Science 8 (2016) 492–502.
 [8] M. M. Andersen, J. Kiverstein, M. Miller, A. Roepstorff, Play in predictive minds: A cognitive theory of play (2021).
 [9] J. S. Bruner, Organization of early skilled action, Child development (1973) 1–11.
[10] P. Barrouillet, Theories of cognitive development: From piaget to today, Developmental Review 38 (2015) 1–12.
[11] M. H. Siegel, R. W. Magid, M. Pelz, J. B. Tenenbaum, L. E. Schulz, Children’s exploratory play tracks the discriminability of hypotheses, Nature communications 12 (2021) 1–9.
[12] J. Chu, L. E. Schulz, Play, curiosity, and cognition, Annual Review of Developmental Psychology 2 (2020) 317–343.
[13] A. Gopnik, Childhood as a solution to explore–exploit tensions, Philosophical Transactions of the Royal Society B 375 (2020) 20190502.
[14] M. Pelz, C. Kidd, The elaboration of exploratory play, Philosophical Transactions of the Royal Society B 375 (2020) 20190503.
[15] J. M. Unterrainer, A. M. Owen, Planning and problem solving: from neuropsychology to functional neuroimaging, Journal of Physiology-Paris 99 (2006) 308–317.
[16] K. Ndousse, D. Eck, S. Levine, N. Jaques, Learning social learning, International Conference on Machine Learning (ICML) (2021).
[17] L. S. Vygotsky, M. Cole, Mind in society: Development of higher psychological processes, Harvard University Press, 1978.
[18] C. Colas, T. Karch, C. Moulin-Frier, P.-Y. Oudeyer, Vygotskian autotelic artificial intelligence: Language and culture internalization for human-like AI, arXiv preprint arXiv:2206.01134 (2022).
[19] J. Lindblom, T. Ziemke, Social situatedness of natural and artificial intelligence: Vygotsky and beyond, Adaptive Behavior 11 (2003) 79–96.
[20] D. Wood, J. S. Bruner, G. Ross, The role of tutoring in problem solving., Child Psychology & Psychiatry & Allied Disciplines (1976).
[21] L. Kerawalla, D. Pearce, J. O’Connor, R. Luckin, N. Yuill, A. Harris, Setting the stage for collaborative interactions: Exploration of separate control of shared space., in: AIED, 2005, pp. 842–844.
[22] V. Charisi, L. Merino, M. Escobar, F. Caballero, R. Gomez, E. Gómez, The effects of robot cognitive reliability and social positioning on child-robot team dynamics, in: 2021 IEEE International Conference on Robotics and Automation (ICRA), IEEE, 2021, pp. 9439–9445.
[23] T. I. Lubart, C. Mouchiroud, Creativity: A source of difficulty in problem solving, The psychology of problem solving (2003) 127–148.
[24] W. Carpenter, The aha! moment: The science behind creative insights, in: Toward Super-Creativity-Improving Creativity in Humans, Machines, and Human-Machine Collaborations, IntechOpen, 2019.
[25] Y. Chu, J. N. MacGregor, Human performance on insight problem solving: A review, The Journal of Problem Solving 3 (2011) 6.
[26] T. Miller, Explanation in artificial intelligence: Insights from the social sciences, Artificial intelligence 267 (2019) 1–38.
[27] M. Shvo, T. Q. Klassen, S. A. McIlraith, Towards the role of theory of mind in explanation, in: International Workshop on Explainable, Transparent Autonomous Agents and Multi-Agent Systems, Springer, 2020, pp. 75–93.
[28] R. K. Yin, et al., Design and methods, Case study research 3 (2003).
[29] V. Charisi, C. C. Liem, E. Gomez, Novelty-based cognitive processes in unstructured music-making settings in early childhood, in: 2018 Joint IEEE 8th International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob), IEEE, 2018, pp. 218–223.
[30] R. S. Siegler, Microgenetic analyses of learning, in: Handbook of child psychology: Cognition, perception, and language, John Wiley & Sons Inc, 2006, pp. 464–510.
[31] H. A. Simon, The functional equivalence of problem solving skills, Cognitive psychology 7 (1975) 268–288.
[32] N. A. Zook, D. B. Davalos, E. L. DeLosh, H. P. Davis, Working memory, inhibition, and fluid intelligence as predictors of performance on tower of hanoi and london tasks, Brain and cognition 56 (2004) 286–292.
[33] V. Charisi, E. Gomez, G. Mier, L. Merino, R. Gomez, Child-robot collaborative problem-solving and the importance of child’s voluntary interaction: a developmental perspective, Frontiers in Robotics and AI 7 (2020) 15.
[34] A. Bennetot, V. Charisi, N. Díaz-Rodríguez, Should artificial agents ask for help in human-robot collaborative problem-solving?, Brain-PIL Workshop at ICRA, Paris/Remote (2020).
[35] A. Holzinger, M. Dehmer, F. Emmert-Streib, R. Cucchiara, I. Augenstein, J. Del Ser, W. Samek, I. Jurisica, N. Díaz-Rodríguez, Information fusion as an integrative cross-cutting enabler to achieve robust, explainable, and trustworthy medical artificial intelligence, Information Fusion 79 (2022) 263–278.
[36] A. Heuillet, F. Couthouis, N. Díaz-Rodríguez, Explainability in deep reinforcement learning, Knowledge-Based Systems 214 (2021) 106685.
[37] N. Jaques, A. Lazaridou, E. Hughes, C. Gulcehre, P. Ortega, D. Strouse, J. Z. Leibo, N. De Freitas, Social influence as intrinsic motivation for multi-agent deep reinforcement learning, in: International Conference on Machine Learning, PMLR, 2019, pp. 3040–3049.
[38] J. Perolat, J. Z. Leibo, V. Zambaldi, C. Beattie, K. Tuyls, T. Graepel, A multi-agent reinforcement learning model of common-pool resource appropriation, Advances in Neural Information Processing Systems 30 (2017).
[39] A. Heuillet, F. Couthouis, N. Díaz-Rodríguez, Collective eXplainable AI: Explaining Cooperative Strategies and Agent Contribution in Multiagent Reinforcement Learning With Shapley Values, IEEE Computational Intelligence Magazine 17 (2022) 59–71.
[40] D. Pathak, P. Agrawal, A. A. Efros, T. Darrell, Curiosity-driven exploration by self-supervised prediction, in: International conference on machine learning, PMLR, 2017, pp. 2778–2787.
[41] G. Kovač, R. Portelas, K. Hofmann, P.-Y. Oudeyer, SocialAI: Benchmarking socio-cognitive abilities in deep reinforcement learning agents, arXiv preprint arXiv:2107.00956 (2021).