=Paper= {{Paper |id=Vol-2659/steels |storemode=property |title=Personal dynamic memories are necessary to deal with meaning and understanding in human-centric AI |pdfUrl=https://ceur-ws.org/Vol-2659/steels.pdf |volume=Vol-2659 |authors=Luc Steels |dblpUrl=https://dblp.org/rec/conf/ecai/Steels20 }} ==Personal dynamic memories are necessary to deal with meaning and understanding in human-centric AI== https://ceur-ws.org/Vol-2659/steels.pdf
                             Personal Dynamic Memories
                        are Necessary to Deal with Meaning and
                         Understanding in Human-Centric AI
                                                                       Luc Steels1


Abstract. Human-centric AI requires not only a fundamental shift in the way AI systems are conceived and designed but also a reorientation in basic research in order to figure out how AI can come to grips with meaning and understanding. Meanings are made up of distinctions to categorize and conceptualize an experience at different levels, from directly observable factual meanings to expressional, social, conventional and intrinsic meanings. Meanings get organised into larger-scale narratives that conceptualize experiences from a particular perspective. Understanding is the process of constructing and then integrating these narratives into a Personal Dynamic Memory that stores narratives from past experiences. This memory plays a crucial role in constructing more narratives and thus works intimately together with inferences, mental simulations, and the analysis of experiences in terms of syntactic and semantic structures.

This paper outlines this approach to meaning and understanding by clarifying what it entails, outlining the technical challenges that must be overcome, and providing links to earlier relevant AI work as well as to new technical advances that could make Personal Dynamic Memories a reality in the near future.2

1 Catalan Institute for Advanced Studies ICREA - Institute for Evolutionary Biology (UPF-CSIC), Barcelona, Spain, email: steels@arti.vub.ac.be
2 Copyright 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1    What is human-centric AI?

“Human-centric AI focuses on collaborating with humans, enhancing human capabilities, and empowering humans to better achieve their goals.” [17]. Human-centric AI has become a focal point of current research, particularly in Europe, where it is now the stated objective of the EU strategy recently (February 2020) issued by the European Commission. This strategy calls for AI that shows human agency and oversight, technical robustness and safety, privacy and data governance, transparency, care for diversity, non-discrimination and fairness, focus on societal and environmental well-being, and accountability [36].

Achieving human-centric AI requires a number of changes in focus compared to current AI:

(i) Human-centric AI systems should be made aware of the goals and intentions of their users and base their own goals and dialog on meanings rather than on statistical patterns of past behavior only, even if statistical patterns can play a very important role, for example for drastically reducing search or carrying out approximate inference. Human goals and values should always take precedence. Respect for human autonomy should be built into the system by design, leading to qualities such as fairness and respect.

(ii) Human-centric AI requires that a system is able to explain its reasoning and learning strategies so that its decisions are understandable by humans. Only by emphasizing human understandability will human-centric AI achieve proper explainability and transparency.

(iii) Human-centric AI should not only learn by observation or theorizing about reality but also by taking advice from humans, as suggested in John McCarthy’s original 1958 proposal of the Advice Taker [13].

(iv) Human-centric AI should be able to use natural communication, i.e. communication primarily based on human language, not only by mimicking language syntax but, more importantly, by using the rich semantics of natural languages, augmented with multi-modal communication channels. This is needed to support explainability and accountability.

(v) Human-centric AI should have the capacity for self-reflection, which can be achieved by a meta-level architecture that is able to track decision-making and intervene by catching failures and repairing them. By extension, the architecture should support the construction of a theory of mind of other agents, i.e. how they see the world, what their motivations and intentions are, and what knowledge they are using or lacking. Only through this capacity can AI achieve intelligent cooperation and adequate explicability, and learn efficiently through cultural transmission.

(vi) Finally, human-centric AI should reflect the ethical and moral standards that are also expected from humans or organisations in our society, particularly for supporting tasks that are close to human activity and interest.

Today the dominant perspective on AI is not human-centric. It focuses primarily on achieving high predictive performance on predefined benchmarks, trying to exceed human performance so that humans can be replaced in the task being considered. This approach is machine-centric rather than human-centric. It emphasizes numerical (subsymbolic) techniques (from neural network research, pattern recognition, information retrieval, and data science), often ignoring valuable contributions from symbolic AI that are needed to achieve explicability and robustness.

Admittedly the machine-oriented focus has recently led to a jump in performance on chosen benchmarks, particularly in the domain of pattern recognition and computer vision, but unfortunately also to a kind of AI that is opaque, cannot explain or defend its decisions, is unable to take human advice, is not robust against adversarial attacks, has no understanding of the motivations of its users, and requires vast amounts of data and computing power. Although for a large and growing class of applications these shortcomings are not an issue, for AI applications that touch on human lives and are socially consequential, these disadvantages are highly problematic.
Different approaches to human-centric AI have been proposed recently. They are all valuable. Some researchers have advocated guidelines and design methodologies to make AI more trustworthy and responsible by emphasizing safety, privacy, data governance, transparency, diversity, fairness, and accountability [30], [7]. Others have emphasized that we need more human-centric interfaces for AI systems, including better explanation facilities and ways for humans to provide guidance during machine learning or decision-making [38].

Here I focus on the idea that human-centric AI requires above all another kind of AI, namely AI which has meaning and understanding at its core. The present paper is a position paper, trying to clarify this point of view and reflecting on the key issues and possible technical solutions. But first, what do we mean by meaning and understanding?

2    Meaning and understanding

The notion of meaning is related to how humans make sense of an experience. An experience can be a behavior or the observation of a behavior, an image or a sequence of images, sounds, soundscapes, smells and tastes, spoken or written text, and more generally cultural artefacts like scenes in a theatre play. In the real world, there is a flow of experiences that we need to interpret and cope with quickly. For example, if we are driving a car there is a quick succession of situations that we have to gauge correctly in order to act appropriately, even in unusual situations: Why is the car behind mine honking its horn? Is the woman with a baby stroller going to cross the street or has she seen me coming? Why is everybody slowing down? What does this red light on the dashboard mean?

Meanings are built from categorisations of reality, for example, colors, action types, temporal and spatial relations, etc. Categorisations are distinctions that are relevant for the interaction between humans (or agents more generally) and their environment, including other agents [25]. For example, the distinction between red and green is relevant in traffic lights because it tells you whether it is safe to start driving or cross the road. The distinction between angry and sad is relevant for knowing how to behave with respect to another person. The distinction between left and right is relevant for giving or following instructions on how to reach a location or how to find an object in a scene.

Categories are the building blocks for constructing different levels of meaning for an experience. The following levels are often discussed in the appreciation of art works [18] but are actually useful for interpreting any kind of experience [27]:

• The base level of an experience details the external formal properties directly derivable from the perceived appearance of the experience, for example, the lines, shapes, color differences in hue, value (brightness) and saturation, textures, shading, spatial positions of elements, etc. in the case of images.
• The first level of meaning is that of factual meaning. It identifies and categorises events, actors, entities and the roles they play in events, as well as the temporal, spatial and causal relations between them. In the case of images this requires a suite of sophisticated processing steps, starting from object segmentation, object location, object recognition, 3D reconstruction, tracking over time, etc.
• When there are actors involved, a second level, that of expressional meaning, becomes relevant. It identifies the intentions, goals, interests, and motivations of the actors and their psychological states or the manner in which they carry out actions.
• The next level is that of social meaning. It is about the social relations between the actors and how the activities are integrated into the local community or the society as a whole.
• The fourth level is that of conventional meaning, based on figuring out what is depicted or spoken about and the historical or cultural context, which has to be learned from conversations or cultural artefacts, like books or films.
• The fifth level is known as the intrinsic meaning or content of an experience. It is about the ultimate motive of certain images or texts, or why somebody is carrying out a certain behavior. It explains why this particular experience may have occurred.

We define a narrative as a coherent reconstruction of the different levels of meaning of an experience or a set of experiences based on one or more perspectives. It contains categorised entities at each of these levels, links between the levels, and possibly additional cross-level categorisations. The perspective, which is often the perspective of the agent itself, is unavoidable because categories are most of the time observer-dependent. For example, an object which is to my left is to the right for a person opposite me. I may categorise a gesture as aggressive whereas the person making the gesture may have performed it to defend herself. I may not know a particular historical figure and believe it is just the representation of an old man, whereas you may recognize the figure and be repulsed by the atrocities that were conducted under his command. Transforming a narrative from one perspective into a narrative for the same experience from another perspective is a critical component in handling meaning. Even to communicate properly in language we often have to look at the viewpoint of the interlocutor and categorise spatial and other relations accordingly.

Understanding is a process with three functions: (i) reconstruct the different levels of meaning by casting them into coherent narratives that explain the events underlying the experience, (ii) predict how the experience will unfold in the future and reconstruct what has happened in the past, and (iii) integrate these narratives into a Personal Dynamic Memory. A Personal Dynamic Memory is an active store of past experiences which may include some of the original data but consists mostly of the webs of meanings and the narratives that have been constructed during the interpretation of earlier experiences. A Personal Dynamic Memory is crucial for supporting the construction of narratives of new experiences but it is today missing from existing AI systems.

Here is a simple example to illustrate these ideas. Consider the image in Figure 1 (left). This is from a poster that used to be employed in French and Belgian schools to teach children about daily life and to learn how to talk about it. We instantly recognize that this is a scene from a restaurant, using cues like the dress and activities of the waiter and waitress or the fact that people are sitting at different tables in the room. Current image recognition algorithms would be able to segment and identify some of the people and objects in the scene and in some cases label them with a fair degree of accuracy, see Fig. 1 (right).

However a normal observer would see a lot more than that. For example, when asked whether a person is missing at the table on the right, the answer would be straightforward: Yes, because there is an empty chair, a plate and cutlery on the table section in front of the chair, and a napkin hanging over the chair. So there must have been a third person sitting there, probably the mother of the child. Moreover nobody has much difficulty imagining where she went. There is a door marked ‘lavabo’ (meaning ‘toilet’ in French) and it is quite plausible that she went to the toilet while waiting for the meal to arrive. Any human viewer would furthermore guess without hesitation why the child is showing his plate to the waitress arriving with the food and why the person to the left of the child (from our perspective) is probably the father looking contently at the child. We could go on further completing the narrative, for example, ask why the cat at the feet of the waitress looks eagerly at the food, observe that the food contains chicken with potatoes, notice that it looks quite windy outside, that the vegetation suggests some place in the south of France, and so on.
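The multi-level structure of a narrative can be made concrete with a small sketch. The class, the use of the level names as dictionary keys, and the mirroring of spatial categories under a change of perspective are illustrative choices for this paper's ideas, not a specification taken from it:

```python
from dataclasses import dataclass, field

# The base level plus the five levels of meaning discussed above.
LEVELS = ("base", "factual", "expressional", "social", "conventional", "intrinsic")

@dataclass
class Narrative:
    """A coherent reconstruction of an experience: categorised entities
    at each level, links between levels, and the perspective used."""
    perspective: str                                   # e.g. the observing agent
    levels: dict = field(default_factory=lambda: {l: [] for l in LEVELS})
    links: list = field(default_factory=list)          # cross-level links

    def add(self, level, category):
        assert level in LEVELS
        self.levels[level].append(category)

# Spatial categories are observer-dependent: my 'left-of' is the
# 'right-of' of a person opposite me.
MIRROR = {"left-of": "right-of", "right-of": "left-of"}

def reperspectivize(narr, new_perspective):
    """Sketch of perspective transformation: mirror spatial categories."""
    out = Narrative(perspective=new_perspective)
    for level, items in narr.levels.items():
        out.levels[level] = [MIRROR.get(c, c) for c in items]
    out.links = list(narr.links)
    return out

# Restaurant scene from the viewer's perspective:
scene = Narrative(perspective="viewer")
scene.add("factual", "left-of")        # the father is left of the child
scene.add("social", "family-meal")
flipped = reperspectivize(scene, "waitress")
print(flipped.levels["factual"])       # ['right-of']
```

Only categories with an obvious observer-dependent mirror are flipped here; a fuller treatment would also transform temporal, social and other relational categories.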
Fig. 1. Left. Didactic image of a scene in a restaurant. Right. Image segmentation identifying regions that contain people (based on Google’s Cloud Vision API).

Clearly these interpretations rely heavily on inferences reflecting knowledge about restaurants, families, needs and desires, and the roles played by people in restaurants (waiter, waitress, bar tender, cashier, customer). These inferences are not only necessary to properly interpret the visual image in Fig. 1 but also to answer questions such as ‘Who is the waitress?’, ‘Why is she approaching the table?’, ‘Where is the missing person at the table?’, ‘Who will get food first?’, etc. We can also make predictions and reconstructions, for example, that the waitress will reach the table, put the food on the table, cut the chicken into pieces, and put them on the different plates, or that the mother of the child will come back from the toilet, sit down again at the table, and start eating herself.

Each of us has a vast Personal Dynamic Memory that stores narratives based on prior experiences: from visiting restaurants, seeing images in pictures or movies, reading about them, etc. Our daily life is filled from morning to evening with activities that feed and reorganise our Personal Dynamic Memories, and the richer they become the more we are able to make sense of new experiences. What is truly amazing is that by the time we reach the adult stage these memories must already contain a massive number of facts, which are nevertheless searched at an incredibly fast rate, with relevant parts of memory becoming primed and ready for use in handling novel experiences.

Understanding uses information both from syntactic and semantic parsing of the experience and from inferences based on a Personal Dynamic Memory, in order to fill in unexpressed or unobservable information, e.g. via logical reasoning and mental simulation. Moreover the understanding process changes the contents of the Personal Dynamic Memory, not only because the new experience, its interpretation, and links to other experiences are stored, but also because earlier experiences are revisited and their storage may be affected by newer experiences. Memory needed for understanding is therefore highly dynamic, unlike computer memory that remains unchanged once something has been stored.

This leads to the proposal for a general architecture for AI systems that handle understanding, depicted in Fig. 2. It shows the flow from experience to syntactic and semantic structures, and from there towards the construction of narratives, integrated into a Personal Dynamic Memory. The flow of information is not only bottom-up but also top-down, shown with the green arrows. The narrative under construction is partially guiding semantic analysis and cutting down combinatorial search in syntactic analysis, whereas the narratives already contained in the Personal Dynamic Memory are guiding the construction of narratives of new experiences.

Fig. 2. Components for tackling understanding in AI systems. Besides the introduction of narratives, the main critical component is a Personal Dynamic Memory which helps to build narratives to interpret a new experience. The green arrows indicate that there is strong downward information flow from the Personal Dynamic Memory to the interpretation process and from narratives to the analysis process.
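The loop suggested by Fig. 2 can be sketched as follows: analysis feeds interpretation, interpretation extends the narrative, and the growing narrative constrains the next round of analysis, with the Personal Dynamic Memory guiding interpretation throughout. Every function and data choice below is a toy stand-in for the components named in the figure, not an implementation of them:

```python
# Toy sketch of the Fig. 2 loop. Experiences are token lists, narratives
# are dicts from structures to assessments, and memory is a list of sets
# of categories from past narratives. All names are illustrative.

def analyse(experience, narrative_so_far):
    """Syntactic/semantic analysis, constrained top-down by the narrative
    under construction (already-explained tokens are skipped)."""
    return [token for token in experience if token not in narrative_so_far]

def interpret(structures, memory):
    """Build narrative fragments, guided by narratives already in memory."""
    known = set().union(*memory) if memory else set()
    return {s: ("expected" if s in known else "novel") for s in structures}

def understand(experience, memory):
    narrative = {}
    for _ in range(2):                 # iterate: top-down feedback loop
        structures = analyse(experience, narrative)
        narrative.update(interpret(structures, memory))
    memory.append(set(narrative))      # integration into the dynamic memory
    return narrative

memory = [{"restaurant", "waitress", "table"}]   # prior narratives
result = understand(["waitress", "cat", "table"], memory)
print(result)   # 'cat' is marked novel; the rest were expected from memory
```

The second pass through the loop is where top-down guidance would pay off in a real system; here it merely illustrates that analysis is re-run against the updated narrative.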
3    Current AI does not handle meaning properly

Before putting some more technical flesh on this architectural skeleton, I want to emphasize that current techniques and AI design methodologies do not handle meaning and understanding. Current techniques fall into two classes: numerical (or subsymbolic) techniques and symbolic techniques, with shades in between.

Simplifying, numerical (or subsymbolic) AI techniques translate problems into a numerical form (real numbers and vectors) and perform numerical operations over them. The numerical representations are constructed using information-theoretic considerations, specifically, their ability to help predict or complete patterns. Most neural networks fall into this class, but also other techniques like Latent Distributional Semantics, which associates a vector representation known as an embedding with words, images, or actions. The embeddings capture the syntactic and semantic contexts in which an element appears and can be used to compute similarities, predict the next word or image, relate an image to a label, answer textual queries, or perform many other useful subfunctions for building intelligent applications. Embeddings are computed either by statistical methods or by using deep learning algorithms.

Importantly, and as pointed out clearly by Claude Shannon [24], who can be considered the father of numerical AI, information-theoretic representations do not try to capture meaning. For example, a word embedding captures the kinds of contexts in which a word may occur, but this is only an indirect substitute for the real meaning of the word. Ignoring meaning makes it feasible to use these numerical techniques in circumstances where there is no representation of meaning available for learning or training - which is in fact almost always the case. But it leaves out a crucial aspect of (human) intelligence.

Thus, ‘Neural image labeling’ associates labels rather directly with images (sometimes even using only pixel-based image representations), without attempting to discern individual objects, actors,
or events, and without trying to figure out the situation underlying the image, the nature of the action, the motivations of the actors depicted in an action, the historical setting, the reason why the image was made, and many other aspects which human viewers spontaneously come up with. ‘Neural translation’ does not try to perform a syntactic analysis using grammars and parsers, nor a semantic analysis using interpreters that build conceptual representations of what is being said. Rather, it associates n-grams in the source language with n-grams in the target language based on word vectors that capture statistical co-occurrences in dual (source/target) corpora.
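The kind of word vector referred to here can be illustrated with a toy distributional-semantics computation: count the words co-occurring with a target word in a small (invented) corpus and compare the count vectors with cosine similarity. As the text argues, such vectors capture contexts of use, not meaning:

```python
from collections import Counter
from math import sqrt

# Invented three-sentence corpus, purely for illustration.
corpus = [
    "the waitress brings the food",
    "the waiter brings the menu",
    "the cat watches the food",
]

def embedding(word, window=2):
    """Count-based context vector: neighbours within +/- window tokens."""
    vec = Counter()
    for sentence in corpus:
        tokens = sentence.split()
        for i, t in enumerate(tokens):
            if t == word:
                for n in tokens[max(0, i - window):i + window + 1]:
                    if n != word:
                        vec[n] += 1
    return vec

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

print(round(cosine(embedding("waitress"), embedding("waiter")), 3))  # 1.0
print(round(cosine(embedding("waitress"), embedding("cat")), 3))     # 0.8
```

In this toy corpus ‘waitress’ and ‘waiter’ end up with near-identical vectors because they occur in interchangeable contexts, even though nothing about what either word means has been represented anywhere.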
Circumventing meaning has made the current wave of deep-learning-based AI applications possible, but it is also responsible for the brittleness of image labeling, the nonsensical nature of translations, failures in answering questions that fall slightly outside of the statistical patterns in the corpora used to train them, the success of adversarial attacks using images or texts that do not confuse humans but throw off AI systems, the non-transparency of decision making, and many other features that human-centric AI considers undesirable.

Intuitively a kind of hybrid or integrated AI that combines the virtues of numerical AI with those of symbolic AI is a possible way out, and has indeed been proposed by several researchers. Symbolic AI maps problems into symbols and symbolic structures and performs transformations over these symbolic structures, for example guided by rules of sound logical inference. This approach flourished in the 1970s and 1980s, leading to expert systems built for interactively supporting experts, large-scale ontologies and domain models as now used in the semantic web or in encyclopedic knowledge graphs, computer-assisted theorem proving, constraint solvers for scheduling or design, precision language processing, and much more.

The symbolic approach has tried, at least in principle, to get closer to handling meaning. It has used terms like semantic information processing [14] or story understanding [22], talked about AI able to take advice rather than be programmed explicitly or trained with large data sets [13], and built sophisticated explanation facilities for expert systems using deep human-comprehensible models of the domain. An integration of numerical and symbolic techniques is a way to go forward, so that the flexibility of pattern recognition and action selection based on neurally inspired models, which gives only approximate answers, can be married to the precision and compositionality of symbolic reasoning.

4    Relevant work

The ideas proposed here are certainly not new. For a long time it has been commonly accepted in cognitive science that the construction of narratives is an essential ingredient of cognitive intelligence because it allows us to make sense of reality [6], [35]. Also in AI there has been significant prior work, although mostly in the context of story generation and story understanding, which are the textual manifestations of internally constructed narratives [12]. We find symbolic approaches from the late nineteen-seventies onwards, such as in the work of Schank and colleagues [23], Winston’s proposals for Computational Narrative Intelligence [37], or more recently the work of Gervas and his group on narrative generation [9]. There is also increasing work at the moment using numerical approaches towards narrative intelligence [20], particularly within the context of building question-answering and dialog systems.

In the psychological literature there has also been extensive work on personal memory, often taking Tulving as a starting point [33]. He introduced the distinction between procedural memory (knowledge of skills) and declarative memory, usually divided into semantic memory, which contains general factual knowledge, and episodic memory, which refers to specific autobiographical experiences stored in the form of contextualized past perceptions, actions and temporal and causal structures. Schank and colleagues made proposals in the late 1980s on how such dynamic memories could be built [22]. This led in the nineteen-nineties to significant work on case-based reasoning [2] and memory-based reasoning [26]. Much of this has been overshadowed by the current peak of interest in deep learning, but it remains highly valuable for the aims discussed in this paper.
main and an explicit representation of the problem solving methods
                                                                              Meanwhile various important technological advances have been
being used[16].
                                                                           made in other areas that make a renewed effort towards the exper-
   Nevertheless, the symbolic approach has its own limitations with
                                                                           imentation with Personal Dynamic Memories and narratives a re-
respect to handling meaning and understanding. A key criticism, re-
                                                                           alistic prospect. Among these advances I just want to highlight the
flected in Searle’s Chinese Room argument and known as the symbol
                                                                           following:
grounding problem, is that symbolic AI operates in a world of sym-
bols with no systematic connection to the real world. To solve this        • Very large knowledge bases. One of the critical bottlenecks for
problem requires an integration of a symbolic and a numerical ap-            effective Personal Dynamical Memories is the sheer size of the
proach, because the latter starts from the (real) numbers delivered          knowledge that has to be represented and processed. If we ex-
by sensors and actuators that are directly connected to the world,           press this in terms of facts, then we must expect to handle at least
so that the categories that constitute the meaning of symbols indeed         tens of millions, if not billions. This was totally impossible two
become properly grounded. However, it is important that the ground-          decades ago but very significant progress, pushed by the devel-
ing of symbols is based on what is meaningful, i.e. relevant, to the         opment of the semantic web, has changed the situation. It is now
agent, which is different from grounding based on success in predic-         possible to represent fact-bases up to 100 billion triples using stan-
tion tasks. When agents cooperate on tasks in a shared environment,          dard knowledge representations (RDF statements and OWL) and
particularly if they have to communicate about tasks, they implic-           perform inferences over them fast enough to be used in interac-
itly have to coordinate the way the categorize reality and how these         tive applications[34]. So the issue of computational complexity
categorizations are expressed.[28]                                           for Personal Dynamic Memories can be considered to be solved.
   In addition, the transformations of symbolic structures are formal      • Robotic embodiment Another critical bottleneck is that Personal
operations, similar to a set of axioms and rules of logical inference        Dynamic Memories have to be grounded in sensori-motor experi-
as in mathematics. But the problem is that it is very hard, if not im-       ences. A few decades ago the state of the art in computer vision
possible, to define axioms exhaustively for real world open-ended            and robotics was simply not advanced enough to tackle this issue
domains due to the unavoidable exceptions, lack of knowledge, and            in any realistic way. But also here there have been tremendous
the problem of making clear-cut definitions. These problems have             advances, both in the availability of lower cost robotic hardware
been discussed widely under the title of the frame problem. Also here        including cameras and signal processing chips and in software for
perception and action control, primarily using techniques from deep learning. These developments in themselves do not solve the issue of symbol grounding, but they have made it possible to start addressing it seriously. One example of recent work uses language games between embodied autonomous robots that generate not only their own communication system but also an ontology containing the relevant distinctions in a specific domain [31]. These experiments have shown how perceptually grounded categories (for example for color or size) or spatial and temporal relations grounded in event recognition can emerge in populations of agents pushed by the task of communication. Another example is the Open-Ease framework (http://www.open-ease.org/), which supports the recording and storage of inhomogeneous interpretation data from robots and human manipulation episodes so that they can be used to build semantically oriented tools for interpreting, analyzing, visualizing, and learning from these experience data [4].
• Mental simulation. Another bottleneck for building realistic Personal Dynamic Memories has been the role of mental simulations of actions and situations. This is considered an essential function of memory by many psychologists, particularly for predicting how a perceived situation will continue to evolve in the future [3]. This hypothesis has also inspired AI researchers [5], but implementations could only explore simple isolated examples until very recently. However, significant advances in virtual reality technology have now pushed the state of the art in computer graphics to allow a very high degree of realistic simulation even for complex world situations, thanks also to dedicated hardware (game engines, which have now reached performances of 12 teraflops) and highly optimized software. This technology is already being used for cognitive robotics experiments in order to plan future behavior through mental simulation, complementary to classical planning based on symbol manipulation, and to understand human language instructions or descriptions [19]. So also for this aspect, there are promising developments that make Personal Dynamic Memories much more feasible.
• Finally, there have been significant advances recently in Computational Construction Grammar. Most linguistic formalisms, such as Chomskyan generative grammar, remain close to the morpho-syntactic structure of a language. Construction Grammar, in contrast, focuses on capturing the systematic ways in which grammar expresses meaning [11]. It is therefore a more appropriate basis for natural language processing for an AI approach that seeks to handle meaning and understanding, particularly because Construction Grammarians have worked closely with cognitive semantics [32], an approach to semantics that seeks to understand the conceptual patterns with which humans organise their experiences in order to make them expressible in their language. A decade ago usable implementations of construction grammar and cognitive semantics were in their infancy, but this has changed completely. A first big effort, spearheaded by ICSI in Berkeley, developed an Embodied Construction Grammar [5], which not only formalized and operationalized construction grammars but also subscribed to the 'mental simulation' approach to meaning mentioned in the previous paragraphs. Another big effort, at the University of Brussels VUB AI Lab and the Sony Computer Science Laboratory in Paris, developed Fluid Construction Grammar [29], which now has a very solid implementation and a growing user community (see www.fcg.org). Given that language communication plays a major role in the way that human Personal Dynamic Memories get formed, this line of research provides another hopeful contribution towards achieving meaningful AI.

5    The organisation of memory

In my opinion, the most critical bottleneck at the moment is: how should a Personal Dynamic Memory be organised at the micro-level, and what kind of basic computations (including inferencing and learning) should be supported? Obviously a linear list of facts, possibly represented in RDF, will not do; we need higher-level structuring devices, partly for managing inferential and combinatorial complexity, partly for dealing with the frame problem, and partly for achieving fast access to the most relevant prior experiences that will help to make sense of a new experience. What will also not work is to blindly store the vast amount of information generated by an experience: the complete sensori-motor data streams, the data from the mental simulations that are triggered, the language descriptions and their semantic interpretations, or all the facts relevant for an experience. If everything is stored, this is not only costly from an energetic point of view but will certainly get in the way of fast retrieval and inference.
   The cognitive science and AI literature already contains various proposals for the organisation of memory. Many of them start from Bartlett's original idea of a schema, also called a frame. It was formulated in the 1930s and revived again in the 1970s by psychologists such as David Rumelhart [21], linguists such as Charles Fillmore [8], and sociologists such as Erving Goffman [10].
   A schema is a way of framing a particular situation in terms of a set of entities, roles for these entities, constraints on the kind of entities that can fill these roles, and relations between the entities based on their roles. Each schema has various associated cues to recognize quickly whether it applies to the current situation. Once it is triggered, a schema casts a web over the sensori-motor inputs and facts associated with the situation and makes us see or infer certain aspects of the experience more clearly at the expense of others. Schemas impose a bias and perspective on a situation and often also an emotional reaction. They come with a lot of defaults. These are facts which can be expected to be the case if a particular schema matches well with an experience, but which are not explicitly mentioned or observable. Sometimes these defaults even override perception or fly in the face of obvious facts.
   The notion of a schema was introduced into AI by Minsky [15], who used the term frame. It led to a variety of frame-based knowledge representation systems in the 1970s, which were used extensively to model the perception of complex scenes, story telling and story understanding, and expert reasoning. Frame-based representation systems feature data structures for representing frames, basic inference operations over frames, and languages and interfaces to define frames and maintain large collections of frames. Frame-based knowledge representation systems also support various kinds of relations between frames, in particular subtype relations, so that there can be inheritance of information from one frame handling a broad set of experiences to another frame concerned with a more specific situation. Another example are priming relations, so that if one frame fits well with a situation, another frame covering a subsequent event is already made ready for activation. Besides mechanisms for handling defaults, the earlier frame-based representation systems also supported procedural attachment, so that procedures like image or sound processing or robotic action in the real world could be seamlessly integrated.
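The core ingredients just listed (slots with constraints, defaults, subtype inheritance, and cue-based triggering) can be made concrete in a few lines of code. The following is only a toy sketch in Python under naming conventions of my own; it does not reproduce the data structures of Minsky-style frame systems or of any existing implementation, and the 'meal' example frames are purely illustrative:

```python
# Toy sketch of a frame-based representation: frames have slots with
# type constraints, defaults inherited along a subtype relation, and
# crude cue-based triggering. All names are illustrative only.

class Frame:
    def __init__(self, name, parent=None, slots=None, defaults=None, cues=None):
        self.name = name
        self.parent = parent            # subtype relation: inherit from parent
        self.slots = slots or {}        # slot -> constraint predicate on fillers
        self.defaults = defaults or {}  # slot -> filler assumed if unobserved
        self.cues = cues or set()       # features that trigger this frame

    def constraint(self, slot):
        # Look up a slot constraint, walking up the subtype chain.
        if slot in self.slots:
            return self.slots[slot]
        return self.parent.constraint(slot) if self.parent else None

    def default(self, slot):
        # Defaults are inherited too; more specific frames override.
        if slot in self.defaults:
            return self.defaults[slot]
        return self.parent.default(slot) if self.parent else None

    def slot_names(self):
        # All slots known to this frame, including inherited ones.
        names = set(self.slots) | set(self.defaults)
        if self.parent:
            names |= self.parent.slot_names()
        return names

    def match(self, observed_features):
        # Crude triggering score: number of cues present in the observation.
        return len(self.cues & observed_features)

    def instantiate(self, fillers):
        # Bind observed fillers to slots, checking constraints, then fill
        # unobserved slots with (possibly inherited) defaults.
        instance = {}
        for slot, value in fillers.items():
            check = self.constraint(slot)
            if check and not check(value):
                raise ValueError(f"{value!r} cannot fill slot {slot!r}")
            instance[slot] = value
        for slot in self.slot_names():
            instance.setdefault(slot, self.default(slot))
        return instance


# A generic 'meal' frame and a more specific 'restaurant meal' subtype.
meal = Frame("meal",
             slots={"food": lambda v: isinstance(v, str)},
             defaults={"utensils": "cutlery"},
             cues={"eating", "table"})

restaurant_meal = Frame("restaurant-meal", parent=meal,
                        slots={"waiter": lambda v: isinstance(v, str)},
                        defaults={"payment": "bill at the end"},
                        cues={"eating", "table", "menu", "waiter"})

# The more specific frame wins because more of its cues are observed,
# and the instantiated episode is completed with inherited defaults.
observed = {"eating", "menu", "waiter", "table"}
best = max([meal, restaurant_meal], key=lambda f: f.match(observed))
episode = best.instantiate({"food": "pasta", "waiter": "Mario"})
```

A real frame system along the lines described above would add priming links between frames and procedural attachment (slots whose fillers are computed by perception or action routines); both are omitted here for brevity.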
6    Conclusions

The paper argued that human-centric AI, with its implications of explainability, transparency, robustness, etc., is only going to be possible when AI comes to grips with meaning and understanding. This requires that we go beyond the numerical AI paradigm that currently dominates AI, where meaning is captured only very indirectly in embeddings and operations over embeddings, but also beyond the symbolic paradigm, which focuses on formal operations over non-grounded symbols.
   First of all, we need at the very least a form of integrated or hybrid AI that combines numerical and symbolic AI. But we need to go beyond both. The paper argued that a central characteristic of understanding is the ability to build a coherent narrative of an experience based on narratives of past experiences stored in a Personal Dynamic Memory, and to integrate this narrative in memory. The big challenge for AI is partly technical: to solve the problems of computational complexity raised by the very large knowledge bases and huge inferences that are required. But it is also conceptual. We need to understand much better how new experiences and the narratives built for them get integrated into a Personal Dynamic Memory in such a way that they get triggered again by the most relevant new experiences, and how facts or narratives that are deemed no longer relevant can be forgotten or simply not stored in the first place.

Acknowledgement  The author is funded by the Catalan Institute for Advanced Studies (ICREA), embedded in the Institute for Evolutionary Biology (UPF/CSIC) in Barcelona. This work was made possible by H2020 grants within the frame of the Humane AI Flagship preparation project and the AI4EU project.

REFERENCES

 [1] L. Steels, ed., Computational Issues in Fluid Construction Grammar, LNAI 7240, Springer Verlag, Berlin, 2012.
 [2] A. Aamodt and E. Plaza, 'Case-based reasoning: Foundational issues, methodological variations, and system approaches', AI Communications, 7(1), 39–52, (1994).
 [3] L. Barsalou, 'Grounded cognition', Annual Review of Psychology, 59, 617–645, (2008).
 [4] M. Beetz, M. Tenorth, and J. Winkler, 'Open-EASE', in 2015 IEEE International Conference on Robotics and Automation (ICRA), 1983–1990, IEEE, 2015.
 [5] B. Bergen, N. Chang, and S. Narayan, 'Simulated action in an embodied construction grammar', in Proceedings of the 26th Annual Meeting of the Cognitive Science Society, 108–113, L. Erlbaum, Mahwah, NJ, 2004.
 [6] J. Bruner, 'The narrative construction of reality', Critical Inquiry, 18(1), 1–21, (1991).
 [7] V. Dignum, Responsible Artificial Intelligence: How to Develop and Use AI in a Responsible Way, Springer International Publishing, 2019.
 [8] C. Fillmore, Frame Semantics, volume 10, 111–137, 1982.
 [9] P. Gervás, E. Concepción, C. León, G. Méndez, and P. Delatorre, 'The long path to narrative generation', IBM Journal of Research & Development, 63(1), 1–8, (2019).
[10] E. Goffman, Frame Analysis: An Essay on the Organization of Experience, Penguin Books, Harmondsworth, 1974.
[11] A. Goldberg, Constructions at Work: The Nature of Generalization in Language, Oxford University Press, Oxford, 2006.
[12] M. Mateas and P. Sengers, eds., Narrative Intelligence, John Benjamins Pub., Amsterdam, 2003.
[13] J. McCarthy, 'Programs with common sense', in Symposium on Mechanization of Thought Processes, National Physical Laboratory, Teddington, UK, 1958.
[14] M. Minsky, Semantic Information Processing, The MIT Press, Cambridge, MA, 1969.
[15] M. Minsky, 'A framework for representing knowledge', in The Psychology of Computer Vision, McGraw-Hill, New York, 1975.
[16] J. Moore and W. Swartout, Explanation in Expert Systems: A Survey, ISI Research Reports, ISI, Marina del Rey, CA, 1988.
[17] A. Nowak, P. Lukowicz, and P. Horodecki, 'Assessing artificial intelligence for humanity: Will AI be our biggest ever advance? Or the biggest threat?', IEEE Technology and Society Magazine, 37(4), 26–34, (2018).
[18] E. Panofsky, Studies in Iconology: Humanistic Themes in the Art of the Renaissance, Oxford University Press, Oxford, 1939/1972.
[19] J. Pfau, R. Porzel, M. Pomarlan, V. Cangalovic, S. Grundpan, S. Hoefner, J. Bateman, and R. Malaka, 'Give MEANinGS to robots with Kitchen Clash: A VR human computation serious game for world knowledge accumulation', in Entertainment Computing and Serious Games, First IFIP TC 14 Joint International Conference, ICEC-JCSG, 85–96, IFIP, New York, 2019.
[20] M. O. Riedl, 'Computational narrative intelligence: A human-centered goal for artificial intelligence', CoRR, abs/1602.06484, (2016).
[21] D. Rumelhart, 'Schemata: The building blocks of cognition', in Theoretical Issues in Reading Comprehension, Lawrence Erlbaum, Hillsdale, NJ, 1980.
[22] R. Schank, Dynamic Memory: A Theory of Reminding and Learning in Computers and People, Cambridge University Press, Cambridge, UK, 1990.
[23] R. Schank and R. Abelson, Scripts, Plans, Goals, and Understanding: An Inquiry into Human Knowledge Structures, L. Erlbaum, Hillsdale, NJ, 1977.
[24] C. Shannon, 'A mathematical theory of communication', Bell System Technical Journal, 27, 379–423 and 623–656, (1948).
[25] D. Sperber and D. Wilson, Relevance: Communication and Cognition, Harvard University Press, Cambridge, MA, 1986.
[26] C. Stanfill and D. Waltz, 'Toward memory-based reasoning', Communications of the ACM, 29(12), (1986).
[27] L. Steels, 'Perceiving the focal point of a painting with AI: Case studies on works of Luc Tuymans', in 12th ICAART, Springer Verlag, Berlin, 2020.
[28] L. Steels and T. Belpaeme, 'Coordinating perceptually grounded categories through language: A case study for colour', Behavioral and Brain Sciences, 28(4), 469–490, (2005).
[29] L. Steels and K. Beuls, eds., Case Studies in Fluid Construction Grammar, John Benjamins Pub., Amsterdam, 2019.
[30] L. Steels and R. Lopez de Mantaras, 'The Barcelona declaration for the proper development and usage of Artificial Intelligence in Europe', AI Communications, 31(6), 485–494, (2018).
[31] L. Steels and M. Hild, Language Grounding in Robots, Springer Verlag, New York, 2012.
[32] L. Talmy, Toward a Cognitive Semantics, MIT Press, Cambridge, MA, 2000.
[33] E. Tulving, 'Précis of Elements of Episodic Memory', Behavioral and Brain Sciences, 7, 223–268, (1984).
[34] J. Urbani, S. Kotoulas, J. Maassen, F. van Harmelen, and H. Bal, 'WebPIE: A web-scale parallel inference engine using MapReduce', Journal of Web Semantics, 10, 59–75, (2012).
[35] O. Vilarroya, Somos lo que nos contamos: Cómo los relatos construyen el mundo en que vivimos, Editorial Ariel, Barcelona, 2019.
[36] U. von der Leyen et al., 'White paper on artificial intelligence', EU Commission reports, (2020).
[37] P. Winston, 'The strong story hypothesis and the directed perception hypothesis', in 2011 AAAI Fall Symposium, AAAI Press, Menlo Park, CA, 2011.
[38] W. Xu, 'Toward human-centered AI: A perspective from human-computer interaction', Interactions, 26(4), 42–46, (2019).