Personal Dynamic Memories are Necessary to Deal with Meaning and Understanding in Human-Centric AI

Luc Steels1

1 Catalan Institute for Advanced Studies ICREA - Institute for Evolutionary Biology (UPF-CSIC), Barcelona, Spain, email: steels@arti.vub.ac.be

Abstract. Human-centric AI requires not only a fundamental shift in the way AI systems are conceived and designed but also a reorientation in basic research in order to figure out how AI can come to grips with meaning and understanding. Meanings are made up of distinctions to categorize and conceptualize an experience at different levels, from directly observable factual meanings to expressional, social, conventional and intrinsic meanings. Meanings get organised into larger-scale narratives that conceptualize experiences from a particular perspective. Understanding is the process of constructing and then integrating these narratives into a Personal Dynamic Memory that stores narratives from past experiences. This memory plays a crucial role in constructing more narratives and thus works intimately together with inferences, mental simulations, and the analysis of experiences in terms of syntactic and semantic structures.

This paper outlines this approach to meaning and understanding by clarifying what it entails, outlining technical challenges that must be overcome, and providing links to earlier relevant AI work as well as new technical advances that could make Personal Dynamic Memories a reality in the near future.2

2 Copyright 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 What is human-centric AI?

"Human-centric AI focuses on collaborating with humans, enhancing human capabilities, and empowering humans to better achieve their goals." [17]. Human-centric AI has become a focal point of current research, particularly in Europe, where it is now the stated objective of the EU strategy recently (February 2020) issued by the European Commission. This strategy calls for AI that shows human agency and oversight, technical robustness and safety, privacy and data governance, transparency, care for diversity, non-discrimination and fairness, focus on societal and environmental well-being, and accountability [36].

Achieving human-centric AI requires a number of changes in focus compared to current AI:

(i) Human-centric AI systems should be made aware of the goals and intentions of their users and base their own goals and dialog on meanings rather than on statistical patterns of past behavior only, even if statistical patterns can play a very important role, for example for drastically reducing search or carrying out approximate inference. Human goals and values should always take precedence. Respect for human autonomy should be built into the system by design, leading to qualities such as fairness and respect.

(ii) Human-centric AI requires that a system is able to explain its reasoning and learning strategies so that its decisions are understandable by humans. Only by emphasizing human understandability will human-centric AI achieve proper explainability and transparency.

(iii) Human-centric AI should not only learn by observation or theorizing about reality but also by taking advice from humans, as suggested in John McCarthy's original 1958 proposal of the Advice Taker [13].

(iv) Human-centric AI should be able to use natural communication, i.e. communication primarily based on human language, not only by mimicking language syntax but, more importantly, by using the rich semantics of natural languages, augmented with multi-modal communication channels. This is needed to support explainability and accountability.

(v) Human-centric AI should have the capacity of self-reflection, which can be achieved by a meta-level architecture that is able to track decision-making and intervene by catching failures and repairing them. By extension, the architecture should support the construction of a theory of mind of other agents, i.e. how they see the world, what their motivations and intentions are, and what knowledge they are using or lacking. Only through this capacity can AI achieve intelligent cooperation and adequate explicability, and learn efficiently through cultural transmission.

(vi) Finally, human-centric AI should reflect the ethical and moral standards that are also expected from humans or organisations in our society, particularly for supporting tasks that are close to human activity and interest.

Today the dominating perspective on AI is not human-centric. It focuses primarily on achieving high predictive performance on predefined benchmarks, trying to exceed human performance so that humans can be replaced in the task being considered. This approach is machine-centric rather than human-centric. It emphasizes numerical (subsymbolic) techniques (from neural network research, pattern recognition, information retrieval, and data science), often ignoring valuable contributions from symbolic AI that are needed to achieve explicability and robustness.

Admittedly the machine-oriented focus has recently led to a jump in performance on chosen benchmarks, particularly in the domain of pattern recognition and computer vision, but unfortunately also to a kind of AI that is opaque, cannot explain or defend its decisions, is unable to take human advice, is not robust against adversarial attacks, has no understanding of the motivations of its users, and requires vast amounts of data and computing power. Although for a large, growing class of applications these shortcomings are not an issue, for AI applications that touch on human lives and are socially consequential, these disadvantages are highly problematic.
Different approaches to human-centric AI have been proposed recently. They are all valuable. Some researchers have advocated guidelines and design methodologies to make AI more trustworthy and responsible by emphasizing safety, privacy, data governance, transparency, diversity, fairness, and accountability [30], [7]. Others have emphasized that we need more human-centric interfaces for AI systems, including better explanation facilities and ways for humans to provide guidance during machine learning or decision-making [38].

Here I focus on the idea that human-centric AI requires above all another kind of AI, namely AI which has meaning and understanding at its core. The present paper is a position paper, trying to clarify this point of view and reflecting on the key issues and possible technical solutions. But first, what do we mean by meaning and understanding?

2 Meaning and understanding

The notion of meaning is related to how we try to understand how humans make sense of an experience. An experience can be a behavior or the observation of a behavior, an image or a sequence of images, sounds, soundscapes, smells and tastes, spoken or written text, and more generally cultural artefacts like scenes in a theatre play. In the real world, there is a flow of experiences that we need to interpret and cope with quickly. For example, if we are driving a car there is a quick succession of situations that we have to gauge correctly in order to act appropriately, even in unusual situations: Why is the car behind mine honking its horn? Is the woman with a baby stroller going to cross the street or has she seen me coming? Why is everybody slowing down? What does this red light on the dashboard mean?

Meanings are built from categorisations of reality, for example, colors, action types, temporal and spatial relations, etc. Categorisations are distinctions that are relevant for the interaction between humans (or agents more generally) and their environment, including other agents [25]. For example, the distinction between red and green is relevant in traffic lights because it tells you whether it is safe to start driving or cross the road. The distinction between angry and sad is relevant for knowing how to behave with respect to another person. The distinction between left and right is relevant for giving or following instructions on how to reach a location or how to find an object in a scene.

Categories are the building blocks for constructing different levels of meaning for an experience. The following levels are often discussed in the appreciation of art works [18] but are actually useful for interpreting any kind of experience [27]:

• The base level of an experience details the external formal properties directly derivable from the perceived appearance of the experience, for example, the lines, shapes, color differences in hue, value (brightness) and saturation, textures, shading, spatial positions of elements, etc. in the case of images.

• The first level of meaning is that of factual meaning. It identifies and categorises events, actors, entities and the roles they play in events, as well as the temporal, spatial and causal relations between them. In the case of images this requires a suite of sophisticated processing steps, starting from object segmentation, object location, object recognition, 3D reconstruction, tracking over time, etc.

• When there are actors involved, a second level, that of expressional meaning, becomes relevant. It identifies the intentions, goals, interests, and motivations of the actors and their psychological states or the manner in which they carry out actions.

• The next level is that of social meaning. It is about the social relations between the actors and how the activities are integrated into the local community or the society as a whole.

• The fourth level is that of conventional meaning, based on figuring out what is depicted or spoken about and the historical or cultural context, which has to be learned from conversations or cultural artefacts, like books or films.

• The fifth level is known as the intrinsic meaning or content of an experience. It is about the ultimate motive of certain images or texts, or why somebody is carrying out a certain behavior. It explains why this particular experience may have occurred.

We define a narrative as a coherent reconstruction of the different levels of meaning of an experience or a set of experiences based on one or more perspectives. It contains categorised entities at each of these levels, links between the levels, and possibly additional cross-level categorisations. The perspective, which is often the perspective of the agent itself, is unavoidable because categories are most of the time observer-dependent. For example, an object which is to my left is to the right for a person opposite of me. I may categorise a gesture as aggressive whereas the person making the gesture may have performed it to defend herself. I may not know a particular historical figure and believe it is just the representation of an old man, whereas you may recognize the figure and be repulsed by the atrocities that were conducted under his command. Transforming a narrative from one perspective into a narrative for the same experience from another perspective is a critical component in handling meaning. Even to communicate properly in language we often have to look at the viewpoint of the interlocutor and categorise spatial and other relations accordingly.

Understanding is a process with three functions: (i) reconstruct the different levels of meaning by casting them into coherent narratives that explain the events underlying the experience, (ii) predict how the experience will unfold in the future and reconstruct what has happened in the past, and (iii) integrate these narratives into a Personal Dynamic Memory. A Personal Dynamic Memory is an active store of past experiences which may include partly some of the original data but mostly the webs of meanings and the narratives that have been constructed during the interpretation of earlier experiences. A Personal Dynamic Memory is crucial for supporting the construction of narratives of new experiences but it is today missing from existing AI systems.

Here is a simple example to illustrate these ideas. Consider the image in Figure 1 (left). This is from a poster that used to be employed in French and Belgian schools to teach children about daily life and to learn how to talk about it.
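The layered notion of a narrative introduced above (categorised entities per level, cross-level links, and an explicit perspective) can be rendered as a small data structure. This is only an illustrative sketch: the names `Level`, `Categorisation` and `Narrative`, and the example entries, are invented here and not taken from the paper.

```python
from dataclasses import dataclass, field
from enum import Enum

class Level(Enum):
    # the base level plus the five levels of meaning discussed above
    BASE = 0          # formal properties: lines, shapes, colors
    FACTUAL = 1       # events, actors, spatial/temporal/causal relations
    EXPRESSIONAL = 2  # intentions, goals, psychological states
    SOCIAL = 3        # social relations between actors
    CONVENTIONAL = 4  # what is depicted, cultural/historical context
    INTRINSIC = 5     # ultimate motive of the experience

@dataclass
class Categorisation:
    entity: str    # e.g. "woman-1"
    category: str  # e.g. "waitress"
    level: Level

@dataclass
class Narrative:
    perspective: str  # categories are observer-dependent, so this is mandatory
    categorisations: list = field(default_factory=list)
    links: list = field(default_factory=list)  # cross-level links between entities

    def at_level(self, level):
        return [c for c in self.categorisations if c.level == level]

# A fragment of a narrative for the restaurant scene, from the viewer's perspective.
n = Narrative(perspective="viewer")
n.categorisations += [
    Categorisation("woman-1", "waitress", Level.FACTUAL),
    Categorisation("woman-1", "attentive", Level.EXPRESSIONAL),
    Categorisation("man-1", "father-of-child", Level.SOCIAL),
]
print([c.category for c in n.at_level(Level.FACTUAL)])  # -> ['waitress']
```

Transforming a narrative to another perspective would then amount to rewriting observer-dependent categorisations (such as left/right) while keeping the entities and links intact.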
We instantly recognize that this is a scene from a restaurant, using cues like the dress and activities of the waiter and waitress or the fact that people are sitting at different tables in the room. Current image recognition algorithms would be able to segment and identify some of the people and objects in the scene and in some cases label them with a fair degree of accuracy, see Fig. 1 (right).

However a normal observer would see a lot more than that. For example, when asked whether a person is missing at the table on the right, the answer would be straightforward: yes, because there is an empty chair, a plate and cutlery on the table section in front of the chair, and a napkin hanging over the chair. So there must have been a third person sitting there, probably the mother of the child. Moreover nobody has a lot of difficulty imagining where she went. There is a door marked 'lavabo' (meaning 'toilet' in French) and it is quite plausible that she went to the toilet while waiting for the meal to arrive. Any human viewer would furthermore guess without hesitation why the child is showing his plate to the waitress arriving with the food and why the person to the left of the child (from our perspective) is probably the father looking contently at the child. We could go on further completing the narrative, for example, ask why the cat at the feet of the waitress looks eagerly at the food, observe that the food contains chicken with potatoes, notice that it looks quite windy outside, that the vegetation suggests some place in the south of France, and so on.

Fig. 1. Left. Didactic image of a scene in a restaurant. Right. Image segmentation identifying regions that contain people (based on Google's Cloud Vision API).

Fig. 2. Components for tackling understanding in AI systems: experiences and data are analysed into syntactic and semantic structures, interpreted into narratives, and integrated into a personal dynamic memory. Besides the introduction of narratives, the main critical component is a Personal Dynamic Memory which helps to build narratives to interpret a new experience. The green arrows indicate that there is strong downward information flow from the Personal Dynamic Memory to the interpretation process and from narratives to the analysis process.
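As a toy illustration of the restaurant example, the schema-like default inference described here (an empty chair plus a set place suggests a missing person, a 'lavabo' door suggests where she went) can be sketched as a fixed-point computation over default rules. The schema contents, fact names and the function `apply_schema` are all invented for illustration; they are not a system from the paper.

```python
# A toy restaurant schema with defaults: facts expected to hold when the
# schema matches an experience, even if not directly observable.
restaurant_schema = {
    "roles": {"waitress", "customer", "table", "chair"},
    "defaults": [
        # (condition over the current fact set, fact to infer)
        (lambda facts: {"empty-chair", "set-place"} <= facts,
         "missing-person-at-table"),
        (lambda facts: "missing-person-at-table" in facts and "door-lavabo" in facts,
         "person-went-to-toilet"),
    ],
}

def apply_schema(schema, observed):
    """Repeatedly add default inferences until a fixed point is reached."""
    facts = set(observed)
    changed = True
    while changed:
        changed = False
        for condition, inferred in schema["defaults"]:
            if inferred not in facts and condition(facts):
                facts.add(inferred)
                changed = True
    return facts

observed = {"empty-chair", "set-place", "door-lavabo"}
print(sorted(apply_schema(restaurant_schema, observed) - observed))
# -> ['missing-person-at-table', 'person-went-to-toilet']
```

Note how the second default fires only after the first has added its conclusion: the schema "casts a web" over the observed facts, filling in what is not explicitly visible.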
Clearly these interpretations rely heavily on inferences reflecting knowledge about restaurants, families, needs and desires, and the roles played by people in restaurants (waiter, waitress, bar tender, cashier, customer). These inferences are not only necessary to properly interpret the visual image in Fig. 1 but also to answer questions such as 'Who is the waitress?', 'Why is she approaching the table?', 'Where is the missing person at the table?', 'Who will get food first?', etc. We can also make predictions and reconstructions, for example, that the waitress will reach the table, put the food on the table, cut the chicken into pieces, and put them on the different plates, or that the mother of the child will come back from the toilet, sit down again at the table, and start eating herself.

Each of us has a vast Personal Dynamic Memory that stores narratives based on prior experiences: from visiting restaurants, seeing images in pictures or movies, reading about them, etc. Our daily life is filled from morning to evening with activities to feed and reorganise our Personal Dynamic Memories, and the richer they become the more we are able to make sense of new experiences. What is truly amazing is that by the time we reach the adult stage these memories must already contain a massive number of facts, which are nevertheless searched at an incredibly fast rate, with relevant parts of memory becoming primed and ready for use for handling novel experiences.

Understanding uses information both from syntactic and semantic parsing of the experience and from inferences based on a Personal Dynamic Memory, in order to fill in unexpressed or unobservable information, e.g. via logical reasoning and mental simulation. Moreover the understanding process changes the contents of the Personal Dynamic Memory, not only because the new experience, its interpretation, and links to other experiences are stored, but also because earlier experiences are revisited and their storage may be affected by newer experiences. Memory needed for understanding is therefore highly dynamic, unlike computer memory that remains unchanged once something has been stored.

This leads to the proposal for a general architecture for AI systems that handle understanding, depicted in Fig. 2. It shows the flow from experience to syntactic and semantic structures, and from there towards the construction of narratives, integrated into a Personal Dynamic Memory. The flow of information is not only bottom-up but also top-down, shown with the green arrows. The narrative under construction partially guides semantic analysis and cuts down combinatorial search in syntactic analysis, whereas the narratives already contained in the Personal Dynamic Memory guide the construction of narratives of new experiences.

3 Current AI does not handle meaning properly

Before putting some more technical flesh on this architectural skeleton, I want to emphasize that current techniques and AI design methodologies are not handling meaning and understanding. Current techniques fall into two classes: numerical (or subsymbolic) techniques and symbolic techniques, with shades in between.

Simplifying, numerical (or subsymbolic) AI techniques translate problems into a numerical form (real numbers and vectors) and perform numerical operations over them. The numerical representations are constructed using information-theoretic considerations, specifically, their ability to help predict or complete patterns. Most neural networks fall into this class, but also other techniques like Latent Distributional Semantics, which associates a vector representation known as an embedding with words, images, or actions. The embeddings capture the syntactic and semantic contexts in which an element appears and can be used to compute similarities, predict the next word or image, relate an image to a label, answer textual queries, or perform many other useful subfunctions for building intelligent applications. Embeddings are computed either by statistical methods or by using deep learning algorithms.

Importantly, and as pointed out clearly by Claude Shannon [24], who can be considered the father of numerical AI, information-theoretic representations do not try to capture meaning. For example, a word embedding captures the kinds of contexts in which a word may occur but this is only an indirect substitute for the real meaning of the word. Ignoring meaning makes it feasible to use these numerical techniques in circumstances where there is no representation of meaning available for learning or training - which is in fact almost always the case. But it leaves out a crucial aspect of (human) intelligence.

Thus, 'neural image labeling' associates labels rather directly with images (sometimes even using only pixel-based image representations), without attempting to discern individual objects, actors, or events, and without trying to figure out the situation underlying the image, the nature of the action, the motivations of the actors depicted in an action, the historical setting, the reason why the image is made, and many other aspects which human viewers spontaneously come up with. 'Neural translation' does not try to perform a syntactic analysis using grammars and parsers nor a semantic analysis using interpreters building conceptual representations of what is being said. Rather, it associates n-grams in the source language with n-grams in the target language based on word vectors that capture statistical co-occurrences in dual (source/target) corpora.
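A minimal sketch can make the point about embeddings concrete: two words that occur in the same contexts get nearly identical vectors, yet nothing in the vectors says what the words denote. The toy co-occurrence counts and context words below are invented for illustration; real embeddings are learned from large corpora.

```python
import math

# Toy count-based "embeddings": each word is represented by how often it
# co-occurs with a handful of context words (counts invented for illustration).
contexts = ["menu", "order", "bank", "river"]
vectors = {
    "waiter":    [4, 5, 0, 0],
    "waitress":  [5, 4, 0, 0],
    "riverbank": [0, 0, 3, 5],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Words used in similar contexts get similar vectors...
print(round(cosine(vectors["waiter"], vectors["waitress"]), 2))   # -> 0.98
print(round(cosine(vectors["waiter"], vectors["riverbank"]), 2))  # -> 0.0
# ...but the vectors say nothing about what a waiter *is*: context is only
# an indirect substitute for meaning.
```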
Circumventing meaning has made the current wave of deep-learning-based AI applications possible, but it is also responsible for the brittleness of image labeling, the nonsensical nature of translations, failures in answering questions that fall slightly outside of the statistical patterns in the corpora used to train them, the success of adversarial attacks with images or texts that do not confuse humans but throw off AI systems, the non-transparency of decision making, and many other features that human-centric AI considers undesirable.

Intuitively a kind of hybrid or integrated AI that combines the virtues of numerical AI with those of symbolic AI is a possible way out and has indeed been proposed by several researchers. Symbolic AI maps problems into symbols and symbolic structures and performs transformations over these symbolic structures, for example guided by rules of sound logical inference. This approach flourished in the 1970s and 1980s, leading to expert systems built for interactively supporting experts, large-scale ontologies and domain models as now used in the semantic web or in encyclopedic knowledge graphs, computer-assisted theorem proving, constraint solvers for scheduling or design, precision language processing, and much more.

The symbolic approach has tried, at least in principle, to get closer to handling meaning. It has used terms like semantic information processing [14] or story understanding [22], talked about AI able to take advice rather than be programmed explicitly or trained with large data sets [13], and built sophisticated explanation facilities for expert systems using deep human-comprehensible models of the domain and an explicit representation of the problem solving methods being used [16].

Nevertheless, the symbolic approach has its own limitations with respect to handling meaning and understanding. A key criticism, reflected in Searle's Chinese Room argument and known as the symbol grounding problem, is that symbolic AI operates in a world of symbols with no systematic connection to the real world. To solve this problem requires an integration of a symbolic and a numerical approach, because the latter starts from the (real) numbers delivered by sensors and actuators that are directly connected to the world, so that the categories that constitute the meaning of symbols indeed become properly grounded. However, it is important that the grounding of symbols is based on what is meaningful, i.e. relevant, to the agent, which is different from grounding based on success in prediction tasks. When agents cooperate on tasks in a shared environment, particularly if they have to communicate about tasks, they implicitly have to coordinate the way they categorize reality and how these categorizations are expressed [28].

In addition, the transformations of symbolic structures are formal operations, similar to a set of axioms and rules of logical inference as in mathematics. But the problem is that it is very hard, if not impossible, to define axioms exhaustively for real-world open-ended domains due to the unavoidable exceptions, lack of knowledge, and the problem of making clear-cut definitions. These problems have been discussed widely under the title of the frame problem. Also here an integration of numerical and symbolic techniques is a way to go forward, so that the flexibility of pattern recognition and action selection based on neurally inspired models, which gives only approximate answers, can be married to the precision and compositionality of symbolic reasoning.

4 Relevant work

The ideas proposed here are certainly not new. For a long time it has been commonly accepted in cognitive science that the construction of narratives is an essential ingredient of cognitive intelligence because it allows us to make sense of reality [6], [35]. Also in AI there has been significant prior work, although mostly in the context of story generation and story understanding, which are the textual manifestations of internally constructed narratives [12]. We find symbolic approaches from the late nineteen-seventies onwards, such as in the work of Schank and colleagues [23], Winston's proposals for Computational Narrative Intelligence [37], or more recently the work of Gervas and his group on narrative generation [9]. There is also increasing work at the moment using numerical approaches towards narrative intelligence [20], particularly within the context of building question-answering and dialog systems.

In the psychological literature there has also been extensive work on personal memory, often taking Tulving as a starting point [33]. He introduced the distinction between procedural memory (knowledge of skills) and declarative memory, usually divided into semantic memory, which contains general factual knowledge, and episodic memory, which refers to specific autobiographical experiences stored in the form of contextualized past perceptions, actions and temporal and causal structures. Schank and colleagues made proposals in the late 1980s on how such dynamic memories could be built [22]. This led in the nineteen-nineties to significant work on case-based reasoning [2] and memory-based reasoning [26]. Much of this has been overshadowed by the current peak of interest in deep learning, but it remains highly valuable for the aims discussed in this paper.

Meanwhile various important technological advances have been made in other areas that make a renewed effort towards experimentation with Personal Dynamic Memories and narratives a realistic prospect. Among these advances I just want to highlight the following:

• Very large knowledge bases. One of the critical bottlenecks for effective Personal Dynamic Memories is the sheer size of the knowledge that has to be represented and processed. If we express this in terms of facts, then we must expect to handle at least tens of millions, if not billions. This was totally impossible two decades ago but very significant progress, pushed by the development of the semantic web, has changed the situation. It is now possible to represent fact-bases of up to 100 billion triples using standard knowledge representations (RDF statements and OWL) and perform inferences over them fast enough to be used in interactive applications [34]. So the issue of computational complexity for Personal Dynamic Memories can be considered to be solved.

• Robotic embodiment. Another critical bottleneck is that Personal Dynamic Memories have to be grounded in sensori-motor experiences. A few decades ago the state of the art in computer vision and robotics was simply not advanced enough to tackle this issue in any realistic way. But also here there have been tremendous advances, both in the availability of lower-cost robotic hardware, including cameras and signal processing chips, and in software for perception and action control, primarily using techniques from deep learning. These developments in themselves do not solve the issue of symbol grounding but they have made it possible to start addressing it seriously. One example of recent work uses language games between embodied autonomous robots that generate not only their own communication system but also an ontology containing the relevant distinctions in a specific domain [31]. These experiments have shown how perceptually grounded categories (for example for color or size) or spatial and temporal relations grounded in event recognition can emerge in populations of agents pushed by the task of communication. Another example is the Open-Ease framework (http://www.open-ease.org/) that supports the recording and storage of inhomogeneous interpretation data from robots and human manipulation episodes so that they can be used to build semantically oriented tools for interpreting, analyzing, visualizing, and learning from these experience data [4].

• Mental simulation. Another bottleneck for building realistic Personal Dynamic Memories has been the role of mental simulations of actions and situations. This is considered an essential function of memory by many psychologists, particularly for predicting how a perceived situation will continue to evolve in the future [3]. This hypothesis has also inspired AI researchers [5] but implementations could only explore simple isolated examples until very recently. However, significant advances in virtual reality technology have now pushed the state of the art in computer graphics to allow a very high degree of realistic simulation even for complex world situations, thanks also to dedicated hardware (game engines have now reached performances of 12 teraflops) and highly optimized software. This technology is already being used for cognitive robotics experiments in order to plan future behavior through mental simulation, complementary to classical planning based on symbol manipulation, and to understand human language instructions or descriptions [19]. So also for this aspect, there are promising developments that make Personal Dynamic Memories much more feasible.

• Computational Construction Grammar. Finally, there have been significant advances recently in Computational Construction Grammar. Most linguistic formalisms, such as Chomskyan generative grammar, remain close to the morpho-syntactic structure of a language. Construction Grammar in contrast focuses on capturing the systematic ways in which grammar expresses meaning [11]. It is therefore a more appropriate basis for natural language processing in an AI approach that seeks to handle meaning and understanding, particularly because Construction Grammarians have worked closely with cognitive semantics [32], an approach to semantics that seeks to understand the conceptual patterns with which humans organise their experiences in order to make them expressible in their language. A decade ago usable implementations of construction grammar and cognitive semantics were in their infancy but this has changed completely. A first big effort, spearheaded by ICSI in Berkeley, developed an Embodied Construction Grammar [5], which not only formalized and operationalized construction grammars but also subscribed to the 'mental simulation' approach to meaning mentioned in the previous paragraphs. Another big effort, at the University of Brussels (VUB AI Lab) and the Sony Computer Science Laboratory in Paris, developed Fluid Construction Grammar [29], which now has a very solid implementation and a growing user community (see www.fcg.org). Given that language communication plays a major role in the way that human Personal Dynamic Memories get formed, this line of research provides another hopeful contribution towards achieving meaningful AI.

5 The organisation of memory

In my opinion, the most critical bottleneck at the moment is: how should a Personal Dynamic Memory be organised at the micro-level and what kind of basic computations (including inferencing and learning) should be supported? Obviously a linear list of facts, possibly represented in RDF, will not do; we need higher-level structuring devices, partly for managing inferential and combinatorial complexity, partly for dealing with the frame problem, and partly for achieving fast access to the most relevant prior experiences that will help to make sense of a new experience. What will also not work is to blindly store the vast amount of information generated by an experience: the complete sensori-motor data streams, the data from the mental simulations that are triggered, the language descriptions and their semantic interpretations, or all the facts relevant for an experience. If everything is stored, this is not only costly from an energetic point of view but will certainly get in the way of fast retrieval and inference.

The cognitive science and AI literature already contains various proposals for the organisation of memory. Many of them start from Bartlett's original idea of a schema, also called a frame. It was formulated in the 1930s and revived again in the 1970s by psychologists such as David Rumelhart [21], linguists such as Charles Fillmore [8], and sociologists such as Erving Goffman [10].

A schema is a way of framing a particular situation in terms of a set of entities, roles for these entities, constraints on the kind of entities that can fill these roles, and relations between the entities based on their roles. Each schema has various associated cues to recognize quickly whether it applies to the current situation. Once it is triggered, a schema casts a web over the sensori-motor inputs and facts associated with the situation and it makes us see or infer certain aspects of the experience more clearly at the expense of others. Schemas impose a bias and perspective on a situation and often also an emotional reaction. They come with a lot of defaults. These are facts which can be expected to be the case if a particular schema matches well with an experience, but are not explicitly mentioned or observable. Sometimes these defaults even override perception or fly in the face of obvious facts.

The notion of a schema was introduced into AI by Minsky [15], who used the term frame. It led to a variety of frame-based knowledge representation systems in the 1970s, which were used extensively to model the perception of complex scenes, story telling and story understanding, and expert reasoning. Frame-based representation systems feature datastructures for representing frames, basic inference operations over frames, and languages and interfaces to define frames and maintain large collections of frames. Frame-based knowledge representation systems also support various kinds of relations between frames, in particular subtype relations, so that there can be inheritance of information from one frame handling a broad set of experiences to another frame concerned with a more specific situation. Another example are priming relations, so that if one frame fits well with a situation, another frame covering a subsequent event is already made ready for activation. Besides mechanisms for handling defaults, the earlier frame-based representation systems also supported procedural attachment, so that procedures like image or sound processing or robotic action in the real world could be seamlessly integrated.

6 Conclusions

The paper argued that human-centric AI, with its implications of explainability, transparency, robustness, etc., is only going to be possible when AI comes to grips with meaning and understanding. This requires that we go beyond the numerical AI paradigm that is currently dominating AI, where meaning is captured only very indirectly in embeddings and operations over embeddings, but also beyond the symbolic paradigm, which focuses on formal operations over non-grounded symbols.

First of all we need at the very least a form of integrated or hybrid AI that combines numerical and symbolic AI. But we need to go beyond both. The paper argued that a central characteristic of understanding is the ability to build a coherent narrative of an experience based on narratives of past experiences stored in a Personal Dynamic Memory, and to integrate this narrative in memory. The big challenge for AI is partly technical: to solve problems of computational complexity to handle the very large knowledge bases and huge inferences that are required. But it is also conceptual. We need to understand much better how new experiences and the narratives built for them get integrated into a Personal Dynamic Memory in such a way that they get triggered again on the most relevant new experiences, and how facts or narratives that are deemed no longer relevant can be forgotten or simply not stored in the first place.

Acknowledgement The author is funded by the Catalan Institute

[16] J. Moore and W. Swartout, Explanation in Expert Systems: A Survey, ISI Research Reports, ISI, Marina del Rey, CA, 1988.
[17] A. Nowak, P. Lukowicz, and P. Horodecki, 'Assessing artificial intelligence for humanity: Will AI be our biggest ever advance? Or the biggest threat?', IEEE Technology and Society Magazine, 37(4), 26–34, (2018).
[18] E. Panofsky, Studies in Iconology. Humanistic Themes in the Art of the Renaissance, Oxford University Press, Oxford, 1939/1972.
[19] J. Pfau, R. Porzel, M. Pomarlan, V. Cangalovic, S. Grundpan, S. Hoefner, J. Bateman, and R. Malaka, 'Give MEANinGS to Robots with Kitchen Clash: A VR Human Computation Serious Game for World Knowledge Accumulation', 85–96, Entertainment Computing and Serious Games, First IFIP TC 14 Joint International Conference, ICEC-JCSG, IFIP, New York, 2019.
[20] Mark O. Riedl, 'Computational narrative intelligence: A human-centered goal for artificial intelligence', CoRR, abs/1602.06484, (2016).
[21] D. Rumelhart, 'Schemata: the building blocks of cognition', Theoretical Issues in Reading Comprehension, Lawrence Erlbaum, Hillsdale, New Jersey, 1980.
[22] R. Schank, Dynamic Memory: A Theory of Reminding and Learning in Computers and People, Cambridge University Press, Cambridge, England, 1990.
[23] R. Schank and R. Abelson, Scripts, Plans, Goals, and Understanding: An Inquiry into Human Knowledge Structures, L. Erlbaum, Hillsdale, NJ, 1977.
[24] C. Shannon, 'A mathematical theory of communication', Bell System Technical Journal, 27, 379–423, 623–656, (1948).
[25] D. Sperber and D. Wilson, Relevance: Communication and Cognition, Harvard University Press, Cambridge, MA, 1986.
[26] C.
Stanfill and D. Waltz, ‘Toward memory-based reasoning’, Commu- nications of the ACM, 29(12), (1986). for Advanced Studies (ICREA) embedded in the Institute for Evo- [27] L. Steels, Perceiving the Focal Point of a Painting with AI: lutionary Biology (UPF/CSIC) in Barcelona. This work was made Case Studies on works of Luc Tuymans., 12th ICAART, Springer Ver- possible by H2020 grants within the frame of the Humane AI Flag- lag, Berlin, 2020. ship preparation project and the AI4EU project. [28] L. Steels and T. Belpaeme, ‘Coordinating perceptually grounded cate- gories through language. a case study for colour.’, Behavioral and Brain Sciences, 28(4), 469–490, (2005). REFERENCES [29] L. Steels and K. Beuls(eds.), Case studies in Fluid Construction Gram- mar, John Benjamins Pub., Amsterdam, 2019. [1] L. Steels, ed., ‘Computational issues in Fluid Construction Grammar.’, [30] L. Steels and R. Lopez de Mantaras., ‘The Barcelona declaration for 7240, (2012). the proper development and usage of Artificial Intelligence in europe.’, [2] A. Aamodt and E. Plaza, ‘Case-based reasoning: Foundational issues, AI Communications, 31(6), 485–494, (2018). methodological variations, and system approaches.’, AI Communica- [31] L. Steels and M. Hild, Language grounding in robots, Springer Verlag, tions, 7(1), 39–52, (1994). New York, 2012. [3] L. Barsalou, ‘Grounded cognition.’, Annual Review of Psychology, 59, [32] L. Talmy, Toward a cognitive semantics., MIT Press, Cambridge Ma, 617–645, (2008). 2000. [4] M. Beetz, M. Tenorth, and J. Winkler, Open-EASE, 1983–1990, 2015 [33] E. Tulving, ‘Precis of elements of episodic memory.’, Behavioral and IEEE International Conference on Robotics and Automation (ICRA), Brain Sciences, 7, 223–268, (1984). IEEE., 2015. [34] J. Urbani, S. Kotoulas, J.Massen, F.van Harmelen, and H. Bal, ‘Webpie: [5] B. Bergen, N. Chang, and S. Narayan, Simulated action in an embod- A web-scale parallel inference engine using mapreduce.’, First Look. 
ied construction grammar, 108–113, Proceedings of the 26th Annual Journal of Web Semantics, 10, 59–75, (2012). Meeting of the Cognitive Science Society, L. Erlbaum, Mahwah, NJ, [35] O. Vilarroya, Somos lo que nos contamos. Cómo los relatos construyen 2004. el mundo en que vivimos., Editorial Ariel, Barcelona, 2019. [6] J. Bruner, ‘The narrative construction of reality.’, Critical Inquiry, [36] U. Von der Leyen and et al., ‘White paper on artificial intelligence’, EU 18(1), 1–21, (1991). Commission reports, (2020). [7] V. Dignum, Responsible Artificial Intelligence. How to develop and use [37] P. Winston, The Strong Story Hypothesis and the DirectedPerception AI in a responsible way., Springer International Publishing, 2019. Hypothesis, 2011 AAAI Fall symposium, AAAI Press., Menlo Park [8] C. Fillmore, Frame Semantics., volume 10, 111–137, 1982. Ca, 2011. [9] Pablo Gervás, Eugenio Concepción, Carlos León, Gonzalo Méndez, [38] W. Xu, ‘Toward human-centered AI: A perspective from human- and Pablo Delatorre, ‘The long path to narrative generation’, IBM Jour- computer interaction..’, IX Interactions, 26(4), 42–46, (2019). nal of Research & Development, 63(1), 1–8, (01/2019 2019). [10] E. Goffman, Frame Analysis. An Essay on the Organization of Experi- ence., Penguin Books., Harmondsworth, 1974. [11] A. Goldberg, Constructions at Work: The Nature of Generalization in Language., Oxford University Press., Oxford, 2006. [12] M. Mateas and P. Sengers (eds.), Narrative Intelligence, John Ben- jamins Pub., Amsterdam, 2003. [13] J. McCarthy, Programs with common sense, Symposium on Mecha- nization of Thought Processes, National Physical Laboratory, Tedding- ton, Eng, 1958. [14] M. Minsky, Semantic information processing, The MIT Press, Cam- bridge MA, 1969. [15] M. Minsky, A framework of representing Knowledge., The Psychology of Computer Vision, McGraw-Hill, New York, 1975.