Embodiment of an Agent by a Pepper Robot for Explaining Retrieval Results

Simon Schiff¹,*, Magnus Bender¹ and Ralf Möller¹

¹ University of Lübeck, Institute of Information Systems, Ratzeburger Allee 160, 23562 Lübeck, Germany

2nd Workshop on Humanities-Centred Artificial Intelligence (CHAI)
* Corresponding author.
Email: schiff@ifis.uni-luebeck.de (S. Schiff); bender@ifis.uni-luebeck.de (M. Bender); moeller@ifis.uni-luebeck.de (R. Möller)
URL: https://www.ifis.uni-luebeck.de/index.php (S. Schiff); https://www.ifis.uni-luebeck.de/index.php (M. Bender); https://www.ifis.uni-luebeck.de/index.php (R. Möller)
ORCID: 0000-0002-1986-3119 (S. Schiff); 0000-0002-1854-225X (M. Bender); 0000-0002-1174-3323 (R. Möller)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

Abstract

Conceptually, an agent perceives its environment through sensors, builds a set of models, and then uses these models to select an appropriate action to fulfill its goals. When an agent is embodied by a robot, even humans who are not familiar with the concept of an agent are more likely to be aware of the presence of an individual, independent of how the agent maps state sequences to actions, than when the agent is part of a web application. In the latter case, agents are sometimes visualized as an animation, such as Clippy by Microsoft. Thus, depending on the context, it is often explicitly desired that humans are aware of an individual while they interact with a system. Our aim is to demonstrate the prototype of our information retrieval (IR) agent, which runs in the background of our information system (IS) implemented for humanities scholars. Instead of animating our IR agent, we embodied it with a Pepper robot, for demonstration purposes only. Pepper is a humanoid robot designed especially for interaction with humans, as he has, among other things, a speech-to-text and a text-to-speech module that allow for a verbal conversation between a human and the robot. We tested our approach with humans, not all of whom were familiar with the concept of an IR agent. During the interaction with our IS, Pepper, as the IR agent, explains his behavior. Embodying our IR agent with Pepper helps humans understand the concept of an IR agent, and that it is running in the background of our IS, without this being explained explicitly.

Keywords

Agent, Robot, Information Retrieval, Demonstration, Curated Datasets, Information System

1. Introduction

An agent in pursuit of a task perceives its environment through sensors, builds a set of models, and then uses these models to select an appropriate action to fulfill its goals [1]. Whether it is perceived as intelligent depends on which actions it selects, given the current state of its environment and its goals, regardless of which artificial intelligence (AI) methods are used to map state sequences to actions. One of these goals could be, for instance, to satisfy the information need of a human. In this case, an IR agent that has access to a large corpus of documents receives a query, and its goal is to assign a score to each document in its corpus, given the query. The top $n$ highest scored documents are returned to the human in descending order. Assuming that the query and the documents are sequences of words, functions such as TF.IDF, which assign a score to each pair of query and document, have been shown to be effective in practice [2, 3]. However, such an IR agent is not necessarily perceived as being intelligent.
For instance, given a corpus of fantasy novels and the query "bike repair shop", an IR agent would return the documents that are most relevant with respect to the query, but not with respect to the information need of a human who has a bike with a flat tire. The IR agent should approximate the true information need of the human from the query and from the expectations the human has about the IR agent itself. The human expects to retrieve at least one document containing a list of bike repair shops, which the agent is unable to return. If the IR agent is able to identify the gap between the expectations of the human and its own ability to satisfy the information need, then it can select an appropriate action that is not determined solely by its goal of choosing the most relevant documents for the query. Additionally, it can act legibly by explaining its behavior if the gap is too large. That is a step towards gaining the trust of the human and thus towards being perceived as intelligent.

In this work, we implemented and evaluated our IR agent as an extension to our IS, which is implemented as a web application [4]. The web application enables humanities scholars to upload Word documents for the creation of curated datasets. Uploaded Word documents are parsed, preprocessed, and, depending on the context, split into several documents. For instance, a Word document containing hundreds of poems is split into a corpus of documents in which each document contains one poem. For various types of documents, such as poems, we have created viewers that display the contents of the documents in the web application. Additionally, links are created automatically that help to jump between, for instance, words in poems and their corresponding entries in a dictionary. Our IR agent, which is part of the web application, not only ranks uploaded documents by relevance in descending order, given a query, but additionally returns an explanation for each document and score in order to act legibly.
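The ranking step described above (score every pair of query and document, then return the top $n$ in descending order) can be sketched as follows. This is a minimal illustration and not the scoring used in our IS: it assumes one common TF.IDF variant (relative term frequency times $\log(N/\mathit{df})$) and whitespace tokenization, and the function names `tokenize`, `tf_idf_scores`, and `top_n` as well as the three toy sentences are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lower-case and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tf_idf_scores(query, corpus):
    """Return one TF.IDF score per document for the given query."""
    docs = [Counter(tokenize(d)) for d in corpus]
    n_docs = len(docs)
    scores = []
    for doc in docs:
        length = sum(doc.values()) or 1
        score = 0.0
        for term in set(tokenize(query)):
            df = sum(1 for d in docs if term in d)  # document frequency
            if df == 0:
                continue  # the term occurs nowhere in the corpus
            tf = doc[term] / length
            idf = math.log(n_docs / df)
            score += tf * idf
        scores.append(score)
    return scores


def top_n(query, corpus, n):
    """Return the n highest-scored (score, document) pairs, best first."""
    pairs = zip(tf_idf_scores(query, corpus), corpus)
    ranked = sorted(pairs, key=lambda pair: pair[0], reverse=True)
    return ranked[:n]


# Toy stand-in for a corpus of fantasy novels (assumed data).
corpus = [
    "the knight rode his horse to the old shop",
    "the dragon guarded the mountain pass",
    "the wizard came to repair the broken staff",
]
for score, doc in top_n("bike repair shop", corpus, n=2):
    print(f"{score:.3f}  {doc}")
```

No sentence in the toy corpus mentions a bike, yet two documents are still returned because they share the words "repair" or "shop" with the query. This is the situation in which the ranking is relevant with respect to the query but not with respect to the information need.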
A human who accesses a web application that provides a search interface, using a tablet, smartphone, laptop, etc., usually does not expect to interact with an IR agent. Our aim is to evaluate our IR agent with real humans who know that there is an IR agent acting in the background. This can be achieved either by explaining the concept of an IR agent, as we do in this paper, or by embodying our IR agent with a robot that has a text-to-speech module. The robot verbally explains its behavior on demand, and humans become aware that there is an IR agent running in the background, which changes their expectations and perceptions while using our IS. We evaluate our approach using a Pepper robot [5]. As our demonstration in Section 4 indicates, the Pepper robot is an effective tool for showing others what happens in the background of our IS without explicitly explaining the concept of an IR agent.

In Section 2, we introduce our IS, which we extend with the IR agent presented in Section 3. In Section 4, we show how we use a Pepper robot to present our work to an audience in which some had never heard of an IR agent before. In Section 5, we discuss how the IR agent can become human-aware. Finally, we present related work in Section 6, and conclude with our results and an outlook on future research directions in Section 7.

2. Web Application

Humanities scholars work with specific tools and document formats across chronological and geographical borders to reach their goals. For instance, the goal may be to produce a critical edition from a large collection of palm-leaf manuscripts and editions, such as [6], created by Eva Wilden. A critical edition traces the trajectory a text has taken through various manuscript and print versions up to the present day. Producing a critical edition can take several years, and often many humanities scholars are involved. Regardless of the preferred document formats and the tools in use, a finished critical edition is mostly published as a printed book or online as a PDF.
We argue that this violates the FAIR (Findable, Accessible, Interoperable, Reusable) principles. Findability is often not a problem, since published books mostly have associated metadata that make them findable by humans and machines. However, the contents of a critical edition are possibly not searchable and require a faceted IR system. Accessibility concerns not only how the data can be accessed; it is also important to make clear who is allowed to access what. For instance, some readers may be allowed to access everything in a printed book except certain pictures of manuscripts. Because the images are inseparable from the rest of the book, only those who are allowed to see the images can be given access to the printed critical edition. A printed book or a PDF is made to be read by humans and is not interoperable with other programs, except those that visualize or print the contents of a PDF. Finally, metadata should be well described, such that other programs can reuse the associated data.

A critical edition that does not violate the FAIR principles allows for faceted searches, automatic linking, access control, and the transformation of contents into various formats. However, humanities scholars prefer to use what you see is what you get (WYSIWYG) tools such as Microsoft Word, as they always see the current state of the book. Our web application allows humanities scholars to keep working with their preferred tools, such as Microsoft Word, and document formats across chronological and geographical borders, and yet to produce data that does not violate the FAIR principles. Word documents can be uploaded to our web application; specific parts that are written in a specific controlled natural language are parsed, split, and loaded into a database. The parser is automatically generated from an ANTLR4 [7] grammar, which allows it to be adapted easily to other types of documents.
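The parse, split, and load steps just described can be illustrated with a small sketch. It is not our implementation: a regular expression stands in for the ANTLR4-generated parser, plain text stands in for a Word document, and the marker line `Poem <number>`, the table layout, and the function names `split_into_poems` and `load_poems` are hypothetical choices made only for this example.

```python
import re
import sqlite3

# Hypothetical marker: each poem is assumed to start with a line such as "Poem 12".
POEM_HEADER = re.compile(r"^Poem[ \t]+(\d+)[ \t]*$", re.MULTILINE)


def split_into_poems(text):
    """Split one uploaded text into (number, body) pairs, one per poem."""
    matches = list(POEM_HEADER.finditer(text))
    poems = []
    for current, following in zip(matches, matches[1:] + [None]):
        # A poem ends where the next header starts, or at the end of the text.
        end = following.start() if following else len(text)
        body = text[current.end():end].strip()
        poems.append((int(current.group(1)), body))
    return poems


def load_poems(connection, poems):
    """Store each poem as its own document row."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS documents (number INTEGER PRIMARY KEY, body TEXT)"
    )
    connection.executemany(
        "INSERT INTO documents (number, body) VALUES (?, ?)", poems
    )
    connection.commit()


uploaded = "Poem 1\nfirst line\nsecond line\n\nPoem 2\nanother poem\n"
connection = sqlite3.connect(":memory:")
load_poems(connection, split_into_poems(uploaded))
rows = connection.execute(
    "SELECT number, body FROM documents ORDER BY number"
).fetchall()
print(rows)
```

Once every poem is a separate row, each one can be treated as a document of its own by the viewers and by the IR agent.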
Viewers, which are part of the web application, are used in lectures to visualize the contents of the database. Additionally, one can merge specific parts of documents automatically on demand, a task that would take a humanities scholar weeks of work [4]. Among these features and those we would like to add in the future, we implemented an IR agent, which we present in Section 3 and evaluate in Section 4 by embodying it with a Pepper robot.

3. Information Retrieval Agent

Word documents containing hundreds of texts, such as poems, are each treated as a corpus of texts, where each text is a document. Given a word and its context, both part of a document, one could be interested in other documents containing text snippets within the same context. We assume that the words surrounding a word within a text make up its context, and we refer to the context as subjective content descriptions (SCDs) [8]. Our IR agent assigns a score to all text snippets within all documents in the corpora, given a word and its context as a query. Additionally, our IR agent adds an explanation for each score it assigns to the text snippets. Finally, all text snippets are returned to the human, along with the associated document and an explanation, sorted by score in descending order.

More formally, our IR agent has access to a set of documents $D$ that are part of a corpus $C$. Each document $D$ is a sequence of words $\langle w_1^D, \ldots, w_n^D \rangle$ of length $n$. We assume that the surrounding words $\{ w_j^D \mid i - r \le j \le i + r \}$ of a word $w_i^D$ within a given radius $r$ make up the context of the word $w_i^D$. As depicted in Figure 1, document $D$ is a sequence of words $\langle w_1^D, \ldots, w_{13}^D \rangle$. The context is highlighted with red cross lines for each of the words $w_5^D$, $w_6^D$, $w_7^D$, and $w_8^D$, respectively. For instance, the words that make up the context of the word $w_7^D$ are $\{ w_3^D, \ldots, w_{11}^D \}$, as depicted in the third row in Figure 1.

Figure 1: Window Function over a Sequence of Words $\langle w_1^D, \ldots, w_{13}^D \rangle$ with $r = 4$

In Algorithm 1, we show how to initially compute, for each word in every document in the corpus, the words that make up its context. The result is a mapping $c$ that maps every word $w_j^{D_i}$ to the set of words that make up its context:

$$c : w_j^{D_i} \to \{ w_{j-r}^{D_i}, \ldots, w_{j+r}^{D_i} \},$$

with $D_i$ being the current document, $j$ being the position of word $w_j^{D_i}$ in document $D_i$, and $r$ the radius.

Algorithm 1 Compute Windows
    procedure contextWindows($C$, $r$)    ◁ Corpus $C$ and radius $r$
        $c : w_j^{D_i} \to \{ w_{j-r}^{D_i}, \ldots, w_{j+r}^{D_i} \}$
        for all $D_i \in C$ do
            removeStopWords($D_i$)    ◁ Remove stop words from document $D_i$
            for $j \leftarrow r$ to $|D_i| - r$ do    ◁ Length $|D_i|$ of document $D_i$
                $c(w_j^{D_i}) \leftarrow \{\}$
                for $k \leftarrow j - r$ to $j + r$ do
                    $c(w_j^{D_i}) \leftarrow c(w_j^{D_i}) \cup \{ w_k^{D_i} \}$
        return $c$    ◁ Return mapping $c$

The sets of words in one document possibly have similar sets in other documents. We measure the similarity of two sets by the size of their intersection (i.e., the number of words they have in common). If it is at least a given threshold $t$, then we assume that both contexts are similar to some extent. Algorithm 2 returns a mapping $r$ (distinct from the radius $r$ used above) that maps words in all documents to text snippets from a similar context in other documents, if the similarity is at least the given threshold $t$. A human who is interested in text snippets from a similar context sends a word as a query to our IR agent. Given a word as a query, our IR agent returns all documents of similar context, that is, those that contain text snippets returned by mapping $r$, ordered by the similarity of the text snippets in descending order.

Algorithm 2 Compute Results
    procedure computeResults($c$, $t$)    ◁ Contexts $c$ and threshold $t$
        $r : w_j^{D_i} \to \{ c(w_1^{D_1}), \ldots, c(w_l^{D_k}) \}$
        for all $w_j^{D_i} \in c$ do
            for all $w_l^{D_k} \in c$ do
                $i \leftarrow | c(w_j^{D_i}) \cap c(w_l^{D_k}) |$
                if $D_i \ne D_k$ and $i \ge t$ then
                    $r(w_j^{D_i}) \leftarrow r(w_j^{D_i}) \cup \{ c(w_l^{D_k}) \}$
        return $r$    ◁ Return mapping $r$

Given the $j$-th word $w_j^{D_i}$ in document $D_i$, the context of the word $c(w_j^{D_i})$, a text snippet $c(w_l^{D_k}) \in r(w_j^{D_i})$, with $c(w_l^{D_k}) = \{ w_{l-r}^{D_k}, \ldots, w_{l+r}^{D_k} \}$, from another document $D_k \ne D_i$, and radius $r$, our IR agent has to generate an explanation of why it has returned the document $D_k$, among others, as a result for the query $w_j^{D_i}$. It generates an explanation for each document $D_k$ in the result set by returning an excerpt from each document that contains the words $c(w_l^{D_k}) \in r(w_j^{D_i})$. Each excerpt is a sequence of words containing $c(w_l^{D_k})$, of which the words in $c(w_j^{D_i}) \cap c(w_l^{D_k})$ are emphasized. The human can decide to send another query to the IR agent and to change radius $r$ or threshold $t$. Even if the results do not satisfy the information need of the human, the IR agent acts legibly from the perspective of the human. Because the IR agent explains how it computes a result, the human is able to change $r$ and $t$ to values that possibly lead to more suitable results.

4. Evaluation

At the open day of the Centre for the Study of Manuscript Cultures (CSMC)¹, we presented our web application to an audience in which not everyone was familiar with the concept of an IR agent. Instead of explaining the concept of an IR agent, we embodied the IR agent with the Pepper robot [5]. The experimental setup is depicted in Figure 2. The presenter sits at a table with a laptop that hosts the web application and also runs a web browser for accessing it.
A 75-inch screen behind the presenter mirrors the screen of the laptop and is mounted at a height such that the whole screen is visible to all visitors in front of the table. Pepper stands on the left-hand side of the table, near enough for the visitors to see the contents of the tablet on his chest and to hear what he says. Our web application is controlled by the presenter.

Figure 2: Experimental Setup of our Evaluation (labels: 75 inch screen, Pepper robot, table with laptop, presenter, visitors)

¹ https://www.csmc.uni-hamburg.de/openday-en.html

In addition to various sensors and actuators, Pepper has a machine inside his head and an Android tablet attached to his chest. The machine in his head is equipped with a quad-core Intel Atom E3845 processor running at up to 1.91 GHz, 4 GB RAM, and 32 GB of flash memory. The Android tablet, which is connected via an internal network to the machine in Pepper's head, has a 10.1-inch display, a TCC8925 processor with a single ARMv7 A5 core running at up to 833 MHz, and 1 GB RAM [5]. It currently runs Android 6.0 "Marshmallow", which allows installing Android apps from the official Google Play store and deploying self-developed Android apps. Due to the hardware limitations of the tablet, even the Android interface itself is sometimes jerky; therefore, the graphical design of Android apps is limited to some extent.

Pepper is equipped with a text-to-speech module that can be accessed over an application programming interface (API) when one develops an Android app that runs on Pepper's tablet. The tablet then sends texts over the internal network to the machine inside Pepper's head, which is responsible, among other things, for translating text into speech that the human can then hear over Pepper's speakers. As depicted in Figure 3, the laptop running the web application is connected with Pepper over a network.
We developed an Android app that opens a WebSocket in the background for receiving texts from a JavaScript interface, which is accessible through our web application. Texts are then forwarded over the API to the text-to-speech module inside Pepper's head. As we added a web view to our Android app, our web application can be used both on the laptop and directly on the tablet of the Pepper robot. We have created a video of Pepper for demonstration purposes (https://www.fdr.uni-hamburg.de/record/10769/files/KI2022_CHAI-presentation4.zip).

Figure 3: Architecture of the Demonstration (labels: Laptop, Screen, HDMI, Router, Web Socket, ws://192.168.0.1, Pepper Robot)

Approximately 20 visitors from the humanities, chemistry, biology, and computer science visited our stand at the open day of the CSMC. Only the computer scientists had heard of the concept of an IR agent beforehand. The presenter uses the web application on the laptop to first upload a Word document. Pepper then explains how he processes the uploaded document, as if he were the IR agent in the background, as described in Section 2. After the document is processed, its title is visible in the web application. The document possibly consists of several texts, each of which is treated as a document and loaded into a database. All documents in the database can be listed, and their contents can be viewed with a viewer in our web application. Words $w_j^{D_i}$ with $r(w_j^{D_i}) \ne \emptyset$ are highlighted in the web application as clickable, while the other words $w_l^{D_k}$ with $r(w_l^{D_k}) = \emptyset$ are not. The visitors decide which word the presenter should click on. Finally, Pepper, as the IR agent, explains how it computes the results, as described in Section 3. As far as we can tell from the feedback and the questions we received in response to our presentation, all of the visitors were able to understand how our IR agent computes the results for a given query, that our IR agent is running in the background, and that the results are relevant.
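The computation that Pepper explains at this point, Algorithms 1 and 2 of Section 3 followed by the excerpt with emphasized words, can be sketched as follows. This is a simplified illustration under stated assumptions and not the code of our IS: positions are 0-based, only positions with a complete window on both sides receive a context, words are keyed by hypothetical `(doc_id, position)` tuples, stop-word removal is a separate step, emphasis is rendered with asterisks, and the two toy poems and all function names are made up for the example.

```python
def remove_stop_words(corpus, stop_words):
    """Drop stop words from every document (the removeStopWords step)."""
    return {
        doc_id: [w for w in words if w not in stop_words]
        for doc_id, words in corpus.items()
    }


def context_windows(corpus, radius):
    """Map each (doc_id, position) to the set of words within the radius."""
    contexts = {}
    for doc_id, words in corpus.items():
        # Assumption: only positions with a complete window on both sides.
        for j in range(radius, len(words) - radius):
            contexts[(doc_id, j)] = frozenset(words[j - radius:j + radius + 1])
    return contexts


def compute_results(contexts, threshold):
    """Map each word position to similar windows found in other documents."""
    results = {key: [] for key in contexts}
    for (doc_i, j), ctx_query in contexts.items():
        for (doc_k, l), ctx_other in contexts.items():
            overlap = len(ctx_query & ctx_other)
            if doc_i != doc_k and overlap >= threshold:
                results[(doc_i, j)].append((overlap, doc_k, l))
        # Highest overlap first; the sort is stable, so ties keep corpus order.
        results[(doc_i, j)].sort(key=lambda hit: -hit[0])
    return results


def explain(corpus, contexts, query_key, hit, radius):
    """Return the excerpt for one hit with the shared words marked by asterisks."""
    _, doc_k, l = hit
    shared = contexts[query_key] & contexts[(doc_k, l)]
    excerpt = corpus[doc_k][l - radius:l + radius + 1]
    return " ".join(f"*{w}*" if w in shared else w for w in excerpt)


raw = {
    "poem_a": "the red lotus blooms near the quiet river at dawn".split(),
    "poem_b": "a white lotus blooms beside the quiet pond at dusk".split(),
}
corpus = remove_stop_words(raw, stop_words={"the", "a", "at"})
radius, threshold = 2, 3
contexts = context_windows(corpus, radius)
results = compute_results(contexts, threshold)

query = ("poem_a", 2)  # the word "blooms" in poem_a after stop-word removal
for hit in results[query]:
    print(hit, explain(corpus, contexts, query, hit, radius))
```

With the setting of Figure 1 (13 words, $r = 4$), `context_windows` assigns the window $w_3, \ldots, w_{11}$ to $w_7$. Lowering `threshold` or changing `radius` in the sketch corresponds to the human adjusting $t$ and $r$ after reading the explanation.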
It was not necessary to explain all the technical details, as we do in Section 2 and Section 3.

5. Human-Aware IR Agent

Visitors are aware of an IR agent running in the background of our IS, which is implemented as a web application, because we embodied our IR agent with a Pepper robot. As the IR agent, the Pepper robot explains how it processes documents and returns them sorted in descending order by a score it assigns to each of them, given a query. As we propose in [9], our IR agent could greatly improve its performance if it were aware of the human, such that the two can collaboratively seek information. We refer to such an IR agent as a human-aware IR agent, in which the human and the IR agent are modeled with their mental models $\mathcal{M}^H$ and $\widetilde{\mathcal{M}}^A$, respectively, as depicted in Figure 4.

Figure 4: Mental models of the human $\mathcal{M}^H$ and the IR agent $\widetilde{\mathcal{M}}^A$ [10]

On the left-hand side, the IR agent approximates the information need of the human $\mathcal{M}^H$ as $\widetilde{\mathcal{M}}^H$. However, the IR agent has its own mental model $\widetilde{\mathcal{M}}^A$, containing the information need of the human from the perspective of the IR agent. This is comparable to a customer explaining to an IT specialist which requirements an application to be developed has to meet. Like our IR agent, which is able to go through all documents in a corpus it has access to, the IT specialist has years of experience and knows that the program has to meet more than the customer's requirements to work properly. A human sending a query to our IR agent is aware of our IR agent's mental model $\widetilde{\mathcal{M}}^A$, and the IR agent itself is aware of that, as depicted on the right-hand side of Figure 4. The human approximates $\widetilde{\mathcal{M}}^A$ as $\mathcal{M}^A_h$, and the IR agent approximates $\mathcal{M}^A_h$ as $\widetilde{\mathcal{M}}^A_{h'}$. If the gap between $\mathcal{M}^A_h$ and $\widetilde{\mathcal{M}}^A_{h'}$ is too large, then the IR agent's behavior is not explicable and it should explain its behavior.
As in the example before, a human and an IT specialist aim to find all requirements a program has to meet. The human expects that the IT specialist has experience in developing applications and possibly expects suggestions for improvements. If the IT specialist notices that the human does not understand his suggestions, then the specialist should explain them. The human is more likely to be aware of $\widetilde{\mathcal{M}}^A$ if the IR agent is embodied by a robot or animated, as we have shown in Section 4.

6. Related Work

An agent is often animated to make humans aware that an actual agent is running in the background, which can improve the collaboration between humans and agents and make agents more lifelike [11, 12]. Humans are even more likely to be aware of the copresence of other humans if these are animated as avatars [13]. However, animating an agent is not sufficient by itself, as the case of Clippy has shown [14]. Among other things, Clippy often interrupts a person to provide assistance even though no help is needed, and even if help is needed, the goals of humans are often wrongly anticipated. As Kambhampati notes in [10], this problem has not yet been solved in the field of robotics, where agents are embodied by a robot, although solving it is crucial for the collaboration between a human and a robot. Li notes in a survey that humans perceive agents more positively when they are embodied by a robot that is physically on site rather than virtually present or animated [15]. Thellman et al. add in [16] that there might be no difference with respect to social presence, but note that their study is domain-specific and short. Our contribution is to first make humans aware of our IR agent running in the background of our web application and then, as future work, to make our IR agent aware of the humans interacting with it.

7. Conclusion and Future Work

As our demonstration has shown, we do not need to explicitly explain the concept of our IR agent if it is embodied by a humanoid robot such as Pepper. Currently, we use the API of Pepper such that it speaks out what the web application sends to it. The API provides more than that, and we aim to extend our IR agent demonstration such that visitors can interact with it using the speech-to-text and text-to-speech modules of the machine inside Pepper's head. That allows for perceiving our IR agent even more as an individual that aims to collaboratively seek information together with humans and thereby to satisfy their information needs. As mentioned in Section 5, an IR agent can greatly improve its performance if it is human-aware. We will further develop our IR agent [9] such that it is human-aware and can then be embedded in the Pepper robot for demonstration purposes.

References

[1] S. Russell, P. Norvig, Artificial Intelligence: A Modern Approach, 2021.
[2] K. S. Jones, A statistical interpretation of term specificity and its application in retrieval, Journal of Documentation 28 (2004).
[3] J. Beel, B. Gipp, S. Langer, C. Breitinger, Paper recommender systems: a literature survey, International Journal on Digital Libraries 17 (2016) 305–338.
[4] S. Schiff, S. Melzer, E. Wilden, R. Möller, TEI-Based Interactive Critical Editions, in: International Workshop on Document Analysis Systems, Springer, 2022, pp. 230–244.
[5] A. K. Pandey, R. Gelin, A Mass-Produced Sociable Humanoid Robot, IEEE Robotics & Automation Magazine 25 (2018) 40–48.
[6] E. Wilden, A Critical Edition and an Annotated Translation of the Akanāṉūṟu: Part 1, Kaḷiṟṟiyāṉainirai. Old commentary on Kaḷiṟṟiyāṉainirai KV - 90, word index of Akanāṉūṟu KV - 120, École Française d'Extrême-Orient, 2018.
[7] T. Parr, The Definitive ANTLR 4 Reference, Pragmatic Bookshelf, 2013.
[8] F. Kuhr, T. Braun, M. Bender, R. Möller, To Extend or not to Extend? Context-Specific Corpus Enrichment, in: Australasian Joint Conference on Artificial Intelligence, Springer, 2019, pp. 357–368.
[9] S. Schiff, R. Möller, On Human-Aware Information Seeking, in: CHAI@KI, 2021, pp. 31–39.
[10] S. Kambhampati, Challenges of Human-Aware AI Systems, CoRR abs/1910.07089 (2019). URL: http://arxiv.org/abs/1910.07089.
[11] T. Holz, M. Dragone, G. M. O'Hare, Where Robots and Virtual Agents Meet, International Journal of Social Robotics 1 (2009) 83–93.
[12] M. Thiebaux, S. Marsella, A. N. Marshall, M. Kallmann, Smartbody: Behavior Realization for Embodied Conversational Agents, in: Proceedings of the 7th International Joint Conference on Autonomous Agents and Multiagent Systems, Volume 1, 2008, pp. 151–158.
[13] M. Gerhard, D. Moore, D. Hobbs, Embodiment and copresence in collaborative interfaces, International Journal of Human-Computer Studies 61 (2004) 453–480.
[14] N. Baym, L. Shifman, C. Persaud, K. Wagman, Intelligent Failures: Clippy Memes and the Limits of Digital Assistants, AoIR Selected Papers of Internet Research (2019).
[15] J. Li, The Benefit of Being Physically Present: A Survey of Experimental Works Comparing Copresent Robots, Telepresent Robots and Virtual Agents, International Journal of Human-Computer Studies 77 (2015) 23–37.
[16] S. Thellman, A. Silvervarg, A. Gulz, T. Ziemke, Physical vs. Virtual Agent Embodiment and Effects on Social Interaction, in: International Conference on Intelligent Virtual Agents, Springer, 2016, pp. 412–415.