Conversational Assistants for Elderly Users – The Importance of Socially Cooperative Dialogue Stefan Kopp Mara Brandt Hendrik Buschmeier Bielefeld University, CITEC Bielefeld University, CITEC Bielefeld University, CITEC skopp@uni-bielefeld.de mbrandt@techfak.uni-bielefeld.de hbuschme@uni-bielefeld.de Katharina Cyra Farina Freigang Nicole Krämer University of Duisburg-Essen Bielefeld University, CITEC University of Duisburg-Essen katharina.cyra@uni-due.de farina.freigang@uni-bielefeld.de nicole.kraemer@uni-due.de Franz Kummert Christiane Opfermann Karola Pitsch Bielefeld University, CITEC University of Duisburg-Essen University of Duisburg-Essen franz@techfak.uni-bielefeld.de christiane.opfermann@uni-due.de karola.pitsch@uni-due.de Lars Schillingmann Carolin Straßmann Eduard Wall Bielefeld University, CITEC University of Duisburg-Essen Bielefeld University, CITEC lschilli@techfak.uni-bielefeld.de carolin.strassmann@uni-due.de ewall@techfak.uni-bielefeld.de Ramin Yaghoubzadeh Bielefeld University, CITEC ryaghoub@techfak.uni-bielefeld.de ABSTRACT Conversational agents can provide valuable cognitive and/or emo- tional assistance to elderly users or people with cognitive impair- ments who often have difficulties in organizing and following a structured day schedule. Previous research showed that a virtual assistant that can interact in spoken language would be a desir- able help for those users. However, these user groups pose spe- cific requirements for spoken dialogue interaction that existing systems hardly meet. This paper presents work on a virtual conver- sational assistant that was designed for, and together with, elderly as well as cognitively handicapped users. It has been specifically developed to enable ‘socially cooperative dialogue’ – adaptive and aware conversational interaction in which mutual understanding is co-constructed and ensured collaboratively. The technical approach is described and results of evaluation studies are reported. KEYWORDS Figure 1: Concept for an elderly user interacting with the virtual assistant ‘Billie’ in her home environment. conversational assistants; elderly users; cooperative dialogue 1 INTRODUCTION needing home assistance services, and cognitively handicapped In recent years politics and society have placed emphasis on ways users that are already supported by professional care-givers. What to enable a longer autonomous and self-determined life for elderly both user groups have in common are mild cognitive impairments people. One approach is the development of assistive technology. that create a need for support with autonomously organizing and However, this has often been focused on supporting physical tasks following a structured day schedule. (e.g., fetching or lifting objects, moving around) and it has been While technical means of supporting this are already available, struggling with questions of human–machine interaction and user many elderly users have little prior experience with using assistive acceptance. The goal of the kompass project (which started in 2015) systems. Applying such technology thus requires to overcome a was to develop a virtual assistant (‘Billie’) to accompany and guide a ‘digital barrier’ both with the individual users as well as their care- user throughout the day. The system has been specifically designed providing environment. The kompass project built on pre-studies for, and together with, two user groups: elderly users that live [13] suggesting that natural spoken-language interaction with a autonomously in their home environment but are on the verge of virtual agent may be desirable and acceptable for these user groups. 10 Building and applying conversational agents for these user groups, 3 REQUIREMENTS however, raises it’s own challenges. Elderly users often have select- 3.1 Interactional tasks ively impaired abilities, e.g., for auditory perception, articulation, adapting to a recommended interaction style, adhering to a clean The goal of the KOMPASS project was to develop a conversational turn-taking structure, or comprehending content of high informa- assistant that helps users organize and keep track of their schedule tion density [35, 37]. We thus set out to develop a conversational for the day. This goal was identified by our application partner agent that provides a dialogue style that enables robust and reliable, (v. Bodelschwingh Foundation Bethel, Bielefeld, Germany) as an yet acceptable spoken-language interactions with these user groups. important need of their respective client groups. The conversational We refer to this special quality as ‘socially cooperative dialogue’ agent system ‘Billie’ thus offers several functions within the domain [32]. of schedule management. Users can enter their various kinds of In this paper we present our approach and report on results appointments (single appointments, recurrent appointments with obtained in evaluation studies. After discussing related work in specification of recurrence, e.g., weekly, biweekly and for how long the next section, Sect. 3 points our requirements before Sect. 4 de- the recurrent appointment will be reiterated), they can choose being scribes our approach to modeling socially cooperative dialogue in reminded of them including setting a time for the reminder, and they the virtual assistant ‘Billie’. Section 5 presents results and lessons can edit already entered appointments. The editing of appointments learned from several evaluation studies carried out with users in comprises the following sub-tasks: Users can change any of the the lab environment as well as in their real home environment (on- appointment values ‘start time’, ‘end time’, ‘topic’ and ‘duration’, going), showing how conversational agents can be built to achieve they can delete or replace appointments within the calendar, and the interaction abilities needed to provide elderly users and mildly they can query their entered appointments of any point in time cognitively impaired persons with successful assistance. (same day, same week, forthcoming day or days and weeks, and previous days or weeks). Moreover, the agent system provides for user-tailored suggestions for leisure time activities [17, 18] in order 2 RELATED WORK to promote a more active life. Several conversational assistants have been developed for care- related settings such as companionship for people living alone [20], assistance in multilingual care-giving/receiving [31], or pain and 3.2 Socially cooperative dialogue affect management [28]. An increasing body of work suggests that In line with other previous work, focus groups and pre-studies that spoken interaction with users with cognitive impairments seems, we carried out within a user-centered design process made clear in general, to be feasible and accepted by the user group, though that assistive functions of the system preferably should be accessible some requirements need to be met. Meis [15] noted that older sub- and realized through easy-to-use spoken dialogue interaction [13]. jects want a spoken-dialogue helper to have a name and to react Further analyses of several quasi experimental studies (run in a contingently to social affordances such as expressions of gratitude. Wizard-of-Oz scenario in 2015, as a semi-autonomous study in Miehle and colleagues [16] noted that conversational assistants are 2017, and as a long-term study ongoing; see section 5) pointed to required to speak sufficiently loud and with an appropriate pace but the fact that spoken-language interaction, while being generally were accepted as interlocutors in a study with elderly people. Bick- preferred, raises a number of (well-known) challenges that tend more and colleagues [2] analyzed long-term interactions of older to be amplified in our user groups. Therefore, the conversational adults with an agent-based coaching system using spoken language assistant has to fulfill specific requirements for the users to be while user input was given through a touchscreen and found them acceptable and successful with respect to the various sub-tasks of effective on the short term. Sidner and colleagues [23] attempted to schedule management: identify preferred domains of conversation or joint activity based on this system design. Yaghoubzadeh and colleagues [36] reported • The system has to be able to deal with long, extensive user that older adults and people with cognitive impairments are able utterances. Interruptions and barge-ins by the user must be to successfully ground information. Explicit confirmation patterns possible at all times, in particular when they are instrumental and a low information density (one information unit per utterance) in solving the current communicative task at hand. Overall, enabled the user to detect and repair more of the system’s language turn-taking has to be cooperative such that interruptions understanding problems and subsequent errors. by the system should be foreshadowed through nonverbal A well-discussed concept that is central for successful human– behavior [33]. Simultaneously, the system must be robust to human dialogue is ‘grounding’ [6]. Researchers have attempted to non-cooperative turn-taking behavior of the user such that model it computationally for conversational agents, discretely as turn fights are avoided (generally yielding the turn to the a finite-state process [27] or probabilistically using Bayesian net- user). works [19]. Recent work on real-time dialogue systems has focused • Generally, the system must work to ensure dynamic coordin- on advanced issues so that discourse context is taken into account ation of understanding and grounding in dialogue. Feed- [24], partial and overlapping utterances can be grounded incre- back by the system to user input must be provided timely mentally [29], groundedness can be estimated from multimodal to prevent long user turns, and clearly mark the system’s feedback cues [4], or information from multiple modalities related current level of understanding [8, 9]; user feedback must to socio-emotional aspects such as attention and engagement are be continuously processed and interpreted for indicators of taken into account [14]. miscommunication. 11 • The handling of understanding problems on part of the sys- Fusion and interpretation Dialog management Behavior generation Behavior realisation tem or the user is crucial. User-initiated displays of non- Nuance Dragon Natural Natural language language understanding (e.g., “Sorry” or “Can you repeat please”) must ASR understanding flexdiam generation Asap ! – Issues Realiser always be possible and handled by the system properly. For Tobii 4C – Planning Gaze Eyetracking – flow_ctl non-specific system displays of non-understanding, a vari- Estimation of – User model –… Head-gesture ation of error handling strategies must be available, e.g., recognition contact and engagement Gesture in form of reprompts and non-understanding notification, Filled pause Calendar Calendar detection manager combined with more restrictive clarification sequences de- pending on the sub-task and local move. While reprompts may be beneficial for problem solving in its first issuing [17], Timeboard a lack of progress in problem solving without a change in asrstate sil… speech silence spe… error recovery strategy may lead to further complications userWords Billie, I’d like to add a new a… On … userGaze ? agent calendar ag… (cf. [17]). This holds especially in action contexts like ap- floor yielded pointment suggestions in which user responses can address agentWords Okay, tell me about your ap… nlgRequest introduce_topic(abs) social matters like willingness, availability, disposition and agentGaze idle attentive speaking attentive … deontic authority. Thus, error handling on part of the system should employ strategies that clarify the user’s agreement or resistance [18]. Figure 2: Overall architecture of the conversational assistant • Topic shifts by the user must be possible and followed readily by the system. This requires the system to also keep track of non-settled discourse segments and to return to them at realization (synthesis, graphical components / GUI changes, control appropriate points in time (when the user does not pursue of animated characters etc.). other discourse goals) and with a cooperative and gentle The architecture is built around a ‘timeboard’, a central rep- entrance strategy, possibly with repeating and rectifying resentation that captures temporal information of interactional parts of the discourse unit that have been discussed already. events on different tiers. Importantly, these tiers hold rewindable Generally, the system should avoid topic shifts. If they are representations of certain and uncertain variables (probability dis- unavoidable, e.g., because a previous discourse topic has not tributions) with generic metrics – like entropy – that serve as the been settled yet, they should be of as close a distance as basis for local decision heuristics. Event-driven observers are used possible and must be marked explicitly. to derive events from interval relations between existing ones, and trigger higher-level functions, most centrally the dialogue manager The requirements listed above are (necessary, but most probably proper, but also a contribution manager, which schedules queued not sufficient) examples of a specific dialogue quality that we deem communicative intentions when the floor situation allows. necessary for our user groups. We refer to this dialogue quality as ‘socially cooperative’ [32] and note that it goes well beyond classical 4.2 Dialogue management notions like grounding in dialogue, as it implies a specific role that the agent has to fulfill consistently throughout the interaction. This To realize the required dialogue management abilities, the ‘flexdiam’ role entails a range of collaborative-supportive action policies, e.g., [34] system has been developed. Following an issue-based approach, for readily following topic shifts, yielding turns, adhering to rules it generally pursues a single joint task and discourse model for both of politeness, and adapting dialogue structures thoroughly to the interactants. The basic structure of the joint task and discourse needs of the user. model is a forest of independent but hierarchically interdependent agents termed ‘Issues’, as well as generic update rules to transform 4 APPROACH AND IMPLEMENTATION this forest after dialogue management invocations. When an Issue is instantiated, it is at the same time made a child of the Issue that 4.1 Architecture created it. Any path from a leaf Issue to the root corresponds to a To account for the requirements identified above, we have based nested (sub-)topic of discussion. Any number of topics can be active the conversational assistant on an interaction framework that aims at any one time and will be considered valid points of reference in to support the features of incrementality (to quickly update and parallel, if applicable according to their grounding state. To that relay discussed information), provisions for representation and res- end, any Issue can be in one of five states (new, entered, fulfilled, olution of uncertainty (resulting from input and unclear grounding) failed, obsolete). with explicit representation of topics, structured hierarchically in Invocations that trigger local processing in Issues come in two units intuitive to laymen. The overall architecture is shown in fig. 2. flavors: input handling (e.g., prompt request, NLU parse) and plan It is built on top of the IPAACA middleware [21], a distributed, structure updates (e.g., child issue progressed, completed or inval- platform-independent implementation of a general model for in- idated). Issues will decide along their local path in the hierarchy, cremental dialogue processing proposed by Schlangen and Skantze and based on the current global context, whether they can provide [22]. This provides the back-end for the connection of the core a plan to handle an invocation. If an Issue cannot handle an input dialogue management components to input (including ASR, tagger handling invocation locally, a preference is marked to let its parent and parser, eye tracker, keyboard/mouse/touch etc.), multimodal handle it instead. Partial localized processing does not preclude fusion, behavior planning (NLG, gaze, gesture, calender) and output propagation through the hierarchy, though. This allows for situated 12 partial interpretation and processing, which is most specific and situation-dependent in the leaves, and most generic and general in the roots of the forest. If a user contribution does not fit well into any active Issue, a discourse transition based on user initiative can be assumed to have taken place. Depending on the situation, this could be construed as either a forward-looking contribution (if anticipated by the cur- rently invoked entrance point or a direct ancestor) or a real topic jump. A new branch is created then, marked as entered, and moved to the top of the entry point priority queue. Note that the present system is suited to quick, interactive ap- proaches to spoken interaction and to modeling real-world applica- tions within limited domains. Manual extension is quite straight- forward. Incremental processing and the handling of uncertain input and information derived from it has received special focus, the ‘output’ side employs a similar notion of indeterminate state until evidence for communicative success provides a precondition for grounding being attested. Communicative plans are capable of employing several modalities and the implemented suite of basic Issues for grounding problems can be fine-tuned to cover a wide space of varying explicitness, verbosity, and conversational styles, which can be used to seed user models that best suit the estim- ated capabilities and preferences of our specific user groups. This extends to information density (configurable via different options for packaging and different approaches to confirmation requests), Figure 3: Screenshot of the system processing and fusing but also discourse structure: explicit ratification for topic jumps socio-communicative signals. beyond a distance threshold (and implicit acceptance by means of contingent continuation by the user) are currently in development. 4.4 Multimodal expressiveness Regarding output behavior, the conversational assistant ‘Billie’ is 4.3 Socio-communicative signal processing enhanced with multimodal cues such as gestures, facial expressions Human communication is highly multi-modal and thus the ability and head movements in order to develop a more natural behavior to process this variety of information is very important to facilitate and making it easily accessible, understandable and helpful for communication with a virtual agent. Therefore, several modules the human user. The cues were selected based on an analysis of in our architecture recognize visual communication signals and their form, function and frequency in natural interaction data [11]. non-verbal socio-emotional speech cues. Confirmations play an Specifically, we simulate cues that serve the pragmatic functions to important role in the dialog structure of the interaction with ‘Billie’. convey and mark emphasis, de-emphasis and (un-)certainty of the Therefore, we focused on the recognition of natural confirmation speaker. These functions have been successfully mapped onto the signals, like nodding and non-lexical confirmations like “mhm”, conversational agent [10, 12]. For the calendar domain, key phrases which are typically not recognized by automatic speech recognition. were selected and are accompanied by these multimodal functions if To detect non-lexical confirmations, the speech signal is segmen- applicable (emphasis: “Let’s continue!”, de-emphasis: “Good, this is ted into speech intervals using voice activity detection (VAD). Our canceled”, uncertainty: “Did you say ‘swimming’?”; see fig. 4). The module detects non-lexical confirmations by extracting acoustic multimodal expressiveness has been evaluated with the participants features and classifying the result using a Support Vector Machine in lab-based studies (e.g., showing better information uptake) and (SVM). If confirmations are detected, the component sends mes- will be evaluated systematically during a long-term study. sages via the IPAACA middleware to inform other components [3]. Additionally to the assistant ‘Billie’ itself, we designed different Further, the system is able to detect human head nods based on states, visual and sound features of the weekly-based calendar that dynamic time warping and estimations of head pose angles from support the dialogue between user and agent multimodally. As facial landmark features [30]. In addition, the face detection is used previous analyses have shown [8], users orient toward the calendar to verify contact with the user. A cue aggregation module combines area of the interface while entering appointments as this is the signals detected in individual modalities to derive a higher-level area of interest for the ongoing interactional task. So, to design interpretation using a Bayesian network (see Fig. 3). Currently, the a responsive and recipient system that supports the dialogue not system detects if the user signals confirmation by combining non- only with verbal and non-verbal features of the agent, the calendar lexical confirmation and nod detection. User contact is detected by provides visual cues (mainly through highlighting) to comprehen- combining face detection and eye contact related cues. sion hypotheses of the system even before words are uttered by the agent (see fig. 5). These visual updates represent the system’s 13 Emphasizing De-emphasizing Uncertainty abstract deictic brushing palm up open hand Figure 4: Multimodal cues used by the conversational assist- Figure 6: Example of the data recorded during the field eval- ant for different pragmatic functions. uation study. 5 EVALUATION RESULTS In our project we followed an iterative design-implement-evaluation approach that comprised a number of empirical evaluations of in- dividual sub-systems of the agent [3, 5, 8–10, 17, 18, 25, 26, 30]. Further, to gather training data for the socio-emotional signal re- cognition components and to get insights into users’ reactions to potential interaction problems, an initial Wizard-of-Oz study with 53 participants from the different user groups (18 senior parti- cipants, 19 participants with cognitive impairments and, 16 controls from the local student population) was carried out. In order to elicit participants’ reactions, the agent’s behavior in this study followed a script that was designed to create a number of typical interaction problems. Participants could negotiate and enter their own appoint- ments, but could also use previously prepared appointment cards. Figure 5: The calendar including a visual cue (highlighting In the following, we report more recent studies that were carried of the current day), an appointment “Coffee and Cake” and out to evaluate the socially cooperative dialogue abilities of the the conversational assistant confirming the successful entry full-blown conversational assistant in less restricted interactions. of the appointment. 5.1 Lab-based evaluation Based on first insights gathered from the above described WOz- study as well as on knowledge acquired in preparatory studies status regarding the understanding of user input. Furthermore, a [13, 35], a first version of a semi-autonomous agent was evaluated sound was added to mark the successful entry of appointments. in a laboratory setting. This study investigated whether participants from the different user groups – without specific instructions – were able to carry out calendar-related tasks through spoken in- 4.5 Data Recording teraction with the agent. Furthermore, the study’s objective was Our system architecture is designed to be used in both lab and field to investigate how participants manage transitions between bigger environments. As manual recording control is not feasible in long topics/issues, how the length of participants’ utterances vary given term field studies, the architecture contains an automatic recording different confirmation strategies, how users react to communication module which starts recording when the user starts interacting with of uncertainty, and whether the agent’s ways of guiding the users’ the system by pushing a button. Recording is automatically stopped attention (via voice, manual gesture, gaze, calendar highlighting, if the user says goodbye and, in order to ensure users’ privacy, when and sounds) is effective. the user is not visible to the system and does not react to system We employed a system that autonomously handled dialogue prompts for an extended period of time, or when a system error is management for entering appointments as well as for stating ap- detected. The recordings comprise five video and five audio tracks pointment suggestions. A human ‘wizard’ was included only for which are compressed in real-time using hardware acceleration. controlling transitions between global modes of user entering ap- One video track (depicted in Fig. 6) is a 4-in-1 overview of the other pointments, auto-generated partial suggestions of appointments video tracks. System and interaction log files are archived after each by the agent (“Would you like to do something on Saturday?”), and session. closing the interaction. 44 participants took part in the study: 19 14 older adults (SEN) aged about 75+; 15 cognitively impaired adults the prototype system is placed into the participants’ apartment for (CIM) of working age; and 10 students serving as control group a period of about 15 days (including setup and dismounting day). (CTL). The task was free-form entering of appointments. All sub- Participants are asked to manage their daily schedule together with jects managed to enter the required number of appointments into ”Billie” and to jot down their impressions about the system into a the calendar. The number of final entries averaged 10.4, 8.5, and research diary (freely as well as to structured questions). After the 8.9 for CTL, SEN and CIM, respectively (including up to two agent- period of applying the system, participants give a final rating in recommended items). Older adults and the group with impairments the form of a semi-structured interview. on average spent about 20% longer on a topic than controls; some participants from the CIM group made long hesitations in isolated First results. The above described study design has been carried instances (up to tens of seconds; see [34] for a detailed discussion). out with one female senior person as the first long-term study. The Still, the socially cooperative dialogue abilities of the agent enabled system has been in use in the participant’s home for 13 days (exclud- the user to conduct successful repair with the agent and to settle in ing a setup and a dismounting day), during which she interacted on acceptable solutions in every subtask. 61 times with the system, for a total duration of 284 minutes and The study was also conducted to gain qualitative insight into 46 seconds. She used the system for a mean number of 4.8 times the repair, revision and meta-communicative patterns exhibited by per day (SD = 1.7, Min = 2, Max = 8). Although usage over the the user groups. Further, we wanted to observe temporal aspects duration of the study varied between days, it did not differ much of the planning and verbalizing of an appointment to learn about between the first seven days (M = 5.3SD = 1.8, Min = 3, Max = 8) participants’ practices in different phases of appointment entry and the last six days (Mean = 4.2, SD = 1.6, Min = 2, Max = 6), (to eventually design the timing and turn taking of the system). suggesting that the participant did not lose interest. Analyses of the data are still ongoing and focus among others (1) Interaction durations varied greatly, with the shortest interaction on the analysis of gaze when interactional trouble occurs, (2) on lasting only 6 seconds and the longest interaction lasting 18 minutes the system’s repair strategies and (3) on the different multimodal and 34 seconds. Mean interaction duration is 4:26 minutes (SD = states and uptake strategies of the system (as a representation of 5:11). This is mainly due to the fact that interactions differ due its recipiency) with effects on turn production of the users. to the type of activity. Interactions that are initiated by the user are typically longer, whereas agent-initiated reminder interaction can often be handled quickly and usually do not lead to longer 5.2 Field study dialogues. Pending a detailed analysis of the actual interaction A long-term field study is currently ongoing to evaluate the sys- logs, we differentiate between reminder-based interactions and tem’s performance and effects, as well as how participants adopt other interactions by setting a threshold of 120 seconds. The 33 and handle it in their home environment over a 15-day period. For shorter – probably reminder – interactions have a mean duration this study we implemented and apply a fully autonomous system of 45 seconds (SD = 0:33). The 28 longer interactions have a mean with no additional aids by a human wizard (see fig. 7 for the setup). duration of 8:57 minutes (SD =4:37). This differentiation gives This study comprises an ethnographic component focusing on daily further insight into daily usage of the system: A mean of 2.5 (SD = life management in the homes of seniors living alone and people 1.7) of the interactions per day were reminders, and a mean of 2.2 with cognitive impairments in supported-living [1, 7]. These ana- (SD = 1.0) of the interactions lasted longer than 120 seconds. lyses aim to shed light also on broader questions such as: What are The durations of the interactions indicate that the participant the challenges when a novel technology is brought into a household used the conversational assistant quite a lot. After the 13 days of of the target groups? What are the participants’ experiences and usage she had entered 67 unique events in her calender, five of expectations of a technological assistant? What are the effects of which were serial events (yielding a total of 132 events displayed the assistant on participants’ daily routines and in particular their in the calendar). The large number of reminder-based interactions schedule management? And, an overall topic considered through- (33) also indicates that she successfully used this function of the out the project, what are issues with regard to privacy and data system. protection with our special user groups? As the first main study has just ended, further analyses regarding Overall, the field study followed an iterative approach consisting the changes of daily life and routines are still due. However, the of three phases. In the pre-pilot study, we aim to acquaint parti- pre-pilot and pilot study already has been carried out with other cipants with the system and to identify their specific expectations elderly participants who could interact with the prototype system and needs. Researchers lead semi-structured interviews in the apart- for three days at home. Different assessments or concerns were ments of the participants and discuss the possible placement of the gathered from these participants as anecdotal feedback: system in the apartment using a full-size paper prototype. Next, a • Concerning the social presence and relationship: “We greet pilot study evaluates the feasibility of the main study with a special each other kindly every morning and he is asking what he focus on the robustness of the system and its performance outside can do for me. It’s great.“ of the lab. Therefore the prototype system is set up in the apart- • Concerning the dialogue: “Whenever we had problems in a ments of the participants for a period of about 48 hours. In addition conversation, we could resolve them. Was really nice.” to the system evaluation, we want to learn about the acceptance • After coming out of coma and being isolated in hospital: of the system by the participants and their assessment of the dia- “I would have been glad about having such a guy next to logue design. This provides the basis for further optimizations of my bed.” (in order to practice speaking and to have social the dialogue design and preparation of the main study, for which interactions) 15 the design and implementation phases, which also helped a lot in increasing acceptance and willingness to participate in the project. ACKNOWLEDGMENTS This research was supported by the German Research Foundation (DFG) in the Cluster of Excellence ‘Cognitive Interaction Techno- logy’(EXC 277) and by the German Federal Ministry of Education and Research (BMBF) in the project ‘KOMPASS’ (FKZ 16SV7271K). REFERENCES [1] Antje Amrhein, Katharina Cyra, and Karola Pitsch. 2016. Processes of reminding and requesting in supporting people with special needs. Human practices as basis for modeling a virtual assistant?. In Proceedings 1st ECAI Workshop on Ethics in the Design of Intelligent Agents. The Hague, The Netherlands, 18–23. [2] Timothy W. Bickmore, Rebecca A. Silliman, Kerrie Nelson, Debbie M. Cheng, Michael Winter, Lori Henault, and Michael K. Paasche-Orlow. 2013. A randomized controlled trial of an automated exercise coach for older adults. Journal of the American Geriatrics Society 61 (2013), 1676–1683. https://doi.org/10.1111/jgs.12449 [3] Mara Brandt, Britta Wrede, Franz Kummert, and Lars Schillingmann. 2017. Con- firmation detection in human-agent interaction using non-lexical speech cues. Presented at the AAAI Fall Symposium on Natural Communication for Human- Robot Collaboration. (2017). [4] Hendrik Buschmeier and Stefan Kopp. 2013. Co-constructing grounded symbols— Feedback and incremental adaptation in human–agent dialogue. Künstliche Intelligenz 27 (2013), 137–143. https://doi.org/10.1007/s13218-013-0241-8 [5] Hendrik Buschmeier and Stefan Kopp. 2018. Communicative listener feedback in human–agent interaction: artificial speakers need to be attentive and adaptive. Figure 7: The prototype system deployed in an elderly parti- In Proceedings of the 17th International Conference on Autonomous Agents and cipant’s apartment. Multiagent Systems. Stockholm, Sweden. [6] Herbert H. Clark. 1996. Using Language. Cambridge University Press, Cambridge, UK. https://doi.org/10.1017/CBO9780511620539 [7] Katharina Cyra, Antje Amrhein, and Karola Pitsch. 2016. Fallstudien zur All- • Concerning the size of the current setup and affordability: tagsrelevanz von Zeit- und Kalenderkonzepten. In Mensch und Computer 2016 “Who wants to have this in the living room? Me not and I Kurzbeiträge. Aachen, Germany, 1–5. https://doi.org/10.18420/muc2016-mci-0253 [8] Katharina Cyra and Karola Pitsch. 2017. Dealing with long utterances: How wouldn’t buy it either, if it’s a lot of money.” to interrupt the user in a socially acceptable manner?. In Proceedings of the • Concerning the duration of acquaintance: “He doesn’t know 5th International Conference on Human Agent Interaction. Bielefeld, Germany, anything about me. If that’s possible, one has to use such a 341–345. https://doi.org/10.1145/3125739.3132586 [9] Katharina Cyra and Karola Pitsch. 2017. Dealing with ‘long turns‘ produced device for at least half a year to get used to it.” by users of an assistive system: How missing uptake and recipiency lead to turn increments. In Proceedings of the 26th IEEE International Symposium on Robot and Human Interactive Communication. Lisbon, Portugal, 329–334. https: 6 CONCLUSIONS //doi.org/10.1109/ROMAN.2017.8172322 The present work has explored how conversational agents can be [10] Farina Freigang, Sören Klett, and Stefan Kopp. 2017. Pragmatic multimodality: Effects of nonverbal cues of focus and certainty in a virtual human. In Proceedings used to provide cognitive or emotional assistance to elderly users. of the 17th International Conference on Intelligent Virtual Agents. Stockholm, We have focused in particular on the use of spoken language dia- Sweden, 142–155. https://doi.org/10.1007/978-3-319-67401-8_16 logue as a preferred way of interacting with technical systems (as [11] Farina Freigang and Stefan Kopp. 2015. Analysing the modifying functions of gesture in multimodal utterances. In Proceedings of the 4th Conference on Gesture indicated by studies by others as well as ourselves). Yet, enabling and Speech in Interaction (GESPIN). Nantes, France, 107–112. successful and acceptable dialogue with these user groups raises [12] Farina Freigang and Stefan Kopp. 2016. This is what’s important—using speech and gesture to create focus in multimodal utterance. In Proceedings of the 16th several challenges and communication problems abound quickly International Conference on Intelligent Virtual Agents. Los Angeles, CA, USA, with off-the-shelf dialogue system technology. However, our find- 96–109. https://doi.org/10.1007/978-3-319-47665-0_9 ings indicate that virtual assistants can still be an effective and [13] Marcel Kramer, Ramin Yaghoubzadeh, Stefan Kopp, and Karola Pitsch. 2013. A conversational virtual human as autonomous assistant for elderly and cognitively acceptable help if they provide abilities for the kind of socially impaired users? Social acceptability and design considerations. In Proceedings of cooperative dialogue needed to resolve these issues. The key in- INFORMATIK 2013. Koblenz, Germany, 1105–1119. sight of the present project is how conversational agents can be [14] Gregor Mehlmann, Kathrin Janowski, and Elisabeth André. 2016. Modeling grounding for interactive social companions. Künstliche Intelligenz 30 (2016), built such that this is possible for the majority of issues, even for 45–52. https://doi.org/10.1007/s13218-015-0397-5 the special user groups of persons with a mild cognitive impair- [15] Markus Meis. 2013. Nutzerzentrierte Entwicklung eines Erinnerungsas- sistenten. Presented at Abschlusssymposium Niedersächsischer Forschungs- ment (and often also additional motoric or perceptual handicaps). verbund Gestaltung altersgerechter Lebenswelten. (2013). https://www. This requires numerous things, from the processing of subtle, mul- altersgerechte-lebenswelten.de/ timodal and context-dependent communication-relevant signals, [16] Juliana Miehle, Ilker Bagci, Wolfgang Minker, and Stefan Ultes. 2017. A so- cial companion and conversation partner for elderly. In Proceedings of the 8th to generating them in combination with visual cues (calender), to International Workshop On Spoken Dialogue Systems. Farmington, PA, USA. enabling a highly flexible dialogue with responsive turn-taking, [17] Christiane Opfermann and Karola Pitsch. 2017. Reprompts as error handling communicative feedback, and (pro-)active strategies for avoiding strategy in human–agent-dialog? User responses to a system’s display of non- understanding. In Proceedings of the 26th IEEE International Symposium on Robot communication problems as well as repairing them. One prerequis- and Human Interactive Communication. Lisbon, Portugal, 310–316. https://doi. ite to achieve this was a high degree of user involvement throughout org/10.1109/ROMAN.2017.8172319 16 [18] Christiane Opfermann, Karola Pitsch, Ramin Yaghoubzadeh, and Stefan Kopp. an automatic pain- and emotion-recognition system. In Proceedings of the 4th 2017. The communicative activity of ‘making suggestions’ as an interactional IAPR TC 9 Workshop on Pattern Recognition of Social Signals in Human-Computer- process: Towards a dialog model for HAI. In Proceedings of the 5th International Interaction. Cancun, Mexico, 127–139. https://doi.org/10.1007/978-3-319-59259-6 Conference on Human Agent Interaction. Bielefeld, Germany, 161–170. https: [29] Thomas Visser, David R. Traum, David DeVault, and Rieks op den Akker. 2014. //doi.org/10.1145/3125739.3125752 A model for incremental grounding in spoken dialogue systems. Journal on Mul- [19] Tim Paek and Eric Horvitz. 2000. Conversation as action under uncertainty. In timodal User Interfaces 8 (2014), 61–73. https://doi.org/10.1007/s12193-013-0147-7 Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence. Stanford, [30] Eduard Wall, Lars Schillingmann, and Franz Kummert. 2017. Online nod detec- CA, USA, 455–464. tion in human–robot interaction. In Proceedings of the 26th IEEE International [20] Lazlo Ring, Lin Shi, Kathleen Totzke, and Timothy Bickmore. 2014. Social Symposium on Robot and Human Interactive Communication. Lisbon, Portugal, support agents for older adults: longitudinal affective computing in the home. 811–817. https://doi.org/10.1109/ROMAN.2017.8172396 Journal on Multimodal User Interfaces 9 (2014), 79–88. https://doi.org/10.1007/ [31] Leo Wanner, Elisabeth André, Josep Blat, et al. 2017. KRISTINA: A knowledge- s12193-014-0157-0 based virtual conversation agent. In Proceedings of the 15th International Confer- [21] David Schlangen, Timo Baumann, Hendrik Buschmeier, Okko Buß, Stefan Kopp, ence on Practical Applications of Agents and Multi-Agent Systems. Porto, Portugal, Gabriel Skantze, and Ramin Yaghoubzadeh. 2010. Middleware for incremental 284–295. https://doi.org/10.1007/978-3-319-59930-4_23 processing in conversational agents. In Proceedings of the 11th Annual Meeting of [32] Ramin Yaghoubzadeh, Hendrik Buschmeier, and Stefan Kopp. 2015. Socially the Special Interest Group in Discourse and Dialogue. Tokyo, Japan, 51–54. cooperative behavior for artificial companions for elderly and cognitively im- [22] David Schlangen and Gabriel Skantze. 2011. A general, abstract model of paired people. In Proceedings of the 1st International Symposium on Companion- incremental dialogue processing. Dialogue and Discourse 2 (2011), 83–111. Technology. Ulm, Germany, 15–19. https://doi.org/10.5087/dad.2011.105 [33] Ramin Yaghoubzadeh and Stefan Kopp. 2016. Towards graceful turn management [23] Candace Sidner, Timothy Bickmore, Charles Rich, Barbara Barry, Lazlo Ring, in human-agent interaction for people with cognitive impairments. In Proceedings Morteza Behrooz, and Mohammad Shayganfar. 2013. Demonstration of an always- of the 7th Workshop on Speech and Language Processing for Assistive Technologies. on companion for isolated older adults. In Proceedings of the 14th Annual Meeting San Francisco, CA, USA, 26–31. of the Special Interest Group on Discourse and Dialogue. Metz, France, 148–150. [34] Ramin Yaghoubzadeh and Stefan Kopp. 2017. Enabling robust and fluid spoken [24] Gabriel Skantze. 2007. Error Handling in Spoken Dialogue Systems. Managing dialogue with cognitively impaired users. In Proceedings of the 18th Annual Uncertainty, Grounding and Miscommunication. Ph.D. Dissertation. Computer Meeting of the Special Interest Group on Discourse and Dialogue. Saarbrücken, Science and Communication, Department of Speech, Music and Hearing, KTH Germany, 273–283. Stockholm, Stockholm, Sweden. [35] Ramin Yaghoubzadeh, Marcel Kramer, Karola Pitsch, and Stefan Kopp. 2013. [25] Carolin Straßmann and Nicole C. Krämer. 2017. A categorization of virtual Virtual agents as daily assistants for elderly or cognitively impaired people. agent appearances and a qualitative study on age-related user preferences. In In Proceedings of the 13th International Conference on Intelligent Virtual Agents. Proceedings of the 17th International Conference on Intelligent Virtual Agents. Edinburgh, United Kingdom, 79–91. https://doi.org/10.1007/978-3-642-40415-3_7 Stockholm, Sweden, 413–422. https://doi.org/10.1007/978-3-319-67401-8_51 [36] Ramin Yaghoubzadeh, Karola Pitsch, and Stefan Kopp. 2015. Adaptive ground- [26] Carolin Straßmann, Astrid Rosenthal von der Pütten, Ramin Yaghoubzadeh, ing and dialogue management for autonomous conversational assistants for Raffael Kaminski, and Nicole Krämer. 2016. The effect of an intelligent virtual elderly users. In Proceedings of the 15th International Conference on Intelli- agent’s nonverbal behavior with regard to dominance and cooperativity. In gent Virtual Agents. Delft, The Netherlands, 28–38. https://doi.org/10.1007/ Proceedings of the 16th International Conference on Intelligent Virtual Agents. Los 978-3-319-21996-7_3 Angeles, CA, USA, 15–28. https://doi.org/10.1007/978-3-319-47665-0_2 [37] Victoria Young and Alex Mihailidis. 2010. Difficulties in automatic speech re- [27] David R. Traum. 1994. A Computational Theory of Grounding in Natural Language cognition of dysarthric speakers and implications for speech-based applications Conversation. Ph.D. Dissertation. University of Rochester, Rochester, NY, USA. used by the elderly: A literature review. Assistive Technology 22 (2010), 99–112. [28] Maria Velana, Sascha Gruss, Georg Layher, et al. 2017. The SenseEmotion data- https://doi.org/10.1080/10400435.2010.483646 base: A multimodal database for the development and systematic validation of 17