Conversational Assistants for Elderly Users –
                 The Importance of Socially Cooperative Dialogue
                 Stefan Kopp                                    Mara Brandt                                 Hendrik Buschmeier
         Bielefeld University, CITEC                   Bielefeld University, CITEC                        Bielefeld University, CITEC
           skopp@uni-bielefeld.de                    mbrandt@techfak.uni-bielefeld.de                     hbuschme@uni-bielefeld.de

              Katharina Cyra                                 Farina Freigang                                    Nicole Krämer
        University of Duisburg-Essen                    Bielefeld University, CITEC                      University of Duisburg-Essen
         katharina.cyra@uni-due.de                   farina.freigang@uni-bielefeld.de                     nicole.kraemer@uni-due.de

              Franz Kummert                              Christiane Opfermann                                    Karola Pitsch
         Bielefeld University, CITEC                  University of Duisburg-Essen                       University of Duisburg-Essen
       franz@techfak.uni-bielefeld.de               christiane.opfermann@uni-due.de                       karola.pitsch@uni-due.de

           Lars Schillingmann                              Carolin Straßmann                                     Eduard Wall
         Bielefeld University, CITEC                   University of Duisburg-Essen                      Bielefeld University, CITEC
      lschilli@techfak.uni-bielefeld.de               carolin.strassmann@uni-due.de                     ewall@techfak.uni-bielefeld.de

                                                         Ramin Yaghoubzadeh
                                                       Bielefeld University, CITEC
                                                    ryaghoub@techfak.uni-bielefeld.de

ABSTRACT
Conversational agents can provide valuable cognitive and/or emo-
tional assistance to elderly users or people with cognitive impair-
ments who often have difficulties in organizing and following a
structured day schedule. Previous research showed that a virtual
assistant that can interact in spoken language would be a desir-
able help for those users. However, these user groups pose spe-
cific requirements for spoken dialogue interaction that existing
systems hardly meet. This paper presents work on a virtual conver-
sational assistant that was designed for, and together with, elderly
as well as cognitively handicapped users. It has been specifically
developed to enable ‘socially cooperative dialogue’ – adaptive and
aware conversational interaction in which mutual understanding is
co-constructed and ensured collaboratively. The technical approach
is described and results of evaluation studies are reported.

KEYWORDS                                                                      Figure 1: Concept for an elderly user interacting with the
                                                                              virtual assistant ‘Billie’ in her home environment.
conversational assistants; elderly users; cooperative dialogue

1   INTRODUCTION                                                              needing home assistance services, and cognitively handicapped
In recent years politics and society have placed emphasis on ways             users that are already supported by professional care-givers. What
to enable a longer autonomous and self-determined life for elderly            both user groups have in common are mild cognitive impairments
people. One approach is the development of assistive technology.              that create a need for support with autonomously organizing and
However, this has often been focused on supporting physical tasks             following a structured day schedule.
(e.g., fetching or lifting objects, moving around) and it has been               While technical means of supporting this are already available,
struggling with questions of human–machine interaction and user               many elderly users have little prior experience with using assistive
acceptance. The goal of the kompass project (which started in 2015)           systems. Applying such technology thus requires to overcome a
was to develop a virtual assistant (‘Billie’) to accompany and guide a        ‘digital barrier’ both with the individual users as well as their care-
user throughout the day. The system has been specifically designed            providing environment. The kompass project built on pre-studies
for, and together with, two user groups: elderly users that live              [13] suggesting that natural spoken-language interaction with a
autonomously in their home environment but are on the verge of                virtual agent may be desirable and acceptable for these user groups.


                                                                         10
Building and applying conversational agents for these user groups,            3 REQUIREMENTS
however, raises it’s own challenges. Elderly users often have select-
                                                                              3.1 Interactional tasks
ively impaired abilities, e.g., for auditory perception, articulation,
adapting to a recommended interaction style, adhering to a clean              The goal of the KOMPASS project was to develop a conversational
turn-taking structure, or comprehending content of high informa-              assistant that helps users organize and keep track of their schedule
tion density [35, 37]. We thus set out to develop a conversational            for the day. This goal was identified by our application partner
agent that provides a dialogue style that enables robust and reliable,        (v. Bodelschwingh Foundation Bethel, Bielefeld, Germany) as an
yet acceptable spoken-language interactions with these user groups.           important need of their respective client groups. The conversational
We refer to this special quality as ‘socially cooperative dialogue’           agent system ‘Billie’ thus offers several functions within the domain
[32].                                                                         of schedule management. Users can enter their various kinds of
   In this paper we present our approach and report on results                appointments (single appointments, recurrent appointments with
obtained in evaluation studies. After discussing related work in              specification of recurrence, e.g., weekly, biweekly and for how long
the next section, Sect. 3 points our requirements before Sect. 4 de-          the recurrent appointment will be reiterated), they can choose being
scribes our approach to modeling socially cooperative dialogue in             reminded of them including setting a time for the reminder, and they
the virtual assistant ‘Billie’. Section 5 presents results and lessons        can edit already entered appointments. The editing of appointments
learned from several evaluation studies carried out with users in             comprises the following sub-tasks: Users can change any of the
the lab environment as well as in their real home environment (on-            appointment values ‘start time’, ‘end time’, ‘topic’ and ‘duration’,
going), showing how conversational agents can be built to achieve             they can delete or replace appointments within the calendar, and
the interaction abilities needed to provide elderly users and mildly          they can query their entered appointments of any point in time
cognitively impaired persons with successful assistance.                      (same day, same week, forthcoming day or days and weeks, and
                                                                              previous days or weeks). Moreover, the agent system provides for
                                                                              user-tailored suggestions for leisure time activities [17, 18] in order
2   RELATED WORK                                                              to promote a more active life.
Several conversational assistants have been developed for care-
related settings such as companionship for people living alone [20],
assistance in multilingual care-giving/receiving [31], or pain and            3.2     Socially cooperative dialogue
affect management [28]. An increasing body of work suggests that              In line with other previous work, focus groups and pre-studies that
spoken interaction with users with cognitive impairments seems,               we carried out within a user-centered design process made clear
in general, to be feasible and accepted by the user group, though             that assistive functions of the system preferably should be accessible
some requirements need to be met. Meis [15] noted that older sub-             and realized through easy-to-use spoken dialogue interaction [13].
jects want a spoken-dialogue helper to have a name and to react               Further analyses of several quasi experimental studies (run in a
contingently to social affordances such as expressions of gratitude.          Wizard-of-Oz scenario in 2015, as a semi-autonomous study in
Miehle and colleagues [16] noted that conversational assistants are           2017, and as a long-term study ongoing; see section 5) pointed to
required to speak sufficiently loud and with an appropriate pace but          the fact that spoken-language interaction, while being generally
were accepted as interlocutors in a study with elderly people. Bick-          preferred, raises a number of (well-known) challenges that tend
more and colleagues [2] analyzed long-term interactions of older              to be amplified in our user groups. Therefore, the conversational
adults with an agent-based coaching system using spoken language              assistant has to fulfill specific requirements for the users to be
while user input was given through a touchscreen and found them               acceptable and successful with respect to the various sub-tasks of
effective on the short term. Sidner and colleagues [23] attempted to          schedule management:
identify preferred domains of conversation or joint activity based
on this system design. Yaghoubzadeh and colleagues [36] reported                    • The system has to be able to deal with long, extensive user
that older adults and people with cognitive impairments are able                      utterances. Interruptions and barge-ins by the user must be
to successfully ground information. Explicit confirmation patterns                    possible at all times, in particular when they are instrumental
and a low information density (one information unit per utterance)                    in solving the current communicative task at hand. Overall,
enabled the user to detect and repair more of the system’s language                   turn-taking has to be cooperative such that interruptions
understanding problems and subsequent errors.                                         by the system should be foreshadowed through nonverbal
   A well-discussed concept that is central for successful human–                     behavior [33]. Simultaneously, the system must be robust to
human dialogue is ‘grounding’ [6]. Researchers have attempted to                      non-cooperative turn-taking behavior of the user such that
model it computationally for conversational agents, discretely as                     turn fights are avoided (generally yielding the turn to the
a finite-state process [27] or probabilistically using Bayesian net-                  user).
works [19]. Recent work on real-time dialogue systems has focused                   • Generally, the system must work to ensure dynamic coordin-
on advanced issues so that discourse context is taken into account                    ation of understanding and grounding in dialogue. Feed-
[24], partial and overlapping utterances can be grounded incre-                       back by the system to user input must be provided timely
mentally [29], groundedness can be estimated from multimodal                          to prevent long user turns, and clearly mark the system’s
feedback cues [4], or information from multiple modalities related                    current level of understanding [8, 9]; user feedback must
to socio-emotional aspects such as attention and engagement are                       be continuously processed and interpreted for indicators of
taken into account [14].                                                              miscommunication.


                                                                         11
    • The handling of understanding problems on part of the sys-                                    Fusion and
                                                                                                  interpretation
                                                                                                                                   Dialog
                                                                                                                                 management
                                                                                                                                                                     Behavior
                                                                                                                                                                    generation
                                                                                                                                                                                                       Behavior
                                                                                                                                                                                                      realisation
      tem or the user is crucial. User-initiated displays of non-                Nuance Dragon          Natural                                                    Natural
                                                                                                       language                                                   language
      understanding (e.g., “Sorry” or “Can you repeat please”) must                  ASR
                                                                                                     understanding                 flexdiam                      generation                           Asap
                                                                                                                                                                                                                    !
                                                                                                                              – Issues                                                               Realiser
      always be possible and handled by the system properly. For                   Tobii 4C                                   – Planning
                                                                                                                                                                           Gaze
                                                                                  Eyetracking                                 – flow_ctl
      non-specific system displays of non-understanding, a vari-                                      Estimation of
                                                                                                                              – User model
                                                                                                                              –…
                                                                                  Head-gesture
      ation of error handling strategies must be available, e.g.,                  recognition
                                                                                                       contact and
                                                                                                      engagement
                                                                                                                                                                   Gesture

      in form of reprompts and non-understanding notification,                    Filled pause                                                                                                      Calendar
                                                                                                                                                                      Calendar
                                                                                   detection                                                                                                        manager
      combined with more restrictive clarification sequences de-
      pending on the sub-task and local move. While reprompts
      may be beneficial for problem solving in its first issuing [17],                                                                              Timeboard
      a lack of progress in problem solving without a change in                                     asrstate       sil…               speech                                  silence                        spe…
      error recovery strategy may lead to further complications                                   userWords               Billie, I’d like to add a new a…                                                   On …
                                                                                                   userGaze        ?                               agent                                calendar             ag…
      (cf. [17]). This holds especially in action contexts like ap-                                    floor                                                                                       yielded
      pointment suggestions in which user responses can address                                  agentWords                                                     Okay, tell me about your ap…
                                                                                                 nlgRequest                                                  introduce_topic(abs)
      social matters like willingness, availability, disposition and                             agentGaze         idle                attentive                          speaking                   attentive
                                                                                                     …
      deontic authority. Thus, error handling on part of the system
      should employ strategies that clarify the user’s agreement
      or resistance [18].                                                      Figure 2: Overall architecture of the conversational assistant
    • Topic shifts by the user must be possible and followed readily
      by the system. This requires the system to also keep track
      of non-settled discourse segments and to return to them at               realization (synthesis, graphical components / GUI changes, control
      appropriate points in time (when the user does not pursue                of animated characters etc.).
      other discourse goals) and with a cooperative and gentle                    The architecture is built around a ‘timeboard’, a central rep-
      entrance strategy, possibly with repeating and rectifying                resentation that captures temporal information of interactional
      parts of the discourse unit that have been discussed already.            events on different tiers. Importantly, these tiers hold rewindable
      Generally, the system should avoid topic shifts. If they are             representations of certain and uncertain variables (probability dis-
      unavoidable, e.g., because a previous discourse topic has not            tributions) with generic metrics – like entropy – that serve as the
      been settled yet, they should be of as close a distance as               basis for local decision heuristics. Event-driven observers are used
      possible and must be marked explicitly.                                  to derive events from interval relations between existing ones, and
                                                                               trigger higher-level functions, most centrally the dialogue manager
   The requirements listed above are (necessary, but most probably
                                                                               proper, but also a contribution manager, which schedules queued
not sufficient) examples of a specific dialogue quality that we deem
                                                                               communicative intentions when the floor situation allows.
necessary for our user groups. We refer to this dialogue quality as
‘socially cooperative’ [32] and note that it goes well beyond classical
                                                                               4.2       Dialogue management
notions like grounding in dialogue, as it implies a specific role that
the agent has to fulfill consistently throughout the interaction. This         To realize the required dialogue management abilities, the ‘flexdiam’
role entails a range of collaborative-supportive action policies, e.g.,        [34] system has been developed. Following an issue-based approach,
for readily following topic shifts, yielding turns, adhering to rules          it generally pursues a single joint task and discourse model for both
of politeness, and adapting dialogue structures thoroughly to the              interactants. The basic structure of the joint task and discourse
needs of the user.                                                             model is a forest of independent but hierarchically interdependent
                                                                               agents termed ‘Issues’, as well as generic update rules to transform
4 APPROACH AND IMPLEMENTATION                                                  this forest after dialogue management invocations. When an Issue
                                                                               is instantiated, it is at the same time made a child of the Issue that
4.1 Architecture                                                               created it. Any path from a leaf Issue to the root corresponds to a
To account for the requirements identified above, we have based                nested (sub-)topic of discussion. Any number of topics can be active
the conversational assistant on an interaction framework that aims             at any one time and will be considered valid points of reference in
to support the features of incrementality (to quickly update and               parallel, if applicable according to their grounding state. To that
relay discussed information), provisions for representation and res-           end, any Issue can be in one of five states (new, entered, fulfilled,
olution of uncertainty (resulting from input and unclear grounding)            failed, obsolete).
with explicit representation of topics, structured hierarchically in               Invocations that trigger local processing in Issues come in two
units intuitive to laymen. The overall architecture is shown in fig. 2.        flavors: input handling (e.g., prompt request, NLU parse) and plan
It is built on top of the IPAACA middleware [21], a distributed,               structure updates (e.g., child issue progressed, completed or inval-
platform-independent implementation of a general model for in-                 idated). Issues will decide along their local path in the hierarchy,
cremental dialogue processing proposed by Schlangen and Skantze                and based on the current global context, whether they can provide
[22]. This provides the back-end for the connection of the core                a plan to handle an invocation. If an Issue cannot handle an input
dialogue management components to input (including ASR, tagger                 handling invocation locally, a preference is marked to let its parent
and parser, eye tracker, keyboard/mouse/touch etc.), multimodal                handle it instead. Partial localized processing does not preclude
fusion, behavior planning (NLG, gaze, gesture, calender) and output            propagation through the hierarchy, though. This allows for situated


                                                                          12
partial interpretation and processing, which is most specific and
situation-dependent in the leaves, and most generic and general in
the roots of the forest.
   If a user contribution does not fit well into any active Issue, a
discourse transition based on user initiative can be assumed to have
taken place. Depending on the situation, this could be construed
as either a forward-looking contribution (if anticipated by the cur-
rently invoked entrance point or a direct ancestor) or a real topic
jump. A new branch is created then, marked as entered, and moved
to the top of the entry point priority queue.
   Note that the present system is suited to quick, interactive ap-
proaches to spoken interaction and to modeling real-world applica-
tions within limited domains. Manual extension is quite straight-
forward. Incremental processing and the handling of uncertain
input and information derived from it has received special focus,
the ‘output’ side employs a similar notion of indeterminate state
until evidence for communicative success provides a precondition
for grounding being attested. Communicative plans are capable of
employing several modalities and the implemented suite of basic
Issues for grounding problems can be fine-tuned to cover a wide
space of varying explicitness, verbosity, and conversational styles,
which can be used to seed user models that best suit the estim-
ated capabilities and preferences of our specific user groups. This
extends to information density (configurable via different options
for packaging and different approaches to confirmation requests),               Figure 3: Screenshot of the system processing and fusing
but also discourse structure: explicit ratification for topic jumps             socio-communicative signals.
beyond a distance threshold (and implicit acceptance by means of
contingent continuation by the user) are currently in development.
                                                                                4.4    Multimodal expressiveness
                                                                                Regarding output behavior, the conversational assistant ‘Billie’ is
4.3    Socio-communicative signal processing                                    enhanced with multimodal cues such as gestures, facial expressions
Human communication is highly multi-modal and thus the ability                  and head movements in order to develop a more natural behavior
to process this variety of information is very important to facilitate          and making it easily accessible, understandable and helpful for
communication with a virtual agent. Therefore, several modules                  the human user. The cues were selected based on an analysis of
in our architecture recognize visual communication signals and                  their form, function and frequency in natural interaction data [11].
non-verbal socio-emotional speech cues. Confirmations play an                   Specifically, we simulate cues that serve the pragmatic functions to
important role in the dialog structure of the interaction with ‘Billie’.        convey and mark emphasis, de-emphasis and (un-)certainty of the
Therefore, we focused on the recognition of natural confirmation                speaker. These functions have been successfully mapped onto the
signals, like nodding and non-lexical confirmations like “mhm”,                 conversational agent [10, 12]. For the calendar domain, key phrases
which are typically not recognized by automatic speech recognition.             were selected and are accompanied by these multimodal functions if
To detect non-lexical confirmations, the speech signal is segmen-               applicable (emphasis: “Let’s continue!”, de-emphasis: “Good, this is
ted into speech intervals using voice activity detection (VAD). Our             canceled”, uncertainty: “Did you say ‘swimming’?”; see fig. 4). The
module detects non-lexical confirmations by extracting acoustic                 multimodal expressiveness has been evaluated with the participants
features and classifying the result using a Support Vector Machine              in lab-based studies (e.g., showing better information uptake) and
(SVM). If confirmations are detected, the component sends mes-                  will be evaluated systematically during a long-term study.
sages via the IPAACA middleware to inform other components [3].                    Additionally to the assistant ‘Billie’ itself, we designed different
Further, the system is able to detect human head nods based on                  states, visual and sound features of the weekly-based calendar that
dynamic time warping and estimations of head pose angles from                   support the dialogue between user and agent multimodally. As
facial landmark features [30]. In addition, the face detection is used          previous analyses have shown [8], users orient toward the calendar
to verify contact with the user. A cue aggregation module combines              area of the interface while entering appointments as this is the
signals detected in individual modalities to derive a higher-level              area of interest for the ongoing interactional task. So, to design
interpretation using a Bayesian network (see Fig. 3). Currently, the            a responsive and recipient system that supports the dialogue not
system detects if the user signals confirmation by combining non-               only with verbal and non-verbal features of the agent, the calendar
lexical confirmation and nod detection. User contact is detected by             provides visual cues (mainly through highlighting) to comprehen-
combining face detection and eye contact related cues.                          sion hypotheses of the system even before words are uttered by
                                                                                the agent (see fig. 5). These visual updates represent the system’s


                                                                           13
   Emphasizing          De-emphasizing            Uncertainty


  abstract deictic           brushing         palm up open hand

Figure 4: Multimodal cues used by the conversational assist-                  Figure 6: Example of the data recorded during the field eval-
ant for different pragmatic functions.                                        uation study.


                                                                              5     EVALUATION RESULTS
                                                                              In our project we followed an iterative design-implement-evaluation
                                                                              approach that comprised a number of empirical evaluations of in-
                                                                              dividual sub-systems of the agent [3, 5, 8–10, 17, 18, 25, 26, 30].
                                                                              Further, to gather training data for the socio-emotional signal re-
                                                                              cognition components and to get insights into users’ reactions
                                                                              to potential interaction problems, an initial Wizard-of-Oz study
                                                                              with 53 participants from the different user groups (18 senior parti-
                                                                              cipants, 19 participants with cognitive impairments and, 16 controls
                                                                              from the local student population) was carried out. In order to elicit
                                                                              participants’ reactions, the agent’s behavior in this study followed
                                                                              a script that was designed to create a number of typical interaction
                                                                              problems. Participants could negotiate and enter their own appoint-
                                                                              ments, but could also use previously prepared appointment cards.
Figure 5: The calendar including a visual cue (highlighting                   In the following, we report more recent studies that were carried
of the current day), an appointment “Coffee and Cake” and                     out to evaluate the socially cooperative dialogue abilities of the
the conversational assistant confirming the successful entry                  full-blown conversational assistant in less restricted interactions.
of the appointment.
                                                                              5.1    Lab-based evaluation
                                                                              Based on first insights gathered from the above described WOz-
                                                                              study as well as on knowledge acquired in preparatory studies
status regarding the understanding of user input. Furthermore, a              [13, 35], a first version of a semi-autonomous agent was evaluated
sound was added to mark the successful entry of appointments.                 in a laboratory setting. This study investigated whether participants
                                                                              from the different user groups – without specific instructions –
                                                                              were able to carry out calendar-related tasks through spoken in-
4.5    Data Recording                                                         teraction with the agent. Furthermore, the study’s objective was
Our system architecture is designed to be used in both lab and field          to investigate how participants manage transitions between bigger
environments. As manual recording control is not feasible in long             topics/issues, how the length of participants’ utterances vary given
term field studies, the architecture contains an automatic recording          different confirmation strategies, how users react to communication
module which starts recording when the user starts interacting with           of uncertainty, and whether the agent’s ways of guiding the users’
the system by pushing a button. Recording is automatically stopped            attention (via voice, manual gesture, gaze, calendar highlighting,
if the user says goodbye and, in order to ensure users’ privacy, when         and sounds) is effective.
the user is not visible to the system and does not react to system               We employed a system that autonomously handled dialogue
prompts for an extended period of time, or when a system error is             management for entering appointments as well as for stating ap-
detected. The recordings comprise five video and five audio tracks            pointment suggestions. A human ‘wizard’ was included only for
which are compressed in real-time using hardware acceleration.                controlling transitions between global modes of user entering ap-
One video track (depicted in Fig. 6) is a 4-in-1 overview of the other        pointments, auto-generated partial suggestions of appointments
video tracks. System and interaction log files are archived after each        by the agent (“Would you like to do something on Saturday?”), and
session.                                                                      closing the interaction. 44 participants took part in the study: 19


                                                                         14
older adults (SEN) aged about 75+; 15 cognitively impaired adults              the prototype system is placed into the participants’ apartment for
(CIM) of working age; and 10 students serving as control group                 a period of about 15 days (including setup and dismounting day).
(CTL). The task was free-form entering of appointments. All sub-               Participants are asked to manage their daily schedule together with
jects managed to enter the required number of appointments into                ”Billie” and to jot down their impressions about the system into a
the calendar. The number of final entries averaged 10.4, 8.5, and              research diary (freely as well as to structured questions). After the
8.9 for CTL, SEN and CIM, respectively (including up to two agent-             period of applying the system, participants give a final rating in
recommended items). Older adults and the group with impairments                the form of a semi-structured interview.
on average spent about 20% longer on a topic than controls; some
participants from the CIM group made long hesitations in isolated                 First results. The above described study design has been carried
instances (up to tens of seconds; see [34] for a detailed discussion).         out with one female senior person as the first long-term study. The
Still, the socially cooperative dialogue abilities of the agent enabled        system has been in use in the participant’s home for 13 days (exclud-
the user to conduct successful repair with the agent and to settle in          ing a setup and a dismounting day), during which she interacted
on acceptable solutions in every subtask.                                      61 times with the system, for a total duration of 284 minutes and
    The study was also conducted to gain qualitative insight into              46 seconds. She used the system for a mean number of 4.8 times
the repair, revision and meta-communicative patterns exhibited by              per day (SD = 1.7, Min = 2, Max = 8). Although usage over the
the user groups. Further, we wanted to observe temporal aspects                duration of the study varied between days, it did not differ much
of the planning and verbalizing of an appointment to learn about               between the first seven days (M = 5.3SD = 1.8, Min = 3, Max = 8)
participants’ practices in different phases of appointment entry               and the last six days (Mean = 4.2, SD = 1.6, Min = 2, Max = 6),
(to eventually design the timing and turn taking of the system).               suggesting that the participant did not lose interest.
Analyses of the data are still ongoing and focus among others (1)                 Interaction durations varied greatly, with the shortest interaction
on the analysis of gaze when interactional trouble occurs, (2) on              lasting only 6 seconds and the longest interaction lasting 18 minutes
the system’s repair strategies and (3) on the different multimodal             and 34 seconds. Mean interaction duration is 4:26 minutes (SD =
states and uptake strategies of the system (as a representation of             5:11). This is mainly due to the fact that interactions differ due
its recipiency) with effects on turn production of the users.                  to the type of activity. Interactions that are initiated by the user
                                                                               are typically longer, whereas agent-initiated reminder interaction
                                                                               can often be handled quickly and usually do not lead to longer
5.2    Field study                                                             dialogues. Pending a detailed analysis of the actual interaction
A long-term field study is currently ongoing to evaluate the sys-              logs, we differentiate between reminder-based interactions and
tem’s performance and effects, as well as how participants adopt               other interactions by setting a threshold of 120 seconds. The 33
and handle it in their home environment over a 15-day period. For              shorter – probably reminder – interactions have a mean duration
this study we implemented and apply a fully autonomous system                  of 45 seconds (SD = 0:33). The 28 longer interactions have a mean
with no additional aids by a human wizard (see fig. 7 for the setup).          duration of 8:57 minutes (SD =4:37). This differentiation gives
This study comprises an ethnographic component focusing on daily               further insight into daily usage of the system: A mean of 2.5 (SD =
life management in the homes of seniors living alone and people                1.7) of the interactions per day were reminders, and a mean of 2.2
with cognitive impairments in supported-living [1, 7]. These ana-              (SD = 1.0) of the interactions lasted longer than 120 seconds.
lyses aim to shed light also on broader questions such as: What are               The durations of the interactions indicate that the participant
the challenges when a novel technology is brought into a household             used the conversational assistant quite a lot. After the 13 days of
of the target groups? What are the participants’ experiences and               usage she had entered 67 unique events in her calender, five of
expectations of a technological assistant? What are the effects of             which were serial events (yielding a total of 132 events displayed
the assistant on participants’ daily routines and in particular their          in the calendar). The large number of reminder-based interactions
schedule management? And, an overall topic considered through-                 (33) also indicates that she successfully used this function of the
out the project, what are issues with regard to privacy and data               system.
protection with our special user groups?                                          As the first main study has just ended, further analyses regarding
    Overall, the field study followed an iterative approach consisting         the changes of daily life and routines are still due. However, the
of three phases. In the pre-pilot study, we aim to acquaint parti-             pre-pilot and pilot study already has been carried out with other
cipants with the system and to identify their specific expectations            elderly participants who could interact with the prototype system
and needs. Researchers lead semi-structured interviews in the apart-           for three days at home. Different assessments or concerns were
ments of the participants and discuss the possible placement of the            gathered from these participants as anecdotal feedback:
system in the apartment using a full-size paper prototype. Next, a                 • Concerning the social presence and relationship: “We greet
pilot study evaluates the feasibility of the main study with a special               each other kindly every morning and he is asking what he
focus on the robustness of the system and its performance outside                    can do for me. It’s great.“
of the lab. Therefore the prototype system is set up in the apart-                 • Concerning the dialogue: “Whenever we had problems in a
ments of the participants for a period of about 48 hours. In addition                conversation, we could resolve them. Was really nice.”
to the system evaluation, we want to learn about the acceptance                    • After coming out of coma and being isolated in hospital:
of the system by the participants and their assessment of the dia-                   “I would have been glad about having such a guy next to
logue design. This provides the basis for further optimizations of                   my bed.” (in order to practice speaking and to have social
the dialogue design and preparation of the main study, for which                     interactions)


                                                                          15
                                                                             the design and implementation phases, which also helped a lot in
                                                                             increasing acceptance and willingness to participate in the project.

                                                                             ACKNOWLEDGMENTS
                                                                             This research was supported by the German Research Foundation
                                                                             (DFG) in the Cluster of Excellence ‘Cognitive Interaction Techno-
                                                                             logy’(EXC 277) and by the German Federal Ministry of Education
                                                                             and Research (BMBF) in the project ‘KOMPASS’ (FKZ 16SV7271K).

                                                                             REFERENCES
                                                                              [1] Antje Amrhein, Katharina Cyra, and Karola Pitsch. 2016. Processes of reminding
                                                                                  and requesting in supporting people with special needs. Human practices as basis
                                                                                  for modeling a virtual assistant?. In Proceedings 1st ECAI Workshop on Ethics in
                                                                                  the Design of Intelligent Agents. The Hague, The Netherlands, 18–23.
                                                                              [2] Timothy W. Bickmore, Rebecca A. Silliman, Kerrie Nelson, Debbie M. Cheng,
                                                                                  Michael Winter, Lori Henault, and Michael K. Paasche-Orlow. 2013. A randomized
                                                                                  controlled trial of an automated exercise coach for older adults. Journal of the
                                                                                  American Geriatrics Society 61 (2013), 1676–1683. https://doi.org/10.1111/jgs.12449
                                                                              [3] Mara Brandt, Britta Wrede, Franz Kummert, and Lars Schillingmann. 2017. Con-
                                                                                  firmation detection in human-agent interaction using non-lexical speech cues.
                                                                                  Presented at the AAAI Fall Symposium on Natural Communication for Human-
                                                                                  Robot Collaboration. (2017).
                                                                              [4] Hendrik Buschmeier and Stefan Kopp. 2013. Co-constructing grounded symbols—
                                                                                  Feedback and incremental adaptation in human–agent dialogue. Künstliche
                                                                                  Intelligenz 27 (2013), 137–143. https://doi.org/10.1007/s13218-013-0241-8
                                                                              [5] Hendrik Buschmeier and Stefan Kopp. 2018. Communicative listener feedback
                                                                                  in human–agent interaction: artificial speakers need to be attentive and adaptive.
Figure 7: The prototype system deployed in an elderly parti-                      In Proceedings of the 17th International Conference on Autonomous Agents and
cipant’s apartment.                                                               Multiagent Systems. Stockholm, Sweden.
                                                                              [6] Herbert H. Clark. 1996. Using Language. Cambridge University Press, Cambridge,
                                                                                  UK. https://doi.org/10.1017/CBO9780511620539
                                                                              [7] Katharina Cyra, Antje Amrhein, and Karola Pitsch. 2016. Fallstudien zur All-
    • Concerning the size of the current setup and affordability:                 tagsrelevanz von Zeit- und Kalenderkonzepten. In Mensch und Computer 2016
      “Who wants to have this in the living room? Me not and I                    Kurzbeiträge. Aachen, Germany, 1–5. https://doi.org/10.18420/muc2016-mci-0253
                                                                              [8] Katharina Cyra and Karola Pitsch. 2017. Dealing with long utterances: How
      wouldn’t buy it either, if it’s a lot of money.”                            to interrupt the user in a socially acceptable manner?. In Proceedings of the
    • Concerning the duration of acquaintance: “He doesn’t know                   5th International Conference on Human Agent Interaction. Bielefeld, Germany,
      anything about me. If that’s possible, one has to use such a                341–345. https://doi.org/10.1145/3125739.3132586
                                                                              [9] Katharina Cyra and Karola Pitsch. 2017. Dealing with ‘long turns‘ produced
      device for at least half a year to get used to it.”                         by users of an assistive system: How missing uptake and recipiency lead to
                                                                                  turn increments. In Proceedings of the 26th IEEE International Symposium on
                                                                                  Robot and Human Interactive Communication. Lisbon, Portugal, 329–334. https:
6   CONCLUSIONS                                                                   //doi.org/10.1109/ROMAN.2017.8172322
The present work has explored how conversational agents can be               [10] Farina Freigang, Sören Klett, and Stefan Kopp. 2017. Pragmatic multimodality:
                                                                                  Effects of nonverbal cues of focus and certainty in a virtual human. In Proceedings
used to provide cognitive or emotional assistance to elderly users.               of the 17th International Conference on Intelligent Virtual Agents. Stockholm,
We have focused in particular on the use of spoken language dia-                  Sweden, 142–155. https://doi.org/10.1007/978-3-319-67401-8_16
logue as a preferred way of interacting with technical systems (as           [11] Farina Freigang and Stefan Kopp. 2015. Analysing the modifying functions of
                                                                                  gesture in multimodal utterances. In Proceedings of the 4th Conference on Gesture
indicated by studies by others as well as ourselves). Yet, enabling               and Speech in Interaction (GESPIN). Nantes, France, 107–112.
successful and acceptable dialogue with these user groups raises             [12] Farina Freigang and Stefan Kopp. 2016. This is what’s important—using speech
                                                                                  and gesture to create focus in multimodal utterance. In Proceedings of the 16th
several challenges and communication problems abound quickly                      International Conference on Intelligent Virtual Agents. Los Angeles, CA, USA,
with off-the-shelf dialogue system technology. However, our find-                 96–109. https://doi.org/10.1007/978-3-319-47665-0_9
ings indicate that virtual assistants can still be an effective and          [13] Marcel Kramer, Ramin Yaghoubzadeh, Stefan Kopp, and Karola Pitsch. 2013. A
                                                                                  conversational virtual human as autonomous assistant for elderly and cognitively
acceptable help if they provide abilities for the kind of socially                impaired users? Social acceptability and design considerations. In Proceedings of
cooperative dialogue needed to resolve these issues. The key in-                  INFORMATIK 2013. Koblenz, Germany, 1105–1119.
sight of the present project is how conversational agents can be             [14] Gregor Mehlmann, Kathrin Janowski, and Elisabeth André. 2016. Modeling
                                                                                  grounding for interactive social companions. Künstliche Intelligenz 30 (2016),
built such that this is possible for the majority of issues, even for             45–52. https://doi.org/10.1007/s13218-015-0397-5
the special user groups of persons with a mild cognitive impair-             [15] Markus Meis. 2013. Nutzerzentrierte Entwicklung eines Erinnerungsas-
                                                                                  sistenten. Presented at Abschlusssymposium Niedersächsischer Forschungs-
ment (and often also additional motoric or perceptual handicaps).                 verbund Gestaltung altersgerechter Lebenswelten. (2013).              https://www.
This requires numerous things, from the processing of subtle, mul-                altersgerechte-lebenswelten.de/
timodal and context-dependent communication-relevant signals,                [16] Juliana Miehle, Ilker Bagci, Wolfgang Minker, and Stefan Ultes. 2017. A so-
                                                                                  cial companion and conversation partner for elderly. In Proceedings of the 8th
to generating them in combination with visual cues (calender), to                 International Workshop On Spoken Dialogue Systems. Farmington, PA, USA.
enabling a highly flexible dialogue with responsive turn-taking,             [17] Christiane Opfermann and Karola Pitsch. 2017. Reprompts as error handling
communicative feedback, and (pro-)active strategies for avoiding                  strategy in human–agent-dialog? User responses to a system’s display of non-
                                                                                  understanding. In Proceedings of the 26th IEEE International Symposium on Robot
communication problems as well as repairing them. One prerequis-                  and Human Interactive Communication. Lisbon, Portugal, 310–316. https://doi.
ite to achieve this was a high degree of user involvement throughout              org/10.1109/ROMAN.2017.8172319


                                                                        16
[18] Christiane Opfermann, Karola Pitsch, Ramin Yaghoubzadeh, and Stefan Kopp.                           an automatic pain- and emotion-recognition system. In Proceedings of the 4th
     2017. The communicative activity of ‘making suggestions’ as an interactional                        IAPR TC 9 Workshop on Pattern Recognition of Social Signals in Human-Computer-
     process: Towards a dialog model for HAI. In Proceedings of the 5th International                    Interaction. Cancun, Mexico, 127–139. https://doi.org/10.1007/978-3-319-59259-6
     Conference on Human Agent Interaction. Bielefeld, Germany, 161–170. https:                     [29] Thomas Visser, David R. Traum, David DeVault, and Rieks op den Akker. 2014.
     //doi.org/10.1145/3125739.3125752                                                                   A model for incremental grounding in spoken dialogue systems. Journal on Mul-
[19] Tim Paek and Eric Horvitz. 2000. Conversation as action under uncertainty. In                       timodal User Interfaces 8 (2014), 61–73. https://doi.org/10.1007/s12193-013-0147-7
     Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence. Stanford,        [30] Eduard Wall, Lars Schillingmann, and Franz Kummert. 2017. Online nod detec-
     CA, USA, 455–464.                                                                                   tion in human–robot interaction. In Proceedings of the 26th IEEE International
[20] Lazlo Ring, Lin Shi, Kathleen Totzke, and Timothy Bickmore. 2014. Social                            Symposium on Robot and Human Interactive Communication. Lisbon, Portugal,
     support agents for older adults: longitudinal affective computing in the home.                      811–817. https://doi.org/10.1109/ROMAN.2017.8172396
     Journal on Multimodal User Interfaces 9 (2014), 79–88. https://doi.org/10.1007/                [31] Leo Wanner, Elisabeth André, Josep Blat, et al. 2017. KRISTINA: A knowledge-
     s12193-014-0157-0                                                                                   based virtual conversation agent. In Proceedings of the 15th International Confer-
[21] David Schlangen, Timo Baumann, Hendrik Buschmeier, Okko Buß, Stefan Kopp,                           ence on Practical Applications of Agents and Multi-Agent Systems. Porto, Portugal,
     Gabriel Skantze, and Ramin Yaghoubzadeh. 2010. Middleware for incremental                           284–295. https://doi.org/10.1007/978-3-319-59930-4_23
     processing in conversational agents. In Proceedings of the 11th Annual Meeting of              [32] Ramin Yaghoubzadeh, Hendrik Buschmeier, and Stefan Kopp. 2015. Socially
     the Special Interest Group in Discourse and Dialogue. Tokyo, Japan, 51–54.                          cooperative behavior for artificial companions for elderly and cognitively im-
[22] David Schlangen and Gabriel Skantze. 2011. A general, abstract model of                             paired people. In Proceedings of the 1st International Symposium on Companion-
     incremental dialogue processing. Dialogue and Discourse 2 (2011), 83–111.                           Technology. Ulm, Germany, 15–19.
     https://doi.org/10.5087/dad.2011.105                                                           [33] Ramin Yaghoubzadeh and Stefan Kopp. 2016. Towards graceful turn management
[23] Candace Sidner, Timothy Bickmore, Charles Rich, Barbara Barry, Lazlo Ring,                          in human-agent interaction for people with cognitive impairments. In Proceedings
     Morteza Behrooz, and Mohammad Shayganfar. 2013. Demonstration of an always-                         of the 7th Workshop on Speech and Language Processing for Assistive Technologies.
     on companion for isolated older adults. In Proceedings of the 14th Annual Meeting                   San Francisco, CA, USA, 26–31.
     of the Special Interest Group on Discourse and Dialogue. Metz, France, 148–150.                [34] Ramin Yaghoubzadeh and Stefan Kopp. 2017. Enabling robust and fluid spoken
[24] Gabriel Skantze. 2007. Error Handling in Spoken Dialogue Systems. Managing                          dialogue with cognitively impaired users. In Proceedings of the 18th Annual
     Uncertainty, Grounding and Miscommunication. Ph.D. Dissertation. Computer                           Meeting of the Special Interest Group on Discourse and Dialogue. Saarbrücken,
     Science and Communication, Department of Speech, Music and Hearing, KTH                             Germany, 273–283.
     Stockholm, Stockholm, Sweden.                                                                  [35] Ramin Yaghoubzadeh, Marcel Kramer, Karola Pitsch, and Stefan Kopp. 2013.
[25] Carolin Straßmann and Nicole C. Krämer. 2017. A categorization of virtual                           Virtual agents as daily assistants for elderly or cognitively impaired people.
     agent appearances and a qualitative study on age-related user preferences. In                       In Proceedings of the 13th International Conference on Intelligent Virtual Agents.
     Proceedings of the 17th International Conference on Intelligent Virtual Agents.                     Edinburgh, United Kingdom, 79–91. https://doi.org/10.1007/978-3-642-40415-3_7
     Stockholm, Sweden, 413–422. https://doi.org/10.1007/978-3-319-67401-8_51                       [36] Ramin Yaghoubzadeh, Karola Pitsch, and Stefan Kopp. 2015. Adaptive ground-
[26] Carolin Straßmann, Astrid Rosenthal von der Pütten, Ramin Yaghoubzadeh,                             ing and dialogue management for autonomous conversational assistants for
     Raffael Kaminski, and Nicole Krämer. 2016. The effect of an intelligent virtual                     elderly users. In Proceedings of the 15th International Conference on Intelli-
     agent’s nonverbal behavior with regard to dominance and cooperativity. In                           gent Virtual Agents. Delft, The Netherlands, 28–38. https://doi.org/10.1007/
     Proceedings of the 16th International Conference on Intelligent Virtual Agents. Los                 978-3-319-21996-7_3
     Angeles, CA, USA, 15–28. https://doi.org/10.1007/978-3-319-47665-0_2                           [37] Victoria Young and Alex Mihailidis. 2010. Difficulties in automatic speech re-
[27] David R. Traum. 1994. A Computational Theory of Grounding in Natural Language                       cognition of dysarthric speakers and implications for speech-based applications
     Conversation. Ph.D. Dissertation. University of Rochester, Rochester, NY, USA.                      used by the elderly: A literature review. Assistive Technology 22 (2010), 99–112.
[28] Maria Velana, Sascha Gruss, Georg Layher, et al. 2017. The SenseEmotion data-                       https://doi.org/10.1080/10400435.2010.483646
     base: A multimodal database for the development and systematic validation of


                                                                                               17