Crowdsourcing for Reminiscence Chatbot Design

Svetlana Nikitina, University of Trento and Tomsk Polytechnic University, svetlana.nikitina@unitn.it
Florian Daniel, Politecnico di Milano, DEIB, florian.daniel@polimi.it
Marcos Baez and Fabio Casati, University of Trento and Tomsk Polytechnic University, baez@disi.unitn.it, fabio.casati@unitn.it
Georgy Kopanitsa, Tomsk Polytechnic University, kopanitsa@tpu.ru


Copyright © 2018 for this paper by its authors. Copying permitted for private and academic purposes.

Abstract

In this work-in-progress paper we discuss the challenges in identifying effective and scalable crowd-based strategies for designing content, conversation logic, and meaningful metrics for a reminiscence chatbot targeted at older adults. We formalize the problem and outline the main research questions that drive the research agenda in chatbot design for reminiscence and for relational agents for older adults in general.

Context & Objectives

Reminiscence is the process of collecting and recalling past memories through pictures, stories and other mementos (Webster and Gould 2007). The practice of reminiscence has well documented benefits on social, mental and emotional wellbeing (Subramaniam and Woods 2012; Huldtgren et al. 2015), making it a very desirable practice, especially for older adults. Research on technology-mediated reminiscence has advanced our understanding of how to effectively support this process, but has reached a limit in terms of the approaches to support more engaging reminiscence sessions, effectively elicit information about the person, and extend the practice of reminiscence to those with fewer opportunities for face-to-face interactions.

In our previous work (Nikitina, Callaioli, and Baez 2018) we made a case for conversational agents in this domain, and proposed the concept of a smart conversational agent that can drive personal and social reminiscence sessions with older adults in a way that is engaging and fun, while effectively collecting and organising memories and stories. The idea of conversational agents for older adults is not new, and they have been explored to support a wide variety of activities and everyday tasks (Tsiourti et al. 2016a; Vardoulakis et al. 2012; Hanke et al. 2016; Tsiourti et al. 2016b), to act as social companions (Ring et al. 2013; 2015; Demiris et al. 2016) and even to engage older adults in reminiscence sessions (Fuketa, Morita, and Aoe 2013).

While these works give us valuable insights into the opportunities of using conversational agents as an instrument to support reminiscence sessions, they also show us how limited our knowledge is in terms of effective strategies to maintain dialogs with older adults. Success stories are mostly limited to Wizard of Oz evaluations (Schlögl, Doherty, and Luz 2014), in which system functionality is partially emulated by a human operator, or based on fully human-operated agents. The few attempts at autonomous agents highlight issues with the mismatch between user expectations and the actual social capabilities of the agents (Tsiourti et al. 2016a), general challenges with designing conversations suitable to the target population (Yaghoubzadeh, Pitsch, and Kopp 2015), and challenges with engaging older adults in question-based interactions in particular (Fuketa, Morita, and Aoe 2013).

In this position paper we aim at identifying effective and scalable crowd-based strategies for designing content, conversation rules, and meaningful metrics for a reminiscence chatbot targeted at older adults. We build on the concept introduced in (Nikitina, Callaioli, and Baez 2018) and identify where and how crowdsourcing can help design and maintain an agent-mediated reminiscence process, while addressing the specific challenges posed by the target population.

Reminiscence Chatbot

The envisioned chatbot is based on the idea of automatically guiding older adults through multimedia reminiscence sessions (Nikitina, Callaioli, and Baez 2018). It has the dual purpose of i) collecting and organising memories and profile information, and ii) engaging older adults in conversations that are stimulating and fun. In Figure 1 we show an example conversation and related main actions.

The example starts from the subject (the elder) providing a memory in the form of a picture. In response, the chatbot poses a contextual question. In order to do so, it must be able to understand the theme of the picture (big city) and to extract and understand information from pictures and text. In order to keep the conversation natural, it must further be able to reference related conversation topics (the city of Trento) and, in order to show empathy, it must be able to sense the feelings of the subject as the conversation evolves (e.g., it looks like the subject likes rock music, so it could be an idea to talk about that for some time). It would also be good if the bot were able to sense the presence of peers (e.g., family
members or moderators helping with the chat). All this information helps the bot decide on appropriate next actions taking into account possible conversational goals (e.g., elicit basic user profile data). Among the most complex decisions to be taken is deciding if and when to change context in a conversation (e.g., to make the elder laugh).

Figure 1: Example reminiscence session with bot actions
  Pictures shared: Skyline; Married in Trento
  Conversation:
    Bot:  Nice picture! It looks like a big city. Where was it taken?
    User: It was taken in Chicago
    Bot:  That’s far away from Trento! Were you visiting Chicago?
    User: No, actually I was living there. I even got married in Chicago.
    Bot:  How long did you live there?
    User: Moved back in the 80s, stayed 10 years…
    Bot:  Did you listen to rock back then?
    User: All the time!
  Main actions:
    Showing understanding: posing comments that show understanding of the content and the user input, e.g., formulating a question based on the picture.
    Eliciting questions: posing questions about the life of the person, using the pictures as triggers (e.g., “Were you visiting Chicago?”).
    User engagement inquiry: posing questions to check the user engagement, and testing potential topics to address next.
    Bringing up content: bringing up multimedia material that will help elicit information (e.g., rock video from the 80s).
    Recovery: recovery strategy when the bot has made an incorrect assessment in the conversation or the user has provided conflicting information (e.g., ignoring topic in incoherent marriage info).

All these requirements are particularly challenging since special attention must be paid to the subject’s abilities and limitations (Nurgalieva et al. 2017; Hawthorn 2000). For instance, it is hard to cope with user-initiated context switches or to keep knowledge about subjects coherent due to cognitive decline associated with age (Park, O’Connell, and Thomson 2003). Coping with these challenges is difficult even for humans (Miron et al. 2017).

In the long term, our goal is to develop a crowd-powered chatbot that implements the necessary conversational logic, sensibility and tricks to engage older adults in pleasant and satisfactory reminiscence sessions. The crowd should not be involved in direct interactions with the elderly (like in some real-time crowdsourcing approaches studied in literature (López et al. 2016; Ring et al. 2015)), nor should it be used just to train black-box AI algorithms. The idea is to involve the crowd to elicit and represent reminiscence-specific conversation knowledge explicitly in the form of some dedicated model, in order to be able to actively steer the conversation into specific directions (e.g., to elicit health issues or family memories). In this paper, we focus on an intermediate set of research objectives: identifying (i) how to model the conversational knowledge the chatbot may rely on and (ii) how to use the crowd to learn and evaluate the model.

Crowd-Supported Chatbot Design

Conversational Model Representation

Conceptually, a simple model we can imagine for a chatbot is a state machine (S, A, δ, π, F), where S denotes the states (a state includes the information on the subject and the conversation history), F denotes the final states, A is the set of (conversational) actions, δ is a state transition function (our conversational policy), π : S × A → {(s, p)} associating to each state and action a set {(s, p)} of possible target states s and the probability p with which that action should be chosen (to model that conversations are not deterministic).

In practice, however, the state space is infinite and the possible conversations are also infinite, so this FSM is not the right model. An alternative model is based on Event-Condition-Action (ECA) rules, where the event is, for example, the sentence by the subject (the elder) and the condition is some expression over what we know about the subject as well as past events. This has, however, the same limitations just discussed.

We observe that what we really want to have is a definition of the domain and range of the policy function π so that we can learn a useful policy that can be applied to real-life conversations. On the action side (the range), we approach the problem by clustering similar actions along several dimensions, such as i) the type of actions (ask information, make a comment, show interesting content) and ii) the topic of conversation (talk about the picture you are showing, or about childhood, or about hobbies). Given the action type and topic, there are many actual conversations and utterances, but at this level we are focused on learning types and topics rather than conducting an interaction within a topic or paraphrasing sentences.

In terms of the domain a policy is defined on, what we wish to have is a description of the characteristics of the state (or event and condition) to which the policy applies. For example, the crowd may tell us that after they learn the date of birth, they show newspaper covers of that year, or famous people born the same day, or songs that were popular when the subject was very young. In this case the trigger of the action is the last conversation element, where the subject states the date of birth (or, in terms of events, it is the event of the system, somehow, coming to know the date of birth of the person).

The challenge here is therefore to understand the reasoning of crowd workers when they decide to take actions, and based on this reasoning identify the classes of state and event information we need to attach policies to.

Crowdsourcing tasks

The counterpart of the model is the learning process, which has to do with how to design and process the results of crowdsourcing tasks. The objectives we have in seeking the proper task designs are the following: (i) identifying action types and topics (unless we want to fix them a priori), (ii) identifying when (based on which state or trigger) a person changes topic or shows specific content, and (iii) identifying why (based on which state or trigger) the agent initiates a conversation on a topic.

To do this, we envision crowdsourcing tasks that aim at (i) exploring possible conversations (these can be Wizard of Oz simulations), (ii) reflecting over previous conversations by the same worker or other workers to derive the “rules” that made the worker take a certain course of action, and (iii) aggregating these “rules” into a smaller coherent set that reveals the characteristics that the policy model should have.
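As a rough illustration of aggregation step (iii), and not the method proposed in this paper, the sketch below counts crowd-reported rules and normalizes them into a probability distribution over (action type, topic) pairs for each trigger. It assumes each rule has already been reduced to a (trigger, action type, topic) triple; the trigger and topic names, the sample data, and the functions `aggregate_rules` and `choose_action` are all hypothetical. It also simplifies the policy to a mapping from an abstract trigger to actions, rather than the full π : S × A → {(s, p)}.

```python
import random
from collections import Counter, defaultdict

# Hypothetical crowd-reported "rules", each reduced to a triple
# (trigger, action type, topic). Action types follow the text
# (ask information, make a comment, show content); the rest is made up.
crowd_rules = [
    ("learned_birth_date", "show_content", "newspaper_covers"),
    ("learned_birth_date", "show_content", "newspaper_covers"),
    ("learned_birth_date", "show_content", "songs_of_youth"),
    ("learned_birth_date", "ask_information", "childhood"),
    ("picture_shared", "ask_information", "picture"),
    ("picture_shared", "make_comment", "picture"),
]


def aggregate_rules(rules):
    """Map each trigger to a distribution over (action type, topic) pairs."""
    counts = defaultdict(Counter)
    for trigger, action_type, topic in rules:
        counts[trigger][(action_type, topic)] += 1
    policy = {}
    for trigger, actions in counts.items():
        total = sum(actions.values())
        policy[trigger] = {a: n / total for a, n in actions.items()}
    return policy


def choose_action(policy, trigger, rng):
    """Sample one (action type, topic) pair according to its probability."""
    actions = list(policy[trigger])
    weights = [policy[trigger][a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


policy = aggregate_rules(crowd_rules)
rng = random.Random(0)  # fixed seed only to make the sketch repeatable
next_action = choose_action(policy, "learned_birth_date", rng)
print(policy["learned_birth_date"])
print(next_action)
```

Keeping the probabilities explicit, instead of always picking the most frequent rule, mirrors the observation that conversations are not deterministic. A realistic version would also need to merge paraphrased triggers and weigh worker reliability, which this sketch ignores.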
For example, the crowd may reveal that they change topic whenever they sense that the person is sad talking about the current topic. This would tell us that an important component of the policy domain is the perceived emotional state, something that therefore the agent should try to detect, and that a change in this emotional state should be a trigger to either continue or change topic.

We thus focus on the following research question (RQ): Which crowd-based strategies can help elicit effective conversation logic for conversations (reminiscence sessions) targeting older adults, and how?

Conversational logic includes understanding of: the composition of the Dialog State, when and how the State has to be changed, and what the most important variables that affect the state are. That is, given:

• the set of States S = {S1, S2, ..., Sm}, where S is the state of the conversation that consists of multiple features (such as user profile info, dialog history, sentiments);
• the set of possible Goals in the conversation G = {G1, G2, ..., Gn}, where G is the current goal aimed at (e.g., elicit information, tell a joke, show engagement content); and
• the set of Actions A = {A1, A2, ..., An}, A being the chatbot action performed, which changes the state and satisfies the current goal (e.g., ask a question to elicit info);

the aim is to:

• identify the composition of the current State; and
• identify the policy, i.e., which Action to take given the current state S and the Goals G
• such that
      π(G, S) → S′
  where Policy π is a rule that defines the transition from state S to state S′ and depends on the current State S and current Goals G of the conversation.

The research question is actually of a more general nature, and the resulting approach can be applied to any social chatbot. To us, reminiscence is an application domain we have experience with and we want to contribute to.

Success Metrics

Different metrics have been proposed for evaluating the quality of conversations with dialog agents, such as: i) user engagement (Cervone et al. 2017; Fitzpatrick, Darcy, and Vierhile 2017), ii) task completion (Huang, Lasecki, and Bigham 2015), iii) conversation quality, including dialog consistency and memory of past events (Lasecki et al. 2013), iv) human-like communication (Kopp et al. 2005). The approach to evaluation, and therefore the choice of metrics, is based on the aim of the agent: having an engaging chat or performing a specific task (e.g., booking a flight). In our case, the reminiscence chatbot is a combination of conversational and task-based agent, as it aims at both having an engaging conversation with the user and collecting information while doing so. Therefore, we consider metrics for both types of agents, including: i) engagement (as a subjective measure); ii) the number of conversation turns made before the conversation drops; iii) the number of times the conversation drops overall; iv) domain-specific metrics like the amount of content which the user has provided during one conversation session (number of pictures uploaded, number of data attributes filled about a relevant person), and other task-completion metrics.

Related work

Crowdsourcing has been used to support all aspects of chatbot design, from holding direct conversations with final users to supporting conversation design, the latter being the family of approaches under which we position our work. Prior work on crowdsourcing has addressed the bootstrapping challenge, investigating strategies to create dialog datasets to train algorithms (Takahashi and Yokono 2017; Lin, D’Haro, and Banchs 2016), infer conversation templates (Mitchell, Bohus, and Kamar 2014) or declarative conversation models (Negi et al. 2009). It has also been explored to enrich conversation dialogs to provide meaning and context, by annotating dialogs with semantics and labels with, for example, polarity and appropriateness (Lin, D’Haro, and Banchs 2016), extracting entities (Huang 2016), as well as providing additional utterances for more natural conversations (paraphrasing) (Jiang, Kummerfeld, and Lasecki 2017). Other approaches incorporate the crowd in the evaluation of chatbot quality, making sure crowd contributions are valid and safe (Chkroun and Azaria 2018; Huang et al. 2016) and even allowing users to train chatbots directly (Chkroun and Azaria 2018). Acknowledging that chatbot conversations are not perfect, some approaches explore strategies to escalate conversation decisions to the crowd in cases where the chatbot is not able to interpret or serve the user request (Behera 2016).

The above works highlight the potential of crowdsourcing for designing chatbots. We take these approaches as the starting point for exploring the specific challenges of designing and maintaining a reminiscence bot. Previous work in this domain, though valuable in insights, has been limited to human-operated chatbots and Wizard of Oz evaluations, highlighting the complexity of chatbot design in general and in particular for our target population (Tsiourti et al. 2016a; Fuketa, Morita, and Aoe 2013; Yaghoubzadeh, Pitsch, and Kopp 2015).

Ongoing and Future Work

Next, we are going to define concrete crowdsourcing strategies to elicit the nature of the states, goals and actions that will give structure to the model. Then, we will focus on tasks to fill the model with data and on algorithms to effectively aggregate and apply the elicited knowledge.

Acknowledgments

This work has received funding from the EU Horizon 2020 Marie Skłodowska-Curie grant agreement No 690962. It was also supported by the project “Evaluation and enhancement of social, economic and emotional wellbeing of older adults” under the agreement No.14.Z50.31.0029, Tomsk Polytechnic University.
References

Behera, B. 2016. Chappie-a semi-automatic intelligent chatbot.

Cervone, A.; Tortoreto, G.; Mezza, S.; Gambi, E.; Riccardi, G.; et al. 2017. Roving mind: a balancing act between open–domain and engaging dialogue systems. In Alexa Prize, volume 1. https://developer.amazon.com/alexaprize/proceedings.

Chkroun, M., and Azaria, A. 2018. Did I say something wrong?: Towards a safe collaborative chatbot.

Demiris, G.; Thompson, H. J.; Lazar, A.; and Lin, S.-Y. 2016. Evaluation of a digital companion for older adults with mild cognitive impairment. In AMIA Annual Symposium Proceedings, volume 2016, 496. American Medical Informatics Association.

Fitzpatrick, K. K.; Darcy, A.; and Vierhile, M. 2017. Delivering cognitive behavior therapy to young adults with symptoms of depression and anxiety using a fully automated conversational agent (woebot): A randomized controlled trial. JMIR Mental Health 4(2):e19.

Fuketa, M.; Morita, K.; and Aoe, J.-i. 2013. Agent–based communication systems for elders using a reminiscence therapy. International Journal of Intelligent Systems Technologies and Applications 12(3-4):254–267.

Hanke, S.; Sandner, E.; Kadyrov, S.; and Stainer-Hochgatterer, A. 2016. Daily life support at home through a virtual support partner.

Hawthorn, D. 2000. Possible implications of aging for interface designers. Interacting with computers 12(5):507–528.

Huang, T.-H. K.; Lasecki, W. S.; Azaria, A.; and Bigham, J. P. 2016. “Is there anything else I can help you with?” Challenges in deploying an on-demand crowd-powered conversational agent. In Fourth AAAI Conference on Human Computation and Crowdsourcing.

Huang, T.-H. K.; Lasecki, W. S.; and Bigham, J. P. 2015. Guardian: A crowd-powered spoken dialog system for web apis. In Third AAAI conference on human computation and crowdsourcing.

Huang, T.-H. K. 2016. Crowd-powered conversational agents.

Huldtgren, A.; Mertl, F.; Vormann, A.; and Geiger, C. 2015. Probing the potential of multimedia artefacts to support communication of people with dementia. In Human-Computer Interaction, 71–79. Springer.

Jiang, Y.; Kummerfeld, J. K.; and Lasecki, W. S. 2017. Understanding task design trade-offs in crowdsourced paraphrase collection. arXiv preprint arXiv:1704.05753.

Kopp, S.; Gesellensetter, L.; Krämer, N. C.; and Wachsmuth, I. 2005. A conversational agent as museum guide – design and evaluation of a real-world application. In International Workshop on Intelligent Virtual Agents, 329–343. Springer.

Lasecki, W. S.; Wesley, R.; Nichols, J.; Kulkarni, A.; Allen, J. F.; and Bigham, J. P. 2013. Chorus: a crowd-powered conversational assistant. In Proceedings of the 26th annual ACM symposium on User interface software and technology, 151–162. ACM.

Lin, L.; D’Haro, L. F.; and Banchs, R. 2016. A web-based platform for collection of human-chatbot interactions. In Proceedings of the Fourth International Conference on Human Agent Interaction, 363–366. ACM.

López, A.; Ratni, A.; Trong, T. N.; Olaso, J. M.; Montenegro, S.; Lee, M.; Haider, F.; Schlögl, S.; Chollet, G.; Jokinen, K.; et al. 2016. Lifeline dialogues with roberta. In International Workshop on Future and Emerging Trends in Language Technology, 73–85. Springer.

Miron, A. M.; Thompson, A. E.; McFadden, S. H.; and Ebert, A. R. 2017. Young adults’ concerns and coping strategies related to their interactions with their grandparents and great-grandparents with dementia. Dementia 1471301217700965.

Mitchell, M.; Bohus, D.; and Kamar, E. 2014. Crowdsourcing language generation templates for dialogue systems. Proceedings of the INLG and SIGDIAL 2014 Joint Session 172–180.

Negi, S.; Joshi, S.; Chalamalla, A. K.; and Subramaniam, L. V. 2009. Automatically extracting dialog models from conversation transcripts. In Data Mining, 2009. ICDM’09. Ninth IEEE International Conference on, 890–895. IEEE.

Nikitina, S.; Callaioli, S.; and Baez, M. 2018. Smart conversational agents for reminiscence.

Nurgalieva, L.; Laconich, J. J. J.; Baez, M.; Casati, F.; and Marchese, M. 2017. Designing for older adults: review of touchscreen design guidelines. arXiv preprint arXiv:1703.06317.

Park, H. L.; O’Connell, J. E.; and Thomson, R. G. 2003. A systematic review of cognitive decline in the general elderly population. International journal of geriatric psychiatry 18(12):1121–1134.

Ring, L.; Barry, B.; Totzke, K.; and Bickmore, T. 2013. Addressing loneliness and isolation in older adults: Proactive affective agents provide better support. In Affective Computing and Intelligent Interaction (ACII), 2013 Humaine Association Conference on, 61–66. IEEE.

Ring, L.; Shi, L.; Totzke, K.; and Bickmore, T. 2015. Social support agents for older adults: longitudinal affective computing in the home. Journal on Multimodal User Interfaces 9(1):79–88.

Schlögl, S.; Doherty, G.; and Luz, S. 2014. Wizard of oz experimentation for language technology applications: Challenges and tools. Interacting with Computers 27(6):592–615.

Subramaniam, P., and Woods, B. 2012. The impact of individual reminiscence therapy for people with dementia: systematic review. Expert Review of Neurotherapeutics 12(5):545–555.

Takahashi, T., and Yokono, H. 2017. Two persons dialogue corpus made by multiple crowd-workers. In Proceedings of the 8th International Workshop on Spoken Dialogue Systems (IWSDS).

Tsiourti, C.; Moussa, M. B.; Quintas, J.; Loke, B.; Jochem, I.; Lopes, J. A.; and Konstantas, D. 2016a. A virtual assistive companion for older adults: design implications for a real-world application. In Proceedings of SAI Intelligent Systems Conference, 1014–1033. Springer.

Tsiourti, C.; Quintas, J.; Ben-Moussa, M.; Hanke, S.; Nijdam, N. A.; and Konstantas, D. 2016b. The cameli framework: a multimodal virtual companion for older adults. In Proceedings of SAI Intelligent Systems Conference, 196–217. Springer.

Vardoulakis, L. P.; Ring, L.; Barry, B.; Sidner, C. L.; and Bickmore, T. 2012. Designing relational agents as long term social companions for older adults. In International Conference on Intelligent Virtual Agents, 289–302. Springer.

Webster, J. D., and Gould, O. 2007. Reminiscence and vivid personal memories across adulthood. The International Journal of Aging and Human Development 64(2):149–170.

Yaghoubzadeh, R.; Pitsch, K.; and Kopp, S. 2015. Adaptive grounding and dialogue management for autonomous conversational assistants for elderly users. In International Conference on Intelligent Virtual Agents, 28–38. Springer.