Crowdsourcing for Reminiscence Chatbot Design

Svetlana Nikitina
University of Trento and Tomsk Polytechnic University
svetlana.nikitina@unitn.it

Florian Daniel
Politecnico di Milano, DEIB
florian.daniel@polimi.it

Marcos Baez, Fabio Casati
University of Trento and Tomsk Polytechnic University
baez@disi.unitn.it, fabio.casati@unitn.it

Georgy Kopanitsa
Tomsk Polytechnic University
kopanitsa@tpu.ru

Copyright © 2018 for this paper by its authors. Copying permitted for private and academic purposes.

Abstract

In this work-in-progress paper we discuss the challenges in identifying effective and scalable crowd-based strategies for designing content, conversation logic, and meaningful metrics for a reminiscence chatbot targeted at older adults. We formalize the problem and outline the main research questions that drive the research agenda in chatbot design for reminiscence and for relational agents for older adults in general.

Context & Objectives

Reminiscence is the process of collecting and recalling past memories through pictures, stories and other mementos (Webster and Gould 2007). The practice of reminiscence has well-documented benefits for social, mental and emotional wellbeing (Subramaniam and Woods 2012; Huldtgren et al. 2015), making it a very desirable practice, especially for older adults. Research on technology-mediated reminiscence has advanced our understanding of how to effectively support this process, but has reached a limit in terms of the approaches to support more engaging reminiscence sessions, effectively elicit information about the person, and extend the practice of reminiscence to those with fewer opportunities for face-to-face interactions.

In our previous work (Nikitina, Callaioli, and Baez 2018) we made a case for conversational agents in this domain, and proposed the concept of a smart conversational agent that can drive personal and social reminiscence sessions with older adults in a way that is engaging and fun, while effectively collecting and organising memories and stories.

The idea of conversational agents for older adults is not new. Such agents have been explored to support a wide variety of activities and everyday tasks (Tsiourti et al. 2016a; Vardoulakis et al. 2012; Hanke et al. 2016; Tsiourti et al. 2016b), to act as social companions (Ring et al. 2013; 2015; Demiris et al. 2016) and even to engage older adults in reminiscence sessions (Fuketa, Morita, and Aoe 2013).

While these works give us valuable insights into the opportunities of using conversational agents as an instrument to support reminiscence sessions, they also show us how limited our knowledge is in terms of effective strategies to maintain dialogs with older adults. Success stories are mostly limited to Wizard of Oz evaluations (Schlögl, Doherty, and Luz 2014), in which system functionality is partially emulated by a human operator, or based on fully human-operated agents. The few attempts at autonomous agents highlight issues with the mismatch between user expectations and the actual social capabilities of the agents (Tsiourti et al. 2016a), general challenges with designing conversations suitable to the target population (Yaghoubzadeh, Pitsch, and Kopp 2015), and challenges with engaging older adults in question-based interactions in particular (Fuketa, Morita, and Aoe 2013).

In this position paper we aim at identifying effective and scalable crowd-based strategies for designing content, conversation rules, and meaningful metrics for a reminiscence chatbot targeted at older adults. We build on the concept introduced in (Nikitina, Callaioli, and Baez 2018) and identify where and how crowdsourcing can help design and maintain an agent-mediated reminiscence process, while addressing the specific challenges posed by the target population.

Reminiscence Chatbot

The envisioned chatbot is based on the idea of automatically guiding older adults through multimedia reminiscence sessions (Nikitina, Callaioli, and Baez 2018). It has the dual purpose of i) collecting and organising memories and profile information, and ii) engaging older adults in conversations that are stimulating and fun. In Figure 1 we show an example conversation and related main actions.

The example starts from the subject (the elder) providing a memory in the form of a picture. In response, the chatbot poses a contextual question. In order to do so, it must be able to understand the theme of the picture (big city) and to extract and understand information from pictures and text. In order to keep the conversation natural, it must further be able to reference related conversation topics (the city of Trento) and, in order to show empathy, it must be able to sense the feelings of the subject as the conversation evolves (e.g., it looks like the subject likes rock music, so it could be an idea to talk about that for some time). It would also be good if the bot were able to sense the presence of peers (e.g., family members or moderators helping with the chat). All this information helps the bot decide on appropriate next actions, taking into account possible conversational goals (e.g., elicit basic user profile data). Among the most complex decisions to be taken is deciding if and when to change context in a conversation (e.g., to make the elder laugh).

[Figure 1: Example reminiscence session with bot actions. The subject shares pictures titled "Skyline" and "Married in Trento".

Conversation:
  Bot: Nice picture! It looks like a big city. Where was it taken?
  User: It was taken in Chicago.
  Bot: That's far away from Trento! Were you visiting Chicago?
  User: No, actually I was living there. I even got married in Chicago.
  Bot: How long did you live there?
  User: Moved back in the 80s, stayed 10 years…
  Bot: Did you listen to rock back then?
  User: All the time!

Main actions:
  Showing understanding: posing comments that show understanding of the content and the user input, e.g., formulating a question based on the picture.
  Eliciting questions: posing questions about the life of the person, using the pictures as triggers (e.g., "Were you visiting Chicago?").
  User engagement inquiry: posing questions to check the user engagement, and testing potential topics to address next.
  Bringing up content: bringing up multimedia material that will help elicit information (e.g., rock video from the 80s).
  Recovery: recovery strategy when the bot has made an incorrect assessment in the conversation or the user has provided conflicting information (e.g., ignoring the topic in incoherent marriage info).]

All these requirements are particularly challenging since special attention must be paid to the subject's abilities and limitations (Nurgalieva et al. 2017; Hawthorn 2000). For instance, it is hard to cope with user-initiated context switches or to keep knowledge about subjects coherent due to cognitive decline associated with age (Park, O'Connell, and Thomson 2003). Coping with these challenges is difficult even for humans (Miron et al. 2017).

In the long term, our goal is to develop a crowd-powered chatbot that implements the necessary conversational logic, sensibility and tricks to engage older adults in pleasant and satisfactory reminiscence sessions. The crowd should not be involved in direct interactions with the elderly (as in some real-time crowdsourcing approaches studied in the literature (López et al. 2016; Ring et al. 2015)), nor should it be used just to train black-box AI algorithms. The idea is to involve the crowd to elicit and represent reminiscence-specific conversation knowledge explicitly in the form of some dedicated model, in order to be able to actively steer the conversation in specific directions (e.g., to elicit health issues or family memories). In this paper, we focus on an intermediate set of research objectives: identifying (i) how to model the conversational knowledge the chatbot may rely on and (ii) how to use the crowd to learn and evaluate the model.

Crowd-Supported Chatbot Design

Conversational Model Representation

Conceptually, a simple model we can imagine for a chatbot is a state machine (S, A, δ, π, F), where S denotes the states (a state includes the information on the subject and the conversation history), F denotes the final states, A is the set of (conversational) actions, δ is a state transition function (our conversational policy), π : S × A → {(s, p)}, associating with each state and action a set {(s, p)} of possible target states s and the probability p with which that action should be chosen (to model that conversations are not deterministic).

In practice, however, the state space is infinite and the possible conversations are also infinite, so this FSM is not the right model. An alternative model is based on Event-Condition-Action (ECA) rules, where the event is, for example, the sentence by the subject (the elder) and the condition is some expression over what we know about the subject as well as past events. This has, however, the same limitations just discussed.

We observe that what we really want to have is a definition of the domain and range of the policy function π so that we can learn a useful policy that can be applied to real-life conversations. On the action side (the range), we approach the problem by clustering similar actions along several dimensions, such as i) the type of action (ask for information, make a comment, show interesting content) and ii) the topic of conversation (talk about the picture being shown, or about childhood, or about hobbies). Given the action type and topic, there are many actual conversations and utterances, but at this level we are focused on learning types and topics rather than conducting an interaction within a topic or paraphrasing sentences.

In terms of the domain a policy is defined on, what we wish to have is a description of the characteristics of the state (or event and condition) to which the policy applies. For example, the crowd may tell us that after they learn the date of birth, they show newspaper covers of that year, or famous people born the same day, or songs that were popular when the subject was very young. In this case the trigger of the action is the last conversation element, in which the subject communicates the date of birth (or, in terms of events, it is the event of the system, somehow, coming to know the date of birth of the person).

The challenge here is therefore to understand the reasoning of crowd workers when they decide to take actions and, based on this reasoning, to identify the classes of state and event information we need to attach policies to.

Crowdsourcing tasks

The counterpart of the model is the learning process, which has to do with how to design crowdsourcing tasks and process their results. The objectives we have in seeking the proper task designs are the following: (i) identifying action types and topics (unless we want to fix them a priori), (ii) identifying when (based on which state or trigger) a person changes topic or shows specific content, and (iii) identifying why (based on which state or trigger) the agent initiates a conversation on a topic.

To do this, we envision crowdsourcing tasks that aim at (i) exploring possible conversations (these can be Wizard of Oz simulations), (ii) reflecting over previous conversations by the same worker or other workers to derive the "rules" that made the worker take a certain course of action, and (iii) aggregating these "rules" into a smaller coherent set that reveals the characteristics that the policy model should have. For example, the crowd may reveal that they change topic whenever they sense that the person is sad talking about the current topic. This would tell us that an important component of the policy domain is the perceived emotional state, which the agent should therefore try to detect, and that a change in this emotional state should be a trigger to either continue or change topic.

We thus focus on the following research question (RQ): Which crowd-based strategies can help elicit effective conversation logic for conversations (reminiscence sessions) targeting older adults, and how?

Conversational logic includes an understanding of the composition of the Dialog State, when and how the State has to be changed, and which variables most affect the state. That is, given:

• the set of States S = {S1, S2, ..., Sm}, where S is the state of the conversation that consists of multiple features (such as user profile info, dialog history, sentiments);

• the set of possible Goals in the conversation G = {G1, G2, ..., Gn}, where G is the current goal aimed at (e.g., elicit information, tell a joke, show engagement content); and

• the set of Actions A = {A1, A2, ..., An}, A being the chatbot action performed, which changes the state and satisfies the current goal (e.g., ask a question to elicit info);

the aim is to:

• identify the composition of the current State; and

• identify the policy, i.e., which Action to take given the current state S and the Goals G,

such that

    π(G, S) → S′

where Policy π is a rule that defines the transition from state S to state S′ and depends on the current State S and the current Goals G of the conversation.

The research question is actually of a more general nature, and the resulting approach can be applied to any social chatbot. To us, reminiscence is an application domain we have experience with and want to contribute to.

Success Metrics

Different metrics have been proposed for evaluating the quality of conversations with dialog agents, such as: i) user engagement (Cervone et al. 2017; Fitzpatrick, Darcy, and Vierhile 2017), ii) task completion (Huang, Lasecki, and Bigham 2015), iii) conversation quality, including dialog consistency and memory of past events (Lasecki et al. 2013), and iv) human-like communication (Kopp et al. 2005). The approach to evaluation, and therefore the choice of metrics, is based on the aim of the agent: having an engaging chat or performing a specific task (e.g., booking a flight). In our case, the reminiscence chatbot is a combination of a conversational and a task-based agent, as it aims at both having an engaging conversation with the user and collecting information while doing so. Therefore, we consider metrics for both types of agents, including: i) engagement (as a subjective measure); ii) the number of conversation turns made before the conversation drops; iii) the number of times the conversation drops overall; iv) domain-specific metrics like the amount of content the user has provided during one conversation session (number of pictures uploaded, number of data attributes filled in about a relevant person), and other task-completion metrics.

Related work

Crowdsourcing has been used to support all aspects of chatbot design, from holding direct conversations with final users to supporting conversation design, the latter being the family of approaches under which we position our work. Prior work on crowdsourcing has addressed the bootstrapping challenge, investigating strategies to create dialog datasets to train algorithms (Takahashi and Yokono 2017; Lin, D'Haro, and Banchs 2016), infer conversation templates (Mitchell, Bohus, and Kamar 2014) or declarative conversation models (Negi et al. 2009). It has also been explored to enrich conversation dialogs to provide meaning and context, by annotating dialogs with semantics and labels with, for example, polarity and appropriateness (Lin, D'Haro, and Banchs 2016), extracting entities (Huang 2016), as well as providing additional utterances for more natural conversations (paraphrasing) (Jiang, Kummerfeld, and Lasecki 2017). Other approaches incorporate the crowd in the evaluation of chatbot quality, making sure crowd contributions are valid and safe (Chkroun and Azaria 2018; Huang et al. 2016) and even allowing users to train chatbots directly (Chkroun and Azaria 2018). Acknowledging that chatbot conversations are not perfect, some approaches explore strategies to escalate conversation decisions to the crowd in cases where the chatbot is not able to interpret or serve the user request (Behera 2016).

The above works highlight the potential of crowdsourcing for designing chatbots. We take these approaches as the starting point for exploring the specific challenges of designing and maintaining a reminiscence bot. Previous work in this domain, though valuable in insights, has been limited to human-operated chatbots and Wizard of Oz evaluations, highlighting the complexity of chatbot design in general and for our target population in particular (Tsiourti et al. 2016a; Fuketa, Morita, and Aoe 2013; Yaghoubzadeh, Pitsch, and Kopp 2015).

Ongoing and Future Work

Next, we are going to define concrete crowdsourcing strategies to elicit the nature of the states, goals and actions that will give structure to the model. Then, we will focus on tasks to fill the model with data and on algorithms to effectively aggregate and apply the elicited knowledge.

Acknowledgments

This work has received funding from the EU Horizon 2020 Marie Skłodowska-Curie grant agreement No 690962. It was also supported by the project "Evaluation and enhancement of social, economic and emotional wellbeing of older adults" under the agreement No.14.Z50.31.0029, Tomsk Polytechnic University.

References

Behera, B. 2016. Chappie-a semi-automatic intelligent chatbot.

Cervone, A.; Tortoreto, G.; Mezza, S.; Gambi, E.; Riccardi, G.; et al. 2017. Roving mind: a balancing act between open-domain and engaging dialogue systems. In Alexa Prize, volume 1. https://developer.amazon.com/alexaprize/proceedings.

Chkroun, M., and Azaria, A. 2018. "Did I say something wrong?": Towards a safe collaborative chatbot.

Demiris, G.; Thompson, H. J.; Lazar, A.; and Lin, S.-Y. 2016. Evaluation of a digital companion for older adults with mild cognitive impairment. In AMIA Annual Symposium Proceedings, volume 2016, 496. American Medical Informatics Association.

Fitzpatrick, K. K.; Darcy, A.; and Vierhile, M. 2017. Delivering cognitive behavior therapy to young adults with symptoms of depression and anxiety using a fully automated conversational agent (woebot): A randomized controlled trial. JMIR Mental Health 4(2):e19.

Fuketa, M.; Morita, K.; and Aoe, J.-i. 2013. Agent-based communication systems for elders using a reminiscence therapy. International Journal of Intelligent Systems Technologies and Applications 12(3-4):254–267.

Hanke, S.; Sandner, E.; Kadyrov, S.; and Stainer-Hochgatterer, A. 2016. Daily life support at home through a virtual support partner.

Hawthorn, D. 2000. Possible implications of aging for interface designers. Interacting with Computers 12(5):507–528.

Huang, T.-H. K.; Lasecki, W. S.; Azaria, A.; and Bigham, J. P. 2016. "Is there anything else I can help you with?" Challenges in deploying an on-demand crowd-powered conversational agent. In Fourth AAAI Conference on Human Computation and Crowdsourcing.

Huang, T.-H. K.; Lasecki, W. S.; and Bigham, J. P. 2015. Guardian: A crowd-powered spoken dialog system for web APIs. In Third AAAI Conference on Human Computation and Crowdsourcing.

Huang, T.-H. K. 2016. Crowd-powered conversational agents.

Huldtgren, A.; Mertl, F.; Vormann, A.; and Geiger, C. 2015. Probing the potential of multimedia artefacts to support communication of people with dementia. In Human-Computer Interaction, 71–79. Springer.

Jiang, Y.; Kummerfeld, J. K.; and Lasecki, W. S. 2017. Understanding task design trade-offs in crowdsourced paraphrase collection. arXiv preprint arXiv:1704.05753.

Kopp, S.; Gesellensetter, L.; Krämer, N. C.; and Wachsmuth, I. 2005. A conversational agent as museum guide–design and evaluation of a real-world application. In International Workshop on Intelligent Virtual Agents, 329–343. Springer.

Lasecki, W. S.; Wesley, R.; Nichols, J.; Kulkarni, A.; Allen, J. F.; and Bigham, J. P. 2013. Chorus: a crowd-powered conversational assistant. In Proceedings of the 26th annual ACM symposium on User interface software and technology, 151–162. ACM.

Lin, L.; D'Haro, L. F.; and Banchs, R. 2016. A web-based platform for collection of human-chatbot interactions. In Proceedings of the Fourth International Conference on Human Agent Interaction, 363–366. ACM.

López, A.; Ratni, A.; Trong, T. N.; Olaso, J. M.; Montenegro, S.; Lee, M.; Haider, F.; Schlögl, S.; Chollet, G.; Jokinen, K.; et al. 2016. Lifeline dialogues with roberta. In International Workshop on Future and Emerging Trends in Language Technology, 73–85. Springer.

Miron, A. M.; Thompson, A. E.; McFadden, S. H.; and Ebert, A. R. 2017. Young adults concerns and coping strategies related to their interactions with their grandparents and great-grandparents with dementia. Dementia 1471301217700965.

Mitchell, M.; Bohus, D.; and Kamar, E. 2014. Crowdsourcing language generation templates for dialogue systems. Proceedings of the INLG and SIGDIAL 2014 Joint Session 172–180.

Negi, S.; Joshi, S.; Chalamalla, A. K.; and Subramaniam, L. V. 2009. Automatically extracting dialog models from conversation transcripts. In Data Mining, 2009. ICDM'09. Ninth IEEE International Conference on, 890–895. IEEE.

Nikitina, S.; Callaioli, S.; and Baez, M. 2018. Smart conversational agents for reminiscence.

Nurgalieva, L.; Laconich, J. J. J.; Baez, M.; Casati, F.; and Marchese, M. 2017. Designing for older adults: review of touchscreen design guidelines. arXiv preprint arXiv:1703.06317.

Park, H. L.; O'Connell, J. E.; and Thomson, R. G. 2003. A systematic review of cognitive decline in the general elderly population. International Journal of Geriatric Psychiatry 18(12):1121–1134.

Ring, L.; Barry, B.; Totzke, K.; and Bickmore, T. 2013. Addressing loneliness and isolation in older adults: Proactive affective agents provide better support. In Affective Computing and Intelligent Interaction (ACII), 2013 Humaine Association Conference on, 61–66. IEEE.

Ring, L.; Shi, L.; Totzke, K.; and Bickmore, T. 2015. Social support agents for older adults: longitudinal affective computing in the home. Journal on Multimodal User Interfaces 9(1):79–88.

Schlögl, S.; Doherty, G.; and Luz, S. 2014. Wizard of oz experimentation for language technology applications: Challenges and tools. Interacting with Computers 27(6):592–615.

Subramaniam, P., and Woods, B. 2012. The impact of individual reminiscence therapy for people with dementia: systematic review. Expert Review of Neurotherapeutics 12(5):545–555.

Takahashi, T., and Yokono, H. 2017. Two persons dialogue corpus made by multiple crowd-workers. In Proceedings of the 8th International Workshop on Spoken Dialogue Systems (IWSDS).

Tsiourti, C.; Moussa, M. B.; Quintas, J.; Loke, B.; Jochem, I.; Lopes, J. A.; and Konstantas, D. 2016a. A virtual assistive companion for older adults: design implications for a real-world application. In Proceedings of SAI Intelligent Systems Conference, 1014–1033. Springer.

Tsiourti, C.; Quintas, J.; Ben-Moussa, M.; Hanke, S.; Nijdam, N. A.; and Konstantas, D. 2016b. The cameli framework: a multimodal virtual companion for older adults. In Proceedings of SAI Intelligent Systems Conference, 196–217. Springer.

Vardoulakis, L. P.; Ring, L.; Barry, B.; Sidner, C. L.; and Bickmore, T. 2012. Designing relational agents as long term social companions for older adults. In International Conference on Intelligent Virtual Agents, 289–302. Springer.

Webster, J. D., and Gould, O. 2007. Reminiscence and vivid personal memories across adulthood. The International Journal of Aging and Human Development 64(2):149–170.

Yaghoubzadeh, R.; Pitsch, K.; and Kopp, S. 2015. Adaptive grounding and dialogue management for autonomous conversational assistants for elderly users. In International Conference on Intelligent Virtual Agents, 28–38. Springer.
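
Illustrative sketch: policy as crowd-derived rules

The formulation in the Crowd-Supported Chatbot Design section, where a policy π depends on the current State S and the Goals G and where actions are clustered by type and topic, could be prototyped roughly as follows. This is only a minimal sketch under our own assumptions, not the paper's model: the class names (DialogState, Action, Rule), the goal labels ("engage", "elicit_profile") and the three example rules are hypothetical, chosen to mirror the date-of-birth and sadness examples in the text. The sketch stops at selecting an action; producing the next state S′ would require applying that action and observing the subject's reply.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class DialogState:
    """Features of the conversation state S (field names are illustrative)."""
    profile: dict = field(default_factory=dict)   # e.g. {"birth_year": 1942}
    history: list = field(default_factory=list)   # past utterances
    emotion: str = "neutral"                      # perceived emotional state
    topic: str = "picture"                        # current topic


@dataclass(frozen=True)
class Action:
    """A clustered action: a type plus a topic, not a concrete utterance."""
    type: str    # "ask", "comment" or "show_content"
    topic: str


@dataclass
class Rule:
    """One crowd-derived rule: a trigger over (goal, state) and a suggested action."""
    name: str
    trigger: Callable[[str, DialogState], bool]
    action: Action


def policy(goal: str, state: DialogState, rules: list) -> Optional[Action]:
    """Return the action of the first rule whose trigger holds, else None."""
    for rule in rules:
        if rule.trigger(goal, state):
            return rule.action
    return None


# Hypothetical rules of the kind crowd workers might articulate.
RULES = [
    Rule("change topic when the subject seems sad",
         lambda g, s: s.emotion == "sad",
         Action("ask", "hobbies")),
    Rule("show content from the birth year once it is known",
         lambda g, s: g == "engage" and "birth_year" in s.profile,
         Action("show_content", "birth_year")),
    Rule("ask about childhood while the birth year is unknown",
         lambda g, s: g == "elicit_profile" and "birth_year" not in s.profile,
         Action("ask", "childhood")),
]

if __name__ == "__main__":
    state = DialogState(profile={"birth_year": 1942})
    print(policy("engage", state, RULES))
```

Ordering the rules makes the emotional-state trigger take precedence over topic-specific ones; whether such a priority is appropriate is exactly the kind of question the crowdsourcing tasks are meant to answer.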
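
Illustrative sketch: computing the success metrics

The metrics listed in the Success Metrics section (subjective engagement, turns before the conversation drops, number of drops, and amount of content provided) could be aggregated from session logs along the following lines. The log format is an assumption made for illustration: the Session fields, the notion that a session is flagged as "dropped" when the user abandons it, and the numeric engagement rating are all hypothetical and are not defined in the paper.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """One conversation session (hypothetical log format)."""
    turns: int                 # turns exchanged before the session ended
    dropped: bool              # True if the user abandoned the conversation
    pictures_uploaded: int     # domain-specific content metric
    attributes_filled: int     # profile attributes filled during the session
    engagement_rating: float   # subjective rating, e.g. from a questionnaire


def summarize(sessions):
    """Aggregate the metrics over a non-empty list of sessions."""
    if not sessions:
        raise ValueError("at least one session is required")
    dropped = [s for s in sessions if s.dropped]
    return {
        "mean_engagement": sum(s.engagement_rating for s in sessions) / len(sessions),
        # Average number of turns reached in sessions that were dropped.
        "mean_turns_before_drop": (
            sum(s.turns for s in dropped) / len(dropped) if dropped else None
        ),
        "drops": len(dropped),
        "pictures_uploaded": sum(s.pictures_uploaded for s in sessions),
        "attributes_filled": sum(s.attributes_filled for s in sessions),
    }


if __name__ == "__main__":
    logs = [
        Session(turns=12, dropped=True, pictures_uploaded=2,
                attributes_filled=3, engagement_rating=4.0),
        Session(turns=20, dropped=False, pictures_uploaded=1,
                attributes_filled=5, engagement_rating=5.0),
        Session(turns=8, dropped=True, pictures_uploaded=0,
                attributes_filled=1, engagement_rating=3.0),
    ]
    print(summarize(logs))
```

Keeping the conversational metrics (engagement, drops) and the task-based ones (pictures, attributes) in one summary reflects the paper's view of the reminiscence chatbot as a combination of both agent types.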