=Paper=
{{Paper
|id=Vol-3318/short28
|storemode=property
|title=A Child-Friendly Approach to Spoken Conversational Search
|pdfUrl=https://ceur-ws.org/Vol-3318/short28.pdf
|volume=Vol-3318
|authors=Thomas Beelen,Khiet P. Truong,Roeland Ordelman,Ella Velner,Vanessa Evers,Theo Huibers
|dblpUrl=https://dblp.org/rec/conf/cikm/BeelenTOVEH22
}}
==A Child-Friendly Approach to Spoken Conversational Search==
Thomas Beelen (1,*), Khiet P. Truong (1), Roeland Ordelman (1), Ella Velner (1), Vanessa Evers (2) and Theo Huibers (1)

(1) University of Twente, Drienerlolaan 5, 7522 NB Enschede, The Netherlands
(2) NTU Institute of Science and Technology for Humanity, 50 Nanyang Avenue, 639798, Singapore

MICROS'22: Mixed-Initiative ConveRsatiOnal Systems workshop at CIKM 2022, October 21, 2022, Atlanta, GA. (*) Corresponding author.
Contact: t.h.j.beelen@utwente.nl (T. Beelen), k.p.truong@utwente.nl (K. P. Truong), roeland.ordelman@utwente.nl (R. Ordelman), p.c.velner@utwente.nl (E. Velner), vanessa.evers@ntu.edu.sg (V. Evers), t.w.c.huibers@utwente.nl (T. Huibers)

1. Introduction

Increasingly, search engines can be accessed via speech, often through Voice Assistants (VAs). As outlined before by Beelen et al. [1], children are insufficiently supported by technology during their search process, both by search engines on a computer and by VAs. Children have more difficulty formulating effective search queries that represent their information need well, due to a smaller knowledge base and vocabulary [2]. Using speech does not inherently solve these obstacles. Furthermore, most VAs provide only a limited question-answer interaction style in which the agent directly tries to find results based on the initial query. This interaction style causes several issues for children. Firstly, it is necessary to put all the required context into one query, which they struggle with [3, 4]. Secondly, it is usually not possible to ask follow-up questions, a functionality that children often expect [5, 3]. Lastly, children are typically not supported in formulating queries, for instance by query suggestions or clarifying questions [4, 6].

Many researchers have thus concluded that search tools should assist children in query formulation. With this goal in mind, our research project CHATTERS (https://chatters-cri.github.io/) focuses on a robot for children's conversational search. We investigate whether a spoken conversational search approach (see [7]) works better for children than the question-answer style interaction that current commercial VAs usually employ. The goal is to help children communicate their information need better via the conversational interaction with the system. We opt for a physically embodied robot to provide a natural conversation and an engaging experience [8]. In this project we first develop an agent that uses a simplified conversational approach. The simplified conversational approach employs open-ended elicitation questions to prompt the child to talk more about their information need (for example: "What do you want to know about animal species?"). The elicitation questions are based on templates that do not require any domain knowledge. Instead, the templates are filled with keywords extracted from the child's speech to generate questions. Subsequently, extracted keywords are added to the memory until a threshold is reached and the agent moves on to result presentation (more on this in section 3). Secondly, we evaluate in experiments whether the simplified approach is able to elicit more keywords from children compared to the traditional question-answer paradigm. We also study whether such a robot is engaging to use, because this is a precondition for potential long-term use. To study these topics we conducted a pilot study with eleven children, which is described in section 4. This pilot is the basis for a main study that we describe in section 5. There we also describe a different use case, which is to search an archive in a museum. In that case the physical embodiment is especially suited to draw visitors' attention.

2. Problem statement

The problem we address is that children are insufficiently supported in communicating their information needs to voice-based search systems. They are required to formulate queries that contain all necessary context in one statement, which is too complex. We study whether a robot that uses a back-and-forth conversation can help children communicate their need more effectively by leveraging multiple turns. Furthermore, we are interested in whether a conversational approach provides a more social and engaging experience. Our target audience is children between 10 and 12 years old.
3. Proposed system

We first develop a robot that uses simplified conversational search. In this interaction, the robot initiates the conversation with an introduction: it states its name, asks the child for theirs, and says it is pleased to meet them. Then the robot asks if the child wants help searching for information, and the child can ask a question. Since the child's speech may contain words that are not relevant to the search (such as "uhm, let's see"), keywords need to be extracted. In the pilot study described in section 4, keyword extraction was done by matching against a pre-programmed list of possible keywords. Detected keywords are added to the robot's memory. If the number of keywords is below a threshold, the robot poses an elicitation question. This threshold will be optimized in the future: too few keywords lead to an unspecific ranking of results, while trying to elicit too many keywords leads to a long interaction and possibly frustration. In the pilot study (section 4) the threshold was set to three keywords, an estimate based on the number of words in the search tasks. The elicitation questions are based on a simple pattern that includes the keywords extracted so far. The two question patterns are (translated from Dutch):

• What is it you want to know about [recognised keywords]?
• What don't you know yet about [recognised keywords]?

The word "and" is added to the list of keywords where necessary. The robot loops over the questions and adds new keywords until the threshold is met. After this phase, the robot moves on to present search results. Then the robot asks the child if it can be of any further assistance; otherwise it goes to a closing interaction.
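To make the elicitation loop described above concrete, the following minimal Python sketch shows keyword matching against a pre-programmed list, the threshold check, and template-based question generation. It is an illustration only, not the actual implementation (which runs on the Furhat platform and operates on Dutch speech); the keyword list, the function names, and the listen/say callables are assumptions made for this example.

```python
# Illustrative sketch only: not the CHATTERS implementation, which runs on the
# Furhat platform and operates on Dutch speech. The keyword list, function
# names, and the listen/say callables are assumptions made for this example.

KNOWN_KEYWORDS = {"hail", "endangered", "animal", "species"}  # hypothetical pre-programmed list
THRESHOLD = 3  # threshold used in the pilot study

# The two elicitation question templates (translated from Dutch).
TEMPLATES = [
    "What is it you want to know about {keywords}?",
    "What don't you know yet about {keywords}?",
]


def extract_keywords(utterance):
    """Keep only the words that match the pre-programmed keyword list."""
    words = {w.strip("?.,!").lower() for w in utterance.split()}
    return words & KNOWN_KEYWORDS


def join_keywords(keywords):
    """Insert the word 'and' before the last keyword where necessary."""
    items = sorted(keywords)
    if not items:
        return "that"  # fallback when nothing has been recognised yet
    if len(items) == 1:
        return items[0]
    return ", ".join(items[:-1]) + " and " + items[-1]


def elicitation_phase(listen, say):
    """Loop over the elicitation templates until enough keywords are in memory,
    then hand the collected keywords over to result presentation."""
    memory = set()
    memory |= extract_keywords(listen())  # the child's initial question
    turn = 0
    while len(memory) < THRESHOLD:
        say(TEMPLATES[turn % len(TEMPLATES)].format(keywords=join_keywords(memory)))
        memory |= extract_keywords(listen())  # the child's answer to the elicitation question
        turn += 1
    return memory  # used to rank and present the pre-programmed results
```

Under these assumptions, a child who has so far only mentioned "animal species" would, with the pilot threshold of three, be asked for example "What is it you want to know about animal and species?", illustrating how the word "and" is inserted between recognised keywords.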
4. Pilot study

We conducted a pilot study as a first step to find out whether the simplified conversational approach elicits more keywords (as described in the introduction), and how children experience such a robot. Furthermore, the pilot is a way to discover methodological issues that can still be corrected for the main study. In the pilot, we evaluated a robot that uses the simplified conversational search approach. As described in the evaluation framework by Landoni et al. [5], studies in IR for children can be described by the intended search strategy, for a particular user group, given a task, in a certain context. Our user group is children aged 10–12 years. These children are in the final years of primary school in the Netherlands and are starting to work more on assignments such as presentations. We compare our conversational robot strategy to a traditional question-answer style interaction that is common with commercial VAs. The task in the experiments is searching for information related to school subjects. The context of the searches is at school or in the home. In the future we will change to a museum archive search task.

4.1. Method

Our study is a within-subjects comparison with two conditions. In one condition the robot uses the simplified conversational interaction style (see section 3); in the other it uses a question-answer interaction style. The order of the conditions was alternated between participants. The Furhat robot [9] and its software environment are used for both conditions. This keeps the two experimental conditions similar while the style of interaction is varied.

There are two search tasks, one for each condition (since the condition order was alternated, each task was paired with different conditions across participants). Both tasks are factoid questions based on the work by Landoni et al. [5]. The first was to find out what hail is; the second was to find three endangered animal species. The tasks were presented in one sentence in Dutch on the task sheet. The search results that could be retrieved were pre-programmed in this pilot and were the same in both conditions. When presenting results, the robot mentions the website where it found the information and the name of the article, and then reads aloud a snippet of the web page.

In the question-answer condition, the robot mimics the interaction of a commercial VA. This means it first waits for a wake word, in this case "Hey robot" or "Hey Furhat". Then the robot's LED ring lights up green to signal it is awake, and a question can be asked. The wake word and question can also be combined into one statement. The robot then presents results right away in the way described above. After the results the robot goes back to waiting for the wake word.
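For contrast, here is a similarly hedged sketch of the question-answer condition: the robot stays idle until it hears a wake word, signals that it is awake, answers a single question, and returns to waiting. Wake-word detection, the LED ring, and result presentation are in reality handled by the Furhat platform; listen, say, set_led, and present_results are stand-ins invented for this illustration.

```python
# Illustrative sketch of the question-answer condition. Wake-word detection,
# the LED ring, and result presentation are really handled by the Furhat
# platform; listen, say, set_led, and present_results are stand-ins.

WAKE_WORDS = ("hey robot", "hey furhat")


def question_answer_loop(listen, say, set_led, present_results):
    """Wait for a wake word, answer one question, then return to waiting."""
    while True:
        utterance = listen().lower()
        if not utterance.startswith(WAKE_WORDS):
            continue  # stay asleep until a wake word is heard
        set_led("green")  # light up green to signal the robot is awake
        # The wake word and the question may be combined in one statement;
        # otherwise, wait for the question in the next utterance.
        parts = utterance.split(maxsplit=2)
        question = parts[2] if len(parts) > 2 else listen()
        say(present_results(question))  # present results right away
        set_led("off")
```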
4.2. Measures

The measurements consist of observational notes, Likert scales using emojis (Smileyometers [10]), logs containing a raw transcript, and interview questions. The children were also asked about their current VA usage. Observations focused on how the children behaved and spoke to the robot. The Smileyometer was used to gauge the user experience of the interactions. The questions are based on [5] and cover fun, ease of finding answers, intention to use again, kindness, and ease of conversing with the robot. The logs are a transcript of how the robot interpreted the child and how it responded; they are used to evaluate whether the approach elicits more keywords from children compared to the traditional paradigm. The goal of the interview at the end is to study children's perception of the differences and advantages of the two systems, and to add more qualitative data on their experience.

4.3. Participants & procedure

Eleven children participated in the pilot at a local after-school care over two days in June 2022. Children whose parents consented joined the experiments. Our user group is children aged 10–12, but the children that participated were mostly younger (mean age = 8.8, SD = 1.3). We decided to also let younger children participate due to the difficulty of recruiting children in our targeted age range.

The robot was set up on a table in a separate room with an open door to the main area. The researcher sat at the same table and height as the child and explained that they would be talking to two versions of the same robot. It was explained that the robots may not be fully functional yet, and that the children are helping to further develop them. The children were also told that one of the robots would start talking right away, and the other requires a wake word. The children received a task sheet that also includes the questionnaire questions. The interview happened after interacting with both robots.

4.4. Results

On the first day the speech recognition often did not understand the children correctly due to background noise. Therefore, on the second day a headset was used and Wizard-of-Oz (undisclosed human operator) functionality was added. This way the researcher could take over when responses were not understood correctly by the automatic speech recognition. Due to speech recogniser errors the logs were not usable for analysis.

Observation. Many children resorted to reading the search task directly from the sheet. This caused the queries in both conditions to be similar. Mainly for younger users the task seemed complex and they required more input from the researcher. The children around the target age seemed more comfortable with the level of complexity, working more independently. In the conversational condition, the system entered the elicitation question loop in many cases. Sometimes the child wrongfully assumed they had already provided all the words in the search task, and got confused by the elicitation question. Especially younger children tended to look at the researcher when they were unsure how to continue the interaction. This also happened at the elicitation questions, where the researcher had to give a hint. In other cases the child kept the conversation going and answered the elicitation question naturally.

Smileyometers. The robots scored relatively similarly on the Smileyometers. The results suggest that the conversational robot may be more enjoyable to use and easier to find answers with. Children also indicated being slightly more likely to use it again. The robots were seen as roughly equally friendly and easy to talk to. The results are shown in figure 1.

[Figure 1: Smileyometer outcomes of the pilot study for the conversational and question-answer conditions, on the items Enjoyment, Easy searching with, Use again, Friendly / nice, and Easy talking with.]

Interview. Four children preferred the question-answer condition, while five preferred the conversational one; two children had no preference. Some of the interesting statements on the robots are in table 1. The statements indicate there is a potential trade-off between efficiency and fun. There are also clear individual differences, as some children seemed to enjoy talking more, while others preferred a fast interaction. Concerning participants' VA usage, five children had no experience, three had used them a few times, and three used them frequently. The frequent users ask VAs about the weather, jokes, and finding information. No effects of prior VA usage on the outcome could be determined in this pilot. Some children gave tips to improve the robot: an easier-to-understand voice, and a touch screen on the robot's face to be able to select search results visually as well.

Table 1: Children's statements on the two robots
Conversational: "It speaks directly (it is awake and thus faster)."; "It is more fun because there is more talking."; "It is easier to use."; "It requires less talking."
Question-answer: "I have to wake it up first."; "It is better because I have to talk less."; "It's faster."; "I like that it turns green [after wake word]."
4.5. Conclusion and limitations

Based on the pilot, the number of elicited keywords could not yet be compared due to errors and task design. Children rated the robots quite similarly, but possibly find the conversational system more fun and easier to find answers with. More children preferred the conversational system. They perceived the robots as nearly equally friendly. The interview answers shed light on individual preferences regarding the amount of conversation during the search process.

The pilot study also gave insights that influence the method of the main study. Firstly, the small sample size means the current findings have low confidence. In the main study we will increase the sample size by relying on cooperation with partners such as museums. Secondly, background noise led to errors and required the addition of Wizard-of-Oz controls. Another limitation is that most participants were younger than our target audience. A few years can make a significant developmental difference in children; therefore the interaction may be less complex for children in the target age range. However, the complexity of the interaction seemed suitable for the older participants around the target age. Finally, the search tasks were mostly read aloud from the task sheet, which likely affects the naturalness of children's queries. In the next section we describe how a different task in the museum context may address this.

5. Future work

The next step in creating the conversational robot is to connect it to the API of the Netherlands Institute for Sound and Vision (https://www.beeldengeluid.nl), which contains Dutch public broadcasting media. In line with the tip by one of the participants, this use case will also introduce a display for multimedia search results. The API connection will enable us to study more natural search tasks and move away from pre-programmed search results. The tasks used in the pilot are fact finding and stated directly on the task sheet. The API connection would allow children to search for TV fragments that they come up with themselves, which is a more open search task than fact finding. This enables us to study children in a more natural setting, where their query formulation process more closely reflects a realistic scenario instead of reading from a task sheet. Elicitation questions may become more useful in this case. A more advanced method for keyword extraction from speech will need to be implemented as well, such as the one by Habibi and Popescu-Belis [11]. The API-connected robot will be tested with a method similar to the pilot study described above. The method compares the style of interaction without changing other aspects of the robot between conditions. The within-subjects setup allowed children to reflect on differences between the systems and their preferences. The Smileyometers worked well even for participants younger than the target audience. With our pilot findings we can account for some important methodological issues. We look forward to learning more about children's conversational search process in our main study.
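As a rough indication of what keyword extraction beyond a fixed list could look like, the sketch below pulls candidate keywords from an open-domain transcript by filtering filler words and ranking the remainder by frequency. This is only a simple placeholder with an invented filler list and function name; the diverse keyword extraction method by Habibi and Popescu-Belis [11] mentioned above is substantially more sophisticated.

```python
# Simple placeholder for open-domain keyword extraction from a transcript:
# drop filler/function words and rank the remainder by frequency. The filler
# list is invented for this example; the method of Habibi and Popescu-Belis
# [11] referenced in the text is far more sophisticated.
from collections import Counter

FILLERS = {"uhm", "uh", "let's", "see", "i", "want", "to", "know", "about",
           "the", "a", "an", "is", "what", "so", "like"}


def extract_open_keywords(transcript, top_k=3):
    """Return the top_k most frequent non-filler words in the transcript."""
    words = [w.strip("?.,!").lower() for w in transcript.split()]
    counts = Counter(w for w in words if w and w not in FILLERS)
    return [word for word, _ in counts.most_common(top_k)]


# Example:
# extract_open_keywords("uhm let's see, I want to know about endangered animal species")
# -> ['endangered', 'animal', 'species']
```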
Acknowledgments

This research is supported by the Dutch SIDN fund (https://www.sidn.nl/) and TKI CLICKNL funding of the Dutch Ministry of Economic Affairs (https://www.clicknl.nl/). We would also like to thank the staff at BSO Partou de Vlinder for their time and cooperation.

References

[1] T. Beelen, E. Velner, R. Ordelman, K. P. Truong, V. Evers, T. Huibers, Does your robot know? Enhancing children's information retrieval through spoken conversation with responsible robots, arXiv:2106.07931 [cs] (2021). URL: http://arxiv.org/abs/2106.07931.
[2] H. Hutchinson, A. Druin, B. B. Bederson, K. Reuter, A. Rose, A. C. Weeks, How do I find blue books about dogs? The errors and frustrations of young digital library users, Proceedings of HCII 2005 (2005) 22–27.
[3] S. B. Lovato, A. M. Piper, E. A. Wartella, Hey Google, Do Unicorns Exist? Conversational Agents as a Path to Answers to Children's Questions, in: Proceedings of the 18th ACM International Conference on Interaction Design and Children, IDC '19, Association for Computing Machinery, Boise, ID, USA, 2019, pp. 301–313. doi:10.1145/3311927.3323150.
[4] S. Yarosh, S. Thompson, K. Watson, A. Chase, A. Senthilkumar, Y. Yuan, A. J. B. Brush, Children asking questions: speech interface reformulations and personification preferences, in: Proceedings of the 17th ACM Conference on Interaction Design and Children, IDC '18, Association for Computing Machinery, Trondheim, Norway, 2018, pp. 300–312. doi:10.1145/3202185.3202207.
[5] M. Landoni, D. Matteri, E. Murgia, T. Huibers, M. S. Pera, Sonny, Cerca! Evaluating the Impact of Using a Vocal Assistant to Search at School, in: F. Crestani, M. Braschler, J. Savoy, A. Rauber, H. Müller, D. E. Losada, G. Heinatz Bürki, L. Cappellato, N. Ferro (Eds.), Experimental IR Meets Multilinguality, Multimodality, and Interaction, Lecture Notes in Computer Science, Springer International Publishing, Cham, 2019, pp. 101–113. doi:10.1007/978-3-030-28577-7_6.
[6] S. Druga, R. Williams, C. Breazeal, M. Resnick, "Hey Google is it OK if I eat you?": Initial Explorations in Child-Agent Interaction, in: Proceedings of the 2017 Conference on Interaction Design and Children, ACM, Stanford, CA, USA, 2017, pp. 595–600. doi:10.1145/3078072.3084330.
[7] H. Zamani, J. R. Trippas, J. Dalton, F. Radlinski, Conversational Information Seeking, arXiv preprint arXiv:2201.08808 (2022).
[8] J. Li, The benefit of being physically present: A survey of experimental works comparing copresent robots, telepresent robots and virtual agents, International Journal of Human-Computer Studies 77 (2015) 23–37. doi:10.1016/j.ijhcs.2015.01.001.
[9] S. Al Moubayed, J. Beskow, G. Skantze, B. Granström, Furhat: A Back-projected Human-like Robot Head for Multiparty Human-Machine Interaction, Springer, Berlin/Heidelberg, 2012, pp. 114–130. URL: http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-105606.
[10] J. C. Read, S. MacFarlane, Using the fun toolkit and other survey methods to gather opinions in child computer interaction, in: Proceedings of the 2006 Conference on Interaction Design and Children, IDC '06, Association for Computing Machinery, New York, NY, USA, 2006, pp. 81–88. doi:10.1145/1139073.1139096.
[11] M. Habibi, A. Popescu-Belis, Diverse keyword extraction from conversations, in: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2013, pp. 651–657.