Theo Huibers (1), t.w.c.huibers@utwente.nl
Thomas Beelen (1), t.h.j.beelen@utwente.nl
Khiet P. Truong (1), k.p.truong@utwente.nl
Roeland Ordelman (1), roeland.ordelman@utwente.nl
Ella Velner (1), p.c.velner@utwente.nl
Vanessa Evers (2), vanessa.evers@ntu.edu.sg

(1) University of Twente, Drienerlolaan 5, 7522NB Enschede, The Netherlands
(2) NTU Institute of Science and Technology for Humanity, 50 Nanyang Avenue, 639798 Singapore, Singapore

ORCID: 0000-0002-5650-2830 (V. Evers); 0000-0002-9837-8639 (T. Huibers)

1. Introduction

Increasingly, search engines can be accessed via speech, often through Voice Assistants (VAs). As outlined before by Beelen et al. [1], children are insufficiently supported by technology during their search process, both by search engines on a computer and by VAs. Children have more difficulty formulating effective search queries that represent their information need well, due to a smaller knowledge base and vocabulary [2]. Using speech does not inherently solve these obstacles. Furthermore, most VAs provide only a limited question-answer interaction style, in which the agent directly tries to find results based on the initial query. This interaction style causes several issues for children. Firstly, all the required context has to be put into one query, which children struggle with [3, 4]. Secondly, it is usually not possible to ask follow-up questions, a functionality that children often expect [5, 3]. Lastly, children are typically not supported in formulating queries, for instance by query suggestions or clarifying questions [4, 6].

[...] approach employs open-ended elicitation questions to [...] templates that do not require any domain knowledge. [...] are added to the memory until a threshold is [...]

[...] leveraging multiple turns. Furthermore, we are inter[...]

MICROS'22: Mixed-Initiative ConveRsatiOnal Systems workshop at

3. Proposed system

We first develop a robot that uses simplified conversational [...] conversation with an introduction. It states its name, asks the child for theirs, and says it is pleased to meet them.

Then the robot asks if the child wants help searching for information, and the child can ask a question. Since the child's speech may contain words that are not relevant to the search (such as "uhm, let's see"), keywords need to be extracted. In the pilot study described in section 4, keyword extraction was done by matching against a pre-programmed list of possible keywords. Detected keywords are added to the robot's memory. If the number of keywords is below a threshold, the robot poses an elicitation question. This threshold will be optimized in the future: too few keywords lead to an unspecific ranking of results, while trying to elicit too many keywords leads to a long interaction and possibly frustration. In the pilot study (section 4) this threshold was set to three keywords, an estimate based on the number of words in the search tasks. The elicitation questions are based on a simple pattern that includes the keywords extracted so far. The two question patterns are (translated from Dutch):

• What is it you want to know about [recognised keywords]?
• What don't you know yet about [recognised keywords]?

The word "and" is added to the list of keywords where necessary. The robot loops over the questions and adds new keywords until the threshold is met. After this phase, the robot moves on to present search results. Then the robot asks the child if it can be of any further assistance; otherwise it goes to a closing interaction.

4. Pilot study

We conducted a pilot study as a first step to find out whether the simplified conversational approach elicits more keywords (as described in the introduction), and how children experience such a robot. Furthermore, the pilot is a way to discover methodological issues that can still be corrected for the main study. In the pilot, we evaluated a robot that uses the simplified conversational search approach. As described in the evaluation framework by Landoni et al. [5], studies in IR for children can be described by the intended search strategy, for a particular user group, given a task, in a certain context. Our user group is children aged 10–12. These children are in the final years of primary school in the Netherlands and are starting to work more on assignments such as presentations. We compare our conversational robot strategy to a traditional question-answer style interaction that is common with commercial VAs. The task in the experiments is searching for information related to school subjects. The context of the searches is at school or at home. In the future we will change to a museum archive search task.

4.1. Method

Our study is a within-subjects comparison with two conditions. In one condition the robot uses the simplified conversational interaction style (see section 3); in the other it uses a question-answer interaction style. The order of the conditions was alternated between participants. The Furhat robot [9] and its software environment are used for both conditions. This keeps the two experimental conditions similar while the style of interaction is varied.

There are two search tasks, one for each condition (since condition order was alternated, each task was used in different conditions). Both tasks are factoid questions based on the work by Landoni et al. [5]. The first was to find out what hail is. The second was to find three endangered animal species. The tasks were presented in one sentence in Dutch on the task sheet. The search results that could be retrieved were pre-programmed in this pilot, and were the same in both conditions. When presenting results, the robot mentions the website where it found the information and the name of the article, and then reads aloud a snippet of the web page.

In the question-answer condition, the robot mimics the interaction of a commercial VA. This means it first waits for a wake word, in this case "Hey robot" or "Hey Furhat". Then the robot's LED ring lights up green to signal it is awake, and a question can be asked. The wake word and question can also be combined into one statement. The robot then presents results right away in the way described above. After the results, the robot goes back to waiting for the wake word.
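The keyword-elicitation loop of the conversational condition and the wake-word handling of the question-answer condition can be sketched in Python as follows. This is a minimal, hypothetical sketch and not the Furhat implementation used in the pilot: the keyword list (KNOWN_KEYWORDS), the tokenisation, the function names, and the strict alternation between the two patterns are assumptions for illustration. Only the threshold of three keywords, the two question patterns, the insertion of "and", and the two wake words come from the text.

```python
from __future__ import annotations

import re

# Hypothetical stand-in for the pre-programmed keyword list used in the pilot;
# the actual list is not given in the text.
KNOWN_KEYWORDS = {"hail", "weather", "ice", "endangered", "animal", "animals", "species"}

THRESHOLD = 3  # the pilot used a threshold of three keywords

# The two elicitation patterns (English translation given in the text).
PATTERNS = (
    "What is it you want to know about {}?",
    "What don't you know yet about {}?",
)

WAKE_WORDS = ("hey robot", "hey furhat")


def extract_keywords(utterance: str) -> list[str]:
    """Keep only words found in the keyword list, dropping fillers such as 'uhm'."""
    words = re.findall(r"[a-z']+", utterance.lower())
    return [w for w in words if w in KNOWN_KEYWORDS]


def join_keywords(keywords: list[str]) -> str:
    """Add 'and' where needed: ['a', 'b', 'c'] -> 'a, b and c'."""
    if len(keywords) < 2:
        return "".join(keywords)
    return ", ".join(keywords[:-1]) + " and " + keywords[-1]


def elicitation_phase(child_turns: list[str], threshold: int = THRESHOLD) -> tuple[list[str], list[str]]:
    """Conversational condition: collect keywords from each child turn and ask
    elicitation questions until `threshold` distinct keywords are in memory or
    the scripted turns run out. Returns (memory, questions asked)."""
    memory: list[str] = []
    asked: list[str] = []
    for utterance in child_turns:
        for keyword in extract_keywords(utterance):
            if keyword not in memory:
                memory.append(keyword)
        if len(memory) >= threshold:
            break  # enough keywords: move on to presenting results
        # Assumption: the loop alternates between the two patterns. If no keyword
        # was recognised yet, the placeholder stays empty (not covered by the text).
        pattern = PATTERNS[len(asked) % len(PATTERNS)]
        asked.append(pattern.format(join_keywords(memory)))
    return memory, asked


def strip_wake_word(utterance: str) -> str | None:
    """Question-answer condition: return the question after a wake word
    ('' if the wake word was said on its own), or None if no wake word was heard."""
    text = utterance.strip().lower()
    for wake in WAKE_WORDS:
        if text == wake or (text.startswith(wake) and not text[len(wake)].isalnum()):
            return text[len(wake):].strip(" ,.!?")
    return None
```

For instance, a first question such as "Uhm, let's see, what is hail?" yields only the keyword "hail", so the sketch asks "What is it you want to know about hail?"; each later answer can add keywords until three are in memory. A combined statement such as "Hey Furhat, what is hail?" is handled in one step by strip_wake_word, mirroring the note that wake word and question can be merged.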

4.2. Measures

The measurements consist of observational notes, Likert scales using emojis (Smileyometers [10]), logs containing a raw transcript, and interview questions. The children were also asked about their current VA usage. Observations focused on how the children behaved and spoke to the robot. The Smileyometer was used to gauge the user experience of the interactions. The questions are based on [5] and cover fun, ease of finding answers, intention to use again, kindness, and ease of conversing with the robot. The logs are a transcript of how the robot interpreted the child and how it responded; they are used to evaluate whether the approach elicits more keywords from children than the traditional paradigm. The goal of the interview at the end is to study children's perception of the differences and advantages of the two systems, and to add qualitative data on their experience.
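As a minimal sketch of how the Smileyometer answers could be coded and summarised per condition, the block below assumes the common five-face coding from 1 ("awful") to 5 ("brilliant"). The face labels, the numeric coding, the tuple format, and the function names are assumptions; the text does not state how the ratings were scored or aggregated. Because the design is within-subjects, the sketch also computes per-participant differences between the two conditions.

```python
from collections import defaultdict
from statistics import mean

# Assumed numeric coding of the five Smileyometer faces (1 = worst, 5 = best).
SMILEY_SCORES = {
    "awful": 1,
    "not very good": 2,
    "good": 3,
    "really good": 4,
    "brilliant": 5,
}

# Short labels for the five topics listed above (hypothetical names).
QUESTIONS = ("fun", "finding answers", "use again", "kindness", "conversing")


def summarise(responses):
    """Mean coded score per (condition, question).

    responses: iterable of (participant, condition, question, face) tuples,
    e.g. ("P01", "conversational", "fun", "brilliant").
    """
    grouped = defaultdict(list)
    for _participant, condition, question, face in responses:
        if question not in QUESTIONS:
            raise ValueError(f"unknown question: {question}")
        grouped[(condition, question)].append(SMILEY_SCORES[face.lower()])
    return {key: mean(scores) for key, scores in grouped.items()}


def paired_differences(responses, question):
    """Per-participant difference (conversational minus question-answer) for one
    question; participants missing either condition are skipped."""
    by_participant = defaultdict(dict)
    for participant, condition, q, face in responses:
        if q == question:
            by_participant[participant][condition] = SMILEY_SCORES[face.lower()]
    return {
        p: scores["conversational"] - scores["question-answer"]
        for p, scores in by_participant.items()
        if "conversational" in scores and "question-answer" in scores
    }
```

Paired differences fit the within-subjects setup better than comparing two independent group means, since each child rated both robots.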

4.3. Participants & procedure

Eleven children participated in the pilot at a local after-school care centre over two days in June 2022. Only children whose parents gave consent took part. Our user group is children aged 10–12, but the children who participated were mostly younger (mean age = 8.8, SD = 1.3). We decided to also let younger children participate due to the difficulty of recruiting children in our targeted age range.

The robot was set up on a table in a separate room with an open door to the main area. The researcher sat at the same table, at the same height as the child, and explained that they would be talking to two versions of the same robot. It was explained that the robots might not be fully functional yet, and that the children were helping to develop them further. The children were also told that one of the robots would start talking right away, while the other would require a wake word. The children received a task sheet that also included the questionnaire questions. The interview took place after the child had interacted with both robots.

4.4. Results

On the first day, the speech recognition often did not understand children correctly due to background noise. Therefore, on the second day a headset was used, and Wizard-of-Oz (undisclosed human operator) functionality was added. This way the researcher could take over when a response was not understood correctly by the automatic speech recognition. Due to speech recogniser errors, the logs were not usable for analysis.

[Figure: Smileyometer results for the two robots on Enjoyment, Easy searching with, Use again, Friendly / nice, and Easy talking with.]

Observation. Many children resorted to reading the search task directly from the sheet. This caused the queries in both conditions to be similar. Mainly for younger users the task seemed complex, and they required more input from the researcher. The children around the target age seemed more comfortable with the level of complexity, working more independently. In the conversational condition, the system entered the elicitation question loop in many cases. Sometimes, the child wrongly assumed they had already provided all the words in the search task, and got confused by the elicitation question. Younger children especially tended to look at the researcher when they were unsure how to continue the interaction. This also happened at the elicitation questions, where the researcher had to give a hint. In other cases the child kept the conversation going and answered the elicitation question naturally.

Interview. Four children preferred the question-answer condition, while five preferred the conversational one. Two children had no preference. Some of the interesting statements on the robots are in table 1. The statements indicate a potential trade-off between efficiency and fun. There are also clear individual differences, as some children seemed to enjoy talking more, while others preferred a fast interaction. Concerning participants' VA usage, five children had no experience, three children had used them a few times, and three used them frequently. The frequent users ask VAs about the weather, for jokes, and to find information. No effects of prior VA usage on the outcome could be determined in this pilot. Some children gave tips to improve the robot: an easier-to-understand voice, and a touch screen on the robot's face so that search results can also be selected visually.

Smileyometers. The robots scored relatively similarly on the Smileyometers. The results suggest that the conversational robot may be more enjoyable to use and easier to find answers with. Children also indicated being slightly

4.5. Conclusion and limitations

Based on the pilot, the number of elicited keywords could not yet be compared, due to speech recognition errors and the task design. Children rated the robots quite similarly, but possibly found the conversational system more fun and easier to find answers with. More children preferred the conversational system. They perceived the robots as nearly equally friendly. The interview answers shed light on individual preferences regarding the amount of conversation during the search process.

The pilot study also gave insights that influence the method of the main study. Firstly, the small sample size means the current findings have low confidence. In the main study we will increase the sample size by relying on cooperation with partners such as museums. Secondly, background noise led to errors and required the addition of Wizard-of-Oz controls. Another limitation is that most participants were younger than our target audience. A difference of a few years is developmentally significant in children, so the interaction may be less complex for children in the target age range. In line with this, the complexity of the interaction seemed suitable for the older participants around the target age. Finally, the search tasks were mostly read aloud from the task sheet, which likely affects the naturalness of children's queries. In the next section we describe how a different task in the museum context may address this.

5. Future work

The next step in creating the conversational robot is to connect it to the API of the Netherlands Institute for Sound and Vision¹, which contains Dutch public broadcasting media. In line with the tip by one of the participants, this use case will also introduce a display for multimedia search results. The API connection will enable us to study more natural search tasks and move away from pre-programmed search results. The tasks used in the pilot were fact finding and stated directly on the task sheet. The API connection would allow children to search for TV fragments that they come up with themselves, which is a more open search task than fact finding. This enables us to study children in a more natural setting, where their query formulation process more closely reflects a realistic scenario instead of reading from a task sheet. Elicitation questions may become more useful in this case. A more advanced method for extracting keywords from speech, such as the one by Habibi and Popescu-Belis [11], will need to be implemented as well.

The API-connected robot will be tested with a method similar to that of the pilot study described above. The method compares the style of interaction without changing other aspects of the robot between conditions. The within-subjects setup allowed children to reflect on differences between the systems and on their preference. The Smileyometers worked well even for participants younger than the target audience. With our pilot findings we can account for some important methodological issues. We look forward to learning more about children's conversational search process in our main study.

¹ https://www.beeldengeluid.nl

Acknowledgments

This research is supported by the Dutch SIDN fund (https://www.sidn.nl/) and TKI CLICKNL funding of the Dutch Ministry of Economic Affairs (https://www.clicknl.nl/). We would also like to thank the staff at BSO Partou de Vlinder for their time and cooperation.

[1] T. Beelen, E. Velner, R. Ordelman, K. P. Truong, V. Evers, T. Huibers, Does your robot know? Enhancing children's information retrieval through spoken conversation with responsible robots, arXiv:2106.07931 [cs] (2021). URL: http://arxiv.org/abs/2106.07931, arXiv:2106.07931.

[2] Hutchinson, Druin, B. B. Bederson, Reuter, Rose, A. C. Weeks, How do I find blue books about dogs? The errors and frustrations of young digital library users, Proceedings of HCII 2005 (2005) 22–27. Publisher: Citeseer.

[3] S. B. Lovato, A. M. Piper, E. A. Wartella, Hey Google, Do Unicorns Exist? Conversational Agents as a Path to Answers to Children's Questions, in: Proceedings of the 18th ACM International Conference on Interaction Design and Children, IDC '19, Association for Computing Machinery, Boise, ID, USA, 2019, pp. 301–313. URL: https://doi.org/10.1145/3311927.3323150. doi:10.1145/3311927.3323150.

[4] Yarosh, Thompson, Watson, Chase, Senthilkumar, Yuan, A. J. B. Brush, Children asking questions: speech interface reformulations and personification preferences, in: Proceedings of the 17th ACM Conference on Interaction Design and Children, IDC '18, Association for Computing Machinery, Trondheim, Norway, 2018, pp. 300–312. URL: https://doi.org/10.1145/3202185.3202207. doi:10.1145/3202185.3202207.

[5] Landoni, Matteri, E. Murgia, Huibers, M. S. Pera, Sonny, Cerca! Evaluating the Impact of Using a Vocal Assistant to Search at School, in: F. Crestani, Braschler, Savoy, Rauber, Müller, D. E. Losada, G. Heinatz Bürki, Cappellato, N. Ferro (Eds.), Experimental IR Meets Multilinguality, Multimodality, and Interaction, Lecture Notes in Computer Science, Springer International Publishing, Cham, 2019, pp. 101–113. doi:10.1007/978-3-030-28577-7_6.

[6] Druga, Williams, Breazeal, M. Resnick, "Hey Google is it OK if I eat you?": Initial Explo-