=Paper=
{{Paper
|id=Vol-3318/short28
|storemode=property
|title=A Child-Friendly Approach to Spoken Conversational Search
|pdfUrl=https://ceur-ws.org/Vol-3318/short28.pdf
|volume=Vol-3318
|authors=Thomas Beelen,Khiet P. Truong,Roeland Ordelman,Ella Velner,Vanessa Evers,Theo Huibers
|dblpUrl=https://dblp.org/rec/conf/cikm/BeelenTOVEH22
}}
==A Child-Friendly Approach to Spoken Conversational Search==
<pdf width="1500px">https://ceur-ws.org/Vol-3318/short28.pdf</pdf>
<pre>
A Child-Friendly Approach to Spoken Conversational Search
Thomas Beelen1,∗, Khiet P. Truong1, Roeland Ordelman1, Ella Velner1, Vanessa Evers2 and Theo Huibers1
1 University of Twente, Drienerlolaan 5, 7522NB, Enschede, The Netherlands
2 NTU Institute of Science and Technology for Humanity, 50 Nanyang Avenue, 639798, Singapore, Singapore


MICROS’22: Mixed-Initiative ConveRsatiOnal Systems workshop at CIKM 2022, October 21, 2022, Atlanta, GA
∗ Corresponding author.
Email: t.h.j.beelen@utwente.nl (T. Beelen); k.p.truong@utwente.nl (K. P. Truong);
roeland.ordelman@utwente.nl (R. Ordelman); p.c.velner@utwente.nl (E. Velner);
vanessa.evers@ntu.edu.sg (V. Evers); t.w.c.huibers@utwente.nl (T. Huibers)
Web: https://thomasbeelen.com/ (T. Beelen)
ORCID: 0000-0002-5650-2830 (V. Evers); 0000-0002-9837-8639 (T. Huibers)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License
Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073, http://ceur-ws.org


1. Introduction

Increasingly, search engines can be accessed via speech, often through Voice Assistants (VAs).
As outlined before by Beelen et al. [1], children are insufficiently supported by technology
during their search process, both by search engines on a computer and by VAs. Children have
more difficulty in formulating effective search queries that represent their information need
well due to a smaller knowledge base and vocabulary [2]. Using speech does not inherently solve
these obstacles. Furthermore, most VAs provide only a limited question-answer interaction style
where the agent directly tries to find results based on the initial query. This interaction
style causes several issues for children. Firstly, it is necessary to put all the required
context into one query, which they struggle with [3, 4]. Secondly, usually it is not possible
to ask follow-up questions, a functionality that children often expect [5, 3]. Lastly, children
are typically not supported in formulating queries, for instance by query suggestions or
clarifying questions [4, 6].
   Many researchers have thus concluded that search tools should assist children in query
formulation. With this goal in mind, our research project, called CHATTERS
(https://chatters-cri.github.io/), focuses on a robot for children’s conversational search. We
investigate if a spoken conversational search approach (see [7]) works better for children than
the question-answer style interaction that current commercial VAs usually employ. The goal is
to help children communicate their information need better via the conversational interaction
with the system. We opt for a physically embodied robot to provide a natural conversation and
an engaging experience [8]. In this project we first develop an agent that uses a simplified
conversational approach. The simplified conversational approach employs open-ended elicitation
questions to prompt the child to talk more about their information need (for example: “What do
you want to know about animal species?”). The elicitation questions are based on templates that
do not require any domain knowledge. Instead, the templates extract keywords from the child’s
speech to generate questions. Subsequently extracted keywords are added to the memory until a
threshold is reached and the agent moves on to result presentation (more on this in section 3).
Secondly, we evaluate in experiments whether the simplified approach is able to elicit more
keywords from children compared to the traditional question-answer paradigm. We also study
whether such a robot is engaging to use because this is a precondition for potential long-term
use. To study these topics we conducted a pilot study with eleven children, which will be
described in section 4. This pilot is the basis for a main study that we describe in section 5.
Here we also describe a different use case, which is to search an archive in a museum. In this
case the physical embodiment is especially suited to draw visitors’ attention.


2. Problem statement

The problem we address is that children are insufficiently supported in communicating their
information needs to voice-based search systems. They are required to formulate queries that
contain all necessary context in one statement, which is too complex. We study whether a robot
that uses a back and forth conversation can help children communicate their need more
effectively by leveraging multiple turns. Furthermore, we are interested in whether a
conversational approach provides a more social and engaging experience. Our target audience is
children aged 10–12.


3. Proposed system

We first develop a robot that uses simplified conversational search. In this interaction, the
robot initiates the
conversation with an introduction. It states its name, asks the child for theirs, and says it
is pleased to meet them. Then the robot asks if the child wants help searching for information,
and the child can ask a question. Since the child’s speech may contain words that are not
relevant to the search (such as “uhm, let’s see”), keywords need to be extracted. In the pilot
study described in section 4, keyword extraction was done by matching against a pre-programmed
list of possible keywords. Detected keywords are added to the robot’s memory. If the number of
keywords is below a threshold, the robot will pose an elicitation question. This threshold will
be optimized in the future. Too few keywords lead to an unspecific ranking of results, while
trying to elicit too many keywords leads to a long interaction and possibly frustration. In the
pilot study (section 4) this threshold was set to three keywords, which was an estimate based
on the number of words in the search tasks. The elicitation questions are based on a simple
pattern that includes the keywords that were extracted so far. The two question patterns are
(translated from Dutch):

      • What is it you want to know about [recognised keywords]?
      • What don’t you know yet about [recognised keywords]?

The word “and” is added to the list of keywords where necessary. The robot loops over the
questions and adds new keywords until the threshold is met. After this phase, the robot will
move on to present search results. Then the robot asks the child if it can be of any further
assistance. If not, it proceeds to a closing interaction.


4. Pilot study

We conducted a pilot study as a first step to find out if the simplified conversational
approach elicits more keywords (as described in the introduction), and how children experience
such a robot. Furthermore, the pilot is a way to discover methodological issues that may still
be corrected for the main study. In the pilot, we evaluated a robot that uses the simplified
conversational search approach. As described in the evaluation framework by Landoni et al. [5],
studies in IR for children can be described by the intended search strategy, for a particular
user group, given a task, in a certain context. Our user group is children aged 10–12. These
children are in the final years of primary school in the Netherlands and are starting to work
more on assignments such as presentations. We compare our conversational robot strategy to a
traditional question-answer style interaction that is common with commercial VAs. The task in
the experiments is searching for information related to school subjects. The context of the
searches is at school or in the home. In the future we will change to a museum archive search
task.

4.1. Method

Our study is a within-subjects comparison with two conditions. In one condition the robot uses
the simplified conversational interaction style (see section 3); in the other it uses a
question-answer interaction style. The order of the conditions was alternated between
participants. The Furhat robot [9] and its software environment are used for both conditions.
This keeps the two experimental conditions similar while the style of interaction is varied.
   There are two search tasks, one for each condition (since condition order was alternated,
each task was used in different conditions). Both tasks are factoid questions based on the work
by Landoni et al. [5]. The first was to find out what hail is. The second was to find three
endangered animal species. The tasks were presented in one sentence in Dutch on the task sheet.
The search results that could be retrieved were pre-programmed in this pilot, and were the same
in both conditions. When presenting results, the robot mentions the website where it found the
information, the name of the article, and then reads aloud a snippet of the web page.
   In the question-answer condition, the robot mimics the interaction of a commercial VA. This
means it first waits for a wake word, in this case “Hey robot” or “Hey Furhat”. Then the
robot’s LED ring lights up green to signal it is awake, and a question can be asked. The wake
word and question can also be combined into one statement. The robot will then present results
right away in the way described above. After the results the robot goes back to waiting for the
wake word.

4.2. Measures

The measurements consist of observational notes, Likert scales using emojis (Smileyometers
[10]), logs containing a raw transcript, and interview questions. The children were also asked
about their current VA usage. Observations focused on how the children behaved and spoke to the
robot. The Smileyometer was used to gauge the user experience of the interactions. The
questions are based on [5]; they cover fun, ease of finding answers, intention to use again,
kindness, and ease of conversing with the robot. The logs are a transcript of how the robot
interpreted the child and how it responded. This is used to evaluate whether the approach is
able to elicit more keywords from children compared to the traditional paradigm. The goal of
the interview at the end is to study children’s perception of the differences and advantages of
the two systems, and to add more qualitative data on their experience.

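To make the elicitation loop from section 3 concrete (list-based keyword matching, a threshold
of three keywords, and the two question patterns), the following is a minimal Python sketch. It
is not the implementation used in the pilot, which ran in the Furhat software environment. The
keyword list, the function names, the alternation between the two patterns, and the fallback
question for an empty memory are all assumptions made for illustration.

```python
import re

# Hypothetical keyword list; the pilot's actual pre-programmed list is not given.
KNOWN_KEYWORDS = ["animal species", "endangered", "three", "hail"]
THRESHOLD = 3  # the pilot used a threshold of three keywords

# The two question patterns from section 3 (translated from Dutch).
TEMPLATES = [
    "What is it you want to know about {keywords}?",
    "What don't you know yet about {keywords}?",
]
# Assumption: the paper does not say what is asked when no keyword was recognised.
FALLBACK_QUESTION = "What would you like to know?"


def extract_keywords(utterance, known=KNOWN_KEYWORDS):
    """Return the known keywords that occur in the utterance, in spoken order."""
    text = utterance.lower()
    hits = []
    for kw in known:
        match = re.search(r"\b" + re.escape(kw) + r"\b", text)
        if match:
            hits.append((match.start(), kw))
    return [kw for _, kw in sorted(hits)]


def join_keywords(keywords):
    """Join keywords into one phrase, adding 'and' before the last one."""
    if len(keywords) <= 1:
        return "".join(keywords)
    return ", ".join(keywords[:-1]) + " and " + keywords[-1]


def elicit(utterances, threshold=THRESHOLD):
    """Collect keywords turn by turn; ask a question while below the threshold.

    Returns the keyword memory and the list of elicitation questions asked.
    """
    memory, questions = [], []
    for utterance in utterances:
        for kw in extract_keywords(utterance):
            if kw not in memory:
                memory.append(kw)
        if len(memory) >= threshold:
            break  # enough keywords: move on to result presentation
        if memory:
            # Assumption: the two patterns are used alternately.
            template = TEMPLATES[len(questions) % len(TEMPLATES)]
            questions.append(template.format(keywords=join_keywords(memory)))
        else:
            questions.append(FALLBACK_QUESTION)
    return memory, questions


memory, questions = elicit([
    "uhm, let's see, animal species",
    "which ones are endangered",
    "I need three of them",
])
```

In this example the filler words in the first utterance are ignored, the robot asks one
question per pattern, and the loop stops once the third keyword is recognised.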
4.3. Participants & procedure

Eleven children participated in the pilot at a local after-school care over two days in June
2022. Children whose parents consented joined the experiments. Our user group is children aged
10–12, but the children that participated were mostly younger (mean age = 8.8, SD = 1.3). We
decided to also let younger children participate due to the difficulty of recruiting children
in our targeted age range.
   The robot was set up on a table in a separate room with an open door to the main area. The
researcher sat at the same table and height as the child and explained that they would be
talking to two versions of the same robot. It was explained that the robots may not be fully
functional yet, and that the children are helping to further develop them. It was also
explained to the children that one of the robots would start talking right away, and the other
requires a wake word. The children received a task sheet that also includes the questionnaire
questions. The interview happened after interacting with both robots.

4.4. Results

The first day the speech recognition often did not understand children correctly due to
background noise. Therefore, the second day a headset was used, and Wizard-of-Oz (undisclosed
human operator) functionality was added. This way the researcher could take over when some
responses were not understood correctly by the automatic speech recognition. Due to speech
recogniser errors the logs were not usable for analysis.

Observation. Many children resorted to reading the search task directly from the sheet. This
caused the queries in both conditions to be similar. Mainly for younger users the task seemed
complex and they required more input from the researcher. The children around the target age
seemed more comfortable with the level of complexity, working more independently. In the
conversational condition, the system entered the elicitation question loop in many cases.
Sometimes, the child wrongfully assumed they had already provided all the words in the search
task, and got confused by the elicitation question. Especially younger children tended to look
at the researcher when they were unsure how to continue the interaction. This also happened at
the elicitation questions, where the researcher had to give a hint. In other cases the child
kept the conversation going and answered the elicitation question naturally.

Smileyometers. The robots scored relatively similarly in the Smileyometers. The results suggest
that the conversational robot may be more enjoyable to use and easier to find answers with.
Children also indicated being slightly more likely to use it again. The robots were seen as
roughly equally friendly and easy to talk to. The results are shown in figure 1.

[Figure 1: Smileyometer outcomes pilot study. Bar chart, vertical axis from 0 to 3, comparing
the Conversational and Question-answer robots on: Enjoyment, Easy searching with, Use again,
Friendly / nice, Easy talking with.]

Interview. Four children preferred the question-answer condition, while five preferred the
conversational. Two children had no preference. Some of the interesting statements on the
robots are in table 1. The statements indicate there is a potential trade-off between
efficiency and fun. There are also clear individual differences, as some children seemed to
enjoy talking more, while others preferred a fast interaction. Concerning participants’ VA
usage, five children had no experience, three children used them a few times, and three used
them frequently. The frequent users ask VAs about the weather, jokes, and finding information.
No effects of prior VA usage on the outcome could be determined in this pilot. Some children
gave tips to improve the robot. These tips were: an easier to understand voice, and a touch
screen on the robot’s face to be able to select search results visually as well.

Table 1
Children’s statements on the two robots

Robot             Statements
Conversational    It speaks directly (it is awake and thus faster).
                  It is more fun because there is more talking.
                  It is easier to use.
                  It requires less talking.
Question-answer   I have to wake it up first.
                  It is better because I have to talk less. It’s faster.
                  I like that it turns green [after wake word].

4.5. Conclusion and limitations

Based on the pilot, the number of elicited keywords could not yet be compared due to errors and
task design. Children rated the robots quite similarly but possibly find the
conversational system more fun and easier to find answers with. More children preferred the
conversational system. They perceived the robots as nearly equally friendly. The interview
answers shed light on the individual preferences regarding the amount of conversation during
the search process.
   The pilot study also gave insights that influence the method of the main study. Firstly,
the small sample size means the current findings have low confidence. In the main study we will
increase the sample size by relying on cooperation with partners such as museums. Secondly,
background noise led to errors and required the addition of Wizard-of-Oz controls. Another
limitation is that most participants were younger than our target audience. A few years can
make a significant developmental difference in children; therefore the interaction may be less
complex for children in the target age. However, the complexity of the interaction seemed
suitable for the older participants around the target age. Finally, the search tasks were
mostly read aloud from the task sheet, which likely affects the naturalness of children’s
queries. In the next section we describe how a different task in the museum context may address
this.


5. Future work

The next step in creating the conversational robot is to connect it to the API of the
Netherlands Institute for Sound and Vision (https://www.beeldengeluid.nl), containing Dutch
public broadcasting media. In line with the tip by one of the participants, this use case will
also introduce a display for multimedia search results. The API connection will enable us to
study more natural search tasks and move away from pre-programmed search results. The tasks
that were used in the pilot are fact finding and stated directly on the task sheet. The API
connection would allow children to search for TV fragments that they come up with themselves,
which is a more open search task than fact finding. This enables us to study children in a more
natural setting, where their query formulation process more closely reflects a realistic
scenario instead of reading from a task sheet. Elicitation questions may become more useful in
this case. A more advanced method for keyword extraction from speech will need to be
implemented as well, such as the one by Habibi and Popescu-Belis [11]. The API-connected robot
will be tested with a method similar to that of the pilot study described above. The method
compares the style of interaction without changing other aspects of the robot between
conditions. The within-subjects setup allowed children to reflect on differences between the
systems and their preference. The Smileyometers worked well even for participants that are
younger than the target audience. With our pilot findings we can account for some important
methodological issues. We look forward to learning more about children’s conversational search
process in our main study.


Acknowledgments

This research is supported by the Dutch SIDN fund https://www.sidn.nl/ and TKI CLICKNL funding
of the Dutch Ministry of Economic Affairs https://www.clicknl.nl/. We would also like to thank
the staff at BSO Partou de Vlinder for their time and cooperation.


References

 [1] T. Beelen, E. Velner, R. Ordelman, K. P. Truong, V. Evers, T. Huibers, Does your robot
     know? Enhancing children’s information retrieval through spoken conversation with
     responsible robots, arXiv:2106.07931 [cs] (2021). URL: http://arxiv.org/abs/2106.07931,
     arXiv: 2106.07931.
 [2] H. Hutchinson, A. Druin, B. B. Bederson, K. Reuter, A. Rose, A. C. Weeks, How do I find
     blue books about dogs? The errors and frustrations of young digital library users,
     Proceedings of HCII 2005 (2005) 22–27. Publisher: Citeseer.
 [3] S. B. Lovato, A. M. Piper, E. A. Wartella, Hey Google, Do Unicorns Exist? Conversational
     Agents as a Path to Answers to Children’s Questions, in: Proceedings of the 18th ACM
     International Conference on Interaction Design and Children, IDC ’19, Association for
     Computing Machinery, Boise, ID, USA, 2019, pp. 301–313.
     URL: https://doi.org/10.1145/3311927.3323150. doi:10.1145/3311927.3323150.
 [4] S. Yarosh, S. Thompson, K. Watson, A. Chase, A. Senthilkumar, Y. Yuan, A. J. B. Brush,
     Children asking questions: speech interface reformulations and personification
     preferences, in: Proceedings of the 17th ACM Conference on Interaction Design and
     Children, IDC ’18, Association for Computing Machinery, Trondheim, Norway, 2018,
     pp. 300–312. URL: https://doi.org/10.1145/3202185.3202207. doi:10.1145/3202185.3202207.
 [5] M. Landoni, D. Matteri, E. Murgia, T. Huibers, M. S. Pera, Sonny, Cerca! Evaluating the
     Impact of Using a Vocal Assistant to Search at School, in: F. Crestani, M. Braschler,
     J. Savoy, A. Rauber, H. Müller, D. E. Losada, G. Heinatz Bürki, L. Cappellato, N. Ferro
     (Eds.), Experimental IR Meets Multilinguality, Multimodality, and Interaction, Lecture
     Notes in Computer Science, Springer International Publishing, Cham, 2019, pp. 101–113.
     doi:10.1007/978-3-030-28577-7_6.
 [6] S. Druga, R. Williams, C. Breazeal, M. Resnick, “Hey Google is it OK if I eat you?”:
     Initial Explorations in Child-Agent Interaction, in: Proceedings of the 2017 Conference on
     Interaction Design and Children, ACM, Stanford California USA, 2017, pp. 595–600.
     URL: https://dl.acm.org/doi/10.1145/3078072.3084330. doi:10.1145/3078072.3084330.
 [7] H. Zamani, J. R. Trippas, J. Dalton, F. Radlinski, Conversational Information Seeking,
     arXiv preprint arXiv:2201.08808 (2022).
 [8] J. Li, The benefit of being physically present: A survey of experimental works comparing
     copresent robots, telepresent robots and virtual agents, International Journal of
     Human-Computer Studies 77 (2015) 23–37.
     URL: http://www.sciencedirect.com/science/article/pii/S107158191500004X.
     doi:10.1016/j.ijhcs.2015.01.001.
 [9] S. Al Moubayed, J. Beskow, G. Skantze, B. Granström, Furhat: A Back-projected Human-like
     Robot Head for Multiparty Human-Machine Interaction, Springer Berlin/Heidelberg, 2012,
     pp. 114–130. URL: http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-105606.
[10] J. C. Read, S. MacFarlane, Using the fun toolkit and other survey methods to gather
     opinions in child computer interaction, in: Proceedings of the 2006 conference on
     Interaction design and children, IDC ’06, Association for Computing Machinery, New York,
     NY, USA, 2006, pp. 81–88. URL: http://doi.org/10.1145/1139073.1139096.
     doi:10.1145/1139073.1139096.
[11] M. Habibi, A. Popescu-Belis, Diverse keyword extraction from conversations, in:
     Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics
     (Volume 2: Short Papers), 2013, pp. 651–657.

</pre>