=Paper= {{Paper |id=Vol-2390/PaperA3 |storemode=property |title=Alexa as a CALL platform for children: Where do we start? |pdfUrl=https://ceur-ws.org/Vol-2390/PaperA3.pdf |volume=Vol-2390 |authors=Nikos Tsourakis,Manny Rayner,Hanieh Habibi,Pierre-Emmanuel Gallais,Cathy Chua,Matt Butterweck }} ==Alexa as a CALL platform for children: Where do we start?== https://ceur-ws.org/Vol-2390/PaperA3.pdf
                Alexa as a CALL Platform for Children: Where do we Start?
                 Nikos Tsourakis (1), Manny Rayner (1), Hanieh Habibi (1),
                 Pierre-Emmanuel Gallais (2), Cathy Chua (2), Matt Butterweck (2)
                 (1) Geneva University
                 (2) Independent researcher
                 {Nikolaos.Tsourakis,Emmanuel.Rayner,Hanieh.Habibi}@unige.ch
                 cathyc@pioneerbooks.com.au, matthias@butterweck.de, gallais2009@hotmail.fr

                                                                Abstract
Amazon’s Alexa is now widely available and shows interesting potential as a platform for hosting CALL games aimed at children. In
this paper, we describe an initial informal experiment where we created some simple CALL games and made them available to a few
child testers. We report the children’s and parents’ reactions. Our overall conclusion is that, although Alexa has many positive features,
there are still fundamental platform issues in the current version that make it very difficult to build compelling CALL games for children.
The games used will soon be freely available for download on the Alexa store.

Keywords: CALL, children, Alexa


1. Introduction

In the four years since its release, Amazon’s Alexa has become a major platform for developing and deploying spoken language applications; according to Amazon, over one hundred million Alexa-enabled devices have now been sold. Amazon’s advertising highlights the attractiveness of the platform to children, and one only needs to spend ten minutes watching a couple of kids playing with Alexa to see that this is not all hype. The device clearly has good acoustic models for children’s speech; the far-field recognition and hands-free operation work well, allowing children to do other things while talking to the device; and the default “always-on” mode eliminates start-up time. Further attractive properties include a powerful and well-maintained API for developing and fielding applications (“skills”), and excellent scalability. We have for some time been developing a speech-enabled CALL platform (Rayner et al., 2015) which among other things has been used to build CALL games for children (Baur et al., 2013; Baur, 2015), and we were curious to find out what we could do if we ported some of this functionality to Alexa.

Other differences when compared with conventional platforms also seemed potentially positive. The smartphone revolution in particular has paved the way for an interaction paradigm which proves to be a minefield of distractions, dominated by social media applications whose primary goal is to occupy as much of the user’s time as possible. For example, according to a dscout Mobile Touches study (https://blog.dscout.com/mobile-touches), smartphone users on average touch, swipe or tap their phone over 2,500 times a day. The situation is no better on a desktop PC, where similar distractions are available. The interaction paradigm inherent in the Echo, in contrast, emphasises quick and purposeful interactions; this makes it an attractive candidate platform for child-oriented CALL applications, given that children are notoriously prone to distraction. Turn taking between the device and the user is normally restricted to the context of the ‘skill’, without being affected by other platform events. Finally, it offers a communal experience where multiple members of the same family or friends can interact with the device. Children do not look at an individual screen and, other things being equal, will find it easier to collaborate with others than they would if they were using a smartphone or tablet.

Here, we report an initial experiment where we created a few CALL games and gave them to some children we were in contact with to see what happened. We recruited six children (coincidentally, all boys) aged between four and ten years old and belonging to four separate families, and gave an Echo Dot device to each family. A set of instructions was provided describing how to use the applications. Our unashamedly anecdotal analysis (you have to start somewhere) is based on observations while the subjects interacted with the system and from informal discussions with children and their parents. For reasons of space, we will focus on the three most active users, referred to here as HK, JK and VT.

The rest of the paper is structured as follows. In the next section, we describe the CALL games used. Section 3 presents the results. The final section concludes.

2. Alexa games used

We began by creating five simple games. Each game was constructed in three versions, for English, German and French. The games, listed in Table 1, will be available for free download from the Alexa store by May 15 2019. The structure of each game is the same; the basic strategy is prompt/response, where the prompt is either a recorded audio file (the games “Which movie?”, “Which language?” and “Which animal?”), or a piece of text spoken using Alexa’s TTS functionality (the games “Number game” and “Letter game”). Each game was first developed for English, then ported to German and French by native speakers.

In some of the games, prompts are divided into “lessons”. In the arithmetic game, there are four lessons, for addition, subtraction, multiplication and division. In the animal noises and language ID games, there are two lessons called “easy” and “difficult”. A lesson can optionally be further divided into numbered “groups”, with the convention


EnetCollect WG3&WG5 Meeting, 24-25 October 2018, Leiden, Netherlands                                                                    17
that prompts from the low-numbered groups are presented before prompts from the high-numbered groups. Games are defined using a simple spreadsheet format. The first few lines of the spreadsheet contain metadata (invocation phrase, L1, L2, etc.); the body consists of lines defining prompt/response pairs, where the first column gives the group number, the second the prompt, and the third the permitted responses. Figure 1 gives an example of a game spreadsheet.

Figure 1: Top of spreadsheet defining “Number game”

At each turn, the game speaks a prompt randomly chosen from the currently active group and lesson, with the proviso that prompts do not repeat in the same session. The player can respond immediately, in which case the game either confirms and moves on to the next turn if it judges the response correct, or else repeats the prompt if it judges it incorrect. In both cases, the game echoes back the player’s response, using different strategies for correct and incorrect responses. If the response is judged incorrect, the echoed content is “I heard...” followed by the speech hypothesis from the Alexa recogniser. If the response is judged correct, the echoed content is the matched phrase from the spreadsheet. Instead of responding directly to the prompt, the player may also use one of the following navigation commands:

Help: Give a choice of three possible answers the first time; say the correct answer the second time.
Repeat: Repeat the last thing the app said.
Wait: Give the player more time to think.
Next: Skip to the next prompt.
Back: Return to the previous prompt.
Next lesson: Skip to the next lesson.
Lesson X (or simply “X”): Skip to the lesson called “X”, e.g. “Lesson: addition” or “Addition” goes to the lesson labelled “addition” in the spreadsheet.
Lessons: List the names of the available lessons.

In initial versions of the games, the decision as to whether a response was correct or incorrect was made by performing a simple comparison between the recognition hypothesis and the set of correct answers defined by the spreadsheet. Feedback from the children suggested that many of them were frustrated by the apps’ brittleness; in later versions, we used the robust matching method of Rayner et al. (2017), which anecdotally gives much more user-friendly performance.

2.1. Personalised courses: “Pokémon”

The courses described above are all basically generic ones, though to some extent they were designed with the children’s interests in mind. (One of the children, living in highly multilingual Geneva, is very interested in languages and did indeed like the language ID game.) During the course of the experiment, it occurred to us that it would be possible to go a step further and design courses that were explicitly personalised to a single student. We explored two versions of this idea.

In the first experiment, we invited one of the subjects, a bilingual English/French seven-year-old boy we will call HK, to design his own game. HK is a passionate devotee of Pokémon, and this seemed like a natural subject. The protocol for constructing the game was devised by his mother, AK, who acted as coordinator and secretary.

In AK’s scenario, HK and his friend DM, a francophone boy of about the same age, sat facing each other, with one boy holding a deck of Pokémon cards oriented so that only he could see them; the two boys alternated roles. At each turn, the boy with the cards picked a card and made up a French question based on the card’s text. The other boy tried to guess the Pokémon. After some discussion, they agreed on a question which AK wrote down in the spreadsheet. At the end of the session, AK mailed the spreadsheet to us, and we compiled and deployed the game.

A couple of days later, HK and DM tried out their game. The initial reaction was very positive (they were amazed that their content had been turned into this new form), but they rapidly lost interest. There were several problems: in addition to the generic usability issues discussed in the next section, the game was, unsurprisingly, not very well designed, with questions that were both overlong and often too difficult even for Pokémon experts. There was also not enough content (HK and DM only managed to generate a dozen questions before getting bored), and a couple of the Pokémon names hardly ever got recognised. AK encouraged the children to try and identify the problems themselves and redesign the game. They produced a second version, with somewhat shorter prompts, one of the hard-to-recognise questions removed, and a little more content; but enough of the problems remained that they soon lost interest again, and could not be persuaded to produce a third version.

2.2. Personalised courses: “V’s homework”

In the second personalised course the target child, VT, was not part of the development loop. The basic motivation stemmed from the actual need of the child to practise small dialogues at home as part of his homework. Specifically, in French-speaking Switzerland children start learning German at school at the age of eight. During the first few months of the course, they are asked to develop their generation and comprehension skills by participating in small dialogues where they alternate the two roles. Normally,


Table 1: Main Alexa games used in the experiment. All games are available from the Alexa Store and free to download; search
for the name of the game in one of the first three columns.

English name       French name        German name        #Prompts   Short description
Which movie?       Quel film?         Welcher Film?      76         Guess the movie from a short clip
Which language?    Quelle langue?     Welche Sprache?    88         Guess the language from a short phrase
Which animal?      Quel animal?       Welches Tier?      25         Guess the animal from its sound
Number game        Jeu de chiffres    Zahlenspiel        100        Practise spoken arithmetic
Letter game        Jeu de lettres     Buchstabenspiel    40         Name things starting with a given letter
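
The spreadsheet format and turn logic described in Section 2 can be illustrated with a minimal Python sketch. It is not the authors' implementation. The metadata keys, the "|" separator between permitted responses, and all names (Prompt, parse_game, Session) are assumptions made for illustration, and lessons are omitted for brevity. Matching is the simple strict comparison used in the initial versions of the games, not the robust method of Rayner et al. (2017).

```python
import random
from dataclasses import dataclass


@dataclass
class Prompt:
    group: int
    text: str
    responses: list  # permitted responses, lower-cased


def parse_game(rows):
    """Split spreadsheet rows into metadata and prompt/response pairs.

    Assumed layout: metadata rows are (key, value); body rows are
    (group number, prompt, permitted responses separated by '|').
    """
    meta, prompts = {}, []
    for row in rows:
        first = str(row[0]).strip()
        if first.isdigit():
            answers = [r.strip().lower() for r in row[2].split("|")]
            prompts.append(Prompt(int(first), row[1].strip(), answers))
        elif first:
            meta[first] = row[1].strip()
    return meta, prompts


class Session:
    def __init__(self, prompts, rng=None):
        self.remaining = list(prompts)  # prompts not yet used in this session
        self.rng = rng or random.Random()
        self.current = None
        self.help_count = 0
        self.all_answers = sorted({p.responses[0] for p in prompts})

    def next_prompt(self):
        """Pick a random unused prompt from the lowest-numbered group left."""
        if not self.remaining:
            return None
        active = min(p.group for p in self.remaining)
        candidates = [p for p in self.remaining if p.group == active]
        self.current = self.rng.choice(candidates)
        self.remaining.remove(self.current)
        self.help_count = 0
        return self.current.text

    def answer(self, hypothesis):
        """Strict comparison; returns (is_correct, text to echo back)."""
        heard = hypothesis.strip().lower()
        if heard in self.current.responses:
            return True, heard  # echo the matched phrase
        return False, f"I heard {hypothesis}"  # echo the recogniser output

    def help(self):
        """First call: up to three candidate answers; second call: the answer."""
        self.help_count += 1
        correct = self.current.responses[0]
        if self.help_count >= 2:
            return [correct]
        distractors = [a for a in self.all_answers if a != correct]
        choices = self.rng.sample(distractors, min(2, len(distractors)))
        choices.append(correct)
        self.rng.shuffle(choices)
        return choices
```

Keeping the unused prompts in a single list makes both constraints from the text easy to enforce: a prompt is removed once spoken, so it cannot repeat within a session, and the lowest remaining group number is always served first.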



one of the parents plays the role of the conversation partner. It occurred to us that it would be easy to adapt the Alexa framework described above and create a course that included a set of these small dialogues. The basic interaction pattern for a turn is as follows:

 1. Party 1: German phrase (spoken by the app)
 2. Party 1: French “hint” (spoken by the app)
 3. Party 2: response (spoken by the child)

We used a slightly modified version of the spreadsheet format described above to define the course, making each dialogue into a “lesson”. Prompts were realised as before using Alexa’s TTS voice, with the L1 “hint” part marked up to be spoken more quietly. An example dialogue is shown in Table 2.

3. Experimenting with Alexa

3.1. Feedback from AK, HK and JK

The most diligent users in the study were definitely AK and her two children, HK and JK (boys aged 7 and 11). The family also encouraged several of the children’s friends to try out the Alexa games when visiting.

We interviewed AK, HK and JK to get their impressions, and watched the two boys using the games. It was immediately obvious that they had mastered the technical problems of interacting with the games, and had played them enough that they knew the content quite well. Unfortunately, our impression was that they had not in fact used the games as CALL tools, but only as entertainment. As already noted, the boys, members of an English family who have grown up in French-speaking Geneva, are bilingual English/French. They had almost exclusively used the English and French versions of the games and hardly tried the German ones at all, despite the fact that JK had done a year of German at school and might well have benefited from using the German versions.

We are not sure we know why HK and JK were reluctant to use the games for an educational purpose, but it certainly seemed possible that this is related to a current misfeature of Alexa: the device language can only be changed from the web control panel. It cannot be changed through a voice command. Since accessing the control panel requires the Amazon password, which AK was unwilling to give to her children, they could not activate the German versions of the games without asking AK for assistance; they would then have to ask for help a second time at the end of the session to switch back to French, which was the default interaction language. In short, the kids had no autonomy. They complained explicitly about this.

3.2. Feedback from VT

In contrast to HK and JK, VT used his Alexa device mostly for educational purposes, namely the “V’s homework” course from Section 2.2. Having previously been exposed to the content, it was straightforward for VT to complete the task, although it was obvious that performing the interaction with one of the parents was more engaging. VT also tried out all the games for his L1 (again French). He seemed genuinely interested in experimenting with this new gadget and continued to play until specifically told to stop. After the session, however, he did not ask to play with the device again.

3.3. Common feedback

Three generic problems were apparent, and were the subject of repeated complaints from all subjects. First, Alexa is currently unable to handle barge-in. Since children tended to interrupt the spoken output of the device anyway, we introduced a distinctive sound that signifies the start of each turn. The children had no trouble understanding the purpose of the earcon and interaction worked much better once it was introduced, but they did not like being forced to wait until the game had finished speaking before they could respond, and said that the games were “too slow”. Shortening the prompts as much as possible did not correct the problem. In the opposite direction, Alexa also drops out of the game and returns to top level if the user stops speaking for more than a few seconds. Here, the best fix we could come up with was to introduce a “wait” command (essentially an extra turn), which again improved the situation. Nonetheless, the bottom line was that when the children knew the answer, they were not allowed to give it at once, and when they didn’t know it, they were not allowed to pause freely, but had to remember to say “wait”. In addition, although Alexa’s speech recognition is very good by current standards, it was not perceived as being good enough; misrecognitions added to the general feeling of frustration.

Despite this, all parents, in particular AK, stressed that they saw a great deal of positive potential in Alexa, and hoped that later versions of games like the ones we gave them would be able to realise that potential. It seems, however, that the current platform has too many negative aspects for CALL games like ours to work well with children in an unsupervised home setting.
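
The two workarounds described in Section 3.3, the earcon that marks the start of the player's turn and the “wait” command that acts as an extra turn, might look roughly like the following Python sketch. It is an illustration under stated assumptions, not the code used in the study: the earcon URL is a hypothetical placeholder, the function names and the wording “Correct!” are invented, and placing the earcon at the end of the spoken output is our reading of the text. The dictionary follows the general shape of an Alexa custom-skill JSON response, with SSML output and the session kept open.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical placeholder; a real skill would point at its own hosted audio file.
EARCON_URL = "https://example.com/earcon.mp3"


def build_response(text, earcon_url=EARCON_URL):
    """Speak `text`, then play the earcon to signal that the player may answer."""
    ssml = f"<speak>{escape(text)} <audio src={quoteattr(earcon_url)}/></speak>"
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "SSML", "ssml": ssml},
            # Keep the session open so the game does not drop back to top level.
            "shouldEndSession": False,
        },
    }


def handle_turn(utterance, prompt, answers):
    """Handle one player utterance for the current prompt."""
    heard = utterance.strip().lower()
    if heard == "wait":
        # "Wait" is just an extra turn: a short reply that opens a new
        # listening window without judging anything.
        return build_response("OK.")
    if heard in answers:
        return build_response(f"{heard}. Correct!")
    # Incorrect: echo the hypothesis and repeat the prompt.
    return build_response(f"I heard {heard}. {prompt}")
```

Because every reply ends with the same sound, the player gets a consistent cue for when speaking is allowed. As the text notes, this does not remove the underlying delay.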


Table 2: Sample dialogue for “V’s homework”. In each turn, the first element is the German phrase spoken by the app, the
second element is the French “hint” spoken by the app, and the third element is an example of a correct response.

 Party   Interaction                                                  (English gloss)

 1       Guten Morgen                                                 (Good morning)
 1       Bonjour (le matin)                                           (Good morning)
 2       Guten Morgen                                                 (Good morning)

 1       Wie heißt du?                                                (What is your name?)
 1       Je m’appelle V                                               (My name is V)
 2       Ich heiße V                                                  (My name is V)

 1       Was möchtest du kaufen?                                      (What would you like to buy?)
 1       J’aimerais du fromage et de la limonade, s’il vous plaît     (I would like some cheese and some lemonade please)
 2       Ich möchte Käse und Limonade bitte                           (I would like some cheese and some lemonade please)

 1       Bitte                                                        (There you are)
 1       Merci                                                        (Thank you)
 2       Danke                                                        (Thank you)
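
As a rough illustration of how one turn from Table 2 could be rendered, the Python sketch below builds an SSML prompt in which the French hint is wrapped in a `<prosody volume="soft">` element so that it is spoken more quietly than the German phrase. This is a sketch under assumptions, not the markup the authors used: the text only says the hint was “marked up to be spoken more quietly”, and the function names and the strict, case-insensitive comparison are illustrative. Voice and language selection for the two parts is not addressed here.

```python
from xml.sax.saxutils import escape

# Turns from Table 2: (German phrase, French hint, example correct response).
TURNS = [
    ("Guten Morgen", "Bonjour (le matin)", "Guten Morgen"),
    ("Wie heißt du?", "Je m'appelle V", "Ich heiße V"),
    ("Was möchtest du kaufen?",
     "J'aimerais du fromage et de la limonade, s'il vous plaît",
     "Ich möchte Käse und Limonade bitte"),
    ("Bitte", "Merci", "Danke"),
]


def dialogue_prompt_ssml(l2_phrase, l1_hint):
    """Party 1's part of a turn: the L2 phrase, then the quieter L1 hint."""
    return (
        "<speak>"
        + escape(l2_phrase)
        + ' <prosody volume="soft">'
        + escape(l1_hint)
        + "</prosody></speak>"
    )


def is_expected_response(hypothesis, expected):
    """Party 2's part: strict, case-insensitive comparison (an assumption)."""
    return hypothesis.strip().lower() == expected.strip().lower()
```

For the second turn, `dialogue_prompt_ssml(*TURNS[1][:2])` yields the German question followed by the softly spoken French hint, and `is_expected_response("ich heiße v", TURNS[1][2])` accepts the child's reply.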



3.4. Feedback from group session on Open Day

We carried out a short but interesting experiment in early November 2018 in connection with “Futur en tous genres”, a yearly Open Day organised for children of University of Geneva employees. A group of a dozen children, aged between ten and twelve, were scheduled to visit our lab and spend an hour interacting with our CALL software.

We had only two Alexa devices available; given this limitation, the protocol we decided to try was the following. The Alexa device was placed on a table, with the children grouped around it in a semi-circle. The first child was handed a token; the group was told that only the child with the token was allowed to speak, and after speaking was obliged to hand the token to their right-hand neighbour. We then launched several of the French and English versions of the games, and let the kids interact with them.

Somewhat to our surprise, this setup was very successful. The children followed the instructions without complaining, and gave every evidence of having a good time. There was a lot of smiling and laughing, and when someone got stuck they often received good-natured whispered help. Alexa’s recognition functioned well, and things progressed smoothly, with rapid passing of the token. Our impression was that our guests were disappointed when the hour was up and could happily have stayed longer.

3.5. Social aspects

Finally, some general remarks. First, when children used the device with other people (family, friends or fellow students) they seemed far more engaged in the gameplay. Conversely, when they were asked to play the games alone they were less motivated. This is consistent with the view that Alexa devices bring together the concept of “interactions with a purpose” and the concept of “social mediation” where two interactions happen simultaneously: one with the device itself and one with the other participants, the latter quite possibly being more important.

Background input can be an issue as the device’s far-field microphone often captures input coming from a distance. Essentially, other people in the room who are not participating in the game need to be quiet. Furthermore, there is no clear turn-taking mechanism, and participants cannot easily coordinate who should speak at any given time. The token-passing workaround from Section 3.4 was a tentative remedy. Another possibility would be to use Amazon’s “Echo buttons”.

4. Summary and conclusions

We have described a preliminary user study carried out to investigate the Amazon Echo’s potential as a CALL platform for children. Although the limited scope of the study and the small number of participants mean that conclusions should not be considered as more than suggestive, it seems to us that the core problems we identified are inherent in the basic design of the current Echo and quite serious.

On the positive side, we were interested to see that children often seemed motivated and engaged when other participants interacted at the same time or when they were part of the development loop. If Amazon is able to address the issues we name above, we think Alexa has a great deal of potential as a CALL platform for children.

5. Bibliographical References

Baur, C., Rayner, M., and Tsourakis, N. (2013). A textbook-based serious game for practising spoken language. In Proceedings of ICERI 2013, Seville, Spain.

Baur, C. (2015). The Potential of Interactive Speech-Enabled CALL in the Swiss Education System: A Large-Scale Experiment on the Basis of English CALL-SLT. Ph.D. thesis, University of Geneva.

Rayner, M., Baur, C., Chua, C., Bouillon, P., and Tsourakis, N. (2015). Helping non-expert users develop online spoken CALL courses. In Proceedings of the Sixth SLaTE Workshop, Leipzig, Germany.

Rayner, M., Tsourakis, N., and Gerlach, J. (2017). Lightweight spoken utterance classification with CFG, tf-idf and dynamic programming. In International Conference on Statistical Language and Speech Processing, pages 143–154. Springer.

