Alexa as a CALL Platform for Children: Where do we Start?

Nikos Tsourakis1, Manny Rayner1, Hanieh Habibi1, Pierre-Emmanuel Gallais2, Cathy Chua2, Matt Butterweck2
1 Geneva University
2 Independent researcher
{Nikolaos.Tsourakis,Emmanuel.Rayner,Hanieh.Habibi}@unige.ch
cathyc@pioneerbooks.com.au, matthias@butterweck.de, gallais2009@hotmail.fr

EnetCollect WG3&WG5 Meeting, 24-25 October 2018, Leiden, Netherlands

Abstract

Amazon’s Alexa is now widely available and shows interesting potential as a platform for hosting CALL games aimed at children. In this paper, we describe an initial informal experiment where we created some simple CALL games and made them available to a few child testers. We report the children’s and parents’ reactions. Our overall conclusion is that, although Alexa has many positive features, there are still fundamental platform issues in the current version that make it very difficult to build compelling CALL games for children. The games used will soon be freely available for download on the Alexa store.

Keywords: CALL, children, Alexa

1. Introduction

In the four years since its release, Amazon’s Alexa has become a major platform for developing and deploying spoken language applications; according to Amazon, over one hundred million Alexa-enabled devices have now been sold. Amazon’s advertising highlights the attractiveness of the platform to children, and one only needs to spend ten minutes watching a couple of kids playing with Alexa to see that this is not all hype. The device clearly has good acoustic models for children’s speech; the far-field recognition and hands-free operation work well, allowing children to do other things while talking to the device; and the default “always-on” mode eliminates start-up time. Further attractive properties include a powerful and well-maintained API for developing and fielding applications (“skills”), and excellent scalability. We have for some time been developing a speech-enabled CALL platform (Rayner et al., 2015) which among other things has been used to build CALL games for children (Baur et al., 2013; Baur, 2015), and were curious to find out what we could do if we ported some of this functionality to Alexa.

Other differences when compared with conventional platforms also seemed potentially positive. The smartphone revolution in particular has paved the way for an interaction paradigm which proves to be a minefield of distractions, dominated by social media applications whose primary goal is to occupy as much of the user’s time as possible. For example, according to a dscout Mobile Touches study,1 smartphone users on average touch, swipe or tap their phone over 2,500 times a day. The situation is no better on a desktop PC, where similar distractions are available. The interaction paradigm inherent in the Echo, in contrast, emphasises quick and purposeful interactions; this makes it an attractive candidate platform for child-oriented CALL applications, given that children are notoriously prone to distraction. Turn taking between the device and the user is normally restricted to the context of the ‘skill’, without being affected by other platform events. Finally, it offers a communal experience where multiple members of the same family or friends can interact with the device. Children do not look at an individual screen and, other things being equal, will find it easier to collaborate with others than they would if they were using a smartphone or tablet.

Here, we report an initial experiment where we created a few CALL games and gave them to some children we were in contact with to see what happened. We recruited six children (coincidentally, all boys) aged between four and ten years old and belonging to four separate families, and gave an Echo Dot device to each family. A set of instructions was provided describing how to use the applications. Our unashamedly anecdotal analysis (you have to start somewhere) is based on observations while the subjects interacted with the system and from informal discussions with children and their parents. For reasons of space, we will focus on the three most active users, referred to here as HK, JK and VT.

The rest of the paper is structured as follows. In the next section, we describe the CALL games used. §3 presents the results. The final section concludes.

1 https://blog.dscout.com/mobile-touches

2. Alexa games used

We began by creating five simple games. Each game was constructed in three versions, for English, German and French. The games, listed in Table 1, will be available for free download from the Alexa store by May 15 2019. The structure of each game is the same; the basic strategy is prompt/response, where the prompt is either a recorded audio file (the games “Which movie?”, “Which language?” and “Which animal?”), or a piece of text spoken using Alexa’s TTS functionality (the games “Number game” and “Letter game”). Each game was first developed for English, then ported to German and French by native speakers.

Table 1: Main Alexa games used in experiment. All games are available from the Alexa Store and free to download; search for the name of the game in one of the first three columns.

| Name of the game (English) | Name of the game (French) | Name of the game (German) | #Prompts | Short description |
| Which movie? | Quel film? | Welcher Film? | 76 | Guess the movie from a short clip |
| Which language? | Quelle langue? | Welche Sprache? | 88 | Guess the language from a short phrase |
| Which animal? | Quel animal? | Welches Tier? | 25 | Guess the animal from its sound |
| Number game | Jeu de chiffres | Zahlenspiel | 100 | Practice spoken arithmetic |
| Letter game | Jeu de lettres | Buchstabenspiel | 40 | Name things starting with a given letter |

In some of the games, prompts are divided into “lessons”. In the arithmetic game, there are four lessons, for addition, subtraction, multiplication and division. In the animal noises and language ID games, there are two lessons called “easy” and “difficult”. A lesson can optionally be further divided into numbered “groups”, with the convention that prompts from the low-numbered groups are presented before prompts from the high-numbered groups. Games are defined using a simple spreadsheet format. The first few lines of the spreadsheet contain metadata (invocation phrase, L1, L2, etc); the body consists of lines defining prompt/response pairs, where the first column gives the group number, the second the prompt, and the third the permitted responses. Figure 1 gives an example of a game spreadsheet.

Figure 1: Top of spreadsheet defining “Number game”

At each turn, the game speaks a prompt randomly chosen from the currently active group and lesson, with the proviso that prompts do not repeat in the same session. The player can respond immediately, in which case the game either confirms and moves on to the next turn if it judges the response correct, or else repeats the prompt if it judges it incorrect. In both cases, the game echoes back the player’s response, using different strategies for correct and incorrect responses. If the response is judged incorrect, the echoed content is “I heard...” followed by the speech hypothesis from the Alexa recogniser. If the response is judged correct, the echoed content is the matched phrase from the spreadsheet. Instead of responding directly to the prompt, the player may also use one of the following navigation commands:

Help: Give a choice of three possible answers the first time; say the correct answer the second time.
Repeat: Repeat the last thing the app said.
Wait: Give the player more time to think.
Next: Skip to the next prompt.
Back: Return to the previous prompt.
Next lesson: Skip to the next lesson.
Lesson X: (or simply “X”). Skip to the lesson called “X”, e.g. “Lesson: addition” or “Addition” goes to the lesson labelled “addition” in the spreadsheet.
Lessons: List the names of the available lessons.

In initial versions of the games, the decision as to whether a response was correct or incorrect was made by performing a simple comparison between the recognition hypothesis and the set of correct answers defined by the spreadsheet. Feedback from the children suggested that many of them were frustrated by the apps’ brittleness; in later versions, we used the robust matching method from (Rayner et al., 2017), which anecdotally gives much more user-friendly performance.

2.1. Personalised courses: “Pokémon”

The courses described above are all basically generic ones, though to some extent they were designed with the children’s interests in mind. (One of the children, living in highly multilingual Geneva, is very interested in languages and did indeed like the language ID game). During the course of the experiment, it occurred to us that it would be possible to go a step further and design courses that were explicitly personalised to a single student. We explored two versions of this idea.

In the first experiment, we invited one of the subjects, a bilingual English/French seven year old boy we will call HK, to design his own game. HK is a passionate devotee of Pokémon, and this seemed like a natural subject. The protocol for constructing the game was devised by his mother, AK, who acted as coordinator and secretary.

In AK’s scenario, HK and his friend DM, a francophone boy of about the same age, sat facing each other, with one boy holding a deck of Pokémon cards oriented so that only he could see them; the two boys alternated roles. At each turn, the boy with the cards picked a card and made up a French question based on the card’s text. The other boy tried to guess the Pokémon. After some discussion, they agreed on a question which AK wrote down in the spreadsheet. At the end of the session, AK mailed the spreadsheet to us, and we compiled and deployed the game.

A couple of days later, HK and DM tried out their game. The initial reaction was very positive (they were amazed that their content had been turned into this new form), but they rapidly lost interest. There were several problems: in addition to the generic usability issues discussed in the next section, the game was, unsurprisingly, not very well designed, with questions that were both overlong and often too difficult even for Pokémon experts. There was also not enough content (HK and DM only managed to generate a dozen questions before getting bored), and a couple of the Pokémon names hardly ever got recognised. AK encouraged the children to try and identify the problems themselves and redesign the game. They produced a second version, with somewhat shorter prompts, one of the hard-to-recognise questions removed, and a little more content; but enough of the problems remained that they soon lost interest again, and could not be persuaded to produce a third version.

2.2. Personalised courses: “V’s homework”

In the second personalised course the target child, VT, was not part of the development loop. The basic motivation stemmed from the actual need of the child to practise small dialogues at home as part of his homework. Specifically, in French-speaking Switzerland children start learning German at school at the age of eight. During the first few months of the course, they are asked to develop their generation and comprehension skills by participating in small dialogues where they alternate the two roles. Normally, one of the parents plays the role of the conversation partner. It occurred to us that it would be easy to adapt the Alexa framework described above and create a course that included a set of these small dialogues. The basic interaction pattern for a turn is as follows:

1. Party 1:
2. Party 1:
3. Party 2:

We used a slightly modified version of the spreadsheet format described above to define the course, making each dialogue into a “lesson”. Prompts were realised as before using Alexa’s TTS voice, with the L1 “hint” part marked up to be spoken more quietly. An example dialogue is shown in Table 2.

Table 2: Sample dialogue for “V’s homework”. In each turn, the first element is the German phrase spoken by the app, the second element is the French “hint” spoken by the app, and the third element is an example of a correct response.

| Party | Interaction | (English gloss) |
| 1 | Guten Morgen | (Good morning) |
| 1 | Bonjour (le matin) | (Good morning) |
| 2 | Guten Morgen | (Good morning) |
| 1 | Wie heißt du? | (What is your name?) |
| 1 | Je m’appelle V | (My name is V) |
| 2 | Ich heiße V | (My name is V) |
| 1 | Was möchtest du kaufen? | (What would you like to buy?) |
| 1 | J’aimerais du fromage et de la limonade, s’il vous plaît | (I would like some cheese and some lemonade please) |
| 2 | Ich möchte Käse und Limonade bitte | (I would like some cheese and some lemonade please) |
| 1 | Bitte | (There you are) |
| 1 | Merci | (Thank you) |
| 2 | Danke | (Thank you) |

3. Experimenting with Alexa

3.1. Feedback from AK, HK and JK

The most diligent users in the study were definitely AK and her two children, HK and JK (7 and 11 year old boys). The family also encouraged several of the children’s friends to try out the Alexa games when visiting.

We interviewed AK, HK and JK to get their impressions, and watched the two boys using the games. It was immediately obvious that they had mastered the technical problems of interacting with the games, and had played them enough that they knew the content quite well. Unfortunately, our impression was that they had not in fact used the games as CALL tools, but only as entertainment. As already noted, the boys, members of an English family who have grown up in French-speaking Geneva, are bilingual English/French. They had almost exclusively used the English and French versions of the games and hardly tried the German ones at all, despite the fact that JK had done a year of German at school and might well have benefited from using the German versions.

We are not sure we know why HK and JK were reluctant to use the games for an educational purpose, but it certainly seemed possible that this is related to a current misfeature of Alexa: the device language can only be changed from the web control panel. It cannot be changed through a voice command. Since accessing the control panel requires the Amazon password, which AK was unwilling to give to her children, they could not activate the German versions of the games without asking AK for assistance; they would then have to ask for help a second time at the end of the session to switch back to French, which was the default interaction language. In short, the kids had no autonomy. They complained explicitly about this.

3.2. Feedback from VT

In contrast to HK and JK, VT used his Alexa device mostly for educational purposes, the “V’s homework” course from §2.2. Having previously been exposed to the content, it was straightforward for VT to complete the task, although it was obvious that performing the interaction with one of the parents was more engaging. VT also tried out all the games for his L1 (again French). He seemed genuinely interested in experimenting with this new gadget and continued to play until specifically told to stop. After the session, however, he did not ask to play with the device again.

3.3. Common feedback

Three generic problems were apparent, and the subject of repeated complaints from all subjects. First, Alexa is currently unable to handle barge-in. Since children tended to interrupt the spoken output of the device anyway, we introduced a distinctive sound that signifies the start of each turn. The children had no trouble understanding the purpose of the earcon and interaction worked much better once it was introduced, but they did not like being forced to wait until the game had finished speaking before they could respond, and said that the games were “too slow”. Shortening the prompts as much as possible did not correct the problem. In the opposite direction, Alexa also drops out of the game and returns to top level if the user stops speaking for more than a few seconds. Here, the best fix we could come up with was to introduce a “wait” command (essentially an extra turn), which again improved the situation. Nonetheless, the bottom line was that when the children knew the answer, they were not allowed to give it at once, and when they didn’t know it, they were not allowed to pause freely, but had to remember to say “wait”. In addition, although Alexa’s speech recognition is very good by current standards, it was not perceived as being good enough; misrecognitions added to the general feeling of frustration.

Despite this, all parents, in particular AK, stressed that they saw a great deal of positive potential in Alexa, and hoped that later versions of games like the ones we gave them would be able to realise that potential. It seems, however, that the current platform has too many negative aspects for CALL games like ours to work well with children in an unsupervised home setting.

3.4. Feedback from group session on Open Day

We carried out a short but interesting experiment in early November, 2018 in connection with “Futur en tous genres”, a yearly Open Day organised for children of University of Geneva employees. A group of a dozen children, aged between ten and twelve, were scheduled to visit our lab and spend an hour interacting with our CALL software.

We had only two Alexa devices available; given this limitation, the protocol we decided to try was the following. The Alexa device was placed on a table, with the children grouped around it in a semi-circle. The first child was handed a token; the group was told that only the child with the token was allowed to speak, and after speaking was obliged to hand the token to their right-hand neighbour. We then launched several of the French and English versions of the games, and let the kids interact with them.

Somewhat to our surprise, this setup was very successful. The children followed the instructions without complaining, and gave every evidence of having a good time. There was a lot of smiling and laughing, and when someone got stuck they often received good-natured whispered help. Alexa’s recognition functioned well, and things progressed smoothly, with rapid passing of the token. Our impression was that our guests were disappointed when the hour was up and could happily have stayed longer.

3.5. Social aspects

Finally, some general remarks. First, when children used the device with other people (family, friends or fellow students) they seemed far more engaged in the gameplay. Conversely, when they were asked to play the games alone they were less motivated. This is consistent with the view that Alexa devices bring together the concept of “interactions with a purpose” and the concept of “social mediation” where two interactions happen simultaneously; one with the device itself and one with the other participants, the latter quite possibly being more important.

Background input can be an issue as the device’s far field microphone often captures input coming from a distance. Essentially, other people in the room who are not participating in the game need to be quiet. Furthermore, there is no clear turn taking mechanism, and participants cannot easily coordinate who should speak at each time. The token-passing workaround from §3.4 was a tentative remedy. Another possibility would be to use Amazon’s “Echo buttons”.

4. Summary and conclusions

We have described a preliminary user study carried out to investigate the Amazon Echo’s potential as a CALL platform for children. Although the limited scope of the study and the small number of participants mean that conclusions should not be considered as more than suggestive, it seems to us that the core problems we identified are inherent in the basic design of the current Echo and quite serious.

On the positive side, we were interested to see that children often seemed motivated and engaged when other participants interacted at the same time or when they were part of the development loop. If Amazon is able to address the issues we name above, we think Alexa has a great deal of potential as a CALL platform for children.

5. Bibliographical References

Baur, C., Rayner, M., and Tsourakis, N. (2013). A textbook-based serious game for practising spoken language. In Proceedings of ICERI 2013, Seville, Spain.

Baur, C. (2015). The Potential of Interactive Speech-Enabled CALL in the Swiss Education System: A Large-Scale Experiment on the Basis of English CALL-SLT. Ph.D. thesis, University of Geneva.

Rayner, M., Baur, C., Chua, C., Bouillon, P., and Tsourakis, N. (2015). Helping non-expert users develop online spoken CALL courses. In Proceedings of the Sixth SLaTE Workshop, Leipzig, Germany.

Rayner, M., Tsourakis, N., and Gerlach, J. (2017). Lightweight spoken utterance classification with CFG, tf-idf and dynamic programming. In International Conference on Statistical Language and Speech Processing, pages 143–154. Springer.
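
As an illustration only, the turn logic described in Section 2 (prompt/response rows with a group number, random selection without repetition within a session, and the two echo strategies) might be sketched as follows. All names here (`Prompt`, `Lesson`, `judge`, `normalise`) are hypothetical and are not taken from the authors' implementation. The comparison shown is the simple matching of the initial versions, not the robust method of Rayner et al. (2017). Two further points are assumptions: the "active group" is taken to be the lowest-numbered group that still has an unused prompt, and hypotheses are lower-cased before comparison. The example prompts are invented.

```python
import random
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Prompt:
    group: int            # first spreadsheet column: group number
    text: str             # second column: the prompt
    responses: list[str]  # third column: permitted responses


def normalise(utterance: str) -> str:
    """Lower-case and collapse whitespace before comparing (an assumption)."""
    return " ".join(utterance.lower().split())


@dataclass
class Lesson:
    prompts: list[Prompt]
    # Prompt texts already presented in this session.
    used: set[str] = field(default_factory=set)

    def next_prompt(self, rng: random.Random) -> Optional[Prompt]:
        """Pick a random unused prompt from the lowest-numbered group that has one."""
        unused = [p for p in self.prompts if p.text not in self.used]
        if not unused:
            return None  # every prompt has been presented once
        lowest = min(p.group for p in unused)
        choice = rng.choice([p for p in unused if p.group == lowest])
        self.used.add(choice.text)
        return choice


def judge(prompt: Prompt, hypothesis: str) -> tuple[bool, str]:
    """Compare the recognition hypothesis with the permitted responses.

    Returns (correct, echo). The echo is the matched spreadsheet phrase
    when correct, and "I heard ..." plus the hypothesis otherwise.
    """
    for response in prompt.responses:
        if normalise(response) == normalise(hypothesis):
            return True, response
    return False, f"I heard {hypothesis}"


# Invented example rows in the style of the "Number game".
lesson = Lesson([
    Prompt(1, "What is two plus three?", ["five", "5"]),
    Prompt(2, "What is nine minus four?", ["five", "5"]),
])
first = lesson.next_prompt(random.Random(0))
print(first.text)            # What is two plus three?
print(judge(first, "Five"))  # (True, 'five')
print(judge(first, "nine"))  # (False, 'I heard nine')
```

Under this reading, an incorrect judgement would lead the game to repeat the same prompt, and a correct one would lead to another call to `next_prompt`. Navigation commands such as "help" or "wait" would be intercepted before `judge` is called.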