<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Alexa as a CALL Platform for Children: Where do we Start?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Nikos Tsourakis</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Manny Rayner</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hanieh Habibi</string-name>
          <email>Hanieh.Habibig@unige.ch</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pierre-Emmanuel Gallais</string-name>
          <email>gallais2009@hotmail.fr</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Cathy Chua</string-name>
          <email>cathyc@pioneerbooks.com.au</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Matt Butterweck</string-name>
          <email>matthias@butterweck.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Geneva University</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Independent researcher</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2018</year>
      </pub-date>
      <fpage>24</fpage>
      <lpage>25</lpage>
      <abstract>
        <p>Amazon's Alexa is now widely available and shows interesting potential as a platform for hosting CALL games aimed at children. In this paper, we describe an initial informal experiment where we created some simple CALL games and made them available to a few child testers. We report the children's and parents' reactions. Our overall conclusion is that, although Alexa has many positive features, there are still fundamental platform issues in the current version that make it very difficult to build compelling CALL games for children. The games used will soon be freely available for download on the Alexa store.</p>
      </abstract>
      <kwd-group>
        <kwd>CALL</kwd>
        <kwd>children</kwd>
        <kwd>Alexa</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
      <p>
        In the four years since its release, Amazon’s Alexa has
become a major platform for developing and deploying
spoken language applications; according to Amazon, over
one hundred million Alexa-enabled devices have now been
sold. Amazon’s advertising highlights the attractiveness of
the platform to children, and one only needs to spend ten
minutes watching a couple of kids playing with Alexa to see
that this is not all hype. The device clearly has good
acoustic models for children’s speech; the far-field recognition
and hands-free operation work well, allowing children to
do other things while talking to the device; and the default
“always-on” mode eliminates start-up time. Further
attractive properties include a powerful and well-maintained API
for developing and fielding applications (“skills”), and
excellent scalability. We have for some time been
developing a speech-enabled CALL platform
        <xref ref-type="bibr" rid="ref3">(Rayner et al., 2015)</xref>
        which among other things has been used to build CALL
games for children
<xref ref-type="bibr" rid="ref1 ref2">(Baur et al., 2013; Baur, 2015)</xref>
        , and were
curious to find out what we could do if we ported some of
this functionality to Alexa.
      </p>
      <p>Other differences when compared with conventional
platforms also seemed potentially positive. The smartphone
revolution in particular has paved the way for an
interaction paradigm which proves to be a minefield of
distractions, dominated by social media applications whose
primary goal is to occupy as much of the user’s time as
possible. For example, according to a dscout Mobile Touches
study (https://blog.dscout.com/mobile-touches), smartphone users on average touch, swipe or tap
their phone over 2,500 times a day. The situation is no
better on a desktop PC, where similar distractions are
available. The interaction paradigm inherent in the Echo, in
contrast, emphasises quick and purposeful interactions; this
makes it an attractive candidate platform for child-oriented
CALL applications, given that children are notoriously
prone to distraction. Turn-taking between the device and the
user is normally restricted to the context of the ‘skill’,
without being affected by other platform events. Finally, it
offers a communal experience where multiple members of the
same family or friends can interact with the device.
Children do not look at an individual screen and, other things
being equal, will find it easier to collaborate with others
than they would if they were using a smartphone or tablet.
Here, we report an initial experiment where we created a
few CALL games and gave them to some children we were
in contact with to see what happened. We recruited six
children — coincidentally, all boys — aged between four
and ten years old and belonging to four separate families,
and gave an Echo Dot device to each family. A set of
instructions was provided describing how to use the
applications. Our unashamedly anecdotal analysis (you have to
start somewhere) is based on observations made while the
subjects interacted with the system and on informal discussions
with the children and their parents. For reasons of space,
we will focus on the three most active users, referred to here
as HK, JK and VT.</p>
      <p>The rest of the paper is structured as follows. In the next
section, we describe the CALL games used. Section 3 presents
the results. The final section concludes.</p>
    </sec>
    <sec id="sec-2">
      <title>Alexa games used</title>
      <p>We began by creating five simple games. Each game was
constructed in three versions, for English, German and
French. The games, listed in Table 1, will be available for
free download from the Alexa store by May 15 2019. The
structure of each game is the same; the basic strategy is
prompt/response, where the prompt is either a recorded
audio file (the games “Which movie?”, “Which language?”
and “Which animal?”), or a piece of text spoken using
Alexa’s TTS functionality (the games “Number game” and
“Letter game”). Each game was first developed for English,
then ported to German and French by native speakers.
In some of the games, prompts are divided into “lessons”.
In the arithmetic game, there are four lessons, for
addition, subtraction, multiplication and division. In the animal
noises and language ID games, there are two lessons called
“easy” and “difficult”. A lesson can optionally be
further divided into numbered “groups”, with the convention
that prompts from the low-numbered groups are presented
before prompts from the high-numbered groups. Games
are defined using a simple spreadsheet format. The first
few lines of the spreadsheet contain metadata (invocation
phrase, L1, L2, etc.); the body consists of lines defining
prompt/response pairs, where the first column gives the
group number, the second the prompt, and the third the
permitted responses. Figure 1 gives an example of a game
spreadsheet.
At each turn, the game speaks a prompt randomly chosen
from the currently active group and lesson, with the
proviso that prompts do not repeat in the same session. The
player can respond immediately, in which case the game
either confirms and moves on to the next turn if it judges the
response correct, or else repeats the prompt if it judges it
incorrect. In both cases, the game echoes back the player’s
response, using different strategies for correct and incorrect
responses. If the response is judged incorrect, the echoed
content is “I heard...” followed by the speech hypothesis
from the Alexa recogniser. If the response is judged correct,
the echoed content is the matched phrase from the
spreadsheet. Instead of responding directly to the prompt, the
player may also use one of the following navigation
commands:</p>
      <list list-type="simple">
        <list-item><p>Help: Give a choice of three possible answers the first time; say the correct answer the second time.</p></list-item>
        <list-item><p>Repeat: Repeat the last thing the app said.</p></list-item>
        <list-item><p>Wait: Give the player more time to think.</p></list-item>
        <list-item><p>Next: Skip to the next prompt.</p></list-item>
        <list-item><p>Back: Return to the previous prompt.</p></list-item>
        <list-item><p>Next lesson: Skip to the next lesson.</p></list-item>
        <list-item><p>Lesson X (or simply “X”): Skip to the lesson called “X”; e.g. “Lesson: addition” or “Addition” goes to the lesson labelled “addition” in the spreadsheet.</p></list-item>
        <list-item><p>Lessons: List the names of the available lessons.</p></list-item>
      </list>
      <p>In initial versions of the games, the decision as to whether a
response was correct or incorrect was made by performing
a simple comparison between the recognition hypothesis
and the set of correct answers defined by the spreadsheet.
Feedback from the children suggested that many of them
were frustrated by the apps’ brittleness; in later versions,
we used the robust matching method from
<xref ref-type="bibr" rid="ref4">(Rayner et al., 2017)</xref>, which anecdotally gives much more user-friendly
performance.</p>
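The spreadsheet-driven game logic described above can be sketched roughly as follows. The column layout (group number, prompt, permitted responses) and the group-ordering and no-repeat conventions follow the paper; the "/" separator for alternative responses, the function names, and the fuzzy-matching stand-in are our own assumptions, not the authors' implementation.

```python
import csv
import difflib
import random

def parse_game(rows):
    """Parse the body lines of a game spreadsheet (CSV) into prompt entries."""
    prompts = []
    for row in csv.reader(rows):
        if len(row) >= 3 and row[0].strip().isdigit():
            # Alternative permitted responses separated by "/" (an assumed convention).
            answers = [a.strip().lower() for a in row[2].split("/")]
            prompts.append((int(row[0]), row[1].strip(), answers))
    # Prompts from low-numbered groups are presented before high-numbered ones.
    prompts.sort(key=lambda p: p[0])
    return prompts

def next_prompt(prompts, used):
    """Randomly choose an unused prompt from the lowest unfinished group;
    prompts do not repeat in the same session."""
    remaining = [p for p in prompts if p[1] not in used]
    if not remaining:
        return None
    lowest = min(group for group, _, _ in remaining)
    choice = random.choice([p for p in remaining if p[0] == lowest])
    used.add(choice[1])
    return choice

def judge_exact(answers, hypothesis):
    """Simple comparison between hypothesis and permitted answers,
    as in the initial game versions."""
    return hypothesis.strip().lower() in answers

def judge_robust(answers, hypothesis, threshold=0.8):
    """Crude fuzzy stand-in for the robust matcher of Rayner et al. (2017);
    the real method uses CFG grammars, tf-idf and dynamic programming."""
    h = hypothesis.strip().lower()
    return max(difflib.SequenceMatcher(None, h, a).ratio() for a in answers) >= threshold

# Toy arithmetic-game fragment in the assumed format.
game = parse_game([
    '1,zwei plus zwei,vier',
    '1,drei plus eins,vier',
    '2,sieben minus zwei,fünf / fuenf',
])
used = set()
group, prompt, answers = next_prompt(game, used)
```

The fuzzy variant illustrates why robust matching felt friendlier to the children: a near-miss recognition hypothesis can still count as correct instead of triggering a repeat of the prompt.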
    </sec>
    <sec id="sec-3">
      <title>Personalised courses: “Pokémon”</title>
      <p>The courses described above are all basically generic ones,
though to some extent they were designed with the
children’s interests in mind. (One of the children, living in
highly multilingual Geneva, is very interested in languages
and did indeed like the language ID game). During the
course of the experiment, it occurred to us that it would be
possible to go a step further and design courses that were
explicitly personalised to a single student. We explored two
versions of this idea.</p>
      <p>In the first experiment, we invited one of the subjects, a
bilingual English/French seven-year-old boy we will call
HK, to design his own game. HK is a passionate devotee of
Pokémon, and this seemed like a natural subject. The
protocol for constructing the game was devised by his mother,
AK, who acted as coordinator and secretary.</p>
      <p>In AK’s scenario, HK and his friend DM, a francophone
boy of about the same age, sat facing each other, with one
boy holding a deck of Pokémon cards oriented so that only
he could see them; the two boys alternated roles. At each
turn, the boy with the cards picked a card and made up a
French question based on the card’s text. The other boy
tried to guess the Pokémon. After some discussion, they
agreed on a question which AK wrote down in the
spreadsheet. At the end of the session, AK mailed the spreadsheet
to us, and we compiled and deployed the game.
A couple of days later, HK and DM tried out their game.
The initial reaction was very positive (they were amazed
that their content had been turned into this new form), but
they rapidly lost interest. There were several problems: in
addition to the generic usability issues discussed in the next
section, the game was, unsurprisingly, not very well
designed, with questions that were both overlong and often
too difficult even for Pokémon experts. There was also not
enough content — HK and DM only managed to generate a
dozen questions before getting bored — and a couple of the
Pokémon names hardly ever got recognised. AK
encouraged the children to try to identify the problems
themselves and redesign the game. They produced a second
version, with somewhat shorter prompts, one of the
hard-to-recognise questions removed, and a little more content;
but enough of the problems remained that they soon lost
interest again, and could not be persuaded to produce a third
version.
</p>
    </sec>
    <sec id="sec-4">
      <title>Personalised courses: “V’s homework”</title>
      <p>In the second personalised course the target child, VT, was
not part of the development loop. The basic motivation
stemmed from the actual need of the child to practise small
dialogues at home as part of his homework. Specifically,
in French-speaking Switzerland children start learning
German at school at the age of eight. During the first few
months of the course, they are asked to develop their
generation and comprehension skills by participating in small
dialogues where they alternate the two roles. Normally,
one of the parents plays the role of the conversation
partner. It occurred to us that it would be easy to adapt the
Alexa framework described above and create a course that
included a set of these small dialogues. The basic
interaction pattern for a turn is as follows:</p>
      <list list-type="order">
        <list-item><p>Party 1: &lt;poses a question in German&gt;</p></list-item>
        <list-item><p>Party 1: &lt;gives a hint answer in French&gt;</p></list-item>
        <list-item><p>Party 2: &lt;responds in German&gt;</p></list-item>
      </list>
      <p>We used a slightly modified version of the spreadsheet
format described above to define the course, making each
dialogue into a “lesson”. Prompts were realised as before
using Alexa’s TTS voice, with the L1 “hint” part marked up
to be spoken more quietly. An example dialogue is shown
in Table 2.</p>
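The three-step turn above can be rendered as a single TTS prompt with the L1 hint spoken more quietly. A minimal sketch, assuming the standard Alexa SSML `prosody` and `break` elements; the helper function and the pause length are our own inventions, not the authors' code.

```python
# Combine an L2 question and a quieter L1 hint into one SSML prompt.
# <prosody volume="soft"> and <break> are standard Alexa SSML elements;
# everything else here is an illustrative assumption.

def dialogue_prompt(l2_question, l1_hint):
    """Render one dialogue turn: German question, short pause, soft French hint."""
    return ('<speak>{q} <break time="500ms"/> '
            '<prosody volume="soft">{h}</prosody></speak>'
            .format(q=l2_question, h=l1_hint))

ssml = dialogue_prompt("Wie heißt du?", "Je m'appelle V")
```

The player then answers in the L2 ("Ich heiße V"), and the response is judged against the permitted answers in the spreadsheet as in the other games.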
    </sec>
    <sec id="sec-5">
      <title>Experimenting with Alexa</title>
    </sec>
    <sec id="sec-6">
      <title>Feedback from AK, HK and JK</title>
      <p>The most diligent users in the study were definitely AK and
her two children, HK and JK (7- and 11-year-old boys). The
family also encouraged several of the children’s friends to
try out the Alexa games when visiting.</p>
      <p>We interviewed AK, HK and JK to get their impressions,
and watched the two boys using the games. It was
immediately obvious that they had mastered the technical problems
of interacting with the games, and had played them enough
that they knew the content quite well. Unfortunately, our
impression was that they had not in fact used the games as
CALL tools, but only as entertainment. As already noted,
the boys, members of an English family who have grown up
in French-speaking Geneva, are bilingual English/French.
They had almost exclusively used the English and French
versions of the games and hardly tried the German ones at
all, despite the fact that JK had done a year of German at
school and might well have benefited from using the
German versions.</p>
      <p>We are not sure we know why HK and JK were reluctant
to use the games for an educational purpose, but it certainly
seemed possible that this is related to a current misfeature
of Alexa: the device language can only be changed from the
web control panel. It cannot be changed through a voice
command. Since accessing the control panel requires the
Amazon password, which AK was unwilling to give to her
children, they could not activate the German versions of
the games without asking AK for assistance; they would
then have to ask for help a second time at the end of the
session to switch back to French, which was the default
interaction language. In short, the kids had no autonomy.
They complained explicitly about this.
</p>
    </sec>
    <sec id="sec-7">
      <title>Feedback from VT</title>
      <p>In contrast to HK and JK, VT used his Alexa device mostly
for educational purposes, working through the “V’s homework” course
from Section 2.2. Having previously been exposed to the content, it
was straightforward for VT to complete the task, although
it was obvious that performing the interaction with one of
the parents was more engaging. VT also tried out all the
games for his L1 (again French). He seemed genuinely
interested in experimenting with this new gadget and
continued to play until specifically told to stop. After the session,
however, he did not ask to play with the device again.
</p>
    </sec>
    <sec id="sec-8">
      <title>Common feedback</title>
      <p>Three generic problems were apparent, and the subject of
repeated complaints from all subjects. First, Alexa is
currently unable to handle barge-in. Since children tended to
interrupt the spoken output of the device anyway, we
introduced a distinctive sound that signifies the start of each turn.
The children had no trouble understanding the purpose of
the earcon and interaction worked much better once it was
introduced, but they did not like being forced to wait until
the game had finished speaking before they could respond,
and said that the games were “too slow”. Shortening the
prompts as much as possible did not correct the problem.
In the opposite direction, Alexa also drops out of the game
and returns to top level if the user stops speaking for more
than a few seconds. Here, the best fix we could come up
with was to introduce a “wait” command (essentially an
extra turn), which again improved the situation. Nonetheless,
the bottom line was that when the children knew the
answer, they were not allowed to give it at once, and when
they didn’t know it, they were not allowed to pause freely,
but had to remember to say “wait”. In addition, although
Alexa’s speech recognition is very good by current
standards, it was not perceived as being good enough;
misrecognitions added to the general feeling of frustration.
Despite this, all parents, in particular AK, stressed that they
saw a great deal of positive potential in Alexa, and hoped
that later versions of games like the ones we gave them
would be able to realise that potential. It seems, however,
that the current platform has too many negative aspects for
CALL games like ours to work well with children in an
unsupervised home setting.</p>
      <table-wrap id="tab2">
        <label>Table 2</label>
        <caption>
          <p>Example dialogue from the “V’s homework” course, with English glosses.</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>Interaction</th>
              <th>English gloss</th>
            </tr>
          </thead>
          <tbody>
            <tr><td>Guten Morgen</td><td>Good morning</td></tr>
            <tr><td>Bonjour (le matin)</td><td>Good morning</td></tr>
            <tr><td>Guten Morgen</td><td>Good morning</td></tr>
            <tr><td>Wie heißt du?</td><td>What is your name?</td></tr>
            <tr><td>Je m’appelle V</td><td>My name is V</td></tr>
            <tr><td>Ich heiße V</td><td>My name is V</td></tr>
            <tr><td>Was möchtest du kaufen?</td><td>What would you like to buy?</td></tr>
            <tr><td>J’aimerais du fromage et de la limonade, s’il vous plaît</td><td>I would like some cheese and some lemonade please</td></tr>
            <tr><td>Ich möchte Käse und Limonade bitte</td><td>I would like some cheese and some lemonade please</td></tr>
            <tr><td>Bitte</td><td>There you are</td></tr>
            <tr><td>Merci</td><td>Thank you</td></tr>
            <tr><td>Danke</td><td>Thank you</td></tr>
          </tbody>
        </table>
      </table-wrap>
    </sec>
    <sec id="sec-9">
      <title>3.4. Feedback from group session on Open Day</title>
      <p>We carried out a short but interesting experiment in early
November, 2018 in connection with “Futur en tous genres”,
a yearly Open Day organised for children of University of
Geneva employees. A group of a dozen children, aged
between ten and twelve, were scheduled to visit our lab and
spend an hour interacting with our CALL software.
We had only two Alexa devices available; given this
limitation, the protocol we decided to try was the following.
The Alexa device was placed on a table, with the
children grouped around it in a semi-circle. The first child
was handed a token; the group was told that only the child
with the token was allowed to speak, and after speaking was
obliged to hand the token to their right-hand neighbour. We
then launched several of the French and English versions of
the games, and let the kids interact with them.</p>
      <p>Somewhat to our surprise, this setup was very
successful. The children followed the instructions without
complaining, and gave every evidence of having a good time.
There was a lot of smiling and laughing, and when someone
got stuck they often received good-natured whispered help.
Alexa’s recognition functioned well, and things progressed
smoothly, with rapid passing of the token. Our impression
was that our guests were disappointed when the hour was
up and could happily have stayed longer.</p>
    </sec>
    <sec id="sec-10">
      <title>3.5. Social aspects</title>
      <p>Finally, some general remarks. First, when children used
the device with other people — family, friends or fellow
students — they seemed far more engaged in the gameplay.
Conversely, when they were asked to play the games alone
they were less motivated. This is consistent with the view
that Alexa devices bring together the concept of
“interactions with a purpose” and the concept of “social
mediation” where two interactions happen simultaneously; one
with the device itself and one with the other participants,
the latter quite possibly being more important.</p>
      <p>Background input can be an issue as the device’s far field
microphone often captures input coming from a distance.
Essentially, other people in the room who are not
participating in the game need to be quiet. Furthermore, there
is no clear turn-taking mechanism, and participants
cannot easily coordinate who should speak at any given time. The
token-passing workaround from Section 3.4. was a tentative
remedy. Another possibility would be to use Amazon’s “Echo
buttons”.</p>
    </sec>
    <sec id="sec-11">
      <title>Conclusion</title>
      <p>We have described a preliminary user study carried out to
investigate the Amazon Echo’s potential as a CALL
platform for children. Although the limited scope of the study
and the small number of participants mean that the conclusions
should be regarded as no more than suggestive, it seems
to us that the core problems we identified are inherent in the
basic design of the current Echo and quite serious.
On the positive side, we were interested to see that children
often seemed motivated and engaged when other
participants interacted at the same time or when they were part
of the development loop. If Amazon is able to address the
issues we have described, we think Alexa has a great deal of
potential as a CALL platform for children.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Baur</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rayner</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Tsourakis</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          (
          <year>2013</year>
          ).
          <article-title>A textbook-based serious game for practising spoken language</article-title>
          .
          <source>In Proceedings of ICERI</source>
          <year>2013</year>
          , Seville, Spain.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Baur</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>The Potential of Interactive Speech-Enabled CALL in the Swiss Education System: A Large-Scale Experiment on the Basis of English CALL-SLT</article-title>
          .
          <source>Ph.D. thesis</source>
          , University of Geneva.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Rayner</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Baur</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chua</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bouillon</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Tsourakis</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>Helping non-expert users develop online spoken CALL courses</article-title>
          .
          <source>In Proceedings of the Sixth SLaTE Workshop</source>
          , Leipzig, Germany.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Rayner</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tsourakis</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Gerlach</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>2017</year>
          ).
          <article-title>Lightweight spoken utterance classification with CFG, tf-idf and dynamic programming</article-title>
          .
          <source>In International Conference on Statistical Language and Speech Processing</source>
          , pages
          <fpage>143</fpage>
          -
          <lpage>154</lpage>
          . Springer.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>