<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Crowdsourcing for Research on Automatic Speech Recognition-enabled CALL</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Catia Cucchiarini</string-name>
          <email>C.Cucchiarini@let.ru.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Helmer Strik</string-name>
          <email>W.Strik@let.ru.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Radboud University</institution>
          ,
          <addr-line>Nijmegen</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2018</year>
      </pub-date>
      <fpage>24</fpage>
      <lpage>25</lpage>
      <abstract>
        <p>Despite long-standing interest and recent innovative developments in ASR-based pronunciation instruction and CALL, there is still scepticism about the added value of ASR technology. In this paper we first review recent trends in pronunciation research and important requirements for pronunciation instruction. We go on to consider the difficulties involved in developing ASR-based systems for pronunciation instruction and the possible causes for the paucity of effectiveness studies in ASR-based CALL. We suggest that crowdsourcing could offer solutions for analyzing the large amounts of L2 speech that can be collected through ASR-based CALL applications and that are necessary for effectiveness studies. We provide a brief overview of our own research on ASR-based CALL and of the lessons we learned. Finally, we discuss possible future avenues for research and development.</p>
      </abstract>
      <kwd-group>
        <kwd>Computer Assisted Language Learning</kwd>
        <kwd>Automatic Speech Recognition</kwd>
        <kwd>Pronunciation Instruction</kwd>
        <kwd>Crowdsourcing</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Speaking skills have always been considered particularly
challenging in language teaching, because of the time and
individual attention they require for practice and feedback.
This has been one of the reasons for the sustained interest
in using Automatic Speech Recognition (ASR) technology
in CALL applications. ASR technology has been around
for more than 30 years and its potential for CALL has been
emphasized from the beginning, but ASR-based CALL
systems have not really found their way into language
teaching contexts. This might have to do with a variety of
factors. The relatively high costs involved in the
development of new applications or in the acquisition of
some commercial products might have been a hurdle to
large-scale adoption, while for some freely available
products privacy issues might have played a role.
However, there is also another possible explanation for the
general reluctance to embrace ASR technology in CALL.
As a matter of fact, there are relatively few studies that have
thoroughly investigated the effectiveness of ASR-based
CALL in real-life environments, under realistic conditions
with real users. This also applies to pronunciation
instruction and training, which is the topic that has received
most attention in ASR-based research and development,
because of its potential for both language learning and
speech therapy applications.</p>
      <p>
        In the remainder of this paper we discuss the difficulties
involved in developing ASR-based systems for
pronunciation instruction, possible causes for the paucity of
effectiveness studies and then consider possible solutions.
In Section 2 we first discuss recent trends in pronunciation
research and requirements for pronunciation instruction.
We then consider important requirements for ASR-based
CALL research in Section 3. Sections 4 and 5 provide a
brief overview of our own research on ASR-based CALL
and crowdsourcing, respectively. Discussion and
conclusions are presented in Sections 6 and 7.
      </p>
      <p>
        In pronunciation research there are different views on what
the aim of pronunciation instruction should be. According
to the “nativeness principle”
        <xref ref-type="bibr" rid="ref15">(Levis, 2005: 370)</xref>
        ,
pronunciation instruction should help L2 learners lose any
traces of their L1 accent in order to achieve a nativelike
accent.
      </p>
      <p>
        The “intelligibility principle”, on the other hand, holds the
view that pronunciation instruction should help L2 learners
achieve intelligibility in the L2, which should be possible
even if traces of an L1 accent remain. In line with this
distinction, different constructs have been introduced in
pronunciation research
        <xref ref-type="bibr" rid="ref18 ref19">(Munro &amp; Derwing, 1995a)</xref>
        . Accent
has been taken to refer to subjective judgments of the extent
to which L2 speech is close to native speech and is usually
expressed by scalar ratings. Intelligibility has been defined
as the extent to which L2 speech can be correctly
reproduced in terms of orthographic transcription
        <xref ref-type="bibr" rid="ref18 ref19">(Munro
&amp; Derwing, 1995a)</xref>
        . A third construct, comprehensibility,
has been introduced to indicate the ease with which
listeners understand L2 speech, again expressed through
scalar ratings
        <xref ref-type="bibr" rid="ref18 ref19">(Munro &amp; Derwing, 1995a)</xref>
        . Research has
shown that communication can be successful even in the
presence of a non-native accent
        <xref ref-type="bibr" rid="ref18 ref19">(Munro &amp; Derwing,
1995b)</xref>
        . This, combined with the knowledge that achieving
a nativelike accent is beyond reach for most language
learners, has led pronunciation researchers to advocate a
focus on intelligibility in pronunciation instruction as
opposed to nativeness
        <xref ref-type="bibr" rid="ref15 ref20 ref31 ref9">(Levis, 2005, 2007; Derwing &amp;
Munro, 2015)</xref>
        .
      </p>
    </sec>
    <sec id="sec-2">
      <title>Requirements for ASR-based pronunciation research</title>
      <p>
        In line with these distinctions, pronunciation researchers
are interested in research that investigates to what extent
ASR-based pronunciation instruction contributes to
improving constructs such as accent, intelligibility or
comprehensibility of L2 learners. However, convincing
evidence is lacking
        <xref ref-type="bibr" rid="ref20 ref31 ref9">(Thomson &amp; Derwing, 2015)</xref>
        . Most of
the research on ASR-based pronunciation training has been
conducted offline on annotated speech corpora
        <xref ref-type="bibr" rid="ref1 ref8">(Cucchiarini &amp; Strik, 2017)</xref>
        . In general, such studies
evaluate the accuracy of specific algorithms
        <xref ref-type="bibr" rid="ref14 ref25 ref27">(Stanley,
Hacioglu, &amp; Pellom, 2011; Qian, Meng, Soong, 2012; Lee,
Zhang, &amp; Glass, 2013)</xref>
        in identifying pronunciation errors
or in grading L2 speech. To investigate the effectiveness of
ASR-based CALL, complete systems are needed, in which
these algorithms are incorporated to provide speaking
practice and feedback on the utterances produced by L2
learners under realistic conditions. In addition, a certain
amount of learning content is needed so that learners can
practice for a sufficient amount of time. It is this kind of
longitudinal research that is needed to increase our
understanding of the contribution of ASR-based CALL to
pronunciation teaching and language learning in general.
Unfortunately, there are few complete systems that
employ ASR and that could be used in open, online
effectiveness research under real-life conditions. This has to do
with a series of difficulties
        <xref ref-type="bibr" rid="ref1 ref8">(Cucchiarini &amp; Strik, 2017)</xref>
        .
First of all, there is the limited availability of large corpora that can
be used to develop, test and optimize the specific speech
technology that is required for learning applications.
Another difficulty is related to the nature of the expertise
required, which is highly varied and interdisciplinary as it
covers engineering, system design, pedagogy and language
learning. This can also pose problems in finding the
necessary funds for this type of cross-disciplinary research.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Our own research on ASR-based CALL</title>
      <p>
        In our own research over the last twenty years we have
pursued the goal of developing complete ASR-based
CALL systems. This research has been conducted in close
cooperation with speech technologists, language learning
researchers and teachers. The aim was to develop systems
that could be used to conduct more comprehensive research
contributing insights to both speech technology and
language learning research
        <xref ref-type="bibr" rid="ref10 ref28 ref28 ref29 ref29 ref33 ref34 ref5 ref6 ref7">(Cucchiarini et al. 2009, 2011,
2014; Strik, 2012; Strik et al. 2012; Van Doremalen et al.,
2010, 2013; 2016)</xref>
        . An important aspect in this research
was also how to boost user motivation either by providing
appealing, useful feedback
        <xref ref-type="bibr" rid="ref1 ref2 ref22 ref23 ref24 ref7">(Bodnar et al., 2016, 2017;
Cucchiarini et al., 2009; Penning de Vries et al., 2015,
2016, 2019)</xref>
        or by introducing gaming elements, see e.g.
Figure 1
        <xref ref-type="bibr" rid="ref12">(Ganzeboom et al. 2016)</xref>
        .
The more recent systems have been equipped with logging
capabilities
        <xref ref-type="bibr" rid="ref1 ref23">(Bodnar et al., 2017; Penning de Vries et al.,
2016)</xref>
        , so that they can collect huge amounts of speech data
produced by L2 learners practicing with the system, while
at the same time recording all system-user interactions.
These logged data can provide useful knowledge on
learners’ progress, increasing our insights not only into the
ultimate outcome of learning, but also into the processes
that are conducive to learning.
      </p>
      <p>One of the problems we have encountered in this research
is, however, how to process and analyze the large sets of
speech data that are produced by language learners or
patients during practice or therapy and that need to be
scored to study the effectiveness of
ASR-based applications. To be able to provide information on
learning and effectiveness, these data need first of all to be
transcribed and/or scored, to obtain the subjective
judgments necessary to measure the constructs mentioned
above (accent, intelligibility, comprehensibility). This is
extremely time-consuming and expensive. In fact, the
amount of data is such that manual annotation is simply
not feasible. A possible alternative solution for obtaining
annotations and scoring of vast amounts of speech data at
relatively low costs would then seem to be to employ
crowdsourcing, as will be explained in the next section.
</p>
    </sec>
    <sec id="sec-4">
      <title>Crowdsourcing for ASR-based CALL</title>
      <p>
        In ASR-based CALL pronunciation research
crowdsourcing could play a more prominent role by
providing transcriptions or intelligibility scores, which can
in turn be used for effectiveness evaluation. In our own
research, for example, we have used crowdsourcing to
obtain evaluations of intelligibility of L2 learner speech
        <xref ref-type="bibr" rid="ref26 ref3">(Burgos et al., 2015; Sanders et al., 2016)</xref>
        and pathological
speech
        <xref ref-type="bibr" rid="ref12">(Ganzeboom et al., 2016)</xref>
        .
      </p>
      <p>
        For the study described in
        <xref ref-type="bibr" rid="ref12">Ganzeboom et al. (2016)</xref>
        an
online listening experiment was carried out. Participants
were invited by email or via Facebook and filled in a
questionnaire providing meta-information such as native
language, gender and age. In total, 36 listeners
participated, 8 male and 28 female (age range 19-73), who
rated 50 utterances on intelligibility in three ways:
a 7-point Likert scale (1 = very low to 7 = very high), a
Visual Analogue Scale (VAS; 0 = very low to 100 = very
high), and an orthographic transcription (Orthog. Transc.).
The latter was used to calculate three extra scores: OTW
(Orthog. Transc. scored at word level), OTP (scored at
phoneme level) and OTG (scored at grapheme level).
VAS and Likert are intelligibility scores at utterance level
and were expressed as percentages: the VAS scores were
already on a 0-100 scale, while the scores on the 1-7 Likert
scale were transformed to percentage scores by first
subtracting 1 and then multiplying by 16.67 (i.e. 1 = 0%,
2 = 16.67%, ..., 7 = 100%).
      </p>
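      <p>The Likert rescaling described above is a simple linear mapping, sketched below for illustration (the function name is ours, not part of the study):</p>

```python
def likert_to_percent(score):
    """Map a 1-7 Likert rating onto a 0-100 percentage scale by
    subtracting 1 and multiplying by 100/6 (i.e. 16.67)."""
    if not 1 <= score <= 7:
        raise ValueError("Likert score must be in the range 1-7")
    return (score - 1) * 100.0 / 6.0
```

      <p>A rating of 1 thus maps to 0%, 4 to 50%, and 7 to 100%.</p>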
      <p>
        To obtain an intelligibility score at word level (OTW), we
compared the raters’ orthographic transcriptions to the
reference transcriptions, counted the number of
identical word matches and calculated a percentage
correct score.
Intelligibility scores at the grapheme and phoneme level
(OTG and OTP, resp.) were automatically obtained from
the orthographic transcriptions through the Algorithm for
Dynamic Alignment of Phonetic Transcriptions (ADAPT)
        <xref ref-type="bibr" rid="ref10">(Elffers et al., 2013)</xref>
        , which computes the optimal alignment
between two strings of phonetic symbols using a matrix
that contains distances between the individual phonetic
symbols. For the intelligibility scores on phoneme level
(OTP), the orthographic transcriptions were converted to
their phonemic equivalent using the canonical
pronunciation variants from the lexicon of the Spoken
Dutch Corpus
        <xref ref-type="bibr" rid="ref21">(Oostdijk, 2000)</xref>
        . Some results are presented
in Table 1. For more details see
        <xref ref-type="bibr" rid="ref12">Ganzeboom et al. (2016)</xref>
        .
      </p>
      <table-wrap id="tab1">
        <label>Table 1</label>
        <caption>
          <p>Mean intelligibility scores (SD) and correlations between the five measures (n = 50).</p>
        </caption>
        <table>
          <thead>
            <tr><th/><th>Likert</th><th>VAS</th><th>OTW</th><th>OTP</th><th>OTG</th></tr>
          </thead>
          <tbody>
            <tr><td>M (SD)</td><td>63.1 (21.1)</td><td>63.2 (19.0)</td><td>78.3 (16.1)</td><td>8.0 (6.5)</td><td>8.9 (7.4)</td></tr>
            <tr><td>VAS</td><td>.998</td><td/><td/><td/><td/></tr>
            <tr><td>OTW</td><td>.733</td><td>.732</td><td/><td/><td/></tr>
            <tr><td>OTP</td><td>-.763</td><td>-.755</td><td>-.805</td><td/><td/></tr>
            <tr><td>OTG</td><td>-.773</td><td>-.764</td><td>-.869</td><td>.954</td><td/></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>For Likert, VAS and OTW, higher scores correspond to
higher intelligibility (higher percentage correct); for OTP
and OTG lower scores correspond to lower distance and
thus higher intelligibility. All correlations were significant
(p &lt; .01).</p>
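      <p>The alignment underlying the OTP and OTG scores can be illustrated with a standard dynamic-programming sketch. ADAPT itself uses a matrix of phonetically motivated distances between symbols; unit costs stand in here as a simplification:</p>

```python
def alignment_distance(ref, hyp,
                       sub_cost=lambda a, b: 0.0 if a == b else 1.0,
                       indel_cost=1.0):
    """Optimal alignment cost between two symbol strings via dynamic
    programming. ADAPT (Elffers et al., 2013) uses phonetically motivated
    substitution costs; a unit cost is used here for illustration."""
    n, m = len(ref), len(hyp)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel_cost          # deletions only
    for j in range(1, m + 1):
        d[0][j] = j * indel_cost          # insertions only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i - 1][j] + indel_cost,               # deletion
                          d[i][j - 1] + indel_cost,               # insertion
                          d[i - 1][j - 1] + sub_cost(ref[i - 1],  # (mis)match
                                                     hyp[j - 1]))
    return d[n][m]
```

      <p>Lower distances then correspond to transcriptions closer to the reference, i.e. to higher intelligibility, in line with the interpretation of OTP and OTG above.</p>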
      <p>
Reliability is important for research data in general, and
especially for data obtained by means of crowdsourcing.
In our study the reliability of each of the five
intelligibility measures was calculated using Intraclass
Correlation Coefficients (ICC) based on groups of raters.
The ICC values for all 36 raters together were very high,
ranging from .95 (OTP, OTG) to .97 (Likert, VAS, OTW).
As such a large number of raters may not always be
achievable, we also calculated average ICCs based on
randomly selected smaller subsets of the data (e.g. 9
subsets of 4 raters, or 6 of 6 raters). On average, for the
utterance and word level scorings sufficient reliability is
obtained with four raters (resulting in mean ICC values
ranging from .79 to .84), while for subword scorings at least
six raters are required (resulting in mean ICC values
ranging from .79 to .80).
For the L2 speech crowdsourcing experiment Palabras (see
Figure 2), a web application, accessible via Facebook, was
developed for obtaining transcriptions of Dutch words
spoken by Spanish L2 learners. Participants
listened and wrote down what they heard. Different
types of feedback were provided, such as percentage correct,
words still to transcribe and the majority transcription
        <xref ref-type="bibr" rid="ref26">(Sanders et al. 2016)</xref>
        .
      </p>
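      <p>To illustrate the kind of computation involved, a consistency ICC for the average of k raters can be derived from a two-way ANOVA decomposition of the rating matrix (a textbook sketch; the exact ICC variant used in the study is not specified here):</p>

```python
def icc_consistency_k(scores):
    """ICC(C,k): consistency of the average of k raters, from a two-way
    ANOVA decomposition. `scores` is a list with one row per utterance,
    each row holding the k raters' scores for that utterance."""
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_rows = k * sum((r - grand) ** 2 for r in row_means)  # between utterances
    ss_cols = n * sum((c - grand) ** 2 for c in col_means)  # between raters
    ms_rows = ss_rows / (n - 1)
    ms_error = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / ms_rows
```

      <p>Raters who are perfectly consistent up to a constant offset yield an ICC of 1; averaging such values over randomly drawn rater subsets gives subset reliabilities of the kind reported above.</p>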
      <p>In this case too, the quality of the data was checked by
applying filters to remove transcribers who did not conform
to our quality criteria: those with a native language other
than Dutch, those who did not reach our threshold of intra-
and inter-transcriber agreement, and those who submitted
entries more than once when the server was slow to respond.
In total, useful data were obtained from 159 participants,
definitely more than would have been the case with
traditional experiments.</p>
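      <p>The filtering step can be thought of as a predicate over per-transcriber metadata. In the sketch below, the field names and threshold values are illustrative placeholders, not those of the study:</p>

```python
def keep_transcriber(t, min_intra=0.8, min_inter=0.7):
    """Return True if a transcriber passes all quality filters:
    native language, intra-/inter-transcriber agreement thresholds,
    and no duplicate submissions (thresholds are placeholders)."""
    return (t["native_language"] == "Dutch"
            and t["intra_agreement"] >= min_intra
            and t["inter_agreement"] >= min_inter
            and not t["duplicate_submission"])
```

      <p>Transcribers failing any criterion are excluded before the data are analyzed further.</p>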
    </sec>
    <sec id="sec-5">
      <title>Discussion</title>
      <p>
        So far crowdsourcing has been mainly used to produce
language resources like learner speech corpora
        <xref ref-type="bibr" rid="ref11">(Eskenazi
et al., 2013)</xref>
        , to obtain speech recordings with annotations
        <xref ref-type="bibr" rid="ref17 ref22">(Loukina et al. 2015a, b)</xref>
        , or to collect more complex and
realistic speech data such as dialogues through
conversational technologies
        <xref ref-type="bibr" rid="ref30">(Sydorenko et al. 2018)</xref>
        .
The experiences described in Section 5 would seem to
provide good reasons for extending the use of crowdsourcing to the
larger sets of data that are obtained through the loggings in
ASR-based CALL systems. These would constitute an
enormously rich source of information for improving both
the technology and the learning systems. In addition, these
annotated data and speech files could be used to further
train and adapt the algorithms employed in the system and
thus to enhance the quality of the ASR technology.
This approach could be extended to ASR-based CALL that
addresses other aspects of L2 speaking to obtain
annotations of learner speaking performance, evaluations
of L2 proficiency in grammar and vocabulary, or of turn-taking
abilities, pragmatic competence, politeness
strategies and formulaic language in spoken dialogue
applications. An additional solution could be so-called
implicit crowdsourcing, which could be applied by
collecting additional speech data and subjective
evaluations when users engage with ASR-based CALL
systems. In other words, in this case the users of CALL
systems would form the crowd. There are some important
caveats to be taken into account, though. First of all, the GDPR
places limitations on the use of spoken data in crowdsourcing, as
speech data are by definition sensitive: speech
intrinsically contains information on identity and other
personal features. Speech corpora often impose restrictions
on making speech fragments audible to the public. In any
case prior explicit consent has to be obtained for employing
user data for research and development purposes. Finally,
the reliability of the subjective data obtained through
crowdsourcing has to be checked before these data are used
for further research.
      </p>
    </sec>
    <sec id="sec-6">
      <title>Conclusions</title>
      <p>ASR-based CALL applications hold great potential for
innovative research on language learning and future
developments for language teaching. Effectiveness studies
could help clarify their added value, but so far these studies
have been few and far between, among other things because
they require subjective judgments of large amounts of L2
speech. Crowdsourcing can be usefully applied for this
purpose. For the two crowdsourcing initiatives described in
Section 5, the results were satisfactory, as larger sets of data
could be annotated and scored than would have been the
case with traditional experiments. In turn these data
provided useful insights into important aspects of
intelligibility scoring measures with different degrees of
granularity. To conclude, there seem to be good reasons for
extending this approach to ASR-based CALL that
addresses other aspects of L2 speaking to obtain much-needed
subjective annotations and evaluations of learner
speaking performance.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Bodnar</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cucchiarini</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Penning de Vries</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Strik</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>van Hout</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          (
          <year>2017</year>
          ).
          <article-title>Learner affect in computerised L2 oral grammar practice with corrective feedback</article-title>
          .
          <source>Computer Assisted Language Learning</source>
          ,
          <volume>30</volume>
          ,
          <fpage>223</fpage>
          -
          <lpage>246</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Bodnar</surname>
            ,
            <given-names>S.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cucchiarini</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Strik</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>van Hout</surname>
            ,
            <given-names>R.W.N.M.</given-names>
          </string-name>
          (
          <year>2016</year>
          ).
          <article-title>Evaluating the motivational impact of CALL systems: current practices and future directions</article-title>
          .
          <source>Computer Assisted Language Learning</source>
          ,
          <volume>29</volume>
          , (
          <issue>1</issue>
          ),
          <fpage>186</fpage>
          -
          <lpage>212</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Burgos</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Sanders</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Cucchiarini</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>van Hout</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Strik</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          (
          <year>2015</year>
          )
          <article-title>Auris populi: crowdsourced native transcriptions of Dutch vowels spoken by adult Spanish learners</article-title>
          .
          <source>In: Proc. of Interspeech</source>
          <year>2015</year>
          ,
          <fpage>2819</fpage>
          -
          <lpage>2823</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Cooke</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barker</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Lecumberri</surname>
            ,
            <given-names>M. L. G.</given-names>
          </string-name>
          (
          <year>2013</year>
          ).
          <article-title>Crowdsourcing in speech perception</article-title>
          . In:
          <string-name>
            <given-names>M.</given-names>
            <surname>Eskenazi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.-A.</given-names>
            <surname>Levow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Meng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Parent</surname>
          </string-name>
          &amp; D. Suendermann (Eds.),
          <article-title>Crowdsourcing for speech processing: Applications to data collection, transcription and assessment</article-title>
          (pp.
          <fpage>137</fpage>
          -
          <lpage>172</lpage>
          ). Somerset, GB: Wiley.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Cucchiarini</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bodnar</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Penning de Vries</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van Hout</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Strik</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>ASR-based CALL systems and learner speech data: new resources and opportunities for research and development in second language learning</article-title>
          .
          <source>Proceedings of LREC, Reykiavik.</source>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Cucchiarini</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van den Heuvel</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sanders</surname>
            ,
            <given-names>E.P.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Strik</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          (
          <year>2011</year>
          ).
          <article-title>Error selection for ASR-based English pronunciation training in 'My Pronunciation Coach'</article-title>
          .
          <source>Proceedings of Interspeech</source>
          ,
          <fpage>1165</fpage>
          -
          <lpage>1168</lpage>
          , Florence, Italy.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Cucchiarini</surname>
            ,
            <given-names>C</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neri</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Strik</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          (
          <year>2009</year>
          ).
          <article-title>Oral proficiency training in Dutch L2: The contribution of ASR-based corrective feedback</article-title>
          .
          <source>Speech Communication</source>
          ,
          <volume>51</volume>
          (
          <issue>10</issue>
          ),
          <fpage>853</fpage>
          -
          <lpage>863</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Cucchiarini</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Strik</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          (
          <year>2017</year>
          ).
          <article-title>Automatic speech recognition for L2 pronunciation assessment and training</article-title>
          . In O. Kang,
          <string-name>
            <given-names>R.</given-names>
            <surname>Thomson</surname>
          </string-name>
          &amp;
          <string-name>
            <given-names>M.</given-names>
            <surname>Murphy</surname>
          </string-name>
          (Eds.)
          <article-title>The Routledge handbook of English pronunciation</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>Derwing</surname>
            ,
            <given-names>T. M.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Munro</surname>
            ,
            <given-names>M. J.</given-names>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>Pronunciation fundamentals: Evidence-based perspectives for L2 teaching</article-title>
          . Amsterdam: John Benjamins.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>Elffers</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van Bael</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Strik</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          (
          <year>2013</year>
          ).
          <article-title>ADAPT: Algorithm for Dynamic Alignment of Phonetic Transcriptions</article-title>
          .
          <source>Internal report, CLST</source>
          , Radboud University Nijmegen, The Netherlands.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <surname>Eskenazi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Levow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Meng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Parent</surname>
          </string-name>
          &amp; D. Suendermann (eds.) (
          <year>2013</year>
          ).
          <article-title>Crowdsourcing for speech processing: Applications to data collection, transcription assessment</article-title>
          . New York: Wiley.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <surname>Ganzeboom</surname>
            ,
            <given-names>M.S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Bakker</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Cucchiarini</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Strik</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          (
          <year>2016</year>
          ).
          <article-title>Intelligibility of Disordered Speech: Global and Detailed Scores</article-title>
          .
          <source>In: Proceedings of Interspeech</source>
          <year>2016</year>
          , pp.
          <fpage>2503</fpage>
          -
          <lpage>2507</lpage>
          ; San Francisco, CA, USA.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <surname>Hu</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Qian</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soong</surname>
            ,
            <given-names>F.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>Improved mispronunciation detection with deep neural network trained acoustic models and transfer learning based logistic regression classifiers</article-title>
          .
          <source>Speech Communication</source>
          ,
          <volume>67</volume>
          ,
          <fpage>154</fpage>
          -
          <lpage>166</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Glass</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>2013</year>
          ).
          <article-title>Mispronunciation detection via Dynamic Time Warping on Deep Belief Network-based posteriorgrams</article-title>
          .
          <source>Proceedings ICASSP</source>
          <year>2013</year>
          , Vancouver, BC,
          <fpage>8227</fpage>
          -
          <lpage>8231</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <surname>Levis</surname>
            ,
            <given-names>J.M.</given-names>
          </string-name>
          (
          <year>2005</year>
          ).
          <article-title>Changing contexts and shifting paradigms in pronunciation teaching</article-title>
          .
          <source>TESOL Quarterly</source>
          ,
          <volume>39</volume>
          (
          <issue>3</issue>
          ),
          <fpage>369</fpage>
          -
          <lpage>377</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <surname>Levis</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>2007</year>
          ).
          <article-title>Computer technology in teaching and researching pronunciation</article-title>
          .
          <source>Annual Review of Applied Linguistics</source>
          ,
          <volume>27</volume>
          ,
          <fpage>184</fpage>
          -
          <lpage>202</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <surname>Loukina</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lopez</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Evanini</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suendermann-Oeft</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zechner</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>Expert and crowdsourced annotation of pronunciation errors for automatic scoring systems</article-title>
          ,
          <source>Proceedings INTERSPEECH-2015</source>
          ,
          <fpage>2809</fpage>
          -
          <lpage>2813</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <surname>Munro</surname>
            ,
            <given-names>M. J.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Derwing</surname>
            ,
            <given-names>T. M.</given-names>
          </string-name>
          (
          <year>1995a</year>
          ).
          <article-title>Foreign accent, comprehensibility, and intelligibility in the speech of second language learners</article-title>
          .
          <source>Language Learning</source>
          ,
          <volume>45</volume>
          ,
          <fpage>73</fpage>
          -
          <lpage>97</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <surname>Munro</surname>
            ,
            <given-names>M. J.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Derwing</surname>
            ,
            <given-names>T. M.</given-names>
          </string-name>
          (
          <year>1995b</year>
          ).
          <article-title>Processing time, accent, and comprehensibility in the perception of native and foreign accented speech</article-title>
          .
          <source>Language and Speech</source>
          ,
          <volume>38</volume>
          ,
          <fpage>289</fpage>
          -
          <lpage>306</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <surname>Munro</surname>
            ,
            <given-names>M. J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Derwing</surname>
            ,
            <given-names>T. M.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Thomson</surname>
            ,
            <given-names>R. I.</given-names>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>Setting segmental priorities for English learners: Evidence from a longitudinal study</article-title>
          .
          <source>Int. Review of Applied Linguistics in Language Teaching</source>
          ,
          <volume>53</volume>
          (
          <issue>1</issue>
          ),
          <fpage>39</fpage>
          -
          <lpage>60</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <string-name>
            <surname>Oostdijk</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          (
          <year>2000</year>
          ).
          <article-title>The Spoken Dutch Corpus: Overview and first evaluation</article-title>
          .
          <source>Proceedings of LREC</source>
          <year>2000</year>
          ,
          <fpage>886</fpage>
          -
          <lpage>894</lpage>
          , Athens, Greece.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <string-name>
            <surname>Penning de Vries</surname>
            ,
            <given-names>B.W.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cucchiarini</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bodnar</surname>
            ,
            <given-names>S.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Strik</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>van Hout</surname>
            ,
            <given-names>R.W.N.M.</given-names>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>Spoken grammar practice and feedback in an ASR-based CALL system</article-title>
          .
          <source>Computer Assisted Language Learning</source>
          ,
          <volume>28</volume>
          (
          <issue>6</issue>
          ),
          <fpage>550</fpage>
          -
          <lpage>576</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <string-name>
            <surname>Penning de Vries</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cucchiarini</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bodnar</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Strik</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>van Hout</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          (
          <year>2016</year>
          ).
          <article-title>Effect of corrective feedback for learning verb second</article-title>
          ,
          <source>Int. Review of Applied Linguistics in Language Teaching (IRAL)</source>
          ,
          <volume>54</volume>
          (
          <issue>4</issue>
          ),
          <fpage>347</fpage>
          -
          <lpage>386</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          <string-name>
            <surname>Penning de Vries</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cucchiarini</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Strik</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>van Hout</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          (
          <year>2019</year>
          ).
          <article-title>Spoken grammar practice in CALL: The effect of corrective feedback and education level in adult L2 learning</article-title>
          ,
          <source>Language Teaching Research.</source>
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          <string-name>
            <surname>Qian</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Meng</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soong</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          (
          <year>2012</year>
          ).
          <article-title>The Use of DBNHMMs for Mispronunciation Detection and Diagnosis in L2 English to Support Computer-Aided Pronunciation Training</article-title>
          .
          <source>In: Proc. of Interspeech</source>
          ,
          <fpage>775</fpage>
          -
          <lpage>778</lpage>
          , Portland.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          <string-name>
            <surname>Sanders</surname>
            ,
            <given-names>E.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Burgos</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cucchiarini</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>van Hout</surname>
            ,
            <given-names>R.W.N.M.</given-names>
          </string-name>
          (
          <year>2016</year>
          ).
          <article-title>Palabras: Crowdsourcing transcriptions of L2 speech</article-title>
          .
          <source>In: Proceedings of the Int. Conf. on Language Resources and Evaluation (LREC)</source>
          <year>2016</year>
          , pp.
          <fpage>3186</fpage>
          -
          <lpage>3191</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          <string-name>
            <surname>Stanley</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hacioglu</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Pellom</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          (
          <year>2011</year>
          ).
          <article-title>Statistical machine translation framework for modeling phonological errors in computer assisted pronunciation training system</article-title>
          .
          <source>SLaTE</source>
          <year>2011</year>
          , Venice, Italy.
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          <string-name>
            <surname>Strik</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          (
          <year>2012</year>
          ).
          <article-title>ASR-based systems for language learning and therapy</article-title>
          .
          <source>International Symposium on Automatic Detection of Errors in Pronunciation Training (ISAdept)</source>
          . KTH: Stockholm, Sweden, June 6-8.
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          <string-name>
            <surname>Strik</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Colpaert</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Van Doremalen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Cucchiarini</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          (
          <year>2012</year>
          ).
          <article-title>The DISCO ASR-based CALL system: practicing L2 oral skills and beyond</article-title>
          .
          <source>Proceedings LREC</source>
          , Istanbul.
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          <string-name>
            <surname>Sydorenko</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smits</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Evanini</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Ramanarayanan</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>Simulated speaking environments for language learning: insights from three cases</article-title>
          .
          <source>Computer Assisted Language Learning.</source>
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          <string-name>
            <surname>Thomson</surname>
            ,
            <given-names>R. I.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Derwing</surname>
            ,
            <given-names>T. M.</given-names>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>The effectiveness of L2 pronunciation instruction: A narrative review</article-title>
          .
          <source>Applied Linguistics</source>
          ,
          <volume>36</volume>
          (
          <issue>3</issue>
          ),
          <fpage>326</fpage>
          -
          <lpage>344</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          <string-name>
            <surname>Van Doremalen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boves</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Colpaert</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cucchiarini</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Strik</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          (
          <year>2016</year>
          ).
          <article-title>Evaluating ASR-based language learning systems: A case study</article-title>
          .
          <source>Computer Assisted Language Learning</source>
          ,
          <volume>29</volume>
          (
          <issue>4</issue>
          ),
          <fpage>833</fpage>
          -
          <lpage>851</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          <string-name>
            <surname>Van Doremalen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cucchiarini</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Strik</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          (
          <year>2010</year>
          ).
          <article-title>Optimizing automatic speech recognition for low-proficient non-native speakers</article-title>
          .
          <source>EURASIP Journal on Audio, Speech, and Music Processing.</source>
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          <string-name>
            <surname>Van Doremalen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cucchiarini</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Strik</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          (
          <year>2013</year>
          ).
          <article-title>Automatic pronunciation error detection in non-native speech: the case of vowel errors in Dutch</article-title>
          .
          <source>Journal of the Acoustical Society of America</source>
          ,
          <volume>134</volume>
          ,
          <fpage>1336</fpage>
          -
          <lpage>1347</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>