<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Phonetic Transcription by Untrained Annotators</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Oldřich Krůza</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2018</year>
      </pub-date>
      <volume>2203</volume>
      <fpage>35</fpage>
      <lpage>40</lpage>
      <abstract>
        <p>The paper presents an application that allows lay, untrained users to generate high-quality, aligned phonetic transcription of speech. The application has been in use for several years and has served to transcribe over 600 thousand word forms across two versions of a web interface. We present measures for compensating for the lack of expert training.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <sec id="sec-1-1">
        <title>Our Setting</title>
        <p>
          The work presented in this paper is part of
a project that maintains the spoken corpus of Karel
Makoň[
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. The corpus is of a single speaker and
was recorded in amateur conditions while the
author spoke to his friends about a novel way
to interpret the teaching of Jesus, and mysticism and
spirituality in general. Karel Makoň died in 1993 and
a community of followers of his teachings has
persevered since then.
        </p>
        <p>The talks can be seen as companions to Makoň’s
written works. Together they form a unique,
extensive, consistent systematization of the spiritual path,
tailored to modern westerners and accessible
primarily to Czech speakers. It draws heavily on traditional
Christian mysticism as well as the ancient traditions of
India and China, adapting them for the present. The
whole system can be seen as a manual for entering
eternal life prior to physical death.</p>
        <p>There are over 1000 hours of digitized recordings
of Karel Makoň. They are accessible under the
CC-BY license, and the project aims at bringing the most
benefit out of them. The first step was digitizing the
recordings from the original magnetic tapes, the
second was releasing all of them on the world-wide
web, and the third was developing a web-based
system for human/machine transcription of the bulk,
allowing for search.</p>
        <p>The transcription we do is both phonetic and
orthographic.1 Our users are supposed to provide
orthographic transcription where the pronunciation is
standard and phonetic transcription otherwise.</p>
        <p>1There is no actual focus on orthography. Instead, we mean
the natural way of transcribing the speech into human-readable
text. Where it matters, focus is directed at precise
correspondence with the utterances rather than language
cleanliness.</p>
      </sec>
      <sec id="sec-1-2">
        <title>Architecture Overview</title>
        <sec id="sec-1-2-1">
          <title>The system consists of</title>
          <p>
            1. The corpus in compressed audio format. We
use mp3 and ogg/vorbis to accommodate most
browsers. These data are hosted on an external
CDN.
2. An exact copy of the corpus in parametrized
(MFCC) format. These data reside on the
back-end server.
3. A complete, aligned transcription of the
recordings, hosted on the back-end server and mirrored
on a CDN.
4. An acoustic model trained on the
human-transcribed part of the corpus.
5. A language model trained using SRILM[
            <xref ref-type="bibr" rid="ref2">2</xref>
            ] on a
combination of publicly available Czech texts, Karel
Makoň’s written works, and both the
human-submitted and automatically acquired
transcription.
6. A back-end API for collecting corrections to the
transcription, serving the transcription and
allowing full-text search with Elasticsearch
(https://www.elastic.co/products/elasticsearch).
7. A separately hosted front-end web application
serving as an interface for playing the recordings,
synchronously displaying the transcriptions and
collecting the corrections from users.
          </p>
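          <p>For concreteness, a correction submission to the
back-end API might look as sketched below. This is a
minimal illustration only; the endpoint and field names
are our assumptions, not the documented interface.</p>
          <preformat>
// Hypothetical sketch: submit a correction that replaces the text
// aligned to a given audio segment. Endpoint and fields are assumed.
async function submitCorrection(recordingId, fromSec, toSec, text) {
  const res = await fetch('https://example.org/api/corrections', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      recording: recordingId, // which recording the segment belongs to
      from: fromSec,          // segment start in seconds
      to: toSec,              // segment end in seconds
      text: text,             // the replacement transcription
    }),
  });
  return res.json(); // e.g. an acceptance flag or a rejection reason
}
          </preformat>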
          <p>To get the initial transcription, we manually
transcribed some 10 minutes of the material using
Transcriber (http://trans.sourceforge.net/), trained an
acoustic model on it and recognized the whole data
with that model.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Annotator Expertise</title>
      <p>Our case is on the edge of what can be called linguistic
data annotation. In our lucky part of the world, where
literacy nears 100%, transcription of speech is
hardly expert work. On the other hand, ensuring that
the transcription exactly matches the audio
• as a representation of the words uttered and of
their meaning,
• on the phonetic level, phone for phoneme (in the
sense that each written phoneme corresponds to
exactly one uttered phone),
• on the time axis
is beyond what can be expected from an untrained
user.</p>
      <p>
        Linguistic data annotation in general requires
trained personnel. Looking only at the Prague
Dependency Treebank, we can notice that the annotators
provided such a degree of expertise that they became
co-authors[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        Crowdsourcing, community-driven approaches and
engaging volunteers are an increasingly popular way
of obtaining assets that would otherwise be
prohibitively costly. Let us mention for example Mihalcea
(2004)[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] who delegates word-sense disambiguation to
volunteers. The Wikicorpus[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] as well as the MASC[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]
gather annotation from volunteers.
      </p>
      <p>In most cases, quality is very important in data
annotation, so some kind of control is essential, no
matter how expert the annotators. Trivially, the less
expertise, the more control is needed.</p>
      <sec id="sec-2-1">
        <title>Quality Control</title>
        <p>A common way of dealing with quality control is to
inspect annotator agreement. This has the huge
downside that every piece of data must be annotated at
least twice, which reduces the yield by at least 50%.</p>
        <p>There is another reason not to use it in our case.
Our application is designed for people who want to
listen to the recordings out of interest and their
contribution to the quality of the transcription is more
of a by-product. It would be hard to convince them
to pick precisely a recording that another user has
already transcribed.</p>
        <p>Luckily, we can implement automatic measures that
help the annotators deliver higher-quality
transcription.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Forced Alignment</title>
        <p>We always assume an existing transcription, so we can
see the user’s contribution as a correction. Every
submission has the form of replacing a text segment with
another. Since the transcriptions are time-aligned to
the audio, we also know exactly which audio
segment corresponds to the submitted text.</p>
        <p>This enables us to perform forced alignment on the
submitted text and the audio. With a well-selected
pruning threshold, we can distinguish false
transcriptions and reject them, providing feedback to the
contributor. Since every segment of audio fits the
acoustic model to a different degree, both false positives
and false negatives will inevitably occur.</p>
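        <p>A minimal sketch of this validation step follows. It
assumes a wrapper forcedAlign around the HMM aligner
that returns null when pruning kills all hypotheses; the
wrapper, its options and the threshold value are our
assumptions, not the actual implementation.</p>
        <preformat>
// Sketch of submission validation by forced alignment. `forcedAlign` is
// an assumed wrapper (e.g. around an HTK-based aligner), not a real API.
const PRUNING_BEAM = 250.0; // a well-selected threshold; value illustrative

async function validateSubmission(audioSegment, submittedText) {
  const alignment = await forcedAlign(audioSegment, submittedText, {
    beam: PRUNING_BEAM,
  });
  if (alignment === null) {
    // The text does not fit the audio: reject and tell the contributor.
    return { accepted: false, reason: 'transcription does not fit audio' };
  }
  // Accepted: the alignment also yields exact word-level timestamps.
  return { accepted: true, words: alignment.words };
}
        </preformat>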
        <p>False positives (when the system accepts a wrong
transcription) present a problem, since the error will
enter the training data set. But users can often
circumvent false negatives by submitting the
transcription divided into different segments. Of course, this
method can also be used to force a wrong
transcription but we assume no malevolence on the part of the
users.</p>
        <p>Apart from catching wrong transcriptions, the
forced-alignment mechanism provides exact
synchronization on the time axis. This element is
completely missing in virtually all programs for
computer-aided transcription. For example,
Transcriber, a veteran open-source transcribing
program for Linux, expects the user to provide alignment
on the level of phrases; Transcribe
(https://transcribe.wreally.com/), a commercial
web-based transcribing tool, allows the user to add
timestamps anywhere in the text. There is no
acoustic model, hence nothing to match against.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Phonetic Transcription</title>
      <sec id="sec-3-1">
        <title>Purpose</title>
        <p>We originally built the acoustic model using
HTK (http://htk.eng.cam.ac.uk/), the Hidden Markov
Model Toolkit. Here, explicitly phonetically labeled
training data are necessary for training. We are
switching to a DNN, using Mozilla’s DeepSpeech
(https://github.com/mozilla/DeepSpeech), where no
explicit phonetic annotation is needed, but for some
purposes, like forced alignment, the original HMM is
still irreplaceable.</p>
        <p>Also, the phonetic labeling is valuable per se for
research purposes.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Phoneme Set</title>
        <p>
          We use a subset of PACal[
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. We shall also refer to
individual phonemes in this paper using the PACal
notation in monospace font. For reference, Table 1
lists the phonemes used with their IPA notation.
        </p>
      </sec>
      <sec id="sec-3-3">
        <title>Acquisition</title>
        <p>
          The phonetic transcription is normally also a
product of forced alignment: where pronunciation
variants exist, it selects the most fitting one. This
requires a way to automatically obtain all
pronunciation variants of any word. We use a rule-based
system inspired by Psutka et al.[
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]
in combination with a dynamic dictionary. The
dynamic dictionary is a list of alternative pronunciations
of a word, which expands as the app is being used.
        </p>
        <p>Table 1: The phonemes used, with their IPA notation,
PACal notation and common grapheme. Some rows give an
example word rather than a single grapheme. The PACal
symbols sil and sp (silence and short pause) have no
IPA or grapheme counterpart.</p>
        <preformat>
IPA    PACal   common grapheme
a      a       a
aː     aa      á
aʊ̯     aw      au
b      b       b
t͡s     c       c
t͡ʃ     ch      č
d      d       d
ɟ      dj      ď
d͡z     dz      dz
d͡ʒ     dzh     dž
ɛ      e       e
ɛː     ee      é
eʊ̯     ew      eu
f      f       f
g      g       g
ɦ      h       h
i      i       i
iː     ii      í
j      j       j
k      k       k
l      l       l
m      m       mák
ɱ      mg      tramvaj
n      n       ne
ŋ      ng      tank
ɲ      nj      ň
o      o       o
oː     oo      ó
oʊ̯     ow      ou
p      p       p
r      r       r
r̝̊      rsh     tři
r̝      rzh     říz
s      s       s
ʃ      sh      š
t      t       t
c      tj      ť
ʊ      u       u
uː     uu      ú, ů
v      v       v
x      x       ch
z      z       z
ʒ      zh      ž
       sil
       sp
        </preformat>
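        <p>As a minimal sketch, the two sources of
pronunciation variants might be combined as below; the
function and store names are ours, not the actual
implementation.</p>
        <preformat>
// Candidate pronunciations for a wordform: rule-based transducer output
// plus whatever the dynamic dictionary has learned. Names are assumed.
const dynamicDictionary = new Map(); // wordform -> array of pronunciations

function pronunciationVariants(wordform) {
  const ruleBased = rulePronunciations(wordform);        // rule-based system
  const learned = dynamicDictionary.get(wordform) || []; // grows with usage
  // Deduplicate; each variant is an array of PACal phonemes, e.g. ['g','d','o'].
  const seen = new Set();
  return [...ruleBased, ...learned].filter(variant => {
    const key = variant.join(' ');
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}
        </preformat>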
        <p>The users are instructed to transcribe any words
with non-standard pronunciation phonetically and
then correct their orthographic form. This is one
of the few cases where we coerce the users
into a particular way of working.</p>
        <p>When the orthographically broken, phonetic
transcription of a word is submitted and passes the
forced-alignment phase, it is integrated into the displayed
transcription. The word’s data representation,
illustrated below, consists of its
1. occurrence: the word as it appears in the text,
including capitalization and punctuation,
2. wordform: the word as it appears in the
language model and phonetic dictionary (computed
as the occurrence in lowercase and stripped of
non-alphabetic characters, which implies that all
non-alphabetic characters are always a part of a
token and never form a token on their own),
3. pronunciation: an array of phonemes,
4. timestamp: distance of the beginning of the word
from the beginning of the file, in seconds, with a
precision of 2 decimal digits,
5. manual/automatic: boolean flag denoting
whether the word has been transcribed manually
or not,
6. confidence measure: in the case of automatically
acquired words, the confidence score of the
recognizer.</p>
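        <p>For illustration, a single word record might be
serialized as below; the fields are those listed above,
while the concrete shape is our assumption.</p>
        <preformat>
// Illustrative word record (JSON-like; the exact serialization is assumed).
{
  "occurrence": "Alacoque?",       // as displayed, with punctuation
  "wordform": "alacoque",          // lowercased, non-alphabetics stripped
  "pronunciation": ["a", "l", "a", "k", "o", "k"],
  "timestamp": 1432.87,            // seconds from the start of the file
  "manual": true,                  // transcribed by a human
  "confidence": null               // set only for automatic words
}
        </preformat>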
        <p>Once merged into the displayed transcription, each
word’s occurrence can be edited manually. Now the
user can enter the correct form deviating from Czech
pronunciation rules.</p>
        <p>Doing so results in adding the
wordform-pronunciation pair to the dynamic pronunciation
dictionary, where it is also used for forced alignment.
Thus, this operation need only be performed once per
word; any subsequent time the word is entered in
its standard orthographic form, the correct
pronunciation is inferred.</p>
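        <p>A minimal sketch of this bookkeeping, using the
wordform definition from the list above and the
dynamic dictionary store from the earlier sketch (the
helper names are ours):</p>
        <preformat>
// Sketch: when a user edits a word's occurrence, derive the wordform and
// remember the pronunciation it was aligned with. Names are assumptions.
function toWordform(occurrence) {
  // Lowercase and strip non-alphabetic characters, per the definition above.
  return occurrence.toLowerCase().replace(/[^a-záčďéěíňóřšťúůýž]/g, '');
}

function onOccurrenceEdited(word, newOccurrence) {
  word.occurrence = newOccurrence;
  word.wordform = toWordform(newOccurrence);
  // The learned wordform-pronunciation pair makes future occurrences of
  // the standard orthographic form resolve to the correct pronunciation.
  const variants = dynamicDictionary.get(word.wordform) || [];
  variants.push(word.pronunciation);
  dynamicDictionary.set(word.wordform, variants);
}
        </preformat>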
        <p>For example, let’s examine the scenario of
transcribing the sentence Proč se toto nestalo Marii
Markétě Alacoque? (Why hasn’t this happened to
Mary Margaret Alacoque?) Its phonetic
representation is p r o ch sp s e sp t o t o sp n e s t
a l o sp m a r i j i sp m a r k ee tj e sp a
l a k o k sil .</p>
        <p>1. Suppose the user enters the correct ortographic
transcription.
2. The phonetic transducer outputs p r o ch sp
s e sp t o t o sp n e s t a l o sp m a r
i j i sp m a r k ee tj e sp a l a c o k v
u e sil .
3. With a bit of luck, the forced alignment fails
because of the distinction between the phone sequences
k o k and c o k v u e.
4. The transcription is rejected, the user realizes
that the word is pronounced in a non-standard
way and re-tries with Proč se toto nestalo Marii
Markétě alakok?
5. Forced alignment succeeds now and the entered
transcription is merged into the view.
6. The user selects the non-existent word alakok?
and edits its occurrence to Alacoque?
7. Now the word is correctly stored and on any
subsequent user inputs of Alacoque with any
punctuation or capitalization, the pronunciation a l
a k o k is inferred by the forced alignment.
</p>
      </sec>
      <sec id="sec-3-4">
        <title>Phonetic Respelling</title>
        <p>With all the advantages of using PACal as a
representation of phonemes, it is clearly not the most
natural way for lay Czechs to write down and read literal
pronunciation. Thanks to the simple, mostly
deterministic mapping between phonemes and graphemes,
pronunciation respelling is a reliable, natural alternative.
There is not even a need for the explicit syllable
separation seen in English pronunciation respelling
(Wikipedia, https://en.wikipedia.org/wiki/Pronunciation_respelling,
gives the example of “Diarrhoea” being
pronounced DYE-uh-REE-a). We postulate, based on
experience alone and without supporting research, that
phonetic respelling is natural to all literate native
Czech speakers.</p>
        <p>The previous subsection demonstrated
pronunciation respelling in Czech with the example
of alakok for Alacoque. The direction from the
phonetic respelling to the phoneme array is covered by
the orthographic-to-phonetic transducer. But we also
need the opposite direction, to provide the users a way
to check whether the pronunciation selected by the
forced alignment fits.</p>
        <p>For this purpose, we have created a JavaScript
module for transduction between the array of phonemes
and the pronunciation respelling
(https://github.com/Sixtease/MakonReact/blob/master/src/lib/Phonet.js).</p>
        <p>The algorithm is simple. In most cases, a phoneme
corresponds uniquely to one character in the
respelling. The exceptions are as follows (a sketch of
the encoding direction is given after this list):
1. The phoneme x is spelled ch.
2. The phonemes dz dzh are spelled dz dž.
3. The diphthongs aw ew ow are spelled au eu ou.
4. The sequences c h, o u, a u, e u, d z, d zh are
spelled c’h, o’u, a’u, e’u, d’z, d’ž. Note, though,
that the sequence c h is purely hypothetical, as
it contradicts voiced/voiceless assimilation.
5. The voiceless alveolar fricative trill is explicated as ř’.
6. The velar nasal and the labiodental nasal are spelled n’,
m’.
7. Trailing silence is not represented.</p>
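        <p>Below is a minimal sketch of the encoding direction
under these rules, written in the spirit of, but not
copied from, the Phonet.js module; the mapping table
and names are ours. It also applies the y-spelling
described below, so that e.g. the sequence d i comes
out as dy.</p>
        <preformat>
// Sketch of phoneme array -> pronunciation respelling (not the actual
// Phonet.js code). P2G maps PACal phonemes to common graphemes.
const P2G = {
  a: 'a', aa: 'á', aw: 'au', b: 'b', c: 'c', ch: 'č', d: 'd', dj: 'ď',
  dz: 'dz', dzh: 'dž', e: 'e', ee: 'é', ew: 'eu', f: 'f', g: 'g', h: 'h',
  i: 'i', ii: 'í', j: 'j', k: 'k', l: 'l', m: 'm', mg: "m'", n: 'n',
  ng: "n'", nj: 'ň', o: 'o', oo: 'ó', ow: 'ou', p: 'p', r: 'r',
  rsh: "ř'", rzh: 'ř', s: 's', sh: 'š', t: 't', tj: 'ť', u: 'u', uu: 'ú',
  v: 'v', x: 'ch', z: 'z', zh: 'ž',
};

// Phoneme pairs that would be misread if written plainly (rule 4).
const AMBIGUOUS = new Set(['c h', 'o u', 'a u', 'e u', 'd z', 'd zh']);

function respell(phonemes) {
  // Rule 7: silence (and short pause) has no written counterpart.
  const ps = phonemes.filter(p => !['sil', 'sp'].includes(p));
  let out = '';
  ps.forEach((p, k) => {
    let g = P2G[p];
    // 'di ti ni' would be read as palatals, so /i(ː)/ after d, t, n is y/ý.
    if (p === 'i' || p === 'ii') {
      if (['d', 't', 'n'].includes(ps[k - 1])) {
        g = p === 'i' ? 'y' : 'ý';
      }
    }
    // Rule 4: disambiguating apostrophe inside the ambiguous pairs.
    if (AMBIGUOUS.has(ps[k - 1] + ' ' + p)) out += "'";
    out += g;
  });
  return out;
}

respell(['n', 'a', 'u', 'k', 'a']);  // "na'uka"
respell(['nj', 'i', 'c']);           // "ňic"
respell(['d', 'i', 's', 'k']);       // "dysk"
        </preformat>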
        <p>The module includes two-way transduction,
although only the one from array of phonemes to
human-readable phonetic respelling is needed in our
application. Still, the user can mark up special-case
pronunciation with the apostrophe, like the sequence
of phonemes o and u with the string o'u. The need
has never occurred during the six years’ lifespan of
the application.</p>
        <p>Note that when encoding into the phonetic
respelling, none of di ti ni dě tě ně is ever output. The
palatal consonants are always explicitly spelled out
and e.g. the sequence n i is always spelled ny.</p>
        <p>A few examples of words, pronunciation and
phonetic respelling as output by the algorithm (given
the corresponding pronunciation is input as phoneme
list):
• nic /nj i c/: ňic,
• kdo /g d o/: gdo,
• disk /d i s k/: dysk,
• dřít /d rzh ii t/: dřít,
• třít /t rsh ii t/: tř’ít,
• auto /aw t o/: auto,
• nauka /n a u k a/: na’uka,
• džbán /dzh b aa n/: džbán,
• odžít /o d zh ii t/: od’žít,
• odznak /o dz n a k/: odznak,
• podzemí /p o d z e m ii/: pod’zemí,
• noc /n o c/: noc,
• tento /t e n t o/: tento,
• hangár /h a ng g aa r/: han’gár,
• samba /s a m b a/: samba,
• tonfa /t o mg f a/: tom’fa.</p>
        <p>The use of apostrophe for distinguishing
ambiguities and special cases is not 100% intuitive and
presents another point where instruction is necessary
for the user to use this feature properly.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Evaluation</title>
      <p>We have presented our web application as a tool that
enables gathering precisely aligned, phoneme-exact
transcription from untrained casual visitors. We have
presented measures for reaching this goal but the
degree to which it was reached remains unclear.</p>
      <p>We have no gold standard data against which to measure
the quality of our manual transcriptions. On the contrary, we
use the manual transcriptions as the gold standard for the
automatic recognition. What we can do, however, is
look at some random samples and try to get a rough
idea of how the system performs.</p>
      <sec id="sec-4-1">
        <title>Validation by Forced Alignment</title>
        <p>One thing we can examine is the approvals and
rejections of the forced alignment. Of 109,640 forced
alignment attempts, 3,419 have failed, which makes
for a 3.12% rejection rate. We have manually inspected
20 random failed attempts and arrived at the following
numbers:
• 11 cases were false negatives, where the
transcription was correct and should have been accepted,
• 4 cases were caused by acoustic irregularities like
noise,
• 4 cases were true negatives caused by wrongly
chosen segment boundaries and
• 1 case was true negative caused by wrong
transcription.</p>
        <p>Hence, in 25% of this minimal sample, the
forced alignment did its job as a validator and
prevented a piece of broken training data from entering
the dataset. In 55% it was a nuisance and a failure, and
in the remaining 20% it rejected a valid transcription
but prevented a bad training example from occurring,
so we can see this in a positive light.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Non-Standard Pronunciation</title>
        <p>We can also track how the scenario described in
subsection 3.3 is applied. We have looked up four
promising example records in the dynamic dictionary and
checked submitted transcriptions containing them.
Table 2 lists for each of them the correct orthographic
form, the wrong pronunciation obtained by the
transducer, the correct pronunciation and finally the
phonetic respelling. Each is followed by the number of
occurrences in the manually transcribed data.</p>
        <p>We can see in Table 3 that the majority of cases
results in both orthographic and phonetic forms being
correct. Only in about 13% of cases is the
orthographically incorrect form kept. We attribute this to the
fact that those who use the phonetic respelling are
aware of the issue and mostly go the whole
way and clean up.</p>
        <p>On the other hand, nearly a third of the cases show
the wrong phonetic representation. This is a serious
problem on at least two levels: Firstly, it shows that
the forced aligner failed to catch the error. Secondly,
it lets bad examples into the training dataset.</p>
        <p>One of the apparent reasons for this to happen
is that the dynamic dictionary only recognizes exact
matches. In one file, for example, all
occurrences of the form Weinfurter have the correct
pronunciation while Weinfurterovi has a broken one.</p>
        <p>Other factors likely include user carelessness or
ignorance, which is exactly what our application is
trying to compensate for, but fails to in these cases.</p>
        <p>The cases with a false orthographic form don’t pose
much of a problem. They can make searching for the
term in question harder, but performing a search for the
phonetic respelling, or even automatically searching the
pronunciation, would easily mitigate this.</p>
        <p>The fourth combination, of phonetic respelling with
a false pronunciation, of course does not occur.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>We have presented an application that has been
providing access to the extensive corpus of Karel
Makoň and has served to acquire an almost complete
transcription thereof. Nearly 70 hours, corresponding to over
600,000 word forms, have been transcribed manually
with minimal financial as well as development costs
(in the early stages, we kept a paid annotator to test the
application; the system has been written by a single
developer). Only some of the volunteers required
instruction, taking on the order of minutes. The rest of the
corpus has been transcribed using an ASR system
trained on these ever-growing data.</p>
      <p>We have presented the measures we use to help
untrained users provide high-quality orthographic
and phonetic time-aligned transcription. We have
attempted a rough evaluation of the success rate of the
measures presented. Though clearly far from perfect,
they do serve the purpose and set a baseline for
improvements or novel approaches.</p>
      <p>The system has been built with the motivation of
spreading the message contained in Karel Makoň’s
talks. However, to make the technology more
useful, we are actively looking for similar settings where
it could be deployed.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>The research was supported by SVV project number
260 453.</p>
      <sec id="sec-6-1">
        <title>Correct spelling Moody Descartes Weinfurter</title>
        <p>Michelangelo
orthographically correct
orthographically incorrect
This work has been using language resources
stored and distributed by the LINDAT/CLARIN
project of the Ministry of Education, Youth and
Sports of the Czech Republic (project LM2015071).</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Jurik</given-names>
            <surname>Hájek</surname>
          </string-name>
          .
          <source>Český mystik Karel Makoň. Dingir</source>
          ,
          <year>2007</year>
          /4:
          <fpage>142</fpage>
          -
          <lpage>143</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Andreas</given-names>
            <surname>Stolcke</surname>
          </string-name>
          .
          <article-title>SRILM: an extensible language modeling toolkit</article-title>
          .
          <source>In Seventh international conference on spoken language processing</source>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Jan</given-names>
            <surname>Hajič</surname>
          </string-name>
          .
          <article-title>Complex corpus annotation: The Prague Dependency Treebank</article-title>
          .
          <source>Insight into Slovak and Czech Corpus Linguistics. Veda Bratislava</source>
          ,
          <year>2005</year>
          :
          <fpage>54</fpage>
          -
          <lpage>73</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Rada</given-names>
            <surname>Mihalcea</surname>
          </string-name>
          and
          <string-name>
            <given-names>Timothy</given-names>
            <surname>Chklovski</surname>
          </string-name>
          .
          <article-title>Building sense tagged corpora with volunteer contributions over the web</article-title>
          .
          <source>Recent Advances in Natural Language Processing III: Selected Papers from RANLP</source>
          <year>2003</year>
          ,
          <volume>260</volume>
          :
          <fpage>357</fpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Samuel</given-names>
            <surname>Reese</surname>
          </string-name>
          , Gemma Boleda, Montse Cuadros, and
          <string-name>
            <given-names>German</given-names>
            <surname>Rigau</surname>
          </string-name>
          .
          <article-title>Wikicorpus: A word-sense disambiguated multilingual Wikipedia corpus</article-title>
          .
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Nancy</given-names>
            <surname>Ide</surname>
          </string-name>
          , Christiane Fellbaum,
          <string-name>
            <given-names>Collin</given-names>
            <surname>Baker</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Rebecca</given-names>
            <surname>Passonneau</surname>
          </string-name>
          .
          <article-title>The manually annotated subcorpus: A community resource for and by the people</article-title>
          .
          <source>In Proceedings of the ACL 2010 conference short papers</source>
          , pages
          <fpage>68</fpage>
          -
          <lpage>73</lpage>
          . Association for Computational Linguistics,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Jan</given-names>
            <surname>Nouza</surname>
          </string-name>
          , Josef Psutka, and
          <string-name>
            <given-names>Jan</given-names>
            <surname>Uhlíř</surname>
          </string-name>
          .
          <article-title>Phonetic alphabet for speech recognition of Czech</article-title>
          .
          <source>Radioengineering</source>
          ,
          <volume>6</volume>
          (
          <issue>4</issue>
          ):
          <fpage>16</fpage>
          -
          <lpage>20</lpage>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Josef</given-names>
            <surname>Psutka</surname>
          </string-name>
          , Jan Hajič, and
          <string-name>
            <given-names>William</given-names>
            <surname>Byrne</surname>
          </string-name>
          .
          <article-title>The development of ASR for Slavic languages in the MALACH project</article-title>
          .
          <source>In Acoustics, Speech, and Signal Processing</source>
          ,
          <year>2004</year>
          . Proceedings.
          <source>(ICASSP'04)</source>
          . IEEE International Conference on, volume
          <volume>3</volume>
          , pages
          <fpage>iii</fpage>
          -
          <lpage>749</lpage>
          . IEEE,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>