=Paper=
{{Paper
|id=None
|storemode=property
|title=A New Method to Combine Probability Estimates from Pairwise Binary Classifiers
|pdfUrl=https://ceur-ws.org/Vol-1422/194.pdf
|volume=Vol-1422
|dblpUrl=https://dblp.org/rec/conf/itat/SuchBT15
}}
==A New Method to Combine Probability Estimates from Pairwise Binary Classifiers==
J. Yaghob (Ed.): ITAT 2015 pp. 194–199
Charles University in Prague, Prague, 2015
Ondrej Šuch¹, Štefan Beňuš², and Andrea Tinajová³
¹ University of Žilina and Slovak Academy of Sciences, Slovakia, ondrejs@savbb.sk
² Constantine the Philosopher University and Slovak Academy of Sciences, Slovakia, sbenus@ukf.sk
³ Slovak Academy of Sciences, andrea.tinajova@gmail.com
Abstract: Estimating class membership probabilities is an important step in many automated speech recognition systems. Since binary classifiers are usually easier to train, one common approach to this problem is to construct pairwise binary classifiers. Pairwise models yield an over-determined system of equations for the class membership probabilities. Motivated by probabilistic arguments, we propose a new way of estimating individual class membership probabilities, which reduces to solving a linear system of equations. A solution of this system is obtained by finding the unique non-zero eigenvector of total probability one, corresponding to eigenvalue one of a positive Markov matrix. This is a property shared by another algorithm previously proposed by Wu, Lin, and Weng. We compare the properties of these methods in two settings: a theoretical three-way classification problem, and classification of English monophthongs from the TIMIT corpus.

Index Terms: binary classifiers; multiclass classification; phoneme recognition; English vowels; TIMIT

1 Introduction

A probabilistic approach underlies most current automatic speech recognition (ASR) systems, and very likely also human speech perception. In many ASR systems a common task is to provide estimates of the probabilities of a given sample belonging to multiple classes, given the observed values of its features. These classes may represent various phonemes, diphones or other kinds of linguistic categories.

In machine learning it is easier to find the boundary between two classes than the boundary separating a class from many other classes [1]. Moreover, many discriminative models are naturally suited to pairwise classification, such as logistic regression, LDA or variants of SVM. Thus, given k classes C_i, one can readily construct k(k−1)/2 pairwise discriminative models. Let us denote by M_ij the model discriminating classes C_i and C_j. Suppose that M_ij is able not only to discriminate, but also to compute the pairwise class membership probability r_ij of an object X with features f:

    r_ij = r_ij(X) = p(X ∈ C_i | f, X ∈ C_i or X ∈ C_j).    (1)

Given the knowledge of r_ij(X), the question is then how to estimate the multi-class probabilities p_i, where

    p_i = p_i(X) = p(X ∈ C_i | f).    (2)

Inspired by the Bradley–Terry model, Hastie and Tibshirani suggested [1] to require

    p_i / (p_i + p_j) = r_ij,    (3)

    ∑_i p_i = 1.    (4)

Note that there are 1 + k(k−1)/2 equations for k unknowns, so the system of equations is over-determined for k ≥ 3 and it may not be possible to solve it.

In the next section we review several approaches which have been suggested to find an approximate solution of (3). In Section 3 we propose a new method to combine pairwise estimates. In Section 4 we examine its performance with synthetic as well as real-world acoustic data. In the Conclusions we discuss the findings of our experiments.

2 Existing Approaches

One natural requirement for an algorithm which determines the probabilities p_i is that if the system (3) has a solution, then the algorithm finds it exactly. Several approaches satisfying this requirement are outlined in the work of Wu, Lin and Weng [2]. They consider the following functionals:

    δ_HT :  min_p ∑_{i=1}^{k} [ ∑_{j: j≠i} ( (1/k) r_ij − (1/2) p_i ) ]²,    (5)

    δ_1 :  min_p ∑_{i=1}^{k} [ ∑_{j: j≠i} ( r_ij p_j − r_ji p_i ) ]²,    (6)

    δ_2 :  min_p ∑_{i=1}^{k} ∑_{j: j≠i} ( r_ij p_j − r_ji p_i )²,    (7)

    δ_V :  min_p ∑_{i=1}^{k} ∑_{j: j≠i} ( I{r_ij > r_ji} p_j − I{r_ji > r_ij} p_i )²,    (8)

where I is the indicator function. Each of the four functionals is nonnegative. When the system (3) does have a solution, each functional is zero at, and only at, the solution. One less satisfying feature of these approaches is that they lack probabilistic motivation, unlike the method we propose in the next section.
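As a concrete illustration of the functionals of Wu, Lin and Weng, δ_1 of (6) and δ_2 of (7) can be evaluated directly. The following minimal Python sketch (function names are ours, not from [2]) checks numerically that both vanish when the values r_ij come from an exact solution of (3):

```python
import numpy as np

def delta_1(p, R):
    """delta_1 of eq. (6): sum_i ( sum_{j != i} (r_ij p_j - r_ji p_i) )^2."""
    k = len(p)
    total = 0.0
    for i in range(k):
        inner = sum(R[i, j] * p[j] - R[j, i] * p[i]
                    for j in range(k) if j != i)
        total += inner ** 2
    return total

def delta_2(p, R):
    """delta_2 of eq. (7): sum_i sum_{j != i} (r_ij p_j - r_ji p_i)^2."""
    k = len(p)
    return sum((R[i, j] * p[j] - R[j, i] * p[i]) ** 2
               for i in range(k) for j in range(k) if j != i)
```

Minimizing either quantity over the probability simplex (subject to ∑_i p_i = 1, p_i ≥ 0) then gives the corresponding estimate; when (3) is consistent, the minimum value is zero at the true p.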
3 New Method

We will now describe our new algorithm. In general one has 0 ≤ r_ij ≤ 1. To avoid complications arising from degenerate cases we assume the sharp inequalities 0 < r_ij < 1, which poses no difficulty in practical applications.

Consider for a moment that an object X belongs to the class C_m. Then for judging its similarity to other classes one may restrict attention to the values r_mj (and r_jm = 1 − r_mj), since only the classifiers M_mj were trained on values from the category C_m. But for those k − 1 values the equations (3) can be solved exactly, as we will now show. We have

    ∑_{j≠m} 1/r_mj = ∑_{j≠m} (p_m + p_j)/p_m = (k − 1) + (1 − p_m)/p_m.    (10)

This relation allows us to compute an estimate p_m^(m) of p_m explicitly as

    p_m^(m) = ( ∑_{j≠m} 1/r_mj − (k − 2) )^{−1},    (11)

where the upper index indicates that the estimate of p_m is computed by taking into account only the values r_mj. The remaining probabilities can then be computed by the following formula:

    p_j^(m) = p_m^(m) · ( 1/r_mj − 1 ).    (12)

Now we repeat this argument for m = 1, 2, …, k. In general the estimates of p_i thus obtained will be conflicting, i.e. in general p_j^(m) ≠ p_j^(n), because the given values r_ij may not allow for solving (3) consistently. We now take inspiration from the probability law p(A) = ∑_i p(A|B_i) p(B_i), where the B_i form a partition of the probability space. We require that the estimates p̂_i of p_i satisfy the following linear system of equations:

    p̂_j = ∑_m p_j^(m) p̂_m,   for j = 1, …, k.    (13)

These requirements can be interpreted as imposing self-consistency on the estimates p̂_i. One readily checks that the matrix of the linear system (13) is Markov and positive, thus (13) has a one-dimensional space of solutions. Imposing the additional condition

    ∑_m p̂_m = 1    (14)

determines a unique estimate p̂_m of p_m.

4 Evaluation of the New Method

First note that our algorithm yields the correct solution if the system (3) has a solution. To see that, one first checks using (10) and (11) that p_m^(m) = p_m and p_j^(m) = p_j. It follows that the vector of true probabilities satisfies equations (13) and (14). Since the solution of (13) and (14) is unique, the method will yield the correct solution. However, this is an ideal, very special situation that will generally not hold for k ≥ 3.

We have opted to do comparison testing of the proposed method against the method of Wu, Lin and Weng [2] that minimizes the functional δ_1 (6). The reason is that that method also involves the construction of a positive Markov matrix whose solution is their estimate of p_m. We conduct two experiments: one is an artificial three-way classification problem, and the other a vowel recognition task.

4.1 Three-Way Classification

The system of equations (3) becomes over-determined for k = 3. If one of the classifiers is unreliable, then the system (3) will not have a solution. In this section we present the results of a synthetic experiment for three-way classification.

In our experiment we assume that only the classifier M_23 is unreliable. In other words, we assume that the classifiers M_12 and M_13, discriminating respectively categories C_1 versus C_2 and C_1 versus C_3, yield precise estimates of r_12 and r_13. For fixed values p_1, p_2 we thus set r_12 = p_1/(p_1 + p_2) and r_13 = p_1/(p_1 + p_3) = p_1/(1 − p_2). Let p̂_m and p^Wu_m denote our and Wu's estimates of p_m. As r_23 varies in the interval (0, 1), define the absolute errors

    ∆ = sup_{i, r_23} |p̂_i − p_i|,    (15)

    ∆_Wu = sup_{i, r_23} |p^Wu_i − p_i|,    (16)

and the relative error

    ∆rel_Wu = sup_{i, r_23} |p^Wu_i − p̂_i|.    (17)

The results of our experiment are shown in Table 1. From the table it is clear that sometimes our method gives more precise estimates, but for other values of p_1, p_2 Wu's method yields more precise results. However, in all cases the relative error between our results and Wu's results is smaller than the absolute errors, often by an order of magnitude.

4.2 Vowel Recognition

Unlike consonants, vowels may be perceived non-categorically by listeners [3], making them a good testing ground for multi-class probabilistic estimates. We opted for the English language, because it has a large variety of vowels and because there are large corpora of annotated speech available. We worked with TIMIT, a phonetically segmented corpus of American English [4]. Our categories consisted of 15 monophthongs, as shown in Table 2. For
each of the categories we randomly chose their realizations from the set of male speakers in the corpus. Each realization was analyzed with a window 512 samples wide (at a 16 kHz sampling rate its length was 32 ms). If the center of the window was less than 256 samples away from the next phoneme, it was proportionally less likely to be selected into our dataset. We trained pairwise classifiers using linear discriminant analysis (LDA). The feature set was the log-periodogram, where the analysis window was weighted with a Hanning window before computing the FFT.

    p_1    p_2    ∆      ∆_Wu   ∆rel_Wu
    0.05   0.05   0.66   0.70   0.09
    0.10   0.10   0.57   0.61   0.09
    0.85   0.10   0.07   0.05   0.05
    0.85   0.05   0.07   0.05   0.05
    0.05   0.85   0.66   0.70   0.10
    0.10   0.85   0.58   0.61   0.06
    0.33   0.33   0.21   0.22   0.05

Table 1: Errors of estimation for various values of p_1 and p_2.

    vowel   sample word   sample word's transcription
    iy      beet          bcl b IY tcl t
    ih      bit           bcl b IH tcl t
    eh      bet           bcl b EH tcl t
    ae      bat           bcl b AE tcl t
    aa      bott          bcl b AA tcl t
    ah      but           bcl b AH tcl t
    ao      bought        bcl b AO tcl t
    uh      book          bcl b UH kcl k
    uw      boot          bcl b UW tcl t
    ux      toot          tcl t UX tcl t
    er      bird          bcl b ER dcl d
    ax      about         AX bcl b aw tcl t
    ix      debit         dcl d eh bcl b IX tcl t
    axr     butter        bcl b ah dx AXR
    ax-h    suspect       s AX-H s pcl p eh kcl k tcl t

Table 2: Sample words containing the 15 different monophthong sounds of American English as segmented in the TIMIT corpus.

    vowel   success rate   Wu's success rate   agreement
    iy      48 %           48 %                96.6 %
    ih      21 %           21 %                94.8 %
    eh      22 %           23 %                95.4 %
    ae      60 %           60 %                94.4 %
    aa      48 %           48 %                96.2 %
    ah      20 %           21 %                94.6 %
    ao      60 %           61 %                97.2 %
    uh      18 %           18 %                95.0 %
    uw      40 %           39 %                96.4 %
    ux      40 %           40 %                97.4 %
    er      34 %           35 %                95.6 %
    ax      31 %           31 %                96.4 %
    ix      16 %           18 %                94.4 %
    axr     48 %           46 %                96.2 %
    ax-h    81 %           81 %                98.8 %

Table 3: Evaluation of our and Wu's [2] methods on individual monophthongs from the test data of the TIMIT corpus. The first column indicates agreement between classification by our method and the TIMIT annotation, the second column the same statistic for the method of Wu et al., and the third column indicates how often our method and Wu's method agreed on the most likely class.

We performed comparison testing of our and Wu's method by selecting 500 random samples from the test subset. Per-phone results are shown in Table 3. The key statistic is that overall there was 96 % agreement between the most likely classifications by our method and Wu's method.

The overall success rate was slightly below 40 % for both our and Wu's method. Due to the limitations of the features (no F0, no vowel duration, no dynamic information, no multiframe data), suboptimal performance may be expected. For instance, without an intensity baseline it is nearly impossible to correctly distinguish some accented vowels.

We decided to do a more detailed case study. From the test subset we chose sentence SA1 spoken by speaker MREB0 and examined each monophthong at two points in time. The first was 5 milliseconds after the onset, and the other approximately near the vowel's center. The results are shown in Table 4.

The likelihoods of the most likely estimates of our and Wu's method are again quite close. There are two differences between onset and center predictions. The first one is the misprediction of /er/ at the beginning of the word 'greasy', which is quite understandable, since the vowel is preceded by /r/. To gain insight into the other mispredictions, as well as deeper insight into the dynamical behavior of the resulting multiclass classifier, we present time plots in Fig. 1. In Fig. 1a the misclassification of /iy/ instead of TIMIT's /ix/ in the word 'in' is shown. We speculate that the problem might be attributed to greater weight put on F2, which is relatively high and within the region for /iy/, compared to F1, which is quite high and definitely within the region for /ix/. In other words, the vowel might be a bit fronter than canonical /ix/. In Fig. 1b, the first vowel of 'greasy' is misclassified as /ux/ instead of TIMIT's /iy/. This problem might be attributed to coarticulation from the flanking consonants. The first vowel does have a lower F2, which is plausibly responsible for the /ux/ prediction, but it is preceded by /r/, which is commonly associated with lip protrusion, which lowers F2. In Fig. 1c, for the vowel of the word 'wash', we see that it is only in the beginning that the classifier gives more weight to /ao/, and then it increasingly agrees that the vowel is /aa/.
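The frame-level feature extraction described above (a 512-sample window at 16 kHz, Hanning weighting, then the log-periodogram via the FFT) might be sketched as follows; the small epsilon floor is our addition to guard against taking the logarithm of zero, and the function name is illustrative:

```python
import numpy as np

def log_periodogram(frame, eps=1e-12):
    """Log-periodogram features for one 512-sample frame
    (32 ms at a 16 kHz sampling rate), Hanning-weighted before the FFT."""
    windowed = frame * np.hanning(len(frame))
    spectrum = np.fft.rfft(windowed)          # one-sided spectrum
    return np.log(np.abs(spectrum) ** 2 + eps)
```

Each such feature vector would then be fed to the pairwise LDA classifiers.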
(a) 5 ms after the vowel's start:

    offset   TIMIT label   Wu's method   our method
    3831     iy            iy 80.1 %     iy 79.9 %
    6053     ae            ae 79.7 %     ae 79.6 %
    9187     axr           axr 62.2 %    axr 61.6 %
    11780    aa            aa 32.9 %     aa 32.6 %
    19677    ux            ux 60.3 %     ux 58.2 %
    25544    ix            iy 66.4 %     iy 64.9 %
    28905    iy            er 41.8 %     er 40.3 %
    31328    iy            iy 53.4 %     iy 53.3 %
    34210    aa            ao 76.3 %     ao 75.8 %
    39080    ao            aa 77.1 %     aa 76.9 %
    40680    er            axr 56.8 %    axr 56.3 %
    42512    ao            ao 87.2 %     ao 87.1 %
    46827    ih            iy 58.3 %     iy 57.9 %
    48248    axr           axr 52.1 %    axr 52.4 %

(b) near the center of the vowel:

    offset   TIMIT label   Wu's method   our method
    4200     iy            iy 83.2 %     iy 82.9 %
    6800     ae            ae 83.7 %     ae 83.6 %
    9600     axr           axr 50.6 %    axr 50.7 %
    12500    aa            aa 80.7 %     aa 79.5 %
    21000    ux            ux 67.2 %     ux 66.4 %
    25800    ix            iy 55.5 %     iy 53.3 %
    29000    iy            ux 23.5 %     ux 22.8 %
    31800    iy            iy 72.8 %     iy 72.7 %
    35000    aa            aa 57.6 %     aa 57.7 %
    39600    ao            aa 78.8 %     aa 78.5 %
    41500    er            axr 66.4 %    axr 66.4 %
    43500    ao            ao 86.3 %     ao 86.3 %
    47500    ih            ux 37.9 %     ux 37.0 %
    49000    axr           axr 71.1 %    axr 71.1 %

Table 4: Results of monophthong classification using spectral information in a 32 ms window centered at the offset indicated in the first column: (a) 5 ms after the vowel's start, (b) near the center of the vowel. Vowels were extracted from sentence SA1 spoken by speaker MREB0 from region 1 (New England). The most likely classes computed by Wu's method and by our method are shown together with their multi-class likelihoods.

In this particular case we conclude that our classification is closer to the phonetic realization than TIMIT's. The beginning of the vowel is influenced by the preceding /w/, with lip rounding similar to /ao/. The rest of the vowel sounds like an /aa/ to phonetically trained listeners, and the formant values correspond to this perception. Finally, Fig. 1d shows the preference for /aa/ as the first vowel of 'water' in our model over /ao/ in TIMIT's. Similarly to Fig. 1c, this vowel sounds closer to our model's label, and its formant values correspond better to our model than to TIMIT's. It should be noted, however, that /ao/ and /aa/ have merged in several American dialects, and more tokens would be needed for a more thorough analysis.

A common way to improve performance in automatic speech recognition is to tune the parameters of the system for a particular speaker. To that end we carried out one more experiment. We extracted formants for the TIMIT vowels spoken by speaker MREB0 using the package phonTools in R [5]. Next we performed pairwise LDA training as previously, but this time used the values of F1 and F2 as features rather than the log-periodogram. These first two formants are key perceptual features of vowels [6, 7, 8, 9]. Finally, we performed multiclass classification on the first vowel in the word 'water'. The formant contours for this vowel are shown in Fig. 2.

The somewhat surprising results are shown in Fig. 3. One would expect that the classifier would have little problem with classification of this vowel. As seen in Fig. 3, except for a brief start, the classifier overwhelmingly believes that the phoneme is much closer to /aa/ than to the TIMIT-annotated /ao/. However, compared to Fig. 1d, the likelihood of /aa/ is markedly smaller near the vowel's boundaries.

5 Conclusions

We have described a new method for combining probability estimates from pairwise classifiers. It is quite general and for its application needs only pairwise classifiers that provide posterior likelihoods. We believe that since the rationale for our method is probabilistically motivated, it has the potential to edge out other methods in practice. In particular, by its construction it avoids the problem of 'pairwise coupling' approaches pointed out by G. Hinton [1, pg. 467]. Another important feature is that the resulting probabilities are computed as the dominant eigenvector of a Markov matrix, allowing for efficient computation via iterations when the matrix of binary likelihoods varies slowly in time. Finally, since the method is not hierarchical, it avoids the compounding of errors common in hierarchical approaches.

In the presented synthetic and phonetic experiments its performance was very close to the method previously suggested by Wu et al. [2]. The classification of English vowels was suboptimal, but that may not be indicative of performance in real-world scenarios, for several reasons:

• We have used all TIMIT vowel categories, some of which are fused in previously published performance benchmark tests because they are extremely hard to discriminate.

• Other pairwise classifiers, for instance logistic regression or SVM, may yield better results.

• Based on the last experiment presented, we question whether the TIMIT annotation is consistent throughout the corpus, even for individual speakers.
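The iterative computation mentioned in the Conclusions can be sketched as a warm-started power iteration (our naming and parameter choices, not from the paper): because the Markov matrix is positive, iteration converges to the dominant eigenvector, and when the binary likelihoods vary slowly between frames, starting from the previous frame's estimate cuts the number of iterations.

```python
import numpy as np

def power_iteration(A, p0=None, tol=1e-10, max_iter=1000):
    """Dominant eigenvector (eigenvalue one) of a positive
    column-stochastic matrix A.  Passing the previous frame's estimate
    as p0 warm-starts the iteration when A drifts slowly over time."""
    k = A.shape[0]
    p = np.full(k, 1.0 / k) if p0 is None else np.asarray(p0, dtype=float)
    for _ in range(max_iter):
        q = A @ p
        q /= q.sum()                  # keep it a probability vector
        if np.max(np.abs(q - p)) < tol:
            return q
        p = q
    return p
```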
[Figure 1 appears here as four subfigures of time series plots; only the subfigure captions are recoverable from the extracted text:
(a) TIMIT annotation is /ix/ in the word 'in'. We considered an alternative classification that the vowel is /iy/.
(b) TIMIT annotation is /iy/ for the first vowel in the word 'greasy'. We considered an alternative classification that the vowel is /ux/.
(c) TIMIT annotation is /aa/ in the word 'wash'. We considered an alternative classification that the vowel is /ao/.
(d) TIMIT annotation is /ao/ for the first vowel in the word 'water'. We considered an alternative classification that the vowel is /aa/.]

Figure 1: Time series plots of multiclass and pairwise classification likelihoods for four vowels in sentence SA1 spoken by MREB0. The top plot in each subfigure shows multiclass likelihoods, and the bottom plot shows binary classification likelihoods r_ij. In the multiclass plots, the dashed dark curve indicates the likelihood of the alternative hypothesis and the dark dash-dotted curve that of the TIMIT annotation, both computed by our method (i.e. p̂_i). Solid curves in the multiclass plots indicate the corresponding, but visually nearly indistinguishable, estimates obtained via Wu's method. In the binary plots we plot the likelihoods of the alternative hypothesis against all other classes. The dotted curve in each binary plot indicates the likelihood of the alternative hypothesis compared to the TIMIT annotation.
[Figure 2 appears here as a scatter plot; the data points could not be recovered from the extracted text.]

Figure 2: Formant contours F1–F3 for the first vowel of the word 'water' in sentence SA1 spoken by MREB0.

[Figure 3 appears here as a time series plot.]

Figure 3: Time series plots of multiclass likelihoods for the first vowel in the word 'water' spoken in sentence SA1 by speaker MREB0. The dark dashed curve indicates the likelihood of /aa/, whereas the dot-dashed curve indicates the likelihood of /ao/. Solid curves, as in Fig. 1, indicate the estimates by Wu's method.

Further experiments with a complete ASR system may shed more light on the applicability of the proposed algorithm.

Acknowledgements

Our research was supported by the project University Science Park ITMS 26220220184 and grants APVV-0219-12, APVV-14-0560 and VEGA 2/0197/15. The authors are thankful to Paul Foulkes, K. Bachratá, and Martin Klimo for helpful discussions.

References

[1] Hastie, T., Tibshirani, R.: Classification by pairwise coupling. Annals of Statistics 26 (2) (1998), 451–471
[2] Wu, T.-F., Lin, C.-J., Weng, R.: Probability estimates for multi-class classification by pairwise coupling. Journal of Machine Learning Research 5 (2004), 975–1005
[3] Fry, D., Abramson, A., Eimas, P., Liberman, A.: The identification and discrimination of synthetic vowels. Language and Speech 5 (1962), 171–189
[4] Garofolo, J., Lamel, L., Fisher, W., Fiscus, J., Pallett, D., Dahlgren, N., Zue, V.: TIMIT acoustic-phonetic continuous speech corpus, [Online], 1993
[5] Barreda, S.: phonTools: functions for phonetics in R, R package version 0.2-2.0, 2014
[6] Potter, R., Steinberg, J.: Toward the specification of speech. J. Acoust. Soc. Amer. 22 (6) (1950), 807–820
[7] Peterson, G., Barney, H.: Control methods used in a study of vowels. J. Acoust. Soc. Amer. 24 (2) (1952), 175–184
[8] Turner, R., Patterson, R.: An analysis of the size information in classical formant data: Peterson and Barney (1952) revisited. J. Acoust. Soc. Jpn. 33 (2003)
[9] Kiefte, M., Nearey, T., Assmann, P.: Vowel perception in normal speakers. In: Handbook of vowels and vowel disorders, M. Ball and F. Gibbon (Eds.), Psychology Press, 2012, ch. 6, 160–185