<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">A New Method to Combine Probability Estimates from Pairwise Binary Classifiers</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Ondrej</forename><surname>Šuch</surname></persName>
							<email>ondrejs@savbb.sk</email>
							<affiliation key="aff0">
								<orgName type="institution" key="instit1">University of Žilina</orgName>
								<orgName type="institution" key="instit2">Slovak Academy of Sciences</orgName>
								<address>
									<country key="SK">Slovakia</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Štefan</forename><surname>Beňuš</surname></persName>
							<affiliation key="aff1">
								<orgName type="institution" key="instit1">Constantine the Philosopher University</orgName>
								<orgName type="institution" key="instit2">Slovak Academy of Sciences</orgName>
								<address>
									<country key="SK">Slovakia</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Andrea</forename><surname>Tinajová</surname></persName>
							<email>andrea.tinajova@gmail.com</email>
							<affiliation key="aff2">
								<orgName type="institution">Slovak Academy of Sciences</orgName>
							</affiliation>
						</author>
						<title level="a" type="main">A New Method to Combine Probability Estimates from Pairwise Binary Classifiers</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">2AB17352ED8B7E2D65C1B6681F24EA22</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T16:10+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>binary classifiers</term>
					<term>multiclass classification</term>
					<term>phoneme recognition</term>
					<term>English vowels</term>
					<term>TIMIT</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Estimating class membership probabilities is an important step in many automated speech recognition systems. Since binary classifiers are usually easier to train, one common approach to this problem is to construct pairwise binary classifiers. Pairwise models yield an overdetermined system of equations for the class membership probabilities. Motivated by probabilistic arguments, we propose a new way of estimating the individual class membership probabilities, which reduces to solving a linear system of equations. A solution of this system is obtained by finding the unique non-zero eigenvector of total probability one corresponding to the eigenvalue one of a positive Markov matrix. This is a property shared by another algorithm previously proposed by Wu, Lin, and Weng. We compare the properties of these methods in two settings: a theoretical three-way classification problem, and the classification of English monophthongs from the TIMIT corpus.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>A probabilistic approach underlies most current automatic speech recognition (ASR) systems, and very likely also human speech perception. In many ASR systems a common task is to estimate the probabilities that a given sample belongs to each of several classes, given the observed values of its features. These classes may represent various phonemes, diphones or other kinds of linguistic categories.</p><p>In machine learning it is easier to find the boundary between two classes than the boundary separating one class from many others <ref type="bibr" target="#b0">[1]</ref>. Moreover, many discriminative models are naturally suited to pairwise classification, such as logistic regression, LDA or variants of SVM. Thus, given k classes C i , one can readily construct k(k − 1)/2 pairwise discriminative models. Let us denote by M i j the model discriminating classes C i and C j . Suppose that M i j is able not only to discriminate, but also to compute the pairwise class membership probability r i j of an object X with features f:</p><formula xml:id="formula_0">r i j = r i j (X) = p(X ∈ C i | f, X ∈ C i or X ∈ C j ). (1)</formula><p>Given the knowledge of r i j (X), the task is then to estimate the multi-class probabilities p i , where</p><formula xml:id="formula_1">p i = p i (X) = p(X ∈ C i | f).<label>(2)</label></formula><p>Inspired by the Bradley-Terry model, Hastie and Tibshirani suggested <ref type="bibr" target="#b0">[1]</ref> requiring:</p><formula xml:id="formula_2">p i p i + p j = r i j<label>(3)</label></formula><formula xml:id="formula_3">∑ i p i = 1<label>(4)</label></formula><p>Note that there are 1 + k(k − 1)/2 equations for k unknowns, so the system of equations is over-determined for k ≥ 3 and it may not be possible to solve it exactly.</p><p>In the next section we review several approaches that have been suggested for finding an approximate solution of <ref type="formula" target="#formula_2">(3)</ref>. 
In Section 3 we propose a new method to combine pairwise estimates. In Section 4 we examine its performance on synthetic as well as real-world acoustic data. In the Conclusions we discuss the findings of our experiments.</p></div>
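The over-determination argument above can be checked numerically. A minimal sketch with illustrative probabilities (not data from the paper): when the pairwise values are consistent, r i j = p i /(p i + p j ), every equation of (3) holds exactly, while perturbing a single pairwise estimate leaves the 1 + k(k − 1)/2 equations in k unknowns without an exact common solution.

```python
import numpy as np

# Illustrative true class probabilities for k = 3 (not data from the paper)
p = np.array([0.5, 0.3, 0.2])
k = len(p)

# Consistent pairwise probabilities r_ij = p_i / (p_i + p_j), as in eq. (3)
r = np.zeros((k, k))
for i in range(k):
    for j in range(k):
        if i != j:
            r[i, j] = p[i] / (p[i] + p[j])

# With consistent r, every equation of (3) is satisfied exactly
residual = max(abs(p[i] / (p[i] + p[j]) - r[i, j])
               for i in range(k) for j in range(k) if i != j)

# Perturb a single pairwise estimate: the 1 + k(k-1)/2 equations in k
# unknowns no longer admit an exact common solution
r[1, 2] += 0.1
r[2, 1] = 1.0 - r[1, 2]
```

After the perturbation, the value of r 23 implied by any solution of the remaining equations no longer matches the stored one, which is exactly the inconsistency the rest of the paper addresses.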
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Existing Approaches</head><p>One natural requirement for an algorithm determining the probabilities p i is that if the system (3) has a solution, the algorithm should find it exactly.</p><p>Several approaches satisfying this requirement are outlined in the work of Wu, Lin, and Weng <ref type="bibr" target="#b1">[2]</ref>. They consider the following functionals:</p><formula xml:id="formula_4">δ HT : min p k ∑ i=1 [ k ∑ j: j≠i (r i j 1 k − 1 2 p i )] 2 ,<label>(5)</label></formula><formula xml:id="formula_5">δ 1 : min p k ∑ i=1 [ k ∑ j: j≠i (r i j p j − r ji p i )] 2 ,<label>(6)</label></formula><formula xml:id="formula_6">δ 2 : min p k ∑ i=1 k ∑ j: j≠i (r i j p j − r ji p i ) 2 ,<label>(7)</label></formula><formula xml:id="formula_7">δ V : min p k ∑ i=1 k ∑ j: j≠i (I {r i j &gt;r ji } p j − I {r ji &gt;r i j } p i ) 2 ,<label>(8) (9)</label></formula><p>where I is the indicator function. Each of the four functionals is nonnegative. When the system (3) does have a solution, each functional is zero at, and only at, the solution. One less satisfying feature of these approaches is that they lack a probabilistic motivation, unlike the method we propose in the next section.</p></div>
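The functionals δ 1 and δ 2 are straightforward to evaluate. A minimal sketch, assuming NumPy and an illustrative probability vector: both functionals vanish when the pairwise values r i j are consistent with some p, in line with the requirement stated above.

```python
import numpy as np

def delta_1(p, r):
    """delta_1 functional, eq. (6): residuals summed over j, then squared."""
    k = len(p)
    return sum(sum(r[i, j] * p[j] - r[j, i] * p[i]
                   for j in range(k) if j != i) ** 2
               for i in range(k))

def delta_2(p, r):
    """delta_2 functional, eq. (7): residuals squared before summing."""
    k = len(p)
    return sum((r[i, j] * p[j] - r[j, i] * p[i]) ** 2
               for i in range(k) for j in range(k) if j != i)

# Illustrative consistent setup: r derived from a known p
p = np.array([0.5, 0.3, 0.2])
r = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        if i != j:
            r[i, j] = p[i] / (p[i] + p[j])
print(delta_1(p, r), delta_2(p, r))  # both vanish (up to rounding)
```

Evaluating the same functionals at a different p (e.g. the uniform vector) gives strictly positive values, illustrating the "zero at, and only at, the solution" property.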
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">New Method</head><p>We will now describe our new algorithm. In general, one has 0 ≤ r i j ≤ 1. To avoid complications arising from degenerate cases we assume strict inequalities 0 &lt; r i j &lt; 1, which poses no difficulty in practical applications. Suppose for a moment that an object X belongs to the class C m . Then for judging its similarity to other classes one may restrict attention to the values r m j (and r jm = 1 − r m j ), since only the classifiers M m j were trained on values from the category C m . But for those k − 1 values the equations (3) can be solved exactly, as we will now show.</p><p>We have</p><formula xml:id="formula_8">∑ j≠m 1/r m j = ∑ j≠m (p m + p j )/p m = (k − 1) + (1 − p m )/p m .<label>(10)</label></formula><p>This relation allows us to compute an estimate p (m) m of p m explicitly as</p><formula xml:id="formula_10">p (m) m = [ ∑ j≠m 1/r m j − (k − 2) ] −1 ,<label>(11)</label></formula><p>where the upper index indicates that the estimate of p m is computed by taking into account only the values r m j . The remaining probabilities can then be computed by the following formula:</p><formula xml:id="formula_11">p (m) j = p (m) m • ( 1/r m j − 1 ).<label>(12)</label></formula><p>Now we repeat this argument for m = 1, 2, . . . , k. In general the estimates of p i thus obtained will be conflicting, i.e. in general p (m) j ≠ p (n) j for m ≠ n, because the given values r i j may not allow solving (3) consistently. We now take inspiration from the law of total probability, p(A) = ∑ i p(A|B i )p(B i ), where the B i form a partition of the probability space. We require that the estimates pi of p i satisfy the following linear system of equations:</p><formula xml:id="formula_13">p j = ∑ m p (m) j pm , for j = 1, . . . , k.<label>(13)</label></formula><p>These requirements can be interpreted as imposing self-consistency on the estimates pi . 
One readily checks that the matrix of the linear system (13) is a positive Markov matrix; thus (13) has a one-dimensional space of solutions. Imposing the additional condition</p><formula xml:id="formula_14">∑ m pm = 1<label>(14)</label></formula><p>determines a unique estimate pm of p m .</p></div>
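The whole method reduces to building the column-stochastic matrix of equations (11)-(13) and extracting its eigenvector for eigenvalue one. A minimal sketch (the function name and the use of numpy.linalg.eig are our choices for illustration, not prescribed by the paper):

```python
import numpy as np

def pairwise_to_multiclass(r):
    """Combine pairwise estimates r[i, j] = P(class i | class i or j),
    with r[j, i] = 1 - r[i, j], into multi-class probabilities via the
    eigenvector (eigenvalue 1) of the Markov matrix of eqs. (11)-(13)."""
    k = r.shape[0]
    M = np.zeros((k, k))
    for m in range(k):
        inv_sum = sum(1.0 / r[m, j] for j in range(k) if j != m)
        M[m, m] = 1.0 / (inv_sum - (k - 2))            # eq. (11)
        for j in range(k):
            if j != m:
                M[j, m] = M[m, m] * (1.0 / r[m, j] - 1.0)  # eq. (12)
    # Columns of M sum to one, so eigenvalue 1 exists; take its eigenvector
    w, v = np.linalg.eig(M)
    p = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    return p / p.sum()                                  # eq. (14)

# Illustrative check: consistent pairwise values recover p exactly
p_true = np.array([0.5, 0.3, 0.2])
k = 3
r = np.zeros((k, k))
for i in range(k):
    for j in range(k):
        if i != j:
            r[i, j] = p_true[i] / (p_true[i] + p_true[j])
p_hat = pairwise_to_multiclass(r)
print(p_hat)  # close to [0.5, 0.3, 0.2]
```

When the r i j are consistent, every column of the matrix equals the true probability vector, so the fixed point is recovered exactly, which is the first property verified in Section 4.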
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Evaluation of the New Method</head><p>First note that our algorithm yields the correct solution whenever the system (3) has a solution. To see this, one first checks using (10) and (11) that p (m) m = p m and p (m) j = p j . It follows that the vector p j satisfies equations ( <ref type="formula" target="#formula_13">13</ref>) and ( <ref type="formula">14</ref>). Since the solution of (13) and ( <ref type="formula">14</ref>) is unique, the method yields the correct solution. However, this is an ideal, very special situation that will generally not hold for k ≥ 3.</p><p>We have opted to compare the proposed method with the method of Wu, Lin, and Weng <ref type="bibr" target="#b1">[2]</ref> that minimizes the functional δ 1 <ref type="formula" target="#formula_5">(6)</ref>. The reason is that that method also involves the construction of a positive Markov matrix, whose eigenvector of eigenvalue one is their estimate of p m . We conduct two experiments: one an artificial three-way classification problem, and the other a vowel recognition task.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Three-Way Classification</head><p>The system of equations (3) becomes over-determined already for k = 3. If one of the classifiers is unreliable, the system (3) will not have a solution. In this section we present the results of a synthetic experiment for three-way classification.</p><p>In our experiment we assume that only the classifier M 23 is unreliable. In other words, we assume that the classifiers M 12 and M 13 , discriminating respectively the categories C 1 versus C 2 and C 1 versus C 3 , yield precise estimates of r 12 and r 13 . For fixed values p 1 , p 2 we thus set r 12 = p 1 /(p 1 + p 2 ) and r 13 = p 1 /(p 1 + p 3 ) = p 1 /(1 − p 2 ). Let pm and p Wu m denote our and Wu's estimates of p m . As r 23 varies in the interval (0, 1), define the absolute errors</p><formula xml:id="formula_15">∆ = sup i,r 23 | pi − p i |,<label>(15)</label></formula><formula xml:id="formula_16">∆ Wu = sup i,r 23 |p Wu i − p i |,<label>(16)</label></formula><p>and the relative error</p><formula xml:id="formula_17">∆ rel Wu = sup i,r 23 |p Wu i − pi |.<label>(17)</label></formula><p>The results of our experiment are shown in Table <ref type="table" target="#tab_0">1</ref>. From the table it is clear that for some values of p 1 , p 2 our method gives more precise estimates, while for other values Wu's method yields more precise results. However, in all cases the relative error between our results and Wu's results is smaller than the absolute errors, often by an order of magnitude.</p></div>
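The synthetic setup of this section can be replayed in a few lines. The sketch below re-implements the proposed estimator compactly and sweeps r 23 over a grid; the grid resolution and the choice (p 1 , p 2 ) = (0.33, 0.33) are illustrative, so the resulting ∆ is only an analogue of equation (15) and need not match Table 1 exactly.

```python
import numpy as np

def new_method(r):
    """Proposed estimate: eigenvector (eigenvalue 1) of the column-stochastic
    matrix built from eqs. (11)-(13)."""
    k = r.shape[0]
    M = np.zeros((k, k))
    for m in range(k):
        s = sum(1.0 / r[m, j] for j in range(k) if j != m)
        M[m, m] = 1.0 / (s - (k - 2))
        for j in range(k):
            if j != m:
                M[j, m] = M[m, m] * (1.0 / r[m, j] - 1.0)
    w, v = np.linalg.eig(M)
    p = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    return p / p.sum()

def make_r(p_true, r23):
    """M12 and M13 are exact; only r23 is free, as in Section 4.1."""
    r = np.zeros((3, 3))
    r[0, 1] = p_true[0] / (p_true[0] + p_true[1])
    r[0, 2] = p_true[0] / (p_true[0] + p_true[2])
    r[1, 2] = r23
    r[1, 0], r[2, 0], r[2, 1] = 1 - r[0, 1], 1 - r[0, 2], 1 - r[1, 2]
    return r

p_true = np.array([0.33, 0.33, 0.34])
errors = [np.max(np.abs(new_method(make_r(p_true, r23)) - p_true))
          for r23 in np.linspace(0.01, 0.99, 99)]
delta = max(errors)   # analogue of eq. (15) on this grid
```

When r 23 is set to its consistent value p 2 /(p 2 + p 3 ), the error collapses to zero, while the sweep over the full interval produces the large worst-case errors reported in Table 1.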
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Vowel Recognition</head><p>Unlike consonants, vowels may be perceived non-categorically by listeners <ref type="bibr" target="#b2">[3]</ref>, making them a good testing ground for multi-class probabilistic estimates. We opted for English because it has a large variety of vowels and because large corpora of annotated speech are available. We worked with TIMIT, a phonetically segmented corpus of American English <ref type="bibr" target="#b3">[4]</ref>. Our categories consisted of the 15 monophthongs shown in Table <ref type="table" target="#tab_1">2</ref>. For each of the categories we randomly chose realizations from the set of male speakers in the corpus. Each realization was analyzed with a window 512 samples wide (at a 16 kHz sampling rate its length is 32 ms). If the center of the window was less than 256 samples away from the next phoneme, it was proportionally less likely to be selected into our dataset. We trained pairwise classifiers using linear discriminant analysis (LDA). The feature set was the log-periodogram, where the analysis window was weighted with a Hanning window before computing the FFT.</p><p>We performed comparison testing of our and Wu's methods by selecting 500 random samples from the test subset. Per-phone results are shown in Table <ref type="table" target="#tab_2">3</ref>. The key statistic is that overall there was 96% agreement between the most-likely classifications produced by our method and by Wu's method. The overall success rate was slightly below 40% for both methods. Given the limitations of the features (no F0, no vowel duration, no dynamic information, no multiframe data), suboptimal performance is to be expected. For instance, without an intensity baseline it is nearly impossible to correctly distinguish some accented vowels. We decided to do a more detailed case study. 
From the test subset we chose sentence SA1 spoken by speaker MREB0 and examined each monophthong at two points in time. The first was 5 milliseconds after the onset, and the other approximately at the vowel's center. The results are shown in Table <ref type="table" target="#tab_4">4</ref>.</p><p>Likelihoods of the most likely estimates of our and Wu's methods are again quite close. There are two differences between the onset and center predictions. The first is the misprediction of /er/ at the beginning of the word 'greasy', which is quite understandable, since the vowel is preceded by /r/. To gain insight into the other mispredictions, as well as a deeper insight into the dynamical behavior of the resulting multiclass classifier, we present time plots in Fig. <ref type="figure">1</ref>. In Fig. <ref type="figure">1a</ref> the mis-classification of /iy/ instead of TIMIT's /ix/ in the word 'in' is shown. We speculate that the problem might be attributed to the greater weight put on F2, which is relatively high and within the region for /iy/, compared to F1, which is quite high and definitely within the region for /ix/. In other words, the vowel might be a bit fronter than a canonical /ix/. In Fig. <ref type="figure">1b</ref>, the first vowel of 'greasy' is mis-classified as /ux/ instead of TIMIT's /iy/. This problem might be attributed to coarticulation from the flanking consonants. The first vowel does have a lower F2, which is plausibly responsible for the /ux/ prediction, but it is preceded by /r/, which is commonly associated with lip protrusion, which lowers F2. In Fig. <ref type="figure">1c</ref>, for the vowel of the word 'wash', we see that it is only at the beginning of the vowel that the classifier gives more weight to /ao/; it then increasingly agrees that the vowel is /aa/.</p><p>In this particular case, we conclude that our classification is closer to the phonetic realization than TIMIT's. 
The beginning of the vowel is influenced by the preceding /w/, with lip rounding similar to /ao/. The rest of the vowel sounds like an /aa/ to phonetically trained listeners, and the formant values correspond to this perception. Finally, Fig. <ref type="figure">1d</ref> shows the preference for /aa/ as the first vowel of 'water' in our model over /ao/ in TIMIT's. Similarly to Fig. <ref type="figure">1c</ref>, this vowel sounds more like /aa/, and its formant values correspond to our model more than to TIMIT's. It should be noted, however, that /ao/ and /aa/ have merged in several American dialects, and more tokens would be needed for a more thorough analysis.</p><p>A common way to improve performance in automatic speech recognition is to tune the parameters of the system for a particular speaker. To that end we carried out one more experiment. We extracted formants for the TIMIT vowels spoken by speaker MREB0 using the package phonTools in R <ref type="bibr" target="#b4">[5]</ref>. Next we performed pairwise LDA training as before, but this time used the values of F1 and F2 as features rather than the log-periodogram. These first two formants are key perceptual features of vowels <ref type="bibr" target="#b5">[6,</ref><ref type="bibr" target="#b6">7,</ref><ref type="bibr" target="#b7">8,</ref><ref type="bibr" target="#b8">9]</ref>. Finally, we performed multiclass classification on the first vowel in the word 'water'. The formant contours for this vowel are shown in Fig. <ref type="figure">2</ref>.</p><p>The somewhat surprising results are shown in Fig. <ref type="figure" target="#fig_0">3</ref>. One would expect that the classifier would have little trouble classifying this vowel. As seen in Fig. <ref type="figure" target="#fig_0">3</ref>, except for a brief start, the classifier overwhelmingly believes that the phoneme is much closer to /aa/ than to the TIMIT-annotated /ao/. However, compared to Fig. 
<ref type="figure">1d</ref> the likelihood of /aa/ is markedly smaller near the vowel's boundaries.</p></div>
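The feature extraction described in this section (512-sample Hanning-weighted window, 32 ms at 16 kHz, log-periodogram via FFT) can be sketched as follows. The random signal is a stand-in for actual TIMIT audio, and the small floor added before the logarithm is our numerical safeguard, not part of the paper's pipeline.

```python
import numpy as np

def log_periodogram(signal, center, width=512):
    """Log-periodogram of a Hanning-weighted frame of `width` samples
    centered at sample index `center` (32 ms at a 16 kHz sampling rate)."""
    frame = signal[center - width // 2 : center + width // 2]
    windowed = frame * np.hanning(width)
    spectrum = np.fft.rfft(windowed)
    # small floor avoids log(0) on silent frames (our addition)
    return np.log(np.abs(spectrum) ** 2 + 1e-12)

# Stand-in waveform: 1 second of noise at 16 kHz instead of TIMIT audio
rng = np.random.default_rng(0)
signal = rng.standard_normal(16000)
features = log_periodogram(signal, center=8000)
print(features.shape)  # width // 2 + 1 rfft bins
```

Each frame thus yields 257 spectral features, which serve as the input to the pairwise LDA classifiers.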
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Conclusions</head><p>We have described a new method for combining probability estimates from pairwise classifiers. It is quite general, requiring only pairwise classifiers that provide posterior likelihoods. We believe that, since the rationale for our method is probabilistically motivated, it has the potential to edge out other methods in practice. In particular, by its construction it avoids the problem of 'pairwise coupling' approaches pointed out by G. Hinton <ref type="bibr">[1, pg. 467]</ref>. Another important feature is that the resulting probabilities are computed as the dominant eigenvector of a Markov matrix, allowing for efficient computation via iterations when the matrix of binary likelihoods varies slowly in time. Finally, since the method is not hierarchical, it avoids the compounding of errors common in hierarchical approaches.</p><p>In the synthetic and phonetic experiments presented, its performance was very close to the method previously suggested by Wu <ref type="bibr" target="#b1">[2]</ref>. The classification of English vowels was suboptimal, but that may not be indicative of performance in real-world scenarios, for several reasons.</p><p>• We used all TIMIT vowel categories, some of which are fused in previously published performance benchmark tests because they are extremely hard to discriminate.</p><p>• Other pairwise classifiers, for instance logistic regression or SVM, may yield better results.</p><p>• Based on the last experiment presented, we question whether the TIMIT annotation is consistent throughout the corpus, even for individual speakers.</p><p>(d) TIMIT annotation is /ao/ for the first vowel in the word 'water'. We considered the alternative classification that the vowel is /aa/.</p><p>Figure <ref type="figure">1</ref>: Time series plots of multiclass and pairwise classification likelihoods for four vowels in sentence SA1 spoken by MREB0. 
The top plot in each subfigure shows multiclass likelihoods, and the bottom plot shows binary classification likelihoods r i j . In the multiclass plots, the dashed dark curve indicates the likelihood of the alternative hypothesis and the dark dash-dotted curve that of the TIMIT annotation, computed by our method (i.e. pi ). Solid curves in the multiclass plots indicate the corresponding, but visually nearly indistinguishable, estimates obtained via Wu's method. In the binary plots we plot likelihoods of the alternative hypothesis against all other classes. The dotted curve in each binary plot indicates the likelihood of the alternative hypothesis compared to the TIMIT annotation.</p><p>Figure 2: Formant contours F1-F3 for the first vowel of 
word 'water' in sentence SA1 spoken by MREB0. Further experiments with a complete ASR system may shed more light on the applicability of the proposed algorithm.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Time series plots of multiclass likelihoods for the first vowel in the word 'water' spoken in sentence SA1 by speaker MREB0. Dark dashed curve indicates likelihood of /aa/, whereas dot-dashed curve indicates likelihood of /ao/. Solid curves, as in Fig. 1, indicate estimate by Wu's method.</figDesc></figure>
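As noted in the Conclusions, the estimate is the dominant eigenvector of a Markov matrix, which permits cheap iterative updating when the matrix of binary likelihoods varies slowly in time. A sketch of such a warm-started power iteration; the randomly generated column-stochastic matrix below is only a stand-in for the matrix of equation (13).

```python
import numpy as np

def power_step(M, p, iters=1):
    """Refine an estimate of the dominant eigenvector of the column-stochastic
    matrix M by repeated multiplication (power iteration), renormalizing so
    the entries keep summing to one."""
    for _ in range(iters):
        p = M @ p
        p = p / p.sum()
    return p

# Stand-in for the 15x15 matrix of eq. (13): random positive entries,
# columns normalized to sum to one
rng = np.random.default_rng(1)
A = rng.random((15, 15)) + 0.1
M = A / A.sum(axis=0)

p = np.full(15, 1.0 / 15)        # warm start, e.g. previous frame's estimate
p = power_step(M, p, iters=100)  # converges to the eigenvalue-1 eigenvector
```

In a streaming setting one would run only a few iterations per frame, reusing the previous frame's probabilities as the starting vector instead of the uniform one used here.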
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 :</head><label>1</label><figDesc>Errors of estimation for various values of p 1 and p 2</figDesc><table><row><cell>p 1</cell><cell>p 2</cell><cell>∆</cell><cell>∆ Wu</cell><cell>∆ rel Wu</cell></row><row><cell>0.05</cell><cell>0.05</cell><cell>0.66</cell><cell>0.70</cell><cell>0.09</cell></row><row><cell>0.1</cell><cell>0.1</cell><cell>0.57</cell><cell>0.61</cell><cell>0.09</cell></row><row><cell>0.85</cell><cell>0.1</cell><cell>0.07</cell><cell>0.05</cell><cell>0.05</cell></row><row><cell>0.85</cell><cell>0.05</cell><cell>0.07</cell><cell>0.05</cell><cell>0.05</cell></row><row><cell>0.05</cell><cell>0.85</cell><cell>0.66</cell><cell>0.70</cell><cell>0.1</cell></row><row><cell>0.1</cell><cell>0.85</cell><cell>0.58</cell><cell>0.61</cell><cell>0.06</cell></row><row><cell>0.33</cell><cell>0.33</cell><cell>0.21</cell><cell>0.22</cell><cell>0.05</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 :</head><label>2</label><figDesc>Sample words containing 15 different monophthong sounds of American English as segmented in TIMIT corpus</figDesc><table><row><cell>vowel</cell><cell>sample word</cell><cell>word's transcription</cell></row><row><cell>iy</cell><cell>beet</cell><cell>bcl b IY tcl t</cell></row><row><cell>ih</cell><cell>bit</cell><cell>bcl b IH tcl t</cell></row><row><cell>eh</cell><cell>bet</cell><cell>bcl b EH tcl t</cell></row><row><cell>ae</cell><cell>bat</cell><cell>bcl b AE tcl t</cell></row><row><cell>aa</cell><cell>bott</cell><cell>bcl b AA tcl t</cell></row><row><cell>ah</cell><cell>but</cell><cell>bcl b AH tcl t</cell></row><row><cell>ao</cell><cell>bought</cell><cell>bcl b AO tcl t</cell></row><row><cell>uh</cell><cell>book</cell><cell>bcl b UH kcl k</cell></row><row><cell>uw</cell><cell>boot</cell><cell>bcl b UW tcl t</cell></row><row><cell>ux</cell><cell>toot</cell><cell>tcl t UX tcl t</cell></row><row><cell>er</cell><cell>bird</cell><cell>bcl b ER dcl d</cell></row><row><cell>ax</cell><cell>about</cell><cell>AX bcl b aw tcl t</cell></row><row><cell>ix</cell><cell>debit</cell><cell>dcl d eh bcl b IX tcl t</cell></row><row><cell>axr</cell><cell>butter</cell><cell>bcl b ah dx AXR</cell></row><row><cell>ax-h</cell><cell>suspect</cell><cell>s AX-H s pcl p eh kcl k tcl t</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3 :</head><label>3</label><figDesc>Evaluation of our and Wu's<ref type="bibr" target="#b1">[2]</ref> methods on individual monophthongs from the test data from TIMIT corpus. The first column indicates agreement between classification by our method and TIMIT annotation, the second column the statistics for the method of Wu et al, and the third column indicates how often our method and Wu's method agreed on the most-likely classified class.</figDesc><table><row><cell>vowel</cell><cell>success rate</cell><cell>Wu's success rate</cell><cell>agreement</cell></row><row><cell>iy</cell><cell>48 %</cell><cell>48 %</cell><cell>96.6 %</cell></row><row><cell>ih</cell><cell>21 %</cell><cell>21 %</cell><cell>94.8 %</cell></row><row><cell>eh</cell><cell>22 %</cell><cell>23 %</cell><cell>95.4 %</cell></row><row><cell>ae</cell><cell>60 %</cell><cell>60 %</cell><cell>94.4 %</cell></row><row><cell>aa</cell><cell>48 %</cell><cell>48 %</cell><cell>96.2 %</cell></row><row><cell>ah</cell><cell>20 %</cell><cell>21 %</cell><cell>94.6 %</cell></row><row><cell>ao</cell><cell>60 %</cell><cell>61 %</cell><cell>97.2 %</cell></row><row><cell>uh</cell><cell>18 %</cell><cell>18 %</cell><cell>95 %</cell></row><row><cell>uw</cell><cell>40 %</cell><cell>39 %</cell><cell>96.4 %</cell></row><row><cell>ux</cell><cell>40 %</cell><cell>40 %</cell><cell>97.4 %</cell></row><row><cell>er</cell><cell>34 %</cell><cell>35 %</cell><cell>95.6 %</cell></row><row><cell>ax</cell><cell>31 %</cell><cell>31 %</cell><cell>96.4 %</cell></row><row><cell>ix</cell><cell>16 %</cell><cell>18 %</cell><cell>94.4 %</cell></row><row><cell>axr</cell><cell>48 %</cell><cell>46 %</cell><cell>96.2 %</cell></row><row><cell>ax-h</cell><cell>81 %</cell><cell>81 %</cell><cell>98.8 %</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 4 :</head><label>4</label><figDesc>Results of monophthong classification using spectral information in 32ms window centered at the offset indicated in the first column. Vowels were extracted from sentence SA1 spoken by speaker MREB0 from region 1 (New England). Most likely classes are shown computed by Wu's method and our method together with multi-class likelihoods.</figDesc><table /></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_0">J. Yaghob (Ed.): ITAT 2015 pp. 194-199 Charles University in Prague, Prague, 2015</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgements</head><p>Our research was supported by the project University Science Park ITMS 26220220184 and grants APVV-0219-12, APVV-14-0560 and VEGA 2/0197/15. The authors are thankful to Paul Foulkes, K. Bachratá, and Martin Klimo for helpful discussion.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Classification by pairwise coupling</title>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">H</forename><surname>Hastie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Tibshirani</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Annals of Statistics</title>
		<imprint>
			<biblScope unit="volume">26</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="451" to="471" />
			<date type="published" when="1998">1998</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Probability estimates for multi-class classification by pairwise coupling</title>
		<author>
		<persName><forename type="first">T.-F.</forename><surname>Wu</surname></persName>
		</author>
		<author>
		<persName><forename type="first">C.-J.</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Weng</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="volume">5</biblScope>
			<biblScope unit="page" from="975" to="1005" />
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">The identification and discrimination of synthetic vowels</title>
		<author>
			<persName><forename type="first">D</forename><surname>Fry</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Abramson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Eimas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Liberman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Language and Speech</title>
		<imprint>
			<biblScope unit="volume">5</biblScope>
			<biblScope unit="page" from="171" to="189" />
			<date type="published" when="1962">1962</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Garofolo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Lamel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Fisher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Fiscus</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Pallett</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Dahlgren</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Zue</surname></persName>
		</author>
		<title level="m">TIMIT acoustic-phonetic continuous speech corpus</title>
				<imprint>
			<date type="published" when="1993">1993</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
	<title level="m" type="main">phonTools: functions for phonetics in R. R package version 0.2-2</title>
	<author>
		<persName><forename type="first">S</forename><surname>Barreda</surname></persName>
	</author>
	<imprint>
		<date type="published" when="2014">2014</date>
	</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Toward the specification of speech</title>
		<author>
			<persName><forename type="first">R</forename><surname>Potter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Steinberg</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">J. Acoust. Soc. Amer</title>
		<imprint>
			<biblScope unit="volume">22</biblScope>
			<biblScope unit="issue">6</biblScope>
			<biblScope unit="page" from="807" to="820" />
			<date type="published" when="1950">1950</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Control methods used in a study of vowels</title>
		<author>
			<persName><forename type="first">G</forename><surname>Peterson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Barney</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">J. Acoust. Soc. Amer</title>
		<imprint>
			<biblScope unit="volume">24</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="175" to="184" />
			<date type="published" when="1952">1952</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">An analysis of the size information in classical formant data: Peterson and Barney (1952) revisited</title>
		<author>
			<persName><forename type="first">R</forename><surname>Turner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Patterson</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">J. Acoust. Soc. Jpn</title>
		<imprint>
			<biblScope unit="volume">33</biblScope>
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Vowel perception in normal speakers</title>
		<author>
			<persName><forename type="first">M</forename><surname>Kiefte</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Nearey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Assmann</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Handbook of vowels and vowel disorders</title>
				<editor>
			<persName><forename type="first">M</forename><surname>Ball</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">F</forename><surname>Gibbon</surname></persName>
		</editor>
		<imprint>
			<publisher>Psychology Press</publisher>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="160" to="185" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
