<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Journal of Comput</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.1098/rspa</article-id>
      <title-group>
        <article-title>CNN based Parkinson's Disease Assessment using Empirical Mode Decomposition</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ayush Tripathi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sunil Kumar Kopparapu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>TCS Research &amp; Innovation - Mumbai, Tata Consultancy Services Limited</institution>
          ,
          <addr-line>Maharashtra</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2016</year>
      </pub-date>
      <volume>4</volume>
      <fpage>3063</fpage>
      <lpage>3067</lpage>
      <abstract>
        <p>Parkinson's Disease (PD) is a neuro-degenerative disorder which is caused by a decrease in dopamine producing neurons in the human body and affects the body's motor system. In addition to affecting several motor and non-motor activities of a person's day to day life, PD patients have difficulty in speech production due to reduced coordination of the muscles that control breathing, phonation, articulation and prosody. Analyzing speech allows clinicians to objectively measure the severity of PD in a non-invasive way. In this work, we propose an effective method to discriminate between PD and healthy control (HC) subjects using Empirical Mode Decomposition (EMD), a technique that decomposes a speech signal into simpler Intrinsic Mode Functions (IMFs). We train a Convolutional Neural Network (CNN) to learn significant properties from raw IMFs for the purpose of PD-HC classification. We evaluate our technique on sustained phonation speech from the Italian Parkinson's Voice and Speech database. Experimental results show that significant characteristics of Parkinsonian dysarthria can be learnt from the raw IMFs, and that the need for explicitly extracting handcrafted features could be mitigated.</p>
      </abstract>
      <kwd-group>
        <kwd>Parkinson's speech</kwd>
        <kwd>Empirical Mode Decomposition</kwd>
        <kwd>Intrinsic Mode Function</kwd>
        <kwd>sustained phonation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Parkinson’s Disease (PD) is a neuro-degenerative disorder which is caused by a decrease in dopamine producing neurons in the human body and affects the body’s motor system [<xref ref-type="bibr" rid="ref1">1</xref>]. PD affects 1-2 per 1000 of the population at any time. The prevalence of PD increases with age and it affects roughly 1% of the population above 60 years [<xref ref-type="bibr" rid="ref2">2</xref>]. Normal respiratory and well controlled articulatory movements are fundamental for producing well-coordinated normal speech. The common signs and symptoms of PD such as tremor, bradykinesia, rigid muscles and akinesia hamper the ability of an individual to precisely control the speech producing organs, which leads to disordered speech. This manifests in PD patients in the form of soft voice, monotone, breathiness, hoarse voice quality, imprecise articulation and a decrease in naturalness while speaking [<xref ref-type="bibr" rid="ref3">3</xref>].</p>
      <p>In the absence of any specific laboratory test or instruments to measure or monitor the evolution and treatment response of PD, it is extremely crucial to track the motor functions such as gait freezing and speech analysis to examine the disease. Importantly, the signs of PD are often confused with those of natural aging, making the diagnosis even more challenging. Clinicians widely use the Unified Parkinson’s Disease Rating Scale (UPDRS) [<xref ref-type="bibr" rid="ref4">4</xref>] for evaluation of PD. The evaluation is carried out through face to face interviews and clinical observations using a set of questions to evaluate: (a) non-motor experiences of daily living, (b) motor experiences of daily living, (c) motor examination, and (d) motor complications.</p>
      <p>Naturally spoken speech can be analyzed in a non-invasive manner and hence the study of changes in acoustic properties of speech is a center-point of research for the measurement of symptomatic changes in PD [<xref ref-type="bibr" rid="ref5">5</xref>]. Articulation, voice intensity, frequency spectrum, and speech intelligibility are the main acoustic parameters observed for tracking changes in speech. It was observed [<xref ref-type="bibr" rid="ref6">6</xref>] that PD patients suffer from reduction in the range of articulatory movement which in turn leads to impaired vowel articulation. The production of vowels is a complicated process that involves precise control over the movements of the tongue, lips and jaw, creating oropharyngeal resonating cavities, which amplify certain frequency bands of the voice spectrum called formants. The possibility of using sustained phonation /a/ for discriminating PD from healthy subjects was first proposed in [<xref ref-type="bibr" rid="ref7">7</xref>].</p>
      <p>A set of 13 features describing different aspects of Parkinsonian speech for the task was suggested in [<xref ref-type="bibr" rid="ref8">8</xref>]. Phonation and rhythm features [<xref ref-type="bibr" rid="ref9">9</xref>] and other vowel features [<xref ref-type="bibr" rid="ref10">10</xref>] to capture characteristics of PD dysarthria have been proposed in literature. An extensive feature analysis followed by a 2 stage feature selection to represent physiological aspects of PD obtained from sustained vowel /a/ and the DDK (diadochokinetic) task was proposed in [<xref ref-type="bibr" rid="ref11">11</xref>]. A set of frame-level features was used to construct a Fisher Vector representation of the speech sample along with a Support Vector Machine classifier in [12]. Another approach [13] sought to identify the most relevant features for characterizing the disorders in speech of PD patients. Voxtester [14] is a system for assessing PD related impairment by using a wide set of parameters including: voice spectrum, formants, DDK rate, voice intensity and vocal sound pressure level.</p>
      <p>With the advent of machine learning in all spheres of processing, the trend has been to extract more and more features using signal processing in order to discriminate PD and HC subjects. In this paper, we propose a method to classify PD and HC by decomposing the speech utterance using the Empirical Mode Decomposition (EMD) technique. EMD is the process of decomposing non-stationary time series into simpler Intrinsic Mode Functions (IMF) in the time domain. This technique has had various applications in the speech domain such as enhancement, denoising [15], formant tracking [16], pathological voice analysis [17], emotion recognition [18], glottal activity detection [19] etc. In these studies, the emphasis has been on extracting temporal and spectral features using the IMFs which are then used for classification tasks. However, to the best of our knowledge, employing raw IMFs for classification of pathological speech has not been studied. The main contribution of this paper lies in using a Convolutional Neural Network architecture to learn these features from raw IMFs without the need of explicitly extracting handcrafted features for the purpose of PD-HC classification. The approach is validated on the Italian Parkinson’s Voice and Speech database.</p>
      <p>The rest of the paper is organized as follows: Section 2 describes the database used for the experiments; we provide the description of the proposed approach in Section 3 while Section 4 details achieved results. We discuss the salient aspects of the proposed approach while also providing an analogy to the traditional feature extraction based methods in Section 5 and conclude in Section 6.</p>
      <p>Proceedings of the CIKM 2020 Workshops, October 19-20, Galway, Ireland. Editors of the Proceedings: Stefan Conrad, Ilaria Tiddi. Email: t.ayush@tcs.com (A. Tripathi); sunilkumar.kopparapu@tcs.com (S.K. Kopparapu). URL: https://www.tcs.com (S.K. Kopparapu). ORCID: 0000-0002-7944-2260 (A. Tripathi); 0000-0002-0502-527X (S.K. Kopparapu). © 2020 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073.</p>
    </sec>
    <sec>
      <title>2. Dataset</title>
      <p>The Italian Parkinson’s Voice and Speech database [20] consists of recordings from 28 (19 Male, 9 Female) speakers with Parkinson’s Disease aged between 40 and 80 years and 22 (10 Male, 12 Female) healthy controls (HC) aged between 60 and 77 years. The utterances have been recorded in a warm, echo free and quiet room at a sampling frequency of 16 kHz by keeping the microphone at a distance of 15 to 25 centimeters from the subject. The speech intelligibility of the patients was perceptually assessed on a 5-point scale based on the UPDRS protocol. The following reading tasks were performed by the subjects:</p>
      <list list-type="bullet">
        <list-item><p>2 phonations each of the vowel /a/, /e/, /i/, /o/, /u/</p></list-item>
        <list-item><p>execution of syllable /pa/ and /ka/ (5 sec)</p></list-item>
        <list-item><p>2 readings of a phonetically balanced text</p></list-item>
        <list-item><p>reading of phonetically balanced words and phrases</p></list-item>
      </list>
      <p>In our study, we use a subset of this dataset, namely the sustained phonations (/a/, /e/, /i/, /o/, /u/). Depending on the severity of the condition and the speaker, the amount of time a subject can sustain a phonation is different and consequently the lengths (in seconds) of the audio recordings are unequal. As will be discussed in Section 3, we segment the unequal length speech samples into non-overlapping segments (utterances) of 1 second duration each. In all there were 1957 utterances from PD and 1445 utterances from HC (see Table 1); this forms the data in all our experiments on the phonation data for PD-HC classification. For complete information on the recording protocol, the subjects and the tasks, please refer to [21].</p>
      <table-wrap id="tab1">
        <label>Table 1</label>
        <caption>
          <p>Number of 1 second utterances for PD and HC categories in the dataset.</p>
        </caption>
        <table>
          <thead>
            <tr><th>Phonation</th><th>PD</th><th>HC</th></tr>
          </thead>
          <tbody>
            <tr><td>/a/</td><td>390</td><td>269</td></tr>
            <tr><td>/e/</td><td>385</td><td>297</td></tr>
            <tr><td>/i/</td><td>403</td><td>290</td></tr>
            <tr><td>/o/</td><td>400</td><td>284</td></tr>
            <tr><td>/u/</td><td>379</td><td>305</td></tr>
            <tr><td>Total</td><td>1957</td><td>1445</td></tr>
          </tbody>
        </table>
      </table-wrap>
    </sec>
    <sec id="sec-2">
      <title>3. Proposed Approach</title>
      <p>The proposed PD diagnosis system consists of two major parts. First, the raw speech utterance of 1 second duration is decomposed into its Intrinsic Mode Functions (IMFs) by using the Empirical Mode Decomposition (EMD) technique. A 1D-CNN model is then trained using the raw IMFs as input for classifying the speech utterance into one of the two categories, namely, HC or PD. We now describe the signal decomposition process and the architecture of the 1D-CNN model used in our experiments.</p>
      <sec id="sec-2-5">
        <title>3.1. Empirical Mode Decomposition</title>
        <p>Empirical Mode Decomposition is an adaptive, data driven technique used to decompose non-stationary and non-linear signals into Intrinsic Mode Functions of a signal, in the time-domain itself without the requirement of any a priori basis [22]. Any function that satisfies the following two conditions is categorized as an Intrinsic Mode Function:</p>
        <list list-type="order">
          <list-item><p>The number of extrema and the number of zero crossings in the signal must be either equal or differ at most by one, and</p></list-item>
          <list-item><p>The mean of the upper and lower envelopes, defined by joining the local maxima and the local minima respectively, must be zero.</p></list-item>
        </list>
        <p>In order to decompose a signal x[n] into its corresponding IMFs, the signal is subjected to a sifting process, namely,</p>
        <list list-type="order">
          <list-item><p>For the signal x[n], find the locations of all local maxima and minima. Define the initial residue as</p>
            <disp-formula><tex-math>r_0[n] = x[n]</tex-math></disp-formula>
          </list-item>
          <list-item><p>Connect all the local maxima (minima) by applying a cubic spline interpolation to obtain the upper (lower) envelope e<sub>u</sub>[n] (e<sub>l</sub>[n]).</p></list-item>
          <list-item><p>Compute the mean</p>
            <disp-formula><tex-math>e_m[n] = \frac{e_u[n] + e_l[n]}{2}</tex-math></disp-formula>
          </list-item>
          <list-item><p>Update the initial residue</p>
            <disp-formula><tex-math>r_0[n] \leftarrow r_0[n] - e_m[n]</tex-math></disp-formula>
          </list-item>
          <list-item><p>Repeat Steps 1-4 until r<sub>0</sub>[n] gets reduced to a function h<sub>1</sub>[n] which satisfies the properties of an IMF.</p></list-item>
          <list-item><p>Obtain the first residue r<sub>1</sub>[n] = r<sub>0</sub>[n] − h<sub>1</sub>[n], where r<sub>0</sub>[n] is taken as the initial residue x[n] from Step 1.</p></list-item>
          <list-item><p>Repeat Steps 1-6 with the residue r<sub>1</sub>[n] as the initial residue to find all the IMFs h<sub>i</sub>[n], i = 1, 2, ⋯, K.</p></list-item>
          <list-item><p>Stop the process when the residue r<sub>K</sub>[n] becomes either monotonic, or a function with a single maximum and minimum, or a constant.</p></list-item>
        </list>
        <p>By performing the decomposition process, the signal x[n] can be represented as a sum of IMFs and the final residue, namely,</p>
        <disp-formula id="eq1">
          <label>(1)</label>
          <tex-math>x[n] = r_K[n] + \sum_{i=1}^{K} h_i[n]</tex-math>
        </disp-formula>
        <p>[…] duration, where the decomposition is curtailed at K = 9. Note that the process of decomposing a signal into IMFs and then representing the IMFs using the instantaneous amplitude and frequency is termed the Hilbert Huang Transform (HHT). Features extracted from the IMFs can be used as complementary features to the standard signal processing practices. In this regard, HHT can be understood as a generalized Fourier Transform that represents the signal in terms of a finite number of components [23].</p>
        <fig id="fig2">
          <label>Figure 2</label>
          <caption>
            <p>IMFs for PD and HC, (a)-(f) ((g)-(l)) represent first 5 IMFs and residue for PD (HC) speech of phonation /a/.</p>
          </caption>
        </fig>
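        <p>As an illustration of the sifting steps listed above, the following is a minimal Python sketch and not the implementation used in our experiments. The function names (local_extrema, envelope_mean, emd), the fixed number of sifting iterations n_sift, and the choice to pin both envelopes to the first and last samples are assumptions made for the example. In particular, a fixed iteration count stands in for explicitly testing the two IMF conditions.</p>
        <preformat><![CDATA[
```python
import numpy as np
from scipy.interpolate import CubicSpline


def local_extrema(r):
    """Return indices of the strict local maxima and minima of a 1-D array."""
    d = np.diff(r)
    maxima = np.where((d[:-1] > 0) & (d[1:] < 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] > 0))[0] + 1
    return maxima, minima


def envelope_mean(r):
    """Mean of the cubic-spline upper and lower envelopes (Steps 1-3).

    Returns None when there are too few extrema to build both envelopes.
    """
    maxima, minima = local_extrema(r)
    if len(maxima) < 2 or len(minima) < 2:
        return None
    n = np.arange(len(r))
    # Assumption: pin both envelopes to the end samples so that the
    # splines interpolate over the whole signal and never extrapolate.
    up_idx = np.concatenate(([0], maxima, [len(r) - 1]))
    lo_idx = np.concatenate(([0], minima, [len(r) - 1]))
    e_u = CubicSpline(up_idx, r[up_idx])(n)
    e_l = CubicSpline(lo_idx, r[lo_idx])(n)
    return (e_u + e_l) / 2.0


def emd(x, max_imfs=5, n_sift=10):
    """Decompose x into at most max_imfs IMFs plus a final residue."""
    residue = np.asarray(x, dtype=float).copy()
    imfs = []
    for _ in range(max_imfs):
        h = residue.copy()
        if envelope_mean(h) is None:
            break                      # Step 8: too few extrema left
        for _ in range(n_sift):        # Steps 1-4, fixed number of passes
            e_m = envelope_mean(h)
            if e_m is None:
                break
            h = h - e_m
        imfs.append(h)                 # Step 5: accept h as the next IMF
        residue = residue - h          # Step 6: remove it from the residue
    return imfs, residue
```
]]></preformat>
        <p>Under these assumptions, calling emd(x, max_imfs=5) on a 1 second utterance returns up to five IMFs and the matching residue, and the IMFs and residue sum back to the input signal as in Equation (1).</p>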
        <p>
          In general, healthy speech is more coherent than the speech of a PD patient and, as a result, HC speech is decomposed faster (into a smaller number of IMFs, K) than PD speech. This observation forms the hypothesis of our work. Previous studies have focused on using handcrafted spectral and temporal features extracted from these IMFs in order to discriminate between healthy and pathological speech (see [<xref ref-type="bibr" rid="ref11">11, 24</xref>]). In this paper, we propose a machine learning approach that uses the raw IMFs to diagnose the presence of Parkinson’s disease. The first set of results is on the sustained phonations from both PD and HC. We consider the first five IMFs, namely, h<sub>1</sub>[n] to h<sub>5</sub>[n], and the residue r<sub>5</sub>[n] as the input to our classifier.
        </p>
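        <p>A minimal sketch of how the six classifier inputs could be assembled from a decomposition is given below. The helper name build_inputs is hypothetical, and padding with all-zero IMFs when fewer than five are available is an assumption of the example; the text does not specify how such cases are handled.</p>
        <preformat><![CDATA[
```python
import numpy as np


def build_inputs(x, imfs, n_imfs=5):
    """Stack the first n_imfs IMFs and the matching residue.

    Returns an array of shape (n_imfs + 1, len(x)) whose rows are
    h_1[n], ..., h_n[n] followed by the residue r_n[n].
    """
    x = np.asarray(x, dtype=float)
    kept = [np.asarray(h, dtype=float) for h in imfs[:n_imfs]]
    # Assumption: pad with zero-valued IMFs if the decomposition
    # stopped before producing n_imfs components.
    while len(kept) < n_imfs:
        kept.append(np.zeros_like(x))
    # r_n[n] = x[n] - sum of the kept IMFs, so that all rows add up to x[n]
    residue = x - np.sum(kept, axis=0)
    return np.stack(kept + [residue])
```
]]></preformat>
        <p>For a 1 second utterance sampled at 16 kHz this yields a 6 × 16000 array, one row per input branch of the network.</p>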
        <p>Figure 2 depicts the first 5 IMFs and the final residue corresponding to the sustained phonation /a/ spoken by a HC ((a)-(f)) and a PD ((g)-(l)) subject. One can visually notice the difference between the IMFs and the residue for the HC and PD speech samples. These IMFs capture the characteristics of the parent signal and hence can be employed to extract information useful for pathological speech classification. This is the difference we wish to exploit to discriminate speech uttered by PD and speech uttered by HC.</p>
      </sec>
      <sec id="sec-2-6">
        <title>3.2. Experimental Setup</title>
        <p>The architecture of the 1D-CNN model used for the classification task is shown in Figure 3. The input to the 1D-CNN model is the raw IMF signal. The 1D-CNN was trained using the Keras [25] deep learning library with the Tensorflow [26] backend. We use speech signals (as mentioned in Table 1) of 1 second, which corresponds to 16000 samples. Each 1 second speech utterance is subjected to the EMD process and the first 5 IMFs (h<sub>1</sub>[n], h<sub>2</sub>[n], ⋯, h<sub>5</sub>[n]) are extracted along with the final residue (r<sub>5</sub>[n]). These are then fed as input to a multiple-input 1D-CNN network. Thus, the input to the network is a set of 6 vectors (time series), each of 16000 dimensions. We set the kernel size for the CNN to be 320 with a stride of 160, and the number of filters is chosen by performing a grid search to optimize the classification accuracy. The outputs of the CNN branches are concatenated after a Global MaxPooling operation and fed to a fully connected layer with ReLU activation function, where the number of neurons is optimized by using a grid search. For the output layer, the softmax activation function is used with the output dimensions being the two classes, namely, HC and PD. The target to the model was a one-hot encoding of the health state of the individual. We trained the network using binary cross-entropy loss with the Adam optimizer. We set the learning rate to the default value of 0.001.</p>
        <p>In order to obtain speaker independent results which can be scaled to populations outside the training set, we perform a leave-one-speaker-out validation of the model wherein utterances corresponding to 49 speakers are used for training the model and the model is tested on the left out speaker. For all experiments, 20% of the training data is randomly chosen for the purpose of validating the model. For the test speaker, the posterior probabilities obtained from the model output for each 1 second utterance were averaged for classification. Note that the Italian PD dataset is not large enough (as is common with pathological speech databases) to define separate train, test and validation sets; using the leave-one-out mechanism allows predictions for all the speakers without relying on any sort of speaker specific information.</p>
      </sec>
    </sec>
    <sec>
      <title>4. Results</title>
      <p>The experimental results using 1D-CNN obtained for leave-one-speaker-out for different phonations are tabulated in Table 2. In order to account for variations in outcomes due to random weight initialization of the 1D-CNN, we repeat the experiment 5 times and report the average accuracy obtained in Table 2. We also report the specificity and sensitivity, which are defined as the percentage of correctly classified HC and PD utterances respectively. The confusion matrix for 5 individual runs for the phonation /a/ is also shown in Table 3; as can be observed, the number of correctly recognized subjects is not significantly different across runs, with a variation of ±2. As can be observed in Figure 2, the final residue (r<sub>5</sub>[n]) is most reflective of the difference between PD and HC speech samples, followed by IMFs h<sub>4</sub>[n] and h<sub>5</sub>[n]. To evaluate if r<sub>5</sub>[n] by itself independently captures the discriminating properties between HC and PD, we trained a single input 1D-CNN model using r<sub>5</sub>[n] as the input, namely, all inputs were 0 except the last residue input r<sub>5</sub>[n] in Figure 3. We perform a similar analysis by training another model with inputs as signals h<sub>4</sub>[n], h<sub>5</sub>[n] and r<sub>5</sub>[n]. The results obtained by using these approaches are reported in Tables 4 and 5. Clearly, the performance deteriorates (it can be observed that for the phonation /a/ there is a drop in accuracy from 76% to 69.6% and 64.4%) compared to when all the IMFs and the residue are used together. Further, we combine the results obtained by using each of the individual phonations by taking a majority vote on the predictions obtained by each of the 5 different models. The class confusion matrix using this approach is presented in Table 6. We achieve an average accuracy of 85%, while the specificity and sensitivity values are 81.82% and 87.5% respectively.</p>
      <p>
        The use of IMF signals as raw features in a 1D-CNN classifier shows promise to be able to discriminate PD and HC, as can be seen in Table 2. To the best of our knowledge, a study on classification of PD and HC using the Italian Parkinson’s Voice and Speech database has not been attempted earlier. However, our results are comparable to the state-of-the-art measures which have been validated on other datasets, for example [<xref ref-type="bibr" rid="ref11">11, 12, 13, 27</xref>]. Note that we did not have access to these datasets to make a direct comparison. On closer observation, we found that most of the PD patients misclassified by our proposed approach belong to the class of 11 (of the 28) PD patients in the database who were rated 0 (namely, having no speech problems) on the UPDRS test scale by the clinicians. This is consistent with the fact that assigning a precise rating (PD or HC) for these boundary cases is challenging even for trained experts, which translates to misclassification of these samples.
      </p>
    </sec>
    <sec>
      <title>5. Discussion</title>
      <p>EMD is a popular decomposition technique used to analyze non-stationary and non-linear signals. The IMFs can be used to extract features like instantaneous amplitude and frequency, marginal spectrum etc. which are relevant for pathological speech classification. However, in this paper we propose a deep architecture in the form of a 1D-CNN which allows us to use the raw IMF signal instead of having to select and extract explicit features useful for pathological speech classification. It is commonly assumed that neural networks are black boxes that are unable to give interpretable results. We attempt to explain the performance of the proposed architecture.</p>
      <p>For the 1D-CNN, we used a kernel size of 320 with a stride of 160. In hindsight this is equivalent to extracting features from 20 ms of speech with a shift of 10 ms (at the 16 kHz sampling rate, 320 samples span 20 ms and 160 samples span 10 ms), which is common practice in speech processing owing to the non-stationary nature of the speech signal. Further,</p>
      <list list-type="bullet">
        <list-item><p>The 1D-CNN network can be assumed to be a feature extraction mechanism which, given a raw IMF (or residue), extracts a set of discriminative features. The number of filters may be interpreted as the number of features extracted from a particular input signal.</p></list-item>
        <list-item><p>The extracted features from input signals h<sub>1</sub>[n] - h<sub>5</sub>[n] and r<sub>5</sub>[n] are then concatenated to form a feature vector.</p></list-item>
        <list-item><p>The Dense layers then act as a simple binary classifier with the input as the concatenated feature vector.</p></list-item>
      </list>
      <p>As one can observe, the use of raw IMFs mitigates the need to explicitly extract handcrafted features from the IMFs; the 1D-CNN architecture learns discriminating features from the raw signal to distinguish between PD and HC speech samples. For the purpose of decomposing the signal, the speech sample is segmented into fixed durations of 1 second each. This duration is long enough to capture the non-stationary aspect of speech as well as the dynamics involved in the phonation of vowel sounds.</p>
    </sec>
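    <p>The bookkeeping around the classifier described above (1 second segmentation, the 20 ms and 10 ms interpretation of the kernel and stride, averaging of utterance posteriors for a test speaker, and the majority vote over the five phonation models) can be sketched as follows. This is an illustrative sketch under stated assumptions, not our experimental code: the function names are hypothetical, a trailing segment shorter than 1 second is assumed to be dropped, the convolution is assumed to be unpadded, and class index 0 is taken to be HC and 1 to be PD.</p>
    <preformat><![CDATA[
```python
import numpy as np

FS = 16000  # sampling frequency in Hz, as stated for the dataset


def segment_utterances(signal, fs=FS, seconds=1.0):
    """Split a recording into non-overlapping fixed-length utterances.

    Assumption: a trailing segment shorter than the window is dropped.
    """
    signal = np.asarray(signal, dtype=float)
    win = int(round(fs * seconds))
    n_utt = len(signal) // win
    return signal[: n_utt * win].reshape(n_utt, win)


def conv_positions(n_samples, kernel, stride):
    """Number of kernel positions of an unpadded 1-D convolution."""
    return (n_samples - kernel) // stride + 1


def speaker_decision(posteriors):
    """Average the per-utterance posteriors of one speaker, then decide.

    posteriors has shape (n_utterances, 2); column 0 is HC and column 1
    is PD (this column order is an assumption of the sketch).
    """
    return int(np.argmax(np.mean(posteriors, axis=0)))


def majority_vote(labels):
    """Combine the decisions of the per-phonation models by majority vote."""
    counts = np.bincount(np.asarray(labels, dtype=int), minlength=2)
    return int(counts.argmax())


kernel_ms = 1000.0 * 320 / FS   # duration covered by one kernel
stride_ms = 1000.0 * 160 / FS   # shift between consecutive kernels
```
]]></preformat>
    <p>With five phonation models the vote cannot tie. For an even number of models, the argmax in this sketch would resolve a tie in favour of index 0.</p>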
    <sec id="sec-3">
      <title>6. Conclusion</title>
      <p>Parkinson’s Disease is a chronic neuro-degenerative disease which is difficult to diagnose. The symptoms of PD can be mistaken for natural aging, thereby making the diagnosis very challenging. Tracking changes in speech has proven to be a useful tool for establishing a non-invasive approach to early detection of PD. In this work, we propose an efficient technique to discriminate PD and HC subjects by analyzing their speech samples of sustained phonation. Traditional approaches have focused on experimenting with handcrafted spectral and temporal features. In this paper, however, we focus on machine learning the discriminating features of speech associated with PD patients and healthy controls from the raw IMF signals. We train a 1D-CNN model using these raw IMFs to learn the discriminating properties in the signals to classify PD and HC
subjects.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Hoehn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Yahr</surname>
          </string-name>
          , Parkinsonism: onset, progression and mortality,
          <source>Neurology</source>
          <volume>17</volume>
          (
          <year>1967</year>
          )
          <fpage>427</fpage>
          -
          <lpage>442</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>O.</given-names>
            <surname>Tysnes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Storstein</surname>
          </string-name>
          ,
          <article-title>Epidemiology of parkinson's disease</article-title>
          ,
          <source>Journal of Neural Transmission</source>
          <volume>124</volume>
          (
          <year>2017</year>
          )
          <fpage>901</fpage>
          -
          <lpage>905</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1007/s00702-017-1686-y</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Ho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Iansek</surname>
          </string-name>
          ,
          <string-name>
            <surname>M. C.</surname>
          </string-name>
          ,
          <string-name>
            <surname>B. J.L.</surname>
          </string-name>
          ,
          <string-name>
            <surname>G. S.</surname>
          </string-name>
          ,
          <article-title>Speech impairment in a large sample of patients with parkinson's disease</article-title>
          ,
          <source>Behavioral Neurology</source>
          <volume>11</volume>
          (
          <year>1998</year>
          /
          <year>1999</year>
          )
          <fpage>131</fpage>
          -
          <lpage>137</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Fahn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. L.</given-names>
            <surname>Elton</surname>
          </string-name>
          ,
          <article-title>Unified parkinsons disease rating scale</article-title>
          ,
          <source>Recent Developments in Parkinsons Disease, Macmillan Health Care Information</source>
          <volume>2</volume>
          (
          <year>1987</year>
          )
          <fpage>153</fpage>
          -
          <lpage>163</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>H.</given-names>
            <surname>Cohen</surname>
          </string-name>
          ,
          <article-title>Disorders of speech and language in parkinson's disease</article-title>
          , in:
          <source>Mental and Behavioral Dysfunction in Movement Disorders</source>
          , M. A. Bédard,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Agid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. D.</given-names>
            <surname>Korczyn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Lesperance</surname>
          </string-name>
          , and S. Chouinard, Eds. New York, NY, USA: Humana Press, (
          <year>2003</year>
          )
          <fpage>125</fpage>
          -
          <lpage>134</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Ho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Iansek</surname>
          </string-name>
          ,
          <string-name>
            <surname>M. C.</surname>
          </string-name>
          ,
          <string-name>
            <surname>B. J.L.</surname>
          </string-name>
          ,
          <string-name>
            <surname>G. S.</surname>
          </string-name>
          ,
          <article-title>Motor instability in parkinsonian speech intensity</article-title>
          ,
          <source>Neuropsychiatry, Neuropsychology and Behavioral Neurology</source>
          <volume>14</volume>
          (
          <year>2001</year>
          )
          <fpage>109</fpage>
          -
          <lpage>116</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Little</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. E.</given-names>
            <surname>McSharry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. J.</given-names>
            <surname>Hunter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Spielman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. O.</given-names>
            <surname>Ramig</surname>
          </string-name>
          ,
          <article-title>Suitability of dysphonia measurements for telemonitoring of parkinson's disease</article-title>
          ,
          <source>IEEE Transactions on Biomedical Engineering</source>
          <volume>56</volume>
          (
          <year>2009</year>
          )
          <fpage>1015</fpage>
          -
          <lpage>1022</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Novotný</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Rusz</surname>
          </string-name>
          , R. Čmejla, E. Růžička,
          <article-title>Automatic evaluation of articulatory disorders in parkinson's disease</article-title>
          ,
          <source>IEEE/ACM Transactions on Audio, Speech, and Language Processing</source>
          <volume>22</volume>
          (
          <year>2014</year>
          )
          <fpage>1366</fpage>
          -
          <lpage>1378</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Rusz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cmejla</surname>
          </string-name>
          ,
          <article-title>Quantitative acoustic measurements for characterization of speech and voice disorders in early untreated parkinson's disease</article-title>
          ,
          <source>Journal of Acoustical Society of America</source>
          <volume>129</volume>
          (
          <year>2011</year>
          )
          <fpage>350</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J.</given-names>
            <surname>Rusz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cmejla</surname>
          </string-name>
          ,
          <article-title>Imprecise vowel articulation as a potential early marker of parkinson's disease: Effect of speaking task</article-title>
          ,
          <source>Journal of Acoustical Society of America</source>
          <volume>134</volume>
          (
          <year>2013</year>
          )
          <fpage>2171</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Rueda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Vásquez-Correa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Rios-Urrego</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Orozco-Arroyave</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Krishnan</surname>
          </string-name>
          , E. Noeth,
          <article-title>Feature Representation of Pathophysiology of Parkinsonian Dysarthria</article-title>
          ,
          <source>in: Proc. Interspeech 2019</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>3048</fpage>
          -
          <lpage>3052</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.21437/Interspeech.2019-2490</pub-id>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>