=Paper=
{{Paper
|id=Vol-2699/paper28
|storemode=property
|title=CNN based Parkinson's Disease Assessment using Empirical Mode Decomposition
|pdfUrl=https://ceur-ws.org/Vol-2699/paper28.pdf
|volume=Vol-2699
|authors=Ayush Tripathi,Sunil Kumar Kopparapu
|dblpUrl=https://dblp.org/rec/conf/cikm/TripathiK20
}}
==CNN based Parkinson's Disease Assessment using Empirical Mode Decomposition==
Ayush Tripathi<sup>a</sup>, Sunil Kumar Kopparapu<sup>a</sup>

<sup>a</sup> TCS Research & Innovation - Mumbai, Tata Consultancy Services Limited, Maharashtra, India.

Proceedings of the CIKM 2020 Workshops, October 19-20, Galway, Ireland. Editors of the Proceedings: Stefan Conrad, Ilaria Tiddi.

email: t.ayush@tcs.com (A. Tripathi); sunilkumar.kopparapu@tcs.com (S.K. Kopparapu). url: https://www.tcs.com (S.K. Kopparapu). orcid: 0000-0002-7944-2260 (A. Tripathi); 0000-0002-0502-527X (S.K. Kopparapu).

© 2020 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073.

===Abstract===
Parkinson’s Disease (PD) is a neuro-degenerative disorder which is caused by a decrease in dopamine producing neurons in the human body and affects the body’s motor system. In addition to affecting several motor and non-motor activities of a person’s day to day life, PD patients have difficulty in speech production due to reduced coordination of the muscles that control breathing, phonation, articulation and prosody. Analyzing speech allows clinicians to objectively measure the severity of PD in a non-invasive way. In this work, we propose an effective method to discriminate between PD and healthy control (HC) subjects by using Empirical Mode Decomposition, a technique that decomposes a speech signal into simpler Intrinsic Mode Functions (IMFs). We train a Convolutional Neural Network (CNN) to learn significant properties from raw IMFs for the purpose of PD-HC classification. We evaluate our technique on sustained phonation speech from the Italian Parkinson’s Voice and Speech database. Experimental results show that significant characteristics of Parkinsonian dysarthria can be learnt by using the raw IMFs and that the need for explicitly extracting handcrafted features could be mitigated.

'''Keywords:''' Parkinson’s speech, Empirical Mode Decomposition, Intrinsic Mode Function, sustained phonation

===1. Introduction===
Parkinson’s Disease (PD) is a neuro-degenerative disorder which is caused by a decrease in dopamine producing neurons in the human body and affects the body’s motor system [1]. PD affects 1-2 per 1000 of the population at any time. The prevalence of PD increases with age and it affects roughly 1% of the population above 60 years [2]. Normal respiratory and well controlled articulatory movements are fundamental for producing well-coordinated normal speech. The common signs and symptoms of PD such as tremor, bradykinesia, rigid muscles and akinesia hamper the ability of an individual to precisely control the speech producing organs, which leads to disordered speech. This manifests in PD patients in the form of soft voice, monotone, breathiness, hoarse voice quality, imprecise articulation and a decrease in naturalness while speaking [3].

In the absence of any specific laboratory test or instruments to measure or monitor the evolution and treatment response of PD, it is crucial to track motor functions, such as gait freezing and speech, to examine the disease. Importantly, the signs of PD are often confused with those of natural aging, which makes the diagnosis even more challenging. Clinicians widely use the Unified Parkinson’s Disease Rating Scale (UPDRS) [4] for evaluation of PD. The evaluation is carried out through face to face interviews and clinical observations using a set of questions to evaluate: (a) non-motor experiences of daily living, (b) motor experiences of daily living, (c) motor examination, and (d) motor complications.

Naturally spoken speech can be analyzed in a non-invasive manner and hence the study of changes in acoustic properties of speech is a center-point of research for the measurement of symptomatic changes in PD [5]. Articulation, voice intensity, frequency spectrum, and speech intelligibility are the main acoustic parameters observed for tracking changes in speech. It was observed [6] that PD patients suffer from a reduction in the range of articulatory movement, which in turn leads to impaired vowel articulation. The production of vowels is a complicated process that involves precise control over the movements of the tongue, lips and jaw, creating oropharyngeal resonating cavities, which amplify certain frequency bands of the voice spectrum called formants. The possibility of using sustained phonation /a/ for discriminating PD from healthy subjects was first proposed in [7].

A set of 13 features describing different aspects of Parkinsonian speech for the task was suggested in [8]. Phonation and rhythm features [9] and other vowel features [10] to capture characteristics of PD dysarthria have been proposed in the literature. An extensive feature analysis followed by a 2 stage feature selection to represent physiological aspects of PD obtained from sustained vowel /a/ and DDK task was proposed in [11]. A set of frame-level features was used to construct a Fisher Vector representation of the speech sample along with a Support Vector Machine classifier in [12]. An i-vector based approach along with a large set of acoustic features was used in [13] in order to identify the most relevant features for characterizing the disorders in speech of PD patients. Voxtester [14] is a system for assessing PD related impairment by using a wide set of parameters including: voice spectrum, formants, DDK rate, voice intensity and vocal sound pressure level.

With the advent of machine learning in all spheres of processing, the trend has been to extract more and more features using signal processing in order to discriminate PD and HC subjects. In this paper, we propose a method to classify PD and HC by decomposing the speech utterance using the Empirical Mode Decomposition (EMD) technique. EMD is the process of decomposing a non-stationary time series into simpler Intrinsic Mode Functions (IMFs) in the time domain. This technique has had various applications in the speech domain such as enhancement, denoising [15], formant tracking [16], pathological voice analysis [17], emotion recognition [18], glottal activity detection [19] etc. In these studies, the emphasis has been on extracting temporal and spectral features using the IMFs, which are then used for classification tasks. However, to the best of our knowledge, employing raw IMFs for classification of pathological speech has not been studied. The main contribution of this paper lies in using a Convolutional Neural Network architecture to learn these features from raw IMFs without the need of explicitly extracting handcrafted features for the purpose of PD-HC classification. The approach is validated on the Italian Parkinson’s Voice and Speech database.

The rest of the paper is organized as follows: Section 2 describes the database used for the experiments; we provide the description of the proposed approach in Section 3 while Section 4 details achieved results. We discuss the salient aspects of the proposed approach while also providing an analogy to the traditional feature extraction based methods in Section 5 and conclude in Section 6.

===2. Dataset===
The Italian Parkinson’s Voice and Speech database [20] consists of recordings from 28 (19 Male, 9 Female) speakers with Parkinson’s Disease aged between 40 and 80 years and 22 (10 Male, 12 Female) healthy controls (HC) aged between 60 and 77 years. The utterances have been recorded in a warm, echo free and quiet room at a sampling frequency of 16 kHz by keeping the microphone at a distance of 15 to 25 centimeters from the subject. The speech intelligibility of the patients was perceptually assessed on a 5-point scale based on the UPDRS protocol. The following reading tasks were performed by the subjects:

* 2 phonations each of the vowel /a/, /e/, /i/, /o/, /u/
* execution of syllable /pa/ and /ka/ (5 sec)
* 2 readings of a phonetically balanced text
* reading of phonetically balanced words and phrases

In our study, we use a subset of this dataset, namely the sustained phonations (/a/, /e/, /i/, /o/, /u/). Depending on the severity of the condition and the speaker, the amount of time a subject can sustain a phonation is different and consequently the lengths (in seconds) of the audio recordings are unequal. As will be discussed in Section 3, we segment the unequal length speech samples into non-overlapping segments (utterances) of 1 second duration each. In all there were 1957 utterances from PD and 1445 utterances from HC (see Table 1); this forms the data in all our experiments on the phonation data for PD-HC classification. For complete information on the recording protocol, the subjects and the tasks, please refer to [21].

{| class="wikitable"
|+ Table 1: Number of 1 second utterances for PD and HC categories in the dataset.
! Phonation !! PD !! HC
|-
| /a/ || 390 || 269
|-
| /e/ || 385 || 290
|-
| /i/ || 403 || 297
|-
| /o/ || 400 || 284
|-
| /u/ || 379 || 305
|-
| Total || 1957 || 1445
|}

===3. Proposed Approach===
The proposed PD diagnosis system consists of two major parts. First, the raw speech utterance of 1 second duration is decomposed into its Intrinsic Mode Functions (IMFs) by using the Empirical Mode Decomposition (EMD) technique. A 1D-CNN model is then trained using the raw IMFs as input for classifying the speech utterance into one of the two categories, namely, HC or PD. We now describe the signal decomposition process and the architecture of the 1D-CNN model used in our experiments.

====3.1. Empirical Mode Decomposition====
Empirical Mode Decomposition is an adaptive, data driven technique used to decompose non-stationary and non-linear signals into Intrinsic Mode Functions in the time domain itself, without the requirement of any a priori basis [22]. Any function that satisfies the following two conditions is categorized as an Intrinsic Mode Function:

# The number of extrema and the number of zero crossings in the signal must be either equal or differ at most by one, and
# The mean value of the envelopes defined by joining the points of local maxima and the points of local minima must be zero.

In order to decompose a signal <math>s[n]</math> into its corresponding IMFs, the signal is subjected to a sifting process, namely,

# For the signal <math>s[n]</math>, find the locations of all local maxima and minima. Define the initial residue as <math>r_0[n] = s[n]</math>.
# Connect all the local maxima (minima) by applying a cubic spline interpolation to obtain the upper (lower) envelope <math>E_{upper}</math> (<math>E_{lower}</math>).
# Compute the mean <math>E_{mean} = \frac{E_{upper} + E_{lower}}{2}</math>.
# Update the initial residue: <math>r_0[n] \leftarrow r_0[n] - E_{mean}</math>.
# Repeat Steps 1-4 until <math>r_0[n]</math> (initially equal to <math>s[n]</math>) is reduced to a function <math>h_1[n]</math> which satisfies the properties of an IMF.
# Obtain the first residue <math>r_1[n] = s[n] - h_1[n]</math>, that is, the signal at the start of sifting minus the extracted IMF.
# Repeat Steps 1-6 with the residue <math>r_1[n]</math> as the initial residue to find all the IMFs <math>h_i[n]</math>, <math>i = 1, 2, \cdots, K</math>.
# Stop the process when the residue <math>r_K[n]</math> becomes either monotonic, or a function with a single maximum and minimum, or a constant.

By performing the decomposition process, the signal <math>s[n]</math> can be represented as a sum of IMFs and the final residue, namely,

:<math>s[n] = r_K[n] + \sum_{i=1}^{K} h_i[n] \qquad (1)</math>

Figure 1: Empirical Mode Decomposition of a 1 second sample.

Figure 2: IMFs for PD and HC, (a)-(f) ((g)-(l)) represent first 5 IMFs and residue for PD (HC) speech of phonation /a/.

Figure 1 depicts the IMFs obtained as a result of decomposing a natural speech utterance of one second duration, where the decomposition is curtailed at <math>K = 9</math>. Note that the process of decomposing a signal into IMFs and then representing the IMFs using the instantaneous amplitude and frequency is termed the Hilbert Huang Transform (HHT). Features extracted from the IMFs can be used as complementary features to the standard signal processing practices. In this regard, HHT can be understood as a generalized Fourier Transform that represents the signal in terms of a finite number of components [23].

In general, healthy speech is more coherent than the speech of a PD patient and as a result HC speech is decomposed faster (smaller <math>K</math>) than PD speech. This observation forms the hypothesis of our work. Previous studies have focused on using handcrafted spectral and temporal features extracted from these IMFs in order to discriminate between healthy and pathological speech (see [11, 24]). In this paper, we propose a machine learning approach to use the raw IMFs in order to diagnose the presence of Parkinson’s disease. The first set of results are on the sustained phonations from both PD and HC.
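The sifting procedure of Section 3.1 (Steps 1-8) can be illustrated with a minimal sketch in plain Python. This is not the implementation used in the paper: it assumes piecewise-linear envelopes in place of the cubic spline of Step 2, and a fixed number of sifting passes in place of the IMF-property check of Step 5. The function names (<code>local_extrema</code>, <code>envelope</code>, <code>sift</code>, <code>emd</code>) and the parameters <code>max_imfs</code> and <code>n_passes</code> are hypothetical and chosen only for illustration.

```python
def local_extrema(x):
    """Indices of strict local maxima and minima (Step 1)."""
    maxima, minima = [], []
    for i in range(1, len(x) - 1):
        if x[i - 1] < x[i] > x[i + 1]:
            maxima.append(i)
        elif x[i - 1] > x[i] < x[i + 1]:
            minima.append(i)
    return maxima, minima


def envelope(x, idx):
    """Envelope through the points x[idx] (Step 2).

    ASSUMPTION: piecewise-linear interpolation, held constant outside the
    outermost extrema, stands in for the cubic spline used in the text.
    """
    env, k = [], 0
    for i in range(len(x)):
        if i <= idx[0]:
            env.append(x[idx[0]])
        elif i >= idx[-1]:
            env.append(x[idx[-1]])
        else:
            while idx[k + 1] < i:  # advance to the segment containing i
                k += 1
            a, b = idx[k], idx[k + 1]
            t = (i - a) / (b - a)
            env.append((1 - t) * x[a] + t * x[b])
    return env


def sift(x, n_passes=10):
    """Steps 1-5: repeatedly subtract the mean envelope to obtain one IMF.

    ASSUMPTION: a fixed number of passes replaces the IMF-property check.
    """
    h = list(x)
    for _ in range(n_passes):
        maxima, minima = local_extrema(h)
        if not maxima or not minima:
            break
        upper, lower = envelope(h, maxima), envelope(h, minima)
        h = [v - 0.5 * (u + l) for v, u, l in zip(h, upper, lower)]
    return h


def emd(s, max_imfs=5, n_passes=10):
    """Steps 6-8: return (imfs, residue) such that s = residue + sum(imfs)."""
    residue, imfs = list(s), []
    while len(imfs) < max_imfs:
        maxima, minima = local_extrema(residue)
        if len(maxima) <= 1 and len(minima) <= 1:
            break  # residue is monotonic, constant, or has a single max/min
        h = sift(residue, n_passes)
        imfs.append(h)
        residue = [r - v for r, v in zip(residue, h)]
    return imfs, residue
```

Calling <code>emd(s, max_imfs=5)</code> on a 1 second utterance would return at most five IMFs and one residue (fewer IMFs if the stopping rule of Step 8 triggers early). Because each IMF is subtracted from the running residue, the reconstruction identity of Eq. (1) holds by construction, whichever interpolation is used for the envelopes.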
We consider the first five IMFs, namely, <math>h_1[n]</math> to <math>h_5[n]</math>, and the residue <math>r_5[n]</math> as the input to our classifier.

Figure 2 depicts the first 5 IMFs and the final residue corresponding to the sustained phonation /a/ spoken by a HC ((a)-(f)) and a PD ((g)-(l)) subject. Clearly, one can visually notice the difference between the IMFs and the residue for the HC and PD speech samples. These IMFs capture the characteristics of the parent signal and hence can be employed to extract information useful for pathological speech classification. This is the difference we wish to exploit to discriminate speech uttered by PD and speech uttered by HC.

====3.2. Experimental Setup====
The architecture of the 1D-CNN model used for the classification task is shown in Figure 3. The input to the 1D-CNN model is the raw IMF signal. The 1D-CNN was trained using the Keras [25] deep learning library with the Tensorflow [26] backend. We use speech signals (as mentioned in Table 1) of 1 second, which corresponds to 16000 samples. Each 1 second speech utterance is subjected to the EMD process and the first 5 IMFs (<math>h_1[n], h_2[n], \cdots, h_5[n]</math>) are extracted along with the final residue (<math>r_5[n]</math>). These are then fed as input to a multiple-input 1D-CNN network. Thus, the input to the network is a set of six 16000-dimensional vectors (time series). We set the kernel size for the CNN to be 320 with a stride of 160, and the number of filters is chosen by performing a grid search to optimize the classification accuracy. The output of the CNN is then concatenated after a Global MaxPooling operation and is fed to a fully connected layer with ReLU activation function, while the number of neurons is optimized by using a grid search. For the output layer, the softmax activation function is used with the output dimensions being the two classes, namely, HC and PD. The target to the model was a one-hot encoding of the health state of the individual. We trained the network using binary cross-entropy loss with the Adam optimizer. We set the learning rate to the default value of 0.001.

Figure 3: Proposed 1D-CNN Architecture. Each of the six inputs (<math>h_1[n]</math> to <math>h_5[n]</math> and <math>r_5[n]</math>) passes through InputLayer, Batch Normalization, Conv1D and Global MaxPooling1D; the six branch outputs are concatenated and passed through two Dense layers to produce the prediction.

In order to obtain speaker independent results which can be scaled to populations outside the training set, we perform a leave-one-speaker-out validation of the model wherein utterances corresponding to 49 speakers are used for training the model and the model is tested on the left out speaker. For all experiments, 20% of the training data is randomly chosen for the purpose of validating the model. For the test speaker, the posterior probabilities obtained from the model output for each 1 second utterance were averaged for classification. Note that the Italian PD dataset is not large enough (as is common with pathological speech databases) to define separate train, test and validation sets; the leave-one-speaker-out mechanism allows predictions for all the speakers without relying on any sort of speaker specific information.

===4. Results===
The experimental results using the 1D-CNN obtained for leave-one-speaker-out for different phonations are tabulated in Table 2. In order to account for variations in outcomes due to random weight initialization of the 1D-CNN, we repeat the experiment 5 times and report the average accuracy obtained in Table 2. We also report the specificity and sensitivity, which are defined as the percentage of correctly classified HC and PD utterances respectively. The confusion matrix for 5 individual runs for the phonation /a/ is also shown in Table 3; as can be observed, the number of correctly recognized subjects is not significantly different across runs, and the variation between different runs is ±2.

{| class="wikitable"
|+ Table 2: Accuracies for Phonation tasks (proposed approach).
! Phonation !! Accuracy !! Specificity !! Sensitivity
|-
| /a/ || 76.00 || 80.00 || 72.86
|-
| /e/ || 76.40 || 78.57 || 73.64
|-
| /i/ || 72.00 || 68.57 || 76.36
|-
| /o/ || 72.40 || 68.57 || 77.27
|-
| /u/ || 72.00 || 70.00 || 74.55
|-
| Average || 73.76 || 73.14 || 74.94
|}

{| class="wikitable"
|+ Table 3: Confusion matrix for 5 runs for the phonation /a/ (proposed approach).
! !! PD !! HC
|-
! PD
| 21, 20, 20, 21, 20 (72.86%) || 7, 8, 8, 7, 8 (27.14%)
|-
! HC
| 5, 3, 4, 5, 5 (20.0%) || 17, 19, 18, 17, 17 (80.0%)
|}

As can be observed in Figure 2, the final residue (<math>r_5[n]</math>) is most reflective of the difference between PD and HC speech samples, followed by IMFs <math>h_4[n]</math> and <math>h_5[n]</math>. To evaluate if <math>r_5[n]</math> by itself independently captures the discriminating properties between HC and PD, we trained a single input 1D-CNN model using <math>r_5[n]</math> as the input, namely, all inputs were 0 except the last residue input <math>r_5[n]</math> in Figure 3. We perform a similar analysis by training another model with the signals <math>h_4[n]</math>, <math>h_5[n]</math> and <math>r_5[n]</math> as inputs. The results obtained by using these approaches are reported in Tables 4 and 5. Clearly, the performance deteriorates (for the phonation /a/ the accuracy drops from 76% to 69.6% and 64.4%) compared to when all the IMFs and the residue are used together.

{| class="wikitable"
|+ Table 4: Accuracies for Phonation tasks (using <math>h_4[n]</math>, <math>h_5[n]</math> and <math>r_5[n]</math>).
! Phonation !! Accuracy !! Specificity !! Sensitivity
|-
| /a/ || 69.6 || 61.82 || 75.00
|-
| /e/ || 72.8 || 71.82 || 73.57
|-
| /i/ || 59.6 || 51.82 || 65.71
|-
| /o/ || 66.4 || 60.00 || 71.43
|-
| /u/ || 62.4 || 52.73 || 70.00
|-
| Average || 66.16 || 59.64 || 71.14
|}

{| class="wikitable"
|+ Table 5: Accuracies for Phonation tasks (using only residue).
! Phonation !! Accuracy !! Specificity !! Sensitivity
|-
| /a/ || 64.4 || 52.72 || 73.57
|-
| /e/ || 67.2 || 74.54 || 61.43
|-
| /i/ || 56.4 || 51.82 || 60.00
|-
| /o/ || 62.4 || 55.45 || 67.86
|-
| /u/ || 61.2 || 57.27 || 64.29
|-
| Average || 62.32 || 58.36 || 65.43
|}

Further, we combine the results obtained by using each of the individual phonations by taking a majority vote on the predictions obtained by each of the 5 different models. The class confusion matrix using this approach is presented in Table 6. We achieve an average accuracy of 85%, while the specificity and sensitivity values are 81.82% and 87.5% respectively.

{| class="wikitable"
|+ Table 6: Class confusion matrix for the classification system by using majority voting across all 5 sustained phonations.
! !! PD !! HC
|-
! PD
| 87.5 || 12.5
|-
! HC
| 18.18 || 81.82
|}

The use of IMF signals as raw features in a 1D-CNN classifier shows promise to be able to discriminate PD and HC, as can be seen in Table 2. To the best of our knowledge, a study on classification of PD and HC using the Italian Parkinson’s Voice and Speech database has not been attempted earlier. However, our results are comparable to the state-of-the-art measures which have been validated on other datasets, for example [11, 12, 13, 27]. Note that we did not have access to these datasets to make a direct comparison. On closer observation, we found that most of the PD patients misclassified by our proposed approach belong to the class of 11 (of the 28) PD patients in the database who were rated 0 (namely, having no speech problems) on the UPDRS test scale by the clinicians. This is consistent with the fact that assigning a precise rating (PD or HC) for these boundary cases is challenging even for trained experts, which translates to misclassification of these samples.

===5. Discussion===
EMD is a popular decomposition technique used to analyze non-stationary and non-linear signals. The IMFs can be used to extract features like instantaneous amplitude and frequency, marginal spectrum etc., which are relevant for pathological speech classification. However, in this paper we propose a deep architecture in the form of a 1D-CNN which allows us to use the raw IMF signal instead of having to select and extract explicit features useful for pathological speech classification. It is commonly assumed that neural networks are black boxes that are unable to produce interpretable results. We attempt to explain the performance of the proposed architecture.

For the 1D-CNN, we used a kernel size of 320 with a stride of 160. In hindsight, at the 16 kHz sampling frequency this is equivalent to extracting features from 20 ms of speech with a shift of 10 ms, which is common practice in speech processing owing to the non-stationary nature of the speech signal. Further,

* The 1D-CNN network can be assumed to be a feature extraction mechanism which, given a raw IMF (or residue), extracts a set of discriminative features. The number of filters may be interpreted as the number of features extracted from a particular input signal.
* The extracted features from input signals <math>h_1[n]</math> - <math>h_5[n]</math> and <math>r_5[n]</math> are then concatenated to form a feature vector.
* The Dense layers then act as a simple binary classifier with the input as the concatenated feature vector.

As one can observe, the use of raw IMFs mitigates the need to explicitly extract handcrafted features from the IMFs; the 1D-CNN architecture learns discriminating features from the raw signal to distinguish between PD and HC speech samples. For the purpose of decomposing the signal, the speech sample is segmented into fixed durations of 1 second each. This duration is long enough to capture the non-stationary aspect of speech as well as the dynamics involved in the phonation of vowel sounds.

===6. Conclusion===
Parkinson’s Disease is a chronic neuro-degenerative disease which is difficult to diagnose. The symptoms of PD can be mistaken for natural aging, thereby making the diagnosis very challenging. Tracking changes in speech has proven to be a useful tool for establishing a non-invasive approach to early detection of PD. In this work, we propose an efficient technique to discriminate PD and HC subjects by analyzing their speech samples of sustained phonation. Traditional approaches have focused on experimenting with handcrafted spectral and temporal features. In this paper, however, we focus on machine learning the discriminating features of speech associated with PD patients and healthy controls from the raw IMF signals. We train a 1D-CNN model using these raw IMFs to learn the discriminating properties in the signals to classify PD and HC subjects.

===References===
* [1] M. Hoehn, M. Yahr, Parkinsonism: onset, progression and mortality, Neurology 17 (1967) 427–442.
* [2] O. Tysnes, A. Storstein, Epidemiology of Parkinson’s disease, Journal of Neural Transmission 124 (2017) 901–905. doi:10.1007/s00702-017-1686-y.
* [3] A. K. Ho, R. Iansek, M. C., B. J.L., G. S., Speech impairment in a large sample of patients with Parkinson’s disease, Behavioral Neurology 11 (1998/1999) 131–137.
* [4] S. Fahn, R. L. Elton, Unified Parkinsons disease rating scale, Recent Developments in Parkinsons Disease, Macmillan Health Care Information 2 (1987) 153–163.
* [5] H. Cohen, Disorders of speech and language in Parkinson’s disease, Mental and Behavioral Dysfunction in Movement Disorders, M. A. Bédard, Y. Agid, A. D. Korczyn, P. Lesperance, and S. Chouinard, Eds. New York, NY, USA: Humana Press, (2003) 125–134.
* [6] A. K. Ho, R. Iansek, M. C., B. J.L., G. S., Motor instability in parkinsonian speech intensity, Neuropsychiatry, Neuropsychology and Behavioral Neurology 14 (2001) 109–116.
* [7] M. A. Little, P. E. McSharry, E. J. Hunter, J. Spielman, L. O. Ramig, Suitability of dysphonia measurements for telemonitoring of Parkinson’s disease, IEEE Transactions on Biomedical Engineering 56 (2009) 1015–1022.
* [8] M. Novotný, J. Rusz, R. Čmejla, E. Růžička, Automatic evaluation of articulatory disorders in Parkinson’s disease, IEEE/ACM Transactions on Audio, Speech, and Language Processing 22 (2014) 1366–1378.
* [9] J. Rusz, R. Cmejla, Quantitative acoustic measurements for characterization of speech and voice disorders in early untreated Parkinson’s disease, Journal of Acoustical Society of America 129 (2011) 350.
* [10] J. Rusz, R. Cmejla, Imprecise vowel articulation as a potential early marker of Parkinson’s disease: Effect of speaking task, Journal of Acoustical Society of America 134 (2013) 2171.
* [11] A. Rueda, J. Vásquez-Correa, C. D. Rios-Urrego, J. R. Orozco-Arroyave, S. Krishnan, E. Noeth, Feature Representation of Pathophysiology of Parkinsonian Dysarthria, in: Proc. Interspeech 2019, 2019, pp. 3048–3052. URL: http://dx.doi.org/10.21437/Interspeech.2019-2490. doi:10.21437/Interspeech.2019-2490.
* [12] J. V. E. López, J. R. Orozco-Arroyave, G. Gosztolya, Assessing Parkinson’s Disease from Speech Using Fisher Vectors, in: Proc. Interspeech 2019, 2019, pp. 3063–3067. URL: http://dx.doi.org/10.21437/Interspeech.2019-2217. doi:10.21437/Interspeech.2019-2217.
* [13] Y. Hauptman, R. Aloni-Lavi, I. Lapidot, T. Gurevich, Y. Manor, S. Naor, N. Diamant, I. Opher, Identifying Distinctive Acoustic and Spectral Features in Parkinson’s Disease, in: Proc. Interspeech 2019, 2019, pp. 2498–2502. URL: http://dx.doi.org/10.21437/Interspeech.2019-2465. doi:10.21437/Interspeech.2019-2465.
* [14] G. Dimauro, D. Caivano, V. Bevilacqua, F. Girardi, V. Napoletano, Voxtester, software for digital evaluation of speech changes in Parkinson disease, in: 2016 IEEE International Symposium on Medical Measurements and Applications (MeMeA), 2016, pp. 1–6.
* [15] G. Rilling, P. Flandrin, P. Goncalves, Empirical mode decomposition, fractional Gaussian noise and Hurst exponent estimation, in: Proceedings. (ICASSP ’05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005, volume 4, 2005, pp. iv/489–iv/492.
* [16] A. Bouzid, N. Ellouze, Voiced speech analysis by empirical mode decomposition, Advances in Nonlinear Speech Processing, Springer (2007).
* [17] B. Mijović, M. Silva, V. den B. R. H. Bergh, K. Allegaert, J. M. Aerts, D. Berckmans, V. S. Huffel, Assessment of pain expression in infant cry signals using empirical mode decomposition, Methods Inf Med 49(05) (2010).
* [18] L. Xiang, X. L., Speech emotion recognition using novel HHT-TEO based features, Journal of Computers 6 (2011).
* [19] R. Sharma, S. R. Mahadeva Prasanna, Characterizing glottal activity from speech using empirical mode decomposition, in: 2015 Twenty First National Conference on Communications (NCC), 2015, pp. 1–6.
* [20] G. D. F. Girardi, Italian Parkinson’s voice and speech, 2019. URL: http://dx.doi.org/10.21227/aw6b-tg17. doi:10.21227/aw6b-tg17.
* [21] G. Dimauro, V. Di Nicola, V. Bevilacqua, D. Caivano, F. Girardi, Assessment of speech intelligibility in Parkinson’s disease using a speech-to-text system, IEEE Access 5 (2017) 22199–22208.
* [22] N. E. Huang, Z. Shen, S. R. Long, M. C. Wu, H. H. Shih, Q. Zheng, N.-C. Yen, C. C. Tung, H. H. Liu, The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis, Proceedings of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences 454 (1998) 903–995. doi:10.1098/rspa.1998.0193.
* [23] R. Sharma, L. Vignolo, G. Schlotthauer, M. Colominas, H. L. Rufiner, S. Prasanna, Empirical mode decomposition for adaptive AM-FM analysis of speech: A review, Speech Communication 88 (2017) 39–64. URL: http://www.sciencedirect.com/science/article/pii/S0167639316302370. doi:10.1016/j.specom.2016.12.004.
* [24] M. Kaleem, B. Ghoraani, A. Guergachi, S. Krishnan, Pathological speech signal analysis and classification using empirical mode decomposition, Med Biol Eng Comput 51 (2013).
* [25] F. Chollet, et al., Keras, https://keras.io, 2015.
* [26] M. Abadi, et al., TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. URL: https://www.tensorflow.org/, software available from tensorflow.org.
* [27] N. Garcia, J. C. Vásquez Correa, J. R. Orozco-Arroyave, E. Nöth, Multimodal i-vectors to detect and evaluate Parkinson’s disease, in: Proc. Interspeech 2018, 2018, pp. 2349–2353. URL: http://dx.doi.org/10.21437/Interspeech.2018-2295. doi:10.21437/Interspeech.2018-2295.
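As a supplementary cross-check on the Results section, the /a/ row of Table 2 can be recomputed from the per-run counts of Table 3. The short sketch below assumes that the rows of Table 3 are the true classes and the columns the predicted classes (the PD row sums to 28 and the HC row to 22 in every run, matching the number of speakers in each group). The variable names are illustrative only.

```python
# Per-run counts for phonation /a/, copied from Table 3 (5 runs).
# ASSUMPTION: rows of Table 3 are true classes, columns are predictions.
pd_as_pd = [21, 20, 20, 21, 20]  # PD correctly classified as PD
pd_as_hc = [7, 8, 8, 7, 8]       # PD misclassified as HC
hc_as_pd = [5, 3, 4, 5, 5]       # HC misclassified as PD
hc_as_hc = [17, 19, 18, 17, 17]  # HC correctly classified as HC


def percent(correct, total):
    """Percentage of correct decisions among `total` decisions."""
    return 100.0 * correct / total


n_pd = sum(pd_as_pd) + sum(pd_as_hc)  # PD decisions pooled over the 5 runs
n_hc = sum(hc_as_hc) + sum(hc_as_pd)  # HC decisions pooled over the 5 runs

sensitivity = percent(sum(pd_as_pd), n_pd)                # correctly classified PD
specificity = percent(sum(hc_as_hc), n_hc)                # correctly classified HC
accuracy = percent(sum(pd_as_pd) + sum(hc_as_hc), n_pd + n_hc)

print(round(accuracy, 2), round(specificity, 2), round(sensitivity, 2))
```

Since every run has the same number of PD and HC speakers, pooling the counts over the five runs gives the same value as averaging the per-run percentages, and the three printed numbers agree with the /a/ row of Table 2 (accuracy, specificity, sensitivity).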