Exploiting voice signal decomposition in expert system for Parkinson’s disease detection

Aivaras Šimulis
Department of Information Systems, Kaunas University of Technology, Kaunas, Lithuania
e-mail: aivaras.simulis@ktu.edu

Evaldas Vaičiukynas
Department of Information Systems, Kaunas University of Technology, Kaunas, Lithuania
e-mail: evaldas.vaiciukynas@ktu.lt

Abstract: The goal of this research is the robust detection of Parkinson’s disease by acoustic analysis of sustained voice recordings. Application of signal decomposition into intrinsic mode functions (IMFs) is investigated as a novel type of audio features, together with a custom solution for decision-level fusion that employs statistical functionals to compress decisions from all IMFs. Proposed audio features are perceptual linear predictive cepstral coefficients (PLPCCs) estimated on the extracted components from one or several equally-spaced windows of the audio signal. Decompositions used are empirical mode decomposition (EMD) and variational mode decomposition (VMD). Random forest (RF) is used as a base detector as well as a meta learner for decision-level fusion. Cost of log-likelihood ratio (Cllr) and equal error rate (EER) were used to measure goodness-of-detection. A baseline solution using PLPCCs from all frames was compared to several types of decision-level fusion (EMD, VMD, and EMD+VMD) using 1–3 windows of various sizes (10–100 ms). Decomposition-based PLPCCs and EMD+VMD fusion from three 30 ms windows resulted in detection performance with an average EER of 6.5% and clearly outperformed the baseline solution of decomposition-less PLPCCs, which had an EER of 32.9%. Variable importance from the meta RF found both decompositions useful and variance-related statistics of base RF decisions the most important.

Keywords: Parkinson’s disease; voice analysis; empirical mode decomposition; variational mode decomposition; PLPCC; random forest; medical decision support

Copyright © 2017 held by the authors

I. INTRODUCTION

Parkinson’s disease (PD) is the second most common neurodegenerative disease after Alzheimer’s [1], and the prevalence of PD is expected to increase due to population ageing. If PD is detected early, its progression may be researched more closely using neuroprotective strategies, which could result in increased life span and improved living conditions.

Among its many symptoms, PD induces speech disorders, which can be observed as early as 5 years before the diagnosis [2]. Investigations show that Parkinsonian vocal dysfunction can be characterized by reduced vocal tract volume and reduced tongue flexibility, significantly narrower pitch range, longer pauses, and smaller variations in pitch range, voice intensity level, and articulation rate. Therefore, acoustic analysis is considered by many researchers an important non-invasive approach for PD detection. A detailed review of the related work can be found in [3].

Most studies use audio features obtained from the entire recording directly (global attributes) or calculated from short-term frames (local attributes). Frame-based features are usually compressed with statistical functionals or a Gaussian mixture model [3]. Some studies use large feature sets, aiming for a comprehensive characterization of the voice signal, while others rely on a “clinically useful” set of measures or perform feature selection to collect a compact set of audio descriptors. In this work, we investigate whether applying signal decomposition to a few short-term frames of a sustained voice recording and calculating perceptual linear predictive cepstral coefficients (PLPCCs) for the extracted components can outperform a simple decomposition-less approach of using PLPCCs from all frames.

The small size of previously used databases and data samples (fewer than 60 PD subjects) is a major deficiency that results in unreliable estimates of reported performance. Incorrect assessment of accuracy is another common deficiency: studies often do not conform to a leave-one-subject-out [3] or leave-one-individual-out [4] validation scheme. The need for such a scheme arises when a subject has several recordings; all recordings of a subject should be included either in a training fold or in a testing fold, but not in both.

This work explores the application of two techniques for voice signal decomposition into intrinsic mode functions, namely empirical mode decomposition (EMD) and variational mode decomposition (VMD), introduces a novel decision-level fusion approach using EMD and VMD base detectors, and addresses the aforementioned deficiencies by using a relatively large database and leave-one-subject-out validation for the task of PD detection.

The paper is organized as follows: feature extraction is described in Section II, the voice recordings database in Section III, the detection methodology in Section IV, and the results of experiments in Section V, while conclusions are drawn in Section VI.

II. FEATURE EXTRACTION

The audio features of choice were PLPCCs, which were extracted either from all frames of a recording and compressed using statistical functionals (baseline solution) or from a few frames after applying decomposition and extracting IMFs (researched solution). Two approaches to decomposition were considered: empirical mode decomposition (EMD) and variational mode decomposition (VMD). Therefore, the researched solution proposes EMD-PLPCC and VMD-PLPCC as audio descriptors.

A. Signal decomposition

Empirical mode decomposition is a recent method for non-stationary signal analysis which has found extensive application in many areas of science and engineering [5, 6]. EMD decomposes signals into functions of time, from which spectral information may also be obtained. Therefore, EMD lends itself well to extraction of temporal as well as spectral descriptors from the original signal [7]. Application of EMD to the analysis of speech signals is scarce. It is an adaptive technique that allows decomposition of non-linear and non-stationary data into intrinsic mode functions. An intrinsic mode function (IMF) satisfies the following two conditions [8]:

1. The number of maxima, which are strictly positive, and the number of minima, which are strictly negative, for each IMF, are either equal or differ at most by one.
2. The mean value of the envelope, as defined by the maxima and the minima, for each IMF, is zero.

The technique for decomposition of the data into IMFs is known as sifting, a brief description of which follows:

1. For a given discrete time signal $x[n]$, all the local minima and maxima of $x[n]$ are identified.
2. The upper envelope $E_U$ is calculated by using a cubic spline to connect all the local maxima. Similarly, the lower envelope $E_L$ is calculated from the local minima. The upper and lower envelopes should cover all the data in $x[n]$ between them.
3. The mean $E_{mean} = (E_U + E_L)/2$ of the upper and lower envelopes is calculated, and $x[n]$ is updated by subtracting the mean from it: $x[n] \leftarrow x[n] - E_{mean}$.
4. The previous three steps are executed until $x[n]$ is reduced to an IMF $c_1[n]$, which conforms to the properties of IMFs described previously. The first IMF contains the highest oscillation frequencies found in the original data $x[n]$.
5. The first IMF $c_1[n]$ is subtracted from $x[n]$ to get the residue $r_1[n]$.
6. The residue $r_1[n]$ is now taken as the starting point instead of $x[n]$, and the previously mentioned steps are repeated to find all the IMFs $c_i[n]$, so that the final residue $r_K$ becomes a constant, a monotonic function, or a function with a single maximum and minimum from which no further IMF can be extracted.

Therefore, at the end of the decomposition, we can represent $x[n]$ as the sum of $K$ IMFs and a residue $r_K$:

$$x[n] = \sum_{i=1}^{K} c_i[n] + r_K[n] \qquad (1)$$

The EMD algorithm variant used is complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) [7], which automatically selects the number of intrinsic mode functions (IMFs). The number of IMFs selected for EMD was then also used in the VMD algorithm.

Variational mode decomposition [9] decomposes the signal into various modes, or intrinsic mode functions, using calculus of variations. Each mode of the signal is assumed to have compact frequency support around a central frequency. VMD tries to find these central frequencies and the intrinsic mode functions centered on them concurrently, using an optimization methodology called the alternating direction method of multipliers (ADMM). The original formulation of the optimization problem is continuous in the time domain. The constrained formulation is given in [9].

B. PLPCC

The idea of a perceptual front end for determining linear prediction cepstral coefficients (PLPCCs) has been applied in different ways to improve speech detection and coding, as well as noise reduction, reverberation suppression, and echo cancellation. Linear prediction of a signal is done via autoregressive moving average (ARMA) modeling of the time series. In an ARMA model, the current sample is expressed as:

$$y[n] = \sum_{p=0}^{P} a_p \, x[n-p] + \sum_{q=1}^{Q} b_q \, y[n-q] \qquad (2)$$

where $x[n]$ is the current input signal and $y[n]$ is the current output. The perceptual linear prediction coefficients are created from the linear prediction coefficients by performing perceptual processing before performing the autoregressive modeling [10]. The main sequence of steps for PLPCC calculation is provided by a diagram in Fig. 1.

Fig. 1. Calculating perceptual linear prediction cepstral coefficients.

After this processing, cepstral conversion is performed. This is because linear prediction coefficients are very sensitive to frame synchronization and numerical error. In other words, the linear prediction cepstral coefficients are much more stable than the linear prediction coefficients themselves [10].

For the baseline solution, PLPCCs extracted from all frames were compressed using the following statistical functionals: min, max, mean, median, trimean, standard deviation, inter-quartile range (IQR), lower quartile (Qlo), upper quartile (Qup), lower range (IQlo), upper range (IQup), skewness, and kurtosis.

For the researched solution, only one or a few frames were used, and PLPCCs were calculated both for the unprocessed frame and for each of the IMFs after signal decomposition. The number of feature vectors constructed equals the number of IMFs; each vector is a concatenation of the original PLPCCs with the IMF-based PLPCCs.

III. VOICE DATABASE

The voice database had 383 speakers (141 men and 242 women), where each speaker was typically represented by 3 recordings (419 recordings for men and 715 recordings for women) of sustained voicing of the vowel /a/, each at least 2 seconds in length. Speaker age ranged from 16 to 82 for HC and from 39 to 85 for PD. A detailed summary of the voice recording database is given in Table I. Recordings were collected in a sound-proof chamber using an acoustic cardioid microphone (AKG Perception 220) placed at a distance of 10 cm from the mouth. The audio format was “.wav” (16-bit mono PCM with 44.1 kHz sampling frequency). The HC voice subgroup encompassed healthy volunteer individuals who considered their voice normal, had no complaints concerning their voice, and had no history of chronic laryngeal diseases or other long-lasting voice disorders. Voices of these individuals were also confirmed as healthy by clinical specialists. Furthermore, no pathological alterations in the larynx of the subjects from the HC voice subgroup were found during video laryngostroboscopy.

TABLE I. SUMMARY OF VOICE RECORDINGS DATABASE
(number of speakers, with number of recordings in parentheses)

Recordings   Parkinson (PD)   Healthy (HC)   Total
Men          36 (107)         105 (312)      141 (419)
Women        39 (116)         203 (599)      242 (715)
Total        75 (223)         308 (911)      383 (1134)

IV. DETECTION METHODOLOGY

A. Random Forest

Random forest (RF) [11] is a well-known pattern recognition algorithm, suitable for a detection task. RF is a committee of decision trees, where the final decision is obtained by majority voting. The core idea of RF is to combine many ($B$ in total) decision trees, built using different bootstrap samples of the original data set and a random subset (of predetermined size $q$) of features $x^1, \ldots, x^p$. RF is known to be robust against over-fitting, and as the number of trees increases, the generalization error converges to a limit [11]. For our experiments $B$ was set to 5000, several specific values of $q$ ($\sqrt{p}$, $2\sqrt{p}$, $\frac{1}{2}\sqrt{p}$) were tested, and the best performing $q$ setting was retained.

The generalization performance of RF was evaluated using internal out-of-bag (OOB) validation, where each observation is classified only by the trees which did not have this observation in their bootstrap sample during construction. It is well known that OOB validation provides an unbiased estimate of the test set error, similar to a leave-one-out scheme. Because of the “repeated measures” aspect of voice data, where each subject is represented by several recordings of a sustained vowel, the sampling part of the RF algorithm [12] had to be modified to ensure that all recordings of each subject are either included in a bootstrap sample or left aside as OOB. Such a modification corresponds to a leave-one-subject-out scheme, which helps to avoid speaker detection intermingling with pathology detection. Additionally, the RF setting of stratified sampling was configured to preserve the class ratio and gender balance of the full dataset in each drawn bootstrap sample.

B. Decision-level Fusion

Individual RFs were built independently for each decomposition variant (EMD or VMD) and each frame location (center frame only or 3 frames in equally-spaced locations), and the decisions of these individual experts were combined in a meta-learner fashion. RF was used both as a base learner and as a meta learner. This implies that outputs of the first-stage RF models, after the decisions for all IMFs of a single recording are compressed using statistical functionals, are treated as inputs (meta-features) for another RF in the second stage.

For the detection task, a decision from a base RF is the difference between class posterior probabilities. Given a trained RF, this difference, a variant of soft decision, is estimated as:

$$d(t_1, \ldots, t_L) = \frac{\sum_{i=1}^{L} f(t_i, x, q=2)}{L} - \frac{\sum_{i=1}^{L} f(t_i, x, q=1)}{L}, \qquad (3)$$

where $x$ is the object being classified, $L$ is the number of trees $t_1, \ldots, t_L$ in the RF for which observation $x$ is OOB, $q$ is a class label (label number 1 corresponds to HC and 2 to PD), and $f(t_i, x, q)$ stands for the $q$th class frequency in the leaf node into which $x$ falls in the $i$th tree $t_i$ of the forest:

$$f(t_i, x, q) = \frac{n(t_i, x, q)}{\sum_{j=1}^{Q} n(t_i, x, j)}, \qquad (4)$$

where $Q$ is the number of classes and $n(t_i, x, q)$ is the number of training data from class $q$ falling into the same leaf node of $t_i$ as $x$.

For the baseline solution, there was no need for decision-level fusion and a single base RF was enough. For the researched solution, decision-level fusion used decisions from the decomposition-based base RFs. A base RF was constructed using all components (IMFs) resulting from EMD or VMD in a specific frame. The number of decisions from a base RF corresponded to the number of extracted components. This varying number of base decisions was compressed into meta-features using the following statistical functionals: min, max, mean, median, trimean, standard deviation, inter-quartile range (IQR), lower quartile (Qlo), upper quartile (Qup), lower range (IQlo), upper range (IQup), skewness, and kurtosis.

Meta-features in the fusion RF were also investigated by performing permutation-based variable importance analysis, using mean decrease in accuracy as the variable importance measure. Values of each meta-feature are permuted several times and the mean difference in fusion RF performance on OOB data is estimated.

C. Assessing Detection

To evaluate the goodness of detection, the detector’s scores for OOB data were used. Votes of the RF were converted to a proper score vector by normalizing votes for a specific class through division by the total number of times the case was OOB, as in formula (3). A quick way to compare detectors is the equal error rate (EER). The cost of log-likelihood-ratio (Cllr) is a comprehensive detection metric, used here as the main criterion for model selection. The log-likelihood-ratio is the logarithm of the ratio between the likelihood that the target (PD) produced the signal and the likelihood that a non-target (HC) produced the signal. EER and Cllr measures were estimated using the ROC convex hull method, available in the BOSARIS toolkit [13]. A well-calibrated and useful detector should have Cllr < 1 and EER < 50%.

V. EXPERIMENTS

A. Experimental Setup

For our frame-based features several window sizes were tested: 10, 20, 30, 40, 50, 60, 70, 80, 90 and 100 milliseconds. For the researched solution, different quantities of windows were extracted and detection performance compared: 1 window from the center of the recording versus 2 or 3 windows placed at equally-spaced locations to evenly cover the whole recording. For each setting of window size and quantity, three decision-level fusion variants were tested: EMD, VMD and EMD+VMD. The number of PLPCCs used was 12 in both the baseline and the researched solutions.

From the results of the initial experiments, the single best window size was selected. Sensitivity analysis was then performed by repeating the decomposition for each recording 5 times. Each of the 5 different collections of IMFs, converted into EMD-PLPCCs or VMD-PLPCCs, was used in a base RF, and construction of the base RF was repeated 5 times. Finally, each of the 25 results from decomposition and base RF runs was fused using the meta RF, and the fusion was repeated 5 times. All runs resulted in 125 detection performance measures (EER and Cllr), and statistical testing for equality of central tendencies (mean and median) was performed to compare the setting of using 1 central window only versus using 3 windows at equally-spaced locations. A parametric independent samples t-test was used to compare means. The non-parametric Wilcoxon rank-sum test (Mann-Whitney U-test) was used to compare medians.

B. Results

From the results of the initial experiment with various window sizes in Table II, we conclude that a 30 ms window size, 1 or 3 windows, and fusion of EMD+VMD could be a recommended setting for the researched solution. Both window quantities had the same EER of 2.22%, but Cllr was slightly better for 3 windows (0.079) than for 1 window (0.085). VMD consistently outperformed EMD, but the best overall performance was found when fusing EMD+VMD. The best window size was difficult to identify, because the results change from one window size to another, sometimes rather erratically. This could be due to the stochastic nature of EMD and VMD, where optimization results in sub-optimal solutions and non-identical IMFs when run repeatedly.

TABLE II. DETECTION PERFORMANCE USING VARIOUS WINDOW SIZES, WINDOW QUANTITIES AND TYPE OF DECOMPOSITION FOR FUSION BY META RF.
(columns grouped by type of decomposition used in decision-level fusion; EER in %)

Size (s)  Quantity   EMD Cllr  EMD EER   VMD Cllr  VMD EER   EMD+VMD Cllr  EMD+VMD EER
0.01      1          0.718     26.23     0.237      6.65     0.215          6.10
0.01      2          0.617     20.83     0.754     26.20     0.565         19.12
0.01      3          0.115      3.03     0.245      6.71     0.079          2.24
0.02      1          0.448     13.96     0.682     23.03     0.375         12.42
0.02      2          0.385     13.08     0.610     20.32     0.336         10.45
0.02      3          0.321      9.70     0.499     16.66     0.255          7.72
0.03      1          0.456     13.92     0.100      2.65     0.085          2.22
0.03      2          0.306     10.07     0.503     14.95     0.264          8.27
0.03      3          0.342     11.32     0.115      3.33     0.079          2.22
0.04      1          0.419     11.77     0.586     19.08     0.359         11.51
0.04      2          0.379     11.69     0.541     17.01     0.313         10.05
0.04      3          0.451     13.70     0.100      2.87     0.010          2.90
0.05      1          0.355     10.71     0.583     19.02     0.303          8.53
0.05      2          0.347     10.92     0.539     16.67     0.300          9.73
0.05      3          0.216      5.80     0.449     13.36     0.180          5.37
0.06      1          0.316     10.08     0.555     17.16     0.266          7.88
0.06      2          0.328     10.07     0.441     14.23     0.265          8.41
0.06      3          0.244      7.39     0.419     13.24     0.220          6.54
0.07      1          0.405     11.84     0.561     17.15     0.349         10.61
0.07      2          0.450     14.68     0.474     15.48     0.380         12.73
0.07      3          0.236      6.07     0.356     12.15     0.190          5.40
0.08      1          0.379     12.09     0.529     16.85     0.309          9.93
0.08      2          0.323     11.27     0.505     15.91     0.288          9.78
0.08      3          0.297      9.87     0.448     14.11     0.275          9.13
0.09      1          0.443     13.34     0.591     19.59     0.368         11.24
0.09      2          0.349     10.56     0.504     16.10     0.307          9.84
0.09      3          0.317     10.11     0.458     13.90     0.278          9.28
0.10      1          0.363     10.78     0.495     16.21     0.252          7.66
0.10      2          0.390     12.17     0.509     15.64     0.359         12.07
0.10      3          0.254      7.21     0.109      3.86     0.074          2.73

The summary of results from Table II for 1 and 3 windows is illustrated in Figs. 2 and 3. Both settings of window quantity were able to provide the lowest EER of 2.2% with the 30 ms window size, but performance when using 3 windows appears to be more stable and fluctuates less with respect to window size.

Fig. 2. EMD+VMD detection performance by EER and Cllr using 1 window.

Fig. 3. EMD+VMD detection performance by EER and Cllr using 3 windows.

Results of the baseline solution (see Fig. 4), when extracting PLPCCs from all frames (without any decomposition) and compressing with statistical functionals, were rather stable irrespective of the window size. The best detection performance was 32.9% by EER and 0.895 by Cllr.

Fig. 4. Detection performance by EER and Cllr for the baseline solution.

Variable importance from the meta RF when using 3 windows and the EMD+VMD type of decision-level fusion is shown in Fig. 5. Both EMD and VMD types of decomposition are useful, but among the statistical functionals, variance-related measures of base decisions (standard deviation and inter-quartile range with its lower part) appear to be the most important for the detection.

Fig. 5. Variable importance from fusion RF (EMD+VMD) using 3 windows.

Sensitivity analysis was performed by repeating the detection task using the 30 ms window size and EMD+VMD fusion. The distribution of the resulting Cllr and EER measures is illustrated by boxplots in Figs. 6 and 7.

Fig. 6. Distribution of Cllr after 125 repetitions of the detection task, visualized by boxplot (quartiles, median, mean): 1 (left) vs 3 windows (right).

Fig. 7. Distribution of EER after 125 repetitions of the detection task, visualized by boxplot (quartiles, median, mean): 1 (left) vs 3 windows (right).

TABLE III. TEST OF CLLR FOR NORMALITY OF 1 AND 3 WINDOWS

Test name        Statistic (1 window)  Statistic (3 windows)  p-value (1 window)  p-value (3 windows)
Doornik-Hansen   12.3997               34.4653                0.00202975          3.28061e-008
Shapiro-Wilk W   0.926049              0.863831               3.66809e-006        2.39331e-009
Lilliefors       0.119188              0.197622               ~= 0                ~= 0
Jarque-Bera      7.1929                13.3053                0.0274209           0.00129062

TABLE IV. TEST OF EER FOR NORMALITY OF 1 AND 3 WINDOWS

Test name        Statistic (1 window)  Statistic (3 windows)  p-value (1 window)  p-value (3 windows)
Doornik-Hansen   11.2948               52.6303                0.00352675          3.72794e-012
Shapiro-Wilk W   0.952697              0.840665               0.00025031          2.69682e-010
Lilliefors       0.0835316             0.204496               ~= 0.03             ~= 0
Jarque-Bera      6.0258                15.6308                0.049149            0.00040348

Due to the lack of normality in the Cllr and EER measures (see Tables III and IV), non-parametric statistical testing was performed. Medians of Cllr and EER were compared between the 1 window and 3 windows settings using the Wilcoxon rank-sum test, and the results are provided in Table V. A statistically significant difference with 99% confidence (p-value < 0.01) was indicated in favor of using 3 windows.

TABLE V. HYPOTHESIS OF EQUAL CENTRAL TENDENCIES TESTING FOR CLLR AND EER USING WILCOXON RANK-SUM TEST

Null hypothesis   The two medians are equal
Cllr statistics   n1 = 125, n2 = 125
                  w (sum of ranks, sample 1) = 23500
                  z = (23500 - 15687.5) / 571.684 = 13.6658
                  P(Z > 13.6658) = 0, Two-tailed p-value = 0
EER statistics    n1 = 125, n2 = 125
                  w (sum of ranks, sample 1) = 23500
                  z = (23500 - 15687.5) / 571.684 = 13.6658
                  P(Z > 13.6658) = 0, Two-tailed p-value = 0

VI. CONCLUSIONS

Decomposition-based PLPCC features, namely EMD-PLPCC and VMD-PLPCC, were found useful for building an expert system for Parkinson’s detection from sustained voice. The baseline solution without decomposition and decision-level fusion, where PLPCCs were obtained from all frames and compressed using statistical functionals, resulted in the lowest EER of 32.90% for the 30 ms window size. The researched solution using decision-level fusion of PLPCCs from EMD and VMD, obtained from 3 windows at equally-spaced locations, resulted in an EER as low as 2.22% for the 30 ms window size.

Sensitivity analysis was performed by fixing the window size at 30 ms and repeating the detection task 125 times to choose between fusion of components from 1 or 3 windows. Fusion of EMD+VMD from 1 window provided an average EER of 12.13%, whereas fusion of EMD+VMD from 3 windows provided an average EER of 6.48%. Detection performance when using decision-level fusion of decomposition results from 3 evenly located windows was better than when using 1 central window, and the difference in detection performance was statistically significant, as indicated by statistical tests.

The main remaining challenge is the lack of numerical stability in the decomposition output, where extracted IMFs were not identical between runs. This limitation leaves the choice of the window size questionable, but future work could exploit a multitude of sub-optimal IMFs and use them to boost data amounts when building detectors. Not only could the first-stage detectors be built on more data, but the statistical functionals that compress decisions of base detectors for the second stage would also benefit from an increased number of decisions: statistics for decision-level fusion would be obtained not from 6–12 components, as in the current solution, but from several times more. We speculate that the meta-learner in the second stage could achieve robustness by using all IMFs of several repeated decomposition runs.

ACKNOWLEDGMENT

The voice database was collected at the Lithuanian University of Health Sciences (LUHS) under a grant (No. MIP-075/2015) from the Research Council of Lithuania. The authors would like to thank LUHS specialists: otorhinolaryngologist Evaldas Padervinskis and neurologist Jolita Čičelienė.

REFERENCES

[1] M. C. de Rijk, L. J. Launer, K. Berger, M. M. B. Breteler, J. F. Dartigues, M. Baldereschi, L. Fratiglioni, A. Lobo, J. M. Martínez-Lage, C. Trenkwalder, A. Hofman, “Prevalence of Parkinson’s disease in Europe: a collaborative study of population-based cohorts,” Neurology, vol. 54(11 Suppl 5), S21-S23, 2000.
[2] B. Harel, M. Cannizzaro, P. J. Snyder, “Variability in fundamental frequency during speech in prodromal and incipient Parkinson’s disease: A longitudinal case study,” Brain and Cognition, vol. 56, pp. 24-29, 2004.
[3] J. R. Orozco-Arroyave, F. Honig, J. D. Arias-Londono, J. F. Vargas-Bonilla, K. Daqrouq, S. Skodda, J. Rusz, E. Noth, “Automatic detection of Parkinson’s disease in running speech spoken in three different languages,” J. Acoust. Soc. Am., vol. 139(1), pp. 481-500, 2016.
[4] C. O. Sakar, O. Kursun, “Telediagnosis of Parkinson’s disease using measurements of dysphonia,” J. Med. Syst., vol. 34(4), pp. 591-599, 2010.
[5] M. F. Kaleem, L. Sugavaneswaran, A. Guergachi, S. Krishnan, “Application of empirical mode decomposition and Teager energy operator to EEG signals for mental task classification,” Proceedings of the International Conference of the IEEE Engineering in Medicine and Biology (EMBC), pp. 4590-4593, 2010.
[6] B. Mijovic, M. De Vos, I. Gligorijevic, J. Taelman, S. Van Huffel, “Source separation from single-channel recordings by combining empirical mode decomposition and independent component analysis,” IEEE Trans. Biomed. Eng., vol. 57(9), pp. 2188-2196, 2010.
[7] M. E. Torres, M. A. Colominas, G. Schlotthauer, “A complete Ensemble Empirical Mode decomposition with adaptive noise,” Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP-11), pp. 4144-4147, 2011.
[8] N. E. Huang, Z. Shen, S. R. Long, M. C. Wu, H. H. Shih, Q. Zheng, N. Yen, C. C. Tun, H. H. Liu, “The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis,” Proc. R. Soc. Lond., A 454(1971), pp. 903-995, 1998.
[9] K. Dragomiretskiy, D. Zosso, “Variational Mode Decomposition,” IEEE Trans. Signal Process., vol. 62(3), pp. 531-544, 2014.
[10] H. Hermansky, “Perceptual Linear Predictive (PLP) Analysis of Speech,” J. Acoust. Soc. Am., vol. 87(4), pp. 1738-1752, 1990.
[11] L. Breiman, “Random forests,” Mach. Learn., vol. 45, pp. 5-32, 2001.
[12] A. Jaiantilal, “Random forest (regression, classification and clustering) implementation for Matlab (and standalone),” 2012. http://code.google.com/archive/p/randomforest-matlab/.
[13] N. Brummer, E. de Villiers, “The BOSARIS toolkit: Theory, algorithms and code for surviving the new DCF,” arXiv 1304(2865v1), pp. 1–23. Presented at the NIST SRE 2011 Analysis Workshop, Atlanta, December 2011. http://sites.google.com/site/bosaristoolkit.
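
Worked check of the rank-sum statistics in Table V. The z value reported for the Wilcoxon rank-sum test can be reproduced from the sample sizes and the rank sum alone. The sketch below assumes the usual normal approximation with no tie correction and no continuity correction; the function name `rank_sum_z` is hypothetical and is not taken from the authors' software.

```python
import math


def rank_sum_z(w, n1, n2):
    """Normal-approximation z statistic for the Wilcoxon rank-sum test.

    Assumption: no tie correction and no continuity correction.
    Returns (z, mean of W under H0, standard deviation of W under H0).
    """
    n = n1 + n2
    mean_w = n1 * (n + 1) / 2.0                 # expected rank sum of sample 1
    sd_w = math.sqrt(n1 * n2 * (n + 1) / 12.0)  # its standard deviation
    return (w - mean_w) / sd_w, mean_w, sd_w


n1 = n2 = 125
# Largest rank sum sample 1 can reach: it occupies ranks n2+1 .. n1+n2.
w_max = sum(range(n2 + 1, n1 + n2 + 1))

z, mean_w, sd_w = rank_sum_z(23500, n1, n2)
print(w_max, mean_w, sd_w, z)
```

Under these assumptions the mean and standard deviation match the 15687.5 and 571.684 shown in Table V. The reported rank sum of 23500 also equals the largest value attainable with two samples of 125, which means that every value in sample 1 ranked above every value in sample 2; this is why the p-value is reported as numerically zero.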