<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Two-Step Framework for Parkinson's Disease Classification: Using Multiple One-Way ANOVA on Speech Features and Decision Trees</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>wikiHow</string-name>
          <email>gaurang@wikihow.com</email>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Rensselaer Polytechnic Institute</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>We propose a two-step classification framework to diagnose Parkinson's Disease (PD) using speech samples. In the first step, multiple one-way ANalysis Of VAriance (ANOVA) tests are used on independent subsets of vocal features to extract the best set of features from each speech processing algorithm. These extracted feature subsets are then merged with other baseline vocal features (shimmer, jitter, pitch, harmonicity, vocal fold, and fundamental frequency parameters) to form the training feature set. In the second step, this combined training set is used to train an extreme gradient boosting (XGBoost) classification model, a decision-tree-based algorithm. The overall model performance was scored and evaluated using the Receiver Operating Characteristic Area Under Curve (ROC AUC), F-Measure, Matthews Correlation Coefficient (MCC), and accuracy. It was then compared with benchmarked statistical classifiers and other studies that use different combinations of features from this PD dataset. We apply one-way ANOVA to different speech feature sets to extract the best features without losing useful vocal information. Our classification performance outperforms state-of-the-art PD classification models that use generic feature selection methods or use only one or more of the vocal feature subsets.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        PD is one of the most common degenerative diseases of the
motor system and results from the loss of cells in
various parts of the brain. Its primary symptoms are tremor,
slow movement, speech disorders, impaired balance, and gait
problems. There are no definitive diagnostic tests or biomarkers
for PD because its symptoms resemble those of
other diseases. Physicians use methods such as MRI,
ultrasound, and blood tests to rule out conditions with
similar symptoms. Research has also been done to detect PD
using various motor and non-motor symptoms
        <xref ref-type="bibr" rid="ref25">(Tolosa et al.
2009)</xref>
        . However, there is no standard method for diagnosing PD.
      </p>
      <p>
        PD diagnosis has typically involved measuring the
severity of symptoms using non-invasive medical techniques.
Since approximately 90% of PD patients suffer from speech
disorders, analyzing speech samples to study vocal
impairment is the most common technique for
PD diagnosis
        <xref ref-type="bibr" rid="ref23">(Shahbakhi, Far, and Tahami 2014)</xref>
        . The
extent of vocal impairment is typically assessed using
sustained vowel phonations
        <xref ref-type="bibr" rid="ref12">(Little et al. 2008)</xref>
        . Sustained
vowel phonations do not capture all morphological or
lexical speech features, but research shows that they are
sufficient for distinguishing between PD subjects and healthy
controls
        <xref ref-type="bibr" rid="ref5">(Gürüler 2017)</xref>
        . Most PD classification studies
using speech features have been focused on jitter, shimmer,
and signal-to-noise ratio. Recent studies have also used other
vocal features like fundamental frequency parameters,
Mel-Frequency Cepstral Coefficients (MFCCs), harmonicity
features, Wavelet Transform (WT)-based features, and Tunable
Q-factor Wavelet Transform (TQWT)-based features to
better understand speech deterioration. TQWT was first used in
2019 for PD classification and was shown to perform
better than other vocal features for PD diagnosis
        <xref ref-type="bibr" rid="ref22">(Sakar et al.
2019)</xref>
        . The performance of PD classification models
depends directly on the selection of vocal features used for
training them.
      </p>
      <p>Past studies have used different combinations of the
aforementioned features to train classifiers without any focus on
extracting useful features from different types of vocal
features. This study proposes a novel two-step classification
framework for PD diagnosis. The first step uses multiple
one-way ANOVAs to extract vocal features from MFCCs,
WTs, and TQWTs separately. Extracted feature sets are
merged with other baseline vocal features to form the
final training set. In the second step, a decision-tree based
classifier is trained on this training set to make predictions.
To the best of our knowledge, this is the first PD
classification study that employs a multiple ANOVA strategy to
extract the best vocal features from TQWT, MFCCs, and WTs,
and combine all of them with standard baseline features like
jitter, shimmer, etc., to generate an extensive training set.
Our study shows that extracting features separately from each
subset not only prevents the loss of useful vocal/signal
information but also addresses the high-dimensional nature of the
dataset. Using a decision-tree based classifier on extracted
features also handles any class imbalance without the need
of oversampling or under-sampling the dataset.
Classification results obtained on the public dataset show that our
proposed two-step framework outperforms current
state-of-the-art models that use just one or more of the vocal feature
subsets without extracting the best features from individual
algorithms.</p>
    </sec>
    <sec id="sec-2">
      <title>Literature Review</title>
      <p>
        There are no laboratory tests or biomarkers for the diagnosis
of PD
        <xref ref-type="bibr" rid="ref1">(Cova and Priori 2018)</xref>
        . Consequently, there has been
significant research in measuring the severity of symptoms
to diagnose PD.
        <xref ref-type="bibr" rid="ref27">Tseng et al. (2014)</xref>
        have shown multiple
eye-tracking methods for PD diagnosis. Jansson et al. (2015)
proposed two approaches by using stochastic anomaly
detection in eye-tracking data. There have also been multiple
studies that use gait and tremor measures to diagnose PD
        <xref ref-type="bibr" rid="ref10 ref11 ref14 ref20">(Lee and Lim 2012; Manap, Tahir, and Yassin 2011)</xref>
        .
      </p>
      <p>
        Analyzing voice samples and deterioration has shown
great potential in the advancement of PD diagnosis
        <xref ref-type="bibr" rid="ref10 ref20">(Ramani
and Sivagami 2011)</xref>
        . Vocal impairment has also been shown
to be among the earliest symptoms of PD, detectable up to
five years before clinical diagnosis
        <xref ref-type="bibr" rid="ref15">(Oung et al. 2015)</xref>
        . This
aligns with clinical evidence, which shows that most PD
patients exhibit vocal disorders. These studies reinforce the
notion that speech samples reflect disease status after
extracting the necessary information from the vowel phonations.
      </p>
      <p>
        There have been multiple studies on PD classification
techniques using vocal features.
        <xref ref-type="bibr" rid="ref5">Gürüler (2017)</xref>
        proposed
a system using a complex-valued artificial neural
network with k-means clustering and achieved an accuracy of
99.52%.
        <xref ref-type="bibr" rid="ref2">Das (2010)</xref>
        also used neural networks and
demonstrated an accuracy of 92.9%.
        <xref ref-type="bibr" rid="ref17">Peker, Sen, and Delen (2015)</xref>
achieved a 98.1% accuracy using complex-valued neural
networks with minimum Redundancy Maximum Relevance
(mRMR) feature selection.
        <xref ref-type="bibr" rid="ref3">Gil and Manuel (2009)</xref>
        achieved
an accuracy of 90% using a multilayer perceptron and
Support Vector Machines (SVM). Karimi Rouzbahani and Daliri
(2011) used a K-Nearest Neighbor (KNN) classifier and
achieved an accuracy of 93.82%.
        <xref ref-type="bibr" rid="ref6">Hazan et al. (2012)</xref>
        proposed using a country-specific sample of the training data
and achieved a 94% accuracy. Many of these studies use
a public dataset consisting of 195 vocal measurements
belonging to 23 PD and 8 healthy controls
        <xref ref-type="bibr" rid="ref12">(Little et al. 2008)</xref>
        .
Another publicly available dataset used in the
aforementioned studies consists of multiple speech recordings of 20
PD and 20 healthy controls
        <xref ref-type="bibr" rid="ref21">(Sakar et al. 2013)</xref>
        . Since most
of the proposed PD classifiers perform analysis on one of
these datasets, the extracted vocal features from speech
samples largely overlap. Although high classification rates have
been reported in these studies, both of these datasets are
extremely small. Models trained on these datasets are prone to
overfitting to a very small sample of features.
        <xref ref-type="bibr" rid="ref22">Sakar et al.
(2019)</xref>
        have shown that the cross-validation methods used
in these studies cause biases since the number of controls in
them was minimal.
      </p>
      <p>
        <xref ref-type="bibr" rid="ref22">Sakar et al. (2019)</xref>
        collected 3 voice recordings each from
252 subjects to build a much larger dataset for PD
classification. Apart from the baseline vocal features used in
previous studies, they also extracted MFCCs, WTs, and for the
first time, TQWT-based features. They reported the
highest classification accuracy of 86% by using an SVM-Radial
Basis Function (SVM-RBF) classifier and just the MFCCs
feature set. By only using the TQWT-based features, they
reported the highest individual classifier accuracy of 85%
with an F-measure of 0.84 using a multilayer perceptron
classifier. They also demonstrated using a mRMR feature
selection algorithm on the entire feature set to select the
top-50 features. The mRMR top-50 feature selection improved
their classification accuracy to 86% with an F-measure of
0.84 using an SVM-RBF classifier. This was the first study
that used TQWT-based features for PD classification. It was
also the first study to report an improvement in diagnostic
accuracy by combining all features and selecting 50-best by
using a feature selection algorithm. They found that MFCCs
and TQWT contain complementary information, and
combining them improves the classification performance.
      </p>
      <p>
        Since then, there have been a few studies that have
proposed different classification methods using TQWT-based
features and this larger dataset built by
        <xref ref-type="bibr" rid="ref22">Sakar et al. (2019)</xref>
        .
        <xref ref-type="bibr" rid="ref4">Gunduz (2019)</xref>
        proposed two frameworks using
Convolutional Neural Networks (CNN). The first framework
combines all features and inputs them to a 9-layer CNN. The second
framework passes the feature sets to the parallel input
layers connected to the convolution layers in the CNN. They
achieved an accuracy of 84.9% by using a combination of
TQWT and baseline features. This was improved to 86.9%
by using triple feature sets that used TQWT, WT, and
baseline features. They reported that the TQWT features had the
best feature performance metrics among all classifiers.
      </p>
      <p>Solana-Lavalle, Galán-Hernández, and Rosas-Romero
(2020) proposed using a Wrapper Feature Selection method
along with an SVM classifier and obtained a classification
accuracy of 94.7% on the larger dataset. The feature
selection method used in this study did not account for the
biological and vocal features in the dataset separately and
instead selected the best K features suited to the used classifier.
Only 8 to 20 features were selected from the 754 vocal features.
This leads to a loss of valuable acoustic and signal
information, especially from the WT and TQWT-based features,
since these are extensive wavelet techniques that quantify
frequency deviations in speech signals and each contain 10+
original features. Wrapper feature selection methods try to find the
best set of features suited to a specific learning algorithm
by evaluating combinations of features against an
evaluation/performance metric, and thus there is also a high
chance of over-fitting to the training data.</p>
      <p>
        <xref ref-type="bibr" rid="ref18">Polat (2019)</xref>
        proposed a hybrid approach using a
combination of Synthetic Minority Over-Sampling Technique
(SMOTE) and a Random Forest Classifier (RFC). They
achieved an accuracy of 87.037% without SMOTE and a
higher accuracy of 94.89% by over-sampling the minority
class (healthy control) and then training an RFC. By
oversampling, this study changed the original dataset to
balance the classes. Over-sampling also increases the likelihood
of overfitting because it replicates the oversampled class
datapoints. It also does not consider that neighboring examples
can be from different classes. Studies on class-imbalanced
data have shown that SMOTE is not beneficial for
high-dimensional datasets
        <xref ref-type="bibr" rid="ref13 ref8">(Maldonado, López, and Vairetti 2019;
Joseph 2020)</xref>
        . This leads to overlap of classes and additional
noise in an already high-dimensional dataset
        <xref ref-type="bibr" rid="ref8">(Joseph 2020)</xref>
        .
      </p>
      <p>Compared to the previous work, our work is one of the
first studies to demonstrate an improved speech feature
selection methodology and a decision-tree based robust
classifier that handles class imbalance without having to modify
the original dataset by over-sampling or under-sampling.</p>
    </sec>
    <sec id="sec-3">
      <title>Dataset</title>
      <p>
        The dataset we used for the analysis was gathered at the
Department of Neurology in Cerrahpasa Faculty of Medicine,
Istanbul University
        <xref ref-type="bibr" rid="ref22">(Sakar et al. 2019)</xref>
        . It contains the
information of 188 patients with PD (107 men and 81
women) and 64 healthy controls (23 men and 41 women),
with ages varying between 41 and 82. The researchers set
the microphone to 44.1 kHz, and the sustained phonation of
the vowel “ahh...” was collected from each subject with
three repetitions. These phonations were fed into the Praat
acoustic analysis software to extract information about jitter,
shimmer, vocal fold, fundamental frequency, harmonicity,
Recurrence Period Density Entropy (RPDE), Detrended
Fluctuation Analysis (DFA), and Pitch Period Entropy (PPE)
from the signal. In the gathered dataset, these fundamental
vocal features, along with gender, are called baseline
features.
      </p>
      <p>
        MFCCs of a sound signal separate the impact of the vocal
cords (source) and vocal tract (filter) in the signal
        <xref ref-type="bibr" rid="ref19">(Poorjam
2018)</xref>
        . This helps detect deterioration in the movement of
articulators like the tongue and lips, which are affected by PD.
Higher-order MFCCs represent greater levels of spectral
detail. Typically, 10 to 20 MFCCs are used for speech analysis.
In this dataset, there are 13 original MFCCs and 71 derived
features that are formed with mean and standard deviation
of the original signals, in addition to the log-energy of the signal,
and their 1st and 2nd derivatives
        <xref ref-type="bibr" rid="ref22">(Sakar et al. 2019)</xref>
        .
      </p>
      <p>
        WT is used to analyze signals in terms of wavelets: time-
and frequency-domain-limited functions that detect regional
fluctuations. WT features of the fundamental frequency of the
speech signal (F0) have been used for PD diagnosis
        <xref ref-type="bibr" rid="ref4">(Gunduz 2019)</xref>
        .
It captures the amount of deviation in speech samples and
thus detects any distortions in vowel phonations. 10-level
discrete WT is applied to signals for extracting WT-based
features obtained from F0 and its log transformation. This
results in 182 features, including the log energy entropy and
Teager-Kaiser energy of both the approximation and detailed
coefficients
        <xref ref-type="bibr" rid="ref22">(Sakar et al. 2019)</xref>
        .
      </p>
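To illustrate the idea of multi-level wavelet decomposition, the sketch below applies a hand-rolled 10-level Haar DWT to a synthetic F0 contour; the Haar wavelet, signal, and lengths are illustrative assumptions, not the paper's actual transform settings:

```python
import numpy as np

def haar_dwt(signal):
    """One level of an orthonormal Haar discrete wavelet transform."""
    s = np.asarray(signal, dtype=float)
    approx = (s[0::2] + s[1::2]) / np.sqrt(2)
    detail = (s[0::2] - s[1::2]) / np.sqrt(2)
    return approx, detail

def multilevel_dwt(signal, levels):
    """Decompose a signal into `levels` detail bands plus a final
    approximation band."""
    details = []
    approx = np.asarray(signal, dtype=float)
    for _ in range(levels):
        approx, d = haar_dwt(approx)
        details.append(d)
    return approx, details

# Synthetic stand-in for a 1024-sample F0 contour.
rng = np.random.default_rng(0)
t = np.linspace(0, 8 * np.pi, 1024)
f0 = 120 + 5 * np.sin(t) + rng.normal(0, 1, 1024)

approx, details = multilevel_dwt(f0, levels=10)
print(len(details), approx.shape)  # 10 detail bands, 1 final approximation
```

Energy-based descriptors (such as the log energy entropy mentioned above) would then be computed from these coefficient bands.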
      <p>
        TQWT is a discrete-time wave transform, like WT.
TQWT uses 3 tunable parameters (Q, J, and r) to tune
it based on the behavior of the speech signal
        <xref ref-type="bibr" rid="ref22">(Sakar et al.
2019)</xref>
        . TQWT has been recently used in PD studies since it
can detect distortion in vocal fold vibrations. TQWT
parameters were set by considering the time domain
characteristics of the speech signals. The tunable Q-factor parameter is
related to the number of oscillations in the signals. A high
Q value is selected for signals with high oscillations in the
time domain. The parameter J is the number of
decomposition levels of the transformation. There are
J levels and J + 1 sub-bands coming from J high-pass
filters and one final low-pass filter. The redundancy
parameter, r, controls the excessive ringing to localize the wavelet
without affecting its shape
        <xref ref-type="bibr" rid="ref22">(Sakar et al. 2019)</xref>
        . First, the
value of the Q parameter is defined to control the oscillatory
behavior of the wavelets. The r parameter was set to be
equal to or greater than 3 to prevent undesired ringing in the
wavelets. To find the best accuracy values for the
different (Q, r) pairs, several levels (J) were searched in the
specified intervals; in total, 432 TQWT features were
extracted
        <xref ref-type="bibr" rid="ref22">(Sakar et al. 2019)</xref>
        . Table 1 describes the 4 feature
subsets in this dataset and the number of features in each.
      </p>
    </sec>
    <sec id="sec-4">
      <title>Methodology</title>
      <p>PD classification is treated as a binary classification task in
which the framework takes an input of extracted speech
features and predicts a class (PD/no PD). Figure 1 illustrates
the end-to-end classification framework for PD diagnosis.
The dataset contains 752 features in 4 feature sets: baseline
features, MFCCs, WT, and TQWT. The drawback of using
MFCCs, WT, and TQWT together is the ‘curse of
dimensionality’ problem. High-dimensional datasets lead to
overfitting, obscure useful vocal information, and cause
computational instability. Extracting a meaningful
set of features from each feature set is important to reduce
the dimensionality of the feature set while still ensuring that
all useful vocal features are retained. This will also reduce
the computational complexity of the classifier. We propose
using the one-way ANOVA selection schemes to extract the
best performing training features from MFCCs, WT, and
TQWT feature-sets. The selected features from each method
are merged with the baseline features. This merged feature
set serves as the training data for the classifier. We then
train an optimized XGBoost classifier on the training data
and evaluate its performance against past studies and
benchmarked statistical classification models.</p>
      <p>ANOVA Feature Selection</p>
      <p>
ANOVA is a statistical hypothesis test used to determine
whether the means from two or more samples of data come
from the same distribution or not. It is usually used in
problems involving numerical inputs and a classification target
variable. There are two types of ANOVA: one-way ANOVA
and two-way ANOVA. One-way ANOVA only involves one
independent variable, while two-way ANOVA compares two
independent variables.</p>
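A one-way ANOVA F-test on a single feature can be run directly with SciPy's f_oneway; the jitter-like values below are illustrative stand-ins, not measurements from the dataset:

```python
from scipy.stats import f_oneway

# Hypothetical jitter values for a PD group and a control group.
pd_group      = [0.61, 0.55, 0.72, 0.66, 0.58, 0.70]
control_group = [0.31, 0.28, 0.35, 0.30, 0.33, 0.29]

# A large F-statistic (small p-value) suggests the feature separates
# the two classes well.
f_stat, p_value = f_oneway(pd_group, control_group)
print(f_stat, p_value)
```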
      <p>To find how well each speech feature discriminates
between the two output classes, we use a one-way ANOVA
F-test. F-tests are a class of statistical tests that calculate
the ratio between variance values. For each feature, ANOVA
tests the null hypothesis (H0) that the output groups have the
same mean value of that feature. The alternate hypothesis (H1)
is that the group means differ. The ANOVA F-test produces
an F-score as the ratio of the variance among the group means
to the variance within the groups. Features whose group means
are the same or highly similar will have low between-group
variance and a low F-score. A high F-score implies that the
groups have different mean values for that feature, so the
feature discriminates better between the categories of the
dependent variable. The results of this test can be used for feature
selection, where features that are independent of the target
variable can be removed from the training set. The F-score
for each speech feature is calculated as follows:</p>
      <p>F = BGV / WGV</p>
      <sec id="sec-4-2">
        <title>Between Group Variability (BGV) and Within Group Variability (WGV)</title>
        <p>The BGV and WGV for each feature are calculated as:</p>
        <p>BGV = [ Σ_{i=1}^{K} n_i (Ȳ_i - Ȳ)² ] / (K - 1)</p>
        <p>WGV = [ Σ_{i=1}^{K} Σ_{j=1}^{n_i} (Y_ij - Ȳ_i)² ] / (N - K)</p>
        <p>Where K is the number of groups, N is the overall sample
size, and n_i is the number of observations in the ith group. Y_ij
is the jth observation in the ith of the K groups, Ȳ is the
overall mean of the variable set, and Ȳ_i is the sample mean
of the ith group. K - 1 is also defined as the degrees of
freedom in some studies, referring to the maximum number
of logically independent values with the freedom to vary.</p>
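As a sanity check, the F-ratio of between-group to within-group variability can be computed by hand on synthetic values and compared against SciPy's implementation:

```python
import numpy as np
from scipy.stats import f_oneway

# Two groups of a single feature (synthetic illustrative values).
g1 = np.array([2.1, 2.5, 2.3, 2.7])
g2 = np.array([1.0, 1.2, 0.9, 1.1])

K = 2
N = len(g1) + len(g2)
grand_mean = np.concatenate([g1, g2]).mean()

# Between-group variability: weighted squared deviations of group means.
bgv = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in (g1, g2)) / (K - 1)
# Within-group variability: squared deviations inside each group.
wgv = sum(((g - g.mean()) ** 2).sum() for g in (g1, g2)) / (N - K)
f_manual = bgv / wgv

f_scipy, _ = f_oneway(g1, g2)
print(np.isclose(f_manual, f_scipy))  # True
```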
        <p>
          The scikit-learn machine learning library provides a
native implementation of a one-way ANOVA F-test
(f_classif) and a SelectKBest class to pick
features with the highest F-scores. The F-test score function
returns an array of F-scores, one for each speech feature.
SelectKBest class then picks the first k features with the
highest scores
          <xref ref-type="bibr" rid="ref16">(Pedregosa et al. 2011)</xref>
          .
        </p>
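A minimal sketch of this selection step, using synthetic data in place of the speech features:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(0)
# Toy stand-in for one speech-feature subset: 40 samples, 6 features,
# with binary PD/control labels (synthetic, not the actual dataset).
y = np.array([0] * 20 + [1] * 20)
X = rng.normal(size=(40, 6))
X[y == 1, 0] += 2.0  # make feature 0 clearly discriminative

# f_classif runs a one-way ANOVA F-test independently on each feature.
f_scores, p_values = f_classif(X, y)

# SelectKBest keeps the k features with the highest F-scores.
selector = SelectKBest(score_func=f_classif, k=2)
X_best = selector.fit_transform(X, y)
print(X_best.shape)  # (40, 2)
```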
        <p>Using ANOVA feature selection on the entire dataset
leads to loss of vital vocal information. Each of the 54
baseline features provides fundamental and distinct speech
information. Removing any of these baseline features leads to
lost information, which is not available in any of the other
vocal feature sets. Just selecting the best k features from the
entire dataset using the highest F-scores leads to many
crucial original and derived features being left out. This is
especially observed in the highly dimensional WT and TQWT
feature subsets. This can also lead to overfitting to certain
derived features or a classification model that relies
primarily on features that perform well for that specific model
instead of features that represent the disease. To conserve vital
information obtained from each feature subset while also
addressing the broader dimensionality problem, we extract
features from each feature set separately. This ensures that the
original signals are retained and focuses on finding the best
performing derived features. All baseline features are used,
and the best ki features are extracted from MFCCs, WTs,
and TQWTs, respectively. ki is obtained for each subset
using grid-search cross-validation, which evaluated
different combinations of ki features from each
subset to find the optimal classification
performance. Forty features from MFCCs, 75 from WT, and 100
from TQWT were selected with the highest F-scores in their
category, and these were used along with baseline features
as the training set.</p>
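The per-subset extraction described above might be sketched as follows; the data are synthetic, while the subset widths (54 baseline, 84 MFCC, 182 WT, 432 TQWT) and the selected counts (40, 75, 100) follow the paper:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

def multi_anova_select(subsets, y, k_per_subset):
    """Run a separate ANOVA F-test selection on each feature subset
    and concatenate the survivors (the multi-ANOVA strategy)."""
    selected = []
    for name, X in subsets.items():
        k = min(k_per_subset[name], X.shape[1])
        selected.append(SelectKBest(f_classif, k=k).fit_transform(X, y))
    return np.hstack(selected)

rng = np.random.default_rng(1)
n = 60
y = np.array([0] * 30 + [1] * 30)
# Synthetic stand-ins for the MFCC (84), WT (182), and TQWT (432) subsets.
subsets = {"mfcc": rng.normal(size=(n, 84)),
           "wt": rng.normal(size=(n, 182)),
           "tqwt": rng.normal(size=(n, 432))}
# Subset sizes found by grid search in the paper: 40 MFCC, 75 WT, 100 TQWT.
k_per_subset = {"mfcc": 40, "wt": 75, "tqwt": 100}

# All 54 baseline features are kept and merged with the selections.
baseline = rng.normal(size=(n, 54))
X_train = np.hstack([baseline, multi_anova_select(subsets, y, k_per_subset)])
print(X_train.shape)  # (60, 269): 54 + 40 + 75 + 100
```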
      </sec>
      <sec id="sec-4-3">
        <title>XGBoost Classifier</title>
        <p>XGBoost is a robust gradient boosting library based on
ensemble tree boosting. Its fundamental function predicts a new
classification membership after each iteration. Predictions are
made from weak classifiers and are iteratively improved:
incorrect classifications from the previous iteration receive
higher weights, forcing the model to focus on improving them.
The final classification combines the improvements of all the
previously modeled trees. XGBoost is less susceptible to
overfitting because of its robust regularization framework. An
XGBoost classifier was trained on the training set extracted
after ANOVA. XGBoost's built-in cross-validation was used at
each iteration to obtain the optimal number of boosting
iterations in a single run. Grid-search cross-validation was used
to optimize the model parameters: the learning rate, number of
estimators, max depth, min child weight, gamma, subsample,
column sample by tree, number of threads, and scale positive
weight. The final hyper-parameters obtained are shown in
Table 2. The optimized model achieved the highest
classification accuracy of 94.78%. In the following section, we
evaluate our framework's performance against benchmarked
statistical models and other studies on this dataset.</p>
        <p>[Table 2: final XGBoost hyper-parameters obtained by grid-search cross-validation.]</p>
        <p>[Table 3: performance of benchmarked classifiers on each feature set (baseline, MFCC, WT, TQWT), on each set combined with the baseline and with each other, and on all features together.]</p>
        <p>[Table 4: performance metrics of the proposed multi-ANOVA + XGBoost framework (AUC 0.91) compared with combined ANOVA + XGBoost (AUC 0.89), Gunduz (2019) all features + CNN, Gunduz (2019) all features + SVM, Sakar et al. (2019) top-50 mRMR features + SVM (RBF), and Polat (2019) RFC; AUC was not reported for the last four.]</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Evaluation</title>
      <p>Evaluation metrics are needed to assess the predictive
performance of the proposed framework. Although accuracy is
a common metric, it may yield misleading results in case
of unbalanced class distribution. Evaluation metrics such as
F-measure, MCC, and ROC AUC can measure how well a
classifier performs, even in class imbalance cases. We use
ROC AUC, F-Measure, MCC, and accuracy to evaluate the
performance of the proposed framework against statistical
classifiers and other studies using this dataset. While using
individual feature sets, TQWT-based features perform
better than other feature subsets. Significant improvement in
classification performance is observed when one feature set
(baseline, MFCC, or WT) is complemented with TQWT
features. Using ANOVA to extract the best features and then
using them to train an XGBoost model performs better than
other state-of-the-art techniques proposed on this dataset.
Polat’s (2019) proposal to use SMOTE to over-sample the
minority class and train an RFC leads to a slightly better
classification accuracy (by 0.001). However, the AUC, F-measure,
and MCC metrics of Polat’s model are unknown. The
performance of benchmarked classifiers, including SVM, RFC,
and Gradient Boosting Classifier (GBC), using different
feature combinations is shown in Table 3. The performance
metrics of our proposed framework, compared to other
studies, are presented in Table 4. We also demonstrate that using
a multi-ANOVA strategy performs better than a single ANOVA
applied to the entire feature set.</p>
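The four metrics can be computed with scikit-learn; the predictions below are hypothetical, chosen only to show the calls on an imbalanced split:

```python
from sklearn.metrics import (accuracy_score, f1_score,
                             matthews_corrcoef, roc_auc_score)

# Hypothetical labels, hard predictions, and probability scores.
y_true  = [1, 1, 1, 1, 1, 1, 0, 0, 1, 1]
y_pred  = [1, 1, 1, 1, 0, 1, 0, 1, 1, 1]
y_score = [0.9, 0.8, 0.95, 0.7, 0.4, 0.85, 0.2, 0.6, 0.75, 0.9]

acc = accuracy_score(y_true, y_pred)   # fraction of correct predictions
f1  = f1_score(y_true, y_pred)         # harmonic mean of precision/recall
mcc = matthews_corrcoef(y_true, y_pred)  # robust to class imbalance
auc = roc_auc_score(y_true, y_score)   # ranking quality of the scores
print(acc, f1, mcc, auc)
```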
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>This paper presents a two-step classification framework to
diagnose PD using a set of 753 vocal features. We propose
a novel vocal-feature selection technique for PD
classification using multiple one-way ANOVA on the MFCCs, WT
and TQWT. The selected features are merged with
baseline vocal and biological features to form the training set.
We propose an XGBoost classifier trained on the extracted
data for PD classification. The proposed framework achieves
a classification accuracy of 94.71% with an F-1 of 0.965
and an MCC of 0.86. We show that the proposed
framework performs better than the state of the art without altering
the dataset by over or under-sampling. We demonstrate that
separately extracting features from different algorithms
reduces the dimensionality without the loss of any vital speech
information and performs better than a generic feature
selection technique. We also show that the proposed
framework performs better than benchmarked statistical
classifiers. Most literature on PD diagnosis relies on a very small
sample size collected from 20-30 persons. High levels of
accuracy in predictions of models based on a significantly
larger data set (i.e., 252 persons) have been demonstrated
in this paper. Thereby, the generalization capabilities of the
model are validated. Using the proposed framework,
clinical diagnosis of early-onset of PD will be consistent across
physicians, thereby eliminating the chances of
misdiagnosis. Specifically, the high levels of accuracy, F1, MCC, and
ROC AUC indicate that there is a very negligible chance of
missing a diagnosis. We have open-sourced the code used in
this study in a public GitHub repository (https://github.com/
Gaurangprasad/parkinson disease ANOVA classifier).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Cova</surname>
            ,
            <given-names>I.;</given-names>
          </string-name>
          and
          <string-name>
            <surname>Priori</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>Diagnostic biomarkers for Parkinson's disease at a glance: where are we</article-title>
          ?
          <source>Journal of Neural Transmission</source>
          <volume>125</volume>
          (
          <issue>10</issue>
          ):
          <fpage>1417</fpage>
          -
          <lpage>1432</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Das</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <year>2010</year>
          .
          <article-title>A comparison of multiple classification methods for diagnosis of Parkinson disease</article-title>
          .
          <source>Expert Systems with Applications</source>
          <volume>37</volume>
          (
          <issue>2</issue>
          ):
          <fpage>1568</fpage>
          -
          <lpage>1572</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Gil</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Manuel</surname>
            ,
            <given-names>D. J.</given-names>
          </string-name>
          <year>2009</year>
          .
          <article-title>Diagnosing Parkinson by using artificial neural networks and support vector machines</article-title>
          .
          <source>Global Journal of Computer Science and Technology</source>
          <volume>9</volume>
          (
          <issue>4</issue>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Gunduz</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <year>2019</year>
          .
          <article-title>Deep learning-based Parkinson's disease classification using vocal feature sets</article-title>
          .
          <source>IEEE Access</source>
          <volume>7</volume>
          :
          <fpage>115540</fpage>
          -
          <lpage>115551</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Gürüler</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <year>2017</year>
          .
          <article-title>A novel diagnosis system for Parkinson's disease using complex-valued artificial neural network with k-means clustering feature weighting method</article-title>
          .
          <source>Neural Computing and Applications</source>
          <volume>28</volume>
          (
          <issue>7</issue>
          ):
          <fpage>1657</fpage>
          -
          <lpage>1666</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Hazan</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Hilu</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Manevitz</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Ramig</surname>
            ,
            <given-names>L. O.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Sapir</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <year>2012</year>
          .
          <article-title>Early diagnosis of Parkinson's disease via machine learning on speech data</article-title>
          .
          <source>In 2012 IEEE 27th Convention of Electrical and Electronics Engineers in Israel</source>
          ,
          <fpage>1</fpage>
          -
          <lpage>4</lpage>
          . IEEE.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <year>2015</year>
          .
          <article-title>Stochastic anomaly detection in eye-tracking data for quantification of motor symptoms in Parkinson's disease</article-title>
          .
          <source>In Signal and Image Analysis for Biomedical and Life Sciences</source>
          ,
          <fpage>63</fpage>
          -
          <lpage>82</lpage>
          . Springer.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Joseph</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <year>2020</year>
          .
          <article-title>Imbalanced Data</article-title>
          . URL https://medium.com/@jasonjoseph072/imbalanced-data-97e2e8a9e0a8.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>Karimi Rouzbahani</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Daliri</surname>
            ,
            <given-names>M. R.</given-names>
          </string-name>
          <year>2011</year>
          .
          <article-title>Diagnosis of Parkinson's disease in human using voice signals</article-title>
          .
          <source>Basic and Clinical Neuroscience</source>
          <volume>2</volume>
          (
          <issue>3</issue>
          ):
          <fpage>12</fpage>
          -
          <lpage>20</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>S.-H.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Lim</surname>
            ,
            <given-names>J. S.</given-names>
          </string-name>
          <year>2012</year>
          .
          <article-title>Parkinson's disease classification using gait characteristics and wavelet-based feature extraction</article-title>
          .
          <source>Expert Systems with Applications</source>
          <volume>39</volume>
          (
          <issue>8</issue>
          ):
          <fpage>7338</fpage>
          -
          <lpage>7344</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <surname>Little</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>McSharry</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Hunter</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Spielman</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Ramig</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <year>2008</year>
          .
          <article-title>Suitability of dysphonia measurements for telemonitoring of Parkinson's disease</article-title>
          .
          <source>Nature Precedings</source>
          :
          <fpage>1</fpage>
          -
          <lpage>1</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <surname>Maldonado</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>López</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Vairetti</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <year>2019</year>
          .
          <article-title>An alternative SMOTE oversampling strategy for high-dimensional datasets</article-title>
          .
          <source>Applied Soft Computing</source>
          <volume>76</volume>
          :
          <fpage>380</fpage>
          -
          <lpage>389</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <surname>Manap</surname>
            ,
            <given-names>H. H.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Tahir</surname>
            ,
            <given-names>N. M.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Yassin</surname>
            ,
            <given-names>A. I. M.</given-names>
          </string-name>
          <year>2011</year>
          .
          <article-title>Statistical analysis of parkinson disease gait classification using Artificial Neural Network</article-title>
          .
          <source>In 2011 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT)</source>
          ,
          <fpage>060</fpage>
          -
          <lpage>065</lpage>
          . IEEE.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <surname>Oung</surname>
            ,
            <given-names>Q. W.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Muthusamy</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>H. L.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Basah</surname>
            ,
            <given-names>S. N.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Yaacob</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Sarillee</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>C. H.</given-names>
          </string-name>
          <year>2015</year>
          .
          <article-title>Technologies for assessment of motor disorders in Parkinson's disease: a review</article-title>
          .
          <source>Sensors</source>
          <volume>15</volume>
          (
          <issue>9</issue>
          ):
          <fpage>21710</fpage>
          -
          <lpage>21745</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <surname>Pedregosa</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Varoquaux</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Gramfort</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Michel</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Thirion</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Grisel</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Blondel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Prettenhofer</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Weiss</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Dubourg</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Vanderplas</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Passos</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Cournapeau</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Brucher</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Perrot</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Duchesnay</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          <year>2011</year>
          .
          <article-title>Scikit-learn: Machine Learning in Python</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>12</volume>
          :
          <fpage>2825</fpage>
          -
          <lpage>2830</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <surname>Peker</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Sen</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Delen</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <year>2015</year>
          .
          <article-title>Computer-aided diagnosis of Parkinson's disease using complex-valued neural networks and mRMR feature selection algorithm</article-title>
          .
          <source>Journal of Healthcare Engineering</source>
          <volume>6</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <surname>Polat</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <year>2019</year>
          .
          <article-title>A hybrid approach to Parkinson disease classification using speech signal: the combination of smote and random forests</article-title>
          .
          <source>In 2019 Scientific Meeting on ElectricalElectronics &amp; Biomedical Engineering and Computer Science (EBBT)</source>
          ,
          <fpage>1</fpage>
          -
          <lpage>3</lpage>
          . IEEE.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <surname>Poorjam</surname>
            ,
            <given-names>A. H.</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>Why we take only 12-13 MFCC coefficients in feature extraction?</article-title>
          URL https://www.researchgate.net/post/Why_we_take_only_12-13_MFCC_coefficients_in_feature_extraction.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <surname>Ramani</surname>
            ,
            <given-names>R. G.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Sivagami</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          <year>2011</year>
          .
          <article-title>Parkinson disease classification using data mining algorithms</article-title>
          .
          <source>International Journal of Computer Applications</source>
          <volume>32</volume>
          (
          <issue>9</issue>
          )
          :
          <fpage>17</fpage>
          -
          <lpage>22</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <string-name>
            <surname>Sakar</surname>
            ,
            <given-names>B. E.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Isenkul</surname>
            ,
            <given-names>M. E.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Sakar</surname>
            ,
            <given-names>C. O.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Sertbas</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Gurgen</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Delil</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Apaydin</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Kursun</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          <year>2013</year>
          .
          <article-title>Collection and analysis of a Parkinson speech dataset with multiple types of sound recordings</article-title>
          .
          <source>IEEE Journal of Biomedical and Health Informatics</source>
          <volume>17</volume>
          (
          <issue>4</issue>
          ):
          <fpage>828</fpage>
          -
          <lpage>834</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <string-name>
            <surname>Sakar</surname>
            ,
            <given-names>C. O.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Serbes</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Gunduz</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Tunc</surname>
            ,
            <given-names>H. C.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Nizam</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Sakar</surname>
            ,
            <given-names>B. E.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Tutuncu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Aydin</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Isenkul</surname>
            ,
            <given-names>M. E.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Apaydin</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <year>2019</year>
          .
          <article-title>A comparative analysis of speech signal processing algorithms for Parkinson's disease classification and the use of the tunable Q-factor wavelet transform</article-title>
          .
          <source>Applied Soft Computing</source>
          <volume>74</volume>
          :
          <fpage>255</fpage>
          -
          <lpage>263</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <string-name>
            <surname>Shahbakhi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Far</surname>
            ,
            <given-names>D. T.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Tahami</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          <year>2014</year>
          .
          <article-title>Speech analysis for diagnosis of Parkinson's disease using genetic algorithm and support vector machine</article-title>
          .
          <source>Journal of Biomedical Science and Engineering</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          <string-name>
            <surname>Solana-Lavalle</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Galán-Hernández</surname>
            ,
            <given-names>J.-C.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Rosas-Romero</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <year>2020</year>
          .
          <article-title>Automatic Parkinson disease detection at early stages as a pre-diagnosis tool by using classifiers and a small set of vocal features</article-title>
          .
          <source>Biocybernetics and Biomedical Engineering</source>
          <volume>40</volume>
          (
          <issue>1</issue>
          ):
          <fpage>505</fpage>
          -
          <lpage>516</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          <string-name>
            <surname>Tolosa</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Gaig</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Santamaría</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Compta</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          <year>2009</year>
          .
          <source>Neurology</source>
          <volume>72</volume>
          (
          <issue>7 Supplement 2</issue>
          ):
          <fpage>S12</fpage>
          -
          <lpage>S20</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          <string-name>
            <surname>Tseng</surname>
            ,
            <given-names>P.-H.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Cameron</surname>
            ,
            <given-names>I. G.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Munoz</surname>
            ,
            <given-names>D. P.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Itti</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <year>2014</year>
          .
          <article-title>Eye-tracking method and system for screening human diseases</article-title>
          .
          <source>US Patent 8,808,195</source>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>