Exploring the Effects of Different Embedding Algorithms and Neural Architectures on Early Detection of Alzheimer's Disease

Minni Jain, Rishabh Doshi, Vibhu Sehra and Divyashikha Sethia
Department of Computer Engineering, Delhi Technological University, Delhi, India
minnijain@dtu.ac.in (M. Jain); doshirishabh26@gmail.com (R. Doshi); vibhusehra@gmail.com (V. Sehra); divyashikha@dtu.ac.in (D. Sethia)

ISIC'2021: International Semantic Intelligence Conference, Feb 25-27, 2021, New Delhi, India
© 2020 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.

Abstract
Alzheimer's Disease (AD) is an irreversible, progressive neurodegenerative disorder that deteriorates the cognitive and linguistic abilities of a person over time. Although ample research has been done on the early detection of AD, it remains a challenging task. Doctors use the patient's history, laboratory tests, and changes in behaviour to diagnose the disease. Because language impairments accompany this disease, Natural Language Processing (NLP) techniques can help automate its detection. This work analyzes the effect of different embedding models on the DementiaBank dataset for detecting the disease. It applies both generic and domain-specific word embeddings to three deep learning models: CNN, Bidirectional LSTM (BLSTM), and CNN+BLSTM. Results indicate that for a specific picture description task such as the cookie theft description, domain-specific word embeddings tend to work better. Lastly, we discuss how the results are affected by the choice of embedding model (FastText, Word2Vec, GloVe).

Keywords
Alzheimer's Disease, Natural Language Processing, Word Embeddings, Deep Learning, Cookie Theft Description Task

1. Introduction

Alzheimer's Disease (AD) is a brain disorder that slowly damages the nerve connections in the brain. It is the most common type of dementia, and its symptoms include communication difficulties, memory loss, poor judgment, and changes in mood and personality¹. More than 50 million people are diagnosed with Alzheimer's Disease every year². This challenge has grown substantially over the years with the ageing of the population and the age-related nature of many dementia-producing neurodegenerative diseases [1], and the number of AD cases will continue to grow in the coming years. There is no proven health care method to cure AD; hence, it is necessary to develop new methods to detect AD in a patient. Around 50 to 90% of dementia cases are left undiagnosed by standard clinical examinations [1], so early detection of Alzheimer's Disease remains a massive issue. The disease progresses over the years, and patients can sometimes have it for 20 years before showing symptoms, at which point medical treatment is of little use. Early detection of Alzheimer's is therefore still a challenge in medical science. There have been many attempts to diagnose the disease with the help of neuroimaging techniques, but non-imaging techniques are essential to personalize the treatment for a patient and to monitor disease progression. Machine learning can detect the language deficits that often accompany dementia and can therefore be used for early detection of Alzheimer's Disease. Many Natural Language Processing (NLP) techniques have previously been proposed to help in early detection of AD; these techniques treat the problem as a supervised learning problem. Research works such as [2, 3, 4] made use of transcripts obtained from interviews with patients to detect Alzheimer's disease using various machine learning and deep learning algorithms. Further, other studies such as [5, 6, 7] used acoustic features obtained from the audio recordings of the interviews for the classification task.

¹ https://www.alz.org/alzheimers-dementia/10_signs
² https://www.alz.org/alzheimers-dementia/facts-figures
Our study aims to explore the effect of various word embeddings and neural architectures on transcripts obtained from the cookie theft description task of DementiaBank. This paper makes use of both generic and domain-specific word embeddings, the latter trained on the transcripts themselves. Out of all the presented models, the CNN + Bidirectional LSTM model that uses FastText domain-specific word embeddings provides the best results. Sentences obtained from the transcripts are input to the models, and the output is the predicted label (Healthy or Alzheimer's); no feature engineering is involved in the process. Hence, this paper investigates how the task of detecting Alzheimer's Disease is affected by the use of various domain-specific and generic embeddings on different neural architectures.

The rest of the paper is organized as follows: section 2 discusses related work, sections 3 and 4 present our proposed work and experimental setup, sections 5 and 6 present our results and discussion, and section 7 concludes with future work.

2. Related Work

This section discusses previous research in the field of Alzheimer's detection using various machine learning and deep learning techniques.

2.1. Machine Learning Techniques

Existing research on early detection of Alzheimer's Disease using natural language processing has made use of various machine learning techniques. [8] used three different machine learning algorithms (Decision Trees, Support Vector Machines, and K-Nearest Neighbours) on a sample of 80 conversations, achieving the best accuracy of 79.5% with their Decision Tree model. [9] proposed a Support Vector Machine model using 14 lexical features, nine syntactic features, and n-grams extracted from the Pitt Corpus in the DementiaBank dataset, with 99 dementia transcripts and 99 control transcripts. They used the Area Under Curve (AUC) metric to test the performance of the algorithm, achieving a maximum AUC score of 0.93 with the top 1000 features obtained using the Leave Pair Out Cross-Validation (LPOCV) technique.

Further, [7] used the DementiaBank dataset to extract acoustic and semantic measures to predict the clinical scores of the patients using a bivariate dynamic Bayes network. [5] extracted acoustic features from the DementiaBank dataset and created a regression model to predict the clinical scores (MMSE) used for dementia prediction. [6] applied acoustic features to various machine learning models such as Logistic Regression, KNN, Naive Bayes, a dummy classifier, and Random Forests, achieving the best accuracy of 78% with the Logistic Regression classifier.

2.2. Deep Learning Techniques

[10] made use of deep neural networks and achieved an accuracy of 87.5% using sparse vector representations of 4-gram and 5-gram features. The dataset was equally divided, using 99 dementia transcripts and 99 control transcripts. More recently, [2] proposed three different deep learning models (2D-CNN, LSTM, and 2D-CNN + RNN) on the complete DementiaBank dataset, which consists of 1017 Alzheimer's transcripts and 243 control transcripts. They used each utterance as a separate data sample, thereby obtaining 14362 utterance samples, and achieved their best accuracy of 91.1% with the CNN-RNN model by feeding word embeddings along with POS-tagged data to the classifier. [3] used a Hierarchical Attention Network (HAN) on the transcripts obtained from the DementiaBank dataset, combining word embeddings with demographic features for the prediction task and obtaining an accuracy of 86.9%. [11] proposed a model that combined a bidirectional hierarchical recurrent neural network with an attention mechanism for dementia detection. [12] showed that a fine-tuned BERT model outperformed models that used hand-crafted feature engineering. Table 4 summarizes the approaches used by previous research works.

Table 4: Comparison of the proposed work with results and techniques of existing work

  Author                      Accuracy  Model                           Technique
  Orimaye et al. (2018) [10]  87.5%     Neural Network                  4-5 n-grams
  Karlekar et al. (2018) [2]  82.8%     2D-CNN                          Word Embeddings
  Karlekar et al. (2018) [2]  83.7%     RNN                             Word Embeddings
  Karlekar et al. (2018) [2]  91.1%     2D-CNN + RNN                    Word Embeddings with POS-tagged data
  Kong et al. (2019) [3]      86.9%     Hierarchical Attention Network  Word Embeddings
  Proposed work               90.6%     1D-CNN + BLSTM                  Domain-specific FastText Word Embedding

3. Proposed Work

3.1. Preprocessing

This work uses the transcripts in the DementiaBank dataset [13], which are available in the form of CHAT transcription [14]. The transcripts are passed through a series of steps, illustrated in Fig. 1: the PyLangAcq library [15], a powerful library that can handle CHAT data, reads the transcripts; all obtained utterances are converted to lowercase; and all punctuation is removed. We use 99 transcripts from each set (Dementia and Control) from the Cookie Theft task, following [9, 10], which used an equal number of dementia and control patients. A sketch of these steps in code is given below.

Figure 1: Proposed approach for early detection of Alzheimer's Disease (pipeline: transcripts from DementiaBank → read with the PyLangAcq library → convert words to lowercase and remove punctuation → create domain-specific word embeddings (Word2Vec, GloVe, FastText) → pass the word embeddings (domain-specific and generic) through the classifier (CNN, BLSTM, CNN+BLSTM) → result).
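As a concrete illustration of the preprocessing steps above, the following is a minimal Python sketch. It assumes a recent version of the pylangacq API; restricting the text to the participant tier ("PAR") and the exact cleaning order are our assumptions, as the paper does not specify them, and the file path is hypothetical.

    import string
    import pylangacq  # PyLangAcq library [15] for reading CHAT transcripts

    def preprocess_transcript(cha_path):
        """Read one CHAT (.cha) transcript and return its cleaned word list."""
        reader = pylangacq.read_chat(cha_path)
        # Keep the patient's words (participant tier "PAR"); this restriction
        # is an assumption, not stated explicitly in the paper.
        words = reader.words(participants="PAR")
        # Lowercase everything and strip all punctuation characters.
        cleaned = [w.lower().translate(str.maketrans("", "", string.punctuation))
                   for w in words]
        return [w for w in cleaned if w]  # drop tokens that were pure punctuation

    tokens = preprocess_transcript("cookie/001-0.cha")  # hypothetical path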
3.2. Word Embeddings Used for Early Detection of Alzheimer's Disease

This work uses three types of word embeddings: Word2Vec [16], GloVe [17], and FastText [18]. These embeddings are chosen because they are widely used and have different architectures, which may indicate the best way to proceed with the problem at hand. All the word embeddings have a 300-dimensional vector representation for each word. For each of the types mentioned above, two word embeddings are used: domain-specific and generic. The maximum size of a transcript is 498 words; hence, each transcript is represented as a (500, 300) embedding matrix.

3.2.1. Domain-Specific Word Embeddings

Domain-specific word embeddings are trained on a specific corpus containing data from the domain of interest. They are highly effective for that domain but require extra training time. All the transcripts from DementiaBank are used to create the domain-specific word embeddings. The Gensim library [19] is used to create the Word2Vec [16] and FastText [18] embeddings from the corpus, and the glove library³ is used to create the GloVe embeddings [17]. A code sketch of this training step is given below.

³ https://github.com/JonathanRaiman/glove

3.2.2. Generic Word Embeddings

Generic word embeddings are trained on vast generic corpora; hence, they reduce training time and often give outstanding results. The work uses the pretrained GloVe [17] embeddings trained on 6 billion tokens; the pretrained Word2Vec embeddings, which include word vectors for a vocabulary of 3 million words and phrases trained on roughly 100 billion words from a Google News dataset; and the pretrained FastText [18] embeddings, which contain vectors for 1 million words trained on Wikipedia 2017, the UMBC webbase corpus, and the statmt.org news dataset, totalling 16 billion tokens.
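The domain-specific training described in Section 3.2.1 might look as follows with Gensim; this is a sketch assuming the Gensim 4 API (vector_size was called size in older releases). The window, min_count, and epochs settings are illustrative assumptions, since the paper reports only the 300-dimensional vector size.

    from gensim.models import Word2Vec, FastText

    # Tokenized DementiaBank transcripts (tiny placeholder corpus here;
    # in the paper, all transcripts from the dataset are used).
    corpus = [
        ["the", "boy", "is", "stealing", "cookies", "from", "the", "jar"],
        ["the", "sink", "is", "overflowing", "while", "mother", "dries", "dishes"],
    ]

    # 300-dimensional domain-specific embeddings, matching the paper;
    # window, min_count, and epochs are illustrative assumptions.
    w2v = Word2Vec(sentences=corpus, vector_size=300, window=5, min_count=1, epochs=10)
    ft = FastText(sentences=corpus, vector_size=300, window=5, min_count=1, epochs=10)

    print(w2v.wv["cookies"].shape)  # (300,)
    print(ft.wv["cookies"].shape)   # (300,)

The GloVe embeddings follow a different workflow (a separate co-occurrence counting step) through the glove library cited above.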
3.3. Deep Learning Models Used

This section explains the deep learning models used for the classification of control and dementia patients. The Keras functional API [20] is used to create all the deep learning models explained below. To address the concern of overfitting, we apply an L2 regularizer [21] to the kernel of each layer. Due to the small size of the dataset, the research makes use of 10-fold cross-validation on each model. The models attempt to capture the language impairments that are often seen in the early phases of dementia. The Appendix provides the details of the model architectures.

3.3.1. CNN Model

The CNN model consists of a combination of 1D convolution layers with an increasing number of kernels, followed by MaxPool layers and a dense network. We use the Tanh activation for the 1D convolution layers, ReLU [22] as the activation function for the dense layers, and Softmax for classification.

3.3.2. Bi-Directional LSTM Model

The model has a series of Bidirectional LSTM layers and Dropout [23] layers, followed by a dense network for classification. The Dropout layers are added to prevent overfitting, with the dropout rate kept at 30%. All the layers use the default Tanh activation except the last one, which uses Softmax for classification.

3.3.3. Hybrid CNN + Bi-Directional LSTM Model

This model is a combination of the above two models. We pass the embeddings through a series of 1D convolutional layers followed by a MaxPooling layer, with two Bidirectional LSTM layers stacked over the MaxPool layer and a dense network at the end. Fig. 2 illustrates the proposed model. The activation used for the CNN and Bidirectional LSTM layers is Tanh, while the dense layers use ReLU [22] activation, followed by a Softmax function for classification.

Figure 2: Pictorial representation of the CNN+BLSTM model used.

3.4. Training Details

The above models are trained using the Adam optimizer [24] for 30 epochs each, with binary cross-entropy as the loss function. The L2 regularization [21] applied in each layer has λ = 10⁻⁵. A code sketch of the hybrid model and this training configuration is given below.
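To make Sections 3.3.3 and 3.4 concrete, below is one possible Keras functional-API reading of the hybrid model as specified in Appendix A.1.3; it is a sketch, not the authors' released code. The return_sequences flag on the first LSTM is an implementation detail the appendix leaves implicit, and binary cross-entropy over a 2-unit softmax with one-hot labels follows the paper's stated setup.

    from tensorflow import keras
    from tensorflow.keras import layers, regularizers

    l2 = regularizers.l2(1e-5)  # L2 regularization with lambda = 1e-5 (Sec. 3.4)

    # Input: one transcript as a (500, 300) matrix of word embeddings (Sec. 3.2)
    inputs = keras.Input(shape=(500, 300))

    # Convolutional block (filters, kernel size) as listed in Appendix A.1.3;
    # Tanh activations on Conv/LSTM layers as stated in Sec. 3.3.3.
    x = layers.Conv1D(8, 3, activation="tanh", kernel_regularizer=l2)(inputs)
    x = layers.Conv1D(10, 3, activation="tanh", kernel_regularizer=l2)(x)
    x = layers.MaxPooling1D(3)(x)
    x = layers.Conv1D(16, 3, activation="tanh", kernel_regularizer=l2)(x)
    x = layers.Conv1D(20, 3, activation="tanh", kernel_regularizer=l2)(x)
    x = layers.MaxPooling1D(3)(x)

    # Stacked bidirectional LSTMs; return_sequences=True on the first is
    # required for stacking (left implicit in the appendix).
    x = layers.Bidirectional(layers.LSTM(8, return_sequences=True,
                                         kernel_regularizer=l2))(x)
    x = layers.BatchNormalization()(x)
    x = layers.Bidirectional(layers.LSTM(16, kernel_regularizer=l2))(x)

    # Dense head: ReLU layers followed by a 2-way softmax (Healthy / AD)
    x = layers.Dense(64, activation="relu", kernel_regularizer=l2)(x)
    x = layers.Dense(32, activation="relu", kernel_regularizer=l2)(x)
    outputs = layers.Dense(2, activation="softmax")(x)

    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    # model.fit(X, y_onehot, epochs=30, batch_size=10)  # Sec. 3.4 / Appendix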
4. Experimental Details

This work uses the Pitt Corpus, the largest English dataset available in DementiaBank [13]. DementiaBank is part of the TalkBank project initiated by Carnegie Mellon University and funded by the National Institute on Aging; the project encourages research on human communication. It uses the Codes for the Human Analysis of Transcripts (CHAT) system [14], which provides automatic analysis and testing and is commonly used across many datasets to provide uniformity and ease of use. Participants from each group (Control and Dementia) visited annually for an interview. The Pitt Corpus [13] is a collection of transcripts and audio files collected as part of a longitudinal study on Alzheimer's and related dementias conducted at the University of Pittsburgh School of Medicine. The dataset contains interviews of patients with possible Alzheimer's along with control patients, comprising transcripts of 104 control patients and 208 dementia patients, with patient ages ranging from 49 to 90 years. It comprises four different tests on the patients:

• Cookie Theft: Patients see an image provided by the Boston Diagnostic Aphasia Examination, and then the patients (Control and Dementia) recall the events taking place in the image (Fig. 3).
• Fluency: This task is done only for dementia patients, who respond to a word fluency task.
• Recall: The dementia patients undergo a story recall test.
• Sentence: The dementia patients perform a sentence construction task.

The work uses the Cookie Theft part of the corpus, as it contains the maximum number of participants and has been used by previous researchers.

Figure 3: Boston cookie theft description task (picture stimulus).

5. Results

All three neural models (1D CNN, Bidirectional LSTM (BLSTM), and 1D CNN + Bidirectional LSTM (CNN + BLSTM)) use both the generic and the domain-specific word embeddings of each embedding model. For domain-specific word embeddings, we achieved maximum accuracies of 89.9%, 85%, and 90.6% with the FastText embedding for the CNN, BLSTM, and CNN + BLSTM models, respectively. For pre-trained word embeddings, the maximum accuracies obtained were 85.2% with GloVe for both CNN and BLSTM, and 85.5% with FastText for CNN + BLSTM. The baseline model is a constant-label classifier, which gives the same output for any input and therefore achieves an accuracy of 50%, since we have two classes. Accuracy, precision, recall, and F1-score are used as the evaluation metrics; previous deep learning works such as [2] used accuracy, [10] used AUC (Area Under Curve), and [3] used precision, recall, and F1-score. Tables 1, 2, and 3 summarize the results obtained by using the three embedding models (GloVe, Word2Vec, FastText) with the three deep learning models, and Fig. 4 compares the F1-scores achieved by these models, making clear that domain-specific FastText embeddings outperform all the other embeddings. Generally, the performance of the domain-specific word embeddings was better than that of the generic word embeddings; the probable causes are discussed in the next section. A sketch of the evaluation protocol in code follows the tables.

Table 1: Results obtained for the CNN model

  Word Embedding               Accuracy  Precision  Recall  F1-score
  FastText (generic)           0.85      0.86       0.85    0.85
  FastText (domain-specific)   0.90      0.92       0.90    0.91
  GloVe (generic)              0.85      0.85       0.85    0.85
  GloVe (domain-specific)      0.83      0.83       0.81    0.82
  Word2Vec (generic)           0.77      0.78       0.77    0.77
  Word2Vec (domain-specific)   0.80      0.80       0.80    0.80

Table 2: Results obtained for the BLSTM model

  Word Embedding               Accuracy  Precision  Recall  F1-score
  FastText (generic)           0.80      0.85       0.80    0.82
  FastText (domain-specific)   0.85      0.86       0.85    0.85
  GloVe (generic)              0.85      0.88       0.85    0.86
  GloVe (domain-specific)      0.84      0.85       0.84    0.84
  Word2Vec (generic)           0.74      0.75       0.74    0.74
  Word2Vec (domain-specific)   0.80      0.80       0.80    0.80

Table 3: Results obtained for the CNN+BLSTM model

  Word Embedding               Accuracy  Precision  Recall  F1-score
  FastText (generic)           0.86      0.86       0.85    0.85
  FastText (domain-specific)   0.91      0.91       0.91    0.91
  GloVe (generic)              0.84      0.85       0.83    0.84
  GloVe (domain-specific)      0.87      0.88       0.87    0.87
  Word2Vec (generic)           0.77      0.79       0.78    0.78
  Word2Vec (domain-specific)   0.80      0.80       0.80    0.80

Figure 4: Comparison of F1-scores achieved by different neural models and word embeddings.
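The 10-fold cross-validation protocol with the four reported metrics could be sketched as follows with scikit-learn. This is a sketch under stated assumptions: the paper does not report fold seeding or the metric-averaging scheme, so weighted averaging and shuffled stratified folds are our choices here.

    import numpy as np
    from sklearn.model_selection import StratifiedKFold
    from sklearn.metrics import precision_recall_fscore_support, accuracy_score

    def evaluate_10fold(build_model, X, y):
        """10-fold CV reporting accuracy, precision, recall, and F1.
        `build_model` returns a fresh compiled Keras model; X is a numpy
        array of shape (n_samples, 500, 300) and y holds integer 0/1 labels."""
        scores = []
        for train_idx, test_idx in StratifiedKFold(n_splits=10, shuffle=True).split(X, y):
            model = build_model()
            y_train = np.eye(2)[y[train_idx]]  # one-hot for the 2-way softmax
            model.fit(X[train_idx], y_train, epochs=30, batch_size=10, verbose=0)
            y_pred = model.predict(X[test_idx]).argmax(axis=1)
            p, r, f1, _ = precision_recall_fscore_support(
                y[test_idx], y_pred, average="weighted")
            scores.append((accuracy_score(y[test_idx], y_pred), p, r, f1))
        return np.mean(scores, axis=0)  # mean accuracy, precision, recall, F1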
6. Discussion

The paper aims to explore how different word embedding models and types of embeddings perform on different neural models, using both the domain-specific and the generic word embeddings to classify the transcripts. Since the domain-specific word embeddings are trained on the same corpus being classified, they generally provide better results. As the Cookie Theft data consists of descriptions of a single image, the vocabulary found in the transcripts is limited, and as a result it is easier to learn the relationships between words. Domain-specific FastText and Word2Vec embeddings accordingly provide better results than their generic counterparts, while GloVe embeddings provide similar results for both types. If we had a combination of different tasks (not only Cookie Theft) with a larger corpus and vocabulary, generic embeddings might perform better.

Results indicate that Word2Vec has the lowest accuracy among the three embedding models. This is likely because domain-specific Word2Vec requires a larger corpus to develop semantic relations, as it only captures local word relations. The domain-specific FastText embedding gives the best result since it does not require a large corpus: it breaks each word into character n-grams, thereby effectively increasing the vocabulary size. A small illustration of this subword behaviour is given at the end of this section.

Results also indicate that the hybrid CNN + BLSTM model achieves the highest accuracy of 90.6%. The CNN + BLSTM model works better than either model alone because:

• The CNN model captures the short-term dependencies in the text.
• The LSTM model captures long-term dependencies in the text, and the Bidirectional LSTM improves on the LSTM by training two LSTM cells on a single input sequence instead of one.

Compared to similar previous works such as [2] and [3], which use a word embedding layer trained along with the neural architecture, this study uses three word embedding models and creates both a domain-specific and a pre-trained embedding from each, in order to identify how different embedding models, and the type of data on which the embeddings are trained, affect the performance of detecting Alzheimer's Disease. Moreover, [2] breaks down each transcript into utterances and treats them as separate data samples, thereby creating 14362 samples, compared to our 198 samples, which are complete transcripts of individual patients.
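The subword argument above can be made concrete with a small Gensim example (Gensim 4 API assumed; the toy sentences are ours): FastText composes a vector for a word it never saw from that word's character n-grams, whereas Word2Vec has no such fallback.

    from gensim.models import FastText, Word2Vec

    sentences = [["the", "boy", "steals", "a", "cookie"],
                 ["the", "girl", "laughs"]]

    ft = FastText(sentences=sentences, vector_size=300, min_count=1, min_n=3, max_n=6)
    w2v = Word2Vec(sentences=sentences, vector_size=300, min_count=1)

    print("cookies" in ft.wv.key_to_index)  # False: "cookies" was never trained
    print(ft.wv["cookies"].shape)           # (300,): composed from char n-grams
    # w2v.wv["cookies"] would raise a KeyError: Word2Vec has no subword fallback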
7. Conclusion and Future Work

This study employs three word embedding algorithms on three different neural models that make use of CNN and Bidirectional LSTM for Alzheimer's Disease classification. For each word embedding algorithm, two different types of word embeddings were used, domain-specific and generic, and it was found that domain-specific word embeddings performed better than generic word embeddings. This work was limited by the small amount of data available. In future, we may gather a larger dataset that could help create more generalized embeddings, and we may also extend the dataset to speakers of different languages.

A. Appendix

A.1. Neural Model Details

We used the following neural models. The batch size was kept at 10. The last dense layer of each model uses a Softmax activation function; the other dense layers use a rectified linear activation function. A Keras rendering of the A.1.1 specification is sketched after this appendix.

A.1.1. CNN Model

Each CNN-1D layer in brackets represents (number-of-filters, kernel-size):

CNN-1D(8,3) → CNN-1D(10,3) → MaxPool-1D(3) → CNN-1D(12,3) → CNN-1D(14,3) → MaxPool-1D(3) → Flatten() → Dense(20, ReLU) → Dense(10, ReLU) → Dense(2, Softmax)

A.1.2. BLSTM Model

Each LSTM layer in brackets represents (number-of-LSTM-cells-in-that-layer):

Bidir(LSTM(16)) → Dropout(0.3) → Bidir(LSTM(8)) → Bidir(LSTM(4)) → Bidir(LSTM(2)) → Dropout(0.2) → Dense(8) → Dense(2, Softmax)

A.1.3. CNN+BLSTM Model

CNN-1D(8,3) → CNN-1D(10,3) → MaxPool-1D(3) → CNN-1D(16,3) → CNN-1D(20,3) → MaxPool-1D(3) → Bidir(LSTM(8)) → BatchNorm() → Bidir(LSTM(16)) → Dense(64, ReLU) → Dense(32, ReLU) → Dense(2, Softmax)
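For reference, the A.1.1 specification translates line-for-line into Keras as follows; this is our reading of the notation (a sketch), with Tanh on the convolutions and ReLU on the dense layers per Section 3.3.1. The L2 kernel regularizer of Section 3.4 is omitted here for brevity.

    from tensorflow.keras import Sequential, layers

    cnn = Sequential([
        layers.Conv1D(8, 3, activation="tanh", input_shape=(500, 300)),
        layers.Conv1D(10, 3, activation="tanh"),
        layers.MaxPooling1D(3),
        layers.Conv1D(12, 3, activation="tanh"),
        layers.Conv1D(14, 3, activation="tanh"),
        layers.MaxPooling1D(3),
        layers.Flatten(),
        layers.Dense(20, activation="relu"),
        layers.Dense(10, activation="relu"),
        layers.Dense(2, activation="softmax"),  # Healthy vs. Alzheimer's
    ])
    cnn.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])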
References

[1] M. W. Bondi, E. C. Edmonds, D. P. Salmon, Alzheimer's disease: past, present, and future, Journal of the International Neuropsychological Society 23 (2017) 818-831.
[2] S. Karlekar, T. Niu, M. Bansal, Detecting linguistic characteristics of Alzheimer's dementia by interpreting neural models, arXiv preprint arXiv:1804.06440 (2018).
[3] W. Kong, H. Jang, G. Carenini, T. Field, A neural model for predicting dementia from language, in: Machine Learning for Healthcare Conference, 2019, pp. 270-286.
[4] S. O. Orimaye, J. S.-M. Wong, K. J. Golden, Learning predictive linguistic features for Alzheimer's disease and related dementias using verbal utterances, in: Proceedings of the Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality, 2014, pp. 78-87.
[5] S. Al-Hameed, M. Benaissa, H. Christensen, Simple and robust audio-based detection of biomarkers for Alzheimer's disease, in: 7th Workshop on Speech and Language Processing for Assistive Technologies (SLPAT), 2016, pp. 32-36.
[6] V. Masrani, Detecting dementia from written and spoken language, Ph.D. thesis, University of British Columbia, 2018.
[7] M. Yancheva, F. Rudzicz, Vector-space topic models for detecting Alzheimer's disease, in: Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2016, pp. 2337-2346.
[8] C. Guinn, A. Habash, Language analysis of speakers with dementia of the Alzheimer's type, in: AAAI Fall Symposium Series, 2012, pp. 8-13.
[9] S. O. Orimaye, J. S.-M. Wong, K. J. Golden, C. P. Wong, I. N. Soyiri, Predicting probable Alzheimer's disease using linguistic deficits and biomarkers, BMC Bioinformatics 18 (2017).
[10] S. O. Orimaye, J. S.-M. Wong, C. P. Wong, Deep language space neural network for classifying mild cognitive impairment and Alzheimer-type dementia, PLoS ONE 13 (2018).
[11] Y. Pan, B. Mirheidari, M. Reuber, A. Venneri, D. Blackburn, H. Christensen, Automatic hierarchical attention neural network for detecting AD, in: Proceedings of Interspeech, 2019, pp. 4105-4109. doi:10.21437/Interspeech.2019-1799.
[12] A. Balagopalan, B. Eyre, F. Rudzicz, J. Novikova, To BERT or not to BERT: Comparing speech and language-based approaches for Alzheimer's disease detection, 2020. arXiv:2008.01551.
[13] J. T. Becker, F. Boller, O. L. Lopez, J. Saxton, K. McGonigle, The natural history of Alzheimer's disease: Description of study cohort and accuracy of diagnosis, Archives of Neurology 51 (1994) 585-594.
[14] B. MacWhinney, The CHILDES project: tools for analyzing talk, Child Language Teaching and Therapy 8 (2000).
[15] J. L. Lee, R. Burkholder, G. B. Flinn, E. R. Coppess, Working with CHAT transcripts in Python, Technical Report TR-2016-02, Department of Computer Science, University of Chicago, 2016.
[16] T. Mikolov, et al., Efficient estimation of word representations in vector space, in: Proceedings of the International Conference on Learning Representations, 2013.
[17] J. Pennington, R. Socher, C. D. Manning, GloVe: Global vectors for word representation, in: Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2014.
[18] P. Bojanowski, E. Grave, A. Joulin, T. Mikolov, Enriching word vectors with subword information, arXiv:1607.04606 (2016).
[19] R. Rehurek, P. Sojka, Software framework for topic modelling with large corpora, in: Proceedings of the International Workshop on New Challenges for NLP Frameworks, 2010.
[20] Keras: Deep learning for humans, https://github.com/fchollet/keras, 2015. Last accessed Nov 2019.
[21] J. Schmidhuber, Deep learning in neural networks: An overview, Neural Networks 61 (2015) 85-117.
[22] A. F. Agarap, Deep learning using rectified linear units (ReLU), arXiv:1803.08375 (2018).
[23] N. Srivastava, et al., Dropout: A simple way to prevent neural networks from overfitting, Journal of Machine Learning Research 15 (2014) 1929-1958.
[24] D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, arXiv:1412.6980 (2014).
[25] L. Hebert, P. A. Scherr, J. L. Bienias, D. A. Bennett, D. A. Evans, Alzheimer disease in the US population: prevalence estimates using the 2000 census, Archives of Neurology 60 (2003) 1119-1122.