1. Introduction

Exploring the Efects of Diferent Embedding Algorithms and Neural Architectures on Early Detection of Alzheimer's Disease

Minni Jain

Rishabh Doshi

Vibhu Sehra

Divyashikha Sethia

0 0 Department of Computer Engineering, Delhi Technological University , Delhi , India

376 383

Alzheimer's Disease (AD) is an irrecoverable, progressive neurodegenerative disorder that deteriorates the cognitive and linguistic abilities of a person over time. Ample research has been done on the early detection of AD; it remains a challenging task. Doctors use the patient's history, laboratory tests, and change in behaviour to diagnose the disease. Natural Language Processing(NLP) techniques can help automate the detection of AD, as Language impairments accompany this disease. This work aims to analyze the efect of diferent Embedding models on the DementiaBank dataset in order to detect the disease. The work uses both Generic and domain-specific Word Embeddings on the three deep learning models - CNN, Bidirectional LSTM(BLSTM), and CNN+BLSTM. Results indicate that for a specific picture description task like cookie theft description, domain-specific Word Embeddings tend to work better. Lastly, it is discussed how results are afected by the use of diferent Embedding models (Fasttext, Word2Vec, GloVe).

eol>Alzheimer's Disease Natural Language Processing Word Embeddings Deep Learning Cookie theft Description task

1. Introduction

medical treatment is not very useful after the diagnosis of the disease. Hence the early detection of Alzheimer’s Alzheimer’s Disease(AD) is a brain disorder that slowly is still a challenge in medical science. There have been damages the nerve connections in the Brain. It is the many attempts to diagnose the disease with the help most common type of dementia and symptoms of AD of neuroimaging techniques, but non-imaging techinclude communication dificulties, memory loss, poor niques are essential to personalize the treatment for judgment, and changing mood and personality1. More a patient and monitor disease progression. Machine than 50 million people are diagnosed with Alzheimer’s learning can detect the language deficits that often acDisease every year 2. This challenge has grown sub- company dementia and therefore can be used for ealry stantially over the years with the ageing of the pop- detection of Alzheimer’s Disease. Previously, many ulation and the agerelated nature of many dementia- Natural Language Processing (NLP) techniques were producing neurodegenerative diseases [ 1 ]. This num- proposed to help in early detection of Alzheimer’s Disber of cases for Alzheimer’s Disease will continue to ease. These techniques treat the problem as a supergrow in the coming years. There is no proven health vised learning problem. Previous research works like care method to cure AD. Hence, it is necessary to de- [ 2, 3, 4 ] made use of transcripts obtained from intervelop a new method to detect AD in a patient. Around views with patients to detect Alzheimer’s disease by 50 to 90% of dementia cases are left undiagnosed by using various machine learning and deep learning alstandard clinical examinations [ 1 ]. Early detection of gorithms. Further, other studies like [ 5, 6, 7 ] used acousAlzheimer’s Disease is still a massive issue in the cur- tic features obtained from the audio recordings of the rent scenario. Alzheimer’s Disease progresses over the interviews for the classification task. Our study aims years, and sometimes patients can have the disease to explore the efect of various Word Embeddings and for 20 years before showing symptoms. At this point, neural architectures on transcripts obtained from the cookie theft description task of DementiaBank.

ISIC’2021: International Semantic Intelligence Conference, Feb 25–27, This paper makes use of both generic and domain2021, New Delhi, Delhi , India specific Word Embeddings that are trained on the tranD"osmhiin);nvijiabihnu@sedhtrua.a@c.ginm(aMil..cJoamin)(;Vd.oSsehhirrais);hdaibvhy2a6s@higkmhaa@il.dcotum.a(cR.i.n scripts. Out of all the presented models, the CNN + (D. Sethia) Bidirectional LSTM models that make use of Fasttext domain-specific Word Embeddings provides the best CPWrEooUrckReshdoinpgs 1IhStpNh:/c1e6u1r3-w-0st.o7r3gtpC©Cso:2E/m0/2Umw0oRCnwospWLwyircie.oganhrsltekzfAo.sorthttrrhiogbisup/ptaiaoPplnezrr4ho.b0yecIniietmtseeardnueatirhtnisoorg-nsda.slUe((sCmeCCpEeBerYnUm4tiR.ti0tae)-.d/W1un0Sd_e.rosCirgrgena)tisve irnespuulttst.o Sthenetmenocdeeslso,batanidnetdhefrooumtptuhteistrtahnescprriepdtsictaerde 2https://www.alz.org/alzheimers-dementia/facts-figures label (Healthy or Alzheimer’s), no feature engineering was involved in the process. Hence, this paper inves- had achieved an accuracy of 87.5% using the sparse tigates how the task of detecting Alzheimer’s Disease vector representations of 4, 5 n-grams. The dataset is afected by the use of various domain-specific and was equally divided by making use of 99 dementia trangeneric Embeddings on diferent neural architectures. scripts and 99 control transcripts from the dataset. Re

The rest of the paper comprises section 2, which cently, [ 2 ] proposed the use of 3 diferent deep learnconsists of the Related works followed by our proposed ing algorithms- 2D-CNN, LSTM, and 2D CNN - RNN work and experimental setup in sections 3 and 4, re- models by making use of the complete Dementia bank spectively. Then we present our results and discussion dataset which consists of 1017 Alzheimer’s transcripts in sections 5 and 6, respectively, which is followed by and 243 control transcripts. They used each utterance the conclusion and future work in section 7. as a separate data sample, therefore obtaining 14362 utterance samples. They achieve the best accuracy of 91.1% using the CNN-RNN model by using Word Em2. Related Work beddings along with POS tagged data to the classifier. [3] used a Hierarchical attention network (HAN) on This section discusses the previous research, done in the transcripts obtained from DementiaBank Dataset. the field of Alzheimer’s detection using the various They made use of Word Embeddings along with demomachine learning and deep learning techniques. graphic features for the prediction task obtaining an accuracy of 86.9%. [11] proposed a model that com2.1. Machine Learning Techniques bined bidirectional hierarchical recurrent neural network with an attention mechanism for dementia detection. [ 12 ] showed that fine tuned BERT model outperformed the models that used hand crafted feature engineering. Table.4 summarizes the approach used by previous research works.

Existing research found on early detection of Alzheim

er’s Disease using Natural language processing made use of various machine learning techniques. [ 8 ] used three diferent machine learning algorithms - namely Decision trees, Support Vector Machine, and K-Nearest neighbours on a sample of 80 conversations to achieve the best accuracy of 79.5% using their Decision tree 3. Proposed Work model. [9]proposed a model using Support Vector machine making use of 14 lexical features, nine syntac- 3.1. Preprocessing tic features, and n-grams extracted from the Pitt Corpus in Dementia Bank Dataset by using 99 dementia This work uses the transcripts in the Dementia Bank transcripts and 99 control transcripts from the dataset. dataset [ 13 ], which are available in the form of CHAT They used Area Under Curve (AUC) metric to test the transcription [ 14 ]. The transcripts are passed through performance of the algorithm achieving a maximum a series of steps as given below and illustrated in Fig. 1. AUC score of 0.93 by using the top 1000 features ob- PyLangAcq library [ 15 ], which is a powerful library tained using a Leave Pair Out Cross-Validation (LPOCV) that can handle CHAT data, reads the transcripts. We crossvalidation technique. then convert all obtained utterances to lower text and

Further, [ 7 ] used the DementiaBank dataset to ex- remove all punctuations. We use 99 transcripts from tract the acoustic measures and semantic measures to each set (Dementia and Control) from the Cookie Theft predict the clinical scores of the patients by making task as suggested by [9, 10] where they made use of an use of the bivariate dynamic Bayes network. [5] ex- equal number of dementia and control patients. tracted acoustic features from the DementiaBank dataset and created a regression model to predict clinical 3.2. Word Embeddings used for early scores (MMSE) used for dementia prediction. [6] made detection of Alzheimer’s Disease use of acoustic features on various Machine Learning models like Logistic Regression, KNN, Naive Bayes, This work uses three types of Word Embeddings- WorDummy classifier, Random Forests, and achieved the d2Vec [ 16 ], Glove [ 17 ] and, Fasttext [18]. These embest accuracy of 78% with Logistic regression classi- beddings are chosen because they are widely used and ifer. have diferent architechtures which may tell us the best way to proceed with the problem in hand. All the 2.2. Deep Learning Techniques Word Embeddings have a 300-dimensional vector representation for each Word. For each of the types men[10] had made use of Deep-Deep neural networks and tioned above, two-Word Embeddings are used, Domainspecific and generic Word Embeddings. All the tran- the 1D Convolution layer, ReLU [22] as the activation scripts from DementiaBank are used to create the do- function for the Dense layers, and Softmax for classimain specific Word Embeddings stated above. The max- ifcation. imum size of a transcript was 498 words. Hence, we keep the size of the Word Embedding as (500,300). 3.3.2. Bi-Directional LSTM Model 3.2.1. Domain-Specific Word Embeddings

Domain-Specific Word Embeddings are Embeddings

that are trained on a specific corpus that contains data from the interested domain. They are highly efective for a specific domain but require extra training time.

Gensim library [19] is used to create Word2vec [ 16 ] and Fasttext [18] Word Embeddings from the corpus.

Glove3 library is used to create the GloVe Embeddings 3.3.3. Hybrid CNN + Bi-Directional LSTM Model [ 17 ].

The model has a series of the Bidirectional LSTM layer and Dropout [23] layer; further layers consist of a Dense network for classification. The Dropout layers are added to prevent overfitting in the model and dropout rate is kept at 30%. All the layers use default Tanh activation except the last one, which uses Softmax for classification.

This model is a combination of the above two mod

3.2.2. Generic Word Embeddings els. We pass the Embeddings through a series of 1Dconvolutional layers followed by a MaxPooling layer, Generic Word Embeddings are Embeddings that are with two bidirectional LSTM layers stacked over the trained on vast generic corpora. Hence these Embed- Maxpool layer. A dense network follows this. Fig. 2. dings reduce training time and often give outstanding illustrates the proposed model. The Activations used results. The work trains the pretrained Glove [ 17 ] Em- for CNN and bidirectional LSTM is Tanh, while we use beddings on 6 billion words. It trains Word2vec Em- ReLU [22] activation for dense layers followed by a bedding, which includes word vectors for a vocabulary SoftMax function for classification. of 3 million words and phrases on roughly 100 billion words from a Google News dataset. It also trains Fast- 3.4. Training Details text [18] Embedding, which contains vectors for 1 million words, on Wikipedia 2017, UMBC web base corpus, and statmt.org news dataset having a total of 16 billion tokens.

The above-stated models are trained using the Adam

Optimizer [24] for 30 epochs, each using Binary crossentropy as the loss function. L2 regularization [21] is applied in each layer has = 10−5

3.3. Deep Learning Models Used 4. Experimental Details

This section explains the deep learning models that are used for the classification of control and dementia patients. Keras functional API [20] is used to create all the deep learning models explained below. To address the concern of overfitting, we use L2 regularizer [21] as the kernel initializer. Due to the small size of the dataset, the research makes use of 10-fold crossvalidation on each model. The model atempts to capture the language impairments that are often seen in the ealry phases of dementia. The Annexure provides the details of the model architecture.

This work uses Pitt Corpus, which is the largest En

glish dataset available in DementiaBank [ 13 ]. DementiaBank is a part of the TalkBank project initiated by Carnegie Mellon University. The National Institute of Aging funds it. The project encourages research for human communication. It uses the Codes for the Human Analysis of Transcripts (CHAT) system [ 14 ], which provides automatic analysis and testing. The CHAT system is commonly used in many datasets to provide uniformity and easy usage. Various participants from each group (Control and dementia) visited annu3.3.1. CNN Model ally for the interview. Pitt Corpus [ 13 ] is a collection In this work, the CNN model consists of a combina- of transcripts and audio files that were collected as a tion of 1DConvolution layers with an increasing num- part of a longitudinal study conducted by Alzheimer’s ber of kernels followed by MaxPool layers. A Dense and Related dementia at the University of Pittsburgh network follows this. We use the Tanh activation for School of Medicine. This dataset contains interviews

3https://github.com/JonathanRaiman/glove 5. Results All the three neural models - 1D CNN, Bidirectional

LSTM(BLSTM), and 1D CNN + Bidirectional LSTM (CThe work uses the Cookie theft part of the corpus as it contains the maximum number of participants, and previous researchers have used it.

• Cookie Theft: Patients see an image provided The paper aims to explore how the diferent Word Emby the Boston Diagnostic Aphasia Examination, bedding models and types of Embeddings perform on and then the patients (Control and Dementia) diferent neural models. It uses both the domain sperecall the events taking place in the image (Fig. 3). cific and the generic Word Embeddings to classify the transcripts. However, since the domain-specific Word • Fluency: This task is done only for dementia Embeddings have been trained on the same corpus bepatients where they respond to a word Fluency ing used, it generally provides better results. As the task. cookie theft data comprises of explaining a particular • Recall: The Dementia Patients undergo a story image, the vocabulary found in the transcripts is limrecall test. ited, and as a result, it is easier to understand the relationship between words. Using Domain-specific, Fast• Sentence: The Dementia Patients perform a Sen- text, and Word2vec provides better results than their tence construction task. Generic counterparts. Results indicate that Glove Embeddings provide similar results on both types of Word Embeddings.

If we had a combination of diferent tasks (not only cookie theft) having a larger corpus and vocabulary, Generic Embedding might perform better.

Results indicate that Word2vec has the lowest accuracy amongst the three Embedding models. This is possible because domain-specific Word2vec requires a larger corpus to develop the semantic relation as it only captures local word relations. The domain specific Fasttext Embedding gives the best result since it does not require a large corpus as it breaks each word into character n-grams, thereby increasing the vocabulary size.

Results also indicate that the hybrid CNN + BLSTM model achieves the highest accuracy of 90.6%. The CNN + BLSTM model works better than any single use of either of the model, because:

Compared to similar previous works like [2] and

[3] use a Word Embeddings layer that is trained along with the neural architecture, this study uses three Word Embedding models and from each Embedding model, a domainspecific and pre-trained Embedding is created to identify how diferent Embedding models and • CNN model captures the short-term dependen- the type of data on which the Embeddings are trained cies in text. afects the performance of detecting Alzheimer’s Disease. [ 2 ] breaks down each transcript into utterances • LSTM model captures long term dependencies and considers them as separate data samples thereby in the text. Bidirectional LSTM is better than the LSTM as it trains on two LSTM cells instead of one cell in a single input sequence.

Accuracy

Precision

Recall

F1-score creating 14362 samples as compared to our 198 samples which are complete transcripts of a patient.

7. Conclusion and Future Work This study employs three Word Embedding algorithms

on three diferent Neural Models that make use of CNN and Bidirectional LSTM for Alzheimer’s Disease Classification. For each word embedding algorithm 2 different types of word embeddings were used - Domain Specific and Generic Embeddings, where it was found that Domain Specific word embeddings performed better than Generic Word Embeddings. This work was limited by the small amount of dataset available. In future, we may gather a larger dataset that may help in creation of a more generalized embedding. Further, we can also extend the dataset for people speaking different languages.

A. Appendix A.1. Neural Model Details We used the following neural models. The batch size was kept at 10. In the last dense layer of each model softmax activation function was used. Other dense layers use a rectified linear activation function. Each CNN-1D layer in brackets represents(no-of-filters , kernel-size)

CNN-1D(8,3) → CNN-1D(10,3) → MaxPool-1D(3) → CNN-1D(12,3) → CNN-1D(14,3) → MaxPool-1D(3) → Flatten() → Dense(20,Relu) → Dense(10,Relu) → Dense(2,Softmax)

[1]

M. W.

Bondi ,

E. C.

Edmonds ,

D. P.

Salmon , Alzheimer's disease: past, present, and future , Journal of the International Neuropsychological Society 23 ( 2017 ) 818 - 831 .

[2]

Karlekar ,

Niu ,

Bansal , Detecting linguistic characteristics of Alzheimer's dementia by interpreting neural models , arXiv preprint A.1 .2. BLSTM arXiv: 1804 . 06440 ( 2018 ).

Each

LSTM

layer in brackets represents(no-of-lstm-

cells- [3] W.

Kong , H.

Jang , G. Carenini, T.

Field , A neuin-that-layer) ral model for predicting dementia from language, Bidir(LSTM(16)) → Dropout(0.3) → Bidir(LSTM(8)) in: Machine Learning for Healthcare Conference, → Bidir(LSTM(4)) → Bidir(LSTM(2 )) → Dropout(0.2) 2019 , pp. 270 - 286 .

→ Dense(8) → Dense(2,Softmax) [4] SP .reOd.iOctrivime aLyien,gJ.uSis.-tiMc . FWea.tKu.rJe.sGfoolrdeAn,lzLheeaimrneinr'gs A.1 .3. CNN+ BLSTM Disease and related Dementias using Verbal Utterances , in: Proceedings Workshop on ComCNN-1D ( 8 ,3) → CNN-1D( 10 ,3) → MaxPool-1D(3) → putational Linguistics and Clinical Psychology: CNN-1D(16,3) → CNN-1D(20,3) → MaxPool-1D(3) From Linguistic Signal to Clinical Reality, 2014 , → Bidir(LSTM(8)) → BatchNorm() → Bidir(LSTM( 16 )) pp. 78 - 87 .

→ Dense(64 , Relu

)

→ Dense(32 , Relu

)

→ Dense (2 ,Soft- [5]

Al-Hameed ,

Benaissa ,

Christensen , Simmax) ple and robust audio-based detection of biomark- [18]

Bojanowski ,

Grave ,

Joulin , T. Mikolov, ers for alzheimer's disease , in: 7th Workshop Enriching Word Vectors with Subword Informaon Speech and Language Processing for Assistive tion, arXiv:1607.04606 ( 2016 ).

Technologies (SLPAT), 2016 , pp. 32 - 36 . [19]

Rehurek ,

Sojka , Software framework for [6]

Masrani , Detecting dementia from written topic modelling with large corpora, in: Proceedand spoken language , Ph.D. thesis , University of ings International Workshop on New Challenges British Columbia, 2018 . for NLP Frameworks, 2010 .

[7]

Yancheva ,

Rudzicz , Vector-space topic mod- [20] Keras , Deep learning for humans, els for detecting Alzheimer's disease , in: Pro- https://github.com/fchollet/keras, 2015 . Last ceedings Annual Meeting of the Association for accessed on Nov 2019 .

Computational

Linguistics , 2016 , pp. 2337 - 2346 . [21]

Schmidhuber , Deep learning in neural net [8]

Guinn ,

Habash , Language analysis of works: An overview , Neural Networks 61 ( 2015 ) speakers with dementia of the Alzheimer's type , 85 - 117 .

in: AAAI Fall Symposium Series , 2012 , pp. 8 - 13 . [22]

A. F.

Agarap , Deep Learning using Rectified Lin [9]

S. O.

Orimaye , J. S.-M. Wong , K. J. Golden , ear Units (ReLU), arXiv: 1803 . 08375 ( 2018 ).

C. P.

Wong ,

I. N.

Soyiri , Predicting probable [23]

Srivastava , et al., Dropout : A Simple

Way

Alzheimer's disease using linguistic deficits and to Prevent Neural Networks from Overfitting, biomarkers , BMC Bioinformatics 18 ( 2017 ). Journal of Machine Learning Research 15 ( 2014 ) [10]

S. O.

Orimaye , J. S.-M. Wong , C. P. Wong , Deep 1929 - 1958 .

language space neural network for classifying [24]

D. P.

Kingma ,

Ba , A Method for Stochastic Opmild cognitive impairment and alzheimer-type timization , ?arXiv: 1412 .6980 ( 2014 ).

dementia, PloS one 13 ( 2018 ). [25]

Hebert ,

P. A.

Scherr ,

J. L.

Bienias ,

D. A.

Ben [11]

Pan ,

Mirheidari ,

Reuber , A . Venneri, nett,

D. A.

Evans , Alzheimer disease in the US D . Blackburn , H. Christensen , Automatic hi- population: prevalence estimates using the 2000 erarchical attention neural network for detect- census , Arch Neurol 60 ( 2003 ) 1119 - 1122 .

ing ad , 2019 , pp. 4105 - 4109 . doi: 10 .21437/ Interspeech.2019- 1799 .

[12]

Balagopalan ,

Eyre ,

Rudzicz ,

Novikova , To bert or not to bert: Comparing speech and language-based approaches for alzheimer's disease detection , 2020 . arXiv: 2008 .01551.

[13]

J. T.

Becker ,

Boller ,

O. L.

Lopez ,

Saxton , K. McGonigle, The natural history of Alzheimer's disease. Description of study cohort and accuracy of diagnosis ., Archives of Neurology 51 ( 1994 ) 585 - 594 .

[14]

Macwhinney , The CHILDES project: tools for analyzing talk , Child Language Teaching and Therapy 8 ( 2000 ).

[15]

J. L.

Lee ,

Burkholder ,

G. B.

Flinn ,

E. R.

Coppess , Working with CHAT transcripts in Python , Technical report TR-2016-02 , Technical

Report

, Department of Computer Science, University of Chicago, 2016 .

[16]

Mikolov , et al., Eficient Estimation of Word Representations in Vector Space, in: Proceedings International Conference on Learning Representations , 2013 .

[17] Pennington , Jefrey, R.

Socher , C. D.

Manning , Glove: Global vectors for word representation , in: Proceedings International Conference Empiricial Methods in Natural Language Processing , 2014 .