<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Early detection of anorexia using RNN-LSTM and SVM classifiers</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Akshaya Ranganathan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Haritha A</string-name>
          <email>haritha16038g@cse.ssn.edu.in</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Thenmozhi D</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Chandrabose Aravindan</string-name>
          <email>aravindancg@ssn.edu.in</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of CSE, SSN College of Engineering</institution>
          ,
          <addr-line>Chennai</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2018</year>
      </pub-date>
      <abstract>
        <p>Social media text analysis has given rise to a variety of applications in the medical domain, a major example being the detection and cure of harmful mental disorders. Anorexia is a deadly psychiatric eating disorder, typically characterized by alarmingly low body weight and a distorted body image, with an unreasonable sense of being overweight. With developments in the field of Natural Language Processing, such highly lethal disorders can be identified and mitigated in their early stages, sparing the victim a great deal of mental and physical harm. Task 1 of CLEF 2019's eRisk lab focuses mainly on the early prediction of anorexia from posts sourced from social media platforms. Our team, SSN-NLP, used variations of two major models for sentiment classification: a deep learning RNN-LSTM and a traditional SGD classifier. CLEF eRisk released user-specific data from successive posts extracted from Reddit, which we used in its entirety for training, testing, evaluation and scoring. With the help of RAKE (Rapid Automated Keyword Extraction), numeric scores were obtained to estimate the level of anorexia/self-harm. SSN-NLP submitted 5 model variants to the server, which repeatedly accepted submissions and released user writings to the participating teams. According to the ERDE-50 and F1 scores, our 2-layer LSTM with normed Bahdanau attention performed best, with scores of 0.07 and 0.33 respectively.</p>
      </abstract>
      <kwd-group>
        <kwd>Anorexia</kwd>
        <kwd>early detection</kwd>
        <kwd>deep learning</kwd>
        <kwd>machine learning</kwd>
        <kwd>LSTM</kwd>
        <kwd>natural language processing</kwd>
        <kwd>SVM</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Anorexia Nervosa is a potentially life-threatening psychiatric disorder
characterized by extreme unhappiness with one's body image and an intense desire
to lose weight, even when one's weight is already lower than what is considered normal. In the age of
Instagram celebrities showing off their perfectly toned bodies, internet culture
has created harsh rules that people, especially teenagers, are expected to adhere
to. According to a study by the National Eating Disorders Association<sup>1</sup>,
at any given point in time, 0.3-0.4% of women and 0.1% of men suffer from anorexia
nervosa. DSM-5 (Diagnostic and Statistical Manual of Mental Disorders) gives
definitions and diagnostic material for mental disorders. According to DSM-5,
Anorexia Nervosa is characterized by the following criteria<sup>2</sup>:
        <list list-type="order">
          <list-item><p>Restriction of energy intake relative to requirements, leading to significantly low body weight in the context of age, sex, developmental trajectory, and physical health.</p></list-item>
          <list-item><p>Intense fear of gaining weight or becoming fat, even though underweight.</p></list-item>
          <list-item><p>Disturbance in the way in which one's body weight or shape is experienced, undue influence of body weight or shape on self-evaluation, or denial of the seriousness of the current low body weight.</p></list-item>
        </list>
      </p>
      <p>
        However, another serious type of anorexia is Atypical Anorexia, in which
a person maintains a healthy weight despite consistent weight loss. Types of
anorexia include:
        <list list-type="order">
          <list-item><p>Binge/purge type: A person tries to purge by over-exercising or even vomiting after eating, in an attempt to compensate for the weight gained by eating.</p></list-item>
          <list-item><p>Restrictive type: A person imposes harsh restrictions on the quantity of food consumed, which in most cases is barely sufficient for survival.</p></list-item>
        </list>
      </p>
      <p>
        eRisk 2019 primarily focuses on early detection of risk on the internet. The
primary goal is to use text mining solutions for early detection in various areas, such as
the detection of people with suicidal tendencies or with a tendency to fall prey to criminal
organizations [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The aim of Task 1 of eRisk 2019 is to detect symptoms
of anorexia as early as possible. Early detection technologies based on text
processing can be employed in different areas, particularly those related to health and
safety. A few applications of early detection include the areas of sexual
predators, mental disorders and cyber-bullying. Prediction is broadly divided into
two stages: the training stage and the test stage. In the training stage, eRisk
released chunks of training data as well as the test data of eRisk 2018. The chunks
consisted of user writings posted on Reddit together with the classification results. Users
are classified as Anorexic or Non-Anorexic. During the test stage, an
automatic server repeatedly accepted our submissions and released test data batch
by batch. The task evaluates the earliness of predictions in addition to their
correctness, and aims to obtain a scoring system based on the level of alert.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Related Works</title>
      <p>
        Existing methods for detecting anorexia can be categorized into two types. The first
is the analysis of changes in behavioral patterns by general physicians,
as well as by friends and family of the patient, through structured mental analysis.
According to a study that weighs the importance of the primary care physician in
detecting eating disorders, the presence of anorexia is assessed by examining
the answers to a series of questions [16].
A few examples are "What did you eat yesterday?", "Do you ever binge eat (eat
more than you want) or use laxatives, diuretics, or diet pills?" and "Do you think you
are thin (too thin)?".
        <fn id="fn1"><label>1</label><p>https://www.nationaleatingdisorders.org/statistics-research-eating-disorders</p></fn>
        <fn id="fn2"><label>2</label><p>https://www.nationaleatingdisorders.org/learn/by-eating-disorder/anorexia</p></fn>
        The second method [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] involves the use of sentiment
analysis on social media posts. For example, one study showed that
students with signs of depression use more personal pronouns such as 'I' and more words
of negative valence (e.g. gloomy, sad). eRisk aims at early detection of
anorexic tendencies by analyzing the posts of users on Reddit. One such approach
involves the Bag of Words (BoW) model [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], which uses a vocabulary comprising
all the unique words in the text and performs vectorization by assigning a specific
weight to each word. The term weighting for the BoW model is split into
3 components: a term frequency component, a document frequency component,
and a normalization component. Another approach uses UMLS-based
MetaMap [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] assistance for keyword detection. Traditional learning
algorithms (e.g. SVM, logistic regression, RF) were then applied to the information
collected by the methods mentioned above. Yet another approach used
TF-IDF, similar to the works mentioned before, but
adopted a deep learning approach using CNN-LSTM [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Our work uses
Recurrent Neural Networks with Long Short-Term Memory (LSTM) to
analyze patterns and make predictions on sequences of text. Rapid Automated
Keyword Extraction (RAKE) was used to identify the most frequently
occurring keywords relating to anorexia in the training data. The results were
combined to devise a prediction and risk-based scoring system.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Dataset Analysis</title>
      <sec id="sec-3-1">
        <title>Dataset analysis - Task 1</title>
        <p>
          This year's Task 1 was an extension of CLEF eRisk 2018's Task 2; the training
data [14] of this year's task was a combination of both the test and training
data of the previous year. Reddit, much like Twitter, offers an API with Python
support that can be used to scrape the required data effectively. Twitter sentiment
analysis [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] has proven to be a powerful indicator of mental illnesses like
depression and PTSD. While the training data was categorized into negative and
positive examples, the labels of the test data had to be extracted from the file
riskgolden-truth-test.txt and mapped onto the actual writings of the users. Each
document had an XML tree structure comprising the tags INDIVIDUAL, ID,
WRITING, TITLE, DATE and TEXT. The training examples covered a total of
152 users, compared with 320 users for the test examples;
from all of these, only the TEXT and TITLE elements were extracted to be fed
as training data. Table 1 gives a summary of the dataset.
          <table-wrap id="tab1">
            <label>Table 1</label>
            <table>
              <thead>
                <tr><th>Attributes</th><th></th></tr>
              </thead>
              <tbody>
                <tr><td>Number of users</td><td></td></tr>
                <tr><td>Positives/Negatives</td><td></td></tr>
                <tr><td>Number of documents</td><td></td></tr>
                <tr><td>Avg documents per user</td><td>558.26</td></tr>
                <tr><td>Avg words per document</td><td>184.54</td></tr>
              </tbody>
            </table>
          </table-wrap>
        </p>
        <p>
          The data [14] provided was a consolidation of the test and training data of CLEF
2018. Data were organized into positive-examples and negative-examples chunks,
each containing XML files of the writings of a given subject. Using the XML
ElementTree library of Python, the TEXT elements of each file were
consolidated (see Fig. 1). To smooth out the discrepancies in the data set,
all special characters, erroneous blank spaces and empty strings (NULL) were
removed using regular expressions. The cleanup was done in accordance
with the input expected by the Neural Machine Translation model. The cleaned text
and the respective labels were written out from Python as comma-separated
values. A vocabulary file comprising all unique words in the
training set was built to be fed into the deep learning model.
        </p>
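        <p>The consolidation and cleanup steps described in this section can be sketched as follows. This is a minimal illustration rather than the exact pipeline used for the submission: the sample document, the helper names (clean, extract_writings, to_csv) and the regular expressions are assumptions made for the example; only the tag names come from the dataset description.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import csv
import io
import re
import xml.etree.ElementTree as ET

# Hypothetical subject file that follows the tag layout described in the text.
SAMPLE_XML = """<INDIVIDUAL>
  <ID>subject0001</ID>
  <WRITING>
    <TITLE> First   post!! </TITLE>
    <DATE>2018-01-01 10:00:00</DATE>
    <TEXT>I skipped lunch again... #tired</TEXT>
  </WRITING>
  <WRITING>
    <TITLE></TITLE>
    <DATE>2018-01-02 11:30:00</DATE>
    <TEXT>   </TEXT>
  </WRITING>
</INDIVIDUAL>"""


def clean(text):
    """Replace special characters with spaces and collapse whitespace."""
    text = re.sub(r"[^A-Za-z0-9\s]", " ", text or "")
    return re.sub(r"\s+", " ", text).strip()


def extract_writings(xml_string):
    """Return the cleaned TITLE + TEXT of every non-empty WRITING."""
    root = ET.fromstring(xml_string)
    rows = []
    for writing in root.iter("WRITING"):
        title = clean(writing.findtext("TITLE"))
        body = clean(writing.findtext("TEXT"))
        merged = " ".join(part for part in (title, body) if part)
        if merged:  # skip writings that are empty after cleaning
            rows.append(merged)
    return rows


def to_csv(rows, label):
    """Serialise (text, label) pairs as comma-separated values."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    for row in rows:
        writer.writerow([row, label])
    return buffer.getvalue()


rows = extract_writings(SAMPLE_XML)
print(to_csv(rows, label=1))
```
]]></preformat>
        <p>In this sketch the second writing is dropped because both its TITLE and TEXT are empty after cleaning, which mirrors the removal of empty strings mentioned above.</p>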
      </sec>
      <sec id="sec-3-2">
        <title>Data augmentation</title>
        <p>Because positive examples were sparse in the training set, data
augmentation was performed through synonym generation using POS tagging.
Using the NLTK POS tagger module<sup>3</sup>, the parts
of speech were identified in each positive example of the text. After
identification, the NLTK WordNet module<sup>4</sup> found the synonyms for adjectives (JJ)
and adverbs (RB), and the dataset was populated with the replaced text, which led to a
significant increase in the number of tuples in our dataset. As shown in Fig. 3, the
POS tagger splits each sentence into its parts of speech, and WordNet
(Fig. 2) generates synonyms for each word. Multiple sentences of anorexia-positive
users were added to the dataset by replacing each adverb and adjective
in a sentence with entries from its list of most relevant synonyms. Take the
example sentence "My body is so heavy that I actively need to exercise
every moment of the day." The POS tagger identifies "heavy" and "actively"
as adjective and adverb respectively. The synsets give weighty, hefty, big and
massive as synonyms for "heavy", and effectively, usefully and
productively as synonyms for "actively". Sentences with combinations of these synonyms are
then generated. Nearly 45,000 sentences were added to our dataset through this
method.
        <fn id="fn3"><label>3</label><p>https://github.com/nltk/nltk</p></fn></p>
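        <p>The combination step of this augmentation can be sketched as follows. To keep the example self-contained, the tagged tokens and the synonym table are hard-coded, hypothetical stand-ins for what the NLTK POS tagger and WordNet would return; the function name augment is also an assumption.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from itertools import product

# Hypothetical tagger output: (token, Penn Treebank tag) pairs for the
# example sentence from the text.
TAGGED = [
    ("My", "PRP$"), ("body", "NN"), ("is", "VBZ"), ("so", "RB"),
    ("heavy", "JJ"), ("that", "IN"), ("I", "PRP"), ("actively", "RB"),
    ("need", "VBP"), ("to", "TO"), ("exercise", "VB"), ("every", "DT"),
    ("moment", "NN"), ("of", "IN"), ("the", "DT"), ("day", "NN"),
]

# Hypothetical synonym table standing in for WordNet synsets.
SYNONYMS = {
    "heavy": ["weighty", "hefty", "big", "massive"],
    "actively": ["effectively", "usefully", "productively"],
}

REPLACEABLE_TAGS = ("JJ", "RB")  # adjectives and adverbs


def augment(tagged, synonyms):
    """Yield every sentence obtained by swapping adjectives/adverbs for synonyms."""
    options = []
    for token, tag in tagged:
        if tag.startswith(REPLACEABLE_TAGS) and token in synonyms:
            options.append(synonyms[token])
        else:
            options.append([token])  # keep the token unchanged
    for combination in product(*options):
        yield " ".join(combination)


sentences = list(augment(TAGGED, SYNONYMS))
print(len(sentences))
print(sentences[0])
```
]]></preformat>
        <p>With four synonyms for the adjective and three for the adverb, this single sentence yields twelve augmented variants, which illustrates how a modest number of positive examples can grow into tens of thousands of sentences.</p>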
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Proposed methodologies and Implementation</title>
      <sec id="sec-4-1">
        <title>Deep learning approach - Neural Machine Translation</title>
        <p>
          Task 1's primary goal was to classify the user as anorexia-positive or
anorexia-negative. We used a deep learning approach based on
Neural Machine Translation (NMT) to solve this classification problem. The basic
architecture of NMT is a sequence-to-sequence (Seq2Seq) model,
built on the concept of an encoder-decoder [15]. The
encoder converts the input sequence to a thought vector, while the decoder maps
it to a target language. In our case, the decoder maps the input sequences to
two classes: positive and negative indication of anorexia. Our deep learning
approach for sentiment classification was implemented using TensorFlow
code based on the tutorial code released for Neural Machine Translation<sup>5</sup> [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], which
was developed on the basis of Seq2Seq models [
          <xref ref-type="bibr" rid="ref1 ref12 ref6">12, 1, 6</xref>
          ]. NMT was implemented with LSTM (Long Short-Term
Memory), which remembers only the important parts of each input
sentence and is trained to forget the rest. Thus, the output combines the
prediction for the current input sentence with the memory of the important
parts of previous sentences. LSTM captures long-term dependencies using 3 gates:
          <list list-type="bullet">
            <list-item><p>Forget gate: decides what part of the previous cell state must be forgotten.</p></list-item>
            <list-item><p>Input gate: responsible for the addition of information to the cell state.</p></list-item>
            <list-item><p>Output gate: responsible for selecting useful information to output at the current cell state.</p></list-item>
          </list>
          <fn id="fn4"><label>4</label><p>https://github.com/wordnet/wordnet</p></fn>
          <fn id="fn5"><label>5</label><p>https://github.com/tensorflow/nmt</p></fn>
        </p>
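        <p>
          <disp-formula><label>(1)</label><tex-math>i_t = \sigma\left(w_x^{(i)} x + w_h^{(i)} h_{t-1} + b^{(i)}\right)</tex-math></disp-formula>
          <disp-formula><label>(2)</label><tex-math>f_t = \sigma\left(w_x^{(f)} x + w_h^{(f)} h_{t-1} + b^{(f)} + 1\right)</tex-math></disp-formula>
          <disp-formula><label>(3)</label><tex-math>o_t = \sigma\left(w_x^{(o)} x + w_h^{(o)} h_{t-1} + b^{(o)}\right)</tex-math></disp-formula>
          <disp-formula><label>(4)</label><tex-math>\tilde{c}_t = \tanh\left(w_x^{(c)} x + w_h^{(c)} h_{t-1} + b^{(c)}\right)</tex-math></disp-formula>
          <disp-formula><label>(5)</label><tex-math>c_t = f_t \circ c_{t-1} + i_t \circ \tilde{c}_t</tex-math></disp-formula>
          <disp-formula><label>(6)</label><tex-math>h_{b/f} = o_t \circ \tanh(c_t)</tex-math></disp-formula>
        </p>
        <p>where the <inline-formula><tex-math>w</tex-math></inline-formula> terms are the weight matrices, <inline-formula><tex-math>h_{t-1}</tex-math></inline-formula> is the hidden layer state at time <inline-formula><tex-math>t-1</tex-math></inline-formula>, <inline-formula><tex-math>i_t</tex-math></inline-formula>, <inline-formula><tex-math>f_t</tex-math></inline-formula> and <inline-formula><tex-math>o_t</tex-math></inline-formula> are the input, forget and output gates respectively at time <inline-formula><tex-math>t</tex-math></inline-formula>, <inline-formula><tex-math>\tilde{c}_t</tex-math></inline-formula> is the candidate cell state, <inline-formula><tex-math>\circ</tex-math></inline-formula> denotes element-wise multiplication, and <inline-formula><tex-math>h_{b/f}</tex-math></inline-formula> is the hidden state of the backward or forward LSTM cell.</p>
        <p>Four different NMT variations
have been implemented for runs 1-4 of our submissions.</p>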
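        <p>
          <list list-type="bullet">
            <list-item><p>Model 1: 2-layer bidirectional LSTM with scaled Luong attention</p></list-item>
            <list-item><p>Model 2: 4-layer bidirectional LSTM with scaled Luong attention</p></list-item>
            <list-item><p>Model 3: 2-layer bidirectional LSTM with normed Bahdanau attention</p></list-item>
            <list-item><p>Model 4: 4-layer bidirectional LSTM with normed Bahdanau attention</p></list-item>
          </list>
        </p>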
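        <p>The LSTM gate equations given earlier in this section can be sketched as a single time step in plain Python. This is an illustrative sketch only, not the TensorFlow NMT code used for the runs: the function names (sigmoid, affine, zero_params, lstm_step) and the list-based parameter layout are assumptions, and the constant added inside the forget gate is exposed as the hypothetical argument forget_bias.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import math


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def affine(w_x, x, w_h, h_prev, b):
    """Compute w_x x + w_h h_prev + b for list-based matrices and vectors."""
    out = []
    for row_x, row_h, bias in zip(w_x, w_h, b):
        total = bias
        total += sum(w * v for w, v in zip(row_x, x))
        total += sum(w * v for w, v in zip(row_h, h_prev))
        out.append(total)
    return out


def zero_params(n_in, n_hidden):
    """All-zero parameters for the four gates (i, f, o) and the candidate (c)."""
    gates = ("i", "f", "o", "c")
    return {
        "wx": {g: [[0.0] * n_in for _ in range(n_hidden)] for g in gates},
        "wh": {g: [[0.0] * n_hidden for _ in range(n_hidden)] for g in gates},
        "b": {g: [0.0] * n_hidden for g in gates},
    }


def lstm_step(x, h_prev, c_prev, params, forget_bias=1.0):
    """One LSTM time step following the gate equations in the text."""
    pre = {}
    for gate in ("i", "f", "o", "c"):
        pre[gate] = affine(params["wx"][gate], x,
                           params["wh"][gate], h_prev, params["b"][gate])
    i_t = [sigmoid(v) for v in pre["i"]]
    f_t = [sigmoid(v + forget_bias) for v in pre["f"]]
    o_t = [sigmoid(v) for v in pre["o"]]
    c_tilde = [math.tanh(v) for v in pre["c"]]
    # New cell state: keep part of the old state, add part of the candidate.
    c_t = [f * c + i * ct for f, c, i, ct in zip(f_t, c_prev, i_t, c_tilde)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


params = zero_params(n_in=2, n_hidden=2)
h_t, c_t = lstm_step([0.3, -0.2], [0.0, 0.0], [1.0, -1.0], params)
print(h_t, c_t)
```
]]></preformat>
        <p>With all weights set to zero, the new cell state is simply the previous cell state scaled by the forget gate, which makes the effect of the added constant in the forget gate easy to see.</p>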
      </sec>
      <sec id="sec-4-2">
        <title>Traditional Learning Approach</title>
        <p>
          TF-IDF is used to assign weights to words in order to find the important ones. TF
stands for term frequency, a measure of the number of times a word occurs
in a given document [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. It is calculated by dividing the number of occurrences
of a given word by the total number of words in the document. However, words
like "a" and "the" occur very often and are not very significant. We therefore also calculate the
Inverse Document Frequency (IDF), which down-weights words that appear in many documents.
        </p>
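        <p>A minimal sketch of this weighting is given below. The text does not state which IDF variant was used, so the common form log(N / df) is assumed here; the function name tf_idf, the whitespace tokenisation and the two sample documents are likewise illustrative assumptions.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document, with weight = TF * IDF."""
    tokenised = [doc.lower().split() for doc in documents]
    n_docs = len(tokenised)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    weights = []
    for tokens in tokenised:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


docs = ["the scale says the same", "the mirror lies"]
weights = tf_idf(docs)
print(weights[0]["the"], weights[1]["mirror"])
```
]]></preformat>
        <p>In this toy corpus the word "the" occurs in every document and therefore receives a weight of zero, while a word that is specific to one document, such as "mirror", receives a positive weight.</p>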
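        <p><disp-formula><label>(7)</label><tex-math>\mathit{Weights} = \mathit{TF} \times \mathit{IDF}</tex-math></disp-formula></p>
        <p>
          Stochastic Gradient Descent [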
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] is essentially Gradient Descent with a batch size
of 1 and works effectively when redundant data is present. The SGDClassifier of
sklearn performs Stochastic Gradient Descent optimization on an SVM
classification model. Stochastic Gradient Descent has proven useful especially for
large datasets and has found increased usage in several text mining applications
[
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. After data augmentation, the dataset was cleaned and fed to the model.
The training accuracy of the model was found to be 90%.
          <list list-type="bullet">
            <list-item><p>Model 0: SVM classifier with SGD optimization using TF-IDF</p></list-item>
          </list>
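          The idea of training a linear SVM with single-sample gradient steps can be sketched in plain Python as shown below. This is not the sklearn implementation used for Model 0; it is a simplified illustration of stochastic gradient descent on the L2-regularised hinge loss, and the function names, learning rate, regularisation strength and toy feature vectors are all assumptions.
          <preformat preformat-type="code"><![CDATA[
```python
import random


def train_linear_svm_sgd(samples, labels, epochs=20, lr=0.1, reg=0.01, seed=0):
    """Fit weights and bias by SGD on the hinge loss; labels are +1 or -1."""
    rng = random.Random(seed)
    w = [0.0] * len(samples[0])
    b = 0.0
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)
        for idx in order:  # one sample per update, i.e. a batch size of 1
            x, y = samples[idx], labels[idx]
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # The L2 regulariser shrinks the weights at every step.
            w = [wi - lr * reg * wi for wi in w]
            if margin < 1:  # hinge loss is active for this sample
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Toy two-dimensional features standing in for TF-IDF vectors.
samples = [[2.0, 0.0], [1.5, 0.2], [0.0, 2.0], [0.1, 1.5]]
labels = [1, 1, -1, -1]
w, b = train_linear_svm_sgd(samples, labels)
print([predict(w, b, x) for x in samples])
```
]]></preformat>
          In practice the feature vectors would be the sparse TF-IDF weights of each document, and a library implementation would be preferred over this hand-written loop.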
        </p>
        <p>
The motive behind Task 1 of eRisk 2019 was to facilitate the early prediction
of anorexia. This year, the organizers added another element to the submissions: a
score of positivity or negativity. The score is a numeric estimate of the level of
anorexia/self-harm. Using this score, CLEF 2019 adopts ranking-based
measures for the evaluation of participants. The Rapid Automated Keyword
Extraction (RAKE) module [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] was used to identify the most frequently occurring
keywords in our training set, and to calculate the score based on these keywords.
The input parameters for RAKE comprise a list of stop words (or stoplist),
usually provided by NLTK for the English language, a set of phrase delimiters, and
a set of word delimiters. RAKE uses the stop words and phrase delimiters to
segment the text into candidate keywords. The number of times each word
occurs in the document gives its frequency score, and the number of times each
keyword occurs together with every other keyword gives its co-occurrence score.
          <disp-formula><label>(8)</label><tex-math>\mathit{final\ score} = \frac{\mathit{co\text{-}occurrence\ score}}{\mathit{frequency\ score}}</tex-math></disp-formula>
RAKE eliminates words that occur very frequently in the document but are of
trivial relevance. Using co-occurring keywords, we successfully mined pairs
such as body-mass, anorexia-nervosa, purge-eating and binge-eating. The fundamental
difference between RAKE and TF-IDF scores is that RAKE finds word phrases
in a single document and assigns relevance scores to them, while TF-IDF uses multiple
documents to assign a score to a single word. Since our work required a single but
voluminous training document, RAKE outperformed its TF-IDF counterpart.
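          The RAKE scoring described here can be sketched as follows. This is a simplified reimplementation for illustration, not the RAKE module used for the submission: the tiny stoplist, the delimiter pattern, the function names and the sample text are assumptions, and the co-occurrence score of a word is taken to be its degree, counting the word itself.
          <preformat preformat-type="code"><![CDATA[
```python
import re
from collections import defaultdict

# Tiny hypothetical stoplist; the text mentions the NLTK English stop words.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "is", "in", "my", "i"}


def candidate_phrases(text, stopwords):
    """Split text at phrase delimiters and stop words into candidate keywords."""
    phrases = []
    for chunk in re.split(r"[.,;:!?()\n]", text.lower()):
        current = []
        for word in re.findall(r"[a-z]+", chunk):
            if word in stopwords:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)
    return phrases


def rake_scores(text, stopwords=STOPWORDS):
    """Score each phrase as the sum over its words of degree / frequency."""
    phrases = candidate_phrases(text, stopwords)
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)  # co-occurrences within the phrase
    word_score = {word: degree[word] / freq[word] for word in freq}
    scores = {" ".join(p): sum(word_score[w] for w in p) for p in phrases}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


TEXT = ("Binge eating and purge eating lower my body mass. "
        "Anorexia nervosa is linked to body mass.")
ranked = rake_scores(TEXT)
for phrase, score in ranked:
    print(phrase, score)
```
]]></preformat>
          Words that mostly appear inside longer phrases obtain a higher degree-to-frequency ratio, so multi-word keywords are ranked above isolated frequent words.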
        </p>
        <p>
To achieve stable prediction scores, we used a function that checks the following:
          <list list-type="bullet">
            <list-item><p>If a user classified as anorexia-positive has stopped posting altogether, the score is significantly increased, causing a high level of alert.</p></list-item>
            <list-item><p>If a user was classified as positive in both the current and previous runs, the score is boosted so as to confirm the positive decision as early as possible.</p></list-item>
            <list-item><p>If a user was classified as positive in the previous run but as negative in the current run, the score is balanced out while waiting for further writings to make the final decision.</p></list-item>
          </list>
        </p>
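        <p>The three stabilising rules can be sketched as a small function. The text does not give the magnitudes of the adjustments, so the multiplicative factors below (stop_boost, confirm_boost, damping) are purely illustrative assumptions, as are the function and argument names.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def adjust_score(score, current_positive, previous_positive, has_new_writing,
                 stop_boost=2.0, confirm_boost=1.5, damping=0.5):
    """Apply the stabilising rules; all factors are illustrative guesses."""
    if previous_positive and not has_new_writing:
        # A user flagged as positive has stopped posting: raise the alert.
        return score * stop_boost
    if current_positive and previous_positive:
        # Positive in consecutive runs: confirm the decision early.
        return score * confirm_boost
    if previous_positive and not current_positive:
        # Conflicting runs: balance the score and wait for more writings.
        return score * damping
    return score


print(adjust_score(1.0, current_positive=True, previous_positive=True,
                   has_new_writing=True))
```
]]></preformat>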
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Results and Evaluation</title>
      <sec id="sec-5-1">
        <title>Decision based evaluation</title>
        <p>
          According to the task, several methods of evaluation were considered [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
Evaluation of results was initially based on the Early Risk Detection Error (ERDE),
which measures both the correctness of a decision and the delay taken to arrive
at that decision.
        </p>
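        <p>For reference, a minimal sketch of ERDE is given below. The formula is not reproduced in this paper; the sketch assumes the definition used by the eRisk organizers, in which a false positive costs c_fp, a false negative costs c_fn, and a true positive costs c_tp scaled by the latency factor 1 - 1 / (1 + exp(k - o)), where k is the number of writings seen and o is the ERDE parameter (5 or 50). The cost values and the sample decisions are illustrative assumptions.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import math


def latency_cost(k, o):
    """Latency factor 1 - 1 / (1 + exp(k - o)), guarded against overflow."""
    if k - o > 700:
        return 1.0
    return 1.0 - 1.0 / (1.0 + math.exp(k - o))


def erde(decisions, truths, delays, o, c_fp, c_fn=1.0, c_tp=1.0):
    """Mean early risk detection error over all users."""
    total = 0.0
    for decision, truth, k in zip(decisions, truths, delays):
        if decision == 1 and truth == 0:
            total += c_fp                          # false positive
        elif decision == 0 and truth == 1:
            total += c_fn                          # false negative
        elif decision == 1 and truth == 1:
            total += latency_cost(k, o) * c_tp     # late true positive
        # true negatives cost nothing
    return total / len(decisions)


# Four hypothetical users: one TP (after 5 writings), one FP, one FN, one TN.
value = erde(decisions=[1, 1, 0, 0], truths=[1, 0, 1, 0],
             delays=[5, 3, 10, 2], o=5, c_fp=0.5)
print(value)
```
]]></preformat>
        <p>Under this definition a true positive always incurs a small positive cost, even when it is detected at the first writing, so the error does not reach zero for a system that makes every decision correctly.</p>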
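        <p>However, ERDE has certain drawbacks. For example, a system that detects all
the true positives still does not get an error of zero. As an alternative, a
modification, <inline-formula><tex-math>\mathrm{ERDE}_o^{\%}</tex-math></inline-formula>, was suggested. This method considers the percentage of
a user's writings seen before making a decision, as opposed to the number
of user writings. However, it has a major flaw: in real life, the total
number of user writings may not be known.</p>
        <p>In terms of true positives (TP), false positives (FP) and false negatives (FN), precision P, recall R and the F-measure are given by
          <disp-formula><label>(9)</label><tex-math>P = \frac{TP}{TP + FP}</tex-math></disp-formula>
          <disp-formula><label>(10)</label><tex-math>R = \frac{TP}{TP + FN}</tex-math></disp-formula>
          <disp-formula><label>(11)</label><tex-math>F = \frac{2 \cdot P \cdot R}{P + R}</tex-math></disp-formula>
        </p>
        <p>Another measure, <inline-formula><tex-math>F_{latency}</tex-math></inline-formula>,
was also proposed. For each user <inline-formula><tex-math>u \in U</tex-math></inline-formula>, <inline-formula><tex-math>k_u</tex-math></inline-formula> writings are seen before making the decision
<inline-formula><tex-math>d_u</tex-math></inline-formula>, and <inline-formula><tex-math>g_u</tex-math></inline-formula> stands for the ground truth. The delay in finding true positives
is defined as
          <disp-formula><tex-math>latency_{TP} = \mathrm{median}\{k_u : u \in U, d_u = g_u = 1\}</tex-math></disp-formula>
          <disp-formula><tex-math>speed = 1 - \mathrm{median}\{penalty(k_u) : u \in U, d_u = g_u = 1\}</tex-math></disp-formula>
Based on the speed and the F1 score, the latency-weighted F1 score is calculated.</p>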
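        <p>The latency-weighted F1 score can be sketched as follows. The penalty function is not defined in this paper; the sketch assumes the logistic form penalty(k) = -1 + 2 / (1 + exp(-p (k - 1))) with p = 0.0078, as we understand it from the task overview, and both the form and the value of p should be read as assumptions. The function names and sample inputs are illustrative.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import math
from statistics import median


def penalty(k, p=0.0078):
    """Assumed delay penalty: 0 at the first writing, approaching 1 later."""
    return -1.0 + 2.0 / (1.0 + math.exp(-p * (k - 1)))


def latency_weighted_f1(decisions, truths, delays, p=0.0078):
    """F1 multiplied by speed, where speed = 1 - median penalty over TPs."""
    pairs = list(zip(decisions, truths))
    tp = sum(1 for d, g in pairs if d == 1 and g == 1)
    fp = sum(1 for d, g in pairs if d == 1 and g == 0)
    fn = sum(1 for d, g in pairs if d == 0 and g == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    tp_penalties = [penalty(k, p)
                    for d, g, k in zip(decisions, truths, delays)
                    if d == 1 and g == 1]
    speed = 1.0 - median(tp_penalties)
    return f1 * speed


fast = latency_weighted_f1([1, 1, 0], [1, 1, 0], [1, 1, 1])
slow = latency_weighted_f1([1], [1], [101])
print(fast, slow)
```
]]></preformat>
        <p>A system that finds every true positive at the first writing keeps its full F1 score, while the same decisions taken after a hundred writings are discounted.</p>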
        <p>
          <disp-formula><tex-math>F_{latency} = F \cdot speed</tex-math></disp-formula>
        </p>
        <p>The standard measures of precision and recall are calculated over the set of users as follows:
          <disp-formula><tex-math>P = \frac{|\{u \in U : d_u = g_u = 1\}|}{|\{u \in U : d_u = 1\}|}</tex-math></disp-formula>
          <disp-formula><tex-math>R = \frac{|\{u \in U : d_u = g_u = 1\}|}{|\{u \in U : g_u = 1\}|}</tex-math></disp-formula>
Yet another factor for evaluating performance is the speed of a system. A speed
of 1 indicates that the system predicted true positives at the first writing, as
opposed to a speed of 0, which indicates that the system predicted them only after a few hundred writings.</p>
        <p>The maximum precision attained by our system is 0.48, whereas the overall
maximum among all systems [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] is 0.71. The maximum recall of our system is 0.26,
as opposed to the overall maximum of all systems, which is 1. Our maximum F1 score
is 0.34, whereas the maximum over all systems is 0.71. Our ERDE5 is relatively low, with
a value of 0.08, compared with the lowest value of 0.06 among all systems. The lowest
ERDE50 for our system is 0.07, while the overall lowest is 0.03. The speed of a system is
1 if it detects a true positive at the first writing of a user; our system's speed is 1.</p>
        <p>Along with the decision, a score, which is an estimate of the level of risk, was also
calculated for each user. The evaluation algorithm assigns ranks to users based
on decreasing level of risk. The ranks are recalculated after each set of writings.
The rankings are evaluated with the P@10 and NDCG metrics. The relatively long
duration between the submissions of our runs (6 days, 22 hours) can be attributed to the offline
processes used by our system.</p>
        <p>From the released evaluation results, it can be inferred that our models
performed extremely well with respect to early prediction (speed), as the true
positives were correctly classified within the first few sets of user writings. Our
<inline-formula><tex-math>F_{latency}</tex-math></inline-formula>, however, was not on par with a few of the
best-performing systems, such as CLAC, which achieved a weighted F1 score
of 0.69. Model 1 (2-layer BLSTM with scaled Luong attention) and
Model 4 (4-layer BLSTM with normed Bahdanau attention) showed
the best performance, which can be explained by the concepts behind
these attention mechanisms. As mentioned in [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], the Luong mechanism simply
uses the hidden states at the top LSTM layers in both the encoder and decoder,
which explains why scaled Luong attention worked better with a smaller
number of layers (2 layers). The reason why Bahdanau attention worked better with a
larger number of layers (4 layers) is that a hidden state in
Bahdanau attention goes through a deep-output and a max-out layer before making
predictions [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].
        </p>
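        <p>To make the difference between the two attention mechanisms concrete, the scoring functions can be sketched in plain Python. This is a simplified illustration and not the TensorFlow implementation used in our runs: it assumes the multiplicative form g * q . (W k) for scaled Luong attention and the additive form g * v/||v|| . tanh(W_q q + W_k k + b) for normed Bahdanau attention, where q is the decoder state and k an encoder state. All function names and the toy vectors are assumptions.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(matrix, vector):
    return [dot(row, vector) for row in matrix]


def softmax(scores):
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_luong_weights(query, keys, w, g=1.0):
    """Multiplicative scores g * query . (W key), normalised with softmax."""
    return softmax([g * dot(query, matvec(w, key)) for key in keys])


def normed_bahdanau_weights(query, keys, w_query, w_key, v, g=1.0, bias=None):
    """Additive scores with a weight-normalised vector v."""
    norm = math.sqrt(dot(v, v))
    v_hat = [g * x / norm for x in v]
    bias = bias or [0.0] * len(v)
    projected_query = matvec(w_query, query)
    scores = []
    for key in keys:
        projected_key = matvec(w_key, key)
        hidden = [math.tanh(q + k + b)
                  for q, k, b in zip(projected_query, projected_key, bias)]
        scores.append(dot(v_hat, hidden))
    return softmax(scores)


identity = [[1.0, 0.0], [0.0, 1.0]]
query = [1.0, 0.0]                 # toy decoder state
keys = [[1.0, 0.0], [0.0, 1.0]]    # toy encoder states
luong = scaled_luong_weights(query, keys, identity)
bahdanau = normed_bahdanau_weights(query, keys, identity, identity, [1.0, 1.0])
print(luong, bahdanau)
```
]]></preformat>
        <p>Both functions return attention weights that sum to one over the encoder states; they differ only in how the raw alignment scores are computed.</p>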
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Conclusions and Future work</title>
      <p>In this paper, we have presented the participation of our team, SSN-NLP, in
the eRisk 2019 task on early detection of signs of anorexia. Early risk prediction
on the internet is vital to progress in the field of mental health and
safety. We treated the task as a classification problem and presented 4 variations
of a deep learning approach based on Neural Machine Translation (NMT), and one
traditional learning model, an SVM with an SGD optimizer. The future scope for
our model includes complete automation, devoid of any kind of online processing,
and research on other algorithms that could improve our model accuracy.</p>
      <ref-list>
        <ref id="ref14">
          <mixed-citation>14. Trotzek, Marcel, Sven Koitka, and Christoph M. Friedrich. "Utilizing neural networks and linguistic metadata for early detection of depression indications in text sequences." IEEE Transactions on Knowledge and Data Engineering (2018).</mixed-citation>
        </ref>
        <ref id="ref15">
          <mixed-citation>15. Verma, A. A., and Bhattacharyya, P. Literature Survey: Neural Machine Translation.</mixed-citation>
        </ref>
        <ref id="ref16">
          <mixed-citation>16. Walsh, J. M., et al. Detection, evaluation, and treatment of eating disorders: the role of the primary care physician. Journal of General Internal Medicine, vol. 15, no. 8 (2000): 577-90.</mixed-citation>
        </ref>
      </ref-list>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Bahdanau</surname>
            <given-names>D</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cho</surname>
            <given-names>K</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bengio</surname>
            <given-names>Y.</given-names>
          </string-name>
          <article-title>Neural machine translation by jointly learning to align and translate</article-title>
          .
          <source>arXiv preprint arXiv:1409.0473. 2014 Sep</source>
          <volume>1</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Coppersmith</surname>
          </string-name>
          , Glen, Mark Dredze, Craig Harman, Kristy Hollingshead, and Margaret Mitchell.
          <article-title>"CLPsych 2015 shared task: Depression and PTSD on Twitter."</article-title>
          <source>In Proceedings of the 2nd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality</source>
          , pp.
          <fpage>31</fpage>
          -
          <lpage>39</lpage>
          .
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3. Liu, Ning, Zheng Zhou, Xin Kang, and Fuji Ren.
          <article-title>"TUA1 at eRisk 2018."</article-title>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Losada</surname>
            ,
            <given-names>David E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fabio</surname>
            <given-names>Crestani</given-names>
          </string-name>
          , and
          <string-name>
            <given-names>Javier</given-names>
            <surname>Parapar</surname>
          </string-name>
          .
          <article-title>"Overview of eRisk: Early Risk Prediction on the Internet." In International Conference of the Cross-Language Evaluation Forum for European Languages</article-title>
          , pp.
          <fpage>343</fpage>
          -
          <lpage>361</lpage>
          . Springer, Cham,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Losada</surname>
            ,
            <given-names>David E.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Crestani</surname>
          </string-name>
          , Fabio and Parapar, Javier. Overview of eRisk 2019:
          <article-title>Early Risk Prediction on the Internet</article-title>
          .
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction. 10th International Conference of the CLEF Association</source>
          ,
          <string-name>
            <surname>CLEF</surname>
          </string-name>
          <year>2019</year>
          .
          <article-title>(</article-title>
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Luong</surname>
            , Minh-Thang,
            <given-names>Hieu</given-names>
          </string-name>
          <string-name>
            <surname>Pham</surname>
            , and
            <given-names>Christopher D.</given-names>
          </string-name>
          <string-name>
            <surname>Manning</surname>
          </string-name>
          .
          <article-title>"Effective approaches to attention-based neural machine translation."</article-title>
          <source>arXiv preprint arXiv:1508.04025</source>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Luong</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brevdo</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <source>Neural machine translation (seq2seq) tutorial</source>
          .
          <year>2017</year>
          . URL: https://www.tensorflow.org/tutorials/seq2seq (17.02.2018).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Robbins</surname>
            <given-names>H</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Monro S</surname>
          </string-name>
          .
          <article-title>A stochastic approximation method</article-title>
          .
          <source>The annals of mathematical statistics. 1951 Sep</source>
          <volume>1</volume>
          :
          <fpage>400</fpage>
          -
          <lpage>7</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9. Paul, Sayanta, Jandhyala Sree Kalyani, and
          <string-name>
            <given-names>Tanmay</given-names>
            <surname>Basu</surname>
          </string-name>
          .
          <article-title>"Early Detection of Signs of Anorexia and Depression Over Social Media using Effective Machine Learning Frameworks."</article-title>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Qaiser</surname>
            , Shahzad &amp; Ali,
            <given-names>Ramsha.</given-names>
          </string-name>
          (
          <year>2018</year>
          ). Text Mining:
          <article-title>Use of TF-IDF to Examine the Relevance of Words to Documents</article-title>
          .
          <source>International Journal of Computer Applications</source>
          .
          <volume>181</volume>
          . 10.5120/ijca2018917395.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Rose</surname>
            , Stuart, Dave Engel, Nick Cramer, and
            <given-names>Wendy</given-names>
          </string-name>
          <string-name>
            <surname>Cowley</surname>
          </string-name>
          .
          <article-title>"Automatic keyword extraction from individual documents." Text mining: applications and theory (</article-title>
          <year>2010</year>
          ):
          <fpage>1</fpage>
          -
          <lpage>20</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vinyals</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Le</surname>
            ,
            <given-names>Q.V.</given-names>
          </string-name>
          ,
          <year>2014</year>
          .
          <article-title>Sequence to sequence learning with neural networks</article-title>
          .
          <source>In Advances in neural information processing systems</source>
          (pp.
          <fpage>3104</fpage>
          -
          <lpage>3112</lpage>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Trotzek</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koitka</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Friedrich</surname>
            ,
            <given-names>C.M.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Word</given-names>
            <surname>Embeddings</surname>
          </string-name>
          and
          <article-title>Linguistic Metadata at the CLEF 2018 Tasks for Early Detection of Depression and Anorexia</article-title>
          .(
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>