<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Neural Network Approach to Early Risk Detection of Depression and Anorexia on Social Media Text</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yu-Tseng Wang</string-name>
          <email>ytswang@nlg.csie.ntu.edu.tw</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hen-Hsen Huang</string-name>
          <email>hhhuang@nlg.csie.ntu.edu.tw</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hsin-Hsi Chen</string-name>
          <email>hhchen@ntu.edu.tw</email>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science and Information Engineering, National Taiwan University</institution>
          ,
          <addr-line>Taipei</addr-line>
          ,
          <country country="TW">Taiwan</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>MOST Joint Research Center for AI Technology and All Vista Healthcare</institution>
          ,
          <addr-line>Taipei</addr-line>
          ,
          <country country="TW">Taiwan</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In recent years, people actively write text messages on social media platforms like Twitter and Reddit. The text shared on social media drives various applications including influenza detection, suicide detection, and mental illness detection. This work presents our approach to early risk detection of depression and anorexia on social media in CLEF eRisk 2018. For the two mental illnesses, our models combine TF-IDF information and convolutional neural networks (CNNs) to identify the articles written by potential patients. The official evaluation shows our models achieve ERDE5 of 10.81%, ERDE50 of 9.22%, and F-score of 0.37 in depression detection and ERDE5 of 13.65%, ERDE50 of 11.14%, and F-score of 0.67 in anorexia detection.</p>
      </abstract>
      <kwd-group>
        <kwd>Early Risk Detection</kwd>
        <kwd>Depression</kwd>
        <kwd>Anorexia</kwd>
        <kwd>Convolutional Neural Network</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        People share their opinions, experiences, and feelings on social
media platforms such as Twitter and Reddit. Information extracted from this text supports
various real-world applications in healthcare, communication,
entertainment, journalism, and advertising. According to data from 2010 to 2018
reported by statista.com (https://www.statista.com), the number of Facebook users increased from 431 million to
2,234 million, and the number of Twitter users grew from 30 million to 330 million. As
of April 2018, Reddit had about 33 million users. Social media thus records the life experiences
and conversation histories of a large number of users. In recent years,
a variety of research has focused on social media, including hate speech detection
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], information extraction [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], analysis on gender differences [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], nastiness detection
[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], and named entity recognition [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>In most cases, a detection task can be treated as a classification problem.
Various learning models and linguistic features have been explored for different goals.
For example, the detection of terrorist attacks must take latency into account because
it is extremely important to prevent an attack before it happens. The same holds for
the detection of illnesses. In CLEF eRisk 2018 (http://early.irlab.org/index.html), two tasks on early risk detection
of mental illnesses are conducted. The goal is to identify potential patients of depression
and anorexia as early as possible. In other words, we aim not only to accurately predict
whether a social media user suffers from depression or anorexia, but also to minimize the
amount of user information revealed before a decision is made. Early risk detection is therefore more
challenging than conventional detection tasks. In this work, we analyze the datasets and propose a neural
network-based approach to the two detection tasks. The rest of this paper is organized
as follows. Section 2 briefly describes the CLEF eRisk 2018 task and the dataset. We
present our model in Section 3. In Section 4, experimental results are discussed. Section
5 concludes this work.</p>
    </sec>
    <sec id="sec-2">
      <title>CLEF eRisk 2018 Task</title>
      <sec id="sec-2-1">
        <title>Task Description</title>
        <p>
          Early risk prediction on the Internet (eRisk), which started in 2017, is a task held at
the Conference and Labs of the Evaluation Forum (CLEF). It is motivated by the idea
that automatic detection models can identify risk as early as possible
and thus help people avoid becoming victims of mental illnesses. In eRisk 2017 [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], a pilot
task on the detection of depression was conducted, and metrics including precision
(P), recall (R), F1-score, and Early Risk Detection Error (ERDE) [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] were used for
evaluation.
        </p>
        <p>This year, eRisk 2018 extends eRisk 2017 by introducing a second mental illness,
anorexia, to detect. In addition, the dataset for depression detection is extended.
Both tasks are organized into a training stage and a test stage. The training data is the writing
history of users who are labeled as either risk or safe. The test data is composed of ten
chunks released sequentially. For each chunk of a user’s data, the model has to choose
among three options: (1) emit no decision on this
user at this time; (2) emit a risk decision on this user; or (3) emit a
non-risk decision on this user. In Chunk 10, the last chunk, every undecided user must be labeled
as either risk or non-risk.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Datasets</title>
        <p>
          In eRisk 2018[
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], the datasets on depression and anorexia are released. Table 1 shows
the statistics of the training sets. The posts and comments on Reddit, submitted by
normal and risk users, are collected. In both datasets, we observe that the average
number of submissions per user is higher in the normal group than in the risk group. On the other
hand, the average length per submission is lower in the normal group than in the
risk group. Table 2 shows the statistics of the test sets, where
similar phenomena are observed.
        </p>
        <sec id="sec-2-2-1">
          <title>2 http://early.irlab.org/index.html</title>
          <p>The words with the highest TF-IDF scores in the risk and the normal groups of both
datasets are listed in Table 3. The top words of the anorexia patients, marked in bold,
are cues to the illness.</p>
          <p>For each user, the posts/comments are divided equally into 10 chunks in
chronological order. Each post/comment, or WRITING, includes four fields: TITLE,
DATE, INFO, and TEXT. TITLE is the post title; for a comment, TITLE is always
empty. INFO is the source of the message. TEXT is the body of the post/comment.
The number of posts/comments varies from user to user, and the time span
covered by the writings also differs across users. Since it is difficult to derive a standardized time
feature, our models take only the information from TITLE and TEXT into account.</p>
          <p>F-score and ERDE are the major metrics used in CLEF eRisk. Equation 1 shows the
formula of F-score, where β = 1. Because F-score is unaware of time, ERDE complements it
by rewarding early alerts. Equation 2 shows the latency cost function lc<sub>o</sub>(k), where k
is the number of textual items seen before giving the answer, also called the delay, and o is the
parameter that controls the cost rate. The relationship between k and o is shown in Fig.
1. For a true negative prediction, the ERDE is zero; for a false negative
prediction, the ERDE is one; for a true positive prediction, the ERDE is the latency cost
weighted by the true positive cost, as given by Equation
3; for a false positive prediction, the ERDE is a constant cost. In eRisk 2018, the averaged ERDE5 and the averaged ERDE50 are employed to
evaluate the performance.</p>
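          <p>
            <disp-formula>
              <label>(1)</label>
              <tex-math>F_{\beta} = \frac{(1 + \beta^{2}) \times \text{true positive}}{(1 + \beta^{2}) \times \text{true positive} + \beta^{2} \times \text{false negative} + \text{false positive}}</tex-math>
            </disp-formula>
          </p>
          <p>
            <disp-formula>
              <label>(2)</label>
              <tex-math>lc_{o}(k) = 1 - \frac{1}{1 + e^{k - o}}</tex-math>
            </disp-formula>
          </p>
          <p>
            <disp-formula>
              <label>(3)</label>
              <tex-math>\mathrm{ERDE}_{\text{true positive}} = lc_{o}(k) \times c_{\text{true positive}}</tex-math>
            </disp-formula>
          </p>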
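          <p>As a rough illustration of Equations 1 to 3, the metrics can be sketched in Python as follows. This is a minimal sketch, not the official evaluation script: the function names are hypothetical, the true positive and false negative costs default to one, and the false positive cost c_fp is an assumed parameter whose value must be supplied by the caller.</p>
          <preformat><![CDATA[
```python
import math


def f_beta(tp, fp, fn, beta=1.0):
    """F-score of Equation 1, computed from raw counts."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def latency_cost(k, o):
    """Latency cost lc_o(k) of Equation 2.

    k is the number of textual items seen before the decision and
    o is the point around which the cost rises from 0 towards 1.
    """
    if k - o > 700:  # avoid overflow in math.exp; the cost is 1 here anyway
        return 1.0
    return 1.0 - 1.0 / (1.0 + math.exp(k - o))


def erde(decision, truth, k, o, c_fp, c_fn=1.0, c_tp=1.0):
    """ERDE of a single user.

    decision and truth are booleans (True means risk).
    c_fp is an assumed, caller-supplied constant for false positives.
    """
    if decision and truth:          # true positive: cost grows with delay
        return latency_cost(k, o) * c_tp
    if decision and not truth:      # false positive: constant cost
        return c_fp
    if truth:                       # false negative: constant cost
        return c_fn
    return 0.0                      # true negative: no cost


if __name__ == "__main__":
    # Hypothetical example: a correct alert after 5 items, with c_fp = 0.1.
    for o in (5, 50):
        print(o, erde(True, True, k=5, o=o, c_fp=0.1))
```
]]></preformat>
          <p>With o = 5, a correct alert issued after 5 items already costs half of the true positive cost, whereas with o = 50 the same alert costs almost nothing. This is why ERDE5 penalizes late decisions far more strongly than ERDE50.</p>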
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Proposed Method</title>
      <p>
        We formulate the detection task as a sentence classification problem. A classifier
based on a convolutional neural network (CNN) [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] is proposed and trained on the
depression and the anorexia datasets. Scikit-learn [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] is used to compute the
TF-IDF score of each word in both datasets.
      </p>
      <sec id="sec-3-1">
        <title>Training Model</title>
        <p>The dataflow of the training procedure is shown in Fig. 2. We first compute the
TF-IDF score of each word and remove the words with low TF-IDF scores from each sentence.
The sentence classifier is then trained on the refined sentences. The details are
as follows.</p>
        <p>Keyword Selection. As shown in Fig. 2 (a), we select the 300 words with the highest
TF-IDF scores, calculated on the risk documents. The TF-IDF Vectorizer toolkit is used to index
the words, converting each one to a unique integer in the range between 1 and 300.</p>
        <p>Sentence Representation. The contents of TITLE and TEXT in a WRITING are
concatenated into a sequence of words. We discard all words other than the top 300
keywords. The remaining sequence is encoded as a vector by the
CNN-based sentence encoder, which is learned during training. This step converts each instance into a vector;
an example of sentence encoding is shown in Fig. 2 (b).</p>
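        <p>The keyword selection and filtering steps can be sketched as follows. This is a minimal sketch in plain Python and not the scikit-learn pipeline used in our system. Whitespace tokenization, the smoothed IDF formula, and ranking words by their TF-IDF summed over the risk documents are all assumptions made for illustration, and the function names are hypothetical.</p>
        <preformat><![CDATA[
```python
import math
from collections import Counter


def top_keywords(risk_docs, n_keywords=300):
    """Map the n_keywords words with the highest TF-IDF to ids 1..n_keywords.

    Assumption: a word's score is its TF-IDF summed over all risk documents.
    """
    tokenized = [doc.lower().split() for doc in risk_docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each word occurs.
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    scores = Counter()
    for tokens in tokenized:
        for word, count in Counter(tokens).items():
            tf = count / len(tokens)
            idf = math.log((1 + n_docs) / (1 + doc_freq[word])) + 1.0
            scores[word] += tf * idf
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))[:n_keywords]
    return {word: index for index, word in enumerate(ranked, start=1)}


def encode_writing(title, text, keyword_ids):
    """Concatenate TITLE and TEXT, then keep only keyword ids, in order."""
    tokens = (title + " " + text).lower().split()
    return [keyword_ids[token] for token in tokens if token in keyword_ids]


if __name__ == "__main__":
    # Toy documents standing in for the writings of risk users.
    docs = [
        "i feel so empty and tired",
        "tired of feeling empty every day",
        "the weather is nice",
    ]
    ids = top_keywords(docs, n_keywords=2)
    print(ids)
    print(encode_writing("So tired", "I feel empty and tired today", ids))
```
]]></preformat>
        <p>The resulting integer sequence is what a CNN-based sentence encoder would take as input, typically after padding to a fixed length.</p>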
        <p>Model Training. We regard the posts/comments written by risk users as positive
instances, and those written by normal users as negative instances. We then train the
CNN model (https://github.com/Shawn1993/cnn-text-classification-pytorch) to identify potential patients. The model architecture is shown in Fig.
2 (c).</p>
        <sec id="sec-3-1-1">
          <title>3 https://github.com/Shawn1993/cnn-text-classification-pytorch</title>
          <p>
            Based on the binary classification results, we design a strategy to identify high-risk
users as early as possible. First, we apply the CNN classifier to every
post/comment in a chunk of a user. The decision rules are summarized in Table 4. We emit a risk decision on this user if more
than θ1 of the posts/comments are labeled as positive. On the other hand, we emit a
non-risk decision on this user if less than θ2 of the posts/comments are labeled as negative. Otherwise,
we emit no decision on this user, except in the last chunk. In the last chunk, we emit a risk decision on
the user if more than θ3 of the posts/comments are labeled as positive; otherwise, a
non-risk decision is emitted. The thresholds θ1, θ2, and θ3 are real values between 0 and 1. We tune
them on the development set.
          </p>
          <p>
            After the last chunk is submitted, the scoreboard [
            <xref ref-type="bibr" rid="ref8">8</xref>
            ] reports performance in terms of ERDE5,
ERDE50, precision, recall, and F-score. We compare our performance (denoted as
TBS) with those of the leading teams in the depression task and the anorexia task in Table
5 and Table 6, respectively. In terms of ERDE5, the performance of our model in
depression detection is better than that in anorexia detection.
          </p>
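          <p>The threshold-based decision strategy described in this section can be sketched as follows. This is a minimal sketch under stated assumptions and not the exact implementation: the function names and threshold values are hypothetical, all three thresholds are compared against the fraction of items labeled positive in the current chunk (so the non-risk condition is read as a low positive fraction), and θ2 is assumed to be no larger than θ1.</p>
          <preformat><![CDATA[
```python
def decide(labels, chunk_index, theta1, theta2, theta3, last_chunk=10):
    """Return "risk", "non-risk", or None (no decision yet) for one user.

    labels holds the 0/1 classifier outputs for the user's posts/comments
    in the current chunk; chunk_index starts at 1.
    """
    positive_ratio = sum(labels) / len(labels) if labels else 0.0
    if chunk_index >= last_chunk:
        # A decision is mandatory in the last chunk.
        return "risk" if positive_ratio > theta3 else "non-risk"
    if positive_ratio > theta1:
        return "risk"
    if positive_ratio < theta2:  # assumption: compared on the positive ratio
        return "non-risk"
    return None


def first_decision(chunks, theta1, theta2, theta3):
    """Process chunks in order and stop at the first emitted decision."""
    for index, labels in enumerate(chunks, start=1):
        decision = decide(labels, index, theta1, theta2, theta3,
                          last_chunk=len(chunks))
        if decision is not None:
            return index, decision
    return None  # only reached when there are no chunks at all


if __name__ == "__main__":
    # Hypothetical per-chunk classifier outputs for one user.
    chunks = [[1, 0, 0], [1, 0, 0], [1, 1, 1]]
    print(first_decision(chunks, theta1=0.6, theta2=0.2, theta3=0.5))
```
]]></preformat>
          <p>Raising θ1 or lowering θ2 widens the band in which no decision is emitted. This tends to improve precision at the price of later alerts, and therefore a higher ERDE.</p>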
          <p>
            Different models lead in terms of ERDE5, ERDE50, F1, P, and R, which indicates
a tradeoff between these goals. A model with a higher F-score usually suffers
from a poor ERDE5. In addition, the performance of the same model is inconsistent between the depression
and the anorexia tasks. This result reveals the difference between these
two mental illnesses. Overall, early risk detection is challenging, especially when
multiple objectives must be optimized.
          </p>
          <p>
            This work presents our model, which combines TF-IDF and CNN classification
for early risk detection of depression and anorexia. In CLEF eRisk 2018, our model
achieves decent ERDE5 in both tasks. To address the challenges discussed in
this paper, we will explore advanced methodologies for early risk detection. In future
work, we will improve the model with knowledge extracted from in-domain
resources such as the Diagnostic and Statistical Manual of Mental Disorders (DSM-5) [
            <xref ref-type="bibr" rid="ref11">11</xref>
            ].
          </p>
        </sec>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Malmasi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zampieri</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <source>Detecting Hate Speech in Social Media: Proceedings of Recent Advances in Natural Language Processing</source>
          , pages
          <fpage>467</fpage>
          -
          <lpage>472</lpage>
          , Varna, Bulgaria, (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Habib</surname>
            ,
            <given-names>M. B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Keulen</surname>
            ,
            <given-names>M. V.</given-names>
          </string-name>
          :
          <article-title>Information Extraction for Social Media:</article-title>
          <source>Proceedings of Third Workshop on Semantic Web and Information Extraction</source>
          , pages
          <fpage>9</fpage>
          -
          <lpage>16</lpage>
          , Dublin, Ireland, (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Garimella</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mihalcea</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Zooming in on Gender Differences in Social Media:</article-title>
          <source>Proceedings of the Workshop on Computational Modeling of People's Opinions, Personality, and Emotions in Social Media</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          , Osaka, Japan, (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Samghabadi</surname>
            ,
            <given-names>N. S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maharjan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sprague</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Diaz-Sprague</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Solorio</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <source>Detecting Nastiness in Social Media: Proceedings of the First Workshop on Abusive Language Online</source>
          , pages
          <fpage>63</fpage>
          -
          <lpage>72</lpage>
          , Vancouver, Canada, (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Zirikly</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Diab</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Named Entity Recognition for Arabic Social Media:</article-title>
          <source>Proceedings of NAACL-HLT 2015</source>
          , pages
          <fpage>176</fpage>
          -
          <lpage>185</lpage>
          , Denver, Colorado, (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Losada</surname>
            ,
            <given-names>D.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Crestani</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parapar</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>eRISK 2017: CLEF Lab on Early Risk Prediction on the Internet: Experimental Foundations</article-title>
          .
          <source>Proceedings of the Conference and Labs of the Evaluation Forum, CLEF 2017</source>
          , pages
          <fpage>346</fpage>
          -
          <lpage>360</lpage>
          , Dublin, Ireland (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Losada</surname>
            ,
            <given-names>D.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Crestani</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>A Test Collection for Research on Depression and Language Use</article-title>
          .
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction: 7th International Conference of the CLEF Association</source>
          , pages
          <fpage>28</fpage>
          -
          <lpage>39</lpage>
          , CLEF 2016, Évora, Portugal, (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Losada</surname>
            ,
            <given-names>D.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Crestani</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parapar</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Overview of eRisk - Early Risk Prediction on the Internet</article-title>
          .
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction: Proceedings of the Ninth International Conference of the CLEF Association (CLEF 2018)</source>
          , Avignon, France, (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <source>Convolutional Neural Networks for Sentence Classification: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing</source>
          , pages
          <fpage>1746</fpage>
          -
          <lpage>1751</lpage>
          , EMNLP 2014, Doha, Qatar, (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Buitinck</surname>
            <given-names>L</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Louppe</surname>
            <given-names>G</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blondel</surname>
            <given-names>M</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pedregosa</surname>
            <given-names>F</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mueller</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grisel</surname>
            <given-names>O</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Niculae</surname>
            <given-names>V</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prettenhofer</surname>
            <given-names>P</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gramfort</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grobler</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Layton</surname>
            <given-names>R</given-names>
          </string-name>
          ,
          <string-name>
            <surname>VanderPlas</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joly</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Holt</surname>
            <given-names>B</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Varoquaux</surname>
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>API design for machine learning software: experiences from the scikit-learn project: ECML PKDD workshop: languages for data mining and machine learning</article-title>
          , pages
          <fpage>108</fpage>
          -
          <lpage>122</lpage>
          , (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11. American Psychiatric Association:
          <article-title>Diagnostic and statistical manual of mental disorders (5th ed)</article-title>
          , Arlington, VA: American Psychiatric Association, (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>