<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Early Risk Prediction by means of Deep Learning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Pablo Raez Garcia Retamero</string-name>
          <email>praez@pa.uc3m.es</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Isabel Segura Bedmar</string-name>
          <email>isegura@inf.uc3m.es</email>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Universidad Carlos III de Madrid</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>This work presents our five approaches to early risk detection of anorexia on social media in CLEF eRisk 2019. Our models make use of different kinds of deep neural networks to classify users who are at risk. We show the effectiveness of our models using the validation and test datasets. The best model obtains an F1 score of 0.57 on the objective class in validation and 0.20 on the test set.</p>
      </abstract>
      <kwd-group>
        <kwd>Deep Learning</kwd>
        <kwd>Risk Prediction</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Anorexia is an eating disorder whose symptoms include fear of gaining
weight and a distorted, delusional perception of one's own body. The disease
is often associated with severe psychological alterations that cause changes in
emotional behaviour. These psychological alterations are discernible in the
behaviour of those affected and are usually reflected in social media posts and
comments. Several anorexia detection methods currently exist [
        <xref ref-type="bibr" rid="ref11 ref13 ref14 ref18 ref19 ref2">11, 2, 19, 14, 18,
13</xref>
        ], which are mainly based on behavioural analysis. Anorexia symptoms are
usually very diverse and may be hidden by the subjects under study, which makes
it harder to reach a decision and delays the diagnosis.
      </p>
      <p>
        Much research has been carried out on automatically detecting these symptoms
early in social media. Despite being a well-known problem, anorexia is still
hard to diagnose because of its wide variety of symptoms and the long
periods they need to show up, as in the case of amenorrhea [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Because
diagnosing patients is an arduous task, they often receive treatment
only in the later stages of anorexia. This, in turn, makes the therapy longer and more
expensive than if the problem had been promptly diagnosed. Automatic detection
of anorexia in its early stages, with the highest possible accuracy, would mean
great time savings as well as considerable health improvements for patients, who
would be treated sooner.
      </p>
      <p>Five different approaches were developed to address this problem.
They are explained in further detail in Section 4. The results
obtained on both the validation and the test datasets are included.</p>
      <p>The paper is structured as follows. Section 2 reviews the state of the art of
Natural Language Processing techniques applied to the risk prediction domain.
Section 3 presents the dataset and tools used. Section 4 describes
the methods and the proposed neural architectures.
Section 5 shows the results obtained. Finally, Section 6 gathers the conclusions
and future work.</p>
    </sec>
    <sec id="sec-2">
      <title>State of the Art</title>
      <p>
        This section gathers the main works related to early risk prediction on the
internet. The use of machine learning techniques to detect mental illnesses such
as anorexia is quite recent. Even so, there is a considerable body of literature on the
matter [
        <xref ref-type="bibr" rid="ref11 ref13 ref14 ref18 ref19 ref2">11, 2, 19, 14, 18, 13</xref>
        ].
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], Deep Learning techniques have been applied to the problem of anorexia
and depression detection for the CLEF eRisk 2018 tasks [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. The authors
approach the problem by turning it into a sentence classification one, where the
sentences are classified as positive if they have been written by an ill user and
negative otherwise. They make use of the TF-IDF algorithm to get the most
representative words for each of the classes. Then, the sentences are encoded
by means of a Convolutional Neural Network (CNN). They obtained
F1 scores of 0.64 and 0.85 as well as ERDE5 values of 8.78 and 11.40 in the depression
and anorexia tasks, respectively.
      </p>
      <p>Our first approach is quite similar to the one previously described, but we
also make use of word or char embeddings in every model, as well as a fully
connected layer after the CNN layers, both of which have been shown to improve the
results of the classifier.</p>
      <p>
        In the approach of [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] to the CLEF eRisk 2018 tasks, different machine
learning techniques are presented, such as Linear Regression [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], Support Vector
Machines [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], Ada Boost [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], Random Forests [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], and Recurrent Neural Networks
(RNN) [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. Texts are represented using different features such as Bag Of Words
(BOW) and the Unified Medical Language System (UMLS). Experiments show that
the best results are obtained with BOW features and the Ada Boost and
Random Forest classifiers. They obtained F1 scores of 0.58 and 0.67 as well
as ERDE5 values of 9.81 and 12.17 in the depression and anorexia tasks, respectively.
      </p>
      <p>
        In the approach of [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] to the CLEF eRisk 2017 task [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], several combinations
of user-level linguistic metadata, BoW [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ], neural word embeddings [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], and
CNN [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] are used, obtaining an F1 value of 0.48 and an ERDE5 of 12.73 on the
depression task.
      </p>
      <p>
        There have been some interesting approaches that are less heavily focused on
machine and deep learning techniques, such as the one described in [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], which
focuses on Author Profiling (AP). AP consists in analysing texts to predict
general or demographic attributes of their authors, such as gender, age, personality,
native language, and political orientation, among others.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Materials</title>
      <p>This section gathers the materials used.</p>
      <sec id="sec-3-1">
        <title>Dataset</title>
        <p>
          The dataset for this task has the same format as the one described in [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. The
collection provided for training and validation is composed of 152 subjects, of
whom 20 are anorexic and 132 are not. The texts from these subjects comprise
a total of 253,341 posts and comments, of which 24,874 come from ill subjects
and 228,467 from healthy people. As can be seen, the training set is very
unbalanced, which in turn makes the whole task harder to perform.
        </p>
        <p>For every subject, we get all their writings with several information
fields: the title of the post (sometimes blank), the date and
time, the platform where the post was made (Reddit or another one),
and the posted text itself.</p>
        <p>The test dataset is hosted on a server that iteratively yields user writings
to the participating teams. These iterations advance through time to deliver the writings
of each user in a more realistic scenario. The server only returns the next
writings once a team has sent all of its runs for the current timestep. This dataset has
2000 timesteps for over 800 users. Each writing comes with the fields "id", "nick", "redditor", "title",
"content", and "date". The "nick" is used as the subject id, and "title",
"content" and "date" are used as their namesakes in the training dataset.
"Redditor" and "id" do not correspond to anything in the training dataset. Finally,
"number" indicates the iteration on the test dataset, which is used for validation
purposes.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Tools</title>
        <p>Google Colab was used to run the experiments. It provides a machine with
an Intel(R) Xeon(R) CPU @ 2.20GHz and 12 GB of
RAM. The most interesting part for us is the GPU it provides,
a Tesla K80 with 12 GB of memory as well.</p>
        <p>The experiments were developed in Python, using the Keras and
TensorFlow libraries for the DL models. Other libraries, such as Pandas and
NumPy, were used to process the data.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Method</title>
      <p>This section explains the method followed to develop the approaches,
including all the pre- and post-processing of the data.</p>
      <p>Fig. 1: Histograms of text length (x-axis: length; y-axis: occurrences). (a) Post length in words, used in models A and C. (b) Total length of each subject's posts in words, used in model B. (c) Post length in characters, used in models D and E.</p>
      <p>Different types of neural networks, such as RNNs and CNNs, have been used to
generate deep learning models, which are further explained below.</p>
      <p>As a preprocessing step, all texts are cleaned by removing stop words,
numbers, punctuation and words with fewer than three characters. Then a TF-IDF
algorithm is used to reduce the number of words while retaining the most
representative ones.</p>
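      <p>The cleaning and vocabulary reduction can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the experiments: the stop-word list is a tiny hypothetical one, and ranking words by their highest TF-IDF score in any document is only one possible way of picking the most representative words. The function names are illustrative.</p>
      <preformat>
```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; a real run would use a complete one.
STOP_WORDS = {"the", "and", "that", "with", "this", "for", "was", "are"}


def clean(text):
    """Lowercase the text and drop numbers, punctuation, stop words and short words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if len(t) >= 3 and t not in STOP_WORDS]


def tfidf_vocabulary(docs, top_k):
    """Keep the top_k words ranked by their highest TF-IDF score in any document.

    Assumption: this ranking rule is illustrative; the paper does not specify it.
    """
    n_docs = len(docs)
    doc_freq = Counter(word for doc in docs for word in set(doc))
    best = {}
    for doc in docs:
        for word, count in Counter(doc).items():
            tf = count / len(doc)
            idf = math.log(n_docs / doc_freq[word]) + 1.0
            best[word] = max(best.get(word, 0.0), tf * idf)
    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(best, key=lambda w: (-best[w], w))
    return set(ranked[:top_k])


def reduce_text(tokens, vocabulary):
    """Drop every token that is not in the selected vocabulary."""
    return [t for t in tokens if t in vocabulary]
```
      </preformat>
      <p>In this sketch, clean is applied to every post first, tfidf_vocabulary is computed once over the cleaned training posts, and reduce_text then filters each post before tokenization.</p>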
      <p>For models A, B and C, the posts are tokenized and then cropped or padded
to a fixed length, which is 50 words per post in models A and C. Padding and
cropping are needed because the input to the neural networks must have a fixed
shape. Longer texts are cropped, rather than padding every text to the maximum
length, because excessive padding adds noise to the networks: most texts have
fewer than 50 words, as shown in Figure 1a. The length selected for the B model is
350. Unlike in the previous models, this length was chosen
because the network becomes prohibitively large beyond it. The ideal value would have
been 1000, as can be seen in Figure 1b.</p>
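      <p>A minimal sketch of the cropping and padding step is shown below. It assumes that id 0 is reserved for padding, that padding is appended at the end, and that the first tokens are the ones kept when cropping; none of these choices is stated above, and the function names are illustrative.</p>
      <preformat>
```python
PAD_ID = 0  # assumed id reserved for padding


def build_index(token_lists):
    """Assign ids from 1 upwards in order of first appearance."""
    index = {}
    for tokens in token_lists:
        for tok in tokens:
            index.setdefault(tok, len(index) + 1)
    return index


def pad_or_crop(tokens, index, length):
    """Map known tokens to ids, keep the first `length` ids and pad the rest."""
    ids = [index[t] for t in tokens if t in index][:length]
    return ids + [PAD_ID] * (length - len(ids))
```
      </preformat>
      <p>With length set to 50 this gives the input shape described for models A and C, and with 350 the one described for model B.</p>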
      <p>Fig. 2: Threshold selection (x-axis: threshold value; curves: objective class F1, macro avg, weighted avg). (a) F1 scores obtained by the A model depending on the threshold value. (b) F1 scores obtained by the C model depending on the threshold value.</p>
      <p>
        For models D and E, instead of tokenizing the texts and fixing them to a
certain length, another preprocessing step is added that splits the words
into characters. This operation is needed in order to make use of char
embeddings, which have proven useful in NLP tasks [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. Char embeddings
have advantages over word embeddings: they do not face problems when
processing unseen words, as every word is formed of characters. Another
advantage is their robustness against misspelled words. Furthermore,
char embeddings are usually low-dimensional, which in turn improves the
speed of the models. Each text is then fixed to a length of 400 characters. The
length was picked by hand based on Figure 1c, in the same
way as for models A, B and C. Finally, the characters are fed into the different
neural networks.
      </p>
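      <p>The robustness of character-level input to unseen or misspelled words can be illustrated with a small sketch. The alphabet below (lowercase letters and space), the padding id 0 and the extra id for unknown characters are assumptions made for the example only.</p>
      <preformat>
```python
import string

# Assumed character vocabulary: lowercase letters and space. Id 0 is padding.
ALPHABET = string.ascii_lowercase + " "
CHAR_INDEX = {ch: i + 1 for i, ch in enumerate(ALPHABET)}
UNKNOWN_ID = len(CHAR_INDEX) + 1  # assumed id for any other character


def encode_chars(text, length=400):
    """Turn a text into a fixed-length list of character ids."""
    ids = [CHAR_INDEX.get(ch, UNKNOWN_ID) for ch in text.lower()][:length]
    return ids + [0] * (length - len(ids))
```
      </preformat>
      <p>A misspelling such as "anorexya" is encoded entirely with known character ids, whereas a word-level vocabulary built on the training data would have no entry for it.</p>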
      <p>Finally, after the processing performed by the different models, the
output of the networks is compared with a threshold to determine whether it is a risk
situation or not. This threshold was obtained empirically for each model by
running several tests in which the threshold value iterated over the
range from 0.1 to 0.9. The threshold with the highest F1 on the positive class
was then selected. The thresholds are shown in Table 1. An illustrative example of
this process can be seen in Figure 2, which shows the evaluation of the A and C
thresholds.</p>
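      <p>The threshold search can be sketched as a simple sweep. The step of 0.1 between candidate thresholds and the use of "greater than or equal" for the positive decision are assumptions of this sketch; only the range from 0.1 to 0.9 and the selection criterion come from the description above.</p>
      <preformat>
```python
def f1_positive(y_true, y_pred):
    """F1 of the positive class (label 1) from true and predicted labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


def select_threshold(y_true, scores, candidates=None):
    """Return the candidate threshold with the highest positive-class F1."""
    if candidates is None:
        # Assumed grid: 0.1, 0.2, ..., 0.9
        candidates = [round(0.1 * k, 1) for k in range(1, 10)]
    best_threshold, best_f1 = None, -1.0
    for threshold in candidates:
        y_pred = [1 if s >= threshold else 0 for s in scores]
        f1 = f1_positive(y_true, y_pred)
        if f1 > best_f1:  # ties keep the lowest threshold
            best_threshold, best_f1 = threshold, f1
    return best_threshold, best_f1
```
      </preformat>
      <p>The sweep is run on validation scores only, so the selected threshold is fixed before the model is applied to the test data.</p>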
      <p>Several experiments were performed to find the best hyper-parameter
configuration for each of the models, which can be found in Table 3. The
tuned hyper-parameters of each model are described in detail in Table 2.</p>
      <p>Some of the hyper-parameters checked concern the models themselves,
such as load emb, emb size, trainable emb, cnn size, rnn size, dropout, dnn size,
and batch size. Others are specific to the type of network used: for the
CNN, cnn filter determines the size of the kernel used; for the
RNN, cell type determines the type of cell used (GRU
or LSTM), bidirectional indicates whether the layers are bidirectional, and
attention, as its name suggests, determines whether an attention mechanism
is used.</p>
      <p>The A model is a simple first approach that classifies the different records
independently. The posts are treated as if they were independent, and each is labelled
0 or 1 depending on whether the user who wrote it belongs to the control class or
the positive class.</p>
      <p>This model takes the different texts as input, which pass through a Word
Embedding layer whose output is fed to a one-dimensional CNN. Finally, the
output of the CNN is fed into a fully connected layer just before the
output layer (see Fig. 4a).</p>
      <p>The B model takes a similar approach to the previous one, but instead of
treating the texts as independent pieces of information, all of the texts of the same
user are processed together. Thus, the input to the network is all the tokenized
text a user has posted, and the target value is whether the subject is at risk of
suffering from anorexia or not.</p>
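      <p>The difference between the post-level view of the A model and the user-level view of the B model can be sketched as two ways of building training examples from the same data. The dictionary layout and function names are hypothetical and serve only as an illustration.</p>
      <preformat>
```python
def post_level_examples(posts_by_user, user_labels):
    """Model A view: every post is one example carrying its author's label."""
    texts, labels = [], []
    for user, posts in posts_by_user.items():
        for post in posts:
            texts.append(post)
            labels.append(user_labels[user])
    return texts, labels


def user_level_examples(posts_by_user, user_labels):
    """Model B view: all posts of a user are joined into a single example."""
    texts, labels = [], []
    for user, posts in posts_by_user.items():
        texts.append(" ".join(posts))
        labels.append(user_labels[user])
    return texts, labels
```
      </preformat>
      <p>The post-level view yields many short examples per user, while the user-level view yields one long example per user, which is why the B model needs a much larger input length.</p>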
      <p>This model takes the text input which, as in the previous
model, passes through a Word Embedding layer, whose output is then fed
to an RNN layer. The result is then fed to a fully connected layer placed
just before the output layer (see Fig. 4b).</p>
      <p>Fig. 3: Writings per user in the train and test sets (x-axis: number of writings; y-axis: occurrences).</p>
      <sec id="sec-4-1">
        <title>C model</title>
        <p>
          This model is a more sophisticated one in the sense that it uses previous A
models in order to generate what we call "writing embeddings" by means of
transfer learning [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. They are then fed to an RNN layer, which allows us to
process a varying number of texts. This is crucial because the number of
texts per user in the dataset varies widely, as can be seen in Fig. 3.
        </p>
        <p>It is composed of the whole best A model without its last two layers. The
outputs of that truncated model are used as "writing embeddings", which represent each text
as a 32-dimensional vector. The "writing embeddings" are then fed into the
RNN layers, whose output is passed through a fully connected layer before
the output layer (see Fig. 5a).</p>
      </sec>
      <sec id="sec-4-2">
        <title>D model</title>
        <p>This model follows the same idea as the A model, which is to classify the different texts
independently. It differs from the A model in that it does not
use word embeddings, but char embeddings instead.</p>
        <p>This model takes as input the characters of every post, which pass
through a Char Embedding layer. This layer differs from the word embedding
one mainly in the dimension of its vectors, which is much smaller, and in its
vocabulary, which is also much smaller. The output of this layer is then fed
into a one-dimensional CNN in the same way as in the A model. Finally, the output
of the CNN is fed into a fully connected layer, and then into the output layer (see
Fig. 4c).</p>
        <p>Fig. 4: (a) A model. (b) B model. (c) D model.</p>
        <p>The E model follows the same idea as the C model, which is to use a pre-trained model to
generate "writing embeddings". It differs from the C model in
that, instead of a model pre-trained on word embeddings, it uses one
pre-trained on char embeddings. This model also takes advantage of the RNN
layers, which allow it to process users regardless of the number of texts each of them
has, which, as mentioned above, varies widely (see Fig. 3).</p>
        <p>This model makes use of the best D model weights, but without the last two
layers. The outputs of the truncated D model, which
take the form of a 64-dimensional vector, are fed into the RNN layers.
Finally, as in the previous models, the output of the RNN layers is fed into
a fully connected layer, and then into the output layer (see Fig. 5b).</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Results</title>
      <p>This section shows the results obtained by the five different approaches.
We divide it into the validation results and the test results. The test
evaluation was carried out by means of the test server presented in Section 3. We also
include the best results obtained in the challenge.</p>
      <p>
        The common measure of performance in terms of precision and recall is the
F1-score [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. This metric is the harmonic mean of the precision and recall. As we
are mostly concerned about the performance over the positive class, only the F1
of that class is shown in the validation results. We also add the Macro F1 due to
it being a good measure of the performance with unbalanced classes, where the
most important is the least represented one. Finally we add the weighted F1 as
a comparison.
      </p>
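      <p>For reference, the three figures reported (positive-class F1, macro F1 and weighted F1) can be computed as in the following sketch. It is a plain-Python illustration with hypothetical function names; an established library implementation would normally be preferred in practice.</p>
      <preformat>
```python
def f1_for_class(y_true, y_pred, label):
    """F1 of one class: harmonic mean of its precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def f1_report(y_true, y_pred):
    """Return per-class F1, macro F1 and support-weighted F1."""
    labels = sorted(set(y_true))
    per_class = {c: f1_for_class(y_true, y_pred, c) for c in labels}
    # Macro: every class counts the same, however rare it is.
    macro = sum(per_class.values()) / len(labels)
    # Weighted: every class counts in proportion to its number of true samples.
    weighted = sum(per_class[c] * y_true.count(c) for c in labels) / len(y_true)
    return per_class, macro, weighted
```
      </preformat>
      <p>With four control subjects and two positive ones, a classifier that misses one positive and raises one false alarm gets a positive-class F1 of 0.5 and a macro F1 of 0.625, while the weighted F1 is pulled towards the majority class (about 0.67).</p>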
      <p>The best results of each model can be seen in Table 4, with the best results
in bold. These metrics are very limited in comparison with the ones provided by
the challenge organisers. Still, the validation metrics are promising,
especially the ones obtained by the C approach. Further work must nevertheless be performed
in order to improve the overall results.</p>
      <p>The results obtained in the official evaluation are shown in Table 5. The best
results obtained for each metric are also shown in the same table.</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusions and Future Work</title>
      <p>Five different approaches to the CLEF eRisk 2019 task 1 have been described. All
the approaches make use of some kind of neural network, and two of them
benefit from concepts such as transfer learning. Several hyper-parameters of those
models were fine-tuned in order to achieve the best possible performance.
Although our official results are very low, we can conclude that our models provide
promising results for the early detection of anorexia in social media, obtaining
an F1 score of up to 0.57 on the positive class in validation. The results obtained
in the test experiments were weaker in terms of F1; even so, for ERDE5 and ERDE50, results
close to the best ones were obtained.</p>
      <p>Still, further work is needed. We would like to feed the different approaches
with more kinds of embeddings, such as concept embeddings, and to
test the combined use of word embeddings and char embeddings in the same model.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This work was supported by the Research Program of the Ministry of
Economy and Competitiveness - Government of Spain (DeepEMR project
TIN2017-87548-C2-1-R).</p>
      <p>Fig. 5: Structure of models C and E. (a) C model. (b) E model.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Breiman</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Random forests</article-title>
          .
          <source>Machine learning 45(1)</source>
          ,
          <volume>5</volume>
          –
          <fpage>32</fpage>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Coppersmith</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dredze</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Harman</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Quantifying mental health signals in twitter</article-title>
          .
          <source>In: Proceedings of the workshop on computational linguistics</source>
          and
          <article-title>clinical psychology: From linguistic signal to clinical reality</article-title>
          . pp.
          <volume>51</volume>
          –
          <issue>60</issue>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Goldberg</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Levy</surname>
            ,
            <given-names>O.:</given-names>
          </string-name>
          <article-title>word2vec explained: deriving Mikolov et al.'s negative-sampling word-embedding method</article-title>
          .
          <source>arXiv preprint arXiv:1402.3722</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Goutte</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gaussier</surname>
          </string-name>
          , E.:
          <article-title>A probabilistic interpretation of precision, recall and F-score, with implication for evaluation</article-title>
          .
          <source>In: European Conference on Information Retrieval</source>
          . pp.
          <volume>345</volume>
          –
          <fpage>359</fpage>
          . Springer (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Gutiérrez-Barquín</surname>
          </string-name>
          , I.E.:
          <article-title>Alteraciones menstruales y anorexia nerviosa</article-title>
          .
          <source>Trastornos de la conducta alimentaria (3)</source>
          ,
          <volume>277</volume>
          –
          <fpage>284</fpage>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Karpathy</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toderici</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shetty</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leung</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sukthankar</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fei-Fei</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Large-scale video classification with convolutional neural networks</article-title>
          .
          <source>In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          (
          <year>June 2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Krizhevsky</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hinton</surname>
          </string-name>
          , G.E.:
          <article-title>ImageNet classification with deep convolutional neural networks</article-title>
          .
          <source>In: Advances in neural information processing systems</source>
          . pp.
          <volume>1097</volume>
          –
          <issue>1105</issue>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Losada</surname>
            ,
            <given-names>D.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Crestani</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parapar</surname>
          </string-name>
          , J.: erisk 2017:
          <article-title>Clef lab on early risk prediction on the internet: experimental foundations</article-title>
          .
          <source>In: International Conference of the Cross-Language Evaluation Forum for European Languages</source>
          . pp.
          <volume>346</volume>
          –
          <fpage>360</fpage>
          . Springer (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Losada</surname>
            ,
            <given-names>D.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Crestani</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parapar</surname>
          </string-name>
          , J.:
          <article-title>Overview of erisk: Early risk prediction on the internet</article-title>
          .
          <source>In: International Conference of the Cross-Language Evaluation Forum for European Languages</source>
          . pp.
          <volume>343</volume>
          –
          <fpage>361</fpage>
          . Springer (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Losada</surname>
            ,
            <given-names>D.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Crestani</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parapar</surname>
          </string-name>
          , J.:
          <article-title>Overview of eRisk 2019: Early Risk Prediction on the Internet</article-title>
          .
          <source>In: Experimental IR Meets Multilinguality, Multimodality, and Interaction. 10th International Conference of the CLEF Association, CLEF 2019</source>
          . Springer International Publishing, Lugano, Switzerland (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Mohr</surname>
            ,
            <given-names>D.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schueller</surname>
            ,
            <given-names>S.M.:</given-names>
          </string-name>
          <article-title>Personal sensing: Understanding mental health using ubiquitous sensors and machine learning</article-title>
          .
          <source>Annual review of clinical psychology 13</source>
          ,
          <volume>23</volume>
          –
          <fpage>47</fpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Montgomery</surname>
            ,
            <given-names>D.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peck</surname>
            ,
            <given-names>E.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vining</surname>
            ,
            <given-names>G.G.</given-names>
          </string-name>
          :
          <article-title>Introduction to linear regression analysis</article-title>
          , vol.
          <volume>821</volume>
          . John Wiley &amp; Sons (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Ortega-Mendoza</surname>
            ,
            <given-names>R.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lopez-Monroy</surname>
            ,
            <given-names>A.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Franco-Arcega</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Montes-y Gomez</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>PEIMEX at eRisk2018: Emphasizing personal information for depression and anorexia detection</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Paul</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kalyani</surname>
            ,
            <given-names>J.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Basu</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Early detection of signs of anorexia and depression over social media using effective machine learning frameworks</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Schapire</surname>
            ,
            <given-names>R.E.</given-names>
          </string-name>
          :
          <article-title>Explaining AdaBoost</article-title>
          . In: Empirical Inference, pp.
          <fpage>37</fpage>
          –
          <lpage>52</lpage>
          . Springer (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Schölkopf</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smola</surname>
            ,
            <given-names>A.J.</given-names>
          </string-name>
          :
          <article-title>Learning with kernels: support vector machines, regularization, optimization, and beyond</article-title>
          . MIT Press (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Socher</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>E.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pennington</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ng</surname>
            ,
            <given-names>A.Y.</given-names>
          </string-name>
          :
          <article-title>Dynamic pooling and unfolding recursive autoencoders for paraphrase detection</article-title>
          .
          In:
          <source>Advances in Neural Information Processing Systems</source>
          , pp.
          <fpage>801</fpage>
          –
          <lpage>809</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Trotzek</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koitka</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Friedrich</surname>
            ,
            <given-names>C.M.</given-names>
          </string-name>
          :
          <article-title>Word embeddings and linguistic metadata at the CLEF 2018 tasks for early detection of depression and anorexia</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Y.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>H.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>H.H.</given-names>
          </string-name>
          :
          <article-title>A neural network approach to early risk detection of depression and anorexia on social media text</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>LeCun</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Text understanding from scratch</article-title>
          .
          <source>arXiv preprint arXiv:1502.01710</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jin</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>Z.H.</given-names>
          </string-name>
          :
          <article-title>Understanding bag-of-words model: a statistical framework</article-title>
          .
          <source>International Journal of Machine Learning and Cybernetics</source>
          <volume>1</volume>
          (
          <issue>1-4</issue>
          ),
          <fpage>43</fpage>
          –
          <lpage>52</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>