<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Uppsala University and Gavagai at CLEF eRISK: Comparing Word Embedding Models</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Gavagai</institution>
          ,
          <addr-line>Stockholm</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>KTH Royal Institute of Technology</institution>
          ,
          <addr-line>Stockholm</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Uppsala University</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <abstract>
        <p>This paper describes an experiment to evaluate the performance of three different types of semantic vectors or word embeddings (random indexing, GloVe, and ELMo) and two different classification architectures (linear regression and multi-layer perceptrons) for the specific task of identifying authors with eating disorders from writings they publish on a discussion forum. The task requires the classifier to process texts written by the authors in the sequence they were published, and to identify authors likely to be at risk of suffering from eating disorders as early as possible. The data are part of the eRisk evaluation task of CLEF 2019 and evaluated according to the eRisk metrics. Contrary to our expectations, we did not observe a clear-cut advantage using the recently popular contextualized ELMo vectors over the commonly used and much more light-weight GloVe vectors, or the more handily learnable random indexing vectors.</p>
      </abstract>
      <kwd-group>
        <kwd>Semantic vectors</kwd>
        <kwd>Word embeddings</kwd>
        <kwd>Author classification</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        metrics have been formulated to penalise missed cases, false positives, and late
detection [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <p>Most authors in the test set discuss a broad range of innocuous topics
unrelated to self-harm and eating disorders. Many authors discuss eating disorders
without themselves being afflicted, or even discuss how they overcame their
ailments and no longer suffer from them. To some extent, the hypothesis of the
challenge task is that even other writings may reveal personality traits or social
context of relevance for a diagnosis, but mostly, the task is about identifying
relevant texts among many less relevant ones, and doing so quickly, since waiting
incurs a penalty.</p>
    </sec>
    <sec id="sec-2">
      <title>Previous Work</title>
      <p>In 2017, CLEF (Conference and Labs of the Evaluation Forum) introduced a new
laboratory, with the purpose to set up a shared task for Early Risk Prediction
on the Internet (eRisk). The first edition was mainly meant as a trial run to
chart the specific challenges and possibilities of this task.</p>
      <p>The first full-fledged shared task was launched in 2018. In what follows, we
will go over some of the strategies used by the teams that submitted a system
for Task 2, detection of anorexia, focusing on the approaches most similar to
ours.</p>
      <p>Roughly, the solutions can be divided into traditional machine learning
approaches and other approaches based on different types of document and feature
representations, but many teams used a combination of both. Some researchers
also came up with innovative solutions to deal with the temporal aspect of the
task.</p>
      <p>
        A common theme was to focus on the difference in performance between
manually engineered (meta-)linguistic features and automatic text vectorization
methods. For example, the contributions of [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] and [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] both dealt with this research
question. For a more detailed description of [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ], see below. The other team used
a combination of over 50 linguistic features for two of their models, and doc2vec
[
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], a neural text vectorization method, for the other three. In their five
submitted runs, they used the feature-based models alone or in
combination with the text vectorization models; they report that they did
not submit any doc2vec model alone because of its poor performance in
their development experiments.
      </p>
      <p>
        Probably the most specific challenge of this task was building a model which
could take the temporal progression into account. One of the teams that
obtained the best scores, the UNSL team [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], built a time-aware system which
used algorithms invented specifically for this task. Among the other teams, [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]
use an approach that bears some resemblance to our system. They stacked two
classifiers, the first of which predicted what they call the "mood" of the texts
(positive or negative), and the second of which was in charge of making a decision
given this prediction. The main difference is that they were operating with a
chunk-based system, so they had to build models of different sizes to be able
to make a prediction without having seen all the chunks, whereas our second
classifier operates on a text-by-text basis. Furthermore, their first model uses
Bayesian inversion on the text vectorization models, whereas we used a
feedforward neural network with LSTMs.
      </p>
      <p>
        Other notable approaches were to look specifically at sentences which referred
to the user in the first person [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], or to build different classifiers that specialized
in accurately predicting positive cases and negative cases [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. If one of the two
models' outputs rose above a predetermined confidence threshold, that decision
was emitted; if neither of the models or both of them were above the threshold, the
decision was delayed. Another team used latent topics to help in classification
and focused on topic extraction algorithms [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
      <p>
        The FHDO team [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] employed a machine learning approach rather similar
to ours for some of their models. They submitted five runs to Task 2, and they
obtained the best score in three out of five evaluation measures. Models three
and four were regular machine learning models, whereas models one, two and
five were ensemble models that combined different types of classifiers to make
predictions. This team used some hand-crafted metadata features for their first
model, for example the number of personal pronouns, the occurrence of some
phrases like "my therapist", and the presence of words that mark cognitive
processes.
      </p>
      <p>Their first and second models consisted of an ensemble of logistic regression
classifiers, three of them based on bags of words with different term weightings
and the fourth, present only in their first model, based on the metadata features.
The predictions of the classifiers were averaged, and if the result was higher than
0.4 the user was classified as at risk. These models did not obtain any high scores,
contrary to other models submitted by this team.</p>
      <p>Their third and fourth models were convolutional neural networks (CNNs)
with two different types of word embeddings: GloVe and FastText. The GloVe
embeddings were 50-dimensional, pre-trained on Wikipedia and news texts,
whereas the FastText embeddings were 300-dimensional, and trained on social
media texts expressly for this task. The architecture of the CNN was the same for
both models, with one convolutional layer and 100 filters. The threshold to emit
a decision of risk was set to 0.4 for model 3 and 0.7 for model 4. Unsurprisingly,
the model with the larger embedding size and specifically trained vectors performed
best, reporting the highest recall (0.88) and the lowest ERDE<sub>50</sub> (5.96%) in the
2018 edition of eRisk. ERDE stands for Early Risk Detection Error and is an
evaluation metric created to track the performance of early risk detection
systems (see Section 4).</p>
      <p>
        The fifth model presented in [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] was an ensemble of the two CNN models
and their first model, the bag-of-words model with metadata features. This model
obtained the highest F1 in the shared task, namely 0.85, and came close to the
best scores even for the two ERDE measures.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Experimental Conditions and Processing Pipeline</title>
      <p>We have two experimental foci: the representation of lexical items, and the
classification step given such representations.</p>
      <sec id="sec-3-1">
        <title>Semantic Vectors or Word Embeddings</title>
        <p>
          We represent lexical items in the posts under analysis as word embeddings,
vectors of real numbers, under the assumption that a vector space representation
allows for generalisation from the lexical items themselves to a more conceptual
level of semantics. By allowing the classi cation scheme to relax the
representation to include near neighbours in semantic space, we hope to achieve better
recall than otherwise were possible. Semantic vectors are convenient as a
learning representation, allowing for aggregation of distributional context, but if used
blindly, risk bringing in contextual information of little or even confounding
relevance. In general, semantic spaces built from similar data sets with similar
aggregation parameters should represent the same information and the actual
aggregation process is of less importance, but implementational details may have
e ects on the usefulness of the semantic space. Parameters of importance have
obviously to do with size and selection of data set, but also how the
distributional context is de ned, the dimensionality or level of compression of the
representation, weighting of items based on their information content, and how
rare or previously unseen items are treated. In these experiments we compare
three semantic vector models: Random Indexing which is used in commercial
applications; GloVe, which is used in a broad range of recent academic experiments;
and the recently published ELMo, which has shown great promise to provide a
better and more sensitive representation for general purpose application.
Random Indexing Random indexing is based on the Sparse Distributed
Memory model formulated by Pentti Kanerva [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] which is intended to be both
neurophysiologically plausible and e ciently implementable for large streaming data.
Random indexing is built for seamless online learning without explicit
compilation steps, and is based on a xed-dimensional representation of typically around
1000 dimensions. The vectors are built by simple operations: each lexical item is
assigned a randomly generated sparse index vector and an initially empty
context vector. The latter is populated for each lexical item by, for each observed
occurrence of it, adding in index vectors of items observed within a context of
interest such as a window of preceding and succeeding items. If the objective of
the semantic space is to encode synonymy or other close semantic relations, a
window of two preceding and succeeding items is used as a context. Preceding
and succeeding items are kept separate to preserve sequential information in the
representation, implemented by applying separate permutations for preceding
and succeeding items [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ]. In the present experiments, we use a large
2000dimensional semantic space trained on several years of social and news media
by Gavagai for inclusion in their commercial tools [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ]. Vectors are normalised
to length 1 and items that are not found in the vocabulary are represented with
empty vectors and thus do not contribute to the classi cation.
        </p>
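        <p>A minimal numpy sketch of this update rule; the dimensionality, sparsity, and window size are illustrative stand-ins for the parameters of the Gavagai space, and vector rotation (np.roll) stands in for the order-preserving permutations:</p>

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, NNZ, WIN = 1000, 8, 2  # dimensionality, nonzeros per index vector, window

def index_vector():
    """Sparse ternary index vector: a few random +1/-1 entries."""
    v = np.zeros(DIM)
    pos = rng.choice(DIM, size=NNZ, replace=False)
    v[pos] = rng.choice([-1.0, 1.0], size=NNZ)
    return v

def train(corpus):
    """Accumulate context vectors: for each occurrence of a word, add the
    (rotated) index vectors of its neighbours within the window. Rotating
    by the signed offset keeps preceding and succeeding items distinct."""
    vocab = {w for sent in corpus for w in sent}
    index = {w: index_vector() for w in vocab}
    context = {w: np.zeros(DIM) for w in vocab}
    for sent in corpus:
        for i, w in enumerate(sent):
            for j in range(max(0, i - WIN), min(len(sent), i + WIN + 1)):
                if j == i:
                    continue
                context[w] += np.roll(index[sent[j]], j - i)
    return context

def cos(a, b):
    """Cosine similarity between two context vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
```

Words occurring in identical contexts (here "cat" and "dog") end up with near-identical context vectors, which is the generalisation effect the classifier relies on.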
        <p>
          GloVe Global Vectors (or GloVe for short) are semantic vectors built
to provide downstream processes with handy access to lexical cooccurrence data
from large data sets [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]. The vectors are populated with cooccurrence data
from a 15-word window, thus providing a more associative relation between
items than the random indexing model above. The quality of the vectors has
proven useful for a wide range of tasks, and GloVe vectors have in recent years
been used as a standard way of achieving a conceptual generalisation from simple
words in text. Several GloVe vector sets can be retrieved at no
cost; in these experiments we chose a 200-dimensional set provided by the
Stanford NLP group, trained on microblog data, which we judged to be the closest
fit to the data under analysis (https://nlp.stanford.edu/projects/glove/). Items that are not found in the vocabulary are
replaced with a stand-in vector populated with values from a normal distribution
whose mean and standard deviation are obtained from all available vectors.
ELMo Semantic vector models in general produce vectors that are intended to
encode information from language usage in general (or language usage in the
training set). They do not accommodate to the specific task at hand, and are
trained on large amounts of previous knowledge. Recent approaches try to
address the challenge of domain and task accommodation more explicitly by
combining a previously trained general representation with a more rapid learning
process on the data set under analysis. For linguistic data, ELMo (Embeddings
from Language Models), proposed by [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ], is one such model. ELMo
representations differ from traditional semantic vectors in that individual vectors
are generated for each token in the data under analysis, based on a large
pretrained language model represented in a richer three-level representation trained
on sentence-by-sentence cooccurrences. The ELMo processing model
incorporates a character-based model, which means that no items will be out of
vocabulary: previously unseen items inherit a representation based on the similarity
of their character sequence to other known items. We use the AllenNLP Python
package to generate vectors [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. Each lexical item is represented by an average
of the three ELMo layers in one 1024-dimensional vector, and the vectors are passed in,
sentence by sentence, to the classifier.
        </p>
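        <p>The out-of-vocabulary fallback described for the GloVe runs can be sketched as follows; the function names are our own, and the snippet only illustrates the normal-distribution stand-in, not the full embedding pipeline:</p>

```python
import numpy as np

def oov_standin(embeddings, rng=None):
    """Stand-in vector for out-of-vocabulary items: each component is drawn
    from a normal distribution whose mean and standard deviation are
    estimated from all available vectors, as described for the GloVe runs."""
    rng = rng or np.random.default_rng()
    mat = np.asarray(list(embeddings.values()))
    return rng.normal(mat.mean(), mat.std(), size=mat.shape[1])

def lookup(word, embeddings, fallback):
    """Return the known vector for a word, or the shared stand-in vector."""
    return embeddings.get(word, fallback)
```
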
        <p>Baseline representation As a baseline we use randomly initialized word
embeddings obtained from the Keras embedding layer. First a tokenizer is used to
obtain a list of all lexical items in the training set. Only the 10,000 most common
words are considered for the classification task, and they are converted
into 100-dimensional word vectors generated by Keras. These vectors thus
contain no information about previous usage of the lexical items.</p>
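        <p>A rough equivalent of this baseline, sketched in plain numpy rather than the actual Keras embedding layer (the uniform initialisation range and the reserved index 0 for unknown words are assumptions of this sketch):</p>

```python
import numpy as np
from collections import Counter

VOCAB, EMB_DIM = 10000, 100

def fit_tokenizer(texts):
    """Rank the most common words 1..VOCAB; index 0 is reserved for
    out-of-vocabulary items and padding."""
    counts = Counter(w for t in texts for w in t.lower().split())
    return {w: i + 1 for i, (w, _) in enumerate(counts.most_common(VOCAB))}

def build_embeddings(word_index, rng=None):
    """Random vectors carrying no information about previous word usage."""
    rng = rng or np.random.default_rng(0)
    return rng.uniform(-0.05, 0.05, size=(len(word_index) + 1, EMB_DIM))

def encode(text, word_index):
    """Map a text to a sequence of vocabulary indices (0 for unknown)."""
    return [word_index.get(w, 0) for w in text.lower().split()]
```
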
      </sec>
      <sec id="sec-3-2">
        <title>Classifier</title>
        <p>
          The first step in our processing pipeline involves building a text classifier. Texts
are classified as written either by authors with eating disorders or by
authors without eating disorders. This is in keeping with the underlying
hypothesis above, that some characteristics of authors with eating disorders may be
discernible even in texts about other topics. Text classification is done with a
Recurrent Neural Network (RNN) implemented with Long Short-Term Memory
cells (LSTMs). Recurrent neural networks are neural architectures where the
output of the hidden layer at each time step is also used as input for the
hidden layer at the next time step. This type of processing model is particularly
suitable for tasks that involve processing of sequences, for example sentences
in natural language. LSTM cells retain information over longer distances than
regular RNN cells [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. Our neural architecture consists of an embedding layer,
two hidden layers of size 100, and a fully connected layer with one neuron and
sigmoid activation (as illustrated in Figure 3.2). The embedding layer differs
according to which type of representation we use for each model, whereas the rest
of the model is the same for all of our neural models. The output layer with a
sigmoid activation function ensures that the network assigns a probability to
each text instead of a class label. We set the maximum sentence length to 100
words and the vocabulary to 10,000 words in order to make the training process
more efficient.
        </p>
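        <p>In Keras terms, this architecture might be sketched roughly as follows; the optimizer and loss are assumptions (the paper does not state them), and the dropout layers are the ones described in Section 3.3:</p>

```python
from tensorflow import keras
from tensorflow.keras import layers

MAX_LEN, VOCAB, EMB_DIM, HIDDEN = 100, 10000, 100, 100

def build_text_classifier():
    """Embedding layer, two LSTM layers of size 100 (each followed by
    dropout 0.5), and a single sigmoid unit that outputs the probability
    that a text belongs to the at-risk class."""
    model = keras.Sequential([
        layers.Embedding(VOCAB, EMB_DIM),
        layers.LSTM(HIDDEN, return_sequences=True),
        layers.Dropout(0.5),
        layers.LSTM(HIDDEN),
        layers.Dropout(0.5),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```

The sigmoid output means the network emits a per-text probability rather than a hard label, which is what the downstream author classifier consumes.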
        <p>This recurrent neural network takes care of the text classification task: it
outputs the probability that each text belongs to the 1 (at risk) class. The
output of the text classifier is passed on as input to a second author classifier in
a feature vector composed of the following elements:
- the number of texts seen up to that point, min-max scaled to match the
order of magnitude of the other features
- the average score of the texts seen up to that point
- the standard deviation of the scores seen up to that point
- the average score of the top 20% of texts with the highest scores
- the difference between the average of the top 20% and the bottom 20% of
texts.</p>
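        <p>The feature vector above can be sketched as follows; the scaling constant is a stand-in, since the exact min-max scaling is not specified in the text:</p>

```python
import numpy as np

def author_features(scores, scale=1000.0):
    """Feature vector passed from the text classifier to the author
    classifier. `scale` stands in for the min-max scaling of the text
    count; the actual constant is an assumption of this sketch."""
    s = np.sort(np.asarray(scores, dtype=float))
    k = max(1, int(round(0.2 * len(s))))   # size of the top/bottom 20%
    top, bottom = s[-k:], s[:k]
    return np.array([
        len(s) / scale,                    # number of texts seen, rescaled
        s.mean(),                          # average score so far
        s.std(),                           # standard deviation of scores
        top.mean(),                        # average of the top-20% scores
        top.mean() - bottom.mean(),        # top-20% minus bottom-20% average
    ])
```
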
        <p>We experimented with two architectures for the author classifier: logistic
regression and multi-layer perceptron. Logistic regression is a linear classifier that uses
a logistic function to model the probability that an instance belongs to the
default class in a binary classification problem. A multi-layer perceptron, on the
other hand, is a deep feed-forward neural network, and therefore a non-linear
classifier. We tested their performance by feeding each architecture
identical input from the text classifier. We varied hyperparameters such
as embedding size, hidden layer size, number of layers, and vocabulary size to
find the best combination, also taking practical issues such as training time into
account. One important factor to keep in mind is that we wanted to compare
word embedding methods, so it was desirable to have the same (or very similar)
settings for all models. During our development phase we found that often a
hyperparameter setting that worked well for one model was not ideal for another
model, and compromises had to be made.</p>
      </sec>
      <sec id="sec-3-3">
        <title>Practical Considerations</title>
        <p>
          For the implementation we use scikit-learn [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] and Keras [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], two popular
Python packages that support traditional machine learning algorithms as well
as deep learning architectures, and we use NLTK for preprocessing purposes [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].
We pre-processed the documents in the same way for all our runs: we used the
stop-word list provided with the Natural Language Toolkit, but we did
not remove any pronouns, as they have been found to be prominent in the
writing style of mental health patients. We replaced URLs and long numbers
with ad hoc tokens, and the Keras tokenizer filters out punctuation, symbols, and
all types of blank space characters.
        </p>
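        <p>A simplified sketch of this preprocessing, with a toy stop-word list standing in for NLTK's full English list, and an assumed threshold for what counts as a "long" number:</p>

```python
import re

# A small sample standing in for NLTK's English stop-word list; pronouns
# are listed separately because they are retained (they are prominent in
# the writing style of mental health patients).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "i", "me", "my", "we", "they"}
PRONOUNS = {"i", "me", "my", "we", "they", "you", "he", "she"}
REMOVE = STOPWORDS - PRONOUNS

URL_RE = re.compile(r"https?://\S+")
LONGNUM_RE = re.compile(r"\d{5,}")  # the "long number" threshold is assumed

def preprocess(text):
    """Replace URLs and long numbers with ad hoc tokens, lowercase,
    keep word tokens only, and drop stop words except pronouns."""
    text = URL_RE.sub(" urltoken ", text)
    text = LONGNUM_RE.sub(" numtoken ", text)
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in REMOVE]
```
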
        <p>We only took into consideration those messages where at least one of the
text and title fields was not blank. Similarly, at test time we did not process
blank documents; instead we repeated the prediction from the previous round if
we encountered an empty document. If any empty documents appeared in the
first round, we emitted a decision of 0, following the rationale that in the absence of
evidence we should assume that the user belongs to the majority class.</p>
        <p>The text classifier is trained on the training set with a validation split of 0.2,
using model checkpoints to save the models at each epoch, and early stopping
based on validation loss. Two dropout layers are added after the hidden LSTM
layers with a probability of 0.5. Both early stopping and dropout are intended to
avoid overfitting, given that the noise in the data makes the model more prone
to this type of error.</p>
        <p>
          For the author classifier, we experimented with different settings for logistic
regression and the multi-layer perceptron. For the logistic regression classifier, we
used the SAGA optimizer [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. We used balanced class weights to give the minority
class (the positive cases) more importance during training. For the multi-layer
perceptron, we used two hidden layers of size 10 and 2.
        </p>
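        <p>In scikit-learn terms, these two settings might be instantiated roughly as follows; hyperparameters other than those named above are library defaults, assumed here:</p>

```python
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

def make_author_classifiers():
    """The two author-classifier settings: logistic regression with the
    SAGA solver and balanced class weights, and a multi-layer perceptron
    with hidden layers of size 10 and 2. max_iter is our own choice."""
    logreg = LogisticRegression(solver="saga", class_weight="balanced",
                                max_iter=1000)
    mlp = MLPClassifier(hidden_layer_sizes=(10, 2), max_iter=1000)
    return logreg, mlp
```
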
        <p>Since we need to focus on early prediction of positive cases and on recall,
precision in our system tends to suffer. In order to improve precision as much as
possible, we experimented with different cut-off points for the probability scores
to try to reduce the number of false positives as much as possible. We ended up
using a high cut-off probability of 0.9 for a positive decision, because we found
that this did not affect our recall score too badly, and it did help improve
precision. We made the practical assumption that a good balance between precision
and recall would be more useful in a real-life setting than really good scores on the
early prediction metrics.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Evaluation Metrics</title>
      <p>Precision and Recall Precision and recall are calculated over only the positive
items in the test set and they are combined into the F1 score in the traditional
way.</p>
      <p>
        ERDE Originally proposed by the eRisk organisers in 2016 [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] and applied
in every year since, the Early Risk Detection Error (ERDE) score takes into
account both the correctness of the decision and the number of texts needed
to emit that decision. ERDE assigns each classification decision made by a system
(in this case, identifying a user as ill or healthy) an editorially determined
cost: c<sub>fn</sub> for false negatives, c<sub>fp</sub> for false positives, c<sub>tn</sub> for true negatives, and c<sub>tp</sub>
for true positives. The true positive cost c<sub>tp</sub> is weighted by a latency cost factor
lc(o, k), where k is the number of texts seen by the system before a decision is
made and o is a parameter controlling how many texts are considered
acceptable or expected before a decision is made. The lc(o, k) factor increases
rapidly after o texts have been seen. The objective is to minimise this score. In
the 2019 evaluation cycle, c<sub>tn</sub> was set to 0, c<sub>fn</sub> to 1, c<sub>fp</sub> to the relative frequency
of the positive items in the test set, c<sub>tp</sub> to 1, and o variously to 5 and 50, shown
as ERDE<sub>5</sub> and ERDE<sub>50</sub> respectively.
      </p>
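        <p>A sketch of this computation, using the latency factor lc(o, k) = 1 - 1/(1 + exp(k - o)), which is our reading of the original formulation; each decision is a (predicted, truth, k) triple and the score is averaged over users:</p>

```python
import math

def erde(decisions, o, c_fp, c_fn=1.0, c_tp=1.0, c_tn=0.0):
    """ERDE over a set of users. Each decision is (predicted, truth, k),
    with k the number of texts seen before the decision. The latency
    factor lc(o, k) = 1 - 1/(1 + exp(k - o)) grows quickly once k
    exceeds o, so late true positives approach the cost of misses."""
    def cost(pred, truth, k):
        if pred and truth:       # true positive, weighted by latency
            return c_tp * (1.0 - 1.0 / (1.0 + math.exp(k - o)))
        if pred and not truth:   # false positive
            return c_fp
        if not pred and truth:   # false negative (missed case)
            return c_fn
        return c_tn              # true negative
    return sum(cost(*d) for d in decisions) / len(decisions)
```

With the 2019 settings, a true positive emitted immediately costs almost nothing, while one emitted long after o texts costs nearly as much as a miss.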
      <p>
        Latency Proposed by Sadeque et al. [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ], the latency measure is the median
of the number of documents seen by the system until it makes a determination
that a user is at risk. This is only computed for true positives, and thus carries
no penalty for false or missed positives. The latency score can be reformulated
as a speed factor which is used to rescore the raw F1 score to a latency-weighted
F<sub>latency</sub> score. A system which identifies positive items from their first writing
will have F1 = F<sub>latency</sub>.
      </p>
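        <p>This rescoring can be sketched as follows; the penalty function shape follows Sadeque et al., and the value of the parameter p is an assumption on our part (we believe eRisk used 0.0078):</p>

```python
import math
from statistics import median

def penalty(k, p=0.0078):
    """Latency penalty for a true positive flagged after k texts:
    penalty(1) = 0, approaching 1 as k grows. The parameter p controls
    how fast the penalty rises; its value here is an assumption."""
    return -1.0 + 2.0 / (1.0 + math.exp(-p * (k - 1)))

def latency_weighted_f1(f1, tp_delays, p=0.0078):
    """speed = 1 - median penalty over true positives; Flatency = F1 * speed.
    A system flagging every positive user at their first text has speed 1,
    so Flatency equals the raw F1."""
    speed = 1.0 - median(penalty(k, p) for k in tp_delays)
    return f1 * speed
```
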
      <p>
        Ranking-based metrics: P@10, nDCG@10, nDCG@100 The
participating systems were required to rank the users in order of assessed risk; the
precision of that list was then measured at 10 items, and compared to a
perfect ranking at 10 and at 100 using the normalised discounted cumulative gain
measure (nDCG) [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
    </sec>
    <sec id="sec-5">
      <title>Results</title>
      <p>Official Results We submitted 5 runs to the official test phase, the maximum number allowed for
each team. They are listed in Table 1. A total of 13 teams successfully completed
at least one run in eRisk. Some teams stopped processing texts before the
stream of 2000 texts was exhausted. Unfortunately, due to a processing error,
our submissions were among them: we only processed the first round of texts
and emitted our decisions based on that. Our official scores are thus not based
on the entire test material but are an extreme case of early risk detection, based
on the first text round only. The results are given in Tables 2 and 3.</p>
      <table-wrap id="tbl1">
        <label>Table 1</label>
        <caption>
          <p>Submitted runs: vector type for the text classifier, and author classifier.</p>
        </caption>
        <table>
          <thead>
            <tr><th>Run ID</th><th>Vector type for text classifier</th><th>Author classifier</th></tr>
          </thead>
          <tbody>
            <tr><td>0</td><td>Baseline</td><td>Logistic regression</td></tr>
            <tr><td>1</td><td>Baseline</td><td>Multi-layer perceptron</td></tr>
            <tr><td>2</td><td>GloVe</td><td>Logistic regression</td></tr>
            <tr><td>3</td><td>GloVe</td><td>Multi-layer perceptron</td></tr>
            <tr><td>4</td><td>Random indexing</td><td>Multi-layer perceptron</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>The model with the best performance was run 4, with random indexing
vectors and a multi-layer perceptron. This holds for both the decision-based
evaluation and the ranking-based evaluation. The only exception is the best
recall score, obtained by the baseline model with a logistic regression classifier.
In development experiments we found that the random indexing model had the
fewest false positives and that the multi-layer perceptron balances
precision and recall well. We believe that the GloVe model, in combination with the
multi-layer perceptron, is too conservative to give a good performance after only
one round of texts, whereas the random indexing model strikes a better balance
early on in the data stream.</p>
      <p>Compared to the other submissions, our scores for the decision-based
evaluation were excellent in terms of latency, speed, and ERDE<sub>5</sub>, since we always
made our decisions at the first possible time. On most other evaluation
parameters our official scores were ranked in the lower third, compared to the best
scores of the other participants. For the ranking results, given in Table 3, the
results were more respectable (although due to the processing error, they did
not change as more data was processed).</p>
      <sec id="sec-5-1">
        <title>Continued Experimentation</title>
        <p>
          After the official testing period was over, the organizers made the test set
available to the participating teams. This allowed us to carry out continued
experiments, including ELMo, which was not practicable during the official training
period due to lengthy processing times. Table 4 shows the performance of our
models on the official test set. We used a script provided by the organizers to
evaluate precision, recall, F1, and ERDE, so these results are obtained
under the same testing conditions as the official ones and should be comparable to them. We
found that once the processing error was sorted out, we were able to produce
scores on par with the top participants: our best F1 score on the test set was
0.68, whereas the best F1 in the shared task was 0.71, and we obtained a recall
of 0.9 which is close to the best score of 1.0, obtained by a team that heavily
sacrificed precision. For ERDE<sub>5</sub> and ERDE<sub>50</sub> more than one team shared
first place with the same non-perfect scores of 0.06 and 0.03 respectively. These
values are likely rounded up to the nearest percentage point, and if we do the
same with our continued results, we actually obtain an ERDE<sub>5</sub> of 0.04
for all the vector representations using the logistic regression model, and an
ERDE<sub>50</sub> of 0.02 for our GloVe/ELMo and logistic regression models. More
details about these further experiments can be found in a comprehensive report
by Fano [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
        </p>
        <p>[Table 4: results of the LSTM text classifier with baseline, random indexing, GloVe, and ELMo vectors, compared to the best official scores.]</p>
        <p>We found that in the continued experiments, the model with GloVe embeddings
and the multi-layer perceptron classifier had the best precision, without sacrificing
recall. ELMo vectors did not make much of a difference in the multi-layer
perceptron condition, but held a slight edge with the generally lower performing
logistic regression classifier. In general, the benefit of using knowledge from the
generalised vector models was relatively small. Compared to the baseline, the
three pre-trained models show a better balance between precision and recall, but
they also show worse ERDE scores, which are a symptom of more conservative
behavior, especially in the early phases.</p>
        <p>Regarding the difference between the logistic regression and multi-layer
perceptron classifiers, we could detect a clearer trend on the test set than we had on
the development set. We had already observed that logistic regression seemed to
lead to worse precision scores, but on the test set we could also determine that
it gave rise to better ERDE scores. This result can be explained as follows:
a system that incurs many false positives will likely also correctly
identify the true positives, and zooming in on many true positives early on also
leads to good ERDE scores.</p>
        <p>The more far-reaching conclusion that can be drawn from our experiments
is that the choice of representation and classifier does have some effect on the
results, and that the chronological aspect of this task made clear the compound
effect of learning curves and robustness of the combination of the two: more
conservative models, which are likely to perform better in the long run, suffer
from not daring to pronounce a decision early in the sequence.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Bird</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Klein</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Loper</surname>
          </string-name>
          , E.:
          <article-title>Natural language processing with Python: analyzing text with the natural language toolkit.</article-title>
          <publisher-name>O'Reilly Media, Inc.</publisher-name>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Cacheda</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Iglesias</surname>
            ,
            <given-names>D.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Novoa</surname>
            ,
            <given-names>F.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Carneiro</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Analysis and experiments on early detection of depression</article-title>
          .
          <source>In: Working Notes of CLEF 2018 - Conference and Labs of the Evaluation Forum</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Chollet</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          , et al.: Keras. https://keras.io (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Defazio</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bach</surname>
            ,
            <given-names>F.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lacoste-Julien</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives</article-title>
          .
          <source>Computing Research Repository (CoRR)</source>
          (
          <year>2014</year>
          ), http://arxiv.org/abs/1407.0202
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Fano</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>A comparative study of word embedding methods for early risk prediction on the Internet</article-title>
          .
          <source>Master's thesis</source>
          , Uppsala University (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Funez</surname>
            ,
            <given-names>D.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ucelay</surname>
            ,
            <given-names>M.J.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Villegas</surname>
            ,
            <given-names>M.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Burdisso</surname>
            ,
            <given-names>S.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cagnina</surname>
            ,
            <given-names>L.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Montes-y-Gómez</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Errecalde</surname>
            ,
            <given-names>M.L.</given-names>
          </string-name>
          :
          <article-title>UNSL's participation at eRisk 2018 Lab</article-title>
          .
          <source>In: Working Notes of CLEF 2018 - Conference and Labs of the Evaluation Forum</source>
          . vol.
          <volume>2125</volume>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Gardner</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grus</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neumann</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tafjord</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dasigi</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>N.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peters</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schmitz</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zettlemoyer</surname>
            ,
            <given-names>L.S.</given-names>
          </string-name>
          :
          <article-title>AllenNLP: A deep semantic natural language processing platform</article-title>
          . In: arXiv (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Hochreiter</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schmidhuber</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Long short-term memory</article-title>
          .
          <source>Neural computation 9(8)</source>
          (
          <year>1997</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9. Jarvelin,
          <string-name>
            <surname>K.</surname>
          </string-name>
          , Kekalainen, J.:
          <article-title>Cumulated gain-based evaluation of IR techniques</article-title>
          .
          <source>ACM Transactions on Information Systems (TOIS) 20(4)</source>
          ,
          <fpage>422</fpage>
          -
          <lpage>446</lpage>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Kanerva</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kristoferson</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Holst</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Random indexing of text samples for latent semantic analysis</article-title>
          .
          <source>In: Proceedings of the 22nd Annual Meeting of the Cognitive Science Society (CogSci)</source>
          . vol.
          <volume>22</volume>
          (
          <year>2000</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Le</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Distributed representations of sentences and documents</article-title>
          .
          <source>In: International conference on machine learning</source>
          . pp.
          <fpage>1188</fpage>
          -
          <lpage>1196</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Losada</surname>
            ,
            <given-names>D.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Crestani</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>A test collection for research on depression and language use</article-title>
          .
          <source>In: Experimental IR Meets Multilinguality</source>
          , Multimodality, and Interaction - 7th
          <source>International Conference of the CLEF Association</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Losada</surname>
            ,
            <given-names>D.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Crestani</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parapar</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <source>Overview of eRisk</source>
          <year>2019</year>
          :
          <article-title>Early Risk Prediction on the Internet</article-title>
          .
          <source>In: Experimental IR Meets Multilinguality, Multimodality, and Interaction. 10th International Conference of the CLEF Association</source>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Maupomé</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Meurs</surname>
            ,
            <given-names>M.J.</given-names>
          </string-name>
          :
          <article-title>Using topic extraction on social media content for the early detection of depression</article-title>
          .
          <source>In: Working Notes of CLEF 2018 - Conference and Labs of the Evaluation Forum</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Ortega-Mendoza</surname>
            ,
            <given-names>R.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lopez-Monroy</surname>
            ,
            <given-names>A.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Franco-Arcega</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Montes-y-Gómez</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          : PEIMEX at eRisk 2018:
          <article-title>Emphasizing personal information for depression and anorexia detection</article-title>
          .
          <source>In: Working Notes of CLEF 2018 - Conference and Labs of the Evaluation Forum</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Pedregosa</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Varoquaux</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gramfort</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Michel</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thirion</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grisel</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blondel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prettenhofer</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weiss</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dubourg</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vanderplas</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Passos</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cournapeau</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brucher</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Perrot</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Duchesnay</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Scikit-learn: Machine learning in Python</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>12</volume>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Pennington</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Socher</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          : GloVe:
          <article-title>Global vectors for word representation</article-title>
          .
          <source>In: Proceedings of the 2014 conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Peters</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neumann</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Iyyer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gardner</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Clark</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zettlemoyer</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Deep contextualized word representations</article-title>
          .
          <source>In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          . Association for Computational Linguistics (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Ragheb</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moulahi</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Azé</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bringay</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Servajean</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Temporal mood variation: at the CLEF eRisk-2018 tasks for early risk detection on the internet</article-title>
          .
          <source>In: CLEF: Conference and Labs of the Evaluation</source>
          . p.
          <fpage>78</fpage>
          . No.
          <volume>2125</volume>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Ramiandrisoa</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mothe</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Benamara</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moriceau</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          : IRIT at e-Risk
          <year>2018</year>
          . In: E-Risk workshop. pp.
          <fpage>367</fpage>
          -
          <lpage>377</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Sadeque</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bethard</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Measuring the latency of depression detection in social media</article-title>
          .
          <source>In: Proceedings of the 11th ACM International Conference on Web Search and Data Mining (WSDM)</source>
          .
          <source>ACM</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Sahlgren</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gyllensten</surname>
            ,
            <given-names>A.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Espinoza</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hamfors</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karlgren</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Olsson</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Persson</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Viswanathan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Holst</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>The Gavagai living lexicon</article-title>
          .
          <source>In: Proceedings of the Language Resources and Evaluation Conference (LREC)</source>
          .
          <source>ELRA</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Sahlgren</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Holst</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kanerva</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Permutations as a means to encode order in word space</article-title>
          .
          <source>In: Proceedings of The 30th Annual Meeting of the Cognitive Science Society (CogSci)</source>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Trotzek</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koitka</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Friedrich</surname>
            ,
            <given-names>C.M.</given-names>
          </string-name>
          :
          <article-title>Word embeddings and linguistic metadata at the CLEF 2018 tasks for early detection of depression and anorexia</article-title>
          .
          <source>In: Working Notes of CLEF 2018 - Conference and Labs of the Evaluation Forum</source>
          . vol.
          <volume>2125</volume>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>