=Paper= {{Paper |id=Vol-1866/paper_107 |storemode=property |title=LIDIC - UNSL's Participation at eRisk 2017: Pilot Task on Early Detection of Depression |pdfUrl=https://ceur-ws.org/Vol-1866/paper_107.pdf |volume=Vol-1866 |authors=María Paula Villegas,Dario Gustavo Funez,María José Garciarena Ucelay,Leticia Cecilia Cagnina,Marcelo Luis Errecalde |dblpUrl=https://dblp.org/rec/conf/clef/VillegasFUCE17 }} ==LIDIC - UNSL's Participation at eRisk 2017: Pilot Task on Early Detection of Depression== https://ceur-ws.org/Vol-1866/paper_107.pdf
LIDIC - UNSL's participation at eRisk 2017:

Pilot task on Early Detection of Depression

Notebook for eRisk at CLEF 2017

Ma. Paula Villegas^1, Dario G. Funez^1,
Ma. José Garciarena Ucelay^1, Leticia C. Cagnina^1,2, and Marcelo L. Errecalde^1

^1 LIDIC Research Group, Universidad Nacional de San Luis, Argentina
^2 Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
{villegasmariapaula74,funezdario,mjgarciarenaucelay}@gmail.com
{lcagnina,merrecalde}@gmail.com


Abstract. In this paper we describe the participation of the LIDIC Research Group of Universidad Nacional de San Luis (UNSL), Argentina, in the eRisk 2017 pilot task. The main goal of this task is to consider early risk detection scenarios, depression in this case, where the issue of getting timely predictions with a reasonable confidence level becomes critical. In the pilot task, systems must sequentially process pieces of evidence and detect early traces of depression as soon as possible. The data set used is a collection of writings labeled as depressed or non-depressed that was released to the pilot task participants in two different stages: training and test. Our proposal for this task was based on a semantic representation of documents that explicitly considers the partial information made available to the early risk detection systems in the different chunks over time. That temporal approach was complemented with other standard text categorization models in some specific situations that seemed not to be correctly addressed by our approach alone. In the test stage, the resulting system obtained the lowest ERDE50 error out of a total of 30 submissions from 8 different institutions. However, once the golden truth of the test data set was released, we could verify that our temporal approach alone might have obtained very robust results and the lowest reported ERDE error for both thresholds used in the pilot task.

Keywords: Early Risk Detection, Unbalanced Data Sets, Text Representations, Semantic Analysis Techniques.




1    Introduction

The widespread use of the Internet, social networks and other computer technologies has marked the beginning of a new era in communication among people. In this context, a lot of information is now available that might be useful to detect, in a timely way, situations that are potentially dangerous or risky for the physical well-being and health of individuals, communities and social organizations. Examples of such situations include the detection of potential paedophiles, people with suicidal inclinations, or people susceptible to depression, among others.

This type of situation has started to be studied in a new research field known as early risk detection (ERD), which has received increasing interest from scientific researchers worldwide due to the important impact it could have on many relevant, current real-world problems. In this context, the first early risk prediction conference, eRisk 2017^3, was organized this year in the context of the CLEF 2017 Workshop. As part of this event, a pilot task was organized, and the present article describes our participation in that task.

The eRisk 2017 pilot task was organized into two different stages: training and test. It also assumed an ERD scenario, that is, data are sequentially read as a stream and the challenge consists in detecting risk cases as soon as possible. In order to reproduce this scenario, the eRisk 2017 organizers released the test data set following a sequential, chunk-by-chunk criterion; that is, the first chunk contained the oldest 10% of the messages, the second chunk contained the second oldest 10%, and so forth, up to a total of 10 chunks that represent the full writings of the analysed individuals.

To deal with this problem we implemented an original proposal that we named temporal variation of terms (TVT) [3]. However, at the training stage, TVT seemed to show some weaknesses at specific chunks, so we decided to complement it with other standard methods to help our approach in those specific situations. Thus, our implemented ERD system is in fact a combined system that strongly depends on TVT but also uses other methods as additional sources of opinions. The resulting system had a very acceptable performance on the data set released in the test stage, obtaining the lowest ERDE50 error out of a total of 30 submissions from 8 different institutions. However, once the golden truth of the test data set was released, we could verify that TVT alone might have obtained very robust results and the lowest reported ERDE error for both thresholds used in the pilot task. For this reason, we also include an additional section in this work with preliminary results obtained with TVT alone on the test set, in order to show the potential of this method for general ERD problems.

The rest of the article is organized as follows: Section 2 describes general aspects of the data set used in the pilot task and the methods used in our ERD system. Next, in Section 3, the activities carried out in the training stage are described and the justifications of the main design decisions made in our ERD system are presented. Section 4 shows the results obtained with our proposal on the eRisk 2017 data set released in the test stage. Then, some complementary results are presented in Section 5, where interesting aspects of the performance of TVT working alone on the test set are shown. Finally, Section 6 presents potential future work and our conclusions.




^3 http://early.irlab.org/
2    Data set and methods

2.1    Data Set

The data set used in the pilot task of the eRisk competition^4 was initially described in [8]. It is a collection of writings (posts or comments) from a set of Social Media users. There are two categories of users, "depressed" and "non-depressed", and, for each user, the collection contains a sequence of writings (in chronological order). For each user, the collection of writings has been divided into 10 chunks. The first chunk contains the oldest 10% of the messages, the second chunk contains the second oldest 10%, and so forth. This collection was split into a training and a test set that we will refer to as TRDS and TEDS respectively. The TRDS set contained 486 users (83 positive, 403 negative) and the (test) TEDS set contained 401 users (52 positive, 349 negative). The users labeled as positive are those that have explicitly mentioned that they have been diagnosed with depression.

The pilot task was divided by its organizers into a training stage and a test stage. In the first one, the participating teams had access to the TRDS data set with all chunks of all training users. They could therefore tune their systems with the training data. Then, in the test stage, the 10 chunks of the TEDS data set were gradually released by the organizers, one by one, until completing all the chunks that correspond to the complete writings of the considered individuals. Each time a chunk ch_i was released, participants in the pilot task were asked to give their predictions on the users contained in TEDS, based on the partial information read from chunks ch_1 to ch_i.
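The sequential protocol just described can be sketched as a simple loop. The `classify` function below is a hypothetical placeholder for any model that either emits a final label or withholds its decision until more chunks arrive:

```python
# Sketch of the eRisk chunk-by-chunk evaluation protocol described above.
# `users` maps a user id to a list of 10 chunks (oldest first); `classify`
# is a hypothetical placeholder returning "positive", "negative", or None
# (no decision yet) from the text seen so far. Decisions are final.

def run_pilot_task(users, classify):
    decisions = {uid: None for uid in users}
    for i in range(10):                        # chunks ch_1 .. ch_10
        for uid, chunks in users.items():
            if decisions[uid] is not None:     # already decided: skip
                continue
            seen_so_far = " ".join(chunks[: i + 1])
            decisions[uid] = classify(seen_so_far, chunk_index=i + 1)
    # after the last chunk, every undecided user must receive a label
    for uid in users:
        if decisions[uid] is None:
            decisions[uid] = "negative"
    return decisions
```

A system is rewarded for deciding in an early iteration of the outer loop, which is exactly what the ERDE measure penalizes.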




2.2    Methods

In order to deal with the problem posed in the pilot task we proposed a new document representation named temporal variation of terms (TVT) [3]. TVT is an elaborate approach that requires enough space to be described adequately. For this reason, in the present work we only focus on the decision aspects that were considered in the pilot task to combine TVT with other well-known document representation approaches such as Bag of Words (BoW). Thus, we give below a separate description of each method with its main characteristics, but the interested reader can find a detailed description of TVT in [3].

We used different document representations and learning algorithms in our experiments. Therefore, we start by giving some general ideas about those document representations and a slightly more detailed explanation of TVT. After that, some general considerations on the learning algorithms used are briefly introduced.




Document representations

^4 http://early.irlab.org/task.html
- Bag of Words. The traditional Bag of Words (BoW) representation is one of the language models most used in text categorization tasks. In BoW representations, features are words and documents are simply treated as collections of unordered words. Formally, a document d is represented by the vector of weights d_BoW = (w_1, w_2, ..., w_n), where n is the size of the vocabulary of the data set. Each weight w_i is a value assigned to each feature (word) according to whether the word appears in a document or how frequently it appears. This popular representation is simple to implement, fast to obtain, and can be used with different weighting schemes (boolean, term frequency (tf), or term frequency - inverse document frequency (tf-idf)). In our study we used BoW with the boolean weighting scheme.

- Concise Semantic Analysis (or Second Order Attributes (SOA)). Concise Semantic Analysis (CSA) is a semantic analysis technique that interprets words and text fragments in a space of concepts that are close (or equal) to the category labels. For instance, if documents in the data set are labeled with q different category labels (usually no more than 100 elements), words and documents will be represented in a q-dimensional space. That space size is usually much smaller than standard BoW representations, which directly depend on the vocabulary size (more than 10000 or 20000 elements in general). CSA has been used in general text categorization tasks [6] and has been adapted to author profiling tasks under the name of Second Order Attributes (SOA) [7].

- Character 3-grams. An n-gram is a sequence of n characters obtained from each text in the data set. In the n-gram model, the dictionary contains all n-grams that occur in any term of the vocabulary. Representations using character n-grams have been shown to be effective in many applications [5], where the n-grams play the role of the terms used in BoW representations.

- LIWC-based features. Features derived from Linguistic Inquiry and Word Count (LIWC) [11, 10] have been used in several studies related to psychological aspects of individuals. LIWC has been successfully used to identify depressed and non-depressed people by analyzing linguistic markers of depression such as the use of personal pronouns and positive-negative emotions [12]; the presence of words related to death (e.g., "dead", "kill", "suicide"), sex (e.g., "arouse", "makeout", "orgasm") and ingestion (e.g., "chew", "drink", "hunger"), besides emotions, has also proved useful [1]. Studies on suicidal individuals have incorporated LIWC as a valuable tool to extract information related to suicide and suicidal ideation, analyzing the categories death, health, sad, they, I, sexual, filler, swear, anger, and negative emotions [2, 4]. Since we wanted to consider more meaningful features, in preliminary studies we also considered the most informative features belonging to linguistic dimensions (for example, personal pronouns and number of function words), summary language variables (for example, dictionary words and words with more than 6 letters), psychological processes (for example, negative emotions and affective processes), and grammar (verbs and numbers) and punctuation (number of apostrophes) aspects.
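As a minimal illustration of the first two representations above, a boolean BoW vector and character 3-gram extraction can be sketched as follows (whitespace tokenization and lowercasing are simplifying assumptions, not the exact preprocessing used in our experiments):

```python
# Illustrative sketches of two representations: boolean Bag of Words
# and character n-gram extraction (here n = 3).

def bow_boolean(doc, vocabulary):
    """Boolean BoW: w_i = 1 iff vocabulary word i occurs in the document."""
    words = set(doc.lower().split())
    return [1 if w in words else 0 for w in vocabulary]

def char_ngrams(doc, n=3):
    """The set of character n-grams occurring in the text."""
    return {doc[i:i + n] for i in range(len(doc) - n + 1)}
```

For example, `bow_boolean("I feel sad today", ["sad", "happy"])` yields `[1, 0]`, and `char_ngrams("abcd")` yields the two 3-grams "abc" and "bcd".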
Temporal Variation of Terms (TVT) The temporal variation of terms (TVT) method [3] is an approach for early risk detection based on using the variation of vocabulary along the different time steps as the concept space for document representation. This method is the key component of our ERD system for the eRisk pilot task.

TVT is based on the concise semantic analysis (CSA) technique proposed in [6] and later extended in [7] for author profiling tasks. In this context, the underlying idea is that variations of the terms used in different sequential stages of the documents may carry relevant information for the classification task. With this idea in mind, the method enriches the documents of the minority class with the partial documents read in the first chunks. These first chunks of the minority class, along with their complete documents, are considered as a new concept space for a CSA method.

TVT naturally copes with the sequential characteristics of ERD problems and also provides a tool for dealing with unbalanced data sets. Preliminary results of this method in comparison with CSA and BoW representations [3] showed its potential to deal with ERD problems.
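The full TVT construction is detailed in [3]; as a rough illustration only, the CSA-style projection it builds on can be sketched as follows, where the concepts are taken to be minority-class documents together with their early partial versions, and term weights are plain relative frequencies (a simplification of the log-scaled weighting used by CSA [6]):

```python
from collections import Counter

# Rough sketch of the CSA-style projection underlying TVT. Each "concept"
# is a text: a minority-class document or one of its partial versions
# built from the first chunks. A term's weight for a concept is its
# relative frequency in that concept; a document is represented in the
# concept space by accumulating the weights of its terms.

def term_concept_weights(concepts):
    """Relative term frequencies per concept (simplified CSA weighting)."""
    weights = []
    for text in concepts:
        counts = Counter(text.lower().split())
        total = sum(counts.values())
        weights.append({t: c / total for t, c in counts.items()})
    return weights

def project(doc, weights):
    """Represent a document in the concept space by accumulating weights."""
    terms = doc.lower().split()
    return [sum(w.get(t, 0.0) for t in terms) for w in weights]
```

A document sharing vocabulary with the enriched minority-class concepts gets a high coordinate on those concepts, which is what makes the representation sensitive to early traces of the positive class.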



Learning algorithms In preliminary studies we tested different learning algorithms and obtained the best results with Random Forest and Naïve Bayes in several comparisons with other popular methods such as LIBSVM.^5 We also used in our system a decision tree (Weka's J48) obtained by first selecting the 100 words with the highest information gain and then removing from that list those words considered dependent on specific domains (names of politicians like "Obama" and countries like "China"). The J48 algorithm trained on this subset of features produced a decision tree of only 39 nodes containing some interesting words like "meds", "depression", "therapist" and "cry", among others. As we will see later, this decision tree was used to assist the TVT method in the initial chunks.

Even though we carried out several comparative studies with LIWC features, character 3-grams, CSA representations and the LIBSVM algorithm, we decided to select only those approaches that seemed effective in assisting TVT in some specific situations. For this reason, in our ERD system we only used the Random Forest and Naïve Bayes algorithms with BoW and TVT representations, and the J48 decision tree explained above.
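The word-selection step above (ranking words by information gain over binary presence features) can be sketched as follows; the actual experiments used Weka's implementation, so this is an illustrative re-implementation under simple whitespace tokenization:

```python
from collections import Counter
from math import log2

# Illustrative ranking of words by information gain over binary
# word-presence features, as in the selection step described above.

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(docs, labels, word):
    present = [y for d, y in zip(docs, labels) if word in d.split()]
    absent = [y for d, y in zip(docs, labels) if word not in d.split()]
    n = len(labels)
    cond = sum(len(part) / n * entropy(part)
               for part in (present, absent) if part)
    return entropy(labels) - cond

def top_words(docs, labels, k):
    vocab = {w for d in docs for w in d.split()}
    return sorted(vocab, key=lambda w: -information_gain(docs, labels, w))[:k]
```

Words that perfectly separate the two classes (e.g. "depression" occurring only in positive documents of a toy corpus) reach the maximum information gain and are ranked first.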



3    Training Stage

The pilot task was divided into a training stage and a test stage. In the first one, the participating teams had access to the TRDS set with all chunks of all training users. Therefore, in this stage we could tune our system with the training data.

^5 For all the tested algorithms we used the implementations provided by the Weka data mining tool.
Even though the models obtained with the TVT representation gave better results than BoW and CSA in several preliminary studies (see [3] for a detailed study), we lacked a robust criterion to solve a critical aspect of ERD problems that we will refer to as the classification time decision (CTD) issue. That is, an ERD system needs to consider not only which class should be assigned to a document; it also needs to decide when to make that assignment. In other words, ERD methods must have a criterion to decide when (in what situations) the classification generated by the system is considered the final/definitive decision on the evaluated instances. Although this aspect has been addressed with very simple heuristic rules,^6 we wanted a rule for the CTD issue that attempted to exploit the strengths of the TVT method and simultaneously alleviate its weaknesses. With this goal in mind, we began a thorough study to determine in which situations the TVT method behaved satisfactorily and when it needed to be improved.
We started our study by dividing the training set TRDS into a new training set, which we will refer to as TRDS-train, and a test set named TRDS-test. Those sets maintained the same proportions of posts per user and words per user as described in [8]. TRDS-train and TRDS-test were generated by randomly selecting around 70% of the writings for the first one and the remaining 30% for the second one. Thus, TRDS-train resulted in 351 individuals (63 positive, 288 negative) while TRDS-test contains 135 individuals (20 positive, 115 negative). The division of the writings into 10 chunks provided by the organizers was kept for both collections.
Then, we analyzed the performance of the TVT method by using the TRDS-train set as training set and evaluating the results obtained on TRDS-test. We started by assuming that the classification is made on a static chunk-by-chunk basis. That is, for each chunk C_i provided to the ERD systems, we evaluated TVT's performance considering that the model is (simultaneously) applied to the writings received up to chunk C_i. With this type of study it was possible to observe to what extent the TVT method was robust to the partial information in the different stages, at which moment it started to obtain acceptable results, and other interesting statistics. In this context, we considered that the F1 measure, which combines precision and recall, would be a valuable measure to gain some insight into the performance of the TVT method along the different chunks.
Figure 1 shows this type of information, where we can see that TVT representations with two different learning algorithms (Naïve Bayes (NB) and Random Forest (RF)) reached the highest F1 values in chunks 3 and 4. In both cases, those values were obtained when an instance was classified as positive if the probability assigned by the classifier was greater than or equal to 0.6 (p >= 0.6). Besides, we could observe that in the first chunk, TVT representations produced a high number of false positives. That weakness of TVT methods is shown in Figure 2, where we can see the lowest precision values in chunk 1. That problem of TVT methods looks reasonable if we consider that TVT is based on the variations of terms over time. Thus, we can conclude that in the first two chunks, where little information is available, TVT will need to be assisted by other complementary methods.

^6 For instance, exceeding a specific confidence threshold in the prediction of the classifier [8].




Fig. 1. F1 values obtained with TVT models.




Fig. 2. Precision values obtained with TVT models.




Since we wanted to focus our predictions on those chunks where the best results were obtained (chunks 3 and 4), and assuming that after those chunks the penalization component in the ERDE computation would affect TVT's results, we decided to set some chunk-by-chunk rules that satisfy certain basic properties in the different chunks:

1. Chunk 1: Here the ERD system should be extremely conservative, classifying an instance as positive (depressed) only if there exists strong evidence for it. In this case, we used the criterion of only classifying an instance as positive if both models obtained with TVT-NB and TVT-RF classified the instance as positive with probability p = 1 and the text included all the words in a white list. That list was obtained from the words with the highest information gain in the documents of the first chunk. It included the words "depression" and "diagnosed".
2. Chunk 2: Here the restriction of the white list could be relaxed, complementing the TVT methods with a more general approach. To this end, we used the predictions of the J48 decision tree explained above and classified an instance as positive if all three classifiers classified it that way.
3. Chunk 3: In this chunk most of the classifiers obtained their best precision values. However, they obtained low recall values. In order to address this aspect, an instance was classified as positive if at least two classifiers classified it as positive with probability p >= 0.9.
4. Chunk 4: BoW obtained in this chunk the highest recall values but low precision. We could also observe that when it was combined with the TVT method, the latter played the role of a filter, obtaining good results when both methods classified an instance as positive. On the other hand, many instances not detected by this combination were classified as positive by the J48 method. For this reason, any instance classified as positive by both the BoW and TVT methods with probability p >= 0.7, or by the J48 method, was classified as positive.
5. Chunks 5 to 10: From chunk 5 onward, we assumed that most of the relevant classifications had already been made in the previous chunks. However, we kept a monitoring system to identify those cases that needed much more additional information to determine whether an individual was depressed or not. For this purpose, we used the same rule as in chunk 4 for chunks 5 and 6, but increasing the probability for BoW and TVT to p >= 0.8. From chunks 7 to 10, an instance was classified as positive if at least two classifiers classified it as positive with probability p >= 0.9.
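These rules can be sketched as a single decision function. All names are illustrative, and where the text above leaves a threshold implicit (the chunk-2 notion of "positive", and which probability stands for "TVT" in chunks 4 to 6) the choices in the comments are our assumptions, not the paper's exact settings:

```python
# Illustrative encoding of the chunk-by-chunk rules above. p_nb, p_rf and
# p_bow are the positive-class probabilities of the TVT-NB, TVT-RF and BoW
# models; j48_pos is the J48 prediction; white_list_ok says whether the
# text contains every white-list word (chunk 1 only).

def decide_positive(chunk, p_nb, p_rf, p_bow, j48_pos, white_list_ok):
    p_tvt = max(p_nb, p_rf)  # "TVT" probability: our assumption (max of both)
    if chunk == 1:
        return p_nb == 1.0 and p_rf == 1.0 and white_list_ok
    if chunk == 2:
        # "all three classifiers agree"; positive taken as p >= 0.5 here
        return p_nb >= 0.5 and p_rf >= 0.5 and j48_pos
    if chunk == 3 or chunk >= 7:
        # at least two classifiers positive with p >= 0.9
        return sum(p >= 0.9 for p in (p_nb, p_rf, p_bow)) >= 2
    # chunks 4-6: BoW and TVT agree above a threshold, or J48 fires
    threshold = 0.7 if chunk == 4 else 0.8
    return (p_bow >= threshold and p_tvt >= threshold) or j48_pos
```

Returning False at a given chunk simply postpones the decision to the next chunk, which is how the rules address the CTD issue.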


With these empirically derived rules as a general criterion for the CTD issue, we then analyzed whether the TVT method combined with the other approaches in specific situations obtained a better performance than the individual methods working separately. The results obtained from this analysis are presented below.



3.1    Analysis of results

In order to study the performance of our combined approach against each method tested separately, we used the TRDS-train and TRDS-test sets in the experiments. Individual methods also need a criterion to deal with the CTD issue. In this case, we can directly use the probability (or some measure of confidence) assigned by the classifier to decide when to stop reading a document and give its classification. That approach, which in [8] is referred to as dynamic, only requires that this probability exceed some particular threshold to classify the instance/individual as positive. In our studies, we also used this dynamic approach for the individual methods, considering different probability values: p = 1, p >= 0.9, p >= 0.8, p >= 0.7 and p >= 0.6.

Tables 1, 2 and 3 report the values of precision (π), recall (ρ) and F1-measure (F1) of the target ("depressed") class for each considered individual model and the different probabilities. Statistics also include the early risk detection error (ERDE) measure proposed in [8] with the two values of the parameter o used in the pilot task: o = 5 (ERDE5) and o = 50 (ERDE50).
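For reference, the ERDE measure can be sketched as follows. This reflects our reading of [8], assuming c_fn = c_tp = 1 and c_fp set to the positive-class prevalence; k is the number of writings read before the decision:

```python
from math import exp

# Sketch of the ERDE_o measure as we understand it from [8]: late true
# positives are discounted by a latency cost lc_o(k) = 1 - 1/(1 + e^(k-o))
# that grows with the number k of writings seen before deciding. We assume
# c_fn = c_tp = 1 and c_fp equal to the fraction of positive users.

def erde(decision, truth, k, o, c_fp):
    if decision == "pos" and truth == "neg":
        return c_fp                            # false positive
    if decision == "neg" and truth == "pos":
        return 1.0                             # false negative (c_fn)
    if decision == "pos" and truth == "pos":
        return 1.0 - 1.0 / (1.0 + exp(k - o))  # lc_o(k) * c_tp
    return 0.0                                 # true negative
```

Under this parametrisation, a true positive issued well before the o-th writing costs almost nothing, while one issued long after costs nearly as much as a miss, which is why o = 5 punishes late decisions far more severely than o = 50.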
As we can see from Table 1, the TVT-NB model obtained its best performance when the classifier assigned instances to the positive class ("depressed") with probability 1. For TVT-RF and BoW the statistics are different. In Table 2, the best ERDE50 (and F1 measure) was obtained when the Random Forest classifier used the lowest probability (p >= 0.6), but at the same time this probability produced the worst ERDE5 error. The BoW model (Table 3) obtained its best ERDE50 value with p >= 0.8, and with this configuration the best F1, π and ρ are also obtained. The J48 model assigns instances to the positive class using probability 1, so its performance is shown directly in the comparison in Table 4.



Table 1. Performance of the TVT-NB model (TRDS-test set).



                            p = 1 p ≥ 0.9 p ≥ 0.8 p ≥ 0.7 p ≥ 0.6
                  ERDE5 14.13         16.66   16.68   16.82   16.85
                  ERDE50 11.25        15.34   15.21   15.24   15.07
                  F1      0.40        0.27     0.28   0.27     0.28
                  π       0.47        0.50     0.48   0.43     0.44
                  ρ       0.35        0.18     0.19   0.19     0.20




Table 2. Performance of the TVT-RF model (TRDS-test set).



                            p = 1 p ≥ 0.9 p ≥ 0.8 p ≥ 0.7 p ≥ 0.6
                   ERDE5 16.92 16.85          16.92   16.96   17.14
                   ERDE50 16.43 16.05         16.12   16.02   15.79
                   F1      0.11  0.15         0.16    0.21    0.22
                   π       0.50 0.54          0.50    0.50    0.43
                   ρ       0.06  0.08         0.10    0.13    0.14




Table 3. Performance of the BoW model (TRDS-test set).



                            p = 1 p ≥ 0.9 p ≥ 0.8 p ≥ 0.7 p ≥ 0.6
                  ERDE5 20.79         20.83   21.05   21.16   21.27
                  ERDE50 18.82        18.66   18.13   18.24   18.35
                  F1      0.19        0.21    0.24    0.24     0.23
                  π       0.11        0.13    0.14    0.14    0.14
                  ρ        0.5        0.65    0.75    0.75    0.75




To choose a particular probability for comparing the performance of the models, we selected the one for which each model obtained the best ERDE considering o = 50 (ERDE50). We decided to use this metric because we knew in advance that the results in the pilot task would be evaluated with ERDE, but we did not know the specific o value. In this context, we considered that ERDE50 would be more important/informative than the ERDE5 error.

Table 4 shows the performance comparison of all models. As we can observe, the performance of the combined methods was the best on all metrics except for the ERDE5 error, for which the TVT-NB model obtained the best value. It is interesting to note that, beyond its good ERDE5 value, TVT-NB was slightly worse than the combined methods on the other metrics. These results confirm the adequateness of the combined approach for the ERD task over the other individual methods.




Table 4. Performance comparison of all models (TRDS-test set).


                                    ERDE5 ERDE50 F1               π     ρ
                Combined methods      15.36      9.16     0.59 0.48 0.75
                J48                   17.01      12.94     0.4   0.33   0.5
                BoW (p ≥ 0.8)         21.05      18.13     0.24 0.14 0.75
                TVT-NB (p = 1)       14.13       11.25     0.4   0.47 0.35
                TVT-RF (p ≥ 0.6)      17.14      15.79     0.22 0.43 0.14




4    Test Stage

The previous results were obtained by training the classifiers with the TRDS-train data set and testing them with the TRDS-test data set. In this section, we show the results obtained by the combined methods trained with the full training set of the pilot task (TRDS) and tested with TEDS, which was incrementally released during the testing phase of the pilot task.

Table 5 shows the three submissions that obtained the best ERDE5, ERDE50 and F1 values in the eRisk pilot task, as reported in [9]. There, we can observe that our combined methods (UNSLA) obtained the best ERDE50 value. On the other hand, FHDO-BCSGA obtained the best F-measure with an ERDE50 value slightly worse than our approach. FHDO-BCSGB obtained the best ERDE5 and the worst F1 of the three approaches. With these results, we can conclude that our proposal is a competitive approach for ERD tasks.




Table 5. Best in the ranking of the pilot task (TEDS set).



                                          ERDE5 ERDE50 F1                π    ρ
            Combined methods (UNSLA)          13.66      9.68    0.59 0.48 0.79
            FHDO-BCSGA                        12.82      9.69    0.64 0.61 0.67
            FHDO-BCSGB                        12.70      10.39   0.55 0.69 0.46
5    Complementary studies on the test set

Our ERD system tested in the pilot task was derived from our analysis of the weaknesses and strengths of the TVT method on the training data. However, once the golden-truth information of the TEDS set was made available by the organizers, we could analyze what the performance of the models would have been, in particular of the TVT method working alone, if different probabilities had been selected.

Table 6 shows this type of information by reporting the results obtained with the TVT representation using Naïve Bayes and Random Forest as learning algorithms. Besides, different probability values were tested for the dynamic approaches to the CTD aspect. The results obtained are conclusive in this case. TVT shows a high robustness in the ERDE measures independently of the algorithm used to learn the model and of the probability threshold. Most of TVT's ERDE5 values were low and, in 7 out of 10 settings, the ERDE50 values were lower than the best one reported in the pilot task (the combined methods (UNSLA) with 9.68). In this context, TVT achieves the best ERDE5 value reported up to now (12.30) with the setting TVT-RF (p >= 0.8) and the lowest ERDE50 value (8.17) with the model TVT-NB (p >= 0.8). The performance of the TVT methods on the test set was surprising for us because in the training stage we could not obtain such good results. This leads us to conclude that, had the TVT method participated alone in the pilot task, it would have obtained similar or better results than those obtained with the combined methods.



Table 6. TVT's performance on the TEDS set.


                                  ERDE5 ERDE50 F1          π      ρ
               TVT-NB (p ≥ 0.6)    13.59      8.40   0.50 0.37 0.75
               TVT-NB (p ≥ 0.7)    13.43      8.24   0.51 0.39 0.75
               TVT-NB (p ≥ 0.8)    13.13     8.17    0.54 0.42 0.73
               TVT-NB (p ≥ 0.9)    13.07      8.35   0.52 0.42 0.69
               TVT-NB(p = 1)       12.38      9.84   0.42 0.50 0.37
               TVT-RF (p ≥ 0.6)    12.46      8.37   0.55 0.49 0.63
               TVT-RF (p ≥ 0.7)    12.49      8.52   0.55 0.50 0.62
               TVT-RF (p ≥ 0.8)    12.30      8.95   0.56 0.54 0.58
               TVT-RF (p ≥ 0.9)    12.34     10.28   0.47 0.55 0.40
               TVT-RF(p = 1)       12.82     11.82   0.20 0.67 0.12




6    Conclusions and future work

This article presents the participation of LIDIC - UNSL in the eRisk 2017 pilot task on Early Detection of Depression. We proposed an interesting representation named temporal variation of terms (TVT), which considers the variation of vocabulary along the different time steps as the concept space for the representation of documents. Because the experiments in the training stage showed some weaknesses of TVT, we proposed a combined approach to assist TVT in some specific situations. The results obtained in the pilot task with our combined methods were good, obtaining the best ERDE50 value over all participants. An interesting aspect was that the TVT representation alone obtained better results on the test set than those achieved by the combined approach.

As future work we plan to extend the analysis of the TVT representation to other ERD problems such as the identification of sexual predators and people with suicidal tendencies.



References

1. M. R. Capecelatro, M. D. Sacchet, P. F. Hitchcock, S. M. Miller, and W. B. Britton. Major depression duration reduces appetitive word use: An elaborated verbal recall of emotional photographs. Journal of Psychiatric Research, 47(6):809-815, 2013.
2. M. De Choudhury, E. Kiciman, M. Dredze, G. Coppersmith, and M. Kumar. Discovering shifts to suicidal ideation from mental health content in social media. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, CHI '16, pages 2098-2110, New York, NY, USA, 2016. ACM.
3. M. L. Errecalde, M. P. Villegas, D. G. Funez, M. J. Garciarena Ucelay, and L. C. Cagnina. Temporal variation of terms as concept space for early risk prediction. In G. J. F. Jones, S. Lawless, J. Gonzalo, L. Kelly, L. Goeuriot, T. Mandl, L. Cappellato, and N. Ferro, editors, Experimental IR Meets Multilinguality, Multimodality, and Interaction, volume 10456. Springer, 2017.
4. A. Garcia-Caballero, J. Jiménez, M. Fernandez-Cabana, and I. García-Lado. Last words: an LIWC analysis of suicide notes from Spain. European Psychiatry, 27:1, 2012.
5. M. Koppel, J. Schler, and S. Argamon. Computational methods in authorship attribution. Journal of the American Society for Information Science and Technology, 60(1):9-26, 2009.
6. Z. Li, Z. Xiong, Y. Zhang, C. Liu, and K. Li. Fast text categorization using concise semantic analysis. Pattern Recognition Letters, 32(3):441-448, February 2011.
7. A. P. López-Monroy, M. Montes y Gómez, H. J. Escalante, L. Villaseñor-Pineda, and E. Stamatatos. Discriminative subprofile-specific representations for author profiling in social media. Knowledge-Based Systems, 89:134-147, 2015.
8. D. E. Losada and F. Crestani. A test collection for research on depression and language use. Pages 28-39. Springer International Publishing, Cham, 2016.
9. D. E. Losada, F. Crestani, and J. Parapar. eRisk 2017: CLEF lab on early risk prediction on the internet: Experimental foundations. In Proceedings Conference and Labs of the Evaluation Forum CLEF 2017, Dublin, Ireland, 2017.
10. J. W. Pennebaker, R. L. Boyd, K. Jordan, and K. Blackburn. The development and psychometric properties of LIWC2015. 2015.
11. J. W. Pennebaker, M. R. Mehl, and K. G. Niederhoffer. Psychological aspects of natural language use: Our words, our selves. Annual Review of Psychology, 54(1):547-577, 2003.
12. N. Ramirez-Esparza, C. K. Chung, E. Kacewicz, and J. W. Pennebaker. The psychology of word use in depression forums in English and in Spanish: Testing two text analytic approaches. In Proc. ICWSM 2008, 2008.