=Paper=
{{Paper
|id=Vol-1866/paper_107
|storemode=property
|title=LIDIC - UNSL's Participation at eRisk 2017: Pilot Task on Early Detection of Depression
|pdfUrl=https://ceur-ws.org/Vol-1866/paper_107.pdf
|volume=Vol-1866
|authors=María Paula Villegas,Dario Gustavo Funez,María José Garciarena Ucelay,Leticia Cecilia Cagnina,Marcelo Luis Errecalde
|dblpUrl=https://dblp.org/rec/conf/clef/VillegasFUCE17
}}
==LIDIC - UNSL's Participation at eRisk 2017: Pilot Task on Early Detection of Depression==
Notebook for eRisk at CLEF 2017
Ma. Paula Villegas¹, Dario G. Funez¹, Ma. José Garciarena Ucelay¹, Leticia C. Cagnina¹,², and Marcelo L. Errecalde¹
¹ LIDIC Research Group, Universidad Nacional de San Luis, Argentina
² Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
{villegasmariapaula74,funezdario,mjgarciarenaucelay}@gmail.com
{lcagnina,merrecalde}@gmail.com
Abstract. In this paper we describe the participation of the LIDIC Research Group of Universidad Nacional de San Luis (UNSL), Argentina, in the eRisk 2017 pilot task. The main goal of this task is to address early risk detection scenarios (depression, in this case), where obtaining timely predictions with a reasonable confidence level becomes critical. In the pilot task, systems must be able to sequentially process pieces of evidence and detect early traces of depression as soon as possible. The data set is a collection of writings labeled as depressed or non-depressed that was released to the pilot task participants in two different stages: training and test. Our proposal for this task was based on a semantic representation of documents that explicitly considers the partial information made available to the early risk detection systems over time in the different chunks. That temporal approach was complemented with other standard text categorization models in some specific situations that our approach alone did not seem to address correctly. In the test stage, the resulting system obtained the lowest ERDE50 error among a total of 30 submissions from 8 different institutions. However, once the golden truth of the test data set was released, we could verify that our temporal approach alone would have obtained very robust results and the lowest reported ERDE error for both thresholds used in the pilot task.
Keywords: Early Risk Detection, Unbalanced Data Sets, Text Representations, Semantic Analysis Techniques.
1 Introduction
The widespread use of the Internet, social networks and other computer technologies has marked the beginning of a new era in communication among people. In this context, a large amount of information is now available that might be useful to detect, in a timely way, situations that are potentially dangerous or risky for the physical well-being and health of individuals, communities and social organizations. Examples of such situations include the detection of potential paedophiles, people with suicidal inclinations, or people susceptible to depression, among others.
These types of situations have started to be studied in a new research field known as early risk detection (ERD), which has received increasing interest from researchers worldwide due to the important impact it could have on many relevant, current real-world problems. In this context, the first early risk prediction conference, eRisk 2017³, was organized this year as part of the CLEF 2017 Workshop. Within this event, a pilot task was organized, and the present article describes our participation in this task.
The eRisk 2017 pilot task was organized into two stages: training and test. It also assumed an ERD scenario, that is, data are read sequentially as a stream and the challenge consists in detecting risk cases as soon as possible. In order to reproduce this scenario, the eRisk 2017 organizers released the test data set sequentially, chunk by chunk: the first chunk contained the oldest 10% of the messages, the second chunk contained the second oldest 10%, and so forth, up to 10 chunks that represent the full writings of the analysed individuals.
To deal with this problem we implemented an original proposal that we named temporal variation of terms (TVT) [3]. However, at the training stage, TVT seemed to show some weaknesses at specific chunks, so we decided to complement it with other standard methods to help our approach in these specific situations. Thus, our ERD system is in fact a combined system that strongly depends on TVT but also uses other methods as additional sources of opinion. The resulting system performed well on the data set released in the test stage, obtaining the lowest ERDE50 error among a total of 30 submissions from 8 different institutions. However, once the golden truth of the test data set was released, we could verify that TVT alone would have obtained very robust results and the lowest reported ERDE error for both thresholds used in the pilot task. For this reason, we also include an additional section in this work with preliminary results obtained with TVT alone on the test set, in order to show the potential of this method for general ERD problems.
The rest of the article is organized as follows: Section 2 describes some general aspects of the data set used in the pilot task and the methods used in our ERD system. Next, Section 3 describes the activities carried out in the training stage and presents the justifications for the main design decisions of our ERD system. Section 4 shows the results obtained with our proposal on the eRisk 2017 data set released in the test stage. Then, some complementary results are presented in Section 5, where interesting aspects of the performance of TVT working alone on the test set are shown. Finally, Section 6 presents the conclusions and potential future work.
³ http://early.irlab.org/
2 Data set and methods
2.1 Data Set
The data set used in the pilot task of the eRisk competition⁴ was initially described in [8]. It is a collection of writings (posts or comments) from a set of social media users. There are two categories of users, depressed and non-depressed, and, for each user, the collection contains a sequence of writings in chronological order. For each user, the collection of writings has been divided into 10 chunks. The first chunk contains the oldest 10% of the messages, the second chunk contains the second oldest 10%, and so forth. This collection was split into a training set and a test set that we will refer to as TR_DS and TE_DS respectively. The TR_DS set contained 486 users (83 positive, 403 negative) and the TE_DS (test) set contained 401 users (52 positive, 349 negative). The users labeled as positive are those who explicitly mentioned that they had been diagnosed with depression.
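The chunking scheme described above can be sketched in Python as follows. This is only an illustration under our own assumptions: each user's writings are given as a chronologically sorted list, and the function name `split_into_chunks` and the way boundaries are rounded when the number of writings is not a multiple of 10 are hypothetical, not the organizers' documented procedure.

```python
def split_into_chunks(writings, n_chunks=10):
    """Split a chronologically ordered list of writings into n_chunks
    consecutive chunks, so that the first chunk holds the oldest ~10%.

    Assumption: when len(writings) is not a multiple of n_chunks, the
    boundaries are rounded down, so chunk sizes differ by at most one.
    """
    n = len(writings)
    # bounds[k] is the index where chunk k starts (k = 0..n_chunks).
    bounds = [(k * n) // n_chunks for k in range(n_chunks + 1)]
    return [writings[bounds[k]:bounds[k + 1]] for k in range(n_chunks)]
```

For example, a user with 25 writings would get chunks of sizes 2, 3, 2, 3, and so on, with every writing assigned to exactly one chunk and the chronological order preserved.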
The pilot task was divided by its organizers into a training stage and a test stage. In the first one, the participating teams had access to the TR_DS data set with all chunks of all training users. They could therefore tune their systems with the training data. Then, in the test stage, the 10 chunks of the TE_DS data set were gradually released by the organizers, one by one, until all the chunks corresponding to the complete writings of the considered individuals had been released. Each time a chunk ch_i was released, participants were asked to give their predictions on the users contained in TE_DS, based on the partial information read from chunks ch_1 to ch_i.
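A minimal sketch of the incremental view that this protocol gives a system: after chunk ch_i is released, only chunks ch_1 to ch_i are available. The helper name `cumulative_views` and the list-of-lists input format are illustrative assumptions.

```python
from typing import Iterator, List


def cumulative_views(chunks: List[List[str]]) -> Iterator[List[str]]:
    """Yield, for i = 1..len(chunks), all writings from ch_1 to ch_i.

    Mimics the test-stage protocol: after chunk i is released, a system
    may only base its prediction on chunks 1..i.
    """
    seen: List[str] = []
    for chunk in chunks:
        # Build a new list so that previously yielded views are not mutated.
        seen = seen + chunk
        yield seen
```

A system would call its classifier once per yielded view, so the prediction at chunk i never depends on writings that have not yet been released.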
2.2 Methods
In order to deal with the problem posed in the pilot task, we proposed a new document representation named temporal variation of terms (TVT) [3]. TVT is an elaborate approach that requires considerable space to be described adequately. For this reason, in the present work we only focus on the decisions taken in the pilot task to combine TVT with other well-known document representation approaches such as Bag of Words (BoW). Thus, below we give a separate description of the method with its main characteristics, but the interested reader can find a detailed description of TVT in [3].
We used different document representations and learning algorithms in our experiments. Therefore, we start by giving some general ideas of those document representations and a somewhat more detailed explanation of TVT. After that, some general considerations on the learning algorithms used are briefly introduced.
⁴ http://early.irlab.org/task.html
Document representations
Bag of Words. The traditional Bag of Words (BoW) representation is one of the language models most used in text categorization tasks. In BoW representations, features are words and documents are simply treated as collections of unordered words. Formally, a document d is represented by the vector of weights d_BoW = (w_1, w_2, ..., w_n), where n is the size of the vocabulary of the data set. Each weight w_i is a value assigned to each feature (word) according to whether the word appears in the document or how frequently it appears. This popular representation is simple to implement, fast to obtain and can be used under different weighting schemes (boolean, term frequency (tf) or term frequency - inverse document frequency (tf-idf)). In our study we used BoW with the boolean weighting scheme.
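A small sketch of the boolean weighting scheme, shown only to make the definition concrete. The naive lowercase whitespace tokenizer and the function name `boolean_bow` are assumptions for illustration; this is not the code used in the experiments.

```python
def boolean_bow(documents):
    """Boolean Bag of Words: w_i = 1 if vocabulary word i occurs in the
    document, else 0. Tokenization is a naive lowercase whitespace split
    (an illustrative assumption)."""
    tokenized = [set(doc.lower().split()) for doc in documents]
    vocabulary = sorted(set().union(*tokenized))
    vectors = [[1 if word in tokens else 0 for word in vocabulary]
               for tokens in tokenized]
    return vocabulary, vectors
```

Note that a word repeated in a document still gets weight 1, which is what distinguishes the boolean scheme from tf weighting.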
Concise Semantic Analysis (or Second Order Attributes (SOA)). Concise Semantic Analysis (CSA) is a semantic analysis technique that interprets words and text fragments in a space of concepts that are close (or equal) to the category labels. For instance, if documents in the data set are labeled with q different category labels (usually no more than 100), words and documents are represented in a q-dimensional space. That space is usually much smaller than that of standard BoW representations, whose size directly depends on the vocabulary size (in general more than 10000 or 20000 elements). CSA has been used in general text categorization tasks [6] and has been adapted to author profiling tasks under the name of Second Order Attributes (SOA) [7].
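The following simplified sketch illustrates the CSA idea of mapping terms and documents into a q-dimensional category space. It assumes naive whitespace tokenization and a plain length-normalized frequency weighting; the weighting functions in [6] and [7] differ in their details, so this should be read as an approximation of the technique rather than a faithful reimplementation. The function names are hypothetical.

```python
from collections import Counter, defaultdict


def csa_term_vectors(documents, labels, categories):
    """Simplified CSA: each term gets a q-dimensional vector (q = number
    of categories). The weight of term t for category j is the sum over
    documents d of category j of tf(t, d) / len(d); each term vector is
    then normalized to sum to 1 (an illustrative choice)."""
    index = {c: j for j, c in enumerate(categories)}
    term_vecs = defaultdict(lambda: [0.0] * len(categories))
    for doc, label in zip(documents, labels):
        tokens = doc.lower().split()
        for term, tf in Counter(tokens).items():
            term_vecs[term][index[label]] += tf / len(tokens)
    normalized = {}
    for term, vec in term_vecs.items():
        total = sum(vec)
        normalized[term] = [v / total for v in vec]
    return normalized


def csa_document_vector(doc, term_vecs, q):
    """A document is the tf-weighted sum of the vectors of its known terms."""
    vec = [0.0] * q
    for term, tf in Counter(doc.lower().split()).items():
        for j, w in enumerate(term_vecs.get(term, [])):
            vec[j] += tf * w
    return vec
```

Whatever the vocabulary size, every document ends up as a vector of length q, which is the dimensionality reduction the paragraph above describes.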
Character 3-grams. A character n-gram is a sequence of n characters obtained from each text in the data set. In the n-gram model, the dictionary contains all n-grams that occur in any term in the vocabulary. Representations using character n-grams, where the n-grams are used as the terms of a BoW representation, have been shown to be effective in many applications [5].
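A minimal sketch of character 3-gram extraction, assuming whitespace tokenization and that n-grams are taken inside each term, as in the dictionary definition above. The resulting counts could then be used as BoW features; the function name is illustrative.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Count character n-grams occurring inside each whitespace-separated
    term; in this sketch n-grams never cross word boundaries."""
    counts = Counter()
    for term in text.lower().split():
        for i in range(len(term) - n + 1):
            counts[term[i:i + n]] += 1
    return counts
```

With this choice, terms shorter than n characters contribute no n-grams at all.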
LIWC-based features. Features derived from Linguistic Inquiry and Word Count (LIWC) [11, 10] have been used in several studies related to psychological aspects of individuals. LIWC has been successfully used to distinguish depressed from non-depressed people by analyzing linguistic markers of depression such as the use of personal pronouns and positive and negative emotions [12]; besides emotions, the presence of words related to death (e.g., dead, kill, suicide), sex (e.g., arouse, makeout, orgasm) and ingestion (e.g., chew, drink, hunger) has also proved useful [1]. Studies on suicidal individuals have incorporated LIWC as a valuable tool to extract information related to suicide and suicidal ideation by analyzing the categories death, health, sad, they, I, sexual, filler, swear, anger, and negative emotions [2, 4]. Since we wanted to consider more meaningful features, in preliminary studies we also considered the most informative features belonging to linguistic dimensions (for example, personal pronouns and number of function words), summary language variables (for example, dictionary words and words with more than 6 letters), psychological processes (for example, negative emotions and affective processes), grammar (verbs and numbers) and punctuation (number of apostrophes).
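LIWC relies on proprietary dictionaries, so the following is only a toy illustration of how LIWC-style category features are computed: the percentage of a text's words that fall into each category. The mini-lexicon `TOY_LEXICON` is hypothetical; it contains a few example words mentioned above plus a minimal first-person singular category, and it is not the LIWC dictionary.

```python
# Hypothetical mini-lexicon for illustration only; the real LIWC
# dictionaries [10, 11] are much larger and also use word stems.
TOY_LEXICON = {
    "death": {"dead", "kill", "suicide"},
    "ingestion": {"chew", "drink", "hunger"},
    "i": {"i", "me", "my"},
}


def category_percentages(text, lexicon=TOY_LEXICON):
    """LIWC-style features: percentage of the text's words that belong to
    each category (a word may count toward several categories)."""
    words = text.lower().split()
    if not words:
        return {cat: 0.0 for cat in lexicon}
    return {cat: 100.0 * sum(w in vocab for w in words) / len(words)
            for cat, vocab in lexicon.items()}
```

Because the features are proportions, texts of very different lengths become comparable.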
Temporal Variation of Terms (TVT). The temporal variation of terms (TVT) method [3] is an approach for early risk detection that uses the variation of vocabulary along the different time steps as the concept space for document representation. This method is the key component of our ERD system for the eRisk pilot task.
TVT is based on the concise semantic analysis (CSA) technique proposed in [6] and later extended in [7] for author profiling tasks. In this context, the underlying idea is that variations of the terms used in different sequential stages of the documents may carry relevant information for the classification task. With this idea in mind, the method enriches the documents of the minority class with the partial documents read in the first chunks. These first chunks of the minority class, along with their complete documents, are considered as a new concept space for a CSA method.
TVT naturally copes with the sequential characteristics of ERD problems and also provides a tool for dealing with unbalanced data sets. Preliminary results of this method in comparison with CSA and BoW representations [3] showed its potential to deal with ERD problems.
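The following sketch shows one possible reading of how the TVT concept space could be assembled before applying CSA. The `(chunks, label)` input format, the label strings, the use of cumulative partial documents and the parameter `n_first` are all assumptions made for illustration; the precise definition of TVT is given in [3].

```python
def tvt_concept_corpus(users, n_first=4):
    """Build a labelled corpus whose labels act as CSA concepts.

    users: list of (chunks, label) pairs, where chunks is a list of lists
    of writings and label is "positive" or "negative" (assumed format).
    Every user contributes its complete document with its class label;
    each minority-class (positive) user additionally contributes the
    partial documents read up to chunks 1..n_first, each tagged with its
    own concept, e.g. "positive_ch1". n_first is a hypothetical parameter.
    """
    docs, concepts = [], []
    for chunks, label in users:
        docs.append(" ".join(w for chunk in chunks for w in chunk))
        concepts.append(label)
        if label == "positive":
            for i in range(1, n_first + 1):
                partial = " ".join(w for chunk in chunks[:i] for w in chunk)
                docs.append(partial)
                concepts.append(f"positive_ch{i}")
    return docs, concepts
```

Each distinct concept string then becomes one dimension of the CSA space, which is how the partial positive documents both enlarge the minority class and encode the temporal variation of its vocabulary.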
Learning algorithms. In preliminary studies we tested different learning algorithms, and in several comparisons with other popular methods such as LIBSVM⁵ we obtained the best results with Random Forest and Naïve Bayes. We also used in our system a decision tree (Weka's J48) obtained by first selecting the 100 words with the highest information gain and then removing from that list the words considered dependent on specific domains (names of politicians like Obama and countries like China). The J48 algorithm trained on this subset of features produced a decision tree of only 39 nodes containing some interesting words such as meds, depression, therapist and cry, among others. As we will see later, this decision tree was used to assist the TVT method in the initial chunks.
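As an illustration of the word-selection step, the following sketch computes the information gain of a boolean word-presence feature with respect to the binary class and keeps the top-k words. The manual removal of domain-dependent words (such as Obama or China) is not modelled, and the whitespace tokenization and function names are assumptions.

```python
import math


def entropy(pos, neg):
    """Binary entropy (in bits) of a class distribution given two counts."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def information_gain(docs, labels, word):
    """IG of the boolean feature 'word occurs in the document' with respect
    to binary labels (1 = depressed, 0 = non-depressed)."""
    n = len(labels)
    present = [y for d, y in zip(docs, labels) if word in d.lower().split()]
    absent = [y for d, y in zip(docs, labels) if word not in d.lower().split()]
    # Conditional entropy H(C | feature), weighted by each branch's size.
    h_cond = sum(len(part) / n * entropy(sum(part), len(part) - sum(part))
                 for part in (present, absent) if part)
    return entropy(sum(labels), n - sum(labels)) - h_cond


def top_words_by_ig(docs, labels, k=100):
    """Vocabulary words sorted by decreasing IG (ties stay alphabetical)."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    return sorted(vocab, key=lambda w: information_gain(docs, labels, w),
                  reverse=True)[:k]
```

A word that perfectly separates the two classes reaches the maximum IG of 1 bit on a balanced sample.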
Even though we carried out several comparative studies with LIWC features, character 3-grams, CSA representations and the LIBSVM algorithm, we decided to select only those approaches that seemed effective in assisting TVT in some specific situations. For this reason, our ERD system only used the Random Forest and Naïve Bayes algorithms with BoW and TVT representations, and the J48 decision tree previously explained.
3 Training Stage
The pilot task was divided into a training stage and a test stage. In the first one, the participating teams had access to the TR_DS set with all chunks of all training users. Therefore, in this stage we could tune our system with the training data.
⁵ For all the tested algorithms we used the implementations provided by the Weka data mining tool.
Even though the models obtained with the TVT representation gave better results than BoW and CSA in several preliminary studies (see [3] for a detailed study), we lacked a robust criterion to solve a critical aspect of ERD problems that we will refer to as the classification time decision (CTD) issue. That is, an ERD system needs to consider not only which class should be assigned to a document; it also needs to decide when to make that assignment. In other words, ERD methods must have a criterion to decide when (in what situations) the classification generated by the system is considered the final, definitive decision on the evaluated instances. Although this aspect has been addressed with very simple heuristic rules⁶, we wanted a rule for the CTD issue that attempted to exploit the strengths of the TVT method and simultaneously alleviate its weaknesses. With this goal in mind, we began a thorough study to determine in which situations the TVT method behaved satisfactorily and when it needed to be improved.
We started our study by dividing the training set TR_DS into a new training set that we will refer to as TR_DS-train and a test set named TR_DS-test. These sets maintained the same proportions of posts per user and words per user as described in [8]. TR_DS-train and TR_DS-test were generated by randomly selecting around 70% of the writings for the first one and the remaining 30% for the second one. Thus, TR_DS-train resulted in 351 individuals (63 positive, 288 negative), while TR_DS-test contained 135 individuals (20 positive, 115 negative). The division of the writings into 10 chunks provided by the organizers was kept for both collections.
Then, we analyzed the performance of the TVT method using the TR_DS-train set for training and evaluating the obtained results on TR_DS-test. We started by assuming that the classification is made on a static, chunk-by-chunk basis. That is, for each chunk Ĉ_i provided to the ERD systems, we evaluated TVT's performance considering that the model is (simultaneously) applied to the writings received up to chunk Ĉ_i. With this type of study it was possible to observe to what extent the TVT method was robust to the partial information in the different stages, at which moment it started to obtain acceptable results, and other interesting statistics. In this context, we considered that the F1 measure, which combines precision and recall, would be a valuable measure to gain some insight into the performance of the TVT method along the different chunks.
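The static, chunk-by-chunk evaluation can be sketched as follows: for each chunk, predictions are obtained by thresholding the positive-class probability (p ≥ 0.6 is the setting discussed for Figure 1) and precision, recall and F1 = 2πρ/(π + ρ) are computed for the depressed class. The input layout `probs_by_chunk` and the function names are hypothetical.

```python
def precision_recall_f1(probs, labels, threshold=0.6):
    """Precision, recall and F1 of the positive (depressed) class when an
    instance is predicted positive iff its probability is >= threshold."""
    preds = [p >= threshold for p in probs]
    tp = sum(1 for pr, y in zip(preds, labels) if pr and y == 1)
    fp = sum(1 for pr, y in zip(preds, labels) if pr and y == 0)
    fn = sum(1 for pr, y in zip(preds, labels) if not pr and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


def f1_by_chunk(probs_by_chunk, labels, threshold=0.6):
    """probs_by_chunk[i][u]: probability for user u after reading chunks
    1..i+1; returns one F1 value per chunk (static evaluation)."""
    return [precision_recall_f1(probs, labels, threshold)[2]
            for probs in probs_by_chunk]
```

Plotting the returned list against the chunk index gives curves of the kind shown in Figure 1.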
Figure 1 shows this type of information: TVT representations with two different learning algorithms (Naïve Bayes (NB) and Random Forest (RF)) reached the highest F1 values in chunks 3 and 4. In both cases, those values were obtained when an instance was classified as positive if the probability assigned by the classifier was greater than or equal to 0.6 (p ≥ 0.6). Besides, we could observe that in the first chunk TVT representations produced a high number of false positives. That weakness of TVT methods is shown in Figure 2, where we can see the lowest precision values in chunk 1. This problem looks reasonable if we consider that TVT is based on the variations of terms over time. Thus, we can conclude that in the first two chunks, where little information is available, TVT needs to be assisted by other complementary methods.
⁶ For instance, exceeding a specific confidence threshold in the prediction of the classifier [8].
Fig. 1. F1 values obtained with TVT models.
Fig. 2. Precision values obtained with TVT models.
Since we wanted to focus our predictions on the chunks where the best results were obtained (chunks 3 and 4), and assuming that after those chunks the penalization component in the ERDE computation would affect TVT's results, we decided to set some chunk-by-chunk rules that satisfy certain basic properties in the different chunks:
1. Chunk 1: Here the ERD system should be extremely conservative and only classify an instance as positive (depressed) if there is strong evidence of that. For this case, we used the criterion of only classifying an instance as positive if both the TVT-NB and TVT-RF models classified the instance as positive with probability p = 1 and the text included all the words in a white list. That list was obtained from the words with the highest information gain in the documents of the first chunk. It included the words depression and diagnosed.
2. Chunk 2: Here the white-list restriction could be relaxed and the TVT methods complemented with a more general approach. To this end, we used the predictions of the J48 decision tree explained above and classified an instance as positive if all three classifiers classified it that way.
3. Chunk 3: In this chunk most of the classifiers obtained their best precision values. However, they obtained low recall values. In order to address this aspect, an instance was classified as positive if at least two classifiers classified it as positive with probability p ≥ 0.9.
4. Chunk 4: BoW obtained the highest recall values in this chunk but low precision. We could also observe that, when BoW was combined with the TVT method, the latter played the role of a filter, obtaining good results when both methods classified an instance as positive. On the other hand, many instances not detected by this combination were classified as positive by the J48 method. For this reason, any instance classified as positive either by both the BoW and TVT methods with probability p ≥ 0.7, or by the J48 method, was classified as positive.
5. Chunks 5 to 10: From chunk 5 onward, we assumed that most of the relevant classifications had already been made in the previous chunks. However, we kept a monitoring system to identify the cases that needed much more information to determine whether an individual was depressed or not. For this purpose, we used the chunk 4 rule for chunks 5 and 6, but increased the probability threshold of BoW and TVT to p ≥ 0.8. From chunks 7 to 10, an instance was classified as positive if at least two classifiers classified it as positive with probability p ≥ 0.9.
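The chunk-by-chunk rules above can be summarized in a single decision function. This is a sketch under several assumptions that the text does not settle: a classifier "classifies as positive" without an explicit threshold when p ≥ 0.5; "TVT" in chunks 4 to 6 means either TVT model; the classifiers counted in chunks 3 and 7 to 10 are TVT-NB, TVT-RF, BoW and J48 (a positive J48 prediction counted as p = 1); and the white list contains only the two words named above, although the real list may have been longer. The function name `combined_rule` is hypothetical.

```python
# Only the two white-list words named in the text; the real list may be longer.
WHITE_LIST = {"depression", "diagnosed"}


def combined_rule(chunk, p_nb, p_rf, p_bow, j48_positive, text):
    """Return True if the user is flagged as depressed at this chunk (1-10).

    p_nb, p_rf: positive-class probabilities of TVT-NB and TVT-RF;
    p_bow: positive-class probability of the BoW model;
    j48_positive: J48 prediction (a positive prediction counts as p = 1);
    text: the writings read so far (used for the chunk-1 white list).
    """
    p_j48 = 1.0 if j48_positive else 0.0
    if chunk == 1:
        # Extremely conservative: both TVT models certain + all white-list words.
        words = set(text.lower().split())
        return p_nb == 1.0 and p_rf == 1.0 and WHITE_LIST <= words
    if chunk == 2:
        # TVT-NB, TVT-RF and J48 must all agree (assumed threshold 0.5).
        return p_nb >= 0.5 and p_rf >= 0.5 and bool(j48_positive)
    if chunk in (4, 5, 6):
        # BoW and (either) TVT model above the threshold, or J48 positive.
        t = 0.7 if chunk == 4 else 0.8
        return (p_bow >= t and max(p_nb, p_rf) >= t) or bool(j48_positive)
    # Chunk 3 and chunks 7-10: at least two classifiers with p >= 0.9.
    return sum(p >= 0.9 for p in (p_nb, p_rf, p_bow, p_j48)) >= 2
```

Because an ERD decision is final once emitted, such a function would be evaluated at each chunk only for users that have not been flagged yet.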
With those empirically derived rules as a general criterion for the CTD issue, we then analyzed whether the TVT method, combined with the other approaches in specific situations, performed better than the individual methods working separately. The results of this analysis are presented below.
3.1 Analysis of results
In order to compare the performance of our combined approach against each method tested separately, we used the TR_DS-train and TR_DS-test sets in the experiments. Individual methods also need a criterion to deal with the CTD issue. In this case, we can directly use the probability (or some measure of confidence) assigned by the classifier to decide when to stop reading a document and give its classification. That approach, referred to in [8] as dynamic, only requires that this probability exceed some particular threshold to classify the instance/individual as positive. In our studies, we also used this dynamic approach for the individual methods, considering different probability values: p = 1, p ≥ 0.9, p ≥ 0.8, p ≥ 0.7 and p ≥ 0.6.
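A minimal sketch of the dynamic criterion for an individual classifier: the user is flagged at the first chunk whose probability reaches the threshold, and users never flagged are treated as negative. The function name and the per-chunk probability list are illustrative assumptions.

```python
def dynamic_decision(probs_by_chunk, threshold):
    """Dynamic CTD criterion: read chunks in order and emit a positive
    decision at the first chunk whose probability reaches the threshold.

    probs_by_chunk[i] is the positive-class probability after chunks
    1..i+1. Returns the 1-based chunk of the decision, or None if the
    user is never flagged (i.e., treated as negative).
    """
    for i, p in enumerate(probs_by_chunk, start=1):
        if p >= threshold:
            return i
    return None
```

Using `threshold=1.0` reproduces the p = 1 setting, since probabilities never exceed 1; lower thresholds trade earlier decisions for more false positives.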
Tables 1, 2 and 3 report the values of precision (π), recall (ρ) and F1-measure (F1) of the target (depressed) class for each individual model and the different probabilities. The statistics also include the early risk detection error (ERDE) measure proposed in [8] with the two values of the parameter o used in the pilot task: o = 5 (ERDE5) and o = 50 (ERDE50).
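For reference, the ERDE measure of [8] can be sketched as follows. For a decision emitted after reading k writings, a false positive costs c_fp, a false negative costs c_fn, a true positive costs lc_o(k)·c_tp with latency cost lc_o(k) = 1 − 1/(1 + e^(k−o)), and a true negative costs 0; the reported error is the mean over users. The defaults c_fn = c_tp = 1 and setting c_fp to the proportion of positive users follow our reading of [8] and [9] and should be checked against those papers; the values in the tables appear to be scaled by 100.

```python
import math


def latency_cost(k, o):
    """lc_o(k) = 1 - 1 / (1 + e^(k - o)), computed in the algebraically
    equivalent form 1 / (1 + e^(o - k)) to avoid overflow for large k."""
    return 1.0 / (1.0 + math.exp(o - k))


def erde(decisions, o, c_fp, c_fn=1.0, c_tp=1.0):
    """Mean ERDE_o over users.

    decisions: list of (predicted_positive, truly_positive, k) tuples,
    where k is the number of writings read before the decision.
    """
    total = 0.0
    for predicted, actual, k in decisions:
        if predicted and not actual:      # false positive
            total += c_fp
        elif actual and not predicted:    # false negative
            total += c_fn
        elif predicted and actual:        # true positive, penalized by delay
            total += latency_cost(k, o) * c_tp
        # true negatives cost nothing
    return total / len(decisions)
```

The latency cost is 0.5 exactly when k = o and approaches 1 quickly afterwards, which is why ERDE5 punishes late correct detections much more harshly than ERDE50.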
As we can see in Table 1, the TVT-NB model obtained its best performance when the classifier assigned instances to the positive (depressed) class only with probability 1. For TVT-RF and BoW the statistics are different. In Table 2, the best ERDE50 (and F1 measure) was obtained when the Random Forest classifier used the lowest probability (p ≥ 0.6), but with this probability the ERDE5 error takes its worst value. The BoW model (Table 3) obtained its best ERDE50 value with p ≥ 0.8, and this configuration also yields its best F1, π and ρ values. The J48 model assigns instances to the positive class with probability 1, so its performance is shown directly in the comparison in Table 4.
Table 1. Performance of TVT-NB model (TR_DS-test set).
Metric   p = 1   p ≥ 0.9   p ≥ 0.8   p ≥ 0.7   p ≥ 0.6
ERDE5    14.13   16.66     16.68     16.82     16.85
ERDE50   11.25   15.34     15.21     15.24     15.07
F1       0.40    0.27      0.28      0.27      0.28
π        0.47    0.50      0.48      0.43      0.44
ρ        0.35    0.18      0.19      0.19      0.20
Table 2. Performance of TVT-RF model (TR_DS-test set).
Metric   p = 1   p ≥ 0.9   p ≥ 0.8   p ≥ 0.7   p ≥ 0.6
ERDE5    16.92   16.85     16.92     16.96     17.14
ERDE50   16.43   16.05     16.12     16.02     15.79
F1       0.11    0.15      0.16      0.21      0.22
π        0.50    0.54      0.50      0.50      0.43
ρ        0.06    0.08      0.10      0.13      0.14
Table 3. Performance of BoW model (TR_DS-test set).
Metric   p = 1   p ≥ 0.9   p ≥ 0.8   p ≥ 0.7   p ≥ 0.6
ERDE5    20.79   20.83     21.05     21.16     21.27
ERDE50   18.82   18.66     18.13     18.24     18.35
F1       0.19    0.21      0.24      0.24      0.23
π        0.11    0.13      0.14      0.14      0.14
ρ        0.50    0.65      0.75      0.75      0.75
To choose a particular probability for comparing the performance of the models, we selected, for each model, the one for which it obtained the best ERDE with o = 50 (ERDE50). We decided to use this metric because we knew in advance that the results of the pilot task would be evaluated with ERDE, but we did not know the specific o value. In this context, we considered ERDE50 to be more important and informative than the ERDE5 error.
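The selection criterion can be illustrated with the ERDE50 values of Tables 1 to 3: taking, for each model, the threshold with the lowest ERDE50 recovers the settings compared in Table 4. The dictionary below is just a convenient encoding of the published table values.

```python
# ERDE50 values copied from Tables 1-3 (TR_DS-test set).
ERDE50 = {
    "TVT-NB": {"p = 1": 11.25, "p >= 0.9": 15.34, "p >= 0.8": 15.21,
               "p >= 0.7": 15.24, "p >= 0.6": 15.07},
    "TVT-RF": {"p = 1": 16.43, "p >= 0.9": 16.05, "p >= 0.8": 16.12,
               "p >= 0.7": 16.02, "p >= 0.6": 15.79},
    "BoW": {"p = 1": 18.82, "p >= 0.9": 18.66, "p >= 0.8": 18.13,
            "p >= 0.7": 18.24, "p >= 0.6": 18.35},
}

# For each model, keep the threshold with the lowest ERDE50.
best = {model: min(scores, key=scores.get) for model, scores in ERDE50.items()}
# best == {"TVT-NB": "p = 1", "TVT-RF": "p >= 0.6", "BoW": "p >= 0.8"}
```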
Table 4 shows the performance comparison of all models. As we can observe, the combined methods performed best (tied with BoW in the case of recall) in all metrics except for the ERDE5 error, for which the TVT-NB model obtained the best value. It is interesting to note that, although TVT-NB obtained a good ERDE5 value, it was worse than the combined methods in the other metrics. These results confirm the adequacy of the combined approach for the ERD task over the individual methods.
Table 4. Performance comparison of all models (TR_DS-test set).
Model              ERDE5   ERDE50   F1     π      ρ
Combined methods   15.36   9.16     0.59   0.48   0.75
J48                17.01   12.94    0.40   0.33   0.50
BoW (p ≥ 0.8)      21.05   18.13    0.24   0.14   0.75
TVT-NB (p = 1)     14.13   11.25    0.40   0.47   0.35
TVT-RF (p ≥ 0.6)   17.14   15.79    0.22   0.43   0.14
4 Test Stage
The previous results were obtained by training the classifiers with the TR_DS-train data set and testing them with the TR_DS-test data set. In this section, we show the results obtained by the combined methods trained with the full training set of the pilot task (TR_DS) and tested with TE_DS, which was incrementally released during the test stage of the pilot task.
Table 5 shows the three submissions that obtained the best ERDE5, ERDE50 and F1 values in the eRisk pilot task, as reported in [9]. There we can observe that our combined methods (UNSLA) obtained the best ERDE50 value. On the other hand, FHDO-BCSGA obtained the best F-measure with an ERDE50 value slightly worse than ours. FHDO-BCSGB obtained the best ERDE5 and the worst F1 among the three approaches. With these results, we can conclude that our proposal is a competitive approach for ERD tasks.
Table 5. Best submissions in the ranking of the pilot task (TE_DS set).
Model                      ERDE5   ERDE50   F1     π      ρ
Combined methods (UNSLA)   13.66   9.68     0.59   0.48   0.79
FHDO-BCSGA                 12.82   9.69     0.64   0.61   0.67
FHDO-BCSGB                 12.70   10.39    0.55   0.69   0.46
5 Complementary studies on the test set
The ERD system we submitted to the pilot task was derived from our analysis of the weaknesses and strengths of the TVT method on the training data. However, once the golden-truth information of the TE_DS set was made available by the organizers, we could analyze what the performance of the models, in particular the TVT method working alone, would have been if different probabilities had been selected.
Table 6 shows this type of information by reporting the results obtained with the TVT representation using Naïve Bayes and Random Forest as learning algorithms. Different probability values were also tested for the dynamic approach to the CTD aspect. The obtained results are conclusive in this case: TVT shows high robustness in the ERDE measures regardless of the algorithm used to learn the model and of the probability threshold. Most of TVT's ERDE5 values were low, and in 7 out of 10 settings the ERDE50 values were lower than the best one reported in the pilot task (the combined methods (UNSLA) with 9.68). In this context, TVT achieves the best ERDE5 value reported up to now (12.30) with the setting TVT-RF (p ≥ 0.8) and the lowest ERDE50 value (8.17) with the model TVT-NB (p ≥ 0.8). The performance of the TVT methods on the test set surprised us, because in the training stage we could not obtain such good results. This leads us to conclude that, had the TVT method participated alone in the pilot task, it would have obtained similar or better results than those obtained with the combined methods.
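The counts quoted above can be checked directly against Table 6 with a few lines of Python; the values are copied from the table, and 9.68 is the UNSLA ERDE50 from Table 5.

```python
# (ERDE5, ERDE50) pairs copied from Table 6 (TE_DS set).
TABLE6 = {
    "TVT-NB (p >= 0.6)": (13.59, 8.40),
    "TVT-NB (p >= 0.7)": (13.43, 8.24),
    "TVT-NB (p >= 0.8)": (13.13, 8.17),
    "TVT-NB (p >= 0.9)": (13.07, 8.35),
    "TVT-NB (p = 1)": (12.38, 9.84),
    "TVT-RF (p >= 0.6)": (12.46, 8.37),
    "TVT-RF (p >= 0.7)": (12.49, 8.52),
    "TVT-RF (p >= 0.8)": (12.30, 8.95),
    "TVT-RF (p >= 0.9)": (12.34, 10.28),
    "TVT-RF (p = 1)": (12.82, 11.82),
}
UNSLA_ERDE50 = 9.68  # best ERDE50 submitted to the pilot task (Table 5)

# Settings whose ERDE50 beats the best submitted run (7 of the 10).
below = [name for name, (_, e50) in TABLE6.items() if e50 < UNSLA_ERDE50]
best_erde5 = min(TABLE6, key=lambda name: TABLE6[name][0])   # TVT-RF (p >= 0.8)
best_erde50 = min(TABLE6, key=lambda name: TABLE6[name][1])  # TVT-NB (p >= 0.8)
```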
Table 6. TVT's performance on the TE_DS set.
Model              ERDE5   ERDE50   F1     π      ρ
TVT-NB (p ≥ 0.6)   13.59   8.40     0.50   0.37   0.75
TVT-NB (p ≥ 0.7)   13.43   8.24     0.51   0.39   0.75
TVT-NB (p ≥ 0.8)   13.13   8.17     0.54   0.42   0.73
TVT-NB (p ≥ 0.9)   13.07   8.35     0.52   0.42   0.69
TVT-NB (p = 1)     12.38   9.84     0.42   0.50   0.37
TVT-RF (p ≥ 0.6)   12.46   8.37     0.55   0.49   0.63
TVT-RF (p ≥ 0.7)   12.49   8.52     0.55   0.50   0.62
TVT-RF (p ≥ 0.8)   12.30   8.95     0.56   0.54   0.58
TVT-RF (p ≥ 0.9)   12.34   10.28    0.47   0.55   0.40
TVT-RF (p = 1)     12.82   11.82    0.20   0.67   0.12
6 Conclusions and future work
This article presents the participation of LIDIC - UNSL in the eRisk 2017 pilot task on Early Detection of Depression. We proposed a representation named temporal variation of terms (TVT), which considers the variation of vocabulary along the different time steps as the concept space for the representation of documents. Because the experiments in the training stage showed some weaknesses of TVT, we proposed a combined approach to assist TVT in some specific situations. The results obtained in the pilot task with our combined methods were good, achieving the best ERDE50 value among all participants. An interesting finding was that the TVT representation alone obtained better results on the test set than those achieved by the combined approach.
As future work, we plan to extend the analysis of the TVT representation to other ERD problems, such as the identification of sexual predators and people with suicidal tendencies.
References
1. M. R. Capecelatro, M. D. Sacchet, P. F. Hitchcock, S. M. Miller, and W. B. Britton. Major depression duration reduces appetitive word use: An elaborated verbal recall of emotional photographs. Journal of Psychiatric Research, 47(6):809-815, 2013.
2. M. De Choudhury, E. Kiciman, M. Dredze, G. Coppersmith, and M. Kumar. Discovering shifts to suicidal ideation from mental health content in social media. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, CHI '16, pages 2098-2110, New York, NY, USA, 2016. ACM.
3. M. L. Errecalde, M. P. Villegas, D. G. Funez, M. J. Garciarena Ucelay, and L. C. Cagnina. Temporal Variation of Terms as concept space for early risk prediction. In G. J. F. Jones, S. Lawless, J. Gonzalo, L. Kelly, L. Goeuriot, T. Mandl, L. Cappellato, and N. Ferro, editors, Experimental IR Meets Multilinguality, Multimodality, and Interaction, volume 10456. Springer, 2017.
4. A. Garcia-Caballero, J. Jiménez, M. Fernandez-Cabana, and I. García-Lado. Last words: an LIWC analysis of suicide notes from Spain. European Psych., 27:1, 2012.
5. M. Koppel, J. Schler, and S. Argamon. Computational methods in authorship attribution. Journal of the American Society for Information Science and Technology, 60(1):9-26, 2009.
6. Z. Li, Z. Xiong, Y. Zhang, C. Liu, and K. Li. Fast text categorization using concise semantic analysis. Pattern Recogn. Lett., 32(3):441-448, February 2011.
7. A. P. López-Monroy, M. Montes y Gómez, H. J. Escalante, L. Villaseñor-Pineda, and E. Stamatatos. Discriminative subprofile-specific representations for author profiling in social media. Knowledge-Based Systems, 89:134-147, 2015.
8. D. E. Losada and F. Crestani. A Test Collection for Research on Depression and Language Use, pages 28-39. Springer International Publishing, Cham, 2016.
9. D. E. Losada, F. Crestani, and J. Parapar. eRISK 2017: CLEF Lab on Early Risk Prediction on the Internet: Experimental foundations. In Proceedings Conference and Labs of the Evaluation Forum CLEF 2017, Dublin, Ireland, 2017.
10. J. W. Pennebaker, R. L. Boyd, K. Jordan, and K. Blackburn. The development and psychometric properties of LIWC2015. 2015.
11. J. W. Pennebaker, M. R. Mehl, and K. G. Niederhoffer. Psychological aspects of natural language use: Our words, our selves. Annual Review of Psychology, 54(1):547-577, 2003.
12. N. Ramirez-Esparza, C. K. Chung, E. Kacewicz, and J. W. Pennebaker. The psychology of word use in depression forums in English and in Spanish: Testing two text analytic approaches. In Proc. ICWSM 2008, 2008.