of potential paedophiles, people with suicidal inclinations, or people susceptible to depression, among others. This type of situation has started to be studied in a new research field known as early risk detection (ERD), which has received increasing interest from researchers worldwide due to the important impact it could have on many relevant and current real-world problems. In this context, the first early risk prediction conference, eRisk 2017³, was organized this year within the CLEF 2017 Workshop. As part of this event, a pilot task was organized, and the present article describes our participation in this task.

The eRisk 2017 pilot task was organized into two different stages: training and test. It also assumed an ERD scenario, that is, data are read sequentially as a stream and the challenge consists in detecting risk cases as soon as possible. To reproduce this scenario, the eRisk 2017 organizers released the test data set following a sequential, chunk-by-chunk criterion: the first chunk contained the oldest 10% of the messages, the second chunk contained the second oldest 10%, and so forth, up to 10 chunks that together represent the full writing of the analysed individuals.

To deal with this problem we implemented an original proposal that we named temporal variation of terms (TVT) [3]. However, at the training stage, TVT seemed to show some weakness at specific chunks, so we decided to complement it with other standard methods to help our approach in these specific situations. Thus, our implemented ERD system is in fact a combined system that strongly depends on TVT but also uses other methods as additional sources of opinion. The resulting system had a very acceptable performance on the data set released in the test stage, obtaining the lowest ERDE50 error among a total of 30 submissions from 8 different institutions. However, once the golden truth of the test data set was released, we could verify that TVT alone might have obtained very robust results and the lowest reported error for both ERDE thresholds used in the pilot task. For this reason, we also include an additional section in this work with preliminary results obtained with TVT alone on the test set, in order to observe the potential of this method for general ERD problems.

The rest of the article is organized as follows: Section 2 describes some general aspects of the data set used in the pilot task and the methods used in our ERD system. Next, in Section 3, the activities carried out in the training stage are described and the justifications of the main design decisions made for our ERD system are presented. Section 4 shows the results obtained with our proposal on the eRisk 2017 dataset released in the test stage. Then, some complementary results are presented in Section 5, where interesting aspects of the performance of TVT working alone on the test set are shown. Finally, Section 6 depicts potential future work and the conclusions obtained.
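To make the chunk-by-chunk release concrete, the following minimal Python sketch splits one individual's writings into 10 chronological chunks and replays them as a stream. It assumes the writings are (timestamp, text) pairs and that the 10% split is done per individual; the helper names `split_into_chunks` and `chunk_stream` are hypothetical, and the organizers' exact rounding rule is not given here.

```python
def split_into_chunks(writings, n_chunks=10):
    """Split one individual's writings into n_chunks chronological chunks.

    `writings` is assumed to be a list of (timestamp, text) pairs; chunk 1
    receives the oldest ~10% of the writings, chunk 2 the next ~10%, etc.
    """
    ordered = sorted(writings, key=lambda w: w[0])  # oldest first
    total = len(ordered)
    # Integer boundaries spread any remainder evenly across the chunks.
    bounds = [i * total // n_chunks for i in range(n_chunks + 1)]
    return [ordered[bounds[i]:bounds[i + 1]] for i in range(n_chunks)]


def chunk_stream(writings, n_chunks=10):
    """Yield, after each release, all the writings seen so far (the ERD view)."""
    seen = []
    for chunk in split_into_chunks(writings, n_chunks):
        seen.extend(chunk)
        yield list(seen)
```

A system following the ERD scenario would inspect each view yielded by `chunk_stream` and decide, after each release, whether to emit an alert or wait for more evidence.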

2 Data set and methods

2.1 Data Set

2.2 Methods

Document representations

Bag of Words. The traditional Bag of Words (BoW) representation is one of the language models most used in text categorization tasks. In BoW representations, features are words and documents are simply treated as collections of unordered words. Formally, a document d is represented by the vector of weights d_BoW = (w_1, w_2, ..., w_n), where n is the size of the vocabulary of the dataset. Each weight w_i is a value assigned to the i-th feature (word) according to whether the word appears in the document or how frequently it appears. This popular representation is simple to implement, fast to obtain, and can be used under different weighting schemes (boolean, term frequency (tf), or term frequency–inverse document frequency (tf−idf)).

In our study we used BoW with the boolean weighting scheme.

Character 3-grams. A character n-gram is a sequence of n characters obtained from each text in the dataset. In the model, the dictionary contains all the character n-grams that occur in any term of the vocabulary, that is, in the terms used in the BoW representations. Representations based on character n-grams have proven effective in many applications [5].

LIWC-based features. Features derived from Linguistic Inquiry and Word Count (LIWC) [11, 10] have been used in several studies related to psychological aspects of individuals. LIWC has been successfully used to identify depressed and non-depressed people by analyzing linguistic markers of depression such as the use of personal pronouns and positive-negative emotions [12]; besides emotions, the presence of words related to death (e.g., dead, kill, suicide), sex (e.g., arouse, makeout, orgasm) and ingestion (e.g., chew, drink, hunger) has also proven useful [1]. Studies on suicidal individuals have incorporated LIWC as a valuable tool to extract information related to suicide and suicidal ideation by analyzing the categories death, health, sad, they, I, sexual, filler, swear, anger, and negative emotions [2, 4]. Because we wanted to consider more meaningful features, in preliminary studies we also considered the most informative features belonging to linguistic dimensions (for example, personal pronouns and number of function words), summary language variables (for example, dictionary words and words with more than 6 letters), psychological processes (for example, negative emotions and affective processes), grammar (verbs and numbers) and punctuation (number of apostrophes).
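A minimal sketch of the boolean BoW vector d_BoW = (w_1, ..., w_n) and of the character 3-gram dictionary built from the vocabulary terms. The regular-expression tokenizer and all function names are illustrative assumptions, and n-grams are taken without padding, a detail the text does not specify.

```python
import re


def tokenize(text):
    # Simple lowercase word tokenizer (an assumption; no tokenizer is specified).
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocabulary(docs):
    # Map each distinct word in the collection to a fixed index i.
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {word: i for i, word in enumerate(vocab)}


def boolean_bow(doc, vocab):
    # w_i = 1 if the i-th vocabulary word occurs in doc, else 0.
    vec = [0] * len(vocab)
    for tok in tokenize(doc):
        idx = vocab.get(tok)
        if idx is not None:
            vec[idx] = 1
    return vec


def char_ngrams(term, n=3):
    # All contiguous character n-grams of a single term (no padding).
    return [term[i:i + n] for i in range(len(term) - n + 1)]


def char_ngram_dictionary(terms, n=3):
    # Dictionary of every character n-gram occurring in any vocabulary term.
    return sorted({g for term in terms for g in char_ngrams(term, n)})
```

Terms shorter than n contribute no n-grams in this sketch; a padded variant would be an equally reasonable reading of the text.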
Learning algorithms. In preliminary studies we tested different learning algorithms, and we obtained the best results with Random Forest and Naïve Bayes in several comparisons with other popular methods like LIBSVM⁵. We also used in our system a decision tree (Weka's J48) obtained by first selecting the 100 words with the highest information gain and then removing from that list those words that were considered dependent on specific domains (names of politicians like Obama and countries like China). The J48 algorithm trained on this subset of features obtained a decision tree of only 39 nodes containing some interesting words like meds, depression, therapist and cry, among others. As we will see later, this decision tree was used to assist the TVT method in the initial chunks.

Even though we carried out several comparative studies with LIWC features, character 3-grams, CSA representations and the LIBSVM algorithm, we decided to select only those approaches that seemed to be effective in assisting TVT in some specific situations. For this reason, in our ERD system we only used the Random Forest and Naïve Bayes algorithms with BoW and TVT representations, and the J48 decision tree previously explained.
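The feature-selection step used before training J48 can be sketched as follows: rank words by information gain with respect to the class, keep the top k (100 in the text), then drop a hand-made list of domain-dependent words. This is a plain-Python illustration assuming boolean word presence per document; the exclusion list and function names are hypothetical, and the J48 (C4.5) training itself, done in Weka, is not reproduced.

```python
from collections import Counter
from math import log2


def entropy(labels):
    # Shannon entropy (in bits) of a list of class labels.
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(word, doc_words, labels):
    # IG(C; word) = H(C) - sum over {present, absent} of P(x) * H(C | x).
    with_w = [y for words, y in zip(doc_words, labels) if word in words]
    without_w = [y for words, y in zip(doc_words, labels) if word not in words]
    n = len(labels)
    conditional = (len(with_w) / n) * entropy(with_w) + (len(without_w) / n) * entropy(without_w)
    return entropy(labels) - conditional


def select_words(doc_words, labels, k=100, excluded=frozenset()):
    # Rank by decreasing IG (ties broken alphabetically), keep the top k,
    # then remove words judged domain-dependent (e.g. politicians, countries).
    vocab = set().union(*doc_words)
    ranked = sorted(vocab, key=lambda w: (-information_gain(w, doc_words, labels), w))
    return [w for w in ranked[:k] if w not in excluded]
```

As in the procedure described above, the removal happens after the top-k cut, so the final feature set can be smaller than k.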
3 Training Stage

3.1 Analysis of results

Fig. 2. Precision values obtained with TVT models.

Because we wanted to focus our predictions on those chunks where the best results were obtained (chunks 3 and 4), and assuming that after those chunks the penalization component in the ERDE computation would affect TVT's results, we decided to set some chunk-by-chunk rules that accomplish certain basic properties in the different chunks:

1. Chunk 1: Here the ERD system should be extremely conservative and only classify an instance as positive (depressed) if there exists strong evidence of that. For this case we used the criterion of only classifying an instance as positive if both models obtained with TVT-NB and TVT-RF classified the instance as positive with probability p = 1 and the text included all the words in a white list. That list was obtained from the words with the highest information gain in the documents of the first chunk. It included the words depression and diagnosed.
2. Chunk 2: Here the white-list restriction could be relaxed and the TVT methods complemented with a more general approach. To this end, we used the predictions of the J48 decision tree explained above and classified an instance as positive if the three classifiers classified it in that way.
3. Chunk 3: In this chunk most of the classifiers obtained their best precision values. However, they obtained low recall values. To address this aspect, an instance was classified as positive if at least two classifiers classified it as positive with probability p ≥ 0.9.
4. Chunk 4: BoW obtained in this chunk the highest recall values but low precision. We also observed that when it was combined with the TVT method, the latter acted as a filter, obtaining good results when both methods classified an instance as positive. On the other hand, many instances not detected by this combination were classified as positive by the J48 method. For this reason, any instance classified as positive either by both the BoW and TVT methods with probability p ≥ 0.7, or by the J48 method, was classified as positive.
5. Chunks 5 to 10: From chunk 5 onward, we assumed that most of the relevant classifications had already been made in the previous chunks. However, we kept a monitoring system to identify those cases that needed much more additional information to determine whether an individual was depressive or not. For this purpose, we used the same rule as in chunk 4 for chunks 5 and 6, but increasing the probability threshold of BoW and TVT to p ≥ 0.8. From chunks 7 to 10, an instance was classified as positive if at least two classifiers classified it as positive with probability p ≥ 0.9.

Our ERD system tested on the pilot task was derived from our analysis of the weaknesses and strengths of the TVT method on the training data. However, once the golden-truth information of the test set was made available by the organizers, we could analyze what the performance of the models, in particular the TVT method working alone, would have been if different probabilities had been selected. Table 6 shows this type of information by reporting the results obtained with the TVT representation using Naïve Bayes and Random Forest as learning algorithms. In addition, different probability values were tested for the dynamic approaches to the CTD aspect. The obtained results are conclusive in this case. TVT shows a high robustness in the ERDE measures independently of the algorithm used to learn the model and of the probability threshold. Most of TVT's ERDE5 values were low, and in 7 out of 10 settings the ERDE50 values were lower than the best one reported in the pilot task (9.68, obtained by the combined methods, UNSLA). In this context, TVT achieves the best ERDE5 value reported up to now (12.30) with the setting TVT-RF (p ≥ 0.8), and the lowest ERDE50 value (8.17) with the model TVT-NB (p ≥ 0.8). The performance of the TVT methods on the test set was surprising to us because in the training stage we could not obtain such good results. This leads us to conclude that if the TVT method had participated alone in the pilot task, it would have obtained similar or better results than those obtained with the combined methods.
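The chunk-by-chunk rules of Section 3 can be summarized as a single decision function. This Python sketch is an illustration only: the `Evidence` fields, the 0.5 cut-off used to read "classified as positive" in chunk 2, the choice of the more confident TVT model where the text just says "TVT", and the set of classifiers counted in the "at least two classifiers" rules (TVT-NB, TVT-RF and BoW) are all assumptions, not details confirmed by the text.

```python
from dataclasses import dataclass

# White list from the text; only these two words are named.
WHITE_LIST = frozenset({"depression", "diagnosed"})


@dataclass
class Evidence:
    """Classifier outputs for one individual after a given chunk (hypothetical layout)."""
    p_tvt_nb: float       # P(depressed) from the TVT-NB model
    p_tvt_rf: float       # P(depressed) from the TVT-RF model
    p_bow: float          # P(depressed) from a BoW model (assumed single score)
    j48_positive: bool    # prediction of the J48 decision tree
    words: frozenset      # words written by the individual so far


def is_positive(chunk: int, ev: Evidence) -> bool:
    probs = (ev.p_tvt_nb, ev.p_tvt_rf, ev.p_bow)
    # Assumption: "TVT with probability p" means the more confident TVT model.
    p_tvt = max(ev.p_tvt_nb, ev.p_tvt_rf)
    if chunk == 1:
        # Both TVT models certain (p = 1) and every white-list word present.
        return ev.p_tvt_nb == 1.0 and ev.p_tvt_rf == 1.0 and WHITE_LIST <= ev.words
    if chunk == 2:
        # Unanimity of J48, TVT-NB and TVT-RF (0.5 cut-off is an assumption).
        return ev.j48_positive and ev.p_tvt_nb >= 0.5 and ev.p_tvt_rf >= 0.5
    if chunk == 3 or 7 <= chunk <= 10:
        # At least two classifiers with p >= 0.9.
        return sum(p >= 0.9 for p in probs) >= 2
    if chunk in (4, 5, 6):
        # BoW and TVT jointly above threshold, or J48 alone.
        threshold = 0.7 if chunk == 4 else 0.8
        return (ev.p_bow >= threshold and p_tvt >= threshold) or ev.j48_positive
    raise ValueError(f"chunk must be between 1 and 10, got {chunk}")


def first_alert(evidence_per_chunk):
    """Return the first chunk (1-based) where the rules fire, or None."""
    for chunk, ev in enumerate(evidence_per_chunk, start=1):
        if is_positive(chunk, ev):
            return chunk
    return None
```

`first_alert` stops at the first positive decision, assuming, as in the ERD setting, that an alert cannot be retracted once emitted.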

1. M. R. Capecelatro, M. D. Sacchet, P. F. Hitchcock, S. M. Miller, and W. B. Britton. Major depression duration reduces appetitive word use: An elaborated verbal recall of emotional photographs. Journal of Psychiatric Research, 47(6):809–815, 2013.

2. M. De Choudhury, E. Kiciman, M. Dredze, G. Coppersmith, and M. Kumar. Discovering shifts to suicidal ideation from mental health content in social media. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, CHI '16, pages 2098–2110, New York, NY, USA, 2016. ACM.

3. M. L. Errecalde, M. P. Villegas, D. G. Funez, M. J. Garciarena Ucelay, and L. C. Cagnina. Temporal Variation of Terms as concept space for early risk prediction. In G. J. F. Jones, S. Lawless, J. Gonzalo, L. Kelly, L. Goeuriot, T. Mandl, L. Cappellato, and N. Ferro, editors, Experimental IR Meets Multilinguality, Multimodality, and Interaction, volume 10456. Springer, 2017.

4. García-Caballero, J. Jiménez, M. Fernandez-Cabana, and I. García-Lado. Last words: an LIWC analysis of suicide notes from Spain. European Psych., 27:1, 2012.

5. M. Koppel, J. Schler, and S. Argamon. Computational methods in authorship attribution. Journal of the American Society for Information Science and Technology, 60(1):9–26, 2009.

6. Z. Li, Z. Xiong, Y. Zhang, C. Liu, and K. Li. Fast text categorization using concise semantic analysis. Pattern Recogn. Lett., 32(3):441–448, February 2011.

7. A. P. López-Monroy, M. Montes y Gómez, H. J. Escalante, L. Villaseñor-Pineda, and E. Stamatatos. Discriminative subprofile-specific representations for author profiling in social media. Knowledge-Based Systems, 89:134–147, 2015.

8. D. E. Losada and F. Crestani. A Test Collection for Research on Depression and Language Use, pages 28–39. Springer International Publishing, Cham, 2016.

9. D. E. Losada, F. Crestani, and J. Parapar. eRISK 2017: CLEF Lab on Early Risk Prediction on the Internet: Experimental foundations. In Proceedings Conference and Labs of the Evaluation Forum CLEF 2017, Dublin, Ireland, 2017.

10. J. W. Pennebaker, R. L. Boyd, K. Jordan, and K. Blackburn. The development and psychometric properties of LIWC2015. 2015.

11. J. W. Pennebaker, M. R. Mehl, and K. G. Niederhoffer. Psychological aspects of natural language use: Our words, our selves. Annual Review of Psychology, 54(1):547–577, 2003.

12. N. Ramirez-Esparza, C. K. Chung, E. Kacewicz, and J. W. Pennebaker. The psychology of word use in depression forums in English and in Spanish: Testing two text analytic approaches. In Proc. ICWSM 2008, 2008.