<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta />
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>specific point of the time. That aspect, which could be named classification with partial information (CPI), might be addressed with a simple approach that consists in training with complete documents as usual and considering the partial documents read up to the classification point as standard complete documents. In [3] the CPI aspect was considered by analysing the robustness of the Naïve Bayes algorithm to deal with partial information.</p>
      <p>Last, but not least, an ERD system needs to consider not only which class should be assigned to a document, but also when to make that assignment. This aspect, which we will refer to as the classification time decision (CTD) issue, has been addressed with very simple heuristic rules<sup>3</sup>, although more elaborate approaches might be used.</p>
      <p>3 For instance, exceeding a specific confidence threshold in the prediction of the classifier [9].</p>
      <p>In this article we propose an original idea that explicitly considers the sequentiality of data to deal with the unbalanced data sets problem. In a nutshell, we use the temporal variation of terms as the concept space of a recent concise semantic analysis (CSA) approach [7]. CSA is an interesting document representation technique which models words and documents in a small concept space whose concepts are obtained from category labels. CSA has obtained good results in author profiling tasks [8], and the variant proposed in this article, named temporal variation of terms (TVT), seems to show some interesting characteristics to deal with the ERD problem. In fact, it obtained a robust performance on the eRisk 2017 data set and reached the best (lowest) results reported up to the moment for the ERDE<sub>5</sub> and ERDE<sub>50</sub> error evaluation measures.</p>
      <p>The rest of this document is organized as follows: Section 2 describes our proposed method for the ERD problem. Section 3 shows the results obtained with our method on the eRisk 2017 data set. Finally, Section 4 depicts potential future work and the conclusions obtained.</p>
      <p>2.1 Concise Semantic Analysis</p>
      <p>Standard text representation methods such as Bag of Words (BoW) suffer from two well-known drawbacks. First, their high dimensionality and sparsity; second, they do not capture relationships among words. CSA is a semantic analysis technique that aims at dealing with those shortcomings by interpreting words and documents in a space of concepts. Unlike other semantic analysis approaches such as latent semantic analysis (LSA) [2] and explicit semantic analysis (ESA) [4], which usually require huge computing costs, CSA interprets words and text fragments in a space of concepts that are close (or equal) to the category labels. For instance, if documents in the data set are labeled with q different category labels (usually no more than 100 elements), words and documents will be represented in a q-dimensional space. That space size is usually much smaller than standard BoW representations, which directly depend on the vocabulary size (more than 10000 or 20000 elements in general).</p>
      <p>To explain the main concepts of the CSA technique we first introduce some basic notation that will be used in the rest of this work. Let <inline-formula><tex-math>D = \{\langle d_1, y_1\rangle, \ldots, \langle d_n, y_n\rangle\}</tex-math></inline-formula> be a training set formed by <inline-formula><tex-math>n</tex-math></inline-formula> pairs of documents <inline-formula><tex-math>(d_i)</tex-math></inline-formula> and variables <inline-formula><tex-math>(y_i)</tex-math></inline-formula> that indicate the concept the document is associated with, <inline-formula><tex-math>y_i \in C</tex-math></inline-formula>, where <inline-formula><tex-math>C = \{c_1, \ldots, c_q\}</tex-math></inline-formula> is the concept space. For the moment, consider that these concepts correspond to standard category labels although, as we will see later, they might represent more elaborate aspects. In this context, we will denote by <inline-formula><tex-math>V = \{t_1, \ldots, t_m\}</tex-math></inline-formula> the vocabulary of terms of the collection being analysed.</p>
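      <p>As an illustration of the notation above, the following minimal sketch maps each term of the vocabulary V to a q-dimensional vector over the concepts C and then represents a document in that same space. The weighting used here (the share of a term's occurrences per concept, averaged over the terms of a document) is a simplifying assumption chosen for brevity, not the weighting scheme of CSA [7]; the function names and the toy training set are hypothetical.</p>
      <preformat>
```python
from collections import Counter, defaultdict


def build_term_concept_weights(training_set, concepts):
    """Map each term t in V to a q-dimensional vector over the concepts C.

    Simplifying assumption (not the CSA weighting): the weight of term t for
    concept c is the share of t's occurrences found in documents labelled c.
    """
    counts = defaultdict(Counter)  # term -> Counter(concept -> occurrences)
    for document, label in training_set:
        for term in document.split():
            counts[term][label] += 1
    weights = {}
    for term, per_concept in counts.items():
        total = sum(per_concept.values())
        weights[term] = [per_concept[c] / total for c in concepts]
    return weights


def represent_document(document, weights, q):
    """Average the concept vectors of the known terms of a document."""
    vector = [0.0] * q
    known = [t for t in document.split() if t in weights]
    for term in known:
        for k in range(q):
            vector[k] += weights[term][k]
    if known:
        vector = [v / len(known) for v in vector]
    return vector


# Toy training set D of (d_i, y_i) pairs and concept space C with q = 2.
D = [("sad tired alone", "pos"), ("happy game fun", "neg"), ("tired game", "neg")]
C = ["pos", "neg"]
W = build_term_concept_weights(D, C)
doc_vector = represent_document("sad tired", W, len(C))
```
      </preformat>
      <p>Whatever the vocabulary size m, every document ends up as a vector of length q, which is the dimensionality reduction with respect to BoW described above.</p>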
    </sec>
    <sec id="sec-2">
      <title>4 In that work, concepts are referred to as profiles and subgroups as sub-profiles.</title>
    </sec>
    <sec id="sec-3">
      <title>2.2 Temporal Variation of Terms</title>
      <p>An alternative to try to alleviate the unbalanced data sets (UDS) problem would be to consider that the minority class is formed not only by the complete documents of the individuals but also by the partial documents obtained in the different chunks. Following the general ideas posed in CSA, we could consider that the partial documents read in the different chunks represent temporal concepts that should be taken into account. In this context, one might think that variations of the terms used in these different sequential stages of the documents may carry relevant information for the classification task. With this idea in mind, the method proposed in this work, named temporal variation of terms (TVT), arises. It consists in enriching the documents of the minority class with the partial documents read in the first chunks. These first chunks of the minority class, along with their complete documents, will be considered as a new concept space for a CSA method.</p>
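      <p>The enrichment step can be sketched as follows. This is only an illustration under stated assumptions: partial documents are taken to be cumulative (everything read up to a given chunk), the number k of first chunks is left as a free parameter, and the temporal concept names such as "pos_chunk1" are hypothetical labels introduced here for readability.</p>
      <preformat>
```python
def partial_documents(chunks, k):
    """Return the cumulative partial documents read after chunks 1..k."""
    return [" ".join(chunks[:i]) for i in range(1, k + 1)]


def tvt_training_set(users, k, minority="pos"):
    """Build (document, concept) pairs for a TVT-style concept space.

    users: list of (chunks, label), chunks being the writings in temporal order.
    Every user contributes the complete document under its own label; minority
    users also contribute their first k partial documents, each one under a
    hypothetical temporal concept name such as "pos_chunk1".
    """
    pairs = []
    for chunks, label in users:
        pairs.append((" ".join(chunks), label))
        if label == minority:
            for i, partial in enumerate(partial_documents(chunks, k), start=1):
                pairs.append((partial, f"{minority}_chunk{i}"))
    return pairs


# Toy data: one minority (positive) user and one majority (negative) user.
users = [(["a b", "c", "d e"], "pos"), (["x", "y z", "w"], "neg")]
pairs = tvt_training_set(users, k=2)
concept_space = sorted({concept for _, concept in pairs})
```
      </preformat>
      <p>With k = 2 the concept space grows from the two original category labels to four concepts, and only the minority class receives additional training documents.</p>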
    </sec>
    <sec id="sec-4">
      <title>3 Experimental Analysis</title>
      <p>3.1 Data Set</p>
      <p>3.2 Experimental Results</p>
    </sec>
    <sec id="sec-5">
      <title>6 All the tables generated for the different probabilities can be downloaded from</title>
      <p>https://sites.google.com/site/lcagnina/research/Tables_eRisk17.rar</p>
    </sec>
    <sec id="sec-6">
      <p>In our study we consider the two values of o used in the pilot task: o = 5 (ERDE<sub>5</sub>) and o = 50 (ERDE<sub>50</sub>). In each chunk, classifiers usually produce their predictions with some kind of confidence, in general the estimated probability of the predicted class. In those cases, we can select different thresholds tr, considering that an instance (document) is assigned to the target class when its associated probability p is greater than (or equal to) a certain threshold tr (p ≥ tr). In this study we evaluated 5 different settings for the probabilities assigned by each classifier: p = 1, p ≥ 0.9, p ≥ 0.8, p ≥ 0.7 and p ≥ 0.6. Due to space constraints, only the best results obtained with a particular setting are shown<sup>6</sup>.</p>
      <p>Table 1 shows the results obtained with a BoW representation and a Naïve Bayes classifier. Those values correspond to the setting where an instance is considered as depressive if the classifier assigns to the target/positive class a probability greater than or equal to 0.8 (p ≥ 0.8). Surprisingly, the best results for all the considered measures are obtained on the first chunk. In this chunk, we can observe that this model only recovers 45% of the depressed individuals. However, this is not the worst aspect. Only 12% of the individuals classified as depressed actually had this condition, which results in a very low F1 measure (0.19). Table 2 shows similar results when a CSA⋆-RF (random forest) combination with p ≥ 0.6 is used to classify the writings of the individuals. Here, the F1 measure is also low, but we can observe a deterioration in the ERDE<sub>5</sub> and ERDE<sub>50</sub> error values with respect to the previous model.</p>
      <p>0.40 0.47 0.35
0.24 0.14 0.75
0.06 0.04 0.15</p>
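      <p>The threshold rule and the F1 value quoted for the BoW and Naïve Bayes setting can be checked with a short sketch. The five thresholds are the settings listed in the text; the probability 0.83 is a hypothetical classifier output used only for illustration, and the function names are not taken from the original work.</p>
      <preformat>
```python
def is_target(p, tr):
    """Assign the instance to the target class when p >= tr."""
    return p >= tr


def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# The five settings evaluated in the study: p = 1 and p >= 0.9, 0.8, 0.7, 0.6.
settings = [1.0, 0.9, 0.8, 0.7, 0.6]

# Hypothetical probability assigned to the positive class for one individual.
p = 0.83
accepted = [tr for tr in settings if is_target(p, tr)]

# Precision 0.12 and recall 0.45 are the values reported for the first chunk.
nb_f1 = round(f1(0.12, 0.45), 2)
```
      </preformat>
      <p>With a precision of 0.12 and a recall of 0.45 the harmonic mean rounds to 0.19, which agrees with the F1 value reported for that setting.</p>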
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref3">
        <mixed-citation>1. N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. Ph. Kegelmeyer. <article-title>Smote: Synthetic minority over-sampling technique</article-title>. <source>J. Artif. Intell. Res.</source>, <volume>16</volume>:<fpage>321</fpage>–<lpage>357</lpage>, <year>2002</year>.</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>2. S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer, and R. Harshman. <article-title>Indexing by latent semantic analysis</article-title>. <source>Journal of the ASIS</source>, <volume>41</volume>(<issue>6</issue>):<fpage>391</fpage>–<lpage>407</lpage>, <year>1990</year>.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>3. H. Jair Escalante, M. Montes-y-Gómez, L. Villaseñor Pineda, and M. Errecalde. <article-title>Early text classification: a naïve solution</article-title>. In A. Balahur, E. Van der Goot, P. Vossen, and A. Montoyo, editors, <source>Proc. of WASSA NAACL-HLT 2016</source>, San Diego, California, USA, pages <fpage>91</fpage>–<lpage>99</lpage>. The Association for Computer Linguistics, <year>2016</year>.</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>4. E. Gabrilovich and S. Markovitch. <article-title>Wikipedia-based semantic interpretation for natural language processing</article-title>. <source>JAIR</source>, <volume>34</volume>(<issue>1</issue>):<fpage>443</fpage>–<lpage>498</lpage>, March <year>2009</year>.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>5. G. Inches and F. Crestani. <article-title>Overview of the international sexual predator identification competition at pan-2012</article-title>. In P. Forner, J. Karlgren, and C. Womser-Hacker, editors, <source>CLEF (Online Working Notes/Labs/Workshop)</source>, pages <fpage>1</fpage>–<lpage>12</lpage>, <year>2012</year>.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>6. M. Lan, Ch. Tan, J. Su, and Y. Lu. <article-title>Supervised and traditional term weighting methods for automatic text categorization</article-title>. <source>IEEE TPAMI</source>, <volume>31</volume>(<issue>4</issue>):<fpage>721</fpage>–<lpage>735</lpage>, <year>2009</year>.</mixed-citation>
      </ref>
      <ref id="ref1">
        <mixed-citation>7. Z. Li, Z. Xiong, Y. Zhang, Ch. Liu, and K. Li. <article-title>Fast text categorization using concise semantic analysis</article-title>. <source>Pattern Recogn. Lett.</source>, <volume>32</volume>(<issue>3</issue>):<fpage>441</fpage>–<lpage>448</lpage>, February <year>2011</year>.</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>8. A. Pastor López-Monroy, M. Montes y Gómez, H. Jair Escalante, L. Villaseñor-Pineda, and E. Stamatatos. <article-title>Discriminative subprofile-specific representations for author profiling in social media</article-title>. <source>Knowledge-Based Systems</source>, <volume>89</volume>:<fpage>134</fpage>–<lpage>147</lpage>, <year>2015</year>.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>9. D. E. Losada and F. Crestani. <article-title>A test collection for research on depression and language use</article-title>. In <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction - 7th Int. Conf. of the CLEF Association, Portugal</source>, pages <fpage>28</fpage>–<lpage>39</lpage>, <year>2016</year>.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>