<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>UAM's participation at CLEF eRisk 2017 task: Towards modelling depressed bloggers</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Esau Villatoro-Tello</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gabriela Ram rez-de-la-Rosa</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hector Jimenez-Salazar</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Language and Reasoning Research Group, Information Technologies Department, Universidad Autonoma Metropolitana (UAM) Unidad Cuajimalpa</institution>
          ,
          <addr-line>Mexico City</addr-line>
          ,
          <country country="MX">Mexico</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper we describe the participation of the Language and Reasoning Research Group of UAM Cuajimalpa at eRisk 2017 pilot task: Early Risk Prediction on the Internet. The goal of the eRisk task consists in detecting with enough anticipation cases of depression on texts. For evaluating this task, organizers provided a dataset containing comments from a set of Social Media users. All comments are chronologically ordered and represent writings from depressed and non-depressed users. Our proposed approach addressed this problem by means of graph models. This type of representation allows to capture some inherent characteristics from documents that can be determined though traditional graph measurements, and then, employed as features in a supervised classi cation system. Obtained results indicate that more experiments, as well as a more thorough analysis is required to conclude regarding the pertinence (or not) of our proposed strategy.</p>
      </abstract>
      <kwd-group>
        <kwd>N-gram graphs</kwd>
        <kwd>Graph similarities</kwd>
        <kwd>Early text classi cation</kwd>
        <kwd>Natural Language Processing</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>
        Social media is an excellent tool for anyone to express their opinion about any
topic in any context; furthermore, social media is perhaps the most used
communication channel nowadays. In spite of this easiness of communication, this type
of tools comprise a major threat to users, who are exposed to a number of risks
and potential attacks. Consider, for instance, the problem of detecting sexual
predators approaching minors or the identi cation of aggressive users [
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4 ref5">1,2,3,4,5</xref>
        ].
These threats pose a challenge to the research community, that has to develop
protective and preventive tools for avoiding potential risks. As mentioned in [
        <xref ref-type="bibr" rid="ref6">6,7</xref>
        ],
more recently, special attention has been paid to another type of threats, such as
those menaces coming from the individuals themselves, for instance depression,
mental situation that may lead to more complicated situations such as suicide.
      </p>
      <p>Although considerable research has been devoted to detect di erent types of
threats, most of the current solutions work in a forensic scenario, i.e., they are
applied once the treat or the attack has been accomplished. Even though these
forensic methods are very useful in some particular scenarios, it is known that
preventive mechanisms would have a greater and immediate impact into users
own security.</p>
      <p>Accordingly, the eRisk [7] at CLEF-20171 proposes an exploratory task on
early risk detection of depression. The challenge consists of sequentially
processing pieces of evidence and detect early traces of depression as soon as possible.
The task is mainly concerned about evaluating Text Mining solutions and, thus,
it concentrates on texts written in Social Media2.</p>
      <p>We address the eRiks problem as a Text Classi cation (TC) problem.
However, contrary to most TC techniques, we go beyond the Bag-of-Terms (BoT)
models, and instead we represent documents and categories as graph models.
This type of representation is able to incorporate contextual information by
means of considering terms3 co-occurrence values, which is an important
limitation of the traditional BoT representation [8]. Thus, our main hypothesis
establishes that writing patterns, both from content and style, can be
encapsulated by means of this type of representation and would provide to a classi er
discriminatory information for learning to distinguish depressed bloggers.</p>
      <p>The remainder of the paper is organized as follows. Section 2 describes the
applied methodology; Section 3 shows our experimental setup and obtained
results; nally, in Section 4 some conclusions of this work are presented.
2</p>
    </sec>
    <sec id="sec-2">
      <title>Methodology</title>
      <p>Our work was mainly inspired by the ideas proposed in [8,9]. Contrary to the
traditional Bag-of-Terms representation model, a graph model considers the order
of terms' appearances in the original text, thus incorporating valuable
contextual information to the representation. Generally speaking, the graph model
associates neighboring pairs of tokens with edges that denote their frequency
of co-occurrence. As a result, documents with di erent term sequences end up
having identical or at least highly similar representations. It is worth mentioning
that authors from [8,9], performed and exhaustive evaluation of this approach
on thematic text classi cation, but for the best of our knowledge, this approach
has not been previously evaluated in a non-thematic text classi cation task.
Accordingly, we consider as an important contribution of this paper the pertinence
evaluation of this approach on the posed task.</p>
      <p>Thus, for applying the method described in [8] we need to perform the
following two steps: i) build the classes' graph, i.e., a prototype graph model for
each category; and, ii) extract similarity features to train a supervised classi er,
each instance is represented by 4 N features (where N is equal to the number
of categories, jCj).</p>
      <p>As we mentioned, during the rst step we create the prototype graph for each
category cj 2 C. Thus, we represent each document di from the training set,
1 Conference Labs of the Evaluation Forum: http://clef2017.clef-initiative.eu
2 http://early.irlab.org
3 Unless explicitly mention, a term can be either a character n-gram or word n-gram.
belonging to category cj as the graph Gij . Notice that each node from Gij is a
term (or token) w within the document di, and the edges of Gij account for the
number of co-occurrences of every pair of nodes (terms) in the same contextual
window within di. For the experiments reported in this paper we employed as
terms character n-grams of size 3, and single words; and the size of the contextual
window was de ned as a symmetric window of 4 terms, meaning two terms to
the left and two terms to the right from token w.</p>
      <p>Once the graph for every document d has been generated, its necessary to
merge all the graphs from documents of the same class cj , resulting in the
prototype graph of the class. As established in [8], the prototype graph has the
following properties: its edges (nodes) comprise the union of the edges (nodes)
of the individual document graphs, and its weights are adjusted so that they
converge to the mean value of the respective weights. At the end, the resulting
prototype graph captures patterns common in the content and style of the entire
category, such as recurring and neighboring character and word sequences.</p>
      <p>For the second stage, the training phase, we need to compute the features
for every new document in relation to the prototype graph models. Basically,
we compute the similarity between documents and prototype graphs through
the closeness of their respective graph representations. Similarly as the work
described in [8] we employed the three measures proposed by the author and we
incorporated a content based metric. Next we describe the intuitive ideas behind
each metric, however for further reference please follow [8,9].</p>
      <p>{ Containment Similarity (CS). This metric expresses the proportion of edges
of a graph Gk that are shared with a second graph Gl, i.e., sequences of
shared nodes and edges.
{ Value Similarity (VS). This measure indicates how many of the edges
contained in graph Gk are contained in graph Gl, as well, considering also the
weights of the matching edges. Notice that VS converges to its maximum
value for graphs that share both the edges and the corresponding weights.
{ Normalized Value Similarity (NVS). This is a derived measure from the
previous metric, but without considering the relative size of the graphs being
compared.
{ Dice Similarity (DS). It it worth mentioning that this metric was not
originally considered in [8]. This metric accounts for the number of shared nodes
between graphs.</p>
      <p>As we mentioned before, every instance will be represented by a feature vector
of size 4 N ; where N is the number of categories in the classi cation problem
and 4 similarity measures, namely: CS, VS, NVS and DS.</p>
    </sec>
    <sec id="sec-3">
      <title>Experiments and results</title>
      <p>This section is organized as follows: rst we describe the provided dataset for
performing our experiments; next we provide details on every system's con
guration; then a brief description of the evaluation metrics, and nally, we discuss
the obtained results.
3.1</p>
      <sec id="sec-3-1">
        <title>Dataset</title>
        <p>
          The test collection for the pilot task was initially described in [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. Consist in a
collection of posts from a set of social media users. There are two categories of
users, depressed and non-depressed, and, for each user, the collection contains a
sequence of writings in chronological order. In order to simulate an early
detection scenario, the dataset was divided into 10 chunks. The rst chunk contains
the oldest 10% of the messages, the second chunk contains the second oldest
10%, and so forth. In summary, the training set contained 486 users (83
depressed, 403 non-depressed) and the test set contained 401 users (52 depressed,
349 non-depressed). Further details can be found in [7].
3.2
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>Submitted runs</title>
        <p>As we mentioned before, for generating the prototype graphs for each category
(depressed and non-depressed ), we followed the methodology described in
section 2 using two di erent forms for de ning the terms w: character 3-grams and
single words. It is important to mention that our generated graph models were
constructed using the 100% of the provided training data, i.e., the prototype
models do not consider the chronology of the writings. At this stage, we only
wanted to evaluate if this type of representation was capable of modeling
depressed blogs at any time. Next we brie y describe the general con guration of
each of our competing system.</p>
        <p>{ LyRA: For this con guration, the prototype graph models are build using
as w terms single words. The idea behind this approach was to evaluate if
thematic correspondences (sequences of words) might be helpful in
distinguishing the writing patterns of depressed users.
{ LyRB: Contrary to the previous con guration, this systems employs
character 3-grams as terms for the construction of the prototype graph models.
The rationale idea behind using character n-grams is to represent content
and style patterns in the graph model, characteristics that might result
helpful when distinguishing among depressed users.
{ LyRC: We refer to this con guration as an hybrid system given that each
instance is represented by a feature vector of size 2 4 N , i.e., 16 features
extracted from LyRA and LyRB systems. With this setting we wanted to
evaluate how complementary are both graph models (based on single words
and character n-grams), in solving the posed task.
{ LyRD: This is referred as a conservative ensemble method. This con
guration considers the output of the LyRA, LyRB and LyRC classi cation
systems, and assigns the class depressed if and only if the three systems agree
on assigning this category.
{ LyRE: Similarly to the previous system, this is also an ensemble method
but contrary to LyRD, this is a majority vote scheme among the decissions
from LyRA, LyRB and LyRC. Another characteristic of this system is that
it began to work until chunk 9, once enough information was accumulated
in order to emit a more con dent decision.</p>
        <p>As described in [7], the test stage consisted of 10 sequential releases of data,
and each participating system has to choose, for each user in the collection,
between two options: (a) emitting a decision on the user (i.e. depressed or
nondepressed), or (b) waiting to see more chunks. For all our proposed systems, if
the classi er assigns the class \depressed" to an user in any chunk, we emitted
the nal decision, otherwise we choose the option \seeing more chunks". It is
important to mention that all our experiments waited until the last chunk for
emitting the \non-depressed" decision; this is, a user that is di cult to classify
across chunks is at the very end (i.e., chunk 10) acknowledged as non-depressed
users. Finally, as our classi cation algorithm we employed a lazy method, namely
k-nearest neighbors algorithm, particularly we employed the provided
implementation by the Weka toolkit [10], with k = 1 using an Euclidean distance4.
3.3</p>
      </sec>
      <sec id="sec-3-3">
        <title>Evaluation</title>
        <p>
          For reporting our obtained results we employ traditional set-based metrics such
as F-measure, Precision and Recall. However, as described in [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], these metrics
are time unaware, and for this reason we also report our results using the o cial
ERDE metric. Intuitively, the ERDE metric considers the correctness of the
(binary) decision and the delay taken by the system to make the decision.
3.4
        </p>
      </sec>
      <sec id="sec-3-4">
        <title>Results</title>
        <p>Figure 1 shows the behaviour of our proposed systems across di erent chunks.
For this analysis we do not consider as nal decision the classi cation results of
preceding chunks, i.e., we evaluate our classi ers performance at every chunk.</p>
        <p>Several important aspects can be observed across systems, for instance, all
con gurations, except for LyRE, obtained good precision results between chunk
3 and chunk 6. In general, after chunk 5 (chunk 4 and 6 for the LyRC con
guration), the performance decays very rapidly; however as more evidence (text
from chunks) is obtained, the performance begins to recover up to chunk 10,
which is an expected behaviour since our prototype graphs were built using the
4 http://www.cs.waikato.ac.nz/ml/weka/
2
3</p>
        <p>Recall
4
7
0.25
0.2
0.15
0.1
0.05</p>
        <p>0 1
entire set of posts from depressed/non-depressed users. Although more
experiments need to be done, our explanation for the peaks reached between chunks 3
and 6 is due to highly distinctive characteristics of depressed users at particular
points of their writings. For proving this hypothesis, we need to test our
proposed method considering the chronological order of the texts, in other words,
to generate prototype graphs for di erent depression stages (chunks).</p>
        <p>Regarding the terms considered by the graphs, our experiments indicate that
both models, single words and character n-grams, are to some extent
complementary to each other, e.g., observe that the LyRC is able to obtain in a couple of
times high precision results. Similarly, the conservative ensemble (LyRD) shows
a more stable performance (in terms of precision) in comparison to LyRA and
LyRB con gurations.</p>
        <p>In Table 1 we observe the o cial results of our proposed systems during
the eRisk competition. Notice that the last three rows from Table 1 report the
maximum, minimum and average values for each metric obtained from all the
participating systems5. According to the ERDE metrics our best con guration
was the LyRE system, however, as we mentioned before, this system began to
emit decisions very late on the process, in chunk 9 (see Figure 1). For this reason,
we assume as our best con guration the LyRD system, which obtains minimum
error rates and better precision values.
4</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusions</title>
      <p>In this paper, we have described the experiments performed by the Language
and Reasoning Research Group from UAM Cuajimalpa in the context of the
eRisk 2017 pilot task. Our proposed system was designed for addressing the
posed problem by means of using as main form of representation graphs
models; a graph model considers the order of terms' appearances in the original
text, thus incorporating valuable contextual information to the representation.
5 Further details on each participating system can be found in [7].</p>
      <p>Even though this type of representation has been evaluated on thematic text
classi cation, our main goal was to determine its pertinence on non-thematic
classi cation tasks. Thus, the main hypothesis is that through this
representation we can capture patterns common in the content and style from depressed
and non-depressed users, such as recurring and neighboring character and word
sequences.</p>
      <p>From this exercise we have learned that our proposed method performs
better when character n-grams are employed to build the graph models. However,
combining this representation with graph models based on single words allows
to identify more depressed users in early stages, but not in general. In addition,
our results indicate that depressed users seem to reach a climax point during
the rst chunks of its writing, and as the times passes, they return to more
\traditional" writing. Nonetheless, we need to perform more experiments in order
to validate this hypothesis, for instance, to evaluate our proposal at modeling
di erent depression stages.</p>
      <p>As future work we plan to incorporate more features to our representation,
for instance POS tags and word n-grams, which could help to identify more
elaborated regularities among users. We also plan to model the writing
characteristics from users at di erent chunks individually. And nally we want to
perform experiments using di erent classi cation methods.</p>
      <p>Acknowledgments. This work was partially funded by CONACYT under
project grant number 258588. We also thank to UAM Cuajimalpa for the
provided support.
7. D. E. Losada, F. Crestani, and J. Parapar, \eRISK 2017: CLEF Lab on Early Risk
Prediction on the Internet: Experimental foundations," in Proceedings Conference
and Labs of the Evaluation Forum CLEF 2017, (Dublin, Ireland), 2017.
8. G. Giannakopoulos, P. Mavridi, G. Paliouras, G. Papadakis, and K. Tserpes,
\Representation models for text classi cation: A comparative analysis over three web
document types," in Proceedings of the 2Nd International Conference on Web
Intelligence, Mining and Semantics, WIMS '12, (New York, NY, USA), pp. 13:1{13:12,
ACM, 2012.
9. G. Giannakopoulos, V. Karkaletsis, G. Vouros, and P. Stamatopoulos,
\Summarization system evaluation revisited: N-gram graphs," ACM Trans. Speech Lang.</p>
      <p>Process., vol. 5, pp. 5:1{5:39, Oct. 2008.
10. S. R. Garner, \Weka: The waikato environment for knowledge analysis," in In Proc.
of the New Zealand Computer Science Research Students Conference, pp. 57{64,
1995.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>G.</given-names>
            <surname>Inches</surname>
          </string-name>
          and
          <string-name>
            <given-names>F.</given-names>
            <surname>Crestani</surname>
          </string-name>
          , \
          <article-title>Overview of the international sexual predator identi - cation competition at pan-2012.," in CLEF (Online working notes</article-title>
          /labs/workshop), vol.
          <volume>30</volume>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>E.</given-names>
            <surname>Villatoro-Tello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Juarez-Gonzalez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. J.</given-names>
            <surname>Escalante</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.</surname>
          </string-name>
          <article-title>Montes-y-</article-title>
          <string-name>
            <surname>Gomez</surname>
            , and
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Villasen</surname>
          </string-name>
          <article-title>~or-Pineda, \A two-step approach for e ective detection of misbehaving users in chats," in</article-title>
          <string-name>
            <surname>CLEF</surname>
          </string-name>
          (Online Working Notes/Labs/Workshop) (P. Forner,
          <string-name>
            <given-names>J.</given-names>
            <surname>Karlgren</surname>
          </string-name>
          , and
          <string-name>
            <surname>C.</surname>
          </string-name>
          Womser-Hacker, eds.),
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>H. J.</given-names>
            <surname>Escalante</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Villatoro-Tello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Juarez</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.</surname>
          </string-name>
          <article-title>Montes-y-</article-title>
          <string-name>
            <surname>Gomez</surname>
          </string-name>
          , and L. Villasen~or, \
          <article-title>Sexual predator detection in chats with chained classi ers,"</article-title>
          <source>in Proceedings of the 4th Workshop on Computational Approaches</source>
          to Subjectivity, Sentiment and
          <string-name>
            <surname>Social Media Analysis</surname>
          </string-name>
          ,
          <source>(Atlanta, Georgia)</source>
          , pp.
          <volume>46</volume>
          {
          <issue>54</issue>
          ,
          <string-name>
            <surname>Association</surname>
          </string-name>
          for Computational Linguistics,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>A.</given-names>
            <surname>Callejas-Rodr guez</surname>
          </string-name>
          , E.
          <string-name>
            <surname>Villatoro-Tello</surname>
            ,
            <given-names>I. Meza</given-names>
          </string-name>
          , and
          <string-name>
            <surname>G.</surname>
          </string-name>
          <article-title>Ram rez-de-la Rosa, From Dialogue Corpora to Dialogue Systems: Generating a Chatbot with Teenager Personality for Preventing Cyber-Pedophilia</article-title>
          , pp.
          <volume>531</volume>
          {
          <fpage>539</fpage>
          . Cham: Springer International Publishing,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>K</surname>
            .-
            <given-names>Y.</given-names>
          </string-name>
          <string-name>
            <surname>Jeong and K.-S. Le</surname>
          </string-name>
          , \
          <article-title>Follower behavior analysis via in uential transmitters on social issues in twitter,"</article-title>
          <source>Computacion y Sistemas</source>
          , vol.
          <volume>20</volume>
          , no.
          <issue>3</issue>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Losada</surname>
          </string-name>
          and
          <string-name>
            <given-names>F.</given-names>
            <surname>Crestani</surname>
          </string-name>
          , \
          <article-title>A test collection for research on depression and language use," in Conference Labs of the Evaluation Forum</article-title>
          , p.
          <fpage>12</fpage>
          , Springer,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>