<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>INAOE-CIMAT at eRisk 2019: Detecting Signs of Anorexia using Fine-Grained Emotions</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mario Ezra Aragón</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>A. Pastor López-Monroy</string-name>
          <email>pastor.lopez@cimat.mx</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Manuel Montes-y-Gómez</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Centro de Investigación en Matemáticas (CIMAT)</institution>
          ,
          <country country="MX">Mexico</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Instituto Nacional de Astrofísica, Óptica y Electrónica (INAOE)</institution>
          ,
          <country country="MX">Mexico</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper, we present our approach to the detection of anorexia at eRisk 2019. The main objective of this shared task is to identify, as soon as possible, whether a user shows signs of anorexia based on their Reddit posts. To this end, we evaluate a representation called Bag of Sub-Emotions (BoSE), a new technique that represents user posts by means of a set of fine-grained emotions. First, emotions are defined according to the categories of a given lexical resource; then, fine-grained emotions are discovered by clustering the word vectors in each category. For our participation, we evaluated different strategies that perform early predictions with this representation based on the temporal stability of the user's predictions. The proposed approach performs better than the average of the other participants' results; in addition, due to its interpretability and simplicity, it offers an excellent opportunity for the analysis and detection of mental disorders in social media.</p>
      </abstract>
      <kwd-group>
        <kwd>eRisk 2019</kwd>
        <kwd>Anorexia Detection</kwd>
        <kwd>Bag of Sub-Emotions</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Anorexia nervosa is an eating disorder that affects many adolescents and young
adults. It involves a desire to lose weight through excessive restriction of the
number of calories and the types of food eaten. Anorexia is characterized by
difficulty maintaining an appropriate body weight and, in many people, by
a distorted body image. It can affect people of all ages and genders. The
2019 Early Risk Prediction on the Internet (eRisk@CLEF) shared Task 1 aims
to address this problem by using Natural Language Processing
(NLP) techniques and machine learning. The main goal is to identify, as soon as possible, whether a user
presents signs of anorexia by processing their post history as
pieces of evidence. Posts are processed in the order in which they were created, which amounts to
sequentially monitoring the user's interactions on social media platforms.</p>
      <p>In this work, we describe the joint participation of INAOE-CIMAT, two
research centers from Mexico, in this forum using a new representation that
we have called Bag of Sub-Emotions (BoSE). It is an interpretable and
straightforward approach based on fine-grained emotions that capture the specific
emotions users express in their posts. This representation is created by
applying a clustering algorithm over a lexical resource of emotions and then masking
the users' posts to generate a histogram of these new emotions. We evaluate
our representation using five different strategies for early prediction.</p>
      <p>The remainder of this paper is organized as follows: Section 2 presents related work
on anorexia detection and early prediction. Section 3 describes our
new text representation based on fine-grained emotions. Sections 4 and 5
present the experimental settings and the obtained results. Lastly, Section
6 presents our conclusions.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        In this section, we review different works related to the detection
of anorexia in social media. Anorexia is the most common Eating Disorder (ED)
related to a mental disorder, and consists of unusual eating habits or
abnormal attitudes towards food [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Several works in the literature have focused on
analyzing user-generated content from social media platforms to identify
signs of anorexia. Some of these works analyze the user posts
to generate syntactic and semantic features [
        <xref ref-type="bibr" rid="ref2 ref3 ref4 ref5 ref6">2–6</xref>
        ], exploring the words
that are often used by people with signs of anorexia. Another well-known strategy
is to employ words or dictionaries that are related to anorexia, and then
create a representation based on the occurrence or frequency of such words [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
Deep learning techniques have also been explored, and they
achieve competitive results [
        <xref ref-type="bibr" rid="ref2 ref6 ref8">2, 6, 8</xref>
        ]. Last but not least, a traditional type
of strategy is to exploit sentiment analysis to create emotional features that
represent each user post [
        <xref ref-type="bibr" rid="ref5 ref7">5, 7</xref>
        ]. Inspired by this last approach, we explore the
usefulness of a representation based on a set of automatically learned fine-grained
emotions, which help to model the emotional profile of users in a more specific
way.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Representation</title>
      <p>In this section, we describe the representation that we used to participate in the
shared task. Our approach is inspired by the hypothesis that emotions are better
represented at a finer level, instead of only using general concepts such as "anger" or
"joy".</p>
      <p>Figure 1 illustrates the general steps of our proposed approach. The first part
covers the generation of the fine-grained emotions given an emotion lexicon.
The second part depicts a masking process that turns the text into a sequence of fine-grained emotion
tokens, followed by the creation of their histogram as the final representation.</p>
      <p>
        Generate Fine-Grained Emotions: We use a lexical resource to compute
a set of fine-grained emotions based on eight recognized main emotions (Anger,
Anticipation, Disgust, Fear, Joy, Sadness, Surprise and Trust) [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] and two main
sentiments (Positive and Negative). In this stage, we compute a FastText word vector
for each word present in the lexical resource. Then, for each emotion separately, we create
subgroups of its words with the Affinity Propagation
clustering algorithm and use the resulting centroids (prototypes) as a new vocabulary
of fine-grained emotions.
      </p>
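      <p>The clustering step described above can be sketched as follows. This is only a minimal illustration, assuming scikit-learn's AffinityPropagation with default parameters and toy two-dimensional vectors in place of FastText embeddings; the function name, the toy lexicon, and the naming scheme of the fine-grained emotions (emotion name plus cluster index) are assumptions made for the example, not the authors' implementation.</p>
      <preformat>
```python
import numpy as np
from sklearn.cluster import AffinityPropagation


def fine_grained_emotions(lexicon, vectors, random_state=5):
    """Cluster the word vectors of each emotion category separately.

    lexicon: dict mapping an emotion name to its list of words.
    vectors: dict mapping a word to its embedding (assumed precomputed).
    Returns a dict: fine-grained emotion name -> centroid and member words.
    """
    sub_emotions = {}
    for emotion, words in lexicon.items():
        X = np.array([vectors[w] for w in words], dtype=float)
        ap = AffinityPropagation(random_state=random_state).fit(X)
        # Each cluster centre (an exemplar word vector) becomes one
        # fine-grained emotion of this category.
        for idx, centroid in enumerate(ap.cluster_centers_):
            members = [w for w, lab in zip(words, ap.labels_) if lab == idx]
            sub_emotions[f"{emotion}{idx}"] = {
                "centroid": centroid,
                "words": members,
            }
    return sub_emotions


# Toy 2-D vectors standing in for FastText embeddings (illustrative only).
toy_vectors = {
    "bruising": [1, 2], "contusion": [1, 4], "bleeding": [1, 0],
    "furious": [4, 2], "rage": [4, 4], "outrage": [4, 0],
}
toy_lexicon = {"anger": list(toy_vectors)}
sub_emotions = fine_grained_emotions(toy_lexicon, toy_vectors)
```
      </preformat>
      <p>Because Affinity Propagation chooses the number of clusters itself, the number of fine-grained emotions per category does not have to be fixed in advance in this sketch.</p>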
      <p>
        Convert Text to Fine-Grained Emotions: Once we have computed the
fine-grained emotions, we use them to mask the text by measuring the cosine
distance between the words in the documents and the fine-grained emotions.
Then, we represent the posts of each user by creating a histogram of the frequencies
of the fine-grained emotions. We named this representation BoSE, for Bag of
Sub-Emotions (see [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] for more details).
      </p>
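      <p>A minimal sketch of the masking and histogram steps is given below. It assumes that each word is replaced by the fine-grained emotion whose centroid is closest in cosine distance, that words without a vector are skipped, and that the histogram is normalized by the number of masked tokens; these choices, the function names, and the toy vectors are assumptions for illustration only.</p>
      <preformat>
```python
import math
from collections import Counter


def cosine_distance(u, v):
    """Return 1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def mask_text(tokens, word_vectors, centroids):
    """Replace each known token by its closest fine-grained emotion."""
    masked = []
    for token in tokens:
        if token not in word_vectors:
            continue  # assumption: tokens without a vector are skipped
        vec = word_vectors[token]
        closest = min(
            centroids,
            key=lambda name: cosine_distance(vec, centroids[name]),
        )
        masked.append(closest)
    return masked


def bose_histogram(masked, vocabulary):
    """Relative frequency of each fine-grained emotion, in vocabulary order."""
    counts = Counter(masked)
    total = sum(counts.values()) or 1
    return [counts[name] / total for name in vocabulary]


# Toy centroids and word vectors (illustrative values only).
centroids = {"anger4": [1.0, 0.0], "fear19": [0.0, 1.0]}
word_vectors = {"bruising": [0.9, 0.1], "food": [0.1, 0.9], "eat": [0.2, 0.8]}

masked = mask_text(["food", "bruising", "eat", "xyz"], word_vectors, centroids)
histogram = bose_histogram(masked, ["anger4", "fear19"])
```
      </preformat>
      <p>In this toy example, "food" and "eat" are masked as fear19 and "bruising" as anger4, so the resulting histogram assigns one third of the mass to anger4 and two thirds to fear19.</p>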
    </sec>
    <sec id="sec-4">
      <title>Experimental Settings</title>
      <p>
        This shared task is a continuation of the eRisk 2018 Task 2 [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], which consists of
detecting traces of anorexia in Reddit users as soon as possible. This is
done by sequentially processing the users' posts. This year, the organizers modified
the way the data is released: in 2017 and 2018 it was released in chunks of variable length
(e.g., users that wrote more would have more information per chunk),
whereas now it is released item by item. This means that a server
iteratively provides user writings in chronological order, using a token identifier
for each team. For each writing that the server provides, we need to respond with
a prediction in order to continue with the next round of posts; otherwise, the server
keeps waiting.
      </p>
      <p>Our objective for the shared Task 1 is to decide whether a user presents signs of
anorexia. To produce the labels for each user, we apply a preprocessing and a classification procedure
every five posts. We then use five different strategies to send
the predictions. We explain the whole process below.</p>
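      <p>The item-by-item protocol combined with a classification every five posts can be sketched as the loop below. The server is simulated by a plain list of rounds; the function names, the toy classifier, and the choice of repeating the last decision between two classification steps are assumptions for illustration, since these details are not specified here.</p>
      <preformat>
```python
def run_rounds(rounds, classify, every=5):
    """Simulate the item-by-item protocol.

    rounds: list of dicts {user_id: post}, one dict per server round.
    classify: callable taking a user's post history and returning 0 or 1.
    every: re-classify a user each time this many new posts have arrived.
    Returns the list of decisions sent back after each round.
    """
    history = {}        # user_id -> list of posts seen so far
    last_decision = {}  # user_id -> most recent classifier decision
    sent = []
    for writings in rounds:
        decisions = {}
        for user, post in writings.items():
            history.setdefault(user, []).append(post)
            if len(history[user]) % every == 0:
                last_decision[user] = classify(history[user])
            # Assumption: before the first classification the answer is 0,
            # and between classifications the last decision is repeated.
            decisions[user] = last_decision.get(user, 0)
        sent.append(decisions)
    return sent


def toy_classifier(posts):
    """Hypothetical stand-in for the real model: flags any mention of food."""
    return int(any("food" in post for post in posts))


posts = ["hi", "i skipped food", "ok", "fine", "tired", "sleep"]
rounds = [{"u1": post} for post in posts]
answers = [decision["u1"] for decision in run_rounds(rounds, toy_classifier)]
```
      </preformat>
      <p>With this toy input, the user is first classified after the fifth post, so the answers for the first four rounds are 0 and the last two are 1.</p>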
      <p>Preprocessing: For the experiments, the posts are normalized by removing
special characters and lowercasing all the words. After this step, the texts are
masked using the fine-grained emotions previously computed.</p>
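      <p>As a small illustration of this normalization, the sketch below lowercases a post and replaces every character that is not a letter, a digit, or whitespace by a space. Which characters count as "special" is an assumption of this example, and the function name is hypothetical.</p>
      <preformat>
```python
import re


def normalize_post(text):
    """Lowercase a post and strip special characters."""
    text = text.lower()
    # Assumption: anything other than a-z, 0-9 and whitespace is "special".
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    # Collapse the runs of whitespace left by the removals.
    return re.sub(r"\s+", " ", text).strip()


example = normalize_post("I can't EAT!!  Today...")
```
      </preformat>
      <p>Here the example post becomes "i can t eat today", which can then be tokenized on whitespace before masking.</p>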
      <p>
        Classification: Once we have built the BoSE representation, we select the most
relevant features (sequences of fine-grained emotions) by using the chi-squared
distribution χ<sup>2</sup><sub>k</sub> [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. To classify the users, we use a Support Vector Machine (SVM)
with a linear kernel and C = 1.
      </p>
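      <p>One possible way to assemble this classification step is sketched below, assuming scikit-learn's SelectKBest with the chi2 scoring function followed by an SVC with a linear kernel and C = 1. The number of selected features and the toy count matrix are assumptions for illustration; the actual feature count is not stated in the text.</p>
      <preformat>
```python
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC


def build_classifier(k_features):
    """Chi-squared feature selection followed by a linear SVM with C = 1."""
    return make_pipeline(
        SelectKBest(chi2, k=k_features),  # k_features is an assumed setting
        SVC(kernel="linear", C=1),
    )


# Toy histograms of three fine-grained emotions (rows are users).
# The third feature is uninformative: it has the same counts in both classes.
X_train = [[3, 0, 1], [4, 0, 0], [5, 1, 1], [0, 3, 1], [0, 4, 0], [1, 5, 1]]
y_train = [1, 1, 1, 0, 0, 0]  # 1 = anorexia, 0 = control

model = build_classifier(k_features=2).fit(X_train, y_train)
predictions = model.predict([[4, 0, 1], [0, 4, 1]])
```
      </preformat>
      <p>The chi-squared scores require non-negative features, which holds for histograms of fine-grained emotions. Strategies that rely on class probabilities would additionally need a probability estimate from the classifier, which this sketch does not cover.</p>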
      <p>Prediction making: For each post that the server provides, we need to
make a prediction stating whether or not the user presents signs of anorexia; the
main idea is to make a correct detection as soon as possible. We tackled the
task with the following five strategies: (i) we use the label obtained
directly from the classifier; (ii) we use the probability of the label and assign the
positive label if the probability of belonging to that class is higher than 60%; (iii) similar
to the first strategy, we use the label obtained directly from the classifier,
but only assign the label 1 if the user was also detected as positive in the previous
prediction; (iv) the user is classified as positive if the probability given by the
classifier is higher than 60% in both the current and the previous predictions; (v) similar
to the fourth strategy, but the classification probability needs to be higher than
70%.</p>
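      <p>The five decision strategies can be summarized in a few lines of code. The sketch below assumes that, for one user, the history of classifier labels and positive-class probabilities is available with the most recent value last, and that a missing previous prediction counts as negative; the function name and these conventions are assumptions for illustration.</p>
      <preformat>
```python
def decide(strategy, labels, probs):
    """Return 1 (positive) or 0 (negative) for the current prediction.

    strategy: integer from 1 to 5, following the order (i) to (v).
    labels: classifier labels for one user, most recent last.
    probs: positive-class probabilities for one user, most recent last.
    """
    cur_label, cur_prob = labels[-1], probs[-1]
    has_previous = len(labels) > 1
    # Assumption: without a previous prediction, treat it as negative.
    prev_label = labels[-2] if has_previous else 0
    prev_prob = probs[-2] if has_previous else 0.0

    if strategy == 1:  # (i) label taken directly from the classifier
        return cur_label
    if strategy == 2:  # (ii) probability above 60%
        return int(cur_prob > 0.6)
    if strategy == 3:  # (iii) positive label now and in the previous step
        return int(cur_label == 1 and prev_label == 1)
    if strategy == 4:  # (iv) probability above 60% now and previously
        return int(cur_prob > 0.6 and prev_prob > 0.6)
    if strategy == 5:  # (v) probability above 70% now and previously
        return int(cur_prob > 0.7 and prev_prob > 0.7)
    raise ValueError("strategy must be an integer from 1 to 5")


labels = [0, 1, 1]
probs = [0.40, 0.65, 0.68]
decisions = [decide(s, labels, probs) for s in range(1, 6)]
```
      </preformat>
      <p>For this toy history, strategies (i) to (iv) answer positive, while strategy (v) stays negative because neither of the last two probabilities exceeds 70%. Strategies (iii) to (v) trade a later alert for more stable decisions.</p>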
    </sec>
    <sec id="sec-5">
      <title>Experimental Results</title>
      <p>To determine the parameters of the model before making predictions on the server,
we first evaluated our model on the dataset provided in 2018. In that corpus,
there are two categories of users: those with anorexia and control users. We measured the
F1 score of the predictions made using the whole post history of the users. In Table 1 we
present the results obtained on the training dataset; we compare our approach
with traditional representations such as Bag of Words (BoW) using unigrams and
n-grams, and with a representation based on the core emotions that we named Bag of
Emotions (BoE).</p>
      <p>For the test dataset, we trained our model using all the users in the training
dataset and then determined whether or not the users show traces of anorexia
using the five different strategies described in Section 4. Table 2 shows the results
obtained by these strategies on the test dataset. Note that
run1 did not work on the server, and we still do not know the reason for this;
therefore, its results are not included in the table. The strategy that obtained
the best results is the fourth (marked as run3); it consists of classifying the
user as positive if the probability is higher than 60% in both the current and the previous
predictions. It thus exploits the temporal stability of the classifier, since
it requires two consecutive positive predictions for the user.</p>
      <p>To further analyze our results, the first part of Figure 2 presents
a boxplot of all the results obtained for the F1 measure and the latency-weighted F1;
the green X mark represents the position of our results. We can observe that
our results for both evaluation metrics are in the highest quartile, which indicates
good performance on this task.</p>
      <p>
        In the second part of Figure 2, we present the boxplots of the results of all
participants according to the ERDE<sub>5</sub> and ERDE<sub>50</sub> evaluation metrics. In
these results, our approach is placed in the middle quartile. This performance
was somewhat expected, since our approach does not focus on fast prediction but
rather on the temporal stability of the predictions. The overall results
of the task, as well as a complete analysis of every team's approach, are presented in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>The interpretability of our method allows us to further analyze what
is captured by the fine-grained emotions. To this end, we selected some of the most relevant
fine-grained emotions for the detection according to the chi-squared distribution. In
Table 3, we present some of these fine-grained emotions as well as some of the words
that correspond to them. We can observe that most of the emotions are related
to physical or mental harm, such as bruising, breakdown, and abandoned, or to body parts
near the stomach or intestine, which are topics that people commonly associate
with anorexia.</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusions</title>
      <p>In this paper, we presented our approach to decide whether a user presents signs of
anorexia by using their post history in chronological order and making a
prediction as soon as possible. We proposed a new representation that automatically
creates fine-grained emotions using a lexical resource of emotions and FastText
sub-word embeddings. The main idea behind these fine-grained emotions is
that our representation can capture more specific emotions and topics that the
users express through their posts, and thus help to detect users who potentially have
anorexia. On the training dataset, our representation obtained better results
than most of the best methods of previous eRisk participants. The simplicity and
interpretability of our representation are worth mentioning, in contrast to
other methods that are more complex, in particular those that use
many different features and different models, from traditional to deep ones. On the
test dataset, our representation also obtained good results in comparison with
most of this year's participants, providing evidence of the usefulness of
capturing the specific emotional content of users that have anorexia. Our results
represent an opportunity to use BoSE in other health tasks, such as the detection of depression
or Post-Traumatic Stress Disorder (PTSD).</p>
      <table-wrap id="tab-3">
        <label>Table 3</label>
        <caption>
          <p>Examples of relevant fine-grained emotions and some of the words that correspond to them.</p>
        </caption>
        <table>
          <thead>
            <tr><th>Fine-grained emotion</th><th>Words</th></tr>
          </thead>
          <tbody>
            <tr><td>anger4</td><td>bruising, contusion, bleeding, fracture</td></tr>
            <tr><td>disgust32</td><td>breakdown, fight, crushed, abandoned</td></tr>
            <tr><td>disgust21</td><td>stomach, intestinal, bile, esophagus</td></tr>
            <tr><td>negative65</td><td>bathroom, toilet, washroom</td></tr>
            <tr><td>anticip10</td><td>hurting, refused, anxious, afraid</td></tr>
            <tr><td>anticip12</td><td>ashamed, embarrass, upset, disgust</td></tr>
            <tr><td>fear19</td><td>food, eating, eat, consume</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>This research was supported by CONACyT-Mexico (Scholarship 654803 and
Project FC-2410).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Aragon</surname>
          </string-name>
          , ME.,
          <string-name>
            <surname>Lopez-Monroy</surname>
          </string-name>
          , AP.,
          <string-name>
            <surname>Gonzalez-Gurrola</surname>
          </string-name>
          , LC.,
          <string-name>
            <surname>Montes-y-Gomez</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Detecting Depression in Social Media using Fine-Grained Emotions</article-title>
          .
          <source>Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>1</volume>
          (Long and Short Papers). (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Trotzek</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koitka</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Friedrich</surname>
          </string-name>
          , CM.:
          <article-title>Word Embeddings and Linguistic Metadata at the CLEF 2018 Tasks for Early Detection of Depression and Anorexia</article-title>
          .
          <source>Proceedings of the 9th International Conference of the CLEF Association, CLEF</source>
          <year>2018</year>
          , Avignon, France. (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Ramiandrisoa</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mothe</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Farah</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moriceau</surname>
          </string-name>
          , V.:
          <article-title>IRIT at e-Risk 2018</article-title>
          .
          <source>Proceedings of the 9th International Conference of the CLEF Association, CLEF</source>
          <year>2018</year>
          , Avignon, France. (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Ortega-Mendoza</surname>
          </string-name>
          , RM.,
          <string-name>
            <surname>Lopez-Monroy</surname>
          </string-name>
          , AP.,
          <string-name>
            <surname>Franco-Arcega</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Montes-Y-Gomez</surname>
            ,
            <given-names>M.</given-names>:
          </string-name>
          <article-title>PEIMEX at eRisk2018: Emphasizing Personal Information for Depression and Anorexia Detection</article-title>
          .
          <source>Proceedings of the 9th International Conference of the CLEF Association, CLEF</source>
          <year>2018</year>
          , Avignon, France. (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Ramírez-Cifuentes</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Freire</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>UPF's Participation at the CLEF eRisk 2018: Early Risk Prediction on the Internet</article-title>
          .
          <source>Proceedings of the 9th International Conference of the CLEF Association, CLEF</source>
          <year>2018</year>
          , Avignon, France. (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xin</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ren</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <source>TUA1 at eRisk 2018. Proceedings of the 9th International Conference of the CLEF Association, CLEF</source>
          <year>2018</year>
          , Avignon, France. (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Ragheb</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moulahi</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Azé</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bringay</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Servajean</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Temporal Mood Variation: at the CLEF eRisk-2018 Tasks for Early Risk Detection on The Internet</article-title>
          .
          <source>Proceedings of the 9th International Conference of the CLEF Association, CLEF</source>
          <year>2018</year>
          , Avignon, France. (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Wang</surname>
          </string-name>
          , YT.,
          <string-name>
            <surname>Huang</surname>
          </string-name>
          , HH.,
          <string-name>
            <surname>Chen</surname>
          </string-name>
          , HH.:
          <article-title>A Neural Network Approach to Early Risk Detection of Depression and Anorexia on Social Media Text</article-title>
          .
          <source>Proceedings of the 9th International Conference of the CLEF Association, CLEF</source>
          <year>2018</year>
          , Avignon, France. (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Walck</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Handbook on Statistical Distributions for Experimentalists</article-title>
          . University of Stockholm,
          <source>Internal Report SUFPFY/9601</source>
          . (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10. Losada, DE.,
          <string-name>
            <surname>Crestani</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parapar</surname>
          </string-name>
          , J.:
          <source>Overview of eRisk</source>
          <year>2018</year>
          :
          <article-title>Early Risk Prediction on the Internet (extended lab overview)</article-title>
          .
          <source>Proceedings of the 9th International Conference of the CLEF Association, CLEF</source>
          <year>2018</year>
          , Avignon, France. (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11. Losada, DE.,
          <string-name>
            <surname>Crestani</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parapar</surname>
          </string-name>
          , J.:
          <source>Overview of eRisk</source>
          <year>2019</year>
          :
          <article-title>Early Risk Prediction on the Internet</article-title>
          .
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction. 10th International Conference of the CLEF Association, CLEF</source>
          <year>2019</year>
          , Lugano, Switzerland.
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Mohammad</surname>
            ,
            <given-names>S.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Turney</surname>
          </string-name>
          , P.D.:
          <article-title>Crowdsourcing a Word-Emotion Association Lexicon</article-title>
          .
          <source>Computational Intelligence</source>
          . (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13. American Psychiatric Association:
          <article-title>Diagnostic and Statistical Manual of Mental Disorders</article-title>
          .
          <source>Fourth Edition</source>
          . Washington, DC: American Psychiatric Press. (
          <year>1994</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>