<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Integrating UMLS for Early Detection of Signs of Anorexia</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Flor Miriam Plaza-del-Arco</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pilar López-Úbeda</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Manuel C. Díaz-Galiano</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>L. Alfonso Ureña-López</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>M. Teresa Martín-Valdivia</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, Advanced Studies Center in ICT (CEATIC), Universidad de Jaén</institution>
          ,
          <addr-line>Campus Las Lagunillas, 23071, Jaén</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Mental disorders are one of the main concerns of today's society, and early detection of their symptoms can greatly help the people who suffer from these illnesses. Nowadays, social media play an important role in people's mental health, so processing this information with NLP technologies can help to automatically detect mental problems such as eating disorders. In this paper, we describe our participation in CLEF eRisk 2019, specifically in Task 1: Early Detection of Signs of Anorexia. We have developed three systems based on machine learning. Our main contribution is the incorporation of external knowledge, such as UMLS and similarity embeddings, into our systems. Our results show that the use of biomedical ontologies improves the performance of the systems.</p>
      </abstract>
      <kwd-group>
        <kwd>Anorexia</kwd>
        <kwd>SVM</kwd>
        <kwd>TF-IDF</kwd>
        <kwd>Similarity Embeddings</kwd>
        <kwd>UMLS</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Mental disorders are among the diseases that most concern society today. They encompass a wide range of problems with different symptoms, but they are usually characterized by a combination of abnormal thoughts, perceptions, emotions, behaviour, and relationships with others. Examples include anxiety, dissociative identity disorder, depression, bipolar disorder, schizophrenia, and anorexia nervosa.</p>
      <p>According to the World Health Organization, 450 million people suffer from a mental or behavioural disorder, one in four families has at least one member affected by a mental disorder, and about 1 million people commit suicide each year. Mental disorders often influence other diseases such as cancer or cardiovascular disease. Consequently, people with this type of problem have disproportionately high rates of disability and mortality.</p>
      <p>
        Nowadays, social media play an important role in people's mental health [
        <xref ref-type="bibr" rid="ref17 ref4">17, 4</xref>
        ]. The language and vocabulary that users employ to express themselves on social media may reveal feelings of guilt, helplessness, hatred, or contempt for themselves, which are some of the symptoms of depression [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. People suffering from eating disorders, such as anorexia and bulimia, can often be identified through their use of certain keywords that characterize and promote these disorders [
        <xref ref-type="bibr" rid="ref1 ref19">1, 19</xref>
        ].
      </p>
      <p>
        The burden of mental disorders continues to grow, with significant impacts on health and major social, human rights, and economic consequences in all countries of the world. Technology can be applied to develop systems that detect mental disorders on social media. These models use features or variables extracted from labeled user-generated data [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. The platforms most commonly used to collect such data are Twitter, Facebook, and Reddit [
        <xref ref-type="bibr" rid="ref15 ref19 ref6">6, 15, 19</xref>
        ].
      </p>
      <p>The most common features used to build predictive models are derived from the users' texts: topics, frequencies of individual words or word n-grams, sentiment-analysis features that measure the subjectivity of a sentence, and features derived from lexicons such as LIWC that measure the use of self-references, social words, and emotions.</p>
      <p>
        In this paper, we present the different systems we developed for our participation in CLEF eRisk 2019: Early Risk Prediction on the Internet [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. The lab proposes three tasks: Task 1 addresses the early detection of signs of anorexia, Task 2 the early detection of signs of self-harm, and Task 3 the measurement of the severity of the signs of depression. We participated in Task 1. This task was introduced in 2018 and consists of sequentially processing pieces of evidence to detect early traces of anorexia as soon as possible. The data source is the same as that used in eRisk 2017 [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] and 2018 [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]: a collection of writings (posts or comments) from a set of social media users. There are two categories of users, anorexia and non-anorexia, and for each user the collection contains a sequence of writings in chronological order.
      </p>
      <p>The rest of the paper is structured as follows. Section 2 describes the data used in our methods. Section 3 presents the details of the proposed systems. Section 4 discusses the analysis and evaluation results of our systems. Section 5 concludes with remarks and future work.</p>
    </sec>
    <sec id="sec-2">
      <title>Dataset</title>
      <p>
        The dataset used in the eRisk 2019 task on early detection of signs of anorexia has the same format as the collection described by Losada and Crestani [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. This year's dataset contains the training and test data used in 2018 [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. The collection consists of writings obtained from the social media platform Reddit.
      </p>
      <p>Since the goal is the early detection of signs of anorexia, this task takes the timeline into account. The writings of each user are released in chunks, and we must send our answers for the current chunk before receiving the next writings. For example, in the first step the organizers give us the first writing of each user; we send our answer for each user and then receive the second set of writings, and so on.</p>
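      <p>The chunk-by-chunk protocol described above can be sketched as a simple simulation loop. This is a minimal illustration under our own assumptions, not an eRisk client: writings are held in a local dictionary, one writing per user is released per round, a positive decision is treated as final, and the names run_rounds and keyword_decider are hypothetical. The exchange with the organizers' server is not modelled.</p>

```python
from typing import Callable, Dict, List

# Hypothetical decision function: receives every writing seen so far for one
# user and returns 1 (signs of anorexia) or 0 (no alert yet).
Decider = Callable[[List[str]], int]


def run_rounds(writings: Dict[str, List[str]], decide: Decider) -> Dict[str, int]:
    """Simulate sequential processing, releasing one writing per user per round.

    Returns the round (1-based) at which each alerted user was first flagged.
    """
    seen: Dict[str, List[str]] = {user: [] for user in writings}
    alerted: Dict[str, int] = {}
    max_rounds = max((len(posts) for posts in writings.values()), default=0)
    for round_no in range(1, max_rounds + 1):
        for user, posts in writings.items():
            if user in alerted or round_no > len(posts):
                continue  # decision already emitted, or no writings left
            seen[user].append(posts[round_no - 1])
            if decide(seen[user]) == 1:
                alerted[user] = round_no
    return alerted


def keyword_decider(history: List[str]) -> int:
    """Toy rule for illustration only: alert once 'calories' is mentioned."""
    return int(any("calories" in text.lower() for text in history))


if __name__ == "__main__":
    toy = {"u1": ["hello", "counting calories again"], "u2": ["nice day", "weekend plans"]}
    print(run_rounds(toy, keyword_decider))  # {'u1': 2}
```

      <p>Keeping the full history per user lets the decision function use all the evidence seen so far; the trade-off between waiting for more writings and alerting early is left to the decision function.</p>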
      <p>The training data consist of all the writings of all users, explicitly indicating which users have been diagnosed with anorexia. The 2019 test collection is composed of 849 users and 2000 chunks of writings, and the messages are dated after January 2011.</p>
      <p>Before developing the systems, we computed some statistics on the training corpus, which are shown in Table 1. Sentences and words were tokenized with the Natural Language Toolkit (NLTK) library in Python.</p>
      <p>We now describe the systems created for this task. All our systems are based on machine learning, specifically on Support Vector Machines (SVM).</p>
      <p>
        Figure 1 shows the architecture of our experiments. We make use of external resources such as the spaCy library (https://spacy.io/) and UMLS, explained in Section 3.3 and Section 3.4, respectively. So that the three experiments are carried out in the same way, we first pre-process the text using Natural Language Processing (NLP) tools and techniques. Pre-processing plays a very important role in text mining techniques and applications, since it is the first step of the text mining process [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ].
      </p>
      <p>[Figure 1: Architecture of the experiments (Systems 1, 2 and 3). Labelled components: Train, Weight matrix, Similarity embeddings, UMLS, New weight matrix, SVM system, Model.]</p>
      <p>For all our systems, we joined the title and the text of each writing into a new document, which was pre-processed as follows:
1. Convert all words to lowercase.
2. Remove multiple empty lines from the text.
3. Remove URLs from the text.
4. Keep only words that contain alphanumeric characters.</p>
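      <p>The four steps can be sketched with Python's standard library as follows. The URL pattern, the function name preprocess, and the reading of step 4 (keep a whitespace-separated word if it contains at least one alphanumeric character) are our assumptions rather than details given above.</p>

```python
import re

URL_RE = re.compile(r"(https?://\S+|www\.\S+)")  # assumed URL pattern
EMPTY_LINES_RE = re.compile(r"\n\s*\n")          # a line break followed by blank lines


def preprocess(title: str, text: str) -> str:
    """Join title and text into one document and apply steps 1-4."""
    doc = f"{title}\n{text}"
    doc = doc.lower()                       # 1. lowercase
    doc = EMPTY_LINES_RE.sub("\n", doc)     # 2. collapse runs of empty lines
    doc = URL_RE.sub(" ", doc)              # 3. remove URLs
    # 4. keep whitespace-separated words with at least one alphanumeric character
    words = [w for w in doc.split() if any(ch.isalnum() for ch in w)]
    return " ".join(words)


if __name__ == "__main__":
    print(preprocess("My Title", "Check https://example.com NOW!!\n\n\n--- ok"))
    # my title check now!! ok
```

      <p>Because the last step splits on whitespace, the collapsing of empty lines only matters if line structure is used before tokenization; it is kept here to mirror the list above.</p>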
      <sec id="sec-2-1">
        <title>Baseline system</title>
        <p>In the first system, each sentence is represented as a vector of unigrams weighted with the term frequency-inverse document frequency (TF-IDF) scheme, and these vectors are used as features for classification with the SVM algorithm.</p>
        <p>
          SVMs are supervised learning models, with associated learning algorithms, that analyze data for binary classification. Many researchers have reported that this classifier is perhaps the most accurate method for text classification [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], and several studies have applied it specifically to the prediction of anorexia [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. In our case, we try to predict whether or not a text suggests signs of anorexia.
        </p>
        <p>TF-IDF is a numerical statistic that reflects how important a word is to a document in a collection, and it is often used as a weighting factor in text mining. The TF-IDF value increases proportionally to the number of times a word appears in the document, but is offset by the number of documents in the collection that contain the word.</p>
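        <p>For reference, the textbook form of the weight of term t in document d, for a collection of N documents of which df(t) contain t, is shown below. Libraries implement variants (for example smoothed idf, sublinear tf, and row normalization), so this formula illustrates the idea rather than the exact weighting used in our systems.</p>

```latex
\mathrm{tfidf}(t, d) = \mathrm{tf}(t, d) \times \log \frac{N}{\mathrm{df}(t)}
```

        <p>For instance, with the natural logarithm, a term that occurs 3 times in a document and appears in 10 of 1000 documents receives a weight of 3 × ln(100) ≈ 13.8.</p>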
        <p>The parameters used for TF-IDF are shown below:
- min_df = 3
- max_df = 0.
- sublinear_tf = True
- stop_words = English stop words
- use_idf = True
- tokenizer = spaCy tokenizer with the en_core_web_md model
- lowercase = True
- ngram_range = (1, 1)</p>
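        <p>The parameter names above match those of scikit-learn's TfidfVectorizer. To show what they do, the following standard-library sketch mimics that behaviour as we understand it; it is not our exact implementation. It assumes documents are already tokenized (the spaCy tokenizer is not reproduced), handles unigrams only, and leaves max_df as a parameter defaulting to 1.0 (no upper cut-off) rather than fixing a value. The function name tfidf_matrix is hypothetical.</p>

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def tfidf_matrix(docs: List[List[str]], min_df: int = 3, max_df: float = 1.0,
                 stop_words: Iterable[str] = (), sublinear_tf: bool = True,
                 use_idf: bool = True) -> Tuple[List[str], List[Dict[str, float]]]:
    """Unigram TF-IDF over pre-tokenized documents, one sparse dict per document.

    Follows our reading of scikit-learn's TfidfVectorizer defaults: smoothed
    idf = ln((1 + n) / (1 + df)) + 1, sublinear tf = 1 + ln(tf), L2-normalized rows.
    """
    stop = {w.lower() for w in stop_words}
    # lowercase = True, then drop stop words
    docs = [[t.lower() for t in doc if t.lower() not in stop] for doc in docs]
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # min_df is an absolute document count; max_df is a proportion of documents.
    vocab = sorted(t for t, c in df.items() if c >= min_df and max_df >= c / n_docs)
    idf = {t: (math.log((1 + n_docs) / (1 + df[t])) + 1) if use_idf else 1.0 for t in vocab}
    rows: List[Dict[str, float]] = []
    for doc in docs:
        counts = Counter(t for t in doc if t in idf)
        row = {t: (1 + math.log(c) if sublinear_tf else c) * idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(v * v for v in row.values())) or 1.0
        rows.append({t: v / norm for t, v in row.items()})
    return vocab, rows
```

        <p>Returning sparse dictionaries keeps the sketch short; a real pipeline would typically build a sparse matrix that the SVM can consume directly.</p>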
        <p>The approaches described in Section 3.3 and Section 3.4 build on the baseline (SVM + TF-IDF) by adding more relevant information to each document.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Similarity embeddings</title>
        <p>
          For the second system, we use word embeddings to measure similarity. Semantic similarity is a measure of the conceptual distance between two objects, based on the correspondence of their meanings [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. Distributional word vector models capture some aspects of the co-occurrence statistics of words in a language [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. Therefore, word embeddings trained on word co-occurrence counts can be used to capture the semantic similarity between words.
        </p>
        <p>The idea behind this system is to modify the values of the TF-IDF matrix for each document according to the similarity between the word anorexia and the rest of the words in the corpus. This idea arose because we observed many words related to anorexia in the corpus vocabulary, such as vomiting, appetites, mismanagement, nutrition, illness, thinness, calories, and bulimia, among others.</p>
        <p>To calculate the semantic similarity between two words, we use word vectors from the spaCy library for Python. Specifically, we use the pre-trained statistical model for English "en_core_web_md", version 1.2.0. It is composed of 685k keys and 20k unique vectors (300 dimensions), and it was trained on OntoNotes, with GloVe vectors trained on Common Crawl.
To modify the TF-IDF matrix, this system applies the following steps:
1. Load the spaCy model "en_core_web_md".
2. Load the task dataset.
3. Pre-process the dataset as explained in Section 3.1.
4. Compute the similarity of each word in the document to the word anorexia using the spaCy model.
5. Modify the TF-IDF matrix of each document by multiplying the TF-IDF value of each word by its similarity to the word anorexia.
6. Finally, use an SVM as the classifier.</p>
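        <p>Steps 4 and 5 can be sketched as follows. To keep the example self-contained, we use tiny hand-made vectors (hypothetical, not spaCy's) and compute cosine similarity directly; with spaCy, a comparable score can be obtained with nlp.vocab["anorexia"].similarity(nlp.vocab[word]). Treating words without a vector as having similarity 0 is our assumption, and the function names are hypothetical. Note that cosine similarity can be negative, which would produce negative weights; the steps above do not say how such cases were handled.</p>

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def reweight_by_similarity(rows: List[Dict[str, float]],
                           vectors: Dict[str, Sequence[float]],
                           anchor: str = "anorexia") -> List[Dict[str, float]]:
    """Multiply each TF-IDF weight by the word's similarity to the anchor word.

    Words without a vector get similarity 0.0 (an assumption of this sketch).
    """
    anchor_vec = vectors[anchor]
    sim = {w: cosine(anchor_vec, vec) for w, vec in vectors.items()}
    return [{w: weight * sim.get(w, 0.0) for w, weight in row.items()} for row in rows]


if __name__ == "__main__":
    # Hypothetical 3-dimensional vectors, for illustration only.
    toy_vectors = {"anorexia": [1.0, 0.0, 0.0], "calories": [0.8, 0.6, 0.0],
                   "weekend": [0.0, 0.0, 1.0]}
    rows = [{"calories": 0.5, "weekend": 0.5, "unseenword": 0.2}]
    print(reweight_by_similarity(rows, toy_vectors))
```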
      </sec>
      <sec id="sec-2-3">
        <title>Related concepts in UMLS</title>
        <p>For the third experiment, we use an external knowledge source related to the medical domain to add new features to each word of the message.</p>
        <p>
          In this case, we use the Unified Medical Language System (UMLS) [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. UMLS consists of three components: the Metathesaurus, the SPECIALIST Lexicon, and the Semantic Network [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ].
        </p>
        <p>The Metathesaurus contains terms and codes from many vocabularies, including ICD-10-CM, LOINC, MeSH, and SNOMED CT. The SPECIALIST Lexicon is a large syntactic lexicon of biomedical and general English, together with tools for normalizing strings, generating lexical variants, and creating indexes. Finally, the purpose of the Semantic Network is to provide a consistent categorization of all concepts represented in the Metathesaurus and a set of useful relationships between these concepts.</p>
        <p>With UMLS, we can obtain the concepts related to the concept anorexia. In UMLS, the concept anorexia has the identifier C0003123, so we only have to extract all the relationships involving that identifier.</p>
        <p>In English, we obtain 285 relationships for the concept anorexia. Each of these related concepts also has synonyms, which we also take into account. Table 3 shows some examples, with the concept identifier, the term, and its synonyms.</p>
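        <p>The text does not state how UMLS was accessed. As one possible route, assuming a local UMLS release in Rich Release Format (RRF), related concepts can be read from MRREL.RRF and English strings from MRCONSO.RRF, both pipe-delimited. In the sketch below, the column positions reflect our understanding of the RRF layout and should be checked against the documentation of the release in use; relationships are collected in both directions, which may differ from how the 285 relationships above were counted. The sample rows are built in memory with hypothetical identifiers (only C0003123 comes from the text), and all function names are hypothetical.</p>

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set

ANOREXIA_CUI = "C0003123"  # identifier given in the text

# Column positions as we understand the RRF layout; verify for your release.
MRREL_CUI1, MRREL_CUI2 = 0, 4
MRCONSO_CUI, MRCONSO_LAT, MRCONSO_STR = 0, 1, 14


def related_cuis(mrrel_lines: Iterable[str], cui: str) -> Set[str]:
    """Return the CUIs that share a relationship row with the given CUI."""
    related: Set[str] = set()
    for line in mrrel_lines:
        cols = line.rstrip("\n").split("|")
        if cols[MRREL_CUI1] == cui and cols[MRREL_CUI2] != cui:
            related.add(cols[MRREL_CUI2])
        elif cols[MRREL_CUI2] == cui and cols[MRREL_CUI1] != cui:
            related.add(cols[MRREL_CUI1])
    return related


def english_strings(mrconso_lines: Iterable[str], cuis: Set[str]) -> Dict[str, List[str]]:
    """Map each CUI to its English strings (preferred term and synonyms)."""
    names: Dict[str, List[str]] = defaultdict(list)
    for line in mrconso_lines:
        cols = line.rstrip("\n").split("|")
        if cols[MRCONSO_CUI] in cuis and cols[MRCONSO_LAT] == "ENG":
            names[cols[MRCONSO_CUI]].append(cols[MRCONSO_STR])
    return dict(names)


def rrf_row(values: Dict[int, str], n_cols: int) -> str:
    """Build a hypothetical pipe-delimited sample row (RRF rows end with '|')."""
    return "|".join(values.get(i, "") for i in range(n_cols)) + "|"


if __name__ == "__main__":
    # Hypothetical sample data: C_HYPO_1 and its strings are invented for the demo.
    mrrel = [rrf_row({0: ANOREXIA_CUI, 3: "RO", 4: "C_HYPO_1"}, 16)]
    mrconso = [
        rrf_row({0: "C_HYPO_1", 1: "ENG", 14: "Appetite loss"}, 18),
        rrf_row({0: "C_HYPO_1", 1: "SPA", 14: "Pérdida de apetito"}, 18),
    ]
    cuis = related_cuis(mrrel, ANOREXIA_CUI)
    print(cuis, english_strings(mrconso, cuis))
```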
        <p>The words found in these concepts and their synonyms are used to modify the TF-IDF matrix. They are pre-processed as follows:
1. Obtain tokens using the TweetTokenizer of the NLTK library.
2. Convert the tokens to lowercase.
3. Remove tokens that are digits.
4. Remove tokens that are stop words.
5. Remove tokens that are punctuation marks.
6. Remove tokens of length 1.</p>
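        <p>The six steps can be sketched as below. To keep the example free of external downloads, a simple regular expression stands in for NLTK's TweetTokenizer and a small illustrative stop-word subset replaces the full English list (NLTK's stopwords.words("english") requires downloading the stopwords corpus). The input strings and the function name concept_tokens are hypothetical; duplicates are dropped so that the result can serve as a dictionary of unique tokens.</p>

```python
import re
import string
from typing import Iterable, List, Set

# Illustrative stop-word subset (hypothetical); the paper used a full English list.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "or", "to", "with"}
# Regex stand-in for NLTK's TweetTokenizer: words, or single non-space symbols.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def concept_tokens(terms: Iterable[str], stop_words: Set[str] = STOP_WORDS) -> List[str]:
    """Apply steps 1-6 to UMLS concept strings; return unique tokens in order."""
    seen: Set[str] = set()
    tokens: List[str] = []
    for term in terms:
        for tok in TOKEN_RE.findall(term):           # 1. tokenize
            tok = tok.lower()                         # 2. lowercase
            if (tok.isdigit()                         # 3. digits
                    or tok in stop_words              # 4. stop words
                    or tok in string.punctuation      # 5. punctuation marks
                    or len(tok) == 1):                # 6. length 1
                continue
            if tok not in seen:
                seen.add(tok)
                tokens.append(tok)
    return tokens


if __name__ == "__main__":
    print(concept_tokens(["Anorexia nervosa, NOS", "Loss of appetite (finding)", "Weight loss 10 kg"]))
    # ['anorexia', 'nervosa', 'nos', 'loss', 'appetite', 'finding', 'weight', 'kg']
```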
        <p>This process keeps only the relevant words from the biomedical concepts, which are then given a greater weight in the matrix. A total of 525 tokens were obtained after pre-processing and stored in a dictionary for later reference. Some examples of the saved tokens are: food, bulimia, disease, anemia, abdomen, weight, appetite, anorexic, loss, appetites, mismanagement, nutrition, illness, toxic, and metabolism.</p>
        <p>As these examples from our dictionary show, some words are closely related to anorexia, so we try to give them more attention.</p>
        <p>In the TF-IDF matrix, the weights of all words included in our UMLS-derived dictionary of relevant words are modified. As a result, we obtain a new matrix in which the tokens included in our dictionary have a value of 1.</p>
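        <p>A minimal sketch of this modification, using one sparse dictionary per document. It assumes, since the text does not say, that only dictionary tokens actually present in a document are set to 1 and that rows are not re-normalized afterwards; the function name apply_umls_dictionary is hypothetical.</p>

```python
from typing import Dict, List, Set


def apply_umls_dictionary(rows: List[Dict[str, float]],
                          dictionary: Set[str]) -> List[Dict[str, float]]:
    """Set the weight of every dictionary token present in a document to 1.0.

    Tokens absent from a document (zero weight) stay absent; other weights
    are kept unchanged.
    """
    return [{tok: (1.0 if tok in dictionary else w) for tok, w in row.items()}
            for row in rows]


if __name__ == "__main__":
    rows = [{"calories": 0.3, "weekend": 0.7}, {"game": 1.0}]
    print(apply_umls_dictionary(rows, {"calories", "bulimia"}))
    # [{'calories': 1.0, 'weekend': 0.7}, {'game': 1.0}]
```

        <p>If the rows are L2-normalized (the scikit-learn default), every original weight is at most 1, so setting dictionary tokens to 1 gives them the largest weight a single term can have.</p>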
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Results analysis</title>
      <p>This section discusses the results obtained by our different systems. During the pre-evaluation phase, we carried out several experiments on the training set using 10-fold cross-validation to evaluate our approaches. During the evaluation phase, we used the training set to train our systems and the test set to evaluate them.</p>
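      <p>The 10-fold cross-validation used in the pre-evaluation phase partitions the training set into ten folds, each used once for testing while the other nine are used for training. The sketch below generates contiguous folds with the standard library; whether our folds were shuffled or stratified by class is not stated here, and with imbalanced labels a stratified split (for example, scikit-learn's StratifiedKFold) is usually preferable. Indexing users rather than individual writings avoids placing writings of the same user in both training and test folds. The function name k_fold_indices is hypothetical.</p>

```python
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for k contiguous, near-equal folds."""
    # The first n_samples % k folds receive one extra sample.
    sizes = [n_samples // k + (1 if n_samples % k > i else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


if __name__ == "__main__":
    for train, test in k_fold_indices(23, k=10):
        print(len(train), test)
```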
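      <p>
        The official competition metrics included in the experimental report are standard measures such as Precision (P), Recall (R), and the F-measure (F), together with ERDE and latency. ERDE is the Early Risk Detection Error measure proposed by Losada and Crestani [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Latency is an alternative evaluation metric for early risk prediction proposed by Sadeque and colleagues [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. The last measure taken into account is speed, which is computed as follows:
        <disp-formula><tex-math>\mathrm{speed} = 1 - \operatorname{median}\{\mathrm{penalty}(k_u) : u \in U,\ d_u = g_u = 1\}</tex-math></disp-formula>
        where U is the set of users, d_u and g_u are the system decision and the ground-truth label for user u, and k_u is the number of writings processed before the decision on user u.
      </p>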
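      <p>The penalty function is not defined in this paper. The sketch below assumes the definition used in the eRisk 2019 overview [<xref ref-type="bibr" rid="ref12">12</xref>] as we understand it, penalty(k) = -1 + 2 / (1 + exp(-p (k - 1))), where p is a hyperparameter left as an argument here; the overview should be checked before relying on it. Each decision is a (d_u, g_u, k_u) triple, returning 0 when there is no true positive is our own convention, and the sample values are hypothetical.</p>

```python
import math
import statistics
from typing import Iterable, Tuple


def penalty(k: int, p: float) -> float:
    """Delay penalty for a decision taken after k writings: 0 at k = 1, tends to 1."""
    return -1.0 + 2.0 / (1.0 + math.exp(-p * (k - 1)))


def speed(decisions: Iterable[Tuple[int, int, int]], p: float) -> float:
    """1 - median penalty over true positives (d_u = g_u = 1).

    Each decision is (d_u, g_u, k_u): system decision, ground-truth label and
    number of writings processed before the decision.
    """
    delays = [k for d, g, k in decisions if d == 1 and g == 1]
    if not delays:
        return 0.0  # convention of this sketch when there is no true positive
    return 1.0 - statistics.median([penalty(k, p) for k in delays])


if __name__ == "__main__":
    # Hypothetical decisions and p value, for illustration only.
    sample = [(1, 1, 1), (1, 1, 50), (1, 1, 300), (0, 1, 10), (1, 0, 2)]
    print(round(speed(sample, p=0.01), 3))
```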
      <p>The results obtained by our three systems are shown in Table 4. Run 1 corresponds to our baseline system described in Section 3.2, run 2 to our similarity embeddings system described in Section 3.3, and run 3 to our related concepts in UMLS system described in Section 3.4.</p>
      <p>The results obtained by our team did not meet our expectations. However, Table 4 shows that, among our systems, the third run achieved the best results, with an F1-score of 30%, outperforming our baseline system (21% F1-score). We also note that the recall obtained by all our runs is remarkably high compared with the average achieved by the participants. Nonetheless, the precision of our systems is very low, which penalizes the F1-score.</p>
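      <p>Why low precision weighs so heavily can be seen from the definition of F1 as the harmonic mean of precision and recall, which stays close to the smaller of the two values. The numbers below are hypothetical and are not taken from Table 4.</p>

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical values: even perfect recall cannot compensate for low precision.
    print(round(f1_score(0.1, 1.0), 3))  # 0.182
    print(round(f1_score(0.5, 0.5), 3))  # 0.5
```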
      <p>Run 2 did not outperform the results obtained by the baseline system. Perhaps this is because the vocabulary used in the embeddings is not appropriate for this task and can introduce noise when computing the similarity between two words. For the same reason, run 3 may have obtained better results thanks to a specialized vocabulary related to anorexia. This experiment shows that adding external sources of biomedical domain knowledge is a good option, since it yields better results. We attribute this to the terminology used in this run, which is enriched with different ontologies made up of medical words that provide extra information about the message. In this way, we obtained greater precision and improved the final result.</p>
    </sec>
    <sec id="sec-4">
      <title>Conclusion and future work</title>
      <p>In this paper, we presented our first participation in CLEF eRisk 2019: Early Risk Prediction on the Internet. Specifically, we participated in Task 1, Early Detection of Signs of Anorexia.</p>
      <p>All our systems are based on a machine learning approach (SVM) applied to the TF-IDF weight matrix. The main idea behind experiments 2 and 3 was to modify the TF-IDF matrix with extra knowledge obtained from the similarity embeddings of a spaCy model and from UMLS, respectively.</p>
      <p>During the evaluation phase, we realized that our systems were not computationally fast enough. For this reason, we could only process 317 of the 2000 chunks.</p>
      <p>Regarding our results, we did not manage to surpass the average of the results obtained by the other participants. However, our third system outperformed our baseline system, reaching an F1-score of 30% compared with 21% for the baseline.</p>
      <p>One problem we found is that the training dataset contains many messages from the same user diagnosed with anorexia, but not all the messages written by that user refer to this disease. Therefore, to improve the systems, we consider it very important for the dataset to indicate the moment at which the user refers to anorexia.</p>
      <p>In order to perform a complete analysis of our systems, we will wait for the
task organizers to release the complete test dataset with its corresponding labels.</p>
      <p>As future work, we plan to improve the speed of our systems in order to evaluate all the available chunks. We will also explore other systems based on deep learning, and we will continue studying resources that can improve our results by incorporating external knowledge.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>This work has been partially supported by Fondo Europeo de Desarrollo
Regional (FEDER), REDES project (TIN2015-65136-C2-1-R) and LIVING-LANG
project (RTI2018-094653-B-C21) from the Spanish Government.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Arseniev-Koehler</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McCormick</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moreno</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          :
          <article-title>#proana: Pro-eating disorder socialization on Twitter</article-title>
          .
          <source>Journal of Adolescent Health</source>
          <volume>58</volume>
          (
          <issue>6</issue>
          ),
          <volume>659</volume>
          –
          <fpage>664</fpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bodenreider</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>The unified medical language system (UMLS): integrating biomedical terminology</article-title>
          .
          <source>Nucleic Acids Research 32(suppl 1)</source>
          ,
          <source>D267–D270</source>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>De Choudhury</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gamon</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Counts</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horvitz</surname>
          </string-name>
          , E.:
          <article-title>Predicting depression via social media</article-title>
          .
          <source>In: Seventh international AAAI conference on weblogs and social media</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Guntuku</surname>
            ,
            <given-names>S.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yaden</surname>
            ,
            <given-names>D.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kern</surname>
            ,
            <given-names>M.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ungar</surname>
            ,
            <given-names>L.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eichstaedt</surname>
            ,
            <given-names>J.C.</given-names>
          </string-name>
          :
          <article-title>Detecting depression and mental illness on social media: an integrative review</article-title>
          .
          <source>Current Opinion in Behavioral Sciences</source>
          <volume>18</volume>
          ,
          <volume>43</volume>
          –
          <fpage>49</fpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Guo</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wei</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Keating</surname>
            ,
            <given-names>B.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hakonarson</surname>
          </string-name>
          , H.:
          <article-title>Machine learning derived risk prediction of anorexia nervosa</article-title>
          .
          <source>BMC Medical Genomics 9(1)</source>
          ,
          <volume>4</volume>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Hwang</surname>
            ,
            <given-names>J.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hollingshead</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Crazy mad nutters: the language of mental health</article-title>
          .
          <source>In: Proceedings of the Third Workshop on Computational Linguistics and Clinical Psychology</source>
          . pp.
          <volume>52</volume>
          –
          <issue>62</issue>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Levy</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goldberg</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dagan</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Improving distributional similarity with lessons learned from word embeddings</article-title>
          .
          <source>Transactions of the Association for Computational Linguistics</source>
          <volume>3</volume>
          ,
          <issue>211</issue>
          –
          <fpage>225</fpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , et al.:
          <article-title>An information-theoretic definition of similarity</article-title>
          .
          <source>In: ICML</source>
          . vol.
          <volume>98</volume>
          , pp.
          <volume>296</volume>
          –
          <fpage>304</fpage>
          .
          Citeseer
          (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Losada</surname>
            ,
            <given-names>D.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Crestani</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>A test collection for research on depression and language use</article-title>
          .
          <source>In: International Conference of the Cross-Language Evaluation Forum for European Languages</source>
          . pp.
          <volume>28</volume>
          –
          <fpage>39</fpage>
          . Springer (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Losada</surname>
            ,
            <given-names>D.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Crestani</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parapar</surname>
          </string-name>
          , J.:
          <article-title>CLEF 2017 eRisk overview: Early risk prediction on the internet: Experimental foundations</article-title>
          .
          <source>In: CLEF (Working Notes)</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Losada</surname>
            ,
            <given-names>D.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Crestani</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parapar</surname>
          </string-name>
          , J.:
          <article-title>Overview of eRisk: Early risk prediction on the internet</article-title>
          .
          <source>In: International Conference of the Cross-Language Evaluation Forum for European Languages</source>
          . pp.
          <volume>343</volume>
          –
          <fpage>361</fpage>
          . Springer (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Losada</surname>
            ,
            <given-names>D.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Crestani</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parapar</surname>
          </string-name>
          , J.:
          <article-title>Overview of eRisk 2019: Early Risk Prediction on the Internet</article-title>
          .
          <source>In: Experimental IR Meets Multilinguality, Multimodality, and Interaction. 10th International Conference of the CLEF Association, CLEF 2019</source>
          . Springer International Publishing, Lugano, Switzerland (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>McCray</surname>
            ,
            <given-names>A.T.</given-names>
          </string-name>
          :
          <article-title>The UMLS semantic network</article-title>
          .
          <source>In: Proceedings. Symposium on Computer Applications in Medical Care</source>
          . pp.
          <volume>503</volume>
          –
          <fpage>507</fpage>
          . American Medical Informatics Association (
          <year>1989</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Moraes</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Valiati</surname>
            ,
            <given-names>J.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neto</surname>
            ,
            <given-names>W.P.G.</given-names>
          </string-name>
          :
          <article-title>Document-level sentiment classification: An empirical comparison between SVM and ANN</article-title>
          .
          <source>Expert Systems with Applications</source>
          <volume>40</volume>
          (
          <issue>2</issue>
          ),
          <fpage>621</fpage>
          –
          <lpage>633</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Prieto</surname>
            ,
            <given-names>V.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Matos</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alvarez</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cacheda</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Oliveira</surname>
            ,
            <given-names>J.L.</given-names>
          </string-name>
          :
          <article-title>Twitter: a good place to detect health conditions</article-title>
          .
          <source>PLoS ONE</source>
          <volume>9</volume>
          (
          <issue>1</issue>
          ),
          <elocation-id>e86191</elocation-id>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Sadeque</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bethard</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Measuring the latency of depression detection in social media</article-title>
          .
          <source>In: Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining</source>
          . pp.
          <fpage>495</fpage>
          –
          <lpage>503</lpage>
          .
          ACM
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Seabrook</surname>
            ,
            <given-names>E.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kern</surname>
            ,
            <given-names>M.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rickard</surname>
            ,
            <given-names>N.S.</given-names>
          </string-name>
          :
          <article-title>Social networking sites, depression, and anxiety: a systematic review</article-title>
          .
          <source>JMIR Mental Health</source>
          <volume>3</volume>
          (
          <issue>4</issue>
          ),
          <elocation-id>e50</elocation-id>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Vijayarani</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ilamathi</surname>
            ,
            <given-names>M.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nithya</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Preprocessing techniques for text mining-an overview</article-title>
          .
          <source>International Journal of Computer Science &amp; Communication Networks</source>
          <volume>5</volume>
          (
          <issue>1</issue>
          ),
          <fpage>7</fpage>
          –
          <lpage>16</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brede</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ianni</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mentzakis</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Detecting and characterizing eating-disorder communities on social media</article-title>
          .
          <source>In: Proceedings of the Tenth ACM International Conference on Web Search and Data Mining</source>
          . pp.
          <fpage>91</fpage>
          –
          <lpage>100</lpage>
          .
          ACM
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>