<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Bologna, Italy. gildo.fabregat@lsi.uned.es (H. Fabregat); aduque@lsi.uned.es (A. Duque); lurdes@lsi.uned.es (L. Araujo); juaner@lsi.uned.es (J. Martinez-Romo)</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>UNED-NLP at eRisk 2022: Analyzing gambling disorders in Social Media using Approximate Nearest Neighbors</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Hermenegildo Fabregat</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andres Duque</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lourdes Araujo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Juan Martinez-Romo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>IMIENS: Instituto Mixto de Investigación, Escuela Nacional de Sanidad</institution>
          ,
          <addr-line>Monforte de Lemos 5, Madrid 28019</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>NLP &amp; IR Group, Dpto. Lenguajes y Sistemas Informáticos, Universidad Nacional de Educación a Distancia (UNED)</institution>
          ,
          <addr-line>Juan del Rosal 16, Madrid 28040</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0001</lpage>
      <abstract>
        <p>This paper describes our proposal for tackling Task 1 (Early Detection of Signs of Pathological Gambling) of the CLEF 2022 eRisk Workshop. The challenge consists of processing messages written by Social Media users in order to detect early signs of pathological gambling. Our proposal is based on Approximate Nearest Neighbors (ANN) search over vector representations of the given messages. We introduce a relabeling process that changes the granularity of the labeling schema in the training dataset, converting it from the original user-based annotation to a message-based one. Our approach achieves the best average performance in the decision-based evaluation, as well as in the ranking-based evaluation. In addition, our system proves to be the fastest in terms of the time needed to process the whole test dataset. This indicates that the proposed relabeling scheme allows us to capture more easily the textual information that leads to a correct detection of pathological gambling.</p>
      </abstract>
      <kwd-group>
        <kwd>Pathological gambling detection</kwd>
        <kwd>Approximate Nearest Neighbors</kwd>
        <kwd>Vector representations</kwd>
        <kwd>Relabeling</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>based representations of users’ messages through sentence embeddings, to subsequently detect
positive messages using methods based on Approximate Nearest Neighbors (ANN) techniques.
Although ANN can be seen as a simple machine learning technique, we show in the paper how
an adequate pre-processing of the training dataset, based on the reduction of the original label
granularity, allows us to obtain the best overall results in the competition.</p>
      <p>The rest of the paper is structured as follows: Section 2 gives an overview of previous work related to the
task considered and the techniques used in this work. Section 3 describes
the addressed task, including the available dataset and evaluation metrics, while
the developed system is presented in Section 4. The achieved results are shown, compared to
those of other participating systems, and discussed in Section 5. Finally, Section 6 presents the main
conclusions and future lines of work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Gambling disorder [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] (GD) is characterized by a persistent and recurrent pattern of gambling
that is associated with significant distress or substantial upset. The prevalence of GD has been
estimated at 0.5% of the adult population in the United States, with comparable or even higher
estimates in other countries.
      </p>
      <p>
        People with GD are often not treated or even recognized as such. GD often co-occurs with
other psychiatric disorders. High rates of mood, anxiety, attention deficit disorders and substance
use disorders have been reported [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] in people with GD. It is also often accompanied by a higher
rate of unemployment, economic difficulties, divorce, and poorer health. In addition, GD is
closely related to other addictive disorders, being the first non-substance addictive behavior to
be recognized [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
        Social networks are an excellent source of information where studies can be carried out for the
early detection of people with gambling problems. In this line, the eRisk competition considered
the problem of pathological gambling for the first time in 2021 [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Several systems participated
in the shared task with different approaches: RELAI [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], UPV-Symanto [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], BLUE [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], UNSL
[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], and CEDRI [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Considering the “test-only” nature of this first version of the task, several
of these participating systems [
        <xref ref-type="bibr" rid="ref10 ref6 ref7 ref8">6, 7, 8, 10</xref>
        ] used external resources, such as posts from Reddit
crawled by themselves, for training their systems. Most of them applied Transformer-based
architectures [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], as well as other types of neural networks. The UNSL team obtained the best
results using the Early Risk Detection Framework (ERD).
      </p>
      <p>
        This year we participated for the first time in the competition on gambling disorder. Our
system is based on a simple approach that has proven to be very effective. The idea is to carry out
a re-labeling of users’ messages using a method based on Approximate Nearest Neighbor (ANN)
search. Given a query point, exact nearest neighbor search (NNS) returns the point of the
collection that lies at the shortest distance from the query. A generalization
of NNS is k-nearest neighbor search (k-NNS), which retrieves the k
vectors nearest to the query. Due to the cost associated with dimensionality, many proposals
have focused on approximate solutions to the NNS and k-NNS problems.
A recent work [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] has presented a comparison and evaluation of different approaches to the
problem. According to this work, state-of-the-art ANN methods can be classified into three
types: hashing-based, partition-based and graph-based. Hashing-based methods transform
data points into a low-dimensional representation, where each point is represented by a short
code (hash code). Partition-based methods divide the high-dimensional space
into multiple disjoint regions. The partitioning is usually done recursively, hence these
methods often use a tree- or forest-based representation. We have used one of these methods
in this work, Annoy [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], a hyperplane partitioning method that recursively divides the space
using hyperplanes with random directions. Graph-based methods construct a proximity graph
in which each datum corresponds to a node and the edges connecting nodes define the
neighborhood relationship. The main idea of these methods is that a neighbor’s neighbor is
likely to also be a neighbor. The search can be performed efficiently by iteratively expanding
neighbors of neighbors in a best-first search strategy. Depending on the structure of the graph,
different graph-based methods can be distinguished. In this work we have used a method based on
Hierarchical Navigable Small World (HNSW) graphs [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
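      <p>As a point of reference for the approximate methods discussed above, exact k-NNS can be sketched as an exhaustive scan. This is only a minimal illustration: the function name <monospace>exact_knn</monospace>, the Euclidean distance and the toy points are assumptions made for the example and are not part of the submitted system.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math


def exact_knn(query, points, k):
    """Return the indices of the k points closest to `query`, nearest first.

    Every point is compared with the query, so the cost grows linearly with
    the size of the collection; ANN methods avoid this exhaustive scan.
    """
    order = sorted(range(len(points)), key=lambda i: math.dist(query, points[i]))
    return order[:k]


# Toy two-dimensional collection (hypothetical data).
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0)]
print(exact_knn((0.9, 0.1), points, k=2))
```
]]></preformat>
      <p>Approximate methods such as those cited above trade a small loss in recall for avoiding this full comparison against every stored vector.</p>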
    </sec>
    <sec id="sec-3">
      <title>3. Task 1: Early Detection of Signs of Pathological Gambling</title>
      <p>
        Task 1 of eRisk 2022 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] is denoted “Early detection of signs of pathological gambling”. This is
the second edition of the task, which was first introduced in the CLEF 2021 eRisk Workshop [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
In this task, participating systems are asked to determine whether an individual can be classified
as a pathological gambler (positive user) or a non-pathological gambler (negative user) based
on the user’s Social Media messages. Systems must analyze the posts of each user sequentially,
in chronological order, to detect early traces of pathological gambling.
      </p>
      <sec id="sec-3-1">
        <title>3.1. Dataset</title>
        <p>The dataset used in the task is composed of a set of XML documents, each of them containing
chronologically ordered Social Media posts belonging to a particular user. The training dataset
contains a total of 2,348 documents, each of them annotated as “1” (positive) if the user is labeled
as a pathological gambler, and “0” (negative) otherwise.</p>
        <p>The test dataset is provided through a server to which participants must connect to iteratively
receive user writings. The total number of test users is 2,079 (81 pathological gamblers and 1,998
control users), with a maximum number of user writings of 2,001, while the average number of
user writings is 495.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Metrics</title>
        <p>System evaluation is twofold:
          <list list-type="bullet">
            <list-item>
              <p>Decision-based evaluation: This first type of evaluation analyzes the performance of the participating systems in terms of standard measures such as Precision, Recall and F-Measure. It also includes metrics that take into account the delay incurred by a system before it detects a true positive. Two of these metrics, denoted ERDE5 and ERDE50, consider the number or the percentage of messages that have to be processed before an alert for a positive user is emitted. To overcome the low interpretability of these latter metrics, a latency-weighted F-Score is also introduced, obtained by multiplying the standard F-Measure by a penalty factor based on the median delay of true positive detection.</p>
            </list-item>
            <list-item>
              <p>Ranking-based evaluation: The second type of evaluation is a complementary approach that requires the systems to provide a score indicating the risk of pathological gambling of a user every time a new message is analyzed. Users are then ranked using this score, and standard ranking metrics such as P@k or NDCG@k can be applied, with the parameter k being the number of analyzed messages before evaluating the ranking.</p>
            </list-item>
          </list>
        </p>
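        <p>The latency-weighted F-Score described above can be sketched as follows. The text only states that the F-Measure is multiplied by a penalty factor based on the median delay, so the logistic shape of the penalty and the constant <monospace>p = 0.0078</monospace> are assumptions taken from how this metric is usually defined in the eRisk overviews; the function names are hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import math
from statistics import median


def penalty(delay, p=0.0078):
    """Penalty for a true positive detected after `delay` messages.

    Assumed logistic form: 0 when the alert is emitted on the first
    message, approaching 1 as the delay grows.
    """
    return -1.0 + 2.0 / (1.0 + math.exp(-p * (delay - 1)))


def latency_weighted_f1(f1, true_positive_delays, p=0.0078):
    """Multiply F1 by a speed factor derived from the median penalty."""
    if not true_positive_delays:
        return 0.0
    speed = 1.0 - median(penalty(d, p) for d in true_positive_delays)
    return f1 * speed


# A system that always alerts on the first message keeps its full F1.
print(latency_weighted_f1(0.8, [1, 1, 1]))
# Later alerts reduce the score.
print(latency_weighted_f1(0.8, [1, 50, 200]))
```
]]></preformat>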
        <p>
          More information about the complete set of metrics employed in the evaluation can be found
in previous overviews of eRisk competitions [
          <xref ref-type="bibr" rid="ref15 ref5">15, 5</xref>
          ].
        </p>
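        <p>For illustration, the two families of ranking metrics mentioned in this section can be computed as in the sketch below, using their standard textbook definitions with binary relevance (1 for a pathological gambler, 0 otherwise). The function names, the <monospace>cutoff</monospace> argument and the toy scores are assumptions of the example, not the official evaluation code.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import math


def rank_users(scores, labels):
    """Order users by decreasing risk score and return their gold labels."""
    order = sorted(scores, key=scores.get, reverse=True)
    return [labels[user] for user in order]


def precision_at(ranked_labels, cutoff):
    """Fraction of positive users among the top `cutoff` ranked users."""
    return sum(ranked_labels[:cutoff]) / cutoff


def dcg_at(ranked_labels, cutoff):
    """Discounted cumulative gain with a log2 rank discount."""
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(ranked_labels[:cutoff]))


def ndcg_at(ranked_labels, cutoff):
    """DCG normalised by the DCG of the ideal (perfectly sorted) ranking."""
    ideal = dcg_at(sorted(ranked_labels, reverse=True), cutoff)
    return dcg_at(ranked_labels, cutoff) / ideal if ideal > 0 else 0.0


# Hypothetical risk scores and gold labels for four users.
scores = {"u1": 0.9, "u2": 0.2, "u3": 0.7, "u4": 0.4}
labels = {"u1": 1, "u2": 0, "u3": 0, "u4": 1}
ranked = rank_users(scores, labels)
print(precision_at(ranked, 2), ndcg_at(ranked, 4))
```
]]></preformat>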
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Proposed Model</title>
      <p>Due to the large amount of information available in social networks, we propose an approach based on
Approximate Nearest Neighbors (ANN), whose main benefit is its efficiency
in processing large data collections. The following sections describe the main components of
the proposed model and the configurations that have been explored.</p>
      <sec id="sec-4-1">
        <title>4.1. Data representation</title>
        <p>
          We use Universal Sentence Encoder [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] to encode each user’s messages. Such models are
trained and optimized for encoding texts longer than words, e.g., sentences, phrases or short
paragraphs. The model we use is trained with a deep averaging network [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] (DAN) using data
from different sources in English. Although DAN approaches produce unordered representations
of the information by averaging the terms in a given text, these models are able to capture
subtle differences between similar texts. In short, for each message encoded by this model, a
512-dimensional vector is generated.
        </p>
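        <p>The unordered nature of averaging can be seen in a toy sketch. The three-dimensional embedding table below is invented for the example; a real DAN feeds the averaged vector through further feed-forward layers, and the encoder used in this work outputs 512 dimensions.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def average_embedding(tokens, table):
    """Average the embeddings of `tokens`; word order plays no role."""
    dim = len(next(iter(table.values())))
    total = [0.0] * dim
    for token in tokens:
        for j, value in enumerate(table[token]):
            total[j] += value
    return [value / len(tokens) for value in total]


# Hypothetical embedding table (values chosen only for illustration).
table = {
    "dog": [1.0, 0.0, 0.5],
    "bites": [0.0, 1.0, 0.25],
    "man": [0.5, 0.5, 0.0],
}

a = average_embedding(["dog", "bites", "man"], table)
b = average_embedding(["man", "bites", "dog"], table)
print(a == b)
```
]]></preformat>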
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Approximate Nearest Neighbors</title>
        <p>
          Although nearest neighbor retrieval is a conceptually simple procedure, it becomes a difficult
problem in domains such as social networks, where a large amount of information is available.
In this domain, brute-force search is replaced by non-exact techniques that rely
on more complex structures, e.g., graphs and trees.
Several tools and approaches are currently available that have proven to be very successful
in terms of recall and queries per second [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]. Due to their popularity and performance,
we have explored the use of Annoy and the Non-Metric Space Library [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] (NMSLIB):
          <list list-type="bullet">
            <list-item>
              <p>Annoy: This library uses tree-like structures for the representation of nodes and random projections for the division of the subspace between adjacent nodes. To explore this library, we have used a space defined by the inner product of the L2-normalized vectors generated by the Universal Sentence Encoder.</p>
            </list-item>
            <list-item>
              <p>NMSLIB: Library for approximate k-nearest neighbor search based on navigable small world graphs with controllable hierarchy (Hierarchical NSW, HNSW). NMSLIB supports different metrics and data formats for computing the similarity between instances. In this case, we explored a dense L2 space.</p>
            </list-item>
          </list>
        </p>
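        <p>The two spaces mentioned above are closely related. For unit-length vectors the inner product equals the cosine similarity, and the squared Euclidean distance equals 2 minus twice the inner product, so both induce the same neighbor ordering. The following sketch checks this numerically; the helper names and vectors are assumptions of the example and do not use either library.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import math


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in vector]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity computed directly from the raw vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def squared_l2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


u = l2_normalize([3.0, 4.0])
v = l2_normalize([4.0, 3.0])
print(dot(u, v), cosine([3.0, 4.0], [4.0, 3.0]))
print(squared_l2(u, v), 2.0 - 2.0 * dot(u, v))
```
]]></preformat>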
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Tag and scoring function</title>
        <p>Once the training set has been transformed using Universal Sentence Encoder, and after generating
the nearest neighbor index using the Annoy or NMSLIB libraries, we propose a labeling and scoring
approach based on the classes of the neighbors retrieved for each message in the test set. Given
a message <italic>m</italic> from a user <italic>u</italic>, we classify <italic>m</italic> as positive if the 20 nearest neighbors retrieved
correspond to messages from positive users. Following the same idea, we use as scoring
function the distance of <italic>m</italic> from the nearest retrieved neighbors (<inline-formula><tex-math>1 - \sum_{i=1}^{20} d(m, n_i)</tex-math></inline-formula>, where <italic>n<sub>i</sub></italic> is the <italic>i</italic>-th retrieved neighbor).
The number of nearest neighbors, <italic>k</italic> = 20, was set in a previous parameter tuning evaluation
in which several different values of <italic>k</italic> were explored.</p>
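        <p>A minimal sketch of this tagging and scoring step is given below. It makes several assumptions that are not stated in the text: an exhaustive search stands in for the Annoy or NMSLIB index, the distance is Euclidean, the rule is read as requiring all retrieved neighbors to be positive, and the score is one minus the sum of the neighbor distances. All names are hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import math


def nearest_neighbors(query, index, k):
    """Return the k closest (distance, label) pairs, nearest first.

    `index` is a list of (vector, label) pairs; this exhaustive scan is a
    stand-in for an ANN index.
    """
    scored = sorted(((math.dist(query, vector), label) for vector, label in index),
                    key=lambda pair: pair[0])
    return scored[:k]


def classify_and_score(query, index, k=20):
    """Tag a message and compute its risk score from its k neighbors."""
    neighbors = nearest_neighbors(query, index, k)
    # Assumption: positive only if every retrieved neighbor is positive.
    is_positive = bool(neighbors) and all(label == 1 for _, label in neighbors)
    # Assumption: score = 1 - sum of distances to the retrieved neighbors.
    score = 1.0 - sum(distance for distance, _ in neighbors)
    return is_positive, score


# Toy index with two positive messages and one negative message.
index = [((0.0, 0.0), 1), ((0.0, 1.0), 1), ((5.0, 5.0), 0)]
print(classify_and_score((0.0, 0.5), index, k=2))
print(classify_and_score((5.0, 4.0), index, k=2))
```
]]></preformat>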
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Relabeling process</title>
        <p>The corpus provided by the organizers is labeled at the user level, i.e., each user is labeled as
positive if at least one positive message can be found among their posts, and negative otherwise.
However, positive/negative annotations for each individual message in the corpus are not provided. We
consider that the correct classification of positive and negative messages is crucial for achieving
good performance in this task. Hence, we propose an approach to re-annotate the training
corpus in order to generate a message-level labeling. For this purpose, we first consider all
messages of a positive user to be positive, and all messages of a negative user to be negative.
Once the k-nearest neighbor query index is generated, we iteratively process each message from
each positive user of the training set, and re-annotate its class according to the labeling
algorithm described in Section 4.3. We assume that only positive users may have negative messages, since if
negative users had positive messages, they would have been labeled as positive. Hence,
in each iteration of the algorithm, the number of positive messages is reduced whenever the algorithm
re-labels some of them as negative. After processing the training set, if modifications have been made,
the same method is applied again until convergence is reached, that is, until there are no changes
in the training set labels.</p>
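        <p>The relabeling loop can be sketched as follows. This is only an illustration under stated assumptions: neighbors are found by exhaustive Euclidean search instead of an ANN index, a message is never counted as its own neighbor, a message stays positive only if all of its neighbors are currently positive, and all labels are updated together at the end of each pass. The function names are hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import math


def k_nearest(i, vectors, k):
    """Indices of the k vectors closest to vectors[i], excluding i itself."""
    others = [j for j in range(len(vectors)) if j != i]
    others.sort(key=lambda j: math.dist(vectors[i], vectors[j]))
    return others[:k]


def relabel(vectors, inherited_labels, k=20):
    """Iteratively turn user-level labels into message-level labels.

    `inherited_labels[i]` is the label of the user who wrote message i.
    Labels can only change from 1 to 0, so the loop always terminates.
    """
    labels = list(inherited_labels)
    changed = True
    while changed:
        changed = False
        updated = list(labels)
        for i, label in enumerate(labels):
            if label != 1:
                continue  # messages of negative users are never relabeled
            neighbors = k_nearest(i, vectors, k)
            if not all(labels[j] == 1 for j in neighbors):
                updated[i] = 0
                changed = True
        labels = updated
    return labels


# Toy data: message 4 belongs to a positive user but lies among negatives.
vectors = [[0.0], [0.1], [0.2], [0.3], [5.0], [5.1], [5.2], [4.9]]
inherited = [1, 1, 1, 1, 1, 0, 0, 0]
print(relabel(vectors, inherited, k=2))
```
]]></preformat>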
      </sec>
      <sec id="sec-4-5">
        <title>4.5. Crawling new positive instances</title>
        <p>In order to reduce the impact that the relabeling algorithm could have on recall, the following
data were collected from gamblers’ help associations:
          <list list-type="bullet">
            <list-item>
              <p>Testimonial facts: A total of 234 testimonials were collected from websites containing information about pathological gamblers and their friends and family (https://gamblershelp.com.au/learn-about-gambling/personal-stories/; http://getgamblingfacts.ca/personalstories/; https://www.gamtalk.org/stories-of-hope/; https://www.gamcare.org.uk/understanding-gamblingproblems/people-weve-helped/). Unlike the Reddit posts, these new data are more carefully structured and contain longer texts.</p>
            </list-item>
            <list-item>
              <p>Forums: Messages from a forum devoted to helping players were automatically collected, and the potentially positive messages were selected using the proposed system. Finally, we included in the training set those messages classified as positive by the system. In short, a total of 232 new instances were added.</p>
            </list-item>
          </list>
        </p>
        <p>The instances extracted from the forums present a format and structure similar to those of
the corpus texts. No specific pre-processing, such as text size limitation
or language control, has been applied.</p>
        <p>As shown in Table 1, we submitted 5 different configurations, in which we explored
combinations of the previously mentioned aspects of the proposed approach.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results and Discussion</title>
      <p>The results obtained by our approach are shown and discussed below.</p>
      <p>Execution time: In order to avoid possible errors during the test phase due to power or
network failures, we processed the test data on a shared server with two Intel(R) Xeon(R)
CPUs E5-2630 v4 @ 2.20GHz and 64 GB of RAM. As can be seen in Table 2, the proposed
batch of experiments achieved the best execution times among the systems that processed
the whole test set. These results are largely due to the use of non-exhaustive nearest-neighbor
retrieval algorithms. Although we submitted runs using different algorithms, all of
them are oriented to the processing of large datasets and include optimizations for this
purpose. While Annoy uses tree-like structures for the representation of nodes and
random projections for the division of the subspace between adjacent nodes, NMSLIB
uses a graph-based structure and the projection of the different nodes onto a skip-list.
Both algorithms include customizable parameters to optimize their performance, e.g.,
number of trees (Annoy) or number of Zero node links (NMSLIB). Although we did not
perform an exhaustive study of these parameters, we tried to limit their growth. The final
configuration for each of the algorithms is as follows:</p>
      <list list-type="bullet">
        <list-item>
          <p>Annoy</p>
          <list list-type="simple">
            <list-item><p>Trees: 24</p></list-item>
          </list>
        </list-item>
        <list-item>
          <p>NMSLIB</p>
          <list list-type="simple">
            <list-item><p>index_params: {'M': 200, 'efConstruction': 1000, 'post': 2}</p></list-item>
            <list-item><p>method: 'hnsw'</p></list-item>
            <list-item><p>efSearch: 100</p></list-item>
          </list>
        </list-item>
      </list>
      <p>Finally, although they are not included in this comparison, our system also achieved
execution times below those of many systems that processed the test set only
partially.</p>
      <p>Decision-based performance: Table 3 shows the results obtained in the decision-based
evaluation. The table reports the set of metrics analyzed by the task organizers: Precision,
Recall, F1, ERDE5, ERDE50, latency, speed and latency-weighted F1. In addition to
the results of our runs, the best run of each team participating in the competition is
shown. As can be seen in the table, taking the latency-weighted F1 metric as the
summary metric, our R4 configuration obtained the best results, achieving the best
balance between precision and recall. If we analyze the achieved results in terms of latency, i.e., the delay
shown by the system expressed as the median number of messages that need to be
processed before detecting a positive case, no great differences can be found between the
submitted runs, since the same inference process was used in all of them.
However, if we compare runs R0 and R1, which differ only in the application of
the relabeling process in R1, we find improvements in precision of around 27% with no
excessive penalization of other metrics such as recall. The relabeling process has
a high impact on the corpus, since the label of more than 90% of the positive instances
is modified after applying it. Considering the amount of discarded information and the
improvements obtained through this approach, the analysis of the filtered messages
can be of great value to achieve a better understanding of the problem. On the other
hand, seeking to reduce the effect of the relabeling process on recall, the
inclusion of new, automatically collected data was considered in the R2 and R3 runs. The
obtained results indicate that our approach to collecting and processing the new data was not
the most effective one. Finally, R1 and R4 differ in the algorithm used for nearest neighbor
retrieval (R1: Annoy, R4: NMSLIB). These algorithms include a parameter space
that has not been studied in depth. For this reason, and although the NMSLIB algorithm
performs significantly better than Annoy, we consider that a more thorough study of
the parameters of the latter technique should be performed before discarding its use.</p>
      <p>Ranking-based performance: Table 4 shows the results obtained in the ranking-based
evaluation. In this evaluation, the performance of the system is measured after processing
1, 100, 500 and 1000 messages. As shown in the table, the R4 run obtains the best results
for all metrics in almost all stages. Comparing
R4 with the best runs presented by BLUE and UNSL, our system outperforms them
in most aspects except for NDCG@100 when analyzing 1 and 100 writings. These results
indicate that the scoring function described in Section 4.3 is an effective heuristic for
assessing the risk of pathological gambling after processing each user message.
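      </p>
      <table-wrap id="tab-4">
        <label>Table 4</label>
        <caption>
          <p>Results of the ranking-based evaluation.</p>
        </caption>
        <table>
          <thead>
            <tr><th>Run</th><th/><th/><th/><th/><th/><th>NDCG@100</th><th>NDCG@10</th><th>NDCG@100</th></tr>
          </thead>
          <tbody>
            <tr><td>Run 0</td><td>0.9</td><td>0.4</td><td>0.7</td><td>0.3</td><td>0.2</td><td>0.56</td><td>0.19</td><td>0.48</td></tr>
            <tr><td>Run 1</td><td>0.9</td><td>0.80</td><td>0.83</td><td>0.5</td><td>0.43</td><td>0.80</td><td>0.37</td><td>0.75</td></tr>
            <tr><td>Run 2</td><td>0.9</td><td>0.60</td><td>0.79</td><td>0.4</td><td>0.33</td><td>0.55</td><td>0.24</td><td>0.46</td></tr>
            <tr><td>Run 3</td><td>0.9</td><td>0.70</td><td>0.84</td><td>0.4</td><td>0.35</td><td>0.78</td><td>0.42</td><td>0.73</td></tr>
            <tr><td>Run 4</td><td>1</td><td>1</td><td>0.88</td><td>1</td><td>1</td><td>0.95</td><td>1</td><td>0.95</td></tr>
            <tr><td>BLUE Run 1</td><td>1</td><td>1</td><td>0.89</td><td>1</td><td>1</td><td>0.91</td><td>1</td><td>0.91</td></tr>
            <tr><td>UNSL Run 0</td><td>1</td><td>1</td><td>0.9</td><td>1</td><td>1</td><td>0.93</td><td>1</td><td>0.95</td></tr>
          </tbody>
        </table>
      </table-wrap>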
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions and Future Work</title>
      <p>
        This article describes our proposed approach for early detection of signs of pathological
gambling addressed in Task 1 of eRisk 2022 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The main contributions presented in this work
include the use of Approximate Nearest Neighbor algorithms for retrieving subsets of similar
messages previously transformed into a vectorial space using sentence embeddings, as well as
the development of a relabeling technique successfully applied to the training set.
      </p>
      <p>The use of algorithms such as Annoy or NMSLIB for large-scale nearest neighbor retrieval has
been of great help for the fast processing of the data. As shown in Table 2, our system obtained
the best execution times among the systems that processed all the messages from the test set. On the other
hand, as shown in Tables 3 and 4, our model obtained the best results for the F1, ERDE50
and latency-weighted F1 metrics in the decision-based evaluation, as well as the best overall results in
the ranking-based evaluation. Most of these results are due to the application of the iterative
re-labeling process of the corpus described in Section 4.4, which is based on the use of the system
itself. Through this process we have also validated the use of the vector space generated by
Universal Sentence Encoder to analyze the similarity between messages of different classes.</p>
      <p>
        The following lines of future work are currently being considered: the study of encoders based
on more complex approaches such as BERT [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], or trained with in-domain information; a deeper
exploration of the parameters used for the construction of the ANN index; an analysis of the impact
of different thresholds within the scoring function in the ranking-based evaluation (e.g., distance
of retrieved neighbors); and the application of the proposed system to similar tasks.
      </p>
      <p>Finally, we believe that an analysis of the identified positive messages would be of great value.
Theoretically, these messages should exhibit easily identifiable features and characteristics that
can help in the profiling of this type of pathology.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This work has been partially supported by the Spanish Ministry of Science and Innovation
within the DOTT-HEALTH Project (MCI/AEI/FEDER, UE) under Grant PID2019-106942RB-C32,
as well as project RAICES (IMIENS 2022) and the research network AEI RED2018-102312-T
(IA-Biomed).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Parapar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Martín Rodilla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Losada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Crestani</surname>
          </string-name>
          ,
          <article-title>Overview of eRisk 2022: Early risk prediction on the internet</article-title>
          ,
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction. 13th International Conference of the CLEF Association, CLEF</source>
          <year>2022</year>
          , Bologna, Italy (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M. N.</given-names>
            <surname>Potenza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. M.</given-names>
            <surname>Balodis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Derevensky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. E.</given-names>
            <surname>Grant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. M.</given-names>
            <surname>Petry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Verdejo-Garcia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. W.</given-names>
            <surname>Yip</surname>
          </string-name>
          , Gambling disorder,
          <source>Nature reviews Disease primers 5</source>
          (
          <year>2019</year>
          )
          <fpage>1</fpage>
          -
          <lpage>21</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M. N.</given-names>
            <surname>Potenza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. R.</given-names>
            <surname>Kosten</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. J.</given-names>
            <surname>Rounsaville</surname>
          </string-name>
          , Pathological gambling,
          <source>Jama</source>
          <volume>286</volume>
          (
          <year>2001</year>
          )
          <fpage>141</fpage>
          -
          <lpage>144</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>C. J.</given-names>
            <surname>Rash</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Weinstock</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. Van</given-names>
            <surname>Patten</surname>
          </string-name>
          ,
          <article-title>A review of gambling disorder and substance use disorders</article-title>
          ,
          <source>Substance abuse and rehabilitation 7</source>
          (
          <year>2016</year>
          )
          <fpage>3</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Parapar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Martín-Rodilla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Losada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Crestani</surname>
          </string-name>
          ,
          <article-title>Overview of eRisk at CLEF 2021: Early risk prediction on the internet (extended overview)</article-title>
          ,
          <source>Proceedings of the Working Notes of CLEF 2021 - Conference and Labs of the Evaluation Forum</source>
          , Bucharest, Romania,
          <year>2021</year>
          2936 (
          <year>2021</year>
          )
          <fpage>864</fpage>
          -
          <lpage>887</lpage>
          . URL: http://ceur-ws.org/Vol-2936/paper-72.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>D.</given-names>
            <surname>Maupomé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Armstrong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Rancourt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Soulas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-J.</given-names>
            <surname>Meurs</surname>
          </string-name>
          ,
          <article-title>Early detection of signs of pathological gambling, self-harm and depression through topic extraction and neural networks</article-title>
          ,
          <source>Proceedings of the Working Notes of CLEF</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Basile</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chinea-Rios</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.-S.</given-names>
            <surname>Uban</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Rössler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Yenikent</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chulví</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Franco-Salvador</surname>
          </string-name>
          ,
          <article-title>UPV-Symanto at eRisk 2021: Mental health author profiling for early risk prediction on the internet</article-title>
          ,
          <source>Working Notes of CLEF</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.-M.</given-names>
            <surname>Bucur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Cosma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. P.</given-names>
            <surname>Dinu</surname>
          </string-name>
          ,
          <article-title>Early risk detection of pathological gambling, self-harm and depression using BERT</article-title>
          ,
          <source>Working Notes of CLEF</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Loyola</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Burdisso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Thompson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Cagnina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Errecalde</surname>
          </string-name>
          ,
          <article-title>UNSL at eRisk 2021: A comparison of three early alert policies for early risk detection</article-title>
          , in:
          <source>Working Notes of CLEF 2021 - Conference and Labs of the Evaluation Forum</source>
          , Bucharest, Romania,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>R. P.</given-names>
            <surname>Lopes</surname>
          </string-name>
          ,
          <article-title>CeDRI at eRisk 2021: A naive approach to early detection of psychological disorders in social media</article-title>
          , in:
          <source>CEUR Workshop Proceedings</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>981</fpage>
          -
          <lpage>991</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaswani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Parmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Uszkoreit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Gomez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kaiser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Polosukhin</surname>
          </string-name>
          ,
          <article-title>Attention is all you need</article-title>
          ,
          <source>CoRR</source>
          abs/1706.03762 (
          <year>2017</year>
          ). URL: http://arxiv.org/abs/1706.03762. arXiv:1706.03762.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>W.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <article-title>Approximate nearest neighbor search on high dimensional data-experiments, analyses, and improvement</article-title>
          ,
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          <volume>32</volume>
          (
          <year>2019</year>
          )
          <fpage>1475</fpage>
          -
          <lpage>1488</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>E.</given-names>
            <surname>Bernhardsson</surname>
          </string-name>
          ,
          <source>Annoy: Approximate Nearest Neighbors in C++/Python</source>
          ,
          <year>2018</year>
          . URL: https://pypi.org/project/annoy/, Python package version 1.13.0.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Y. A.</given-names>
            <surname>Malkov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Yashunin</surname>
          </string-name>
          ,
          <article-title>Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs</article-title>
          ,
          <source>CoRR</source>
          abs/1603.09320 (
          <year>2016</year>
          ). URL: http://arxiv.org/abs/1603.09320. arXiv:1603.09320.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Losada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Crestani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Parapar</surname>
          </string-name>
          ,
          <article-title>Overview of eRisk at CLEF 2020: Early risk prediction on the internet (extended overview)</article-title>
          ,
          <source>Working Notes of CLEF 2020 - Conference and Labs of the Evaluation Forum</source>
          , Thessaloniki, Greece,
          <volume>2696</volume>
          (
          <year>2020</year>
          ). URL: http://ceur-ws.org/Vol-2696/paper_253.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>D.</given-names>
            <surname>Cer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Hua</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Limtiaco</surname>
          </string-name>
          ,
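          <string-name>
            <given-names>R.</given-names>
            <surname>St. John</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Constant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Guajardo-Cespedes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,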
          <string-name>
            <given-names>C.</given-names>
            <surname>Tar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Strope</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kurzweil</surname>
          </string-name>
          ,
          <article-title>Universal sentence encoder</article-title>
          ,
          <source>CoRR</source>
          abs/1803.11175 (
          <year>2018</year>
          ). URL: http://arxiv.org/abs/1803.11175. arXiv:1803.11175.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>M.</given-names>
            <surname>Iyyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Manjunatha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Boyd-Graber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Daumé</surname>
            <suffix>III</suffix>
          </string-name>
          ,
          <article-title>Deep unordered composition rivals syntactic methods for text classification</article-title>
          ,
          in:
          <source>Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</source>
          ,
          <publisher-name>Association for Computational Linguistics</publisher-name>
          , Beijing, China,
          <year>2015</year>
          , pp.
          <fpage>1681</fpage>
          -
          <lpage>1691</lpage>
          . URL: https://aclanthology.org/P15-1162. doi:10.3115/v1/P15-1162.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>M.</given-names>
            <surname>Aumüller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Bernhardsson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. J.</given-names>
            <surname>Faithfull</surname>
          </string-name>
          ,
          <article-title>ANN-Benchmarks: A benchmarking tool for approximate nearest neighbor algorithms</article-title>
          ,
          <source>CoRR</source>
          abs/1807.05614 (
          <year>2018</year>
          ). URL: http://arxiv.org/abs/1807.05614. arXiv:1807.05614.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>BERT: pre-training of deep bidirectional transformers for language understanding</article-title>
          , in:
          <string-name>
            <given-names>J.</given-names>
            <surname>Burstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Doran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Solorio</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers)</source>
          ,
          <publisher-name>Association for Computational Linguistics</publisher-name>
          ,
          <year>2019</year>
          , pp.
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          . URL: https://doi.org/10.18653/v1/n19-1423. doi:10.18653/v1/n19-1423.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>