<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <article-id pub-id-type="doi">10.18653/v1/n19-1423</article-id>
      <title-group>
        <article-title>NLP-UNED-2 at eRisk 2023: Detecting Pathological Gambling in Social Media through Dataset Relabeling and Neural Networks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Hermenegildo Fabregat</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andres Duque</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lourdes Araujo</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Juan Martinez-Romo</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Avature Machine Learning, Marqués de Valdeiglesias</institution>
          ,
          <addr-line>3, Madrid 28004</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>IMIENS: Instituto Mixto de Investigación, Escuela Nacional de Sanidad</institution>
          ,
          <addr-line>Monforte de Lemos 5, Madrid 28019</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>NLP &amp; IR Group, Dpto. Lenguajes y Sistemas Informáticos, Universidad Nacional de Educación a Distancia (UNED)</institution>
          ,
          <addr-line>Juan del Rosal 16, Madrid 28040</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>1</volume>
      <fpage>18</fpage>
      <lpage>21</lpage>
      <abstract>
        <p>This paper describes our participation in Task 2 (Early Detection of Signs of Pathological Gambling) of the CLEF 2023 eRisk Workshop, aimed at detecting early signs of pathological gambling in messages written by Social Media users. Since the original dataset is annotated at user level, we perform a relabeling process based on Approximate Nearest Neighbors (ANN) on vectorial representations of the messages, in order to produce a dataset annotated at message level. Then, different neural network architectures are tested using the relabeled training dataset in order to develop models for classifying test instances. Our system obtains the second best performance in the decision-based evaluation, and is one of the best performing techniques in the ranking-based evaluation. Hence, the combination of the relabeling technique with neural architectures leads to an accurate detection of signs of pathological gambling.</p>
      </abstract>
      <kwd-group>
        <kwd>Pathological gambling detection</kwd>
        <kwd>Approximate Nearest Neighbors</kwd>
        <kwd>Relabeling</kwd>
        <kwd>Neural Networks</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Research on potential health risks through social media analysis has emerged as a captivating
research domain in the last few years. Within this field of study, the scientific community has
undertaken various initiatives, including the eRisk workshop, which has been a recurring event
in the Conference Labs of the Evaluation Forum (CLEF) since 2017. This workshop serves as a
collaborative platform for the development of methodologies and practical approaches aimed at
the early detection of diverse health risks, such as eating disorders, self-harm, and depression.
By analysing the textual content of social media posts and messages, valuable insights can be
gained to identify individuals at risk.</p>
      <p>
        In this paper we present a system for tackling Task 2 of the eRisk 2023 Workshop: Early
Detection of Signs of Pathological Gambling [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], which is a continuation from Task 1 of the
eRisk 2021 Workshop [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and Task 1 of the eRisk 2022 Workshop [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Our system is also an
improved version of the one that we presented as the “UNED-NLP” team in the 2022 edition of
the competition [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>The proposed approach first transforms each message in the dataset into a vector-based
representation through sentence embeddings. Then, we use an Approximate Nearest Neighbor
(ANN) technique to relabel the original dataset, annotated at user level, and generate a
new dataset annotated at message level. We then propose different techniques for employing
this relabeled dataset in the final classification of the test instances. As a baseline, we use
an ANN-based technique similar to the one employed in the relabeling of the dataset. We also
propose two techniques that use the relabeled training dataset as input to a Recurrent Neural
Network (RNN). Finally, the neural models trained in the previous step are also employed to
generate alternative versions of the relabeled dataset, in order to test whether their use
improves the obtained results.</p>
      <p>The rest of the paper is structured as follows: Section 2 offers a summary of related works
and systems participating in previous competitions. A brief description of the proposed task,
the available dataset and the metrics employed in the evaluation is presented in Section 3. Our
system proposal is described in Section 4 and the obtained results in Section 5. Finally, some
conclusions and future lines of work are presented in Section 6.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Gambling disorder (GD) refers to a persistent and recurrent gambling behavior that causes
significant distress [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. In the United States, GD is estimated to affect approximately 0.5% of
the adult population, with comparable or potentially higher rates observed in other nations.
However, individuals with GD often go untreated and frequently remain unrecognized.
Moreover, GD frequently co-occurs with other psychiatric disorders, with notable prevalence rates
reported for mood disorders, anxiety disorders, attention deficit disorders, and substance use
disorders among individuals with GD [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Furthermore, GD is often accompanied by a higher
incidence of unemployment, economic hardships, divorce, and compromised health. Notably,
GD holds close associations with other addictive disorders and stands as the first non-substance
addictive behavior to be officially acknowledged [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>
        The advent of social networks has presented a valuable source of information for studying
and identifying individuals with gambling problems at an early stage. In alignment with this
perspective, the eRisk competition incorporated the issue of pathological gambling into its
agenda for the first time in 2021 [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. No training data was provided to the participants in that
first edition, hence many of the teams relied on external resources for training their systems, such
as Reddit posts crawled by themselves [
        <xref ref-type="bibr" rid="ref10 ref11 ref8 ref9">8, 9, 10, 11</xref>
        ]. Transformer-based architectures [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] were
selected for classification by some of the systems [
        <xref ref-type="bibr" rid="ref10 ref8">8, 10</xref>
        ]. Other architectures such as LSTM
networks were also used by other participants [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], while an Embedding Topic Model (ETM)
was used by [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] for modeling users and similarity measures with other external resources such
as gambling testimonials and questionnaires were then employed for determining those users
more likely to be positive. The best performing system was presented by the UNSL team [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ],
which tested different representation techniques (Bag of Words, Doc2Vec) and classification
methods (LSTMs, SVMs) on a dataset generated from Reddit posts.
      </p>
      <p>
        In the 2022 edition the participants had access to labeled training data. The UNSL team
improved its method with new policies within their classification techniques [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], also obtaining
interesting results on the task. Other teams such as SINAI [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] employed Transformer-based
methods for obtaining the representation of the messages in the dataset, and applied regression
techniques on different features (volumetry or lexical diversity, among others). Deep learning
models were also employed by teams like NLPGroup-IISERB [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], BLUE [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] or BioNLP-UniBuc
[
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. Other teams selected different classification models such as SVMs or XGBoost, after
extracting GloVe features from user posts [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. Our proposal based on dataset relabeling from
user-level annotations to message-level annotations through Approximate Nearest Neighbors
[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] obtained the best results in the competition, and hence the research presented in this paper
is an extension of that particular work.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Task 2: Early Detection of Signs of Pathological Gambling</title>
      <p>
        Task 2 of eRisk 2023 is denoted “Early detection of signs of pathological gambling” [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. This
is the third edition of the task, which was first introduced in the CLEF 2021 eRisk Workshop
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and had a second edition in the CLEF 2022 eRisk Workshop [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. In this task, participating
systems are asked to determine whether an individual can be classified as a pathological gambler
(positive users) or a non-pathological gambler (negative users) based on the user’s Social Media
messages. Systems must sequentially analyze each user’s posts in chronological order to detect
early traces of pathological gambling.
      </p>
      <sec id="sec-3-1">
        <title>3.1. Dataset</title>
        <p>The dataset used in the task is composed of a set of XML documents, each of them containing
chronologically ordered Social Media posts belonging to a particular user. Each document is
annotated as “1” (positive) if the user is labeled as a pathological gambler, and “0” (negative)
otherwise.</p>
        <p>Table 1 shows the main statistics of the dataset. Column “Pathological gamblers” indicates
those users marked as positive in the dataset, while “Control” users represent negative users.</p>
        <p>[Table 1: Main statistics of the dataset. Columns: Pathological gamblers, Control. Rows: Num. subjects; Num. submissions (posts &amp; comments); Avg num. of submissions per subject; Avg num. of days from first to last submission; Avg num. of words per submission.]</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Metrics</title>
        <p>System evaluation is twofold:</p>
        <p>• Decision-based evaluation: This first type of evaluation analyzes the performance
of the participating systems in terms of standard measures such as Precision, Recall
and F-Measure. It also introduces other metrics that take
into account the delay incurred by a system before it detects a true positive. Two of
these metrics, denoted $\mathrm{ERDE}_5$ and $\mathrm{ERDE}_{50}$, consider the number or the percentage of
messages that have to be processed before emitting an alert for a positive user. In order to
overcome the low interpretability of these latter metrics, a latency-weighted F-Score is
also introduced, which multiplies the standard F-Measure by a penalty factor based on the
median delay of true positive detection.</p>
        <p>• Ranking-based evaluation: The second type of evaluation is a complementary approach
that requires the systems to provide a score indicating the risk of pathological gambling of
a user every time a new message is analyzed. Users are then ranked using this score and
standard ranking metrics such as $P@k$ or $NDCG@k$ can be applied, with the parameter
$k$ being the number of analyzed messages before evaluating the ranking.</p>
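        <p>As an illustration of the latency-weighted F-Score, the following minimal sketch assumes the definition commonly given in the eRisk overviews: each true positive detected after reading $k$ messages receives a penalty $-1 + 2 / (1 + e^{-p(k-1)})$, and the F-Measure is multiplied by one minus the median penalty. The function names and the default value $p = 0.0078$ are assumptions made for this example and are not taken from our system.</p>
        <preformat><![CDATA[
```python
import math
from statistics import median


def latency_penalty(k, p=0.0078):
    """Penalty in [0, 1) for a true positive detected after reading k messages.

    Assumed form: -1 + 2 / (1 + exp(-p * (k - 1))); it is 0 when k == 1.
    """
    return -1.0 + 2.0 / (1.0 + math.exp(-p * (k - 1)))


def latency_weighted_f1(f1, true_positive_delays, p=0.0078):
    """Multiply F1 by (1 - median penalty over the true positives)."""
    if not true_positive_delays:
        return 0.0  # no true positives: nothing to reward
    penalties = [latency_penalty(k, p) for k in true_positive_delays]
    speed = 1.0 - median(penalties)
    return f1 * speed


if __name__ == "__main__":
    # Same F1, but later detections give a lower latency-weighted score.
    print(latency_weighted_f1(0.8, [1, 1, 1]))
    print(latency_weighted_f1(0.8, [1, 50, 200]))
```
]]></preformat>
        <p>Under these assumptions, a system that raises every alert on the first message keeps its F-Measure unchanged, while slower alerts reduce it.</p>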
        <p>
          More information about the complete set of metrics employed in the evaluation can be found
in previous overviews of eRisk competitions [
          <xref ref-type="bibr" rid="ref2 ref20 ref3">2, 3, 20</xref>
          ].
        </p>
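        <p>The ranking-based evaluation can be sketched as follows, assuming binary relevance (1 for pathological gamblers, 0 for control users) and the usual logarithmic discount for NDCG. The helper names and the toy users are hypothetical and only serve to illustrate how a ranking is scored.</p>
        <preformat><![CDATA[
```python
import math


def rank_users(risk_scores, gold):
    """Return gold labels ordered by decreasing risk score."""
    order = sorted(risk_scores, key=risk_scores.get, reverse=True)
    return [gold[user] for user in order]


def precision_at_k(ranked_labels, k):
    """Fraction of the top-k ranked users that are positive."""
    return sum(ranked_labels[:k]) / k


def ndcg_at_k(ranked_labels, k):
    """NDCG@k with binary gains and a log2(rank + 1) discount."""
    def dcg(labels):
        # enumerate starts at 0, so position 0 is rank 1 and uses log2(2).
        return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(labels[:k]))

    ideal = dcg(sorted(ranked_labels, reverse=True))
    return dcg(ranked_labels) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    scores = {"user_a": 0.9, "user_b": 0.1, "user_c": 0.5}  # toy risk scores
    gold = {"user_a": 1, "user_b": 0, "user_c": 1}           # toy gold labels
    ranked = rank_users(scores, gold)
    print(precision_at_k(ranked, 2), ndcg_at_k(ranked, 3))
```
]]></preformat>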
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Proposed Model</title>
      <p>In this section we define the configuration of the different techniques that have been tested in
this research, all of them based on the idea of relabeling the original dataset to generate a
new training dataset annotated at message level.</p>
      <sec id="sec-4-1">
        <title>4.1. Baseline</title>
        <p>
          The baseline system for our participation in the task is the same system that obtained the
best results in the 2022 edition of the task [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. In this system, we use Universal Sentence
Encoder [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ] to encode each user’s messages, which are transformed into 512-dimensional
vectors. Then, an Approximate Nearest Neighbor technique is employed for modeling the
training dataset. In particular, and considering the results obtained in the 2022 task, we employed
the Hierarchical Navigable Small World (HNSW) graphs method, implemented in the
NonMetric Space Library [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ]. The main idea behind this method is to build a proximity graph
in which each datum corresponds to a node and the edges connecting some nodes define the
neighborhood relationship. In this representation, a neighbor’s neighbor is likely to also be a
neighbor. Search can be efficiently performed by iteratively expanding neighbors of neighbors
in a best-first search strategy.
        </p>
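        <p>The best-first strategy described above can be illustrated with a simplified, single-layer search over a proximity graph. This sketch is not the NMSLIB implementation used in our system: the real HNSW method adds a hierarchy of layers and its own graph construction, and the function names, the toy graph and the <italic>ef</italic> candidate-list size are assumptions made for the example.</p>
        <preformat><![CDATA[
```python
import heapq
import math


def cosine_distance(a, b):
    """1 - cosine similarity; assumes non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def best_first_search(graph, vectors, query, entry, k, ef=10):
    """Greedy search on one proximity-graph layer.

    graph maps a node id to the ids of its neighbors; ef bounds the number
    of best candidates kept while exploring neighbors of neighbors.
    """
    visited = {entry}
    d0 = cosine_distance(vectors[entry], query)
    candidates = [(d0, entry)]   # min-heap: closest unexplored node first
    results = [(-d0, entry)]     # max-heap (negated): best ef nodes so far
    while candidates:
        dist, node = heapq.heappop(candidates)
        # Stop when the closest candidate is worse than the worst kept result.
        if len(results) >= ef and dist > -results[0][0]:
            break
        for neighbor in graph[node]:
            if neighbor in visited:
                continue
            visited.add(neighbor)
            d = cosine_distance(vectors[neighbor], query)
            if len(results) < ef or d < -results[0][0]:
                heapq.heappush(candidates, (d, neighbor))
                heapq.heappush(results, (-d, neighbor))
                if len(results) > ef:
                    heapq.heappop(results)  # drop the current worst result
    return sorted((-neg_d, node) for neg_d, node in results)[:k]
```
]]></preformat>
        <p>The function returns up to $k$ pairs of (distance, node id), closest first. Because only neighbors of already visited nodes are expanded, the result is approximate: a true nearest neighbor that is poorly connected in the graph may be missed.</p>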
        <p>Once the ANN search index is built, we perform the relabeling process on the training dataset.
In the original corpus, each user is labeled as positive if at least one positive message can be found
within their posts, and negative otherwise. Hence, we first consider all messages of a positive
user to be positive, and all messages of a negative user to be negative. From the training set, we
consider only those messages that contain title information, which indicates that the message
represents the opening of a Reddit thread. Through this initial filtering, we intend to give
preference to those discussions originally initiated by the subject user (e.g. calls for help or
topic-related questions). We iteratively process each message from each positive user of the
training set, and re-annotate its class according to its similarity with its nearest neighbors,
through the following process: given a message $m$ from a user $u$,
we consider $m$ to be positive if the $k$ nearest neighbors retrieved include at least $n$ positive
messages. We explored different values of the parameters $k$ and $n$ in order to guarantee the
convergence of the algorithm on a non-empty set of positive training instances after applying
the relabeling process. In this step, the best values for these parameters were $k = 10$, $n = 8$.
After the whole training dataset has been relabeled, this method is repeated until convergence
is reached, that is, until there are no changes in the training set labels.</p>
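        <p>A minimal sketch of this iterative relabeling is given below. It replaces the ANN index with an exhaustive nearest-neighbor search to stay self-contained, and it makes several assumptions that the description above leaves open: a message is not counted as its own neighbor, all labels are updated together at the end of each pass, and a maximum number of passes guards against non-termination. All names are hypothetical.</p>
        <preformat><![CDATA[
```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; assumes non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def relabel(vectors, labels, from_positive_user, k=10, n=8, max_passes=100):
    """Relabel messages of positive users until the labels stop changing.

    vectors: one embedding per message.
    labels: initial message labels (1/0), inherited from the user label.
    from_positive_user: True for messages written by positive users.
    """
    labels = list(labels)
    for _ in range(max_passes):
        new_labels = list(labels)
        for i, vec in enumerate(vectors):
            if not from_positive_user[i]:
                continue  # messages of negative users always stay negative
            # Exhaustive search stands in for the ANN index (assumption).
            others = [j for j in range(len(vectors)) if j != i]
            others.sort(key=lambda j: cosine_distance(vec, vectors[j]))
            positives = sum(labels[j] for j in others[:k])
            new_labels[i] = 1 if positives >= n else 0
        if new_labels == labels:
            break  # convergence: no label changed in this pass
        labels = new_labels
    return labels
```
]]></preformat>
        <p>In this toy setting, a message written by a positive user but surrounded by messages of control users ends up labeled as negative, which is the intended effect of the relabeling.</p>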
        <p>For the final classification step, a different set of values for the number of retrieved neighbors $k$ and the minimum number of positive neighbors $n$ was determined
after a tuning evaluation stage. In this step, $k = 18$ and $n = 18$, that is, the 18 nearest neighbors
of a test message are retrieved from the training dataset, and all of them must be positive
in order to classify this test message as positive. Regarding the ranking-based evaluation,
we also use a scoring function for calculating the risk of pathological gambling of a user
given a message $m$, which is one minus the mean distance from $m$ to its $k$ nearest retrieved neighbors:
$1 - \frac{1}{k}\sum_{i=1}^{k} d(m, m_i)$, where $m_i$ is each of the retrieved nearest messages and $d$ is the distance between two messages. As the
scoring function is only used in the final classification step for the ranking-based evaluation,
the values of the mentioned parameters are $k = 18$, $n = 18$. The scoring function
is calculated only for those messages classified as positive, that is, those messages whose 18
nearest neighbors are all positive. Otherwise, the scoring function returns zero. The risk of
pathological gambling of a particular user, each time a new message is analysed, is the maximum
score obtained by any of the (positive) messages of that user, among those messages processed
up to that point. We denote this configuration as “Baseline” in our experiments.</p>
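        <p>The scoring rule and the running user risk can be sketched as follows. The example assumes that the distances and labels of the retrieved neighbors are already available; the function names are hypothetical and three neighbors are used instead of 18 to keep the example short.</p>
        <preformat><![CDATA[
```python
def message_score(neighbor_distances, neighbor_labels, n):
    """One minus the mean neighbor distance if the message is positive, else 0.

    A message is positive when at least n of its retrieved neighbors are
    positive (n equals the number of neighbors in the final configuration).
    """
    if sum(neighbor_labels) < n:
        return 0.0
    return 1.0 - sum(neighbor_distances) / len(neighbor_distances)


def user_risk_over_time(message_scores):
    """Risk after each message: the maximum score seen up to that point."""
    risks, best = [], 0.0
    for score in message_scores:
        best = max(best, score)
        risks.append(best)
    return risks


if __name__ == "__main__":
    positive = message_score([0.1, 0.2, 0.3], [1, 1, 1], n=3)
    negative = message_score([0.1, 0.2, 0.3], [1, 0, 1], n=3)
    print(positive, negative)
    print(user_risk_over_time([negative, positive, 0.5]))
```
]]></preformat>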
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Neural Models for Classification</title>
        <p>One of the main objectives of our research for this task is to determine whether the use of neural
models in our pipeline has a significant impact on the overall results. For this purpose, we have
developed two architectures for performing the final classification of the test messages. The
first architecture, denoted “RNN_Base”, is a simple network with an RNN layer composed of 64
neurons with ReLU activation function, and a final layer with a single neuron with sigmoid
activation function. We use the relabeled training dataset, annotated at message level, as input
for training our model. At inference time, we define a decision threshold of 0.9 on the sigmoid output,
that is, only messages whose output reaches this threshold are considered
to be positive. The scoring function in this case is the direct output of the sigmoid-activated
neuron in the last layer of the network, which always lies in the range from 0 to 1.</p>
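        <p>To make the decision rule concrete, the sketch below runs the forward pass of a small recurrent layer with ReLU activation followed by a single sigmoid neuron, and applies the 0.9 threshold. It is a toy illustration with two hidden units and hand-picked weights, not our trained 64-unit model, and how each message embedding is arranged as an input sequence is an assumption of the example.</p>
        <preformat><![CDATA[
```python
import math


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def rnn_score(sequence, W, U, b, v, c):
    """Forward pass: h_t = relu(W x_t + U h_{t-1} + b), score = sigmoid(v . h_T + c)."""
    h = [0.0] * len(b)
    for x in sequence:
        h = [
            relu(
                sum(W[i][j] * x[j] for j in range(len(x)))
                + sum(U[i][j] * h[j] for j in range(len(h)))
                + b[i]
            )
            for i in range(len(b))
        ]
    return sigmoid(sum(v[i] * h[i] for i in range(len(h))) + c)


def is_positive(score, threshold=0.9):
    """Decision rule: positive only when the sigmoid output reaches the threshold."""
    return score >= threshold


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]   # toy input weights
    U = [[0.0, 0.0], [0.0, 0.0]]   # toy recurrent weights
    b, v, c = [0.0, 0.0], [2.0, 2.0], 0.0
    for seq in ([[1.0, 1.0]], [[0.1, 0.1]]):
        s = rnn_score(seq, W, U, b, v, c)
        print(round(s, 3), is_positive(s))
```
]]></preformat>
        <p>Note that a message below the threshold still receives a non-zero score, which matters for the ranking-based evaluation.</p>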
        <p>The second neural architecture, named “RNN_Sim” in our experiments, only presents
modifications in the input part of the neural network, while the number and type of layers and
neurons remain the same. In this architecture, we intend to keep studying those approaches
based on similarity, while avoiding the huge search space derived from the use of ANN techniques.
For this purpose, we define a similarity matrix by taking a number $N$ of random positive
messages from the relabeled dataset and calculating the cosine similarity between each input message
of the network and every positive message in the matrix. Then, we build a vector
with all these cosine similarities. Hence, the dimension of this vector is also $N$. This similarity
vector is the final input to the network. In order to explore all the positive messages within
the training dataset, the similarity matrix is updated in each epoch of the training phase with
a new subset of $N$ random positive messages. The last subset used in the training
phase is then employed as the similarity matrix for performing the final classification of test
instances. After some tests on the development dataset, we have defined $N = 50$ as the size of
the similarity matrix.</p>
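        <p>The construction of the network input for this architecture can be sketched as follows. The example assumes plain cosine similarity over embedding vectors and uniform random sampling of the reference messages; the function names, the toy vectors and the reference size of 3 (instead of 50) are choices made for the illustration.</p>
        <preformat><![CDATA[
```python
import math
import random


def cosine_similarity(a, b):
    """Cosine similarity; assumes non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def sample_reference(positive_vectors, size, rng):
    """Draw the reference set of positive messages; redrawn at every epoch."""
    return rng.sample(positive_vectors, size)


def similarity_input(message_vec, reference):
    """Vector of similarities to each reference message; its length is the reference size."""
    return [cosine_similarity(message_vec, ref) for ref in reference]


if __name__ == "__main__":
    rng = random.Random(0)
    positives = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]  # toy embeddings
    for epoch in range(2):
        reference = sample_reference(positives, 3, rng)  # new subset each epoch
        print(epoch, similarity_input([1.0, 0.5], reference))
```
]]></preformat>
        <p>With this input, the cost of processing a message depends only on the reference size, not on the size of the training set.</p>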
        <p>Finally, some adjustments have been included in both neural architectures for optimization:
the number of epochs is 20, with early stopping policies (after 4 epochs with no validation loss
improvement) and a dropout layer after the RNN layer (dropout rate = 0.25).</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Neural Models for Relabeling</title>
        <p>Once the neural models have been developed, we are interested in testing whether they
can also be used to improve the relabeling process described in Section 4.1. We develop
two additional configurations of our system for this purpose. The first configuration, named
“RNN_Relabel_Base”, uses the trained “RNN_Sim” model described in Section 4.2 to
relabel the original training dataset. Once the training dataset has been relabeled, the final
classification step is performed as described in Section 4.1, that is, by using the
NMSLIB library to extract the nearest neighbors of each test message and applying the
aforementioned tagging and scoring functions.</p>
        <p>Finally, the last configuration of the proposed system, denoted “RNN_Relabel_Refined”,
is a combination of all techniques explored in this work. In this configuration, we consider
a training message to be positive if both the original “Baseline” relabeling method and the
“RNN_Relabel_Base” relabeling method annotated it as positive. Hence, only those messages
considered as positive by the ANN-based and the neural-based relabeling techniques will be
maintained as positive in this case. The final classification step is also performed as in the
“RNN_Relabel_Base” configuration.</p>
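        <p>The combination rule of this last configuration amounts to a logical AND between the two sets of message-level labels, as in the short sketch below. The function name and the toy label lists are hypothetical.</p>
        <preformat><![CDATA[
```python
def refine_labels(ann_labels, rnn_labels):
    """Keep a message positive only if both relabeling methods marked it positive."""
    if len(ann_labels) != len(rnn_labels):
        raise ValueError("both label lists must cover the same messages")
    return [1 if a == 1 and r == 1 else 0 for a, r in zip(ann_labels, rnn_labels)]


if __name__ == "__main__":
    ann = [1, 1, 0, 0]  # toy labels from the ANN-based relabeling
    rnn = [1, 0, 1, 0]  # toy labels from the neural relabeling
    print(refine_labels(ann, rnn))
```
]]></preformat>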
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results and Discussion</title>
      <p>In this section the main results obtained by the proposed configurations of our system are shown
and compared to other participants in the task. Table 2 shows the summarized results for the
decision-based evaluation. The five configurations of our systems are depicted first, while only
the best run from each of the participating systems is included, ordered by the latency-weighted
F1 measure.</p>
      <p>Our best performing configuration, “RNN_Sim”, achieves the second best result in
terms of latency-weighted F1, five points behind the best performing system, EliRF-UPV.</p>
      <p>Our system offers good values for all the proposed metrics, although the latency and speed
values are slightly high (that is, the system needs to process a relatively high number of messages
before determining that a user is positive), which is detrimental when it comes to the final
latency-weighted F1 value.</p>
      <p>Regarding our different configurations, the techniques that perform relabeling through the
use of the neural architectures obtain latency-weighted values very similar to the “RNN_Sim”
configuration, although their performance is better in terms of precision and worse in terms of
recall. This, along with the results of the “Baseline” configuration, indicates that ANN-based
classification has a higher impact on precision, while neural-based classification improves the
recall of the system.</p>
      <p>Table 3 offers the main results of the ranking-based evaluation, in terms of Precision@10,
NDCG@10 and NDCG@100 when the system has already processed 1, 100, 500 and 1,000
writings. As before, we show in the table the results of our five configurations, together with
the best result of the systems participating in the competition.</p>
      <p>As shown in Table 3, “RNN_Sim” is our best performing configuration, obtaining
perfect scores for P@10 and NDCG@10 after 1, 100, 500 and 1,000 writings, and high values of
NDCG@100 in all the cases. The main differences between the configurations can
be seen after processing 1 message: those techniques that perform RNN-based classification
(“RNN_Base” and “RNN_Sim”) achieve better values of the NDCG@100 metric. In these cases,
the final score of the user comes directly from the activation value of the neuron in the last
layer of the network, and hence a score (risk of being a pathological gambler) is given to the
user even if the network decides that the processed message is negative, while the same case
would yield a score of 0 in the ANN-based classification. Therefore, assigning some score even
to negative users allows the system to generate a better ranking based on the risk of being a
pathological gambler.</p>
      <p>Regarding the comparison between our best configuration and the best runs of the
participating systems, our system achieves the best results for 10 out of the 12 proposed metrics, which
also indicates the robustness of this particular scoring function.</p>
      <p>Finally, Table 4 shows the execution times required by the systems for processing the test set.
Systems that processed the test set only partially have not been included in the table.</p>
      <p>
        Although we are aware that the “Baseline” configuration is quite fast in processing the test
dataset, obtaining the best execution times in the 2022 edition of the competition [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], we can
observe in the table that our system is not among the fastest participating teams. This is probably
due to the use of neural-based inference in the “RNN_Base” and “RNN_Sim” configurations,
which are more likely to involve a much longer execution time than searches on the indexes
generated by ANN algorithms. It would be useful to separate the execution times of each run
submitted by the systems in order to better visualize the trade-off between time consumption
and overall results of each system.
      </p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions and Future Work</title>
      <p>This paper describes our participation in Task 2 of eRisk 2023: Early detection of signs of
pathological gambling. We have made further developments on the system introduced in the
2022 edition which obtained the best overall results. In particular, in this research we explore
the use of neural architectures and their combination with Approximate Nearest Neighbor
techniques used for performing dataset relabeling, from user level annotations to message level
annotations.</p>
      <p>Two different neural architectures have been proposed: a simple RNN architecture receiving
vectorial representations of the input messages obtained through Universal Sentence Encoder,
and a second, similar architecture which uses as input the similarity of each input message with
a reference similarity matrix containing positive messages from the training dataset. These
architectures have been tested both for performing the final message classification and for
relabeling the training dataset (original or already
relabeled). We have obtained the second best results in the decision-based evaluation and the
best overall results in the ranking-based evaluation. Although the relabeling process already
proposed in the past edition of the competition is probably the main reason for achieving such
good results, the introduction of neural architectures within our pipeline allows us to obtain
some improvements on the final results.</p>
      <p>
        As future lines of work for this research, we consider the use of Transformer-based neural
architectures [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] such as BERT [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ], and a deeper exploration of both the parameters in the
ANN algorithms and the hyperparameters of the employed neural networks. Pre-trained models
built with in-domain information could also be useful to better represent the knowledge that
we try to model for detecting the risk of pathological gambling. Finally, the application of the
proposed system to similar tasks, such as detecting the risk of anorexia, depression or self-harm
behaviours, is also currently being considered.
      </p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This work has been partially supported by the Spanish Ministry of Science and Innovation
within the DOTT-HEALTH Project (MCI/AEI/FEDER, UE) under Grant PID2019-106942RB-C32
and OBSER-MENH Project (MCIN/AEI/10.13039/501100011033 and NextGenerationEU/PRTR)
under Grant TED2021-130398B-C21 as well as project RAICES (IMIENS 2022).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Parapar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Martín Rodilla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Losada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Crestani</surname>
          </string-name>
          ,
          <article-title>Overview of eRisk 2023: Early risk prediction on the internet</article-title>
          ,
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction. 14th International Conference of the CLEF Association, CLEF 2023</source>
          , Thessaloniki, Greece (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Parapar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Martín-Rodilla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Losada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Crestani</surname>
          </string-name>
          ,
          <article-title>Overview of eRisk at CLEF 2021: Early risk prediction on the internet (extended overview)</article-title>
          ,
          <source>Proceedings of the Working Notes of CLEF 2021 - Conference and Labs of the Evaluation Forum</source>
          , Bucharest, Romania,
          <volume>2936</volume>
          (
          <year>2021</year>
          )
          <fpage>864</fpage>
          -
          <lpage>887</lpage>
          . URL: http://ceur-ws.org/Vol-2936/paper-72.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Parapar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Martín Rodilla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Losada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Crestani</surname>
          </string-name>
          ,
          <article-title>Overview of eRisk 2022: Early risk prediction on the internet</article-title>
          ,
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction. 13th International Conference of the CLEF Association, CLEF</source>
          <year>2022</year>
          , Bologna, Italy (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>H.</given-names>
            <surname>Fabregat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Duque</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Araujo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Martínez-Romo</surname>
          </string-name>
          ,
          <article-title>UNED-NLP at eRisk 2022: Analyzing gambling disorders in social media using approximate nearest neighbors</article-title>
          , in: G. Faggioli,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hanbury</surname>
          </string-name>
          , M. Potthast (Eds.),
          <source>Proceedings of the Working Notes of CLEF 2022 - Conference and Labs of the Evaluation Forum</source>
          , Bologna, Italy, September 5th - to - 8th,
          <year>2022</year>
          , volume
          <volume>3180</volume>
          of
          <source>CEUR Workshop Proceedings</source>
          , CEUR-WS.org,
          <year>2022</year>
          , pp.
          <fpage>894</fpage>
          -
          <lpage>904</lpage>
          . URL: https://ceur-ws.org/Vol-3180/paper-71.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M. N.</given-names>
            <surname>Potenza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. M.</given-names>
            <surname>Balodis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Derevensky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. E.</given-names>
            <surname>Grant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. M.</given-names>
            <surname>Petry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Verdejo-Garcia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. W.</given-names>
            <surname>Yip</surname>
          </string-name>
          ,
          <article-title>Gambling disorder</article-title>
          ,
          <source>Nature Reviews Disease Primers</source>
          <volume>5</volume>
          (
          <year>2019</year>
          )
          <fpage>1</fpage>
          -
          <lpage>21</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M. N.</given-names>
            <surname>Potenza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. R.</given-names>
            <surname>Kosten</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. J.</given-names>
            <surname>Rounsaville</surname>
          </string-name>
          ,
          <article-title>Pathological gambling</article-title>
          ,
          <source>JAMA</source>
          <volume>286</volume>
          (
          <year>2001</year>
          )
          <fpage>141</fpage>
          -
          <lpage>144</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>C. J.</given-names>
            <surname>Rash</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Weinstock</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Van Patten</surname>
          </string-name>
          ,
          <article-title>A review of gambling disorder and substance use disorders</article-title>
          ,
          <source>Substance Abuse and Rehabilitation</source>
          <volume>7</volume>
          (
          <year>2016</year>
          )
          <fpage>3</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Basile</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chinea-Rios</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.-S.</given-names>
            <surname>Uban</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Rössler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Yenikent</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chulví</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Franco-Salvador</surname>
          </string-name>
          ,
          <article-title>UPV-Symanto at eRisk 2021: Mental health author profiling for early risk prediction on the internet</article-title>
          ,
          <source>Working Notes of CLEF</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>D.</given-names>
            <surname>Maupomé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Armstrong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Rancourt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Soulas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-J.</given-names>
            <surname>Meurs</surname>
          </string-name>
          ,
          <article-title>Early detection of signs of pathological gambling, self-harm and depression through topic extraction and neural networks</article-title>
          ,
          <source>Proceedings of the Working Notes of CLEF</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.-M.</given-names>
            <surname>Bucur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Cosma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. P.</given-names>
            <surname>Dinu</surname>
          </string-name>
          ,
          <article-title>Early risk detection of pathological gambling, self-harm and depression using BERT</article-title>
          ,
          <source>Working Notes of CLEF</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>R. P.</given-names>
            <surname>Lopes</surname>
          </string-name>
          ,
          <article-title>CeDRI at eRisk 2021: A naive approach to early detection of psychological disorders in social media</article-title>
          , in:
          <source>CEUR Workshop Proceedings</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>981</fpage>
          -
          <lpage>991</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaswani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Parmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Uszkoreit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Gomez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kaiser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Polosukhin</surname>
          </string-name>
          ,
          <article-title>Attention is all you need</article-title>
          ,
          <source>CoRR</source>
          abs/1706.03762 (
          <year>2017</year>
          ). URL: http://arxiv.org/abs/1706.03762. arXiv:1706.03762.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Loyola</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Burdisso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Thompson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Cagnina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Errecalde</surname>
          </string-name>
          ,
          <article-title>UNSL at eRisk 2021: A comparison of three early alert policies for early risk detection</article-title>
          , in:
          <source>Working Notes of CLEF 2021 - Conference and Labs of the Evaluation Forum</source>
          , Bucharest, Romania,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Loyola</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Thompson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Burdisso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Errecalde</surname>
          </string-name>
          ,
          <article-title>UNSL at eRisk 2022: Decision policies with history for early classification</article-title>
          , in: G. Faggioli,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hanbury</surname>
          </string-name>
          , M. Potthast (Eds.),
          <source>Proceedings of the Working Notes of CLEF 2022 - Conference and Labs of the Evaluation Forum</source>
          , Bologna, Italy, September 5th - to - 8th,
          <year>2022</year>
          , volume
          <volume>3180</volume>
          of
          <source>CEUR Workshop Proceedings</source>
          , CEUR-WS.org,
          <year>2022</year>
          , pp.
          <fpage>947</fpage>
          -
          <lpage>960</lpage>
          . URL: https://ceur-ws.org/Vol-3180/paper-75.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Mármol-Romero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M. J.</given-names>
            <surname>Zafra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M. P.</given-names>
            <surname>del Arco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Molina-González</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. T. M.</given-names>
            <surname>Valdivia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Montejo-Ráez</surname>
          </string-name>
          ,
          <article-title>SINAI at eRisk@CLEF 2022: Approaching early detection of gambling and eating disorders with natural language processing</article-title>
          , in: G. Faggioli,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hanbury</surname>
          </string-name>
          , M. Potthast (Eds.),
          <source>Proceedings of the Working Notes of CLEF 2022 - Conference and Labs of the Evaluation Forum</source>
          , Bologna, Italy, September 5th - to - 8th,
          <year>2022</year>
          , volume
          <volume>3180</volume>
          of
          <source>CEUR Workshop Proceedings</source>
          , CEUR-WS.org,
          <year>2022</year>
          , pp.
          <fpage>961</fpage>
          -
          <lpage>971</lpage>
          . URL: https://ceur-ws.org/Vol-3180/paper-76.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>H.</given-names>
            <surname>Srivastava</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. N.</given-names>
            <surname>S</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>S</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Basu</surname>
          </string-name>
          ,
          <article-title>NLP-IISERB@eRisk2022: Exploring the potential of bag of words, document embeddings and transformer based framework for early prediction of eating disorder, depression and pathological gambling over social media</article-title>
          , in: G. Faggioli,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hanbury</surname>
          </string-name>
          , M. Potthast (Eds.),
          <source>Proceedings of the Working Notes of CLEF 2022 - Conference and Labs of the Evaluation Forum</source>
          , Bologna, Italy, September 5th - to - 8th,
          <year>2022</year>
          , volume
          <volume>3180</volume>
          of
          <source>CEUR Workshop Proceedings</source>
          , CEUR-WS.org,
          <year>2022</year>
          , pp.
          <fpage>972</fpage>
          -
          <lpage>986</lpage>
          . URL: https://ceur-ws.org/Vol-3180/paper-77.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bucur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Cosma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. P.</given-names>
            <surname>Dinu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <article-title>An end-to-end set transformer for user-level classification of depression and gambling disorder</article-title>
          , in: G. Faggioli,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hanbury</surname>
          </string-name>
          , M. Potthast (Eds.),
          <source>Proceedings of the Working Notes of CLEF 2022 - Conference and Labs of the Evaluation Forum</source>
          , Bologna, Italy, September 5th - to - 8th,
          <year>2022</year>
          , volume
          <volume>3180</volume>
          of
          <source>CEUR Workshop Proceedings</source>
          , CEUR-WS.org,
          <year>2022</year>
          , pp.
          <fpage>851</fpage>
          -
          <lpage>863</lpage>
          . URL: https://ceur-ws.org/Vol-3180/paper-67.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>T.</given-names>
            <surname>Dumitrascu</surname>
          </string-name>
          ,
          <article-title>CLEF eRisk 2022: Detecting early signs of pathological gambling using ML and DL models with dataset chunking</article-title>
          , in: G. Faggioli,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hanbury</surname>
          </string-name>
          , M. Potthast (Eds.),
          <source>Proceedings of the Working Notes of CLEF 2022 - Conference and Labs of the Evaluation Forum</source>
          , Bologna, Italy, September 5th - to - 8th,
          <year>2022</year>
          , volume
          <volume>3180</volume>
          of
          <source>CEUR Workshop Proceedings</source>
          , CEUR-WS.org,
          <year>2022</year>
          , pp.
          <fpage>883</fpage>
          -
          <lpage>893</lpage>
          . URL: https://ceur-ws.org/Vol-3180/paper-70.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>S.</given-names>
            <surname>Stalder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Zankov</surname>
          </string-name>
          ,
          <article-title>ZHAW at eRisk 2022: Predicting signs of pathological gambling - glove for snowy days</article-title>
          , in: G. Faggioli,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hanbury</surname>
          </string-name>
          , M. Potthast (Eds.),
          <source>Proceedings of the Working Notes of CLEF 2022 - Conference and Labs of the Evaluation Forum</source>
          , Bologna, Italy, September 5th - to - 8th,
          <year>2022</year>
          , volume
          <volume>3180</volume>
          of
          <source>CEUR Workshop Proceedings</source>
          , CEUR-WS.org,
          <year>2022</year>
          , pp.
          <fpage>987</fpage>
          -
          <lpage>994</lpage>
          . URL: https://ceur-ws.org/Vol-3180/paper-78.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Losada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Crestani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Parapar</surname>
          </string-name>
          ,
          <article-title>Overview of eRisk at CLEF 2020: Early risk prediction on the internet (extended overview)</article-title>
          ,
          <source>Working Notes of CLEF 2020 - Conference and Labs of the Evaluation Forum</source>
          , Thessaloniki, Greece,
          <year>2020</year>
          <volume>2696</volume>
          (
          <year>2020</year>
          ). URL: http://ceur-ws.org/Vol-2696/paper_253.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>D.</given-names>
            <surname>Cer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Hua</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Limtiaco</surname>
          </string-name>
          ,
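          <string-name>
            <given-names>R.</given-names>
            <surname>St. John</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Constant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Guajardo-Cespedes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,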
          <string-name>
            <given-names>C.</given-names>
            <surname>Tar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Strope</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kurzweil</surname>
          </string-name>
          ,
          <article-title>Universal sentence encoder</article-title>
          ,
          <source>CoRR</source>
          abs/1803.11175 (
          <year>2018</year>
          ). URL: http://arxiv.org/abs/1803.11175. arXiv:1803.11175.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>Y. A.</given-names>
            <surname>Malkov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Yashunin</surname>
          </string-name>
          ,
          <article-title>Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs</article-title>
          ,
          <source>CoRR</source>
          abs/1603.09320 (
          <year>2016</year>
          ). URL: http://arxiv.org/abs/1603.09320. arXiv:1603.09320
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>BERT: pre-training of deep bidirectional transformers for language understanding</article-title>
          , in:
          <string-name>
            <given-names>J.</given-names>
            <surname>Burstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Doran</surname>
          </string-name>
          , T. Solorio (Eds.),
          <source>Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , NAACL-HLT
          <year>2019</year>
          , Min-
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>