<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Natural Language Processing Based Risk Prediction Framework for Pathological Gambling</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Abu Talha</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tanmay Basu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Data Science and Engineering, Indian Institute of Science Education and Research</institution>
          ,
          <addr-line>Bhopal</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Task 2 of the eRisk 2023 shared-task challenge at the Conference and Labs of the Evaluation Forum (CLEF) focused on the early detection of pathological gambling via sequential text processing over social media conversations. The challenge organizers released different datasets, consisting of social media posts and questionnaires, for all three tasks. The BioNLP research group at the Indian Institute of Science Education and Research Bhopal (IISERB) participated in task 2 of the challenge and submitted five runs corresponding to five different text mining frameworks. This paper demonstrates the performance of these text classification frameworks and their effectiveness in detecting signs of pathological gambling. Several classifiers and feature engineering schemes were combined to build the individual frameworks. The features from free text were generated following the bag of words model and transformer based embedding methods. Subsequently, adaptive boosting, logistic regression, support vector machine, and transformer based classifiers were used to identify the signs of pathological gambling from the social media posts. The experimental analysis demonstrates that the support vector machine and adaptive boosting classifiers, respectively using the entropy and TF-IDF weighting schemes of the bag of words model, outperform the other methods on the training set. Furthermore, the adaptive boosting classifier following the TF-IDF based weighting scheme achieves the best precision score among all submissions to task 2 of eRisk 2023. However, the remaining frameworks could not achieve reasonable performance, which needs to be introspected in future work.</p>
      </abstract>
      <kwd-group>
        <kwd>BioNLP</kwd>
        <kwd>Information Extraction</kwd>
        <kwd>Text Classification</kwd>
        <kwd>Mental Health</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The area of research related to the early prediction of signs of mental illness through social
media analysis is fascinating and demanding in the Internet age. Pathological gambling, or
Gambling Disorder (GD), is a condition marked by persistent and repetitive gambling
habits, causing notable distress or disruption [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The estimated prevalence of GD is around 0.5
% of the adult population in the United States, while other countries have similar or potentially
higher numbers [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Individuals with pathological gambling are often neither identified
nor treated for their condition. It is common for pathological gambling to coincide with other
psychiatric disorders, such as mood, anxiety, attention deficit, and substance use disorders [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
Moreover, pathological gambling is closely linked to other forms of addiction, as it was the
first non-substance addiction to be officially recognized [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. With the advent of social media,
people interact extensively on different platforms for different purposes. Hence various social media platforms
like Reddit, Twitter, Facebook, etc. have become valuable resources for conducting research to
identify the signs of various mental illnesses [
        <xref ref-type="bibr" rid="ref4 ref5 ref6">4, 5, 6</xref>
        ]. The CLEF eRisk group has been organizing
various shared tasks over the last few years for the early prediction of the risks of various disorders
using the conversations of different subjects over Reddit1, a popular social media platform [
        <xref ref-type="bibr" rid="ref10 ref7 ref8 ref9">7, 8, 9, 10</xref>
        ].
The eRisk 2023 lab [11] announced three tasks, where the second task is the early prediction
of signs of pathological gambling. The data used for the same shared task in eRisk 2021
and 2022 has been released as the training data for task 2 of eRisk 2023. This paper introduces
five different frameworks that were developed to address the problem of the early prediction of
signs of pathological gambling using conversations over social media.
      </p>
      <p>We have explored various frameworks by combining different feature engineering schemes
and text classification techniques and evaluated their performance on the given training data.
The five best frameworks were then applied to the given test data and submitted to the
task organizers for evaluation. The goal of these frameworks is to analyze the conversations of
individual subjects in the given training corpus to train a classifier that assigns the subjects to
the pathological gambling or control group. The performance of a text classification technique
is highly dependent on significant text features and their relationships to derive a semantic
interpretation. Hence both the conventional bag of words model and the latest transformer-based
models were used to generate text features. The term frequency and inverse document frequency (TF-IDF)
based term weighting scheme [12, 13] and the entropy based term weighting scheme [14, 15, 16, 17]
have been used for the bag of words model. Moreover, three different attention layer based
transformer models, viz., BERT (Bidirectional Encoder Representations from Transformers)
[18], Longformer [19], and RoBERTa [20], were used to generate semantic features from the given
training data. Subsequently, the performance of the adaptive boosting (Ada-Boost) [21], logistic
regression (LR), Random Forest (RF), K-Nearest Neighbors (KNN), and support vector machine
(SVM) [22] classifiers was explored on the training corpus using the bag of words features
following both the TF-IDF and entropy-based weighting schemes. Furthermore, the performance
of the individual transformer-based embeddings was tested using the pre-trained sequence
classification framework of each model.</p>
      <p>The results show that the SVM classifier using the entropy based term weighting scheme
of the bag of words model outperforms all other frameworks on the training data in terms of
precision, recall, and F1 score. The Ada-Boost classifier using the entropy based weighting scheme
of the bag of words model achieved the highest precision score among all submissions for task 2 of the eRisk
2023 challenge on the test data, but it could not perform well on the training set. However,
a few of the proposed models, e.g., Ada-Boost using entropy, achieved decent recall and F1 scores
on the training set but could not achieve reasonable performance on the test corpus, which
needs to be examined in future work. The paper is organized as follows. Section 2 explains the
proposed frameworks for identifying pathological gambling over social media conversations.
The experimental setup and evaluation are presented in sections 3 and 4. The conclusions and significant findings
are described in section 5.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Proposed Frameworks</title>
      <p>Several text classification frameworks have been explored to identify pathological gambling
using the conversations of individual subjects over social media data from Reddit. The corpora
released by the organizers consist of documents containing the posts of Reddit users over a
period of time, along with the corresponding dates and titles, in XML format [11]. Note that the conversations of
each subject are stored in one XML file along with other information. All conversations of a
subject are extracted from the XML file and merged together, disregarding the timestamps and titles.
Hence the corpus used in the proposed frameworks to train the individual classifiers contains
only the free text conversations of the different users. We have combined different feature engineering and
text classification techniques to identify pathological gamblers using the training corpus.</p>
      <sec id="sec-2-1">
        <title>2.1. Feature Engineering</title>
        <p>We have used both the classical bag of words model and transformer architecture based models
for generating features from the conversations of individual Reddit users.</p>
        <sec id="sec-2-1-1">
          <title>2.1.1. Features Generated by Bag Of Words Model</title>
          <p>The bag of words (BOW) model is a classical text feature extraction model, which considers
each unique term of a corpus as a feature and finds the weight of each term following different
schemes [23]. Thus each document in a corpus is generally represented by a vector whose
length is equal to the number of unique terms, also known as the vocabulary [23]. Two different
term weighting schemes are used here, viz., the term frequency and inverse document frequency
(TF-IDF) based term weighting scheme [12, 23] and the entropy-based term weighting scheme [15],
as these methods had performed well in similar tasks [24, 13, 16, 17]. The TF-IDF weighting scheme
assigns the weight of a term as follows:</p>
          <p>TF-IDF(i, j) = tf_ij × log(N / df_i)   (1)
where N is the total number of documents in the corpus, tf_ij is the frequency of the i-th term in the
j-th document of the corpus, and df_i is known as the document frequency of the i-th term, which
is the number of documents in which the i-th term appears [23]. Many researchers use the
entropy-based term weighting technique to form a term-document matrix from a text collection
[13, 15, 16, 17]. This method is developed in the spirit that the more important term is the more
frequent one that occurs in fewer documents, taking the distribution of the term over the corpus
into account [15]. The weight of a term in a document is determined by the entropy of the
term frequency of the term in that document [15]. The weight w_ij of the i-th term in the j-th
document is defined by the Entropy2 [15, 16] model as follows:
w_ij = log(tf_ij + 1) × (1 + (Σ_j p_ij log p_ij) / log(N + 1)), where p_ij = tf_ij / Σ_j tf_ij   (2)
2https://radimrehurek.com/gensim/models/logentropy_model.html
Here N is the total number of documents in the corpus and tf_ij is the frequency of the i-th term in
the j-th document of the corpus. Generally the BOW model generates a lot of terms, which makes the
term-document matrix sparse and high dimensional, and this can badly affect the performance of
the text classifiers [17]. Hence the χ2-statistic-based term selection technique was used to identify
essential terms from the term-document matrix, which is a widely used technique for term
selection [13, 16, 25]. We have used different thresholds to choose the best terms following the
χ2-statistic and evaluated the performance of the individual classifiers using this set of terms on
the training corpus. The best set of terms for each classifier is used for the experiments on
the test data. The χ2-statistic for term selection is used for both the TF-IDF and entropy-based
term weighting schemes in the experiments in section 4.</p>
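          <p>As an illustration, the two weighting schemes in equations (1) and (2) and the χ2-based term
selection described above can be sketched in plain Python. This is a minimal toy example on a
hypothetical four-document corpus, not the implementation used for the shared task.</p>
          <preformat>
```python
import math
from collections import Counter

# Toy corpus standing in for the merged Reddit conversations (hypothetical).
docs = [
    "lost money betting again chasing losses",
    "betting all night cannot stop gambling",
    "enjoyed hiking and photography this weekend",
    "weekend football game with friends",
]
labels = [1, 1, 0, 0]  # 1 = pathological gambling, 0 = control

tokenized = [d.split() for d in docs]
N = len(docs)
tf = [Counter(doc) for doc in tokenized]                 # term frequencies
df = Counter(t for doc in tokenized for t in set(doc))   # document frequencies

def tfidf(term, j):
    """Equation (1): tf_ij * log(N / df_i)."""
    return tf[j][term] * math.log(N / df[term])

def log_entropy(term, j):
    """Equation (2), following the gensim log-entropy formulation:
    log(tf_ij + 1) scaled by a global entropy factor of the term."""
    total = sum(tf[k][term] for k in range(N))
    ent = 0.0
    for k in range(N):
        p = tf[k][term] / total
        if p > 0:
            ent += p * math.log(p)
    return math.log(tf[j][term] + 1) * (1 + ent / math.log(N + 1))

def chi2(term):
    """Chi-square statistic of a term against the binary class labels,
    used here to rank terms for term selection."""
    pos = sum(labels)
    a = sum(1 for j in range(N) if tf[j][term] > 0 and labels[j] == 1)
    b = sum(1 for j in range(N) if tf[j][term] > 0 and labels[j] == 0)
    c = pos - a          # positive documents without the term
    d = (N - pos) - b    # negative documents without the term
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else N * (a * d - b * c) ** 2 / denom

print(round(tfidf("betting", 0), 3))  # 1 * log(4/2) = 0.693
print(round(chi2("betting"), 1))      # 4.0, a perfectly class-separating term
```
          </preformat>
          <p>A term such as "betting", which occurs only in the gambling class, receives the maximum
χ2 score on this toy corpus, which is exactly the behaviour the term selection step exploits.</p>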
        </sec>
        <sec id="sec-2-1-2">
          <title>2.1.2. Features Generated by Transformer Based Architecture</title>
          <p>Transformer architecture-based methods were used to identify the signs of
pathological gambling by generating relevant embeddings from the conversations of individual users,
as we need to capture the long-range dependencies and context of the conversations effectively.
Bidirectional Encoder Representations from Transformers (BERT) is a contextualized word
representation model based on a masked language model and pre-trained using bidirectional
transformers on general domain corpora, i.e., English Wikipedia and books [18]. Moreover, we
have used the RoBERTa [20] model, an extension of BERT that was additionally trained on a news
corpus by modifying some specific parameters and training strategies of BERT [20]. The Longformer
model [19] has significant advantages over BERT in identifying long-term dependencies in the given
texts [17]. As the conversations of individual subjects are recorded over a period of time in the
training corpus, Longformer may be useful to identify long-term dependencies between different
conversations. The parameters of the pre-trained BERT, RoBERTa, and Longformer models
were fine-tuned using the given training corpus to generate the embeddings.</p>
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Methods for Text Classification</title>
        <p>Adaptive Boosting (Ada-Boost), Logistic Regression, Random Forest, K-Nearest Neighbors, and
Support Vector Machine (SVM) classifiers were implemented to identify pathological gamblers
using the conversations in the training corpus. It should be noted that these classification
methods were implemented using the bag of words model following both the entropy and TF-IDF-based
weighting schemes. In order to identify significant parameters for the individual classifiers, the
grid search technique3 was used following the 10-fold cross-validation method on the training
corpus. Furthermore, we have explored the performance of the pre-trained BERT, RoBERTa, and
Longformer models from the Hugging Face repository4 using the training corpus.</p>
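        <p>As a rough sketch (not the code used for the shared task), the model selection step with
scikit-learn's grid search, 10-fold cross-validation, and a balanced class weight can look as
follows; the synthetic data and the parameter grid are illustrative assumptions.</p>
        <preformat>
```python
# Sketch of grid search with 10-fold cross-validation over an SVM, using
# scikit-learn, with a balanced class weight to counter class imbalance.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Toy imbalanced data standing in for the TF-IDF / entropy feature matrix.
X, y = make_classification(
    n_samples=200, n_features=20, weights=[0.9, 0.1], random_state=42
)

param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}
search = GridSearchCV(
    SVC(class_weight="balanced"),  # down-weights the majority (control) class
    param_grid,
    scoring="f1",  # F1 of the minority (positive) class
    cv=10,         # 10-fold cross-validation, as in the paper
)
search.fit(X, y)
print(search.best_params_)
```
        </preformat>
        <p>The best hyper-parameter combination found on the training folds is then refitted on the
full training set, mirroring how the significant parameters of each classifier were chosen here.</p>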
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Experimental Setup</title>
      <p>The training data for the individual subjects were released in XML format with the identity,
timestamps, and postings, along with the ground truth. The proposed frameworks were evaluated using the training
data. In the test corpus, 103 users were marked as pathological gamblers and 2071 users were
marked as the control group [11]. These corpus statistics clearly indicate that
the control group has a very high number of instances compared to the pathological gamblers.
3https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html
4https://huggingface.co/</p>
      <p>Table 1: The submitted runs and their corresponding frameworks.
BioNLP-IISERB 0: Entropy Based Features and SVM Classifier
BioNLP-IISERB 1: TF-IDF Based Features and SVM Classifier
BioNLP-IISERB 2: Entropy Based Features and AdaBoost Classifier
BioNLP-IISERB 3: TF-IDF Based Features and AdaBoost Classifier
BioNLP-IISERB 4: Longformer Model</p>
      <p>We have submitted five runs using the five different frameworks mentioned in
Table 1. The performance of the different feature engineering techniques and classifiers was
evaluated following the 10-fold cross-validation method on the training corpus. Based on these
performances, the five best frameworks were chosen, applied to the test corpus,
and submitted as the five runs in Table 1. Scikit-learn5 libraries were used to implement the AdaBoost,
K-Nearest Neighbor, Logistic Regression, Random Forest, and SVM classifiers. The balanced
weighting scheme of the individual classes was used for each classifier to overcome the effect of the
control group over the pathological gambling class. This weighting scheme, as implemented in
Scikit-learn, automatically adjusts the weights of the individual classes inversely proportional to the
class frequencies in the training data6 [17]. The pretrained embeddings of BERT7, Longformer8,
and RoBERTa9 were used from the HuggingFace library.</p>
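      <p>The balanced weighting rule mentioned above can be reproduced by hand: the following
sketch applies the inverse class-frequency formula, as implemented in scikit-learn, to the
test-corpus class counts quoted earlier (103 pathological gamblers versus 2071 control users).</p>
      <preformat>
```python
# Sketch of the "balanced" class-weight rule: weight(c) = n / (k * count(c)),
# i.e. inversely proportional to class frequency, mirroring scikit-learn's
# compute_class_weight implementation.
from collections import Counter

def balanced_weights(y):
    counts = Counter(y)
    n, k = len(y), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}

# eRisk 2023 task 2 test corpus: 103 gamblers vs 2071 control users [11]
y = ["gambling"] * 103 + ["control"] * 2071
w = balanced_weights(y)
print(round(w["gambling"], 2), round(w["control"], 2))  # 10.55 0.52
```
      </preformat>
      <p>Each pathological gambling instance is thus weighted roughly twenty times more heavily than
a control instance, which compensates for the heavy class imbalance during training.</p>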
      <p>
        The performance of the proposed frameworks was evaluated in terms of precision, recall,
and f-measure using the training corpus [13]. The performance of the best five frameworks
using the test corpus was evaluated by the organizers in terms of precision, recall, f-measure,
and ERDE5 [26], ERDE50 [26], latencyTP [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], speed [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and latency-weighted F1 score [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
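      <p>For completeness, a hedged sketch of the ERDE metric follows. In the common formulation of
[26], a false negative costs c_fn, a false positive costs c_fp, and a true positive pays a sigmoid
latency cost that grows with the number of posts k read before the alert; the cost values used
below are illustrative defaults, not the official challenge settings.</p>
      <preformat>
```python
import math

def erde(decisions, truths, delays, o, c_fp=0.05, c_tp=1.0, c_fn=1.0):
    """decisions/truths: 1 = positive (at risk), 0 = negative;
    delays[i]: number of posts seen before user i's positive decision;
    o: the ERDE deadline parameter (e.g. 5 or 50)."""
    costs = []
    for d, t, k in zip(decisions, truths, delays):
        if d == 1 and t == 1:    # true positive: sigmoid latency penalty
            costs.append(c_tp * (1.0 - 1.0 / (1.0 + math.exp(k - o))))
        elif d == 1 and t == 0:  # false positive
            costs.append(c_fp)
        elif d == 0 and t == 1:  # false negative
            costs.append(c_fn)
        else:                    # true negative
            costs.append(0.0)
    return sum(costs) / len(costs)

# An immediate correct alert costs almost nothing, while a very late one
# approaches the cost of missing the at-risk user entirely.
print(erde([1], [1], [1], o=50))
print(erde([1], [1], [500], o=50))
```
      </preformat>
      <p>This latency-sensitive behaviour is what distinguishes ERDE5 and ERDE50 from plain
precision, recall, and f-measure in the organizers' evaluation.</p>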
    </sec>
    <sec id="sec-4">
      <title>4. Results and Discussion</title>
      <p>The performances of the classifiers using different feature engineering schemes are presented in
Table 2 in terms of precision, recall, and F1 scores. These results provide valuable insights into
the effectiveness of the different frameworks. Subsequently, the top frameworks were determined
based on the f-measure scores in Table 2, implemented on the test corpus, and the corresponding
results are reported in Table 3 and Table 4 as published by the organizers [11]. Analysis of Table 2
reveals that SVM using both the entropy-based and TF-IDF based term weighting schemes
outperforms the other frameworks in terms of precision, recall, and f-measure. Table 2 also shows
that Ada-Boost using both the entropy-based and TF-IDF based term weighting schemes performs
reasonably well compared to all other classifiers except SVM on the training corpus. Hence these
four frameworks were executed on the test corpus.
5http://scikit-learn.org/stable/supervised_learning.html
6https://scikit-learn.org/stable/modules/generated/sklearn.utils.class_weight.compute_class_weight.html
7https://huggingface.co/bert-base-uncased
8https://huggingface.co/allenai/longformer-base-4096
9https://huggingface.co/roberta-base</p>
      <p>Table 2: Precision, recall, and F1 scores on the given training data of the AdaBoost, Logistic
Regression, Support Vector Machine, Random Forest, and Decision Tree classifiers using the bag
of words model with the TF-IDF based term weighting scheme; the AdaBoost, Logistic Regression,
Support Vector Machine, K-Nearest Neighbor (KNN), and Decision Tree classifiers using the bag
of words model with the entropy based term weighting scheme; and the BERT, RoBERTa, and
Longformer models using transformer based features.</p>
      <p>Note that none of the transformer based models, i.e., BERT, RoBERTa, and Longformer, could
identify even a single pathological gambler in the training corpus; however, we still implemented
the Longformer model on the test corpus as it can generally identify long-term dependencies in
text. In fact, its performance improved on the test corpus, as can be seen from Table 3. Note that
the hyper-parameters of only the final layer of the BERT, RoBERTa, and Longformer models were
fine-tuned using the given training corpus due to time constraints. As the pathological gambling
cases are very few in comparison to the control group, the transformer based models, which are
pre-trained on books and Wikipedia, could not identify the semantic interpretation of the
conversations of the pathological gamblers. This may be the reason that the transformer based
models could not identify even a single case of pathological gambling from the validation data,
and hence their precision, recall, and F1-score are 0 in Table 2.</p>
      <p>Table 3 and Table 4 [27] respectively show the decision-based and ranking-based results of our
five runs (BioNLP-IISERB 0 to 4) on the test corpus, as released by the organizers [11]. The
ranking-based results of our five runs are poor as we could not submit results for many stages due
to a technical constraint. The decision-based results are presented in Table 3 in terms of precision,
recall, f-measure, ERDE5, ERDE50, latencyTP, latency-weighted F1, and speed. It can be seen from
Table 3 that SVM using the entropy-based term weighting scheme for the bag of words model
performs best among all five runs; however, this model could not achieve a place among the top
five runs of eRisk 2023 in terms of all evaluation measures. The Ada-Boost classifier achieves the
best precision, latencyTP, and speed scores among all the runs in the shared task, but this method
could not perform well in terms of recall, f-measure, and ERDE. The Longformer model also
achieves the best precision score among all the runs in the shared task, but it could not perform
well in terms of the other evaluation measures. It can be observed from Table 3 that the precision
scores of all five runs are sound, whereas the recall scores are not reasonable. This indicates that
our frameworks have produced many false negatives, that is, many pathological gambling cases
are wrongly identified as the control group, which is a limitation of the proposed frameworks.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>The objective of task 2 of the eRisk 2023 challenge is to create text-mining tools for the early
detection of signs of pathological gambling on social media. In order to achieve this goal,
various text mining frameworks were constructed using different types of text feature
engineering schemes. Based on the experimental results, the Ada-Boost classifier using the
bag of words model following the conventional TF-IDF weighting scheme performed better than all
other runs in the shared task in terms of precision. It is worth noting that the pre-trained BERT,
RoBERTa, and Longformer models were further trained using the given training corpus, which
is rather small for properly fine-tuning the necessary parameters. Hence the performance of
the transformer-based models is not as good as that of the classical bag of words model. In
the future, we plan to develop transformer-based embeddings from scratch by collecting huge
amounts of conversations over social media, which can then be tuned for a downstream
task like the detection of pathological gambling. Moreover, none of the proposed frameworks consider the
timestamps of the individual posts of individual users, and hence they could not capture the temporal
information of the individual conversations, which may be another reason for the poor performance
of most of these models. We aim to incorporate this temporal information as an input to
train a classification model as a future plan.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgements</title>
      <p>Abu Talha and Tanmay Basu acknowledge the support of the seed funding (PPW/R&amp;D/2010006)
provided by the Indian Institute of Science Education and Research Bhopal, India.</p>
    </sec>
    <sec id="sec-7">
      <title>References</title>
      <p>[11] J. Parapar, P. Martín-Rodilla, D. E. Losada, F. Crestani, Overview of eRisk 2023: Early
risk prediction on the internet, in: Proceedings of Experimental IR Meets Multilinguality,
Multimodality, and Interaction: 14th International Conference of the CLEF Association,
Thessaloniki, Greece, Springer, 2023.
[12] G. Salton, M. J. McGill, Introduction to Modern Information Retrieval, McGraw Hill, 1983.
[13] T. Basu, S. Goldsworthy, G. V. Gkoutos, A sentence classification framework to identify
geometric errors in radiation therapy from relevant literature, Information 12 (2021) 139.
[14] A. Selamat, S. Omatu, Web page feature selection and classification using neural networks,
Information Sciences 158 (2004) 69–88.
[15] T. Sabbah, A. Selamat, M. H. Selamat, F. S. Al-Anzi, E. H. Viedma, O. Krejcar, H. Fujita,
Modified frequency-based term weighting schemes for text classification, Applied Soft
Computing 58 (2017) 193–206.
[16] T. Basu, G. V. Gkoutos, Exploring the performance of baseline text mining frameworks
for early prediction of self harm over social media, in: Proceedings of the International
Conference of the CLEF Association, 2021, pp. 928–937.
[17] H. Srivastava, N. S. Lijin, S. Sruthi, T. Basu, NLP-IISERB@eRisk2022: Exploring the
potential of bag of words, document embeddings and transformer based frameworks for early
prediction of eating disorder, depression and pathological gambling over social media,
in: Proceedings of Experimental IR Meets Multilinguality, Multimodality, and Interaction:
13th International Conference of the CLEF Association, Bologna, Italy, 2022.
[18] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional
transformers for language understanding, arXiv preprint arXiv:1810.04805 (2018).
[19] I. Beltagy, M. E. Peters, A. Cohan, Longformer: The long-document transformer, arXiv
preprint arXiv:2004.05150 (2020).
[20] Y. Liu, et al., RoBERTa: A robustly optimized BERT pretraining approach, arXiv preprint
arXiv:1907.11692 (2019).
[21] Y. Freund, R. Schapire, N. Abe, A short introduction to boosting, Journal of the Japanese
Society for Artificial Intelligence 14 (1999) 1612.
[22] S. Tong, D. Koller, Support vector machine active learning with applications to text
classification, Journal of Machine Learning Research 2 (2001) 45–66.
[23] C. D. Manning, P. Raghavan, H. Schutze, Introduction to Information Retrieval, Cambridge
University Press, New York, 2008.
[24] T. Basu, S. Kumar, A. Kalyan, P. Jayaswal, P. Goyal, S. Pettifer, S. R. Jonnalagadda, A
novel framework to expedite systematic reviews by automatically building information
extraction training corpora, arXiv preprint arXiv:1606.06424 (2016).
[25] T. Basu, C. Murthy, A supervised term selection technique for effective text categorization,
International Journal of Machine Learning and Cybernetics 7 (2016) 877–892.
[26] D. E. Losada, F. Crestani, A test collection for research on depression and language
use, in: International Conference of the Cross-Language Evaluation Forum for European
Languages, CLEF, 2016, pp. 28–39.
[27] J. Parapar, P. Martín-Rodilla, D. E. Losada, F. Crestani, eRisk 2023: Depression, pathological
gambling, and eating disorder challenges, in: European Conference on Information
Retrieval, Springer, 2023, pp. 585–592.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M. N.</given-names>
            <surname>Potenza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. M.</given-names>
            <surname>Balodis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Derevensky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. E.</given-names>
            <surname>Grant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. M.</given-names>
            <surname>Petry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Verdejo-Garcia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. W.</given-names>
            <surname>Yip</surname>
          </string-name>
          , Gambling disorder,
          <source>Nature reviews Disease primers 5</source>
          (
          <year>2019</year>
          )
          <fpage>51</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M. N.</given-names>
            <surname>Potenza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. R.</given-names>
            <surname>Kosten</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. J.</given-names>
            <surname>Rounsaville</surname>
          </string-name>
          , Pathological gambling,
          <source>Jama</source>
          <volume>286</volume>
          (
          <year>2001</year>
          )
          <fpage>141</fpage>
          -
          <lpage>144</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>C. J.</given-names>
            <surname>Rash</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Weinstock</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. Van</given-names>
            <surname>Patten</surname>
          </string-name>
          ,
          <article-title>A review of gambling disorder and substance use disorders, Substance abuse and rehabilitation (</article-title>
          <year>2016</year>
          )
          <fpage>3</fpage>
          -
          <lpage>13</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>M. De Choudhury</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Gamon</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Counts</surname>
          </string-name>
          , E. Horvitz,
          <article-title>Predicting depression via social media</article-title>
          .,
          <source>ICWSM</source>
          <volume>13</volume>
          (
          <year>2013</year>
          )
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>M. De Choudhury</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Counts</surname>
          </string-name>
          , E. Horvitz,
          <article-title>Social media as a measurement tool of depression in populations</article-title>
          ,
          <source>in: Proceedings of the 5th Annual ACM Web Science Conference</source>
          , ACM,
          <year>2013</year>
          , pp.
          <fpage>47</fpage>
          -
          <lpage>56</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>L. G.</surname>
          </string-name>
          et al.,
          <article-title>Machine learning and natural language processing in mental health: systematic review</article-title>
          ,
          <source>Journal of Medical Internet Research</source>
          ,
          <volume>23</volume>
          (
          <year>2021</year>
          )
          <article-title>e15708</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Losada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Crestani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Parapar</surname>
          </string-name>
          , Overview of erisk 2019:
          <article-title>Early risk prediction on the internet</article-title>
          ,
          <source>in: Proceedings of the International Conference of the Cross-Language Evaluation Forum for European Languages</source>
          , Springer,
          <year>2019</year>
          , pp.
          <fpage>340</fpage>
          -
          <lpage>357</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Losada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Crestani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Parapar</surname>
          </string-name>
          , Overview of erisk 2020:
          <article-title>Early risk prediction on the internet</article-title>
          ,
          <source>in: Proceedings of the International Conference of the Cross-Language Evaluation Forum for European Languages</source>
          , Springer,
          <year>2020</year>
          , pp.
          <fpage>272</fpage>
          -
          <lpage>287</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Losada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Martin-Rodilla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Crestani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Parapar</surname>
          </string-name>
          , Overview of erisk 2021:
          <article-title>Early risk prediction on the internet</article-title>
          ,
          <source>in: Proceedings of the International Conference of the Cross-Language Evaluation Forum for European Languages</source>
          , Springer,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J.</given-names>
            <surname>Parapar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Martín-Rodilla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Losada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Crestani</surname>
          </string-name>
          , Overview of erisk 2022:
          <article-title>Early risk prediction on the internet</article-title>
          ,
          <source>in: Proceedings of Experimental IR Meets Multilinguality, Multimodality, and Interaction: 13th International Conference of the CLEF Association</source>
          , Bologna, Italy, September 5-
          <issue>8</issue>
          ,
          <year>2022</year>
          , Springer,
          <year>2022</year>
          , pp.
          <fpage>233</fpage>
          -
          <lpage>256</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>