<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>LudoTrack: Web Mining, Search Technologies and Natural Language Processing for the Early Detection of Pathological Gambling</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>David E. Losada</string-name>
          <email>david.losada@usc.gal</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nelly Condori-Fernández</string-name>
          <email>nelly.condori@usc.gal</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marcos Fernández-Pichel</string-name>
          <email>marcos.fernandez.pichel@usc.gal</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Centro Singular de Investigación en Tecnoloxías Intelixentes (CiTIUS), Universidade de Santiago de Compostela</institution>
          ,
          <addr-line>Santiago de Compostela, 15782</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This document summarises the project entitled “Web Mining, Search Technologies and Natural Language Processing for the Early Detection of Pathological Gambling”. This is a research project funded by “Dirección General de Ordenación del Juego - Ministerio de Consumo” (Government of Spain). The project started in early 2024 and will run until the end of 2024. The funding was awarded under a national call of the Ministerio de Consumo (“Convocatoria de subvenciones, durante el ejercicio 2023, para el desarrollo de actividades de investigación relacionadas con la prevención de los trastornos del juego, con los efectos derivados de dichos trastornos o con los riesgos asociados a esta actividad”).</p>
      </abstract>
      <kwd-group>
        <kwd>Pathological Gambling</kwd>
        <kwd>Web and Text Mining</kwd>
        <kwd>Search Technologies</kwd>
        <kwd>Natural Language Processing</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Gambling disorders were incorporated by the World Health Organisation (WHO) into the International
Classification of Diseases ICD-11 (published in 2018, [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]), responding to the growing international
concern in this area. Already in 2013, Internet gaming disorder had also been included in the Diagnostic
and Statistical Manual of Mental Disorders (DSM-5) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] as a condition requiring specific study.
      </p>
      <p>
        Patterns associated with gaming can lead to dysfunction and psychological distress for some players
and, in various countries, this problem has generated significant public health concerns [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Despite
the severity of these disorders, in many cases, individuals do not receive treatment or receive it late.
There are well-documented limitations of existing preventive tools and a need for new instruments that
distinguish across the spectrum of gaming behaviours, such as "regular and healthy gaming behaviours",
"hazardous gaming," or "gaming disorder" [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The non-identification or late identification of signs
of gaming disorders leads to serious social, health, and economic costs. The impact on the adolescent
population is particularly worrying.
      </p>
      <p>
        Language is a powerful indicator of personality traits and emotions, and it provides valuable clues about
mental health and disorders [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. We can find distinctive psychological patterns in people not only
by analysing the topics they talk about but also by studying the way they use language connectives
such as prepositions or pronouns. Our research project aims to develop the necessary computational
technologies and models to perform large-scale natural language analysis. It involves designing and
implementing new monitoring and analytics tools that, using publicly available information on the
web, can mine content to extract traces and evidence related to gaming disorders. More specifically, our
main goal is to study the way people use language to reveal early signs of gaming-related disorders.
To this end, the most advanced language analysis models would be used, such as deep neural network
architectures based on "transformers" and recent large language models like BERT, ChatGPT, or GPT-4
[
        <xref ref-type="bibr" rid="ref6 ref7 ref8 ref9">6, 7, 8, 9</xref>
        ].
      </p>
      <p>This project does not aim to develop automatic diagnostic technology. In fact, we believe that
diagnostic tasks performed by medical professionals cannot be carried out by completely automated
means. Here, we pursue the more realistic goal of, for example, designing methods that detect the
emergence of initial signs of gaming disorders and understand the evolution of a person from the
initial stages (e.g., mood changes, lack of sleep) to severe stages (e.g., pressing financial problems
or suicidal thoughts). This information would be valuable, for example, for public institutions that
could receive alerts about growing risks in specific population segments (e.g., to prompt preventive
measures). These new monitoring tools could extract and present evidence of the emergence and
temporal development of gaming-related disorders that could be exploited by clinical professionals.
This would enrich their current sources of evidence (usually focused on direct interaction with patients,
surveys, and conventional clinical instruments).</p>
      <p>The large volume of interactions and publications available on the Internet and social networks allows
for massive analysis of psychological traits related to various disorders. It is common for individuals
suffering from psychological disorders, such as those related to gaming, to interact with other individuals,
express their concerns, share their experiences, and receive online help from specialised professionals.
However, the analysis of online users presents challenges in several areas: filtering and searching for
information (to find relevant excerpts from users that are pertinent for analysing a given psychological
problem), linguistic text analysis and psycholinguistics, estimation of content quality and reputation
(for example, with the purpose of recommending reputable information for people suffering from a
certain disorder), and massive data processing (distributed computing methods that are scalable and
operate in real-time).</p>
    </sec>
    <sec id="sec-2">
      <title>2. Objectives of the Project</title>
      <p>
        The main scientific hypothesis is that natural language reveals signs of different psychological disorders,
particularly addictive disorders related to gambling, and that we can develop early detection technology
based on the analysis of texts published by individuals. The strong relationship between the use of
natural language and different psychological conditions has been demonstrated in the past [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], and
furthermore, there is ample evidence that social and web media can provide significant data on
various disorders [
        <xref ref-type="bibr" rid="ref10 ref11">10, 11</xref>
        ]. Our main goal is to carry out this type of analysis on a large scale and
incorporate extraction and search methods that are effective for identifying addictive disorders related
to gambling. This represents a significant advancement because, in general, research in this area has
been limited to small-scale studies (for example, essays written by a small number of already diagnosed
patients) and has consistently ignored the temporal component, which is essential for analysing the
evolution of disorders and performing early detection.
      </p>
      <p>
        Obtaining data for a project like this is feasible, and we already have experience in extracting and
analysing information online. There are public and freely available contents (open forums, social
networks, etc.) where people openly discuss their problems related to their addiction and tell others that
they have been diagnosed with a gambling addictive disorder or that they are beginning to develop it.
This includes conventional social networks, with open public groups [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], platforms specialised in
gambling addiction (ludopatia.org [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], vidasinjuego [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] and other sources [
        <xref ref-type="bibr" rid="ref15 ref16">15, 16</xref>
        ]), and personal support
platforms [
        <xref ref-type="bibr" rid="ref17 ref18">17, 18</xref>
        ]. There are also online recovery programs, surveys, and other analytical instruments
that can be exploited for this project. This provides valuable resources to understand the problems these
people suffer, categorise them, automatically extract topics (concerns, psychological impact, personality
effects, emotions, among others) and obtain reputable contents and recommendations related to these
disorders (for example, support programs, opinion surveys, or useful questions/answers). In addition,
the established clinical criteria from Diagnostic Manuals for other disorders (for example, for depression:
mood changes, loss of interest, etc.) can also be useful for studying the evolution of language use
and concerns expressed over time (and how they reflect symptoms of anxiety, depression, etc.). In
this context, we will develop predictive technology demonstrators aimed at relevant stakeholders (for
example, health professionals, psychologists, and the Ministry of Consumer Affairs, funder of this
project).
      </p>
      <p>
        The automatic analysis of texts and web pages will focus on public data made freely available online
by users or Internet platforms. The developed algorithms will be evaluated with standardised and
curated collections. We will not work with personal data, and therefore, the usual guidelines on
privacy are not applicable to our project. In any case, we will use appropriate anonymisation strategies
to remove proper names in the texts, user account identifiers, and other elements that could reveal
any information about the subjects. Moreover, the design of the experiments and, in particular, the
construction of the test collections, will carefully follow the recent recommendations on ethical aspects
in the design of natural language analysis experiments [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. It will be necessary to avoid demographic
biases, misrepresentation, or exclusion of certain population groups in the training collections (these
biases threaten the universality and objectivity of the extracted knowledge); it is also necessary to avoid
over-generalisation and overexposure or underexposure (as much as possible avoiding, for example,
that the constructed resources are oriented to a single language) and identify possible fraudulent or
unethical uses of these technologies.
      </p>
      <p>
        All activities carried out within the framework of the project will take special care to ensure that
the models and solutions do not incorporate any type of gender bias (or other types of biases). The
creation of datasets and algorithmic design will follow existing guidelines and recommendations [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]
aimed at working with online data and avoiding biases and methodological deficiencies. We will also
comply with rigorous ethical practices. We will document in detail the process by which datasets and
models are created, and we will critically examine this process. We will extend studies on online data
to different platforms, themes, moments, and subpopulations, to determine how results vary across,
for example, different cultural, demographic, and behavioural contexts. We will enable transparency
mechanisms that allow auditing the developed software and evaluating the biases of the data at the
source. It is also worth noting that the activities of this project involve no contact or
interaction with people suffering from disorders or posting content on the Internet. Therefore, the
project is exempt from IRB approval.
      </p>
      <p>The specific objectives of this project are:
• O1. Develop new methods and resources that generate useful evidence for the monitoring and
prevention of compulsive gambling disorder problems.</p>
      <p>
        This project addresses an innovative area where there are few test collections and polished open
data. Additionally, new metrics for evaluation and early detection measures are needed. It will be
necessary to create new test collections, oriented to different use cases related to disorders derived
from gambling, for example by retrieving, processing, and extracting data on the Internet related
to the different phases of the problem, ranging from regular and not particularly harmful gambling
behaviour to dangerous gambling phases or gambling disorder. To that end, we need to compile
on-topic evidence about different issues or related themes (mental health, emotional impact,
psychological effects, financial difficulties, academic, work or social problems, legal issues, etc.).
The team that leads this project has extensive experience in creating new datasets and reference
collections [
        <xref ref-type="bibr" rid="ref21">21, 22, 23, 24, 25, 26, 27</xref>
        ]. On the other hand, we will work on defining appropriate
evaluation metrics to determine the quality of early detection systems. Here, it is necessary
to take into account multiple dimensions, such as the relevance of the extracted information
(regarding the area of gambling disorders), the computational efficiency of the extraction methods,
scalability, and the validity of these collections to promote the development of intelligent early
detection solutions (where the temporal dimension is fundamental).
• O2. Define effective methods of text search and filtering and apply them for the identification
of high-quality textual sources relevant to the different information needs related to gambling
disorders. Define models for analysing themes related to this risk and their temporal evolution.
It will be necessary to manage large volumes of data and filter out irrelevant information (contents
not related to the type of risk to be monitored and studied). We will work here on efficient and
effective methods of searching and filtering relevant information: automatic query generation (on
compulsive gambling themes), query expansion, sentence/passage retrieval, relevance feedback
and identification of user profiles related to a certain type of risk. Different domain resources (for
example, specialised vocabularies and medical terminologies, such as those recently incorporated
in the ICD-11 related to gambling) will facilitate the extraction of key terms or expressions for the
generation/identification of key passages. We will also address data fusion and topic extraction
and analysis. We have extensive experience in these areas: identification of queries on health or
nutrition [28], query generation and expansion [29], sentence retrieval [30], and ranking fusion
[31].
• O3. Develop linguistic resources, train language models and related technologies focused on
managing multiple profiles of interest related to gambling disorders and their addictions.
Implement advanced natural language processing and linguistic analysis for monitoring content
related to these risks.
      </p>
      <p>We will work on the automatic elaboration of lexicographic resources adapted to the domain of
searching for signs of risk of pathological gambling. We will consider some existing multilingual
analytical extraction tools, such as LinguaKit [32] or the well-known Stanford NLP Toolkit, whose
linguistic modules can be improved and adapted to this project. Recent large language
models developed by OpenAI (ChatGPT, GPT-4) and Google (Bard), among others, will also
be used to take advantage of their advanced language capabilities which, connected with the
appropriate information for this project, can automatically perform tasks such as automatic
cataloguing or summary generation.
• O4. Develop flexible and efficient solutions for the massive processing of data from multiple
online sources, including social media, and implement real-time analysis of online content.
We have experience in designing and implementing Big Data solutions for early risk detection
[33]. We have also developed publicly available tools for real-time processing of social media
data [34]. However, there are a number of challenges when designing Big Data solutions in the
context of this project. For example, we need to process information in real-time to extract web
content and analyse user-generated publications related to gambling.
• O5. Define methods for analysing results, generating conclusions, and exploiting expert
knowledge (for example, from psychologists, medical specialists, or communication professionals).
Determine the ways in which expert knowledge can guide the identification of reputable content
and how to adapt communication measures or public preventive strategies to the psychological
or risk profile of users.</p>
      <p>The validation of the solutions developed under this project must be carried out by experts in
relevant sectors, and the team of the project includes two professionals specialised in Psychology
and Communication, respectively. Their participation will allow us to inject expert knowledge
that can guide the identification of relevant elements (problem phases, psychological impact,
prominent themes, etc.), and to validate the results and determine effective exploitation
strategies. Likewise, it will also be necessary to determine, when appropriate, timely communication
strategies to, for example, configure recommendations for preemptive programs, and suggest
reputable and high quality support content for people at risk. In this sense, it will be necessary to
advance in understanding how different users react to different preventive or risk communication
campaigns and, also, help them to identify toxic, false information or harmful
recommendations (for example, commercial sites that incite them to consume and play online). To propose
recommendations related to various disorders, it is necessary to study social, personality, and
psychological dimensions [35].</p>
    </sec>
    <sec id="sec-3">
      <title>3. Expected Results and Exploitation</title>
      <p>The project has the potential to produce high-quality results at regional, national, and international
levels. Our previous activity in early risk detection and monitoring signs of depression, anorexia,
and other concerning disorders has been very well received internationally and further exploited by
psychologists in our environment. For example, some of our Artificial Intelligence methods have led,
in collaboration with psychologists from the University of A Coruña and experts from the University
of Notre Dame in the USA, to suggestions for improving current monitoring tools for adolescents
at risk due to problematic family situations (see [36], a collaborative work between USC, UDC, and
the University of Notre Dame aimed at applying Machine Learning to predict risk in adolescents
with problematic family situations). Therefore, we have experience in exploiting results with clinical
professionals and are well placed to do so with the results of this new project. On the other
hand, the data and resources we have generated in the past (test collections, exploratory experimental
challenges, etc.) have had a high global impact and represent a valuable resource to boost research in
these areas.</p>
      <p>Regarding this project, the new collections, resources, and experimental methods have the potential
to produce a global impact and will be distributed publicly and openly so that many other teams
can advance in this field. We expect to build large-scale reusable test collections that will become
an international reference for studying the interactions between different problems and aspects of
addictive gambling disorder and the use of natural language. We also hope to lay the experimental
foundations (new performance metrics, new computationally efficient ways to create resources) for the
early prediction of addictive gambling risks. These innovative evaluation methods have the potential to
be useful not only for this project but also as a way to assess early risk prediction in other domains (for
example, identifying sexual predators, cyberbullying, terrorism, etc.).</p>
      <p>The new search methods, language technologies, and communication and recommendation models
tailored to the case of addictive gambling disorders are also highly innovative developments and can
lead to pioneering methods. For example, we hope to propose creative ways to automatically search for
signs of disorders and their underlying problems (personal health, emotional and psychological impact,
financial, work, academic, social, or legal difficulties, etc.). We will investigate new search methods,
based on the automatic construction of queries from expert knowledge, and these information filtering
and selection strategies can have an impact beyond this project (for example, to support health-related
searches in clinical repositories). The recommendation of content associated with gambling disorders
and the related communication strategies will be another highly innovative area of our activity whose
results have the potential to contribute beyond the scope of the project. Likewise, the large-scale
processing technology aims to serve not only this project but also the international community (for
example, in projects and activities requiring real-time processing or analysis on social networks).
Moreover, our results have great potential to be published in high-impact international venues and
disseminated to society.</p>
      <p>In terms of social impact, the participation of experts in Psychology, as a fundamental part of the
project, ensures great potential for the exploitation of the results. In this sense, we will propose use
cases (individualised analysis of subjects, study of disorders in communities or population groups, etc.)
that can help psychologists, educators, and teachers in their daily activity. Indeed, the project can
generate new valuable knowledge (for example, providing data on the evolution of different dimensions
associated with disorders: disaffection, sleep problems, weight loss, etc.). This result is useful in itself
for different professionals interested in addictive gambling disorders.</p>
      <p>The project is also promising for sparking new prevention campaigns and communication strategies
about gambling addictions. The team of the project incorporates an expert professor in Communication.
We expect to obtain substantial evidence about the disorders, their appearance and evolution, and, thus,
instigate recommendations to institutions (Ministry of Consumer Affairs and Ministry of Health, in
addition to other related agencies and councils at the regional level) on public communication and
preemptive policies. This is crucial, as numerous studies warn about the growth of gambling problems
in our society. For example, according to 2021 data from the Spanish General Directorate for Gambling
Regulation (DGOJ), Spain had an estimated 1,400,000 online gamblers, and these
figures have been growing over the last decade. Although many of these active players do not have a
pathological disorder, it is essential to analyse this population and identify the potential emergence of
problems. We will work on communicating the project’s results to relevant public institutions and, in
fact, we also have experience in this area.</p>
      <p>We will also consider, when appropriate, the use of certain project results for signing exploitation
agreements with third parties. This project has the potential to transfer results, and we included in
the task plan the academic formalisation of problems of interest (for example, extraction, analysis, or
search tools) for potential stakeholders. The team has experience in executing R&amp;D contracts with
companies. The planned approach for this project consists of, where appropriate, registering the
intellectual property of the software so that universities can exploit it through contracts or exploitation
agreements. This is compatible with sharing other project results with the scientific community (for
example, datasets, linguistic resources, and algorithms to favour reproducibility). This is something we
have been doing regularly.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Relevance of the Project to RCIS</title>
      <p>Our project is aligned with and relevant to the following key topics of the RCIS conference:
• Information Search and Discovery: By mining publicly available web content, the project aims
to discover and extract traces and evidence related to gambling disorders, contributing to the
improvement of information search and discovery techniques in this domain.
• Big Data &amp; Business Analytics: Through the application of data science techniques and analytics,
the project addresses the challenge of processing vast amounts of web data to identify linguistic
patterns indicative of gambling disorders.
• Digital Transformation: The project represents a digital transformation initiative aimed at
leveraging advanced technologies to address societal issues such as gambling disorders through
innovative approaches in web mining and natural language processing.
• Social Computing: Understanding the way people use language and examining the evolution
of language usage patterns in relation to gambling disorders aligns with the principles of social
computing, offering insights into multiple user profiles of interest and their interactions in online
environments.
• Health Informatics and E-Health: Given the focus on early detection of pathological gambling,
the project intersects with the e-health domain, where technological advancements play a crucial
role in preventing psychological disorders related to gambling in individuals.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>In this paper we have presented the project entitled “Web Mining, Search Technologies and Natural
Language Processing for the Early Detection of Pathological Gambling”, funded by Ministerio de
Consumo, Subdirección General de Regulación del Juego (Government of Spain).</p>
      <p>This project focuses on technologies and computational models that perform large-scale natural
language analysis. The aim is to design and implement new monitoring and analytical tools that, from
publicly available information on the web, mine contents and extract traces and evidence related to
gambling disorders. More specifically, the main goal is to study the way people use language (and the
evolution of the use of language) to reveal early signs of gambling disorders.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This research was funded by Ministerio de Consumo, Subdirección General de Regulación del Juego, from
the Government of Spain, under grant number SUBV23/00002 (Project entitled “Detección Temprana de
Riesgos de Aparición de Trastornos Adictivos de Juego mediante Minería Web con Modelos Avanzados
de Búsqueda y Procesamiento de Lenguaje Natural”). Website of the project.</p>
      <p>The authors also thank the financial support supplied by the Consellería de Cultura, Educación,
Formación Profesional e Universidades (accreditation 2019-2022 ED431G-2019/04, ED431C 2022/19)
and the European Regional Development Fund, which acknowledges the CiTIUS-Research Center
in Intelligent Technologies of the University of Santiago de Compostela as a Research Center of the
Galician University System.</p>
      <p>International conference of the cross-language evaluation forum for European languages, Springer,
2016, pp. 28–39.</p>
      <p>[22] D. E. Losada, F. Crestani, J. Parapar, eRisk 2017: CLEF lab on early risk prediction on the internet:
experimental foundations, in: Experimental IR Meets Multilinguality, Multimodality, and
Interaction: 8th International Conference of the CLEF Association, CLEF 2017, Dublin, Ireland, September
11–14, 2017, Proceedings 8, Springer, 2017, pp. 346–360.</p>
      <p>[23] D. E. Losada, F. Crestani, J. Parapar, Overview of eRisk: early risk prediction on the internet, in:
Experimental IR Meets Multilinguality, Multimodality, and Interaction: 9th International
Conference of the CLEF Association, CLEF 2018, Avignon, France, September 10–14, 2018, Proceedings 9,
Springer, 2018, pp. 343–361.</p>
      <p>[24] D. E. Losada, F. Crestani, J. Parapar, Overview of eRisk 2019 early risk prediction on the internet,
in: Experimental IR Meets Multilinguality, Multimodality, and Interaction: 10th International
Conference of the CLEF Association, CLEF 2019, Lugano, Switzerland, September 9–12, 2019,
Proceedings 10, Springer, 2019, pp. 340–357.</p>
      <p>[25] D. E. Losada, F. Crestani, J. Parapar, Overview of eRisk at CLEF 2020: Early risk prediction on the
internet (extended overview), CLEF (Working Notes) (2020).</p>
      <p>[26] J. Parapar, P. Martín-Rodilla, D. E. Losada, F. Crestani, Overview of eRisk at CLEF 2021: Early risk
prediction on the internet (extended overview), CLEF (Working Notes) (2021) 864–887.</p>
      <p>[27] P. Martín-Rodilla, D. E. Losada, F. Crestani, Overview of eRisk 2022: Early risk prediction on
the internet, in: Experimental IR Meets Multilinguality, Multimodality, and Interaction: 13th
International Conference of the CLEF Association, CLEF 2022, Bologna, Italy, September 5–8, 2022,
Proceedings, volume 13390, Springer Nature, 2022, p. 233.</p>
      <p>[28] D. E. Losada, M. Herrmann, D. Elsweiler, Cost-effective identification of on-topic search queries
using multi-armed bandits, in: Proceedings of the 36th Annual ACM Symposium on Applied
Computing, 2021, pp. 645–654.</p>
      <p>[29] J. M. Chenlo, J. Parapar, D. E. Losada, Comments-oriented query expansion for opinion retrieval
in blogs, in: Advances in Artificial Intelligence: 15th Conference of the Spanish Association
for Artificial Intelligence, CAEPIA 2013, Madrid, Spain, September 17–20, 2013, Proceedings 15,
Springer, 2013, pp. 32–41.</p>
      <p>[30] L. Azzopardi, R. T. Fernández, D. E. Losada, Improving sentence retrieval with an importance prior,
in: Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in
Information Retrieval, 2010, pp. 779–780.</p>
      <p>[31] D. E. Losada, J. Parapar, A. Barreiro, A rank fusion approach based on score distributions for
prioritizing relevance assessments in information retrieval evaluation, Information Fusion 39
(2018) 56–71.</p>
      <p>[32] P. Gamallo, M. Garcia, Linguakit: uma ferramenta multilingue para a análise linguística e a
extração de informação, Linguamática 9 (2017) 19–28.</p>
      <p>[33] R. Martínez-Castaño, J. C. Pichel, D. E. Losada, A big data platform for real time analysis of signs
of depression in social media, International Journal of Environmental Research and Public Health
17 (2020) 4752.</p>
      <p>[34] R. Martínez-Castaño, J. C. Pichel, D. E. Losada, Building python-based topologies for massive
processing of social media data in real time, in: Proceedings of the 5th Spanish Conference on
Information Retrieval, 2018, pp. 1–8.</p>
      <p>[35] S. E. Baumgartner, T. Hartmann, The role of health anxiety in online health information search,
Cyberpsychology, Behavior, and Social Networking 14 (2011) 613–618.</p>
      <p>[36] S. Lopez-Larrosa, V. Sánchez-Souto, D. E. Losada, J. Parapar, A. Barreiro, A. P. Ha, E. M. Cummings,
Using machine learning techniques to predict adolescents’ involvement in family conflict, Social
Science Computer Review 41 (2023) 1581–1607.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <source>ICD-11: International classification of diseases (11th revision)</source>
          ,
          <year>2022</year>
          . URL: https://icd.who.int/.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>American Psychiatric Association</surname>
          </string-name>
          ,
          <article-title>Diagnostic and statistical manual of mental disorders: DSM-5</article-title>
          , volume
          <volume>5</volume>
          , American Psychiatric Association, Washington, DC,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>G.</given-names>
            <surname>Humphreys</surname>
          </string-name>
          ,
          <article-title>Sharpening the focus on gaming disorder</article-title>
          ,
          <source>Bulletin of the World Health Organization</source>
          <volume>97</volume>
          (
          <year>2019</year>
          )
          <fpage>382</fpage>
          -
          <lpage>383</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Billieux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Flayelle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.-J.</given-names>
            <surname>Rumpf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. J.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <article-title>High involvement versus pathological involvement in video games: A crucial distinction for ensuring the validity and utility of gaming disorder</article-title>
          ,
          <source>Current Addiction Reports</source>
          <volume>6</volume>
          (
          <year>2019</year>
          )
          <fpage>323</fpage>
          -
          <lpage>330</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J. W.</given-names>
            <surname>Pennebaker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. R.</given-names>
            <surname>Mehl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. G.</given-names>
            <surname>Niederhofer</surname>
          </string-name>
          ,
          <article-title>Psychological aspects of natural language use: Our words, our selves</article-title>
          ,
          <source>Annual Review of Psychology</source>
          <volume>54</volume>
          (
          <year>2003</year>
          )
          <fpage>547</fpage>
          -
          <lpage>577</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Nogueira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Yates</surname>
          </string-name>
          ,
          <article-title>Pretrained transformers for text ranking: BERT and beyond</article-title>
          , Springer Nature,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>T.</given-names>
            <surname>Brown</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ryder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Subbiah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Kaplan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dhariwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Neelakantan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Shyam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Sastry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Askell</surname>
          </string-name>
          , et al.,
          <article-title>Language models are few-shot learners</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>33</volume>
          (
          <year>2020</year>
          )
          <fpage>1877</fpage>
          -
          <lpage>1901</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Minaee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Kalchbrenner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Cambria</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Nikzad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chenaghlu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <article-title>Deep learning-based text classification: a comprehensive review</article-title>
          ,
          <source>ACM Computing Surveys (CSUR)</source>
          <volume>54</volume>
          (
          <year>2021</year>
          )
          <fpage>1</fpage>
          -
          <lpage>40</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Bubeck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Chandrasekaran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Eldan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gehrke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Horvitz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Kamar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. T.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lundberg</surname>
          </string-name>
          , et al.,
          <source>Sparks of artificial general intelligence: Early experiments with gpt-4</source>
          , arXiv preprint arXiv:2303.12712 (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>E. A.</given-names>
            <surname>Ríssola</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Losada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Crestani</surname>
          </string-name>
          ,
          <article-title>A survey of computational methods for online mental state assessment on social media</article-title>
          ,
          <source>ACM Transactions on Computing for Healthcare</source>
          <volume>2</volume>
          (
          <year>2021</year>
          )
          <fpage>1</fpage>
          -
          <lpage>31</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>F.</given-names>
            <surname>Crestani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Losada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Parapar</surname>
          </string-name>
          ,
          <article-title>Early Detection of Mental Health Disorders by Social Media Monitoring: The First Five Years of the eRisk Project</article-title>
          , volume
          <volume>1018</volume>
          ,
          Springer Nature
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12] Ludopatía/adicción al juego. Grupo público de Facebook.,
          <year>2011</year>
          . URL: https://www.facebook.com/groups/253782884636115, [accessed February 13,
          <year>2024</year>
          ].
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          Ludopatia.org.
          <article-title>Por la rehabilitación de jugadores patológicos y otras adicciones</article-title>
          . Foro de discusión.,
          <year>2010</year>
          . URL: https://www.ludopatia.org/forum/default.asp,
          [accessed February 13
          ,
          <year>2024</year>
          ].
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <article-title>Vida sin Juego</article-title>
          . Foro de Discusión.,
          <year>2009</year>
          . URL: https://vidasinjuego.forosactivos.net/,
          [accessed February 13
          ,
          <year>2024</year>
          ].
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <article-title>Ludopatía, adicción y problemas con el juego</article-title>
          . Foro de discusión.,
          <year>2008</year>
          . URL: http://foroapuestas.forobet.com/ludopatia-adiccion-y-problemas-con-el-juego/,
          [accessed February 13
          ,
          <year>2024</year>
          ].
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <article-title>Ludopatía</article-title>
          . Foro de discusión.,
          <year>2022</year>
          . URL: https://www.forolinternas.com/viewtopic.php?f=16&amp;t=17505, [accessed February 13,
          <year>2024</year>
          ].
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <article-title>Ludopatía online</article-title>
          . Plataforma de ayuda personal.,
          <year>2024</year>
          . URL: https://ludopatiaonline.com/foro-ludopatia/?amp,
          [accessed February 13
          ,
          <year>2024</year>
          ].
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <article-title>Jugadores anónimos</article-title>
          . Plataforma de apoyo.,
          <year>2024</year>
          . URL: https://www.jugadoresanonimos.org/,
          [accessed February 13
          ,
          <year>2024</year>
          ].
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>D.</given-names>
            <surname>Hovy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. L.</given-names>
            <surname>Spruit</surname>
          </string-name>
          ,
          <article-title>The social impact of natural language processing</article-title>
          ,
          in:
          <source>Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>591</fpage>
          -
          <lpage>598</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>A.</given-names>
            <surname>Olteanu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Castillo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Diaz</surname>
          </string-name>
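          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Kıcıman</surname>
          </string-name>
          ,
          <article-title>Social data: Biases, methodological pitfalls, and ethical boundaries</article-title>
          ,
          <source>Frontiers in Big Data</source>
          <volume>2</volume>
          (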
          <year>2019</year>
          )
          <fpage>13</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Losada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Crestani</surname>
          </string-name>
          ,
          <article-title>A test collection for research on depression and language use</article-title>
          , in:
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>