=Paper= {{Paper |id=Vol-3224/paper05 |storemode=property |title=Big Hug: Artificial intelligence for the protection of digital societies |pdfUrl=https://ceur-ws.org/Vol-3224/paper05.pdf |volume=Vol-3224 |authors=Arturo Montejo-Ráez,María Teresa Martín-Valdivia,Luis Alfonso Ureña López,Manuel Carlos Díaz-Galiano,Miguel Ángel García Cumbreras,Manuel García Vega,Fernando Martínez Santiago,Flor Miriam Plaza del Arco,Salud M. Jiménez Zafra,María Dolores Molina-González,Luis-Joaquín García-López,María Belén Díez Bedmar |dblpUrl=https://dblp.org/rec/conf/sepln/Montejo-RaezMLD22 }} ==Big Hug: Artificial intelligence for the protection of digital societies== https://ceur-ws.org/Vol-3224/paper05.pdf
Big Hug: Artificial intelligence for the protection of digital societies
Big Hug: Inteligencia artificial para la protección de la sociedad digital

Arturo Montejo-Ráez¹, María Teresa Martín-Valdivia¹, L. Alfonso Ureña-López¹, Manuel Carlos Díaz-Galiano¹, Miguel Ángel García-Cumbreras¹, Manuel García-Vega¹, Fernando Martínez-Santiago¹, Flor Miriam Plaza-del-Arco¹, Salud María Jiménez-Zafra¹, María Dolores Molina-González¹, Luis-Joaquín García-López² and María Belén Díez-Bedmar³

¹ Department of Computer Science, Advanced Studies Center in ICT (CEATIC), Universidad de Jaén, Campus Las Lagunillas, 23071, Jaén, Spain
² Department of Psychology, Universidad de Jaén, Campus Las Lagunillas, 23071, Jaén, Spain
³ Department of English Studies, Universidad de Jaén, Campus Las Lagunillas, 23071, Jaén, Spain


Abstract
In this paper, we present the Big Hug project, which aims to protect vulnerable citizens and help them and their families to feel more confident when using social media communication platforms. To this end, it proposes activities for building quality data, research into new algorithms to adapt current solutions to the changing nature of colloquial and informal communication, the evaluation of techniques and methods, and the development of demonstrators. The project takes an interdisciplinary approach to the early detection of young people at high risk of emotional problems. The involvement of colleagues from the Clinical Psychology and Corpus Linguistics fields provides the project with the interdisciplinarity needed to obtain robust results which may be significant to society.

Keywords
Natural Language Processing, NLP, sentiment analysis, Clinical Psychology, early detection.



SEPLN-PD 2022. Annual Conference of the Spanish Association for Natural Language Processing 2022: Projects and Demonstrations, September 21-23, 2022, A Coruña, Spain
Email: amontejo@ujaen.es (A. Montejo-Ráez); maite@ujaen.es (M. T. Martín-Valdivia); laurena@ujaen.es (L. A. Ureña-López); mcdiaz@ujaen.es (M. C. Díaz-Galiano); magc@ujaen.es (M. García-Cumbreras); mgarcia@ujaen.es (M. García-Vega); dofer@ujaen.es (F. Martínez-Santiago); fmplaza@ujaen.es (F. M. Plaza-del-Arco); sjzafra@ujaen.es (S. M. Jiménez-Zafra); mdmolina@ujaen.es (M. D. Molina-González); ljgarcia@ujaen.es (L. García-López); belendb@ujaen.es (M. B. Díez-Bedmar)
ORCID: 0000-0002-8643-2714 (A. Montejo-Ráez); 0000-0002-2874-0401 (M. T. Martín-Valdivia); 0000-0001-9752-2830 (L. A. Ureña-López); 0000-0001-9298-1376 (M. C. Díaz-Galiano); 0000-0003-1867-9587 (M. García-Cumbreras); 0000-0003-2850-4940 (M. García-Vega); 0000-0002-1480-1752 (F. Martínez-Santiago); 0000-0002-3020-5512 (F. M. Plaza-del-Arco); 0000-0003-3274-8825 (S. M. Jiménez-Zafra); 0000-0002-8348-7154 (M. D. Molina-González); 0000-0003-0446-6740 (L. García-López); 0000-0001-9250-2224 (M. B. Díez-Bedmar)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

1. Introduction

Human language is the main transmission medium involved in social interaction. There are revolutionary Natural Language Processing (NLP) algorithms that can provide the means to predict and prevent risky interactions, protecting the most fragile members of our digital societies. Children and adolescents have been identified by the World Health Organization as being at particular risk of psychological distress in these media (see https://www.who.int/news-room/fact-sheets/detail/adolescent-mental-health).

Human Language Technologies (HLT) can help us build environments that inspire more confidence. Thanks to NLP, artificial intelligence solutions are able to model human language and use learned models to extract information and understand the meaning of text flowing through social networks. The combination of deep learning algorithms with linguistic resources and tools enables the construction of monitoring systems for the early detection of signs of disorders and harmful behaviours such as eating disorders, depression, bullying or suicidal tendencies on social media [1, 2].

To this end, the project proposes two years of activities for building quality data, research into new algorithms to adapt current solutions to the changing nature of colloquial and informal communication, the evaluation of techniques and methods, and the development of demonstrators to leverage human-centered solutions that will protect vulnerable citizens and help them and their families to feel more confident when using social media communication platforms. In addition, the project takes an interdisciplinary approach to the early detection of young people at high risk of emotional problems. The scientific community uses the term indicated prevention for interventions aimed at high-risk individuals who show some detectable symptoms of emotional disorders but do not currently meet the criteria for a diagnosis. The collaboration of colleagues from the Clinical Psychology and Corpus Linguistics fields provides the project with the interdisciplinary approach needed to obtain robust results which may be significant to society.

Joint efforts of NLP with Corpus Linguistics and Clinical Psychology are sought in this project with a two-fold purpose: a) to analyse the results obtained from the linguistic point of view, in order to fine-tune and complement the NLP findings; and b) to contrast the results with the scientific literature on these disorders in Clinical Psychology.

2. Participants and project funding

The project brings together three partners from the University of Jaén: the SINAI group from the Advanced Studies Center in ICT (CEATIC), the Department of Psychology and the Department of English Studies. The project has been supported by grant P20_00956 (PAIDI 2020), funded by the Andalusian Regional Government.

3. State of the art

It is estimated that 24 million children and young people in the EU suffer from bullying every year, which means that 7 out of 10 suffer some form of harassment or intimidation, whether verbal, physical or through new communication technologies [3]. Navarro-Gómez [4] stated that social networks allow the viral diffusion of degrading content. Cyberbullying, or electronic aggression, has already been designated as a serious public health threat and has elicited warnings to the general public from the Centers for Disease Control and Prevention (CDC) [5].

In another study [6], approximately 1 out of 10 people were found to develop some sort of eating disorder, which also caused anxiety, self-harm and a high risk of suicide. Many studies have tackled this issue from the perspective of psychometrics, but better tools for modeling the language used would help [7], all the more so as eating disorders are rising all around the world. Emotional disorders, like depression and anxiety, affect a quarter of our population during their lifetime [8]. Depression can be studied and identified by monitoring users' posts and activity [1].

In Spain there are 10 suicides a day: twice as many people die by suicide as in traffic accidents, 11 times more than by homicide and 80 times more than by gender violence. A very complete overview of how computers and algorithms can help in preventing or detecting suicide risk is the one recently published by Ji et al. [9]. Recent studies have found that automatic processing of social media communications is an effective way to detect suicidal ideation by applying emotion and sentiment analysis to textual messages [10].

NLP techniques are being applied to the analysis of social media textual data to face new problems like fake-news detection [11], offensive language identification [12], sentiment analysis [13], opinion mining and emotion detection [14]. Social Big Textual Data is challenging because language varies across time and space, and its register is informal, colloquial and full of idioms compared to formal text. Artificial Intelligence has gained a lot of popularity in recent years thanks to the advent of Deep Learning techniques [15]. Nevertheless, many of the applications and problems now overcome were already attempted with traditional machine learning algorithms, heuristic approaches or knowledge-based systems. The big difference from previous approaches is that current proposals are data-driven: they are able to learn from large amounts of data and build models to perform different tasks with a level of success never reached by other solutions.

This shift has been especially dramatic for NLP. Linguistic-based methods have been surpassed by end-to-end architectures, where no prior knowledge of language is needed [16] but massive amounts of data are required. During the last two years we have witnessed the birth of amazing models like BERT [17], GPT-2 [18] or Transformer-XL [19], with impressive results in many different tasks. New models seem to learn the linguistic nature of language from data.

The bulk of NLP research is turning towards Transformer-based models and exploring how far these architectures are able to learn and perform in human-related tasks, with sentiment analysis, emotion detection and hate-speech identification
among them.

There are previous projects in pursuit of similar goals, like the STOP project [20] or MENHIR [21]. The Big Hug project is focused not only on exploring algorithms and models for the early detection of disorders, but also on finding effective ways to transfer these systems to real-world applications.

4. Objectives of the project

The main objective is clear: a multidisciplinary project for research on methods and algorithms to analyse textual streams across time and discover patterns for the early detection of potentially harmful situations or behaviours. This global goal can be divided into the following sub-objectives:

    1. To identify valid technologies for “listening” to the interactions in digital environments.
    2. To model different forms of aggressive communication or risky situations.
    3. To identify young people at high risk, for the very first time, via a screening that combines big data, psychological and linguistic variables.
    4. To facilitate the replication of the screening protocol based on a well-defined methodology and analysis plan, if the previous objective is met.
    5. To enhance our capabilities to feed these artificial intelligences with quality data by means of new techniques and methods to process informal language or colloquial expressions.
    6. To adapt human language technologies to the specific language that is usually used to make apologia of those scenarios.
    7. To explore practical solutions which may be integrated in the real world.

5. Conclusion

Dispositions for eating, anxiety and depressive disorders are multifactorial. Big Hug represents a novel approach to mental disorders, integrating mental health, big data and linguistic measures as predictive measures for early diagnosis.

Research on mental health for the early diagnosis and treatment of emotional mental health problems in the young is fragmented, as researchers have traditionally worked in isolation and few studies have examined the same or more than a limited set of risk factors, neglecting novel stratification strategies and the development of algorithms. The Big Hug project avoids the problems of fragmentation by coordinating and developing joint activities related to early identification, in order to enable high-quality transnational research. The different perspectives, and especially the different qualifications, of mental health, applied linguistics and Information and Communication Technologies (ICT) specialists working in academia could stimulate the discovery of new and creative solutions. Apart from multidisciplinarity, there are relevant transversal aspects in the project.

References

[1] D. E. Losada, F. Crestani, J. Parapar, Overview of eRisk 2019 early risk prediction on the internet, in: International Conference of the CLEF for European Languages, Springer, 2019, pp. 340–357.
[2] J. Parapar, P. Martín-Rodilla, D. E. Losada, F. Crestani, eRisk 2021: pathological gambling, self-harm and depression challenges, in: ECIR, Springer, 2021, pp. 650–656.
[3] E. Cross, R. Piggin, T. Douglas, J. Vonkaenel-Flatt, Virtual violence II: Progress and challenges in the fight against cyberbullying, London: Beatbullying (2012).
[4] N. Navarro-Gómez, El suicidio en jóvenes en España: cifras y posibles causas. Análisis de los últimos datos disponibles, Clínica y Salud 28 (2017) 25–31.
[5] E. Aboujaoude, M. W. Savage, V. Starcevic, W. O. Salame, Cyberbullying: Review of an old problem gone viral, Journal of Adolescent Health 57 (2015) 10–18.
[6] E. Stice, M. J. Van Ryzin, A prospective test of the temporal sequencing of risk factor emergence in the dual pathway model of eating disorders, Journal of Abnormal Psychology 128 (2019) 119.
[7] T. Wang, M. Brede, A. Ianni, E. Mentzakis, Detecting and characterizing eating-disorder communities on social media, in: Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, 2017, pp. 91–100.
[8] J. Wang, X. Wu, W. Lai, E. Long, X. Zhang, W. Li, Y. Zhu, C. Chen, X. Zhong, Z. Liu, et al., Prevalence of depression and depressive symptoms among outpatients: a systematic review and meta-analysis, BMJ Open 7 (2017) e017173.
[9] S. Ji, S. Pan, X. Li, E. Cambria, G. Long, Z. Huang, Suicidal ideation detection: A review of machine learning methods and applications, IEEE Transactions on Computational Social Systems 8 (2020) 214–226.
[10] J. J. Glenn, A. L. Nobles, L. E. Barnes, B. A. Teachman, Can text messages identify suicide risk in real time? A within-subjects pilot examination of temporally sensitive markers of suicide risk, Clinical Psychological Science 8 (2020) 704–722.
[11] F. Monti, F. Frasca, D. Eynard, D. Mannion, M. M. Bronstein, Fake news detection on social media using geometric deep learning, arXiv preprint arXiv:1902.06673 (2019).
[12] M. Zampieri, S. Malmasi, P. Nakov, S. Rosenthal, N. Farra, R. Kumar, SemEval-2019 task 6: Identifying and categorizing offensive language in social media (OffensEval), arXiv preprint arXiv:1903.08983 (2019).
[13] E. Martínez-Cámara, M. T. Martín-Valdivia, L. A. Ureña-López, A. R. Montejo-Ráez, Sentiment analysis in Twitter, Natural Language Engineering 20 (2014) 1–28.
[14] F. M. Plaza-del-Arco, M. T. Martín-Valdivia, L. A. Ureña-López, R. Mitkov, Improved emotion recognition in Spanish social media through incorporation of lexical knowledge, Future Generation Computer Systems 110 (2020) 1000–1008.
[15] J. Dean, D. Patterson, C. Young, A new golden age in computer architecture: Empowering the machine-learning revolution, IEEE Micro 38 (2018) 21–29.
[16] T. Young, D. Hazarika, S. Poria, E. Cambria, Recent trends in deep learning based natural language processing, IEEE Computational Intelligence Magazine 13 (2018) 55–75.
[17] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, arXiv preprint arXiv:1810.04805 (2018).
[18] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, I. Sutskever, et al., Language models are unsupervised multitask learners, OpenAI blog 1 (2019) 9.
[19] Z. Dai, Z. Yang, Y. Yang, J. Carbonell, Q. V. Le, R. Salakhutdinov, Transformer-XL: Attentive language models beyond a fixed-length context, arXiv preprint arXiv:1901.02860 (2019).
[20] D. Ramírez-Cifuentes, A. Freire, R. Baeza-Yates, J. Puntí, P. Medina-Bravo, D. A. Velazquez, J. M. Gonfaus, J. Gonzàlez, et al., Detection of suicidal ideation on social media: multimodal, relational, and behavioral analysis, Journal of Medical Internet Research 22 (2020) e17758.
[21] M. Kraus, P. Seldschopf, W. Minker, Towards the Development of a Trustworthy Chatbot for Mental Health Applications, in: MultiMedia Modeling, Springer, 2021, pp. 354–366.