=Paper=
{{Paper
|id=Vol-3224/paper05
|storemode=property
|title=Big Hug: Artificial intelligence for the protection of digital societies
|pdfUrl=https://ceur-ws.org/Vol-3224/paper05.pdf
|volume=Vol-3224
|authors=Arturo Montejo-Ráez,María Teresa Martín-Valdivia,Luis Alfonso Ureña López,Manuel Carlos Díaz-Galiano,Miguel Ángel García Cumbreras,Manuel García Vega,Fernando Martínez Santiago,Flor Miriam Plaza del Arco,Salud M. Jiménez Zafra,María Dolores Molina-González,Luis-Joaquín García-López,María Belén Díez Bedmar
|dblpUrl=https://dblp.org/rec/conf/sepln/Montejo-RaezMLD22
}}
==Big Hug: Artificial intelligence for the protection of digital societies==
Big Hug: Inteligencia artificial para la protección de la sociedad digital

Arturo Montejo-Ráez (1), María Teresa Martín-Valdivia (1), L. Alfonso Ureña-López (1), Manuel Carlos Díaz-Galiano (1), Miguel Ángel García-Cumbreras (1), Manuel García-Vega (1), Fernando Martínez-Santiago (1), Flor Miriam Plaza-del-Arco (1), Salud María Jiménez-Zafra (1), María Dolores Molina-González (1), Luis-Joaquín García-López (2) and María Belén Díez-Bedmar (3)

(1) Department of Computer Science, Advanced Studies Center in ICT (CEATIC), Universidad de Jaén, Campus Las Lagunillas, 23071, Jaén, Spain

(2) Department of Psychology, Universidad de Jaén, Campus Las Lagunillas, 23071, Jaén, Spain

(3) Department of English Studies, Universidad de Jaén, Campus Las Lagunillas, 23071, Jaén, Spain

Contact and ORCID: amontejo@ujaen.es, 0000-0002-8643-2714 (A. Montejo-Ráez); maite@ujaen.es, 0000-0002-2874-0401 (M. T. Martín-Valdivia); laurena@ujaen.es, 0000-0001-9752-2830 (L. A. Ureña-López); mcdiaz@ujaen.es, 0000-0001-9298-1376 (M. C. Díaz-Galiano); magc@ujaen.es, 0000-0003-1867-9587 (M. García-Cumbreras); mgarcia@ujaen.es, 0000-0003-2850-4940 (M. García-Vega); dofer@ujaen.es, 0000-0002-1480-1752 (F. Martínez-Santiago); fmplaza@ujaen.es, 0000-0002-3020-5512 (F. M. Plaza-del-Arco); sjzafra@ujaen.es, 0000-0003-3274-8825 (S. M. Jiménez-Zafra); mdmolina@ujaen.es, 0000-0002-8348-7154 (M. D. Molina-González); ljgarcia@ujaen.es, 0000-0003-0446-6740 (L. García-López); belendb@ujaen.es, 0000-0001-9250-2224 (M. B. Díez-Bedmar)

SEPLN-PD 2022. Annual Conference of the Spanish Association for Natural Language Processing 2022: Projects and Demonstrations, September 21-23, 2022, A Coruña, Spain.

© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073.

===Abstract===
In this paper, we present the Big Hug Project, which aims to protect vulnerable citizens and help them and their families to feel more confident when using social media communication platforms. To this end, it proposes activities for building quality data, research into new algorithms that adapt current solutions to the changing nature of colloquial and informal communication, the evaluation of techniques and methods, and the development of demonstrators. The project presents an interdisciplinary approach to the early detection of young people at high risk of emotional problems. The involvement of colleagues from the Clinical Psychology and Corpus Linguistics fields provides the project with the interdisciplinarity needed to obtain robust results that may be significant to society.

Keywords: Natural Language Processing, NLP, sentiment analysis, Clinical Psychology, early detection.

===1. Introduction===
Human language is the main transmission medium involved in social interaction. There are revolutionary Natural Language Processing (NLP) algorithms that can provide means to prevent and predict risky interactions, protecting the most fragile members of our digital societies. Children and adolescents have been identified by the World Health Organization as being at particular risk of psychological distress in these media (see https://www.who.int/news-room/fact-sheets/detail/adolescent-mental-health).

Human Language Technologies (HLT) can help us build environments in which users feel more confident. Thanks to NLP, artificial intelligence solutions are able to model human language and use the learned models to extract information and understand the meaning of text flowing through social networks. The combination of deep learning algorithms with linguistic resources and tools enables the construction of monitoring systems for the early detection, over social media, of signs of problems such as eating disorders, depression, bullying or suicidal tendencies [1, 2].

To this end, the project proposes two years of activities for building quality data, research into new algorithms that adapt current solutions to the changing nature of colloquial and informal communication, the evaluation of techniques and methods, and the development of demonstrators to leverage human-centered solutions that will protect vulnerable citizens and help them and their families to feel more confident when using social media communication platforms. In addition, this project presents an interdisciplinary approach to the early detection of young people at high risk of emotional problems. The scientific community uses the term "indicated prevention" to refer to high-risk individuals who are identified as having some detectable symptoms of emotional disorders but who do not currently meet the criteria for a diagnosis. The collaboration of colleagues from the Clinical Psychology and Corpus Linguistics fields provides the project with the interdisciplinary approach needed to obtain robust results that may be significant to society. Joint efforts of NLP with Corpus Linguistics and Clinical Psychology are sought in this project with a two-fold purpose: a) to analyse the results obtained from the linguistic point of view in order to fine-tune and complement the NLP findings; and b) to contrast the results with the scientific literature on these disorders in Clinical Psychology.

===2. Participants and project funding===
The project brings together three partners from the University of Jaén: the SINAI group from the Advanced Studies Center in ICT (CEATIC), the Department of Psychology and the Department of English Studies. This project has been supported by the grant P20_00956 (PAIDI 2020), funded by the Andalusian Regional Government.

===3. State of the art===
It is estimated that 24 million children and young people in the EU suffer from bullying every year, which means that 7 out of 10 suffer some form of harassment or intimidation, whether verbal, physical or through new communication technologies [3]. Navarro-Gómez [4] stated that social networks allow the viral diffusion of degrading content. Cyberbullying, or electronic aggression, has already been designated a serious public health threat and has elicited warnings to the general public from the Centers for Disease Control and Prevention (CDC) [5].

In another study [6], approximately 1 out of 10 people were found to develop some sort of eating disorder, which also caused anxiety, self-harm and a high risk of suicide. Many studies have tackled this fact from the perspective of psychometrics, but better tools for modeling the language used would help [7], all the more so as eating disorders are rising all around the world. Emotional disorders, like depression and anxiety, affect a quarter of the population during their lifetime [8]. Depression can be studied and identified by monitoring users' posts and activity [1].

In Spain there are 10 suicides a day: twice as many people die by suicide as in traffic accidents, 11 times more than by homicide and 80 times more than by gender violence. A very complete overview of how computers and algorithms can help in preventing or detecting suicide risk is the one recently published by Ji et al. [9]. Recent studies have found that automatic processing of social media communications, applying emotion and sentiment analysis to textual messages, is an effective way to detect suicidal ideation [10].

NLP techniques are being applied to the analysis of social media textual data to face new problems like fake-news detection [11], offensive language identification [12], sentiment analysis [13], opinion mining and emotion detection [14]. Social Big Textual Data is challenging because language varies across time and space, and because its register is informal, colloquial and full of idioms compared to formal text. Artificial Intelligence has gained a lot of popularity in recent years thanks to the advent of Deep Learning techniques [15]. Nevertheless, many of the applications and problems now being solved were already attempted with traditional machine learning algorithms, heuristic approaches or knowledge-based systems. The big difference from previous approaches is that current proposals are data-driven: they are able to learn from large amounts of data and build models that perform different tasks with a level of success never reached by other solutions. This shift has been especially dramatic for NLP. Linguistic-based methods have been surpassed by end-to-end architectures, where no prior knowledge of language is needed [16] but massive amounts of data are required. During the last two years we have witnessed the birth of models like BERT [17], GPT-2 [18] or Transformer-XL [19], with impressive results in many different tasks. These new models seem to learn the linguistic nature of language from data.

The bulk of NLP research is turning towards Transformer-based models and exploring how far these architectures are able to learn and perform in human-related tasks, sentiment analysis, emotion detection and hate-speech identification among them.

There are previous projects in pursuit of similar goals, like the STOP project [20] or MENHIR [21]. The Big Hug project is focused not only on exploring algorithms and models for the early detection of disorders, but also on finding effective ways to transfer these systems to real-world applications.

===4. Objectives of the project===
The main objective is clear: a multidisciplinary project for research on methods and algorithms to analyse textual streams across time and discover patterns for the early detection of potentially harmful situations or behaviours. This global goal can be divided into the following sub-objectives:

# To identify valid technologies for "listening" to the interactions in digital environments.
# To model different forms of aggressive communication or risky situations.
# To identify young people at high risk, for the very first time via a screening that combines big data, psychological and linguistic variables.
# To facilitate the replication of the screening protocol based on a well-defined methodology and analysis plan, if the previous objective is met.
# To enhance our capabilities to feed these artificial intelligences with quality data by means of new techniques and methods to process informal language or colloquial expressions.
# To adapt human language technologies to the specific language that is usually used to promote or glorify those scenarios.
# To explore practical solutions which may be integrated in the real world.

===5. Conclusion===
Dispositions for eating, anxiety and depressive disorders are multifactorial. Big Hug represents a novel approach to mental disorders, integrating mental health, big data and linguistic measures as predictive measures for early diagnosis.

Research on mental health for the early diagnosis and treatment of emotional problems in the young is fragmented, as researchers have traditionally worked in isolation and few studies have examined the same risk factors, or more than a limited set of them, neglecting novel stratification strategies and the development of algorithms. The Big Hug project avoids the problems of fragmentation by coordinating and developing joint activities related to early identification, with the aim of enabling coordinated, high-quality transnational research. The different perspectives, and especially the different qualifications, of mental-health, applied linguistics and Information and Communication Technologies (ICT) specialists working in academia could stimulate the discovery of new and creative solutions. Apart from multidisciplinarity, there are relevant transversal aspects in the project.

===References===
[1] D. E. Losada, F. Crestani, J. Parapar, Overview of eRisk 2019: early risk prediction on the internet, in: International Conference of the CLEF for European Languages, Springer, 2019, pp. 340–357.

[2] J. Parapar, P. Martín-Rodilla, D. E. Losada, F. Crestani, eRisk 2021: pathological gambling, self-harm and depression challenges, in: ECIR, Springer, 2021, pp. 650–656.

[3] E. Cross, R. Piggin, T. Douglas, J. Vonkaenel-Flatt, Virtual violence II: Progress and challenges in the fight against cyberbullying, London: Beatbullying (2012).

[4] N. Navarro-Gómez, El suicidio en jóvenes en España: cifras y posibles causas. Análisis de los últimos datos disponibles, Clínica y Salud 28 (2017) 25–31.

[5] E. Aboujaoude, M. W. Savage, V. Starcevic, W. O. Salame, Cyberbullying: Review of an old problem gone viral, Journal of Adolescent Health 57 (2015) 10–18.

[6] E. Stice, M. J. Van Ryzin, A prospective test of the temporal sequencing of risk factor emergence in the dual pathway model of eating disorders, Journal of Abnormal Psychology 128 (2019) 119.

[7] T. Wang, M. Brede, A. Ianni, E. Mentzakis, Detecting and characterizing eating-disorder communities on social media, in: Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, 2017, pp. 91–100.

[8] J. Wang, X. Wu, W. Lai, E. Long, X. Zhang, W. Li, Y. Zhu, C. Chen, X. Zhong, Z. Liu, et al., Prevalence of depression and depressive symptoms among outpatients: a systematic review and meta-analysis, BMJ Open 7 (2017) e017173.

[9] S. Ji, S. Pan, X. Li, E. Cambria, G. Long, Z. Huang, Suicidal ideation detection: A review of machine learning methods and applications, IEEE Transactions on Computational Social Systems 8 (2020) 214–226.

[10] J. J. Glenn, A. L. Nobles, L. E. Barnes, B. A. Teachman, Can text messages identify suicide risk in real time? A within-subjects pilot examination of temporally sensitive markers of suicide risk, Clinical Psychological Science 8 (2020) 704–722.

[11] F. Monti, F. Frasca, D. Eynard, D. Mannion, M. M. Bronstein, Fake news detection on social media using geometric deep learning, arXiv preprint arXiv:1902.06673 (2019).

[12] M. Zampieri, S. Malmasi, P. Nakov, S. Rosenthal, N. Farra, R. Kumar, SemEval-2019 task 6: Identifying and categorizing offensive language in social media (OffensEval), arXiv preprint arXiv:1903.08983 (2019).

[13] E. Martínez-Cámara, M. T. Martín-Valdivia, L. A. Ureña-López, A. R. Montejo-Ráez, Sentiment analysis in Twitter, Natural Language Engineering 20 (2014) 1–28.

[14] F. M. Plaza-del-Arco, M. T. Martín-Valdivia, L. A. Ureña-López, R. Mitkov, Improved emotion recognition in Spanish social media through incorporation of lexical knowledge, Future Generation Computer Systems 110 (2020) 1000–1008.

[15] J. Dean, D. Patterson, C. Young, A new golden age in computer architecture: Empowering the machine-learning revolution, IEEE Micro 38 (2018) 21–29.

[16] T. Young, D. Hazarika, S. Poria, E. Cambria, Recent trends in deep learning based natural language processing, IEEE Computational Intelligence Magazine 13 (2018) 55–75.

[17] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, arXiv preprint arXiv:1810.04805 (2018).

[18] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, I. Sutskever, et al., Language models are unsupervised multitask learners, OpenAI blog 1 (2019) 9.

[19] Z. Dai, Z. Yang, Y. Yang, J. Carbonell, Q. V. Le, R. Salakhutdinov, Transformer-XL: Attentive language models beyond a fixed-length context, arXiv preprint arXiv:1901.02860 (2019).

[20] D. Ramírez-Cifuentes, A. Freire, R. Baeza-Yates, J. Puntí, P. Medina-Bravo, D. A. Velazquez, J. M. Gonfaus, J. Gonzàlez, et al., Detection of suicidal ideation on social media: multimodal, relational, and behavioral analysis, Journal of Medical Internet Research 22 (2020) e17758.

[21] M. Kraus, P. Seldschopf, W. Minker, Towards the Development of a Trustworthy Chatbot for Mental Health Applications, in: MultiMedia Modeling, Springer, 2021, pp. 354–366.