
LLaMAntino against Cyber Intimate Partner Violence

Pierpaolo Basile

Marco de Gemmis

Marco Polignano

Giovanni Semeraro

Lucia Siciliani

Vincenzo Tamburrano

Fabiana Battista

Rosa Scardigno

University of Bari Aldo Moro, Dept. of Computer Science, Via E. Orabona 4, Bari, 70125, Italy

University of Bari Aldo Moro, Dept. of Education Science, Psychology, Communication Science, Via Scipione Crisanzio 42, Bari, 70122, Italy

Intimate Partner Violence refers to abusive behaviours perpetrated against one’s own partner. This social issue has increased over time, particularly after Covid-19. It can be divided into two broad categories, known as Intimate Partner Violence (IPV) and Cyber Intimate Partner Violence (C-IPV). Social media and technologies can exacerbate these types of behaviours, but some “digital footprints”, such as textual conversations, can be exploited by Artificial Intelligence models to detect and, in turn, prevent them. With this aim in mind, this paper describes a scenario in which the Italian Language Model family LLaMAntino can be exploited to explain the presence of toxic elements in conversations related to teenage relationships and then educate the interlocutor to recognize these elements in the messages received.

Keywords: Natural Language Processing, Abusive Language, Large Language Models

1. Introduction

Research indicates that the most prevalent form of violence is that directed toward one’s partner, known as Intimate Partner Violence (IPV). Early detection of these behaviours can be instrumental in mitigating their occurrence. One of the most critical aspects of this kind of behaviour is that victims often struggle to identify harmful behaviours because of their close relationship with the perpetrator. Misconceptions about romantic relationships, often rooted in old cultural stereotypes such as the belief that certain behaviours are normal or acceptable, can further complicate the recognition of harmful actions. In today’s society, the widespread use of social media and digital platforms has extended this issue into Cyber Intimate Partner Violence (C-IPV), and often allows perpetrators to gain greater control over their victims by constantly monitoring their locations or interactions with other people.

Contrary to common belief, these technologies can also be used to address the issue of violence. In fact, building AI models to identify potential violence-related behaviours is essential, and it often provides the only means to act promptly and in real time. Such a tool can serve as a preventive measure against the escalation of harmful situations, for example by integrating it into instant messaging apps and raising alerts where harmful content is detected.

In this paper, we aim to utilize Large Language Models (LLMs) as tools that can not only identify but also explain toxic elements in intimate conversations. More specifically, we use a dataset of conversations about teenage relationships written in Italian that has been accurately annotated by human experts. Given LLMs’ capability to tackle several downstream tasks, our goal is to explore the impact of different kinds of prompts on the generation of precise explanations.

The paper is structured as follows: in Section 2, we frame what intimate partner violence is, its different forms, and its deleterious intrapersonal and interpersonal consequences. We also provide an overview of the methods proposed in the literature. Section 3 focuses on the task of explaining toxic language in the context of IPV. We describe the dataset and the different types of annotations provided by researchers in General Psychology, as well as the prompting strategy adopted to instruct the language model. Finally, in Section 4, we draw some conclusions and discuss directions for the continuation of the work.

CLiC-it 2024: Tenth Italian Conference on Computational Linguistics, Dec 04-06, 2024, Pisa, Italy

* Corresponding author.

pierpaolo.basile@uniba.it (P. Basile); marco.degemmis@uniba.it (M. d. Gemmis); marco.polignano@uniba.it (M. Polignano); giovanni.semeraro@uniba.it (G. Semeraro); lucia.siciliani@uniba.it (L. Siciliani); vincenzo.tamburrano@uniba.it (V. Tamburrano); fabiana.battista@uniba.it (F. Battista); rosa.scardigno@uniba.it (R. Scardigno)

0000-0002-0545-1105 (P. Basile); 0000-0002-2007-9559 (M. d. Gemmis); 0000-0002-3939-0136 (M. Polignano); 0000-0001-6883-1853 (G. Semeraro); 0000-0002-1438-280X (L. Siciliani); 0009-0007-3802-842X (V. Tamburrano); 0000-0003-4086-739X (F. Battista); 0000-0002-5725-6483 (R. Scardigno)

© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

2. Background and related work

IPV is defined as any abuse or aggression by one partner against the other [1]. It affects individuals regardless of their gender or sexual orientation [2]. According to [1, 3], IPV includes four main categories which involve distinct violent behaviours that can vary in duration and severity:

• Physical violence: The use of force to harm or injure a partner;
• Sexual violence: Non-consensual sexual acts or advances;
• Psychological violence: Harmful communication aimed at affecting the partner’s mental and emotional well-being and asserting control;
• Stalking, monitoring, and control: Persistent, unwanted attention that induces fear or concern for personal safety.

The rise in technology use has exacerbated these behaviours, leading to the emergence of Cyber Intimate Partner Violence (C-IPV) [4]. C-IPV retains the characteristics of IPV but occurs via digital platforms. Common behaviours of this kind include:

• Cyber sexual violence: Pressuring for sexual content, coercing sexual acts, or sending unwanted sexual content.
• Cyber psychological violence: Using technology to cause emotional harm, such as spreading rumours or sending insulting messages.
• Cyberstalking, monitoring, and control: Unauthorized access to devices and accounts to monitor the partner.

Previous studies have provided valuable insights into the prevalence, characteristics, and individual differences associated with both in-person IPV and C-IPV, as well as their harmful consequences for victims [5, 6, 7]. Given these detrimental impacts, early detection of IPV and C-IPV is crucial to prevent their escalation. However, victims often struggle to recognize these behaviours due to their emotional attachment to the perpetrator.

This is the main motivation for our work: we propose the adoption of an LLM as an “assistant” who can explain why a message can be toxic in an intimate relationship.

The explanation makes partners aware of the fact that violence is being committed or suffered and describes the reasons for this happening, as well as the consequences (for example, emotional suffering), with the hope that it can act as a deterrent.

3. Explanations for Toxic Conversations

The idea is to create a dataset of toxic conversations annotated with information about the type of violence (e.g., physical, cyberstalking, cyber sexual violence), the presence of aggressive communication, the adoption of abusive language and, in general, with information that could be useful to provide a "technical" explanation, as if it were given by a professional expert in the subject, such as a psychologist. The aim is to provide explanations, well grounded in the relevant C-IPV literature, that point out the elements of toxicity in the conversation.

We started from a dataset available on HuggingFace [8]. The chosen dataset collected Spanish sentences from a group of students (4 girls and 4 boys) aged 15-19 with previous training on toxic relationships. For 2 weeks, this group of teenagers analyzed phrases that had occurred in their environment (social media, direct communication) or that they themselves produced, classifying them as toxic or healthy and collecting them through a form. Afterwards, the examples given by each student were discussed and evaluated by the others using peer evaluation. The classification was also ratified by two specialists in the field. The original dataset consists of 334 sentences. As the manual annotation of the sentences is a time-consuming task, for our preliminary experiments we selected only some of them, as described in the following subsection.

3.1. Dataset and Annotations

In the original dataset, 165 sentences are classified as toxic. We selected 42 of them, equally divided between C-IPV and IPV, with the idea of using 2 of them for few-shot prompting and the remaining ones for testing. The selected sentences have been translated into Italian by using two translation services (Google and DeepL) and annotated. We perform this translation step as we want to test the ability of LLaMAntino to detect IPV and C-IPV in Italian sentences. We added 5 annotations:

• the type of violence: physical or cyber;
• the type of behaviour that led to the physical violence, e.g. sexual assault, stalking;
• the type of cyber behaviour that led to the violence, e.g. cyber stalking;
• the type of communication: aggressive or non-aggressive;
• the type of aggressive communication: e.g., use of abusive language.

As for physical violence, the experts distinguished 4 annotations [5]:

1. Physical violence: the voluntary use of force that potentially causes harm and injury to the partner;
2. Sexual violence: sexual acts without the partner’s consent, even if only attempted;
3. Psychological aggression: communicating with the intention of negatively influencing the mental and emotional state of the partner and wanting to control him or her;
4. Stalking, monitoring and control: a series of recurring and unwanted attentions and communications that create fear or apprehension and put the partner’s safety at risk.

As for cyber violence, the experts distinguished 3 annotations [7]:

1. Cyber sexual violence: requesting or pressuring the partner to send sexual content against his or her will, pressuring the partner to engage in sexual acts;
2. Cyber psychological violence, aggression: behaviour intended to cause emotional distress to the partner; it may include behaviours such as spreading gossip on social media, repeatedly insulting the partner via messages, or even spreading videos or photos that cause emotional distress;
3. Cyber stalking, monitoring, and control: using and accessing technological devices and accounts without the partner’s consent, or using technology to get information about the partner (in general, any behaviour that aims at increasing control within the relationship). It includes fraping, that is, the alteration of the partner’s information on social profiles.

As for aggressive communication, the experts distinguished 5 annotations [9]:

1. Curses;
2. Ridicule or derision;
3. Bad language;
4. Threat;
5. Attack on the person (on competence, character, background, physical appearance).

At the end of the annotation phase, we had each toxic sentence annotated with information well-grounded in the scientific literature about intimate partner violence. An example of a toxic sentence that reveals IPV is:

"Se sono così geloso è perché ti amo e ci tengo a te." ("If I’m so jealous, it’s because I love you and care about you.", in English)

That sentence has been annotated in the dataset as follows:

• type of violence: physical
• type of behaviour: psychological aggression
• aggressive communication: no

An example of a toxic sentence that reveals C-IPV is:

"Se non hai nulla da nascondere e c’è fiducia tra di noi, dammi le tue password" ("If you have nothing to hide and we trust each other, give me your passwords", in English)

which has been annotated in the dataset as follows:

• type of violence: cyber
• type of behaviour: cyber stalking, monitoring, and control
• aggressive communication: yes
• type of aggressive communication: attack on the person

In order to understand the difficulties of the annotation task from the human point of view, we used Cohen’s Kappa to measure the level of agreement between the annotators who classified a sentence as an example of cyber violence or not. The observed value, 0.503, revealed moderate agreement. We also measured Cohen’s Kappa on the type of communication (aggressive or not). The observed value, 0.281, revealed fair, acceptable agreement, but at the same time showed that it is more difficult to recognize the use of aggressive language when a bad word is not explicitly used.
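As an illustration of the agreement measure used above, Cohen's Kappa can be computed as κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the agreement expected by chance from each annotator's label frequencies. The following is a minimal sketch, not the authors' code: the function name `cohen_kappa` and the two label lists are hypothetical and do not come from the dataset.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in set(freq_a) | set(freq_b)) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels (NOT taken from the dataset): 1 = cyber, 0 = not cyber.
annotator_1 = [1, 1, 0, 0, 1, 0, 1, 0, 1, 1]
annotator_2 = [1, 0, 0, 0, 1, 1, 1, 0, 1, 0]
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")
```

In this toy example the annotators agree on 7 of 10 items (p_o = 0.7) while chance agreement is p_e = 0.5, so the score is about 0.4, which falls in the same "fair to moderate" band as the values reported above.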

The annotations will be exploited by a Large Language Model to generate explanations and raise awareness of the violent behaviour. In the next subsection, we describe how annotations are turned into examples for few-shot prompting.

3.2. Few-Shot Prompting to explain toxicity in conversations

The two toxic sentences mentioned in the previous subsection were used for few-shot prompting. The corresponding annotations were turned into natural language explanations used to build prompts for in-context learning. For instance, the explanation for the previous sentence “If you have nothing to hide and we trust each other, give me your passwords” is: “The sentence is toxic because it is an example of cyber violence. The behaviour falls in the category cyber stalking, monitoring, and control since the aim is to obtain information on the partner’s life and establish a dynamic of control in the couple. Furthermore, the communication is aggressive because it reveals the intimidating intent of attacking the partner to violate his or her privacy.”
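A minimal sketch of how annotations might be rendered as explanation text and placed into a prompt for in-context learning is given below. It is an illustration under stated assumptions, not the authors' implementation: the explanation template, the dictionary keys, the "Sentence:" / "Explanation:" labels and the function names are hypothetical. Only the task description and the two example sentences with their annotations are taken from this paper; the test sentence is the one shown in Appendix A.

```python
# Task description in its English version, as quoted in this section.
TASK = (
    "Given a sentence from a conversation between partners in an intimate "
    "relationship, say whether it is a case of cyber or other types of violence "
    "and explain the reasons why the sentence expresses toxic language. "
    "The explanation should be similar to the examples below."
)


def explanation_from_annotations(ann):
    """Render an annotation record as explanation text (hypothetical template)."""
    parts = [
        f"The sentence is toxic because it is an example of {ann['violence']} violence.",
        f"The behaviour falls in the category {ann['behaviour']}.",
    ]
    if ann.get("aggressive"):
        parts.append(
            f"Furthermore, the communication is aggressive ({ann['aggression_type']})."
        )
    return " ".join(parts)


def build_prompt(task, test_sentence, examples=()):
    """Assemble a k-shot prompt; an empty `examples` yields the 0-shot prompt."""
    lines = [task, ""]
    for sentence, ann in examples:
        lines += [
            f"Sentence: {sentence}",
            f"Explanation: {explanation_from_annotations(ann)}",
            "",
        ]
    lines += [f"Sentence: {test_sentence}", "Explanation:"]
    return "\n".join(lines)


# The two annotated examples reported in Section 3.1 (English translations).
examples = [
    ("If I'm so jealous, it's because I love you and care about you.",
     {"violence": "physical", "behaviour": "psychological aggression",
      "aggressive": False}),
    ("If you have nothing to hide and we trust each other, give me your passwords",
     {"violence": "cyber", "behaviour": "cyber stalking, monitoring, and control",
      "aggressive": True, "aggression_type": "attack on the person"}),
]
test_sentence = "He asks for my cell phone to see who I am talking to."

prompt_0 = build_prompt(TASK, test_sentence)            # 0-shot
prompt_2 = build_prompt(TASK, test_sentence, examples)  # 2-shot
print(prompt_2)
```

Keeping the annotation-to-text step separate from prompt assembly means the same annotated records can feed either setting; in the paper the explanations were written from the annotations, so the fixed template here is only a stand-in for that step.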

A 2-shot prompt is built by including:

• the description of the task: “Given a sentence from a conversation between partners in an intimate relationship, say whether it is a case of cyber or other types of violence and explain the reasons why the sentence expresses toxic language. The explanation should be similar to the examples below. (Data una frase di una conversazione tra partner in una relazione sentimentale, dire se è un caso violenza cyber o di altro tipo e spiegare i motivi per cui la frase esprime un linguaggio tossico. La spiegazione deve essere simile a quella degli esempi che seguono.)”;
• 2 training toxic sentences, one example of IPV and one example of C-IPV, with corresponding explanations;
• 1 test toxic sentence (without explanation) for which we want the model to generate an explanation.

The 0-shot prompt contained only the task description and the test toxic sentence. In other words, the annotations associated with a toxic sentence are the canvas for writing the explanation included in the prompt. In both the 0-shot and 2-shot settings, we used only one generation per prompt, as the model produced consistent outputs despite the inherent stochasticity of the models.

3.3. Experimental Session

The main aim of the experiment was to assess whether the annotations are actually useful in training the model to give scientifically based explanations, even with few examples. The model adopted in the experiment was LLaMAntino-3-ANITA-8B [10, 11]¹. We therefore want to assess whether the models learn how to perform the task when provided with just two examples. Two research questions were issued, referred to below as RQ1 and RQ2. The experimental procedure was as follows:

1. give LLaMAntino-3-ANITA-8B and ChatGPT 3.5² 20 C-IPV toxic sentences in a 0-shot and a 2-shot setting and record the explanations;
2. give LLaMAntino-3-ANITA-8B and ChatGPT 3.5 20 IPV toxic sentences in a 0-shot and a 2-shot setting and record the explanations.

¹ LLaMAntino ANITA Web Interface - https://chat.llamantino.it/

² OpenAI ChatGPT [Large Language Model] version 3.5 https://chat.openai.com/chat

After the generation step, for each test toxic sentence, we had 4 explanations: LLaMAntino-3-ANITA-8B 0-shot, LLaMAntino-3-ANITA-8B 2-shot, ChatGPT 3.5 0-shot, ChatGPT 3.5 2-shot. As for RQ1, results of classification accuracy are reported in Tables 1-4. The main outcome is that we observed a significant improvement in the accuracy of both models when using 2-shot prompting for recognizing C-IPV. As regards IPV, both models, even with just 0-shot prompting, correctly classified almost all the testing instances: 18 out of 20 for LLaMAntino-3-ANITA-8B 0-shot, 19 out of 20 for ChatGPT 3.5 2-shot. This is a clear indication that the annotations are mainly useful for C-IPV recognition. Another interesting outcome concerns the percentage of C-IPV sentences for which LLaMAntino-3-ANITA-8B does not recognize the presence of violence at all. With 0-shot prompting, this result is 35% (7 out of 20), while with 2-shot prompting it drops to 15% (3 out of 20). We believe that this is an important result because it shows that when the model makes an error in classifying C-IPV, it at least acknowledges the presence of violence, even if it does not capture the technological aspect of the abuse.

[Table: ANITA-0shot. Columns: IPV, No violence. Values: 13, 7; 18, 2]

[Table: Chat-GPT-2shot. Columns: IPV, No violence. Values: 5, 0; 20, 0]

As for RQ2, an example of explanation provided by the models is given in Appendix A. For the evaluation we used two metrics, BertScore [12] and ROUGE [13], in order to assess both semantic and syntactic similarity between the generated explanations and the “gold standard” given by the explanations built according to the codebook.
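To make the comparison against the “gold standard” concrete, the sketch below scores two candidate explanations against a reference with simplified ROUGE-1 and ROUGE-L F1 measures. It is a pure-Python stand-in written for illustration and is not the ROUGE package cited above: tokenization is naive, and no stemming is applied. BertScore is not reproduced here because it requires a pretrained model. The two candidate texts are hypothetical model outputs, not results from the experiment.

```python
from collections import Counter


def tokens(text):
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    stripped = (t.strip(".,;:!?\"'()") for t in text.lower().split())
    return [t for t in stripped if t]


def f1(overlap, n_candidate, n_reference):
    """Harmonic mean of precision and recall for a given overlap count."""
    if overlap == 0:
        return 0.0
    p, r = overlap / n_candidate, overlap / n_reference
    return 2 * p * r / (p + r)


def rouge1_f1(candidate, reference):
    """Unigram-overlap F1 with clipped counts (simplified ROUGE-1)."""
    c, r = Counter(tokens(candidate)), Counter(tokens(reference))
    overlap = sum((c & r).values())
    return f1(overlap, sum(c.values()), sum(r.values()))


def lcs_len(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """LCS-based F1 (simplified ROUGE-L)."""
    c, r = tokens(candidate), tokens(reference)
    return f1(lcs_len(c, r), len(c), len(r))


gold = "The sentence is toxic because it is an example of cyber violence."
# Hypothetical outputs, for illustration only.
zero_shot = "This is not a case of cyber violence."
two_shot = "The sentence is toxic because it is an example of cyber violence and control."

for name, text in [("0-shot", zero_shot), ("2-shot", two_shot)]:
    print(name, round(rouge1_f1(text, gold), 3), round(rouge_l_f1(text, gold), 3))
```

With these toy strings the 2-shot candidate scores higher on both measures, which mirrors the kind of per-sentence comparison used to decide which explanation is closer to the expert reference.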

For each testing sentence, we computed BertScore_0 between the explanation provided by LLaMAntino-3-ANITA-8B 0-shot and the codebook explanation. Then, we computed BertScore_2 between the explanation provided by LLaMAntino-3-ANITA-8B 2-shot and the codebook explanation. We compared BertScore_0 with BertScore_2 in order to choose the most similar explanation to the “gold standard”. Results obtained as the average of the BertScore and ROUGE metrics are shown in Table 5. We observed that for both C-IPV and IPV, all the explanations given by LLaMAntino-3-ANITA-8B 2-shot were better than those given by 0-shot prompting. The same result was observed for ChatGPT 3.5. The ROUGE metrics gave similar results: for both C-IPV and IPV, in 90% of testing sentences, the explanations given by LLaMAntino-3-ANITA-8B 2-shot were found to be more similar to the “gold standard” than those given by LLaMAntino-3-ANITA-8B 0-shot. For ChatGPT 3.5, 2-shot prompting always gave better results than 0-shot prompting. These results led us to give a positive answer to RQ2. In general, even with 2-shot prompting, our model was able to provide explanations similar to those given by psychology experts.

The significant improvement in explanation quality when using 2-shot prompting, as measured by both BertScore and ROUGE, is a crucial finding in this study. It suggests that the LLM can learn and adapt to the task of generating explanations for abusive language, given a small set of examples or prompts. This adaptability is a key characteristic of a well-designed LLM, as it enables the model to generalize and improve its performance on a specific task with limited training data. The results also raise important questions about the potential of LLMs in applications where they are expected to provide nuanced and accurate explanations of complex phenomena, such as abusive language. While LLaMAntino-3-ANITA-8B 2-shot was able to generate explanations that were deemed more accurate by the metrics, it is essential to note that the quality of the explanations was still not on par with those provided by human experts in the field of psychology. This study’s findings have implications for the development of LLMs in the domain of natural language processing, particularly in applications where the model’s output is expected to be accurate, informative, and free from biases.

4. Conclusions and Future Work

In this paper, we presented our proposal to adopt our LLM to identify and describe toxic elements in discussions concerning teenage relationships. In particular, the LLM was used to generate explanations that describe why a sentence, in the context of an intimate relationship, can be toxic and constitute abuse. The main outcome of our preliminary investigation is that, even with few-shot prompting, the LLM learns to provide good explanations that adhere to a standard provided by expert psychologists. By exploiting LLMs’ proficiency in processing and understanding human language, our approach seeks to go beyond just detection, aiming to grasp underlying motivations and factors contributing to the emergence of harmful behaviours. In future work, we intend to perform fine-tuning steps to better adapt LLMs to the specific task at hand. We also plan to investigate how different pre-training techniques and architectures can be leveraged to enhance model performance. Supervised fine-tuning [14], for instance, is a technique that can be employed to adapt the LLM to a specific task, such as generating explanations for abusive language, by using a labelled dataset. This approach can help the model to learn from its mistakes and to correct its biases, ultimately leading to improved performance. In the context of our study, supervised fine-tuning could be used to train the LLM on a dataset of abusive language explanations, to reduce the model’s error rate and increase the quality of its responses. Direct Preference Optimization (DPO) [15] is another strategy that can be used to improve the performance of the LLM. DPO is a technique that allows the model to be trained directly on a set of user-provided preferences, such as the quality of the explanations it generates. This approach can be particularly effective in domains like abusive language, where the quality of the explanations is critical to ensure that the model does not perpetuate harmful biases. To assess the effectiveness of our approach, we intend to compare our methodology with other state-of-the-art models and to incorporate further annotations to enhance its robustness.

Moreover, thanks to the collaboration with expert psychologists, we plan to explore the application of Chain-of-Thought prompting techniques.

Acknowledgments

We acknowledge the support of the PNRR project FAIR Future AI Research (PE00000013), Spoke 6 - Symbiotic AI (CUP H97G22000210007) under the NRRP MUR program funded by the NextGenerationEU.

This Publication was produced with the co-funding of the European Union - Next Generation EU: NRRP Initiative, Mission 4, Component 2, Investment 1.3 - Partnerships extended to universities, research centres, companies and research D.D. MUR n. 341 del 15.03.2022 – Next Generation EU (PE0000014 - ”SEcurity and Rights In the CyberSpace - SERICS” - CUP: H93C22000620001).

References

[1] M. E. Bagwell-Gray, J. T. Messing, A. Baldwin-White, Intimate partner sexual violence: A review of terms, definitions, and prevalence, Trauma, Violence, and Abuse 16 (2015) 316-335.

[2] L. C. Butler, E. R. Fissel, Gildea, B. S. Fisher, Understanding intimate partner cyber abuse across partnership categories based on gender identity and sexual orientation, in: Vulnerable Victimizations, Routledge, 2023, pp. 77-100.

[3] Spluska, L. Tanczer, Threat Modeling Intimate Partner Violence: Tech Abuse as a Cybersecurity Challenge in the Internet of Things, Emerald Publishing Limited, 2021, pp. 663-688.

[4] Gilbert, Zhang, Basile, Breiding, M.-j. Kresnow, Intimate partner violence and health conditions among U.S. adults - National Intimate Partner Violence Survey, 2010-2012, Journal of Interpersonal Violence 38 (2023) 237-261.

[5] Breiding, K. C. Basile, S. G. Smith, M. C. Black, R. R. Mahendra, Intimate partner violence surveillance: uniform definitions and recommended data elements. Version 2.0, 2015. URL: https://stacks.cdc.gov/view/cdc/31292.

[6] K. N. Duerksen, E. M. Woodin, Cyber dating abuse victimization: Links with psychosocial functioning, Journal of Interpersonal Violence 36 (2021) NP10077-NP10105.

[7] Watkins, Benedicto, D. DiLillo, The cyber aggression in relationships scale: A new multidimensional measure of technology-based intimate partner aggression, Assessment 25 (2018) 608-626. doi:10.1177/1073191116665696.

[8] Margarita Martínez Gabaldón, toxic-teenage-relationships (revision 5ce5df0), 2023. URL: https://huggingface.co/datasets/marmarg2/toxic-teenage-relationships. doi:10.57967/hf/0972.

[9] D. A. Infante, C. J. W. III, Verbal aggressiveness: An interpersonal model and measure, Communication Monographs 53 (1986) 61-69. doi:10.1080/03637758609376126.

[10] Polignano, Basile, G. Semeraro, Advanced natural-based interaction for the italian language: Llamantino-3-anita, 2024. arXiv:2405.07101.

[11] Polignano, Basile, G. Semeraro, Advanced natural-based interaction for the italian language: Llamantino-3-anita, CoRR abs/2405.07101 (2024). URL: https://doi.org/10.48550/arXiv.2405.07101. doi:10.48550/ARXIV.2405.07101. arXiv:2405.07101.

[12] Zhang, Kishore, Wu, K. Q. Weinberger, Artzi, Bertscore: Evaluating text generation with BERT, in: 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020, OpenReview.net, 2020. URL: https://openreview.net/forum?id=SkeHuCVFDr.

[13] C.-Y. Lin, Rouge: A package for automatic evaluation of summaries, in: Text summarization branches out, 2004, pp. 74-81.

[14] Han, Gao, J. Liu, S. Q. Zhang, et al., Parameter-efficient fine-tuning for large models: A comprehensive survey, arXiv preprint arXiv:2403.14608 (2024).

[15] Rafailov, Sharma, E. Mitchell, C. D. Manning, S. Ermon, C. Finn, Direct preference optimization: Your language model is secretly a reward model, Advances in Neural Information Processing Systems 36 (2024).

A. Explanations

An example of explanations given by the human expert and the selected models is the following.

Initial Sentence: He asks for my cell phone to see who I am talking to.

LLaMAntino-3-ANITA 0-shot: Anita at your disposal! After analyzing the sentence, I can say that this is not a case of cyber violence, but rather an example of control and jealousy in a dating relationship context. The sentence expresses toxic language for the following reasons: Controllism: the request to see who you are talking