Exploring the Relationship between News Reliability and Violent Comments in Digital Media

Beatriz Botella-Gil, Alba Bonet-Jover, Robiert Sepúlveda-Torres, Patricio Martínez-Barco and Estela Saquete

Department of Software and Computing Systems, University of Alicante, Spain

Abstract
Natural Language Processing (NLP) has become an essential tool for the automatic detection of violent language and unreliable information. The misuse of Information and Communication Technologies (ICTs) fosters the generation of disinformation and digital violence, thus polarising society. Furthermore, the lack of reliable, neutral and accurate language when presenting news can trigger an increase in negative and violent user reactions. It is necessary to find a linkage between disinformation and violent discourse in order to moderate online content, facilitate early detection of both phenomena and ensure healthy online behaviour. This research uses NLP techniques to explore the reliability of news headlines and their correlation with violent language generated in comments by users. In addition, a novel resource annotated in Spanish is created to jointly address the automatic detection of violent language and reliability.

Keywords
Natural Language Processing, Violent language, Reliability detection, Data analysis

SEPLN-2024: 40th Conference of the Spanish Society for Natural Language Processing. Valladolid, Spain. 24-27 September 2024.
Email: beatriz.botella@dlsi.ua.es (B. Botella-Gil); alba.bonet@dlsi.ua.es (A. Bonet-Jover); rsepulveda@dlsi.ua.es (R. Sepúlveda-Torres); patricio@dlsi.ua.es (P. Martínez-Barco); stela@dlsi.ua.es (E. Saquete)
ORCID: 0009-0004-7094-6632 (B. Botella-Gil); 0000-0002-7172-0094 (A. Bonet-Jover); 0000-0002-2784-2748 (R. Sepúlveda-Torres); 0000-0003-4972-6083 (P. Martínez-Barco); 0000-0002-6001-5461 (E. Saquete)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

1. Introduction

In our society, digitalisation exerts a profound influence, with the Internet fundamentally reshaping how communication occurs and knowledge is obtained. This revolution has not only changed the way social connections are made but has also transformed the process of accessing new knowledge and staying informed in real time. This information revolution has also boosted the risk of misuse of Information and Communication Technology (ICT) tools.

If negative interactions are already experienced in the physical world, they can spread even more easily in the virtual world due to the viral nature of content. This phenomenon has led to the proliferation of digital violence and the propagation of unreliable information.

It is crucial to comprehend the relationship between violence and disinformation in the digital environment. Disinformation and violent discourse have become essential areas of research. On the one hand, combating the dissemination of fake news in the virtual realm is crucial to maintain information integrity and prevent the creation of societal alarms based on false facts. On the other hand, addressing the viral spread of violent content can mitigate significant consequences for a much broader population, since violent discourse, similar to disinformation, impacts the perception of reality and fosters a harmful atmosphere.

The notion of online violence extends beyond direct interactions and can manifest through information manipulation. The presence of digital violence in news can compromise its neutrality, resulting in the production of polarised and biased content that adversely impacts users. Such violence can contribute to the proliferation of unreliable information, casting doubt on reality and fostering mistrust in crucial areas such as public health or ideology. Research in these domains emerges as an essential tool to counteract these phenomena and foster a digital environment that is both safer and more trustworthy.

[1] states that “in the civilised world, a language is a tool that enables the exchange of information and views, connects people of different nationalities, crosses religious and cultural barriers, allows knowledge, understanding, and helps build unity”. However, if the language employed exhibits bias, manipulation, subjectivity, inaccuracy, or even aggression, the opposite effect may occur, potentially leading to societal fragmentation rather than cohesion.

Research has examined the detrimental impact of hate speech and disinformation on society, which results in the polarisation of society. By polarisation we mean “a social phenomenon characterised by the fragmentation of society into antagonistic factions with vehemently opposed values and identities that prevent cooperation and the search for a common good” [2]. Digital media have become ideal venues for the propagation of violent and false content that generates group-based divisions in society.

As stated by [1], hate speech is based on the divide et impera concept, which intends to group society and turn different groups against one another. When society becomes polarised, disinformation can proliferate more easily. In particular, as reflected in [3]’s research, Spain stands out as the most polarised country in Europe.

The intersection of violent language and news disinformation in the digital world constitutes a crucial area of today’s research. Addressing these issues will not only aid in identifying the reliability of information but also contribute to fostering a safer and more equitable online society. This research aims to conduct an exploratory study on the prevalence of violence in news comments, examining both the source and the subject matter. Additionally, it seeks to determine whether a correlation exists between the violent content generated by readers and the reliability of that content, achieved through an analysis of language usage. The objective is to determine whether the language used in a news article influences users to employ a higher level of violent language in the comments pertaining to the news item in question.

One of the limitations in both the task of detecting violent discourse and the task of detecting disinformation is the scarcity of resources in Spanish to train models in Natural Language Processing (NLP). For that reason, this research focuses on the analysis of a set of news items and comments in Spanish, in order to analyse violent and unreliable language in Spanish and propose a resource for both tasks.

The main novelty of our proposal consists of examining the correlation between violent discourse and disinformation in Spanish by means of NLP tools. To achieve this objective, the following contributions to this research area have been provided:

• A new resource consisting of headlines and news comments, annotated with the reliability and violence criteria. Reliability was manually annotated, while violence was automatically annotated by means of an existing automatic violence classifier, originally created for annotating tweets, which is applied in this research to the annotation of news comments.
• An exploratory analysis of how source profiles and topics can influence the generation of digital violence and exacerbate hostility among users.
• An assessment of how language used in news articles can impact the occurrence of hate speech, and more specifically, how the language used in news headlines may incite readers to exhibit varying levels of violence when expressing their opinions on an issue.

The article is structured as follows: Section 2 shows the main research carried out in the fields of news reliability and language employed in disinformation, on the one hand, and violent language, on the other; in Section 3, the methodology employed for this exploratory study is outlined; Section 4 delves into the exploratory analysis to determine the relationship between disinformation and violence; Section 5 discusses the findings of this research, and finally, in Section 6, conclusions and limitations for future work are discussed.

2. State of the Art

The scientific community has approached the issue of online violence from various fields of knowledge, driven by the alarming prevalence of violent content in digital media. Efforts have been made to address issues such as hate speech, cyberbullying, and toxicity, aiming to detect these problems early and propose solutions.

Regarding the disinformation phenomenon, its objective is to disseminate misleading or unreliable content, which undermines societal trust by instilling insecurity and doubts regarding the authenticity of the information received.

This section aims to present relevant research related to both the violent discourse and the disinformation tasks, as well as to define the main concepts of these lines of research, especially those related to violent and reliable language.

2.1. Language in violence discourse

Language, as a means of communication, should be neutral. Still, its use can promote any ideology and incite hatred and violence [1].

An example of the power of language is evident in contexts of warfare, as seen in historical scenarios like the Cold War, World War I, and the Vietnam and Iraq wars [4]. Similarly, in political discourses such as the 2016 US presidential elections or the Brexit referendum [5], language has played a significant role, particularly in addressing the issue of disinformation.

In the scientific field, research on linguistic violence has been approached from various angles. It is important to acknowledge the challenge in defining what constitutes violent content. Studies have aimed to delineate different forms of violent language prevalent in ICTs, including:

• Cyberbullying: the deliberate and repetitive use of specific ICTs like email, mobile phone messages, instant messaging, and personal online defamatory behaviour by an individual or group, with the intention of hostilely causing harm to another party [6].
• Hate speech: any form of communication that discriminates against an individual or group based on characteristics such as race, ethnicity, gender, sexual orientation, nationality, religion, or other identifying features [7].
• Toxicity: this term is defined by [8] and [9] as those messages that include unacceptable, rude, and disrespectful comments. They are messages of contempt that are part of what is called toxic discourse or toxic language, which either invite other users to leave the discussion or use language at the same level as the discussion.
• Offensiveness: [10] asserts that this term is commonly defined as hurtful, derogatory, or obscene remarks directed from one person to another¹.

¹ https://thelawdictionary.org/

In addition, we also find other works that further specify the type of violence they study, such as in the case of misogyny or racism [11]. Our research will use the terms “violent language” or “violence” to refer to any type of message containing violent content.

Given the vast volume of data present in the virtual environment, manual detection of violent messages by humans is impractical. This underscores the significance of NLP tools in our research. From an NLP standpoint, detecting hate speech can be conceptualised as text classification. As stated by [12], “the automatic detection of this kind of speech is usually addressed as a classification task, and it is related to a family of other tasks such as detecting cyberbullying, offensive language, abusive language, toxic language, among others”.

From this perspective, numerous research efforts focus on detection methods, including the development of resources aimed at aiding identification. These resources primarily consist of lexicons containing lists of negative words or expressions that may be present in messages, serving as indicators of potentially violent content. These word lists are employed in binary classification models to ascertain the presence or absence of hate speech [13], violent language [14], abusive language [15], and offensive language [16].

Furthermore, corpus generation is utilised in various violence detection strategies, thereby enhancing the model’s capability to effectively discern both violent and non-violent content [17, 18].

Besides, efforts have been made to identify the most suitable detection tools that offer improved and expedited solutions to the problem. This includes exploring strategies such as heuristic techniques [19], Machine Learning (ML) [20], and Deep Learning (DL) [21].

2.2. Language in disinformation

Language also plays a very important role in disinformation detection and there are key linguistic indicators that are characteristic of news items. As described by [4], the essence of fake news or deceptive information is the news text, and, consequently, the language: “the news text is the basic communicative unit of journalism and consequently the basic unit of analysis in most research on fake news”. First of all, it is important to define several key concepts in this field of research: fake news, disinformation, misinformation, reliability and veracity.

• Fake news: even if there is no universal definition, it is commonly defined as “any news that is suspected to be inaccurate, biased, misleading, or fabricated” and is understood as a “product of a range of practices that are related to the validity of information being shared by the news media” [4].
• Disinformation vs. Misinformation: [1] states that “the information shared with malicious intent is recognised as disinformation, whereas the same information shared by a poorly informed party is considered as misinformation”. The main difference lies in the intention: disinformation refers to content that is intentionally and deliberately created to deceive, while misinformation is inaccurate information that can result from an honest mistake, negligence or unconscious bias [22]. Briefly, as stated by [23], “disinformation is also wrong information but, unlike misinformation, it is a known falsehood”.
• Reliability vs. Veracity: these concepts are closely related, but from what can be observed in the literature, the term veracity is usually used in tasks in which the information is contrasted and verified [24], while the concept of reliability is most commonly used in methods where the credibility of the source of the news is investigated, as is the case of the proposed source-based method [25].

For this research, we will focus on the disinformation and the reliability concepts. Reliability is an essential metric to be considered when assessing the quality of information. Some of the indicators that influence the reliability of a news item are: the ambiguity of the information, the lack of data and sources [26], the intention of hiding information, the representativeness or opacity of the headline, the external quotes from experts, the quotes from studies and organisations, emotionally charged expressions [27], or stylistic features such as the punctuation, the length, the use of capital letters or informal or swear words [28].

In addition, [29] suggested that there are ten indicators that are most likely to detect the reliability of a message and these appear when the message is: complete, concise, coherent, well presented, objective and representative; when it contains no spin, uses expert sources, is perceived to have an impact and is professional.

As in Section 2.1, NLP plays an essential role in this line of research, as the continuous and easy access to the internet, the large volume of misleading content and its rapid viralisation make it impossible to process and treat data manually in the time required and before it becomes pervasive in society. For that reason, disinformation detection needs to be automated by means of NLP techniques.

Disinformation detection is mainly addressed as a classification task, by training supervised ML models to automatically distinguish between real and fake news [4]. For that purpose, annotated corpora are needed to mark patterns of language that will help in the training and classification processes. Most corpora generated in this domain are composed of news articles classified following several techniques, ranging from stylistic and linguistic annotations to binary classifications depending on fact-checking verification [28, 30, 26]. In addition, several methods are being applied to the disinformation problem based on knowledge, style, propagation, source or even hybrid methods, each of them addressing automatic detection from a different angle [25].

2.3. Violence and Reliability: a new line of research

Polarisation emerges as a key factor linking the phenomena of disinformation (in particular, reliability) and violent online messages that initially appear disparate. Disinformation, by spreading erroneous or misleading information, can intensify polarisation by encouraging the generation of radical opinions and the adoption of hostile language. Polarisation, in turn, acts as a catalyst for the spread of hate speech, creating an environment conducive to distrust towards those with divergent views. This vicious cycle continuously feeds back on itself, with disinformation and hate speech fuelling polarisation, and with polarisation facilitating the spread of more disinformation and hate.

Through a detailed analysis of relevant case studies and research, [31] assesses how this cycle contributes to the escalation of online violence and undermines the foundations of social cohesion and democracy. It also discusses possible strategies to address these problems, highlighting the importance of media education, the promotion of critical thinking and the building of more inclusive and respectful online communities.

Among research linking hate speech and disinformation, the work presented by [12] stands out; it presents a dataset based on user responses to posts from Argentinian digital newspapers on Twitter. Their research is especially focused on the hate speech detection task, but the authors work with replies to digital newspapers’ posts, which is an interesting point of view for our research because we also work with users’ replies, in our case news comments; instead of digital newspapers’ posts, however, we focus on digital news.

These authors also describe in their article a work in progress on hate speech that analyses Spanish tweets linked to newspapers, which is also related to our work, albeit with a different type of document.

[32] aim to analyse the relationship between the consumption of misinformation and online hate and toxic language. To that end, they analyse a corpus of comments on Italian YouTube videos, distinguishing between four categories of hate speech and categorising two types of speech channels: questionable (channels likely to disseminate unverified and false content) and reliable (the remainder of the channels).

[33] analyse a large Twitter dataset annotated with hate speech and counterhate speech. They state that, even if misinformation and hate speech have been tackled in parallel, they are often interwoven because “biased people justify and defend their hate speech using misinformation”. That reason makes this linkage crucial for effective online content moderation.

To the best of the authors’ knowledge, there is limited research that conducts an analysis connecting these two lines of research (disinformation and violent discourse), especially in the context of the Spanish language. Furthermore, our research proposes a comprehensive analysis of both news reliability and violent discourse, using a newly created resource tailored specifically to address these two research areas.

3. Methodology

The main objective of this research is to ascertain the correlation between various attributes of news articles, including topic, source credibility, and content reliability, and the emergence of violence and polarisation within readers’ comments on said articles. To achieve this goal, two principal analyses are conducted: i) an exploratory analysis into the prevalence of violent language within digital media, aiming to assess the degree of violence relative to topic and source; and ii) a study of the association between the reliability of news headlines and the occurrence of violent language within readers’ comments on news articles. The methodology employed to accomplish this objective entails a dual process involving data collection and the analysis of both the violent language within comments and the reliability of news headlines. This is achieved through annotation using specific schemes tailored for each aspect.

3.1. Data collection

News was gathered using a web crawler that extracted data from ten Spanish digital sources. A diverse sample of widely-read national digital newspapers was selected, representing various ideological orientations. This diverse selection enables us to assess the prevalence of violent language across different news sources. To analyse the extent of violence by topic, ten news items from each source were examined, resulting in a total of 100 articles for this initial study. These articles encompass a range of topics, including politics, society, economics, health, and sports.

Given that news articles typically maintain a more neutral tone and considering that many of the chosen sources consist of newspapers authored by professional journalists, the decision was made to evaluate violence within the comments section rather than within the articles themselves. The objective is to gauge the level of violence stemming from readers’ comments, as these comments reflect individual users’ opinions and, for the most part, are not moderated, offering insights into how users express themselves.

Moreover, concerning disinformation, the focus of this study is placed on content reliability, as unreliable content is deemed potential disinformation. In this regard, to investigate whether the reliability of the news items influences the level of violence present, news headlines were used to examine whether there is a relationship between news reliability and violence in comments.

3.2. Violent language analysis

In this initial analysis, the primary objective is to examine the prevalence of violent language within the digital journalism landscape. To achieve this, news articles were collected from various digital newspapers with diverse political affiliations and levels of popularity. The selection criteria included not only the newspapers’ popularity but also their editorial style and user engagement.

3.2.1. Data annotation

The annotation process of the downloaded comments was performed using an automatic violence classifier (anonymous). This classifier was developed by fine-tuning a RoBERTa model in Spanish, using a dataset of tweets annotated with Violent and Non-Violent labels. Given the similarity in the language and length between tweets and news comments, the system was applied to this research. Moreover, it yields significant results in discerning whether a tweet demonstrates violence, achieving an $F1_m$ of 0.854 on a test set.

The chosen system classified a total of 1,757 comments as Violent and 3,940 comments as Non-Violent. Following the application of automatic classification, manual supervision was conducted by an expert in NLP specialised in criminology and violent language.

3.3. Reliability analysis

The objective of this second analysis is to identify a correlation between the usage of violent language in news comments and the reliability of news headlines. As detailed in (anonymous), the reliability of content is assessed by considering the objectivity and accuracy of the language used. Through this analysis, research is conducted to determine whether a relationship exists between disinformation (unreliable news) and the hatred expressed in the comments pertaining to the news item in question.

3.3.1. Data annotation

The annotation task for this second analysis was conducted manually. Out of the 100 headlines, 31 were categorised as Unreliable, while 69 were categorised as Reliable. All annotations were performed by an expert in NLP specialised in linguistics and disinformation detection.

The annotation of headlines considered the following reliability criteria, as outlined in (anonymous):

• Data accuracy: data should avoid vagueness or ambiguity. The employment of evasive or indefinite expressions suggests concealment or an inability to substantiate a fact. Additionally, the absence of evidence, such as scientific studies or verified official data, undermines the reliability of a news item.
• Data objectivity: the information presented should maintain neutrality. Information that sways the reader either positively or negatively, or that reflects the author’s standpoint through personal remarks or experiences, indicates low reliability. Subjective data make the reader more vulnerable to believing unreliable news items.
• Headlines style: headlines should be informative, concise and neutral. Alarmist, subjective, opaque and striking headlines are characteristic of unreliable news items, as well as clickbait headlines, which are those “sensational headlines that often exaggerate facts, usually to entice readers to click on them” [34]. Other features, such as long headlines or the presence of many exclamation marks or words in capitals can also influence the reliability of the content.
Exploratory analysis to Nazi ideology, confirming what is known as Godwin’s law, which states that as an online discussion lengthens, This section outlines three preliminary studies examin- the probability of someone comparing another person or ing the use of violent language within the digital news group of people to Nazis or Adolf Hitler tends to increase, landscape and its correlation with various external fac- regardless of the topic or position being debated [35]. For tors, including topic, source, and the reliability of the example: news content. These studies facilitate a comprehensive understanding of violent discourse behaviour within this • Comment in political news item: socialistas? = context, as well as the impact of news language relia- autoritarios fascionazis!! [socialists? = fascionazi bility on the proliferation of violent language in news authoritarians!!] comments. • Comment in sport news item: Mucho machi- nazi llorando [A lot of machinazi crying] 4.1. Level and type of violence by topic After analysing violent discourse related to the topics of 4.2. Level of violence by source the news items, the results obtained regarding the level One of the factors we aimed to analyse was the percent- of violence across different news topics are presented in age of violence generated by the source, contingent upon Table 1. its credibility. As stated by [25],“one can detect fake news by assess- Table 1 ing the credibility of its source, where credibility is often Violence percentage per news topic. defined in the sense of quality and believability”. This research proposes to relate credibility to the content anal- Topic Violence ysed according to the proposed reliability criteria in Sec- Politics 38.5% tion 3.3.1. 
Since there is currently no standard measure Economics 28.6% of media credibility, but several factors are taken into Society 27.0% account, in this research we will consider as credible Sport 24.6% sources those presenting reliable content, while those Health 23.5% sources whose news items analysed did not meet the reliability criteria of neutrality, objectivity, accuracy and coherence were classified as sources with low credibil- As can be observed, political comments exhibit the ity. To accomplish this, a correlation will be established highest level of violence (38.5%), followed by economics, between the number of headlines annotated as reliable society, sport and health, with the latter having the lowest and unreliable, and the credibility level of the media in percentage of violent language. which these news items are published. It is important to emphasise that while the news items In this regard, three levels of credibility will be delin- align with these topics as categorised by digital media, eated: high credibility, indicated when the percentage of comments that deviate from the associated topic of the reliable headlines exceeds 80%; low credibility, identified news items have been encountered. For instance, eco- when over 80% of the analysed headlines were deemed nomic news may also attract violence directed towards unreliable and failed to meet the established reliability cri- the political sphere due to the close relationship between teria; and medium credibility otherwise, denoting sources these topics. Similarly, in sports news, sexist comments that present both reliable and unreliable headlines in simi- may arise, particularly when addressing women’s foot- lar proportions. In our study, upon analysing all the news ball, as exemplified by: headlines, six sources were classified as highly credible, three were identified as having low credibility, and one • Violent comment: Descanse en paz la z empoder- source was deemed partially credible. 
ada que quería hacer cosas de chicos, como jugar al Results obtained concerning the level of violence ac- fútbol y conducir su propio coche, con el riesgo que cording to the sources credibility are presented in Table eso tiene. A trabajos y actividades de hombres, ries- 2. gos de hombres, eso es todo. [Rest in peace to the From these results it can be concluded that the more empowered z who wanted to do boys’ things, like credibility the source has, the lower the percentage of play football and drive her own car, with all the violence is generated in the comments associated with the risk that entails. Men’s jobs and men’s activities, news items published. Still, since this was a preliminary men’s risks, that’s all.] study to prove if there was a linkage between reliability In addition, we found cases in different topics where and violence, the analysis was carried out in a small the discussion drifts into radicalisation with comparisons sample. As the results show that we can go even deeper Table 2 Violence percentage per source credibility. Sources credibility Violent comments Non-violent comments High 28.42% 71.58% Medium 34.57% 65.45% Low 57.85% 42.15% 70 64.73 60 51.34 48.66 50 40 35.27 30 20 10 0 % Non-Violent % Violent Reliable news Unreliable news Figure 1: Degree of violence depending on the reliability of the headline. to test our hypothesis, this analysis will be carry out in a higher number of violent comments. This implies that larger sample in future work. more than 61.29% of the unreliable headlines elicited more negative responses from users, evidencing a rela- 4.3. Degree of violence depending on the tionship between the unreliability of a news item and the propensity of users to express violence in their com- reliability of the headline ments. 
One of the main purposes of this research is to find if there On the other hand, of the 69 news items that have is a relationship between the reliability concept (mostly been classified as reliable, only 9 of these news items related to disinformation tasks) and the violent discourse. had more violent comments. Therefore, 81.15% of these This work aims to join these two lines of research to delve news items recorded more non-violent comments and into the misleading and malicious digital content and only 5.79% generated an equal amount of violent and propose a preliminary combined solution to automatic non-violent comments. These findings support the relia- detect both disinformation and violent discourse. This bility of the headlines, suggesting that those news items analysis is carried out in the news scenario on the basis considered more reliable tend to generate a lower pro- of the following data: portion of violent comments compared to the unreliable ones. • News: 69 Reliable news (69%) and 31 Unreliable The following examples show how the way a headline news (31%). is presented or written can incite users to generate more • Comments: 3,940 Non-Violent comments violent content in comments: (69.15%) and 1,757 Violent comments (30.85%). • Unreliable headline: Pedro Sánchez en Estras- After conducting the analysis contrasting the violence burgo; sentí vergüenza ajena [Pedro Sánchez in generated in the comments of news articles with a reliable Strasbourg; I felt ashamed] headline versus those with an unreliable headline, as – Violent comment: Usted y todos. Nos dejó depicted in Figure 1, it is evident that reliable headlines a la altura del betún, como si fueramos una generate less violence than unreliable ones. dictadura bolivariana cualquiera. (Que, sin Following our analysis, we observed that of the 31 duda, es lo que el quiere. . . ) [You and every- headlines classified as unreliable, 19 of them attracted a one else. 
He made us feel small, as if we were a Bolivarian dictatorship (which, no doubt, is what he wants...)]

• Unreliable headline: Detengamos a la izquierda ya [Let’s stop the left now]
  – Violent comment: La izquierda no existe es todo extrema derecha los que se dicen de izquierdas en realidad no lo son, es una falsa izquierda que traicionando al pueblo se ha aliado con la banca usurera satanica que es la que hay que detener el verdadero enemigo de la humanidad [The left does not exist it is all extreme right those who say they are leftists actually they are not, it is a false left that betraying the people has allied itself with the satanic usurious banking institution which is the real enemy of humanity that must be stopped]

As can be observed in the previous examples, the language employed in these news headlines is biased, characterised by a subjective style that mirrors the author’s polarised political stance. The manner in which the headline is formulated and presented allows little space for readers to draw their own conclusions, and instead prompts them to generate negative and violent content on the given topic.

5. Discussion

The present research has yielded a series of significant results, which are detailed below.

Topics: our analysis indicates that politics emerges as the subject with the highest incidence of violence in comments across the news topics considered. This finding suggests that the polarising and impassioned nature of politics encourages users to express themselves more freely, potentially leading to more confrontational and hostile interactions. Additionally, it is noteworthy that health garners the lowest frequency of violent comments, possibly due to its relatively lower prominence in our selection of news articles.

We have also found that there is not always a relationship between the topic of the news item and the type of violence that appears in users’ comments. As explained in Section 4.1, we found cases of news items classified as economics or sports containing comments of sexist or political violence.

Credibility: the analysis also uncovers an inverse association between the credibility of media sources and the prevalence of violence in online comments. Users are inclined to post violent comments more frequently on news from outlets that are perceived as less reliable. This observation suggests that the credibility of media sources plays a significant role in shaping the tone and content of user-generated comments. A lack of trust in the source can trigger negative and hostile reactions, underscoring the pivotal role of integrity and reliability in digital journalism.

Reliability: the reliability of the language used in a news item can impact both the quantity and tone of the violent comments it provokes. Trustworthy media outlets typically uphold ethical and professional standards in information presentation, thus lowering the probability of users responding in an aggressive manner. Conversely, media with a lower credibility level may disseminate misleading, biased, or sensationalist content, prompting negative and violent reactions from readers.

Additionally, it is crucial to highlight the significance of social media moderators used by certain media outlets. These moderators play a vital role in curbing the dissemination of violent comments by moderating and censoring those that contravene website usage policies.

6. Conclusions and future work

In this study, we have investigated the correlation between violence and disinformation. By integrating these two research approaches, our aim is to ascertain whether a relationship exists between violent language and the reliability of language in news.

Our findings suggest that addressing both violence and reliability concurrently may facilitate the identification and mitigation of malicious digital content. For instance, the level of violence in comments may correlate with the reliability of news headlines and the credibility of the media outlets.

For future work, we aim to expand the initial resource created to propose a novel corpus annotated with reliability and violent language that facilitates the automatic detection of both tasks. The objective is to ensure a balanced distribution of the news articles across different topics in order to offer a more comprehensive and representative understanding of the diverse types of violence prevalent in the digital sphere.

In addition to expanding and balancing the corpus, a meticulous manual revision of the automatic violence annotation made by the classifier will be carried out, in order to ensure the correct classification of the comments. This revision task, which will be accomplished by two NLP experts in linguistics and violent discourse, will enable the generation of a quality annotated resource for both the violent discourse and disinformation tasks.

Finally, it is proposed to conduct an assessment of a set of comments categorised as Non-Violent to determine whether they present concealed violence, that is, whether the language employed subtly hides violence patterns, such as in the case of irony or sarcasm. Alongside further examination of the extent of violence, a future hypothesis will focus on studying whether reliable news correlates with milder and subtler forms of violence, and whether unreliable news demonstrates more pronounced violence expressed in a more aggressive manner. This step will contribute to refining detection methods and enhancing the accuracy of evaluations.

In summary, this study represents a significant step towards a deeper understanding of the linkage between violence and disinformation in the NLP context, laying the groundwork for future research efforts. This research aims at fostering a safer and healthier digital environment by creating a resource that combines both news reliability and violent language of comments, thus proposing a common strategy to address these two research lines.

Acknowledgments

The research work is part of the R&D&I projects: “CLEAR.TEXT: Enhancing the modernization public sector organizations by deploying Natural Language Processing to make their digital content CLEARER to those with cognitive disabilities” (TED2021-130707B-I00), funded by MCIN/AEI/10.13039/501100011033 and “European Union NextGenerationEU/PRTR”; COOLANG.TRIVIAL: Technological Resources for Intelligent VIral AnaLysis (PID2021-122263OB-C22), funded by MCIN/AEI/10.13039/501100011033/ and by “ERDF A way of making Europe”; SOCIALFAIRNESS.SOCIALTRUST: Assessing trustworthiness in digital media (PDC2022-133146-C22), funded by MCIN/AEI/10.13039/501100011033/ and by the “European Union NextGenerationEU/PRTR”. At regional level, this research has been funded by the project NL4DISMIS: Natural Language Technologies for dealing with dis- and misinformation, with grant reference (CIPROM/2021/21), by the Generalitat Valenciana.
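As a worked check of the arithmetic in Section 4.3, the following minimal Python sketch recomputes the reported shares from the counts stated there. The function and variable names are illustrative assumptions and are not part of the study's tooling. The rounding convention (two decimals, standard rounding) is also an assumption, so the last digit may differ slightly from the published figures.

```python
def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to two decimals."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return round(100 * part / whole, 2)


# Counts stated in Section 4.3 (names are hypothetical).
UNRELIABLE_HEADLINES = 31
UNRELIABLE_MOSTLY_VIOLENT = 19   # unreliable headlines with more violent comments
RELIABLE_HEADLINES = 69
RELIABLE_MOSTLY_VIOLENT = 9      # reliable headlines with more violent comments
NON_VIOLENT_COMMENTS = 3940
VIOLENT_COMMENTS = 1757


def summary() -> dict:
    """Recompute the headline-level and comment-level percentages."""
    total_comments = NON_VIOLENT_COMMENTS + VIOLENT_COMMENTS
    return {
        "unreliable_mostly_violent_pct": share(
            UNRELIABLE_MOSTLY_VIOLENT, UNRELIABLE_HEADLINES
        ),
        "reliable_mostly_violent_pct": share(
            RELIABLE_MOSTLY_VIOLENT, RELIABLE_HEADLINES
        ),
        "non_violent_comment_pct": share(NON_VIOLENT_COMMENTS, total_comments),
        "violent_comment_pct": share(VIOLENT_COMMENTS, total_comments),
    }


if __name__ == "__main__":
    for name, value in summary().items():
        print(f"{name}: {value}%")
```

Under these assumptions, the gap between the two headline-level shares (19 of 31 against 9 of 69) is the quantity the section relies on when it links headline unreliability to violent comments.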