=Paper=
{{Paper
|id=Vol-3340/39
|storemode=property
|title=Data Analytics on Twitter for Evaluating Women Inclusion and Safety in Modern Society
|pdfUrl=https://ceur-ws.org/Vol-3340/paper39.pdf
|volume=Vol-3340
|authors=Loredana Caruccio,Stefano Cirillo,Vincenzo Deufemia,Giuseppe Polese,Roberto Stanzione
|dblpUrl=https://dblp.org/rec/conf/itadata/CaruccioCDPS22
}}
==Data Analytics on Twitter for Evaluating Women Inclusion and Safety in Modern Society==
Loredana Caruccio, Stefano Cirillo, Vincenzo Deufemia, Giuseppe Polese and Roberto Stanzione

University of Salerno, Via Giovanni Paolo II, 132, 84084 Fisciano, Salerno, Italy

ITADATA2022: The 1st Italian Conference on Big Data and Data Science, September 20–21, 2022, Milan, Italy

Email: lcaruccio@unisa.it (L. Caruccio); scirillo@unisa.it (S. Cirillo); deufemia@unisa.it (V. Deufemia); gpolese@unisa.it (G. Polese); r.stanzione9@studenti.unisa.it (R. Stanzione)

ORCID: 0000-0002-2418-1606 (L. Caruccio); 0000-0003-0201-2753 (S. Cirillo); 0000-0002-6711-3590 (V. Deufemia); 0000-0002-8496-2658 (G. Polese)

© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (http://ceur-ws.org), ISSN 1613-0073.

===Abstract===
The inclusion and safety of women in modern society is a problem that has become increasingly important in all countries over the years, leading to a large number of awareness campaigns and social movements. Several research communities have therefore investigated various aspects of this problem, mainly focusing on interpreting people’s behaviors, also in light of their geopolitical situations, in order to prevent and reduce discrimination against women. In this paper, we present a study on the resonance of gender equality issues on the Twitter social network. In particular, we collected 4.4 million tweets shared by over 1 million users and applied data analytics techniques to evaluate statistics, correlations between the most common hashtags, and location-based sentiment analysis.

Keywords: Big Data Analytics, Gender Equality, Woman Inclusion, Twitter, Sentiment Analysis

“I raise up my voice – not so that I can shout, but so that those without a voice can be heard”. Malala Yousafzai

===1. Introduction===
The inclusion of women in society has been one of the main challenges of recent centuries. Although women have always contributed to the cultural, scientific, and economic growth of society, they have had to fight to obtain their rights and to be considered equal to men. The major examples of women’s battles are the suffrage campaigns, which in most cases won women the right to vote only in the last century. Nevertheless, although from a legal point of view all women have the right to vote, in some regions it remains difficult for them to exercise that right because of societal norms, harassment and violence at the polls, or pressure from their husbands (https://worldpopulationreview.com/country-rankings/countries-where-women-cant-vote).

Gender equality is still an important goal for modern society. In fact, it is one of the 17 Sustainable Development Goals defined by the United Nations (UN) in 2015 (https://www.globalgoals.org/goals/5-gender-equality/). Within this goal, women’s empowerment and gender equality are structured as 9 specific worldwide targets to be reached by 2030, which range from ending discrimination and violence against women to including women in political, economic, and decision-making processes (also by promoting this inclusion through technology). Different studies have highlighted that gender equality is not only a human rights issue but also a precondition for sustainable development [1], even though gender inequalities across economic, social, and environmental dimensions remain widespread and persistent, especially in some areas of the world [2].
Consequently, many social movements and organizations have been founded with the mission of promoting gender equality and exposing any discrimination that hinders women’s empowerment in society. They feed the worldwide resonance of the problem, enabling people to be informed, think, and act against it.

In this paper, we present our analysis of the resonance of gender equality issues on the Twitter social network. In particular, we collected 4.4 million tweets shared by over 1 million users across the world in a topic-specific dataset named wishes (Women Inclusion and Safety for Holding Equality in Society). After a hydration process, the dataset was first used for a quantitative analysis highlighting the main statistics of the collected tweets, and then for a sentiment analysis that also considers the geopolitical scenario of different countries.

The remainder of the paper is organized as follows. Section 2 describes relevant works concerning previous analyses of gender equality issues. Section 3 introduces the wishes dataset, highlighting the process used to build it and the main statistics of the collected tweets. Section 4 presents the evaluation of people’s perception, performed through a sentiment analysis. Final considerations and future directions conclude the paper.

===2. Related Work===
Over the past decade, social media platforms have been an important means of communication, and their impact on daily life is increasing every day. Thanks to their accessibility, social networks are a large source of information that can be used to extract insights into collective thinking about specific issues, such as the role of women in modern society.
In this section, we provide an overview of related works dealing with violence against women and the digital gender divide, which are the main topics we considered throughout the proposed analysis.

In [3] the authors analyzed 2.5 million tweets and compared tweets about VAW (Violence Against Women) with tweets on other topics such as politics or sports. In particular, they discovered that the VAW topic has a higher average number of users and tweets per thread, as well as a higher average thread depth. VAW-related tweets have been further analyzed in [4], whose authors focused on tweets in Arabic, performing a sentiment analysis to extract insights into how the issue of VAW is treated; they noticed that this topic is gaining attention in the Arabic world. Similar analyses focused on South African [5] and Indonesian tweets [6].

In other recent studies, authors have extracted and analyzed the most used terms and the most discussed topics related to domestic violence in shared tweets [7] and in news articles [8], the latter focusing mainly on articles about gender-based violence from newspapers in India, Pakistan, and the United Kingdom. In [9], the authors conducted a similar analysis in Australia, considering live chats on Twitter related to a special Q&A episode dedicated to the topic of domestic violence. The authors collected tweets written from the beginning of the episode until the next day, and their results showed strongly conflicting opinions and no agreement on condemning domestic violence.

Other recent articles, such as [10] and [11], investigated the possible correlation between misogynistic tweets and episodes of violence in the United States. In particular, the first study reported a correlation with the number of rapes per capita, while the second highlighted a correlation between misogynistic tweets and episodes of domestic violence.
Another research topic is the digital gender divide (DGD), which indicates imbalances in access to Information and Communication Technologies (ICTs). Several studies focus on developing countries [12, 13, 14], highlighting the obstacles faced by women in the use of technology. These studies mention lack of education, unfavorable employment, and income conditions as the main causes of the DGD. As widely discussed in [15] and [16], the DGD problem does not exist only in developing countries: these studies have highlighted that the disparity is also present in Europe. The factors they mention include stereotypes, prejudice, male hostility, and other women-related issues such as pregnancy and maternity. Quantifying the DGD is not always easy because of the lack of data, especially in low-income countries. To address this problem, the authors of [17] proposed a method to track the digital gender gap using Facebook’s advertisement audience estimates. They showed that the method makes it possible to extend the calculation of gender gap indicators to more countries, especially low- and middle-income ones.

Unlike the papers introduced above, this study introduces a new, up-to-date dataset of tweets related to the role of women in different geopolitical areas, and provides a detailed multilingual sentiment analysis to evaluate the inclusion and safety of women in modern society.

===3. The wishes Dataset===
In this section, we provide an overview of the wishes dataset and analyze the different types of tweets collected from Twitter. Moreover, we discuss the strategy for retrieving the content of tweets in order to extract statistics from them and analyze the correlation between the most frequent hashtags. Finally, we provide a preliminary analysis of people’s collective thoughts by analyzing the content of the tweets. The wishes dataset is publicly accessible on GitHub (https://github.com/Roby46/Wishes_tweets_dataset).

====3.1. Overview of the wishes Dataset====
Creating the dataset required the definition of an extraction module capable of continuously monitoring the tweets shared by users and collecting the most pertinent ones. This module was configured to interact with the Twitter streaming API and read the tweets related to a set of keywords concerning the role of women in society and domestic violence. To define this set of keywords, we started from a small set of popular keywords manually extracted from the most popular hashtags on Twitter, as provided by online services (Tweeplers), and collected tweets for one day in order to extract the keywords that co-occur with the initial ones. We considered a set of over 30 keywords representing the most used ones in tweets from around the world, such as “#abuseisnotlove”, “#youarenotalone”, “#domesticabuse”, or “#domesticviolence”.

Figure 2: Language distribution of tweets in the wishes dataset (number of tweets per language, logarithmic scale).

The collection ran continuously from November 10th, 2021 to February 25th, 2022, yielding about 4.4 million tweets shared by over 1 million users. In accordance with Twitter policies, during monitoring we stored only the IDs of the tweets, which then had to be hydrated before their contents could be analyzed.
In particular, the hydration process required connecting to the Twitter API to access the content of the tweets. Table 1 shows statistics on the tweets before and after the hydration process. As we can see, the number of tweets dropped to 3,760,574, shared by 1,170,868 accounts, of which 17,537 are verified. The reduction could be due to tweets written by users who have since been banned from Twitter, or to tweets that have been removed from the platform.

{| class="wikitable"
|+ Table 1: Details of the wishes dataset.
! !! Before hydrating !! After hydrating
|-
| Number of tweets || 4,489,070 || 3,760,574
|-
| Number of accounts || - || 1,170,868
|-
| Verified accounts || - || 17,537
|-
| Accounts with location || - || 2,507
|-
| Average tweets per account || - || 3.21
|-
| Most recent tweet || colspan="2" | 2022-02-25
|}

Figure 1: Types of tweet in the wishes dataset (number of tweets for all tweets, retweets, tweets with mentions, and plain text tweets).

After hydrating the content of the tweets, we performed a cleaning process that removed all special characters and emojis, standardizing the tweet syntax. This operation allowed us to identify the most common hashtags used in tweets and analyze their frequency in the dataset. Figure 1 shows the different types of tweets shared by both verified and non-verified accounts [18]. As we can see, there are about 2.5 million retweets (66% of all tweets), about 500,000 tweets containing at least one mention of another user (13% of all tweets), and about 1 million plain text tweets (26% of all tweets), i.e., tweets written in a standard text-based format that are not retweets and do not contain mentions. Figure 2 shows the number of tweets for each language collected in wishes. As we can see, wishes contains tweets written in 64 different languages, most of which are written in English, Spanish, and Turkish.
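The cleaning and tweet-type steps described above can be sketched as follows. This is a minimal illustration and not the authors’ pipeline: the function names are hypothetical, the `RT @` prefix is assumed as the retweet marker, and the character filter (keep letters, combining marks, digits, `#`, `@`, `_`) is one possible reading of “removing all special characters and emojis”.

```python
import unicodedata


def clean_text(text):
    """Replace emojis and special characters with spaces, then collapse whitespace."""
    kept = []
    for ch in text:
        category = unicodedata.category(ch)
        is_variation_selector = "\ufe00" <= ch <= "\ufe0f"
        # Keep letters (L*), combining marks (M*) and digits (N*) so that
        # non-Latin scripts survive, plus the '#', '@' and '_' used in tags.
        if (category[0] in "LMN" or ch in "#@_") and not is_variation_selector:
            kept.append(ch)
        else:
            kept.append(" ")
    return " ".join("".join(kept).split())


def extract_hashtags(text):
    """Return the lower-cased hashtags of a tweet, in order of appearance."""
    return [tok.lower() for tok in clean_text(text).split()
            if tok.startswith("#") and len(tok) > 1]


def tweet_type(text):
    """Classify a tweet as 'retweet', 'mention' or 'plain' (assumed rules)."""
    if text.startswith("RT @"):  # assumption: classic retweet prefix
        return "retweet"
    if any(tok.startswith("@") and len(tok) > 1 for tok in text.split()):
        return "mention"
    return "plain"
```

Counting the output of `extract_hashtags` over all hydrated tweets (for example with `collections.Counter`) would then give per-hashtag occurrence counts of the kind analyzed in the rest of this section.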
To analyze collective thinking on the role of women in society and on the problem of domestic violence in more detail, we provide a preliminary analysis of the most frequent hashtags used in the tweets, as shown in Figure 3. As we can see, the most frequent hashtags (i.e., those with the largest font sizes) include some that seem unrelated to the purpose of this study, such as “#quoteoftheday” and “#endsars”. The former is a generic hashtag used by Twitter users when writing a sentence for the day, while the latter refers to a decentralized social movement and series of mass protests against police brutality in Nigeria. Nevertheless, in both cases, the tweets including these hashtags also contain references to women and their roles in society. Furthermore, most hashtags have targeted references closer to the scope of wishes, such as “#metoo”, “#domesticviolence”, “#selflove”, “#heforshe”, and “#neverforget”.

Table 2 shows the occurrences and the frequencies of the top 30 hashtags used in the collection of tweets. The frequency relates the number of occurrences of each hashtag to the tweets shared by both verified and non-verified accounts. It is important to notice that only 2,741,802 of the 3,760,574 tweets contain at least one hashtag, which corresponds to 73% of all the tweets. In Table 2, we provide three different values representing the frequency of each hashtag with respect to i) the number of all the hashtags used in the tweets (i.e., <math>N_X</math>); ii) the number of tweets that contain at least one hashtag (i.e., <math>N_Y</math>); and iii) the number of all tweets in the data collection (i.e., <math>N_Z</math>).
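Each of the three frequencies expresses a hashtag’s occurrence count as a percentage of one of these totals. As a worked check, the sketch below recomputes the second and third frequencies for “#endsars” from figures stated in the paper (375,592 occurrences in Table 2, 2,741,802 tweets with at least one hashtag, 3,760,574 tweets overall). The function name is hypothetical, and the first frequency is omitted because the total number of hashtag occurrences is not reported in the text.

```python
def frequency(occurrences, total):
    """Occurrences of a hashtag as a percentage of a reference total."""
    return occurrences * 100 / total


N_ENDSARS = 375_592   # occurrences of #endsars (Table 2)
N_Y = 2_741_802       # tweets containing at least one hashtag
N_Z = 3_760_574       # all tweets after hydration

f2 = frequency(N_ENDSARS, N_Y)
f3 = frequency(N_ENDSARS, N_Z)
print(f"F2 = {f2:.2f}%, F3 = {f3:.2f}%")
```

The two values agree with the 13.70% and 9.99% listed for “#endsars” in Table 2.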
In particular, let <math>N</math> be the number of occurrences of each hashtag in the data collection; the frequencies are defined as follows:

<math>F_1 = \frac{N \cdot 100}{N_X}, \qquad F_2 = \frac{N \cdot 100}{N_Y}, \qquad F_3 = \frac{N \cdot 100}{N_Z} \qquad (1)</math>

From Table 2 we can notice that a large number of tweets contain hashtags that refer to social movements against abuse and to solidarity movements for the advancement of gender equality, such as “#metoo”, “#niunamenos”, “#orangetheworld”, and “#heforshe”. Among them, one of the most frequent hashtags is “#endsars”, which, as said above, refers to a series of demonstrations against the police, accused of torture and other crimes in Nigeria. However, the use of these hashtags in most tweets shows that these tortures and crimes also affect Nigerian women. Moreover, we can notice that these hashtags are contained in approximately 30.82% of the tweets of users who have used at least one hashtag in their posts, and in 22.47% of the entire collection of tweets.

Many of the hashtags refer to domestic life scenarios, which still today represent an important social scourge, such as “#relationships”, “#divorce”, “#domesticviolence”, and “#neverforget”. These hashtags are contained in approximately 4.42% of the tweets of users who have used at least one hashtag in their posts, and in 3.22% of the entire collection of tweets. These preliminary statistics allow us to estimate how much domestic violence and gender inequality have sparked strong thoughts in people around the world.

Figure 3: Hashtags used in the wishes dataset.

{| class="wikitable"
|+ Table 2: Top 30 hashtags in the wishes dataset.
! Hashtag !! Occurrences !! F1 (%) !! F2 (%) !! F3 (%)
|-
| #endsars || 375,592 || 3.56 || 13.70 || 9.99
|-
| #metoo || 216,588 || 2.05 || 7.90 || 5.76
|-
| #survivor || 194,176 || 1.84 || 7.08 || 5.16
|-
| #neverforget || 186,327 || 1.77 || 6.80 || 4.95
|-
| #quoteoftheday || 186,220 || 1.77 || 6.79 || 4.95
|-
| #children || 152,653 || 1.45 || 5.57 || 4.06
|-
| #selflove || 103,229 || 0.98 || 3.77 || 2.75
|-
| #positivevibes || 99,874 || 0.95 || 3.64 || 2.66
|-
| #healing || 92,818 || 0.88 || 3.39 || 2.47
|-
| #love || 78,735 || 0.75 || 2.87 || 2.09
|-
| #orangetheworld || 75,034 || 0.71 || 2.74 || 2.00
|-
| #quote || 71,699 || 0.68 || 2.62 || 1.91
|-
| #motivation || 65,138 || 0.62 || 2.38 || 1.73
|-
| #relationships || 64,552 || 0.61 || 2.35 || 1.72
|-
| #quotes || 64,461 || 0.61 || 2.35 || 1.71
|-
| #women || 63,063 || 0.60 || 2.30 || 1.68
|-
| #niunamenos || 59,542 || 0.56 || 2.17 || 1.58
|-
| #heforshe || 57,912 || 0.55 || 2.11 || 1.54
|-
| #inspiration || 51,268 || 0.49 || 1.87 || 1.36
|-
| #mentalhealth || 44,849 || 0.43 || 1.64 || 1.19
|-
| #divorce || 40,350 || 0.38 || 1.47 || 1.07
|-
| #education || 37,427 || 0.35 || 1.37 || 1.00
|-
| #therapy || 37,048 || 0.35 || 1.35 || 0.99
|-
| #domesticviolence || 36,875 || 0.35 || 1.34 || 0.98
|-
| #selfcare || 36,174 || 0.34 || 1.32 || 0.96
|-
| #life || 35,900 || 0.34 || 1.31 || 0.95
|-
| #survivor2022allstar || 35,773 || 0.34 || 1.30 || 0.95
|-
| #business || 35,238 || 0.33 || 1.29 || 0.94
|-
| #survivorallstar2022 || 34,151 || 0.32 || 1.25 || 0.91
|-
| #video || 34,050 || 0.32 || 1.24 || 0.91
|}

====3.2. Correlation Analysis====
Among the different kinds of correlation that can be analyzed [19], we used Spearman’s Rank-Order Correlation [20] to evaluate the correlation between hashtags, applying it to the top-50 hashtags appearing in at least 30,000 different tweets. The values obtained from this analysis lie in the range [−1, 1].
In particular, given two elements x and y, a zero correlation value implies no relationship between them, while the values 1 and −1 indicate perfectly monotonic positive and negative relationships, respectively. More specifically, a positive correlation means that an increase of x comes with an increase of y, whereas a negative correlation means that when x increases, y decreases. Two elements have a high correlation value if they have a similar rank, whereas if their ranks are significantly different they have a low correlation value.

Figure 4 shows the correlation matrix of the most frequent hashtags used in the tweets of wishes. As we can see, there is no evidence of strong negative correlations between hashtags. However, there are some weak negative correlations between “#children” and the hashtags “#quoteoftheday” and “#metoo”. In the first case, the negative correlation is probably due to the fact that tweets containing quotes of the day often do not contain any reference to children, but are reflective aphorisms about women and their role in society. In the second case, the negative correlation between “#children” and “#metoo” is due to the fact that the MeToo movement is mainly about harassment in the workplace, so it is not significantly related to children. Other significant correlations can be found between “#motivation” and “#mindset” with “#motivationalquotes”, and between “#inspiration” and “#inspirationalquotes”. These positive correlations indicate that many tweets and users provide motivational messages for women living in distressed family and work situations.
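To make the correlation measure concrete, here is a minimal pure-Python sketch of Spearman’s rank-order correlation, computed as the Pearson correlation of average ranks. The paper does not state how hashtags were encoded, so the per-tweet 0/1 presence vectors used in the example are an assumption, and the function names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; NaN when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Assumed encoding: one entry per tweet, 1 if the hashtag is present.
children = [1, 1, 0, 0]
kids = [1, 1, 0, 0]
metoo = [0, 0, 1, 1]
rho_same = spearman(children, kids)       # hashtags always used together
rho_opposite = spearman(children, metoo)  # hashtags never used together
```

In this toy example the first pair is perfectly positively correlated and the second perfectly negatively correlated; real hashtag pairs fall between these extremes.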
Another topic covered by tweets in the wishes dataset regards the relationship between women and their children. In particular, we can notice moderate correlations between the hashtag “#children” and the hashtags “#kids” and “#education”, meaning that when tweets discuss children, it is often women who express thoughts about the children’s relationship with their parents and with school.

Figure 4: Correlation matrix of the top-50 hashtags that appear in at least 30,000 different tweets (color scale from −1.00 to 1.00).

Finally, other interesting correlations can be found between the hashtags “#healing” and “#mentalhealth”, and between “#selfcare” and “#selflove”.
These hashtags are contained in tweets in which many people ask for support and positive opinions to help them heal from situations of violence that have indelibly marked their lives. Moreover, they encourage self-love and free love without borders because, in modern society, situations of discrimination and violence that limit people’s freedom are no longer acceptable.

===4. Sentiment Analysis===
Sentiment analysis is a field of natural language processing (NLP) that automatically evaluates user comments to determine whether a text expresses a positive or negative opinion. It is based on the main methods of computational linguistics and textual analysis, and it is adopted in several sectors, such as politics [21], stock markets [22], marketing [23], and sports [24].

Figure 5: Sentiments detected for the analyzed languages (percentage of positive, negative, and neutral tweets per language).

In our study, the sentiment analysis aims at evaluating the feelings expressed by users about gender inclusion and the role of women in different countries. In particular, we performed a multi-language sentiment analysis on 3,275,529 tweets written in Arabic, Dutch, English, French, German, Greek, Hindi, Indonesian, Italian, Japanese, Portuguese, Russian, Spanish, and Turkish. The sentiment analysis on all these languages was performed using XLM-RoBERTa [25]. This model builds on Google BERT (Bidirectional Encoder Representations from Transformers) [26], a bidirectional model that identifies the meaning of a word from the other words in the sentence, on both its left and its right. BERT was designed to be easily fine-tuned for multiple tasks, such as language inference, and XLM-RoBERTa has already been pre-trained on a huge amount of text in different languages.
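Once every tweet carries a language tag and a predicted label, per-language sentiment shares can be obtained with a simple aggregation. The sketch below is only illustrative: it assumes the classifier output is available as `(language, label)` pairs with the labels `positive`, `neutral`, and `negative`, and it does not call the model itself; the function name and the sample data are hypothetical.

```python
from collections import Counter, defaultdict

LABELS = ("positive", "neutral", "negative")


def sentiment_shares(predictions):
    """Map each language to the percentage of tweets per sentiment label."""
    counts = defaultdict(Counter)
    for language, label in predictions:
        counts[language][label] += 1
    shares = {}
    for language, label_counts in counts.items():
        total = sum(label_counts.values())
        shares[language] = {
            label: 100 * label_counts[label] / total for label in LABELS
        }
    return shares


# Made-up predictions, for illustration only.
sample = [
    ("en", "positive"), ("en", "positive"), ("en", "neutral"), ("en", "negative"),
    ("fr", "negative"), ("fr", "neutral"),
]
shares = sentiment_shares(sample)
```

Keeping the three labels fixed in `LABELS` ensures that a language with no tweets for a given sentiment still reports an explicit 0% rather than a missing key.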
Figure 5 shows the sentiment percentages detected for each language involved in our evaluation. As we can see, tweets written in Hindi and English are the only ones where the percentage of positive sentiment exceeds both the neutral and the negative ones, i.e., 39.32% and 53.42%, respectively. Moreover, for Russian, German, Dutch, and Greek there is a predominance of neutral sentiment and a low percentage of positive tweets, i.e., 9.9%, 18%, 19%, and 12.9%, respectively. Italian, Portuguese, Indonesian, and Turkish tweets present a predominantly neutral sentiment, but the percentage of positive tweets exceeds that of negative ones. Furthermore, for Spanish, Japanese, French, and Arabic tweets the main sentiment is negative. In fact, French tweets have the highest percentage of negative sentiment, i.e., 51.62%, whereas for Arabic tweets the percentages are similar across the three sentiments.

Although the previous analysis provides an overview of the sentiments of the tweets for each language, to analyze the impact of the sentiments in relation to the geopolitical situation of the various countries, we performed a more detailed analysis that takes into account the geographical origin of the tweets (Figure 6). Positive tweets are shown in green, negative tweets in red, and neutral ones in yellow. As we can see, there is a high concentration of tweets from the American continent, Indonesia, Central Europe, and India. There is also a relevant number of tweets from China, Japan, and some African countries, such as Nigeria, South Africa, Kenya, and Somalia.

Figure 6: Analysis of the geographical origin of tweets in the wishes dataset.

In particular, the highest concentration of negative tweets is from India, Africa, and some areas of South America. A possible explanation is that in these countries an important part of the population lives in difficult conditions and there is still a strong disparity between women and men, compared to more developed countries. In fact, Jewkes highlighted how poverty and the associated stress are key factors that contribute to violence and that affect women’s empowerment [27]. Considering the countries with many negative tweets, we can indeed notice that living conditions are often difficult. The economic and political situation of African countries has led this continent to be ranked last in terms of nominal gross domestic product per capita, according to the International Monetary Fund (https://statisticstimes.com/economy/continents-by-gdp-per-capita.php). Similarly, South America also suffers from poverty and is the second to last continent in this ranking. Concerning India, overpopulation can be another negative factor, because it leads to high levels of unemployment and unfair income distribution. This generates stress and discontent, which can lead to violent behavior. In fact, as reported by the National Family Health Survey-5 (http://rchiips.org/nfhs/NFHS-5_FCTS/India.pdf), nearly 30% of women in India have experienced sexual or physical violence.

Moreover, the Women Peace & Security Index (https://giwps.georgetown.edu/wp-content/uploads/2021/11/WPS-Index-2021.pdf) has ranked 170 countries based on women’s inclusion and safety, taking into account several factors, such as social norms, education levels, and economic situations. Among all these countries, India, Nigeria, Somalia, and Congo rank low, i.e., 148th, 130th, 159th, and 155th, respectively. Furthermore, it is important to notice that the presence of the EndSARS movement in Nigeria could contribute to more negative tweets, since they highlight police violence against all Nigerian people.
Concerning Central Europe, we can notice a greater balance between positive and negative tweets. The latter are probably due to episodes of violence that triggered negative discussions on Twitter. Instead, the high number of positive tweets is probably due to the fact that awareness campaigns and movements, such as MeToo and HeForShe, are positively impacting the European population. Similarly, for the population of Latin America, the spread of the awareness campaign of the Niunamenos movement has probably led to a greater number of positive tweets.

The analysis discussed in this section revealed interesting differences in the sentiments expressed in tweets with respect to their language and geographic origin. Unfortunately, there are still many situations and countries in the world that require more intensive awareness campaigns on violence against women and in favor of gender equality, in order to sensitize different cultures to the principles of non-violence and respect for other people.

===5. Conclusion and Future Directions===
Aiming to analyze the resonance of topics related to the inclusion and safety of women in modern society, in this study we collected a large quantity of tweets concerning gender equality and domestic violence in the wishes dataset (also shared in a public repository). Then, we analyzed the collected tweets by highlighting the most used hashtags and the correlations between them. Finally, we performed a sentiment analysis to bring out the positive or negative inflection of tweets with respect to their language and the geopolitical areas from which they originated. The analysis allowed us to notice that many of the collected tweets are related to non-profit movements and awareness campaigns, which promote women’s empowerment and promptly highlight violent episodes as events occur throughout the world.
An interesting finding of the proposed analysis concerns the concentration of tweets (also according to the sentiments they express), which revealed different perceptions depending on the social and political scenario of each country.

Starting from this preliminary analysis, many deeper analyses could be performed. For instance, it is possible to analyze the contents and sentiments of tweets according to features more closely related to user profiles and their influence over social networks. These analyses could also exploit topic modeling techniques to identify the content of tweets more accurately and obtain a more detailed analysis of them. Moreover, studies could be conducted by specifically considering events and awareness campaigns. Finally, a comparative analysis could also be carried out by collecting and considering posts from other social networks, blogs, and websites [28].

===References===
[1] M. L. Alvarez, From unheard screams to powerful voices: A case study of women’s political empowerment in the philippines, in: 12th National Convention on Statistics, 2013, pp. 1–73.

[2] E. Bayeh, The role of empowering women and achieving gender equality to the sustainable development of ethiopia, Pacific Science Review B: Humanities and Social Sciences 2 (2016) 37–42.

[3] J. Xue, K. Macropol, Y. Jia, T. Zhu, R. J. Gelles, Harnessing big data for social justice: An exploration of violence against women-related conversations on twitter, Human Behavior and Emerging Technologies 1 (2019) 269–279.

[4] M. Alzyout, E. A. Bashabsheh, H. Najadat, A. Alaiad, Sentiment analysis of arabic tweets about violence against women using machine learning, in: 2021 12th International Conference on Information and Communication Systems (ICICS), IEEE, 2021, pp. 171–176.

[5] J. Oyasor, M. Raborife, P. Ranchod, Sentiment analysis as an indicator to evaluate gender disparity on sexual violence tweets in south africa, in: 2020 International SAUPEC/RobMech/PRASA Conference, IEEE, 2020, pp. 1–6.

[6] K. Budiman, N. Zaatsiyah, U. Niswah, F. M. N. Faizi, Analysis of sexual harassment tweet sentiment on twitter in indonesia using naïve bayes method through national institute of standard and technology digital forensic acquisition approach, Journal of Advances in Information Systems and Technology 2 (2020) 21–30.

[7] J. Xue, J. Chen, R. Gelles, Using data mining techniques to examine domestic violence topics on twitter, Violence and gender 6 (2019) 105–114.

[8] M. A. Manzoor, S.-U. Hassan, A. Muazzam, S. Tuarob, R. Nawaz, Social mining for sustainable cities: thematic study of gender-based violence coverage in news articles and domestic violence in relation to covid-19, Journal of ambient intelligence and humanized computing (2022) 1–12.

[9] M. Dragiewicz, J. Burgess, Domestic violence on #qanda: The “man” question in live twitter discussion on the australian broadcasting corporation’s q&a, Canadian journal of women and the law 28 (2016) 211–229.

[10] R. Fulper, G. L. Ciampaglia, E. Ferrara, Y. Ahn, A. Flammini, F. Menczer, B. Lewis, K. Rowe, Misogynistic language on twitter and sexual violence, in: Proceedings of the ACM Web Science Workshop on Computational Approaches to Social Modeling, 2014, pp. 57–64.

[11] K. R. Blake, S. M. O’Dean, J. Lian, T. F. Denson, Misogynistic tweets correlate with violence against women, Psychological science 32 (2021) 315–325.

[12] M. Hilbert, Digital gender divide or technologically empowered women in developing countries? a typical case of lies, damned lies, and statistics, Women’s Studies International Forum 34 (2011) 479–489.

[13] N. O. Alozie, P. Akpan-Obong, The digital gender divide: Confronting obstacles to women’s development in africa, Development Policy Review 35 (2017) 137–160.

[14] A. Antonio, D. Tuffley, The gender digital divide in developing countries, Future Internet 6 (2014) 673–687.

[15] M. Perifanou, A. Economides, Gender digital divide in europe, International Journal of Business, Humanities and Technology 10 (2020) 7–14.

[16] M. López-Martínez, O. García-Luque, M. Rodríguez-Pasquín, Digital gender divide and convergence in the european union countries, Economics 15 (2021) 115–128.

[17] M. Fatehkia, R. Kashyap, I. Weber, Using facebook ad data to track the global digital gender gap, World Development 107 (2018) 189–209.

[18] L. Caruccio, D. Desiato, G. Polese, Fake account identification in social networks, in: 2018 IEEE international conference on big data (big data), IEEE, 2018, pp. 5078–5085.

[19] L. Caruccio, V. Deufemia, F. Naumann, G. Polese, Discovering relaxed functional dependencies based on multi-attribute dominance, IEEE Transactions on Knowledge and Data Engineering 33 (2020) 3212–3228.

[20] C. Spearman, The proof and measurement of association between two things, The American Journal of Psychology 15 (1904) 72–101.

[21] J. Ramteke, S. Shah, D. Godhia, A. Shaikh, Election result prediction using twitter sentiment analysis, in: 2016 international conference on inventive computation technologies (ICICT), volume 1, IEEE, 2016, pp. 1–5.

[22] V. S. Pagolu, K. N. Reddy, G. Panda, B. Majhi, Sentiment analysis of twitter data for predicting stock market movements, in: 2016 international conference on signal processing, communication, power and embedded system (SCOPES), IEEE, 2016, pp. 1345–1350.

[23] E. Kauffmann, J. Peral, D. Gil, A. Ferrández, R. Sellers, H. Mora, Managing marketing decision-making with sentiment analysis: An evaluation of the main product features using text data mining, Sustainability 11 (2019) 4235.

[24] S. Aloufi, A. El Saddik, Sentiment identification in football-specific tweets, IEEE Access 6 (2018) 78609–78621.

[25] F. Barbieri, L. E. Anke, J. Camacho-Collados, Xlm-t: A multilingual language model toolkit for twitter, arXiv preprint arXiv:2104.12250 (2021).

[26] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, Bert: Pre-training of deep bidirectional transformers for language understanding, arXiv preprint arXiv:1810.04805 (2018).

[27] R. Jewkes, Intimate partner violence: causes and prevention, The lancet 359 (2002) 1423–1429.

[28] F. Cerruto, S. Cirillo, D. Desiato, S. M. Gambardella, G. Polese, Social network data analysis to highlight privacy threats in sharing data, Journal of Big Data 9 (2022) 1–26.