=Paper=
{{Paper
|id=Vol-2699/paper39
|storemode=property
|title=Investigating Online Toxicity in Users Interactions with the Mainstream Media Channels on YouTube
|pdfUrl=https://ceur-ws.org/Vol-2699/paper39.pdf
|volume=Vol-2699
|authors=Sultan Alshamrani,Mohammed Abuhamad,Ahmed Abusnaina,David Mohaisen
|dblpUrl=https://dblp.org/rec/conf/cikm/AlshamraniAAM20
}}
==Investigating Online Toxicity in Users Interactions with the Mainstream Media Channels on YouTube==
Sultan Alshamrani (University of Central Florida and Saudi Electronic University), Mohammed Abuhamad (Loyola University Chicago), Ahmed Abusnaina (University of Central Florida), and David Mohaisen (University of Central Florida)

===Abstract===

Social media has become an essential platform and source for most mainstream news channels, and many works have been dedicated to analyzing and understanding user experience and engagement with online news on social media in general, and on YouTube in particular. In this study, we investigate the correlation of different toxic behaviors, such as identity hate and obscenity, with different news topics. To do so, we collected a large-scale dataset of approximately 7.3 million comments and more than 10,000 news video captions, utilized deep learning-based techniques to construct an ensemble of classifiers tested on a manually-labeled dataset for label prediction, achieved high accuracy, and uncovered a large number of toxic comments on news videos across 15 topics obtained using Latent Dirichlet Allocation (LDA) over the captions of the news videos. Our analysis shows that religion- and crime-related news have the highest rate of toxic comments, while economy-related news has the lowest rate. We highlight the necessity of effective tools to address topic-driven toxicity impacting interactions and public discourse on the platform.

''Copyright © by the paper's authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). Proceedings of the CIKM 2020 Workshops, October 19-20, Galway, Ireland. Editors: Stefan Conrad, Ilaria Tiddi.''

===1 Introduction===

People around the globe adopt social media as an essential part of their daily routine, not only for socializing with each other, but also as a major source of news. Among the different social media platforms, the video-sharing platform "YouTube" has witnessed massive growth in content, measured by the number of published videos as well as their popularity, with a viewership of more than 2 billion monthly users [21]. This massive growth has attracted publishers to deliver their content through video-sharing platforms for fast delivery of content to viewers, and to enable social interaction with their viewers through the comment section of videos.

A major feature of video-sharing platforms such as YouTube used for delivering news stories is the interactive experience of the audience. However, users may misuse such a feature by posting toxic comments or spreading hate and racism. To improve the user experience and facilitate positive interactions, numerous efforts have been made to detect inappropriate comments [5]. Despite these efforts, the associations between various types of toxicity and the topics covered in news videos from mainstream media remain an unexplored challenge. This work provides an in-depth analysis of the relationship between such toxic comments and the topics presented in the news. Discovering topics in news videos requires accessing, processing, and modeling the script (i.e., caption) at a fine granularity, to allow the detection of all news topics. Relying on the YouTube categorization feature does not accurately capture the topics of a video; for instance, YouTube has categorized 87.3% of the collected videos as News & Politics. To this end, we explored and established topics using the Latent Dirichlet Allocation (LDA) topic-modeling approach, which allowed assigning videos to specific topics.

Our analysis shows that religion- and violence/crime-related news draw the highest rate of toxic comments, constituting 24.8% and 25.9% of the total comments posted on videos covering these topics, while economy-related news shows the lowest rate of toxic comments with 17.4% of the total comments.

'''Contribution.''' This work investigates the online toxicity observed in the comments posted on mainstream media channels and videos. We summarize our contributions as follows.

* ''Data Collection and Ground Truth Annotation:'' We collected a large-scale dataset of ≈7.3 million comments posted on more than 14,000 news videos. We manually annotated approximately six thousand comments into three types of toxicity.
* ''Ensemble-based Toxicity Detection:'' We designed and evaluated an ensemble-based approach that utilizes state-of-the-art techniques for the different stages of our approach, incorporating data representation and classification, for detecting various inappropriate comments.
* ''LDA-based News Topic Modeling:'' Using LDA-based topic modeling, we discovered and defined topics of news videos based on their captions.
* ''Topic/Toxicity Association:'' Using the discovered topics, we assigned videos to specific topics and explored the topic/toxicity associations for different toxic behaviors. Further, we provide an in-depth analysis of the toxic comments, including their popularity and users' interactions.

===2 Related Works===

With the growing popularity of online platforms in delivering news [8, 6], the comment section of these platforms has become an important feature where users interact with the contents, content providers, and each other to express their opinions on the published contents. The convenience of expressing opinions through the non-restrictive medium of online social platforms may result in misuse of such a medium by posting toxic comments [11]. This has led many researchers to investigate different inappropriate behaviors in the comment sections of different websites. The majority of the prior research, however, has focused on designing classification or detection mechanisms for inappropriate comments, while a few works have focused on user experience and engagement, as outlined below.

'''Toxic Comment Classification.''' Despite various efforts on analyzing toxic contents, identifying distinct behaviors and patterns in this space is a challenge, especially when (1) providing directions for prevention and detection methods, and (2) establishing an association with the comment/content topics. However, there are numerous studies that explored several aspects of toxicity, hate speech, and bias in online social interactions [18, 16, 4, 1].

'''User Engagement and Interactivity.''' Another major area in studying users' behavior is using the comments to identify users' engagement with online news and comments [17, 9, 19]. Diakopoulos et al. [3] investigated the relationship between the quality of the comments and both the consumption and production of news on SacBee.com, including users' motivation for both reading and writing news comments. Ksiazek et al. [7] proposed a framework to distinguish between users commenting on contents and those replying to other users to better understand engagement. In this work, and in the same space, we study the correlation between the topic of the news and the type of inappropriate comments, e.g., obscenity and identity hate. Other noteworthy works on behavioral modeling of YouTube content include [13, 10, 12], although they do not particularly address fine-grained toxicity analysis of mainstream news.

===3 Methodology===

This section describes the methods used for data collection and representation, toxicity detection, and topic modeling.

====3.1 Data Collection and Measurements====

The data used in this study consists of comments posted on news videos from YouTube, as well as the captions of these videos. We collected more than 7.3 million comments posted on roughly 14,500 news videos from 30 popular news channels. The collected comments span early 2007 until October 2019. We were able to extract video captions from only 10,883 videos, as the remaining videos do not include captions. Moreover, we extended our data collection with the annotated ground truth dataset from the Conversation AI team [2] for the comment toxicity analysis task.

'''YouTube News Channels.''' We collected comments on YouTube videos published by the most viewed mainstream media based on Ranker [14]. We extended our list of mainstream media channels using a Wikipedia list of the most viewed news channels [20]. The final list includes 30 English-speaking news channels from 16 countries.

'''Data Statistics and Measurements.''' We collected a total of 7.3 million comments posted by 2,992,273 unique users over the past 13 years (2007 to 2019), where most of the videos were published in 2019, as the trend shows an increase in news video popularity in recent years. The popularity of the channels used in our study can be seen in the average number of views shown in Figure 1 for the top-15 most-viewed channels. For instance, videos collected from channels such as ABC, CNN, and RT have a considerably high number of views (i.e., an average exceeding one million views per video). Intuitively, as the number of views increases, the number of comments is more likely to increase. The average number of comments posted on videos from the most popular mainstream media channels on YouTube is very high, as shown in Figure 2. Here, the videos published by CNN, ABC, and Fox News have the highest average number of comments per video, with 6,622, 4,243, and 3,851 respectively. Generally, most of the top-15 channels maintain an average of more than 500 comments per video.

Figure 1: The average number of views per news video for the top-15 mainstream media channels.

Figure 2: The average number of comments per news video for the top-15 mainstream media channels.

'''Toxicity-related Annotated Datasets.''' To study users' behavior in the comment section, we utilized two ground truth datasets to train a machine learning-based ensemble classifier for toxic comment detection and classification: (i) Wikipedia comments created by the Conversation AI team [2] and (ii) our own manually-annotated YouTube comments.

* ''Wikipedia Ground Truth:'' 160,000 comments from Wikipedia Talk pages, manually annotated by the Conversation AI team, with 143,000 comments labeled as safe, 15,294 toxic, 8,449 obscene, and 1,405 identity hate. The labels may overlap, allowing the assignment of more than one label to a toxic comment.
* ''YouTube Ground Truth:'' An in-house dataset that we created by manually annotating 5,958 random YouTube comments, first into either toxic or safe. The toxic (general class) comments are then mapped to one or more of toxic, obscene, or identity hate. The final dataset had 1,832 safe, 4,126 toxic, 2,367 obscene, and 788 identity hate comments.

====3.2 Data Preprocessing====

For proper data analysis, we initially removed all non-English contents across all datasets and eliminated irrelevant characters, tokens, and stop-words. We also removed frequent words appearing in more than 50% of the captions.
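The paper does not name the tooling behind this preprocessing step. As a minimal sketch, assuming langdetect for the language filter and NLTK's English stop-word list, the step could look like:

<syntaxhighlight lang="python">
# Minimal preprocessing sketch (assumed tooling: langdetect + NLTK stop-words).
import re
from langdetect import detect
from nltk.corpus import stopwords  # requires: nltk.download('stopwords')

STOP_WORDS = set(stopwords.words("english"))

def is_english(text):
    try:
        return detect(text) == "en"
    except Exception:  # langdetect raises on empty/undetectable input
        return False

def clean_tokens(text):
    """Lowercase, keep alphabetic tokens only, drop stop-words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]

def drop_frequent_caption_words(tokenized_captions, max_doc_ratio=0.5):
    """Remove words appearing in more than 50% of the captions."""
    n = len(tokenized_captions)
    doc_freq = {}
    for tokens in tokenized_captions:
        for w in set(tokens):
            doc_freq[w] = doc_freq.get(w, 0) + 1
    return [[w for w in tokens if doc_freq[w] / n <= max_doc_ratio]
            for tokens in tokenized_captions]
</syntaxhighlight>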
====3.3 Data Representation====

'''Comments Data Representation.''' We utilized the pre-trained Word2Vec model from Gensim [15]. Word2Vec maps words to numerical vectors such that words occurring in similar contexts are mapped to similar vectors. Capturing such relationships is possible when acquiring enough data, enabling the Word2Vec model to accurately predict a word's meaning based on past appearances in the provided context. Each comment is then represented as word vectors of size n × 300, where n is the number of words in the comment, with an upper limit of 50 words per comment, as most comments have fewer than 50 words.
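As a sketch of this representation (the exact pre-trained model is not stated in the paper; the 300-dimensional Google News vectors shipped through gensim's downloader are assumed here), each comment can be mapped to a fixed (50 × 300) matrix:

<syntaxhighlight lang="python">
# Comment embedding sketch: a (50, 300) matrix of Word2Vec vectors per comment.
import numpy as np
import gensim.downloader as api

wv = api.load("word2vec-google-news-300")  # large download on first use

MAX_WORDS, DIM = 50, 300

def embed_comment(tokens):
    """Map a tokenized comment to a fixed-size (50, 300) matrix."""
    mat = np.zeros((MAX_WORDS, DIM), dtype=np.float32)
    kept = [t for t in tokens if t in wv][:MAX_WORDS]  # skip out-of-vocabulary words
    for i, tok in enumerate(kept):
        mat[i] = wv[tok]
    return mat
</syntaxhighlight>

Zero-padding shorter comments to a fixed 50-row matrix keeps the classifier input shape constant, matching the (50 × 300) input layer described in Section 3.4.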
'''Captions Data Representation.''' Investigating the topic/comment associations requires defining and understanding the topics raised in the videos where the comments are observed. This understanding of topics can be obtained using topic modeling on captions extracted from videos. For the topic modeling task and the assignment of topics to videos, we extracted and pre-processed captions from the videos, i.e., transforming captions to lowercase, tokenizing, and eliminating irrelevant tokens such as stopwords, punctuation, and words containing fewer than three characters. After the pre-processing phase, captions are represented using bags of words, in which each word is assigned a unique identifier. To reduce the dimensionality of the bag-of-words, we selected the top 10,000 words as the caption data representation.
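A sketch of this caption representation with gensim (an assumption; the paper states the representation but not the library used to build it):

<syntaxhighlight lang="python">
# Bag-of-words caption representation with a vocabulary capped at 10,000 terms.
from gensim.corpora import Dictionary

def build_caption_corpus(tokenized_captions):
    dictionary = Dictionary(tokenized_captions)
    # Drop words in more than 50% of captions; keep the 10,000 most frequent.
    dictionary.filter_extremes(no_above=0.5, keep_n=10_000)
    corpus = [dictionary.doc2bow(tokens) for tokens in tokenized_captions]
    return dictionary, corpus
</syntaxhighlight>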
====3.4 Toxicity Detection Models====

The first task of this study is to detect and classify different toxic behaviors in comments, in order to further investigate their association with the topics covered in the news from which the comments are collected. We inspected comments for three categories of toxicity: toxic, obscene, and identity hate. We utilized a neural network-based ensemble of three models for classifying the three toxic categories.

'''Deep Neural Network (DNN)-based Architecture.''' A DNN is a supervised learning method that can discover both linear and non-linear relationships between the input and the output. Comments represented as sequences of word embeddings are fed to the DNN-based models for labeling. The DNN model used in this study consists of (1) an input layer of size (50 × 300), matching the shape of the Word2Vec embeddings, (2) two fully connected hidden layers of size 128 with ReLU activation, and (3) an output layer with one sigmoid unit.
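A sketch of one per-category classifier under the architecture described above, using tf.keras; flattening the embedding matrix before the dense layers and the choice of optimizer are assumptions not stated in the paper:

<syntaxhighlight lang="python">
import tensorflow as tf

def build_toxicity_model():
    model = tf.keras.Sequential([
        # (50 x 300) Word2Vec embedding matrix of a comment, flattened for the
        # dense layers (the flattening step is an assumption).
        tf.keras.layers.Flatten(input_shape=(50, 300)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),  # probability of one toxicity label
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

# One binary model per category forms the ensemble.
models = {c: build_toxicity_model() for c in ("toxic", "obscene", "identity_hate")}
</syntaxhighlight>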
Figure 3: The evaluation of the ensemble model across categories in terms of TPR and TNR, for (a) toxic, (b) obscene, and (c) identity hate.

'''Dataset Handling and Splitting.''' Using the two ground truth datasets, we utilized two different approaches to split the datasets for training and evaluating the models. (1) We adopted a 50/50 split for training and testing our models on the YouTube ground truth comments dataset. Since the manually-annotated comments dataset is relatively small, training is initially done using the Wikipedia ground truth comments dataset; each model is then fine-tuned using the 50% training split of the manually-annotated YouTube comments. (2) We also used 50/50 training/testing splits of the Wikipedia ground truth comments dataset to explore the effects of different experimental settings. We note that comments can fall into multiple toxic categories, e.g., one comment can be toxic, obscene, and imply identity hate. Therefore, comments that imply multiple toxic behaviors can be used for training and evaluating multiple models.

====3.5 Topic Modeling using LDA====

Topic modeling is an unsupervised statistical machine learning technique that processes a set of documents, detects word and phrase patterns across the documents, and clusters them based on their similarities.

'''Fine-grained Topics Extraction.''' We studied the associations between a specific toxic behavior (e.g., obscenity) and an extracted topic from videos of mainstream media channels. To do so, we conducted topic modeling to assign topics to videos based on their captions. This is a challenging task since the YouTube categorization is generic and lacks specification of the topics covered in the video script. We observed that most videos (87.3%) published by the news channels are categorized as News & Politics. Based on our analysis of the topics appearing in news videos, a variety of topics were captured, including war/attack/refugees, violence/crime, sports/games, politics, and economy.

'''LDA Model Settings and Evaluation.''' The LDA operates on the bag-of-words representation of caption segments. The topic model receives input vectors of the 10,000-word bag-of-words representation and assigns topics to each segment. This process includes a training phase that requires setting several parameters, such as the number of topics, alpha (the segment-topic density), and beta (the topic-word density). To examine the effect of different parameters on the modeling task, we conducted a grid search to obtain the configuration of the LDA model with the highest possible coherence score. For the number of topics, we explored the effects of varying the number of targeted topics from 10 to 40 in increments of 5. For tuning the alpha and beta parameters, we varied the values from 0.01 to 1 in increments of 0.3. The LDA model achieves the best performance with [number of topics = 20, alpha = 0.61, beta = 0.31] and a coherence score of 0.55.

We manually inspected the frequent keywords of the best-performing LDA output and assigned names and descriptions to them, resulting in various consolidations and producing 15 distinct topics.
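A sketch of this grid search using gensim, which names the topic-word density eta rather than beta; the specific coherence measure (c_v here) is an assumption:

<syntaxhighlight lang="python">
# LDA grid search over the parameter ranges described above, scored by coherence.
from gensim.models import LdaModel, CoherenceModel

def grid_search_lda(corpus, dictionary, texts):
    best_model, best_score = None, -1.0
    densities = [0.01, 0.31, 0.61, 0.91]          # 0.01 to 1 in steps of 0.3
    for num_topics in range(10, 45, 5):           # 10 to 40 topics in steps of 5
        for alpha in densities:
            for eta in densities:                 # gensim's name for beta
                lda = LdaModel(corpus=corpus, id2word=dictionary,
                               num_topics=num_topics, alpha=alpha, eta=eta,
                               passes=10, random_state=42)
                score = CoherenceModel(model=lda, texts=texts,
                                       dictionary=dictionary,
                                       coherence="c_v").get_coherence()
                if score > best_score:
                    best_model, best_score = lda, score
    # The paper reports 20 topics, alpha = 0.61, beta = 0.31, coherence 0.55.
    return best_model, best_score
</syntaxhighlight>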
===4 Results and Discussion===

====4.1 Toxicity Detection and Measurement====

# ''Toxic Comments:'' Figure 3(a) shows the performance of the toxic-behavior detection model in terms of TPR and TNR under different classification probability thresholds (see the threshold-sweep sketch after this list). We selected the threshold of 0.520 as the best TPR/TNR trade-off, with a TPR of 86.2% and a TNR of 71.2%. This model shows that 22.4% of the comments are classified as toxic, a total of 1,648,345 comments.
# ''Obscene Comments:'' The model with a decision threshold of 0.27 achieves a high TPR of 86.6% and a TNR of 88.8% for detecting obscene comments. Figure 3(b) shows the results of adopting different thresholds. Applying the model classifies 7.43% of the comments as obscene, a total of 547,222 comments.
# ''Identity Hate Comments:'' Figure 3(c) shows the outstanding performance of the specialized model for detecting identity hate. Using a decision threshold of 0.140, the model achieves a TPR of 74.8% and a TNR of 98.4%. The model shows that 7.03% of the comments are classified as identity hate, a total of 518,213 comments.
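The per-category thresholds above can be chosen by sweeping the classifier's output probability and computing TPR/TNR at each step; a minimal sketch, with illustrative variable names (y_true and y_prob would come from the held-out test split):

<syntaxhighlight lang="python">
import numpy as np

def tpr_tnr_curve(y_true, y_prob, steps=101):
    """Return (threshold, TPR, TNR) triples across the probability range."""
    y_true = np.asarray(y_true).astype(bool)
    y_prob = np.asarray(y_prob)
    rows = []
    for thr in np.linspace(0.0, 1.0, steps):
        pred = y_prob >= thr
        tpr = (pred & y_true).sum() / max(y_true.sum(), 1)      # true positive rate
        tnr = (~pred & ~y_true).sum() / max((~y_true).sum(), 1)  # true negative rate
        rows.append((thr, tpr, tnr))
    return rows  # pick the threshold with the preferred TPR/TNR trade-off
</syntaxhighlight>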
====4.2 Toxicity and Topics Associations====

The detection of toxic behaviors and access to the topic categorization of videos allow us to conduct toxicity/topic analyses. Such associations show whether specific toxicity is topic-driven or driven by other factors. Based on our topic model and ensemble classifier, we examined the presence of toxic, obscene, and identity hate comments for each topic of our LDA model.

Figure 4: The distribution of the obscene comments over different topics generated by the LDA model.

Figure 5: The distribution of identity hate comments over different topics generated by the LDA model.

Figure 6: The distribution of the toxic comments over different topics generated by the LDA model.

# ''Toxic Comments:'' Figure 6 shows that videos discussing topics related to religion or violence/crime have the highest rate of toxic comments, with roughly 25% of the comments being toxic. On the other hand, economy-related news shows the lowest rate of toxic comments, with 17% of the total number of comments.
# ''Obscene Comments:'' Violence/crime-related news had the highest rate of obscene comments, at 10% of the total comments. News covering United States foreign policy had the lowest rate of obscene comments, with only 3%, as shown in Figure 4.
# ''Identity Hate Comments:'' Among the 15 topics, African affairs and religion news had the highest ratio of identity hate comments, at 20% of the comments, while news related to climate/energy and United States foreign policy had the lowest rate of identity hate comments, at about 4% of the total comments, as shown in Figure 5.

'''Content-related Toxicity.''' We note that toxic comments can be posted due to several factors and may not be entirely driven by the covered topics. In an attempt to relate specific toxic comments to the topics' content, we conducted a statistical analysis to measure the commonalities between comments and the content of the caption. For the videos of each topic, we used the average number of common terms and expressions as the baseline indicating the relationship between the topic and the toxic comment. We note that this might not always hold. However, we observed that comments sharing more terms with the caption than the average for a given topic are more likely to be related to the topics covered in the caption. This analysis produced similar ratios of different toxic behaviors across different news topics.
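A sketch of this common-terms heuristic; the implementation details are assumptions, since the paper describes only the statistic itself:

<syntaxhighlight lang="python">
# Flag a comment as content-related when its term overlap with the caption
# exceeds the per-topic average overlap (a sketch of the baseline above).
def overlap(comment_tokens, caption_tokens):
    return len(set(comment_tokens) & set(caption_tokens))

def content_related_flags(topic_videos):
    """topic_videos: list of (caption_tokens, [comment_tokens, ...]) for one topic."""
    pairs = [(overlap(c, cap), c)
             for cap, comments in topic_videos for c in comments]
    avg = sum(o for o, _ in pairs) / max(len(pairs), 1)  # per-topic baseline
    return [o > avg for o, _ in pairs]  # True -> likely topic-driven comment
</syntaxhighlight>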
===5 Conclusion===

We designed and evaluated an ensemble of models to detect various types of toxicity in comments posted on YouTube mainstream media channels. By analyzing 7 million YouTube comments posted on 14,506 YouTube news videos, we detected and classified toxic comments with high accuracy, and demonstrated that, despite countless comment-moderation efforts taken by YouTube, ≈69% of the collected videos contained toxic comments. We investigated the correlation between the content of news videos and different toxic behaviors across 15 topics, showing that religion- and violence/crime-related news have the highest rate of toxic comments, while economy-related news has the lowest. While interesting in its own right from a behavioral standpoint, this study highlights the need for more effective moderation.

'''Acknowledgement.''' This work was done while all authors were at the University of Central Florida, and is supported by NRF grant 2016K1A1A2912757 (Global Research Lab). S. Alshamrani was supported by a scholarship from the Saudi Arabian Cultural Mission.

===References===

# Brassard-Gourdeau, É., and Khoury, R. Impact of sentiment detection to recognize toxic and subversive online comments. CoRR abs/1812.01704 (2018).
# ConversationAI. https://conversationai.github.io/, 2019. Accessed: 2019-10-03.
# Diakopoulos, N., and Naaman, M. Towards quality discourse in online news comments. In Proc. of the ACM Conference on Computer Supported Cooperative Work, CSCW (2011).
# D'Sa, A. G., Illina, I., and Fohr, D. Towards non-toxic landscapes: Automatic toxic comment detection using DNN. CoRR abs/1911.08395 (2019).
# Ernst, J., Schmitt, J. B., Rieger, D., Beier, A. K., Vorderer, P., Bente, G., and Roth, H.-J. Hate beneath the counter speech? A qualitative content analysis of user comments on YouTube related to counter speech videos. Journal for Deradicalization, 10 (2017), 1–49.
# Geiger, A. Key findings about the online news landscape in America. tinyurl.com/y44m63xu, 2019. Accessed: 2020-04-16.
# Ksiazek, T. B., Peer, L., and Lessard, K. User engagement with online news: Conceptualizing interactivity and exploring the relationship between online news videos and user comments. New Media & Society 18, 3 (2016), 502–520.
# Locklear, M. More people get their news from social media than newspapers. https://tinyurl.com/y8ht3ubr, 2018. Accessed: 2020-04-16.
# Ma, Z., Sun, A., Yuan, Q., and Cong, G. Topic-driven reader comments summarization. In Proc. of the 21st ACM International Conference on Information and Knowledge Management, CIKM (2012).
# Mariconti, E., Suarez-Tangil, G., Blackburn, J., De Cristofaro, E., Kourtellis, N., Leontiadis, I., Serrano, J. L., and Stringhini, G. "You know what to do": Proactive detection of YouTube videos targeted by coordinated hate attacks. Proc. ACM Hum.-Comput. Interact. 3, CSCW (2019), 207:1–207:21.
# Massaro, T. M. Equality and freedom of expression: The hate speech dilemma.
# Papadamou, K., Papasavva, A., Zannettou, S., Blackburn, J., Kourtellis, N., Leontiadis, I., Stringhini, G., and Sirivianos, M. Disturbed YouTube for kids: Characterizing and detecting disturbing content on YouTube. arXiv:1901.07046 (2019).
# Papadamou, K., Zannettou, S., Blackburn, J., De Cristofaro, E., Stringhini, G., and Sirivianos, M. Understanding the incel community on YouTube. CoRR abs/2001.08293 (2020).
# Ranker. www.ranker.com, 2019. Accessed: 2019-09-09.
# Řehůřek, R., and Sojka, P. Software framework for topic modelling with large corpora. In Proc. of the Workshop on New Challenges for NLP Frameworks (2010).
# Shtovba, S., Shtovba, O., and Petrychko, M. Detection of social network toxic comments with usage of syntactic dependencies in the sentences. In Proc. of the 2nd International Workshop on Computer Modeling and Intelligent Systems, CMIS (2019).
# Sil, D. K., Sengamedu, S. H., and Bhattacharyya, C. Supervised matching of comments with news article segments. In Proc. of the 20th ACM Conference on Information and Knowledge Management, CIKM (2011).
# Silva, L. A., Mondal, M., Correa, D., Benevenuto, F., and Weber, I. Analyzing the targets of hate in online social media. In Proc. of the 10th International Conference on Web and Social Media, ICWSM (2016).
# Tsagkias, M., Weerkamp, W., and de Rijke, M. Predicting the volume of comments on online news stories. In Proc. of the 18th ACM Conference on Information and Knowledge Management, CIKM (2009).
# Wikipedia. https://tinyurl.com/y5oyytc8, 2019. Accessed: 2019-09-09.
# YouTube. https://tinyurl.com/y9nmv95q, 2020. Accessed: 2020-04-29.