<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Human in the Loop Approach to Capture Bias and Support Media Scientists in News Video Analysis</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Panagiotis Mavridis</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Markus de Jong</string-name>
<email>m.a.dejong@vu.nl</email>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lora Aroyo</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessandro Bozzon</string-name>
<email>a.bozzon@tudelft.nl</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jesse de Vos</string-name>
<email>jdvos@beeldengeluid.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Johan Oomen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Antoaneta Dimitrova</string-name>
          <email>a.l.dimitrova@fgga.leidenuniv.nl</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alec Badenoch</string-name>
          <email>A.W.Badenoch@uu.nl</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
<institution>Beeld en Geluid</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Leiden University</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>TU Delft</institution>
          ,
          <addr-line>Web Information Systems</addr-line>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Utrecht University</institution>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>Vrije Universiteit Amsterdam, User-Centric Data Science Group</institution>
        </aff>
      </contrib-group>
      <abstract>
<p>Bias is inevitable and inherent in any form of communication. News often appears biased to citizens with different political orientations, and is understood differently by news media scholars and the broader public. In this paper we advocate the need for accurate methods for bias identification in video news items, to enable rich analytics capabilities that assist humanities media scholars and social and political scientists. We propose to analyze biases that are typical in video news (including framing, gender and racial biases) by means of a human-in-the-loop approach that combines text and image analysis with human computation techniques.</p>
      </abstract>
      <kwd-group>
        <kwd>Bias detection</kwd>
        <kwd>bias in news video</kwd>
<kwd>machine learning</kwd>
        <kwd>crowdsourcing</kwd>
        <kwd>human computation</kwd>
        <kwd>human in the loop</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
      <p>
        News media scholars analyze online media coverage of different international events from
a variety of online news channel sources such as CNN, France24, RT or Al
Jazeera. However, news reporters in each channel present news stories from
different perspectives. As such, news often appears biased to citizens with different
political orientations and is understood differently by news media scholars and
the broader public [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Since bias is inherent in every communication, it can mislead
the audience, whether scientists or the broader public. For instance, it can
affect democratic institutions by influencing voters' choices [
        <xref ref-type="bibr" rid="ref11 ref6">11, 6</xref>
        ]. More accurate
detection of bias could enable consumers of video news items to become aware of
possible misrepresentations, and could provide more useful media analysis
for scientists.
      </p>
      <p>Since news media are abundant and manual detection of bias is costly, both
in monetary and temporal terms, we propose to assist news media scholars with
automatic techniques. This problem can be studied
from two different perspectives: (1) the study of the different manifestations of
bias; and (2) the role of content ambiguity in the detection of bias. In this work,
we propose an approach for the first.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        Bias is often manifested through the misrepresentation of entities, which is performed
by framing [
        <xref ref-type="bibr" rid="ref1 ref12">1, 12</xref>
        ]. Framing is also used when news agencies adjust their reporting
approach for their intended public and target specific groups [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Framing
acts upon concepts or entities of the story; when such entities are individuals,
bias can manifest in terms of (1) gender bias [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and (2) racial bias [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] when a
particular gender or race is misrepresented.
      </p>
      <p>
        Framing can be captured through either an extensive manual thematic
analysis [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] or by word-based quantitative text analysis performed manually or with
computer-assisted methods [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. In the case of video, crowdsourced labels have
been used to gain insight into how exactly themes and sentiment differ between
news sources [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. As mentioned, research can discover racial bias expressed by
discrepancies between the on-screen representation of ethnic groups and various
official statistics [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Results from this 2017 crowdsourced investigation in
Los Angeles showed, for example, that whites were significantly overrepresented
in the victim, perpetrator and police officer categories. Similar quantitative
comparisons can be carried out to investigate gender bias [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
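        <p>As a minimal sketch of this kind of quantitative comparison, the over- or under-representation of a group in an on-screen role can be expressed as the ratio between its on-screen share and its population share; the group names and numbers below are hypothetical and for illustration only, not data from the cited study.</p>
        <preformat>
```python
from collections import Counter

def representation_ratio(role_counts, population_share):
    """Compare the on-screen share of each group in a role (e.g. 'officer')
    against its population share; ratios above 1 indicate over-representation."""
    total = sum(role_counts.values())
    return {group: (count / total) / population_share[group]
            for group, count in role_counts.items()}

# Hypothetical annotation counts for one role in one channel's videos.
counts = Counter({"group_a": 60, "group_b": 40})
shares = {"group_a": 0.4, "group_b": 0.6}   # hypothetical population shares
ratios = representation_ratio(counts, shares)
```
        </preformat>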
      <p>
        However, automated methods for the detection of bias also exist. For
instance, [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] identifies, for a particular controversial topic (Edward Snowden), two
different groups of Twitter users who talk about the topic, and studies how
information about it is shaped and propagated by comparing the rates of
original tweets and retweets over the course of a month. On a
similar subject but with a different method, [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] identifies seed words and trains
a semi-automatic method to detect partisans on a controversial topic. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]
identifies unintended bias that comes from an imbalanced dataset when demographics
of participants are not always available.
      </p>
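      <p>A minimal sketch of the behavioral comparison underlying such studies, computing per-group rates of original tweets versus retweets, assuming a simple list of (group, is_retweet) observations rather than the actual Twitter data used in the cited work:</p>
      <preformat>
```python
def originality_rate(tweet_log):
    """For each opinion group, the fraction of its tweets that are original
    (not retweets); a behavioral difference between opinion groups."""
    stats = {}
    for group, is_retweet in tweet_log:
        orig, total = stats.get(group, (0, 0))
        stats[group] = (orig + (not is_retweet), total + 1)
    return {g: orig / total for g, (orig, total) in stats.items()}
```
      </preformat>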
    </sec>
    <sec id="sec-3">
      <title>Proposed Approach</title>
      <p>To address bias in news videos, we propose a comparative correlation and
sentiment analysis of the different manifestations of bias mentioned in Section 2,
through the use case of news analysis for media scientists. We propose to
automate a procedure that extracts the different properties and elements that can lead to
automatic bias detection, and that involves humans in the loop in an iterative process,
since automatic methods alone are not enough to identify the bias cues related
to entities and sentiments. Then, social science and political science scholars evaluate the output of
this process. More specifically, we specify the initial datasets and explain the
preprocessing of the data in order to extract the different bias cues for framing,
gender and racial bias with the use of machine learning and human computation
methods. In the end we evaluate with the help of our experts.</p>
      <sec id="sec-3-1">
        <title>Datasets</title>
        <p>Videos and textual data: The datasets consist of online news videos reporting
on a news event. We gather videos and their metadata, such as subtitles, video
comments and video tags. As sources, we have selected the English-language online
video news channels mentioned in Section 1 that post their videos on YouTube,
as these present international news from different perspectives. We
also take advantage of the keyword-annotated videos provided in
the YouTube-8M dataset (https://research.google.com/youtube8m/).</p>
        <p>To determine news events we use Wikipedia (https://www.wikipedia.org) and online news articles.
Wikipedia provides crowd-sourced articles from different contributors. This data
takes some time to build and improves over time, and can be used to compare the
entities and facts presented between different news sources. Online news
articles can provide comparison data for videos when Wikipedia articles are
missing.</p>
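        <p>The entity-level comparison between a reference article and one channel's coverage can be sketched as simple set differences; the helper below is illustrative and assumes entity lists have already been extracted from both sources:</p>
        <preformat>
```python
def coverage_gap(reference_entities, channel_entities):
    """Entities present in the reference description of an event (e.g. a
    Wikipedia article) but absent from one channel's coverage, and vice versa."""
    ref, chan = set(reference_entities), set(channel_entities)
    return {"missing_from_channel": ref - chan,
            "extra_in_channel": chan - ref}
```
        </preformat>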
      </sec>
      <sec id="sec-3-2">
        <title>Data preprocessing</title>
        <p>Captions and text extraction: Since we want to compare the video event
coverage with online news articles that contain mainly text, we need to retrieve the
text spoken and shown in the video. Thus, we generate subtitles for the videos
(if none are available) using a speech-to-text engine. We also detect and extract
informative text displayed on screen as part of the narration (e.g. speaker or
location descriptors, section titles) using optical character recognition (OCR).</p>
        <p>News event detection and data gathering: From the Wikipedia pages, we
extract events using NLP techniques. From these events, and supported by WordNet
(https://wordnet.princeton.edu/), we can create seed words to assist a crowd in annotating an event. Once the
events are identified, we can collect video data from the different video channels
of our initial dataset.</p>
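        <p>As an illustration, under the simplifying assumption that seed words are matched by exact token overlap, a crude relevance filter could score a video's metadata against an event's seed words before items are sent to crowd annotators:</p>
        <preformat>
```python
def event_match_score(seed_words, text):
    """Fraction of an event's seed words that occur in a video's metadata;
    a crude relevance filter, assuming exact token overlap."""
    tokens = set(text.lower().split())
    seeds = {w.lower() for w in seed_words}
    return len(seeds & tokens) / len(seeds)
```
        </preformat>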
      </sec>
      <sec id="sec-3-3">
        <title>Bias Cues extraction</title>
        <p>We identify the different bias cues by a comparative analysis of the different
textual and video data that we have from different sources concerning the same
event. This method permits identifying missing or misrepresented entities, in
terms of their number or the sentiment attached to them, and thus provides a detection of framing
and misrepresentation of gender or race within the presented video: for instance,
how many times some entities appear compared to other entities in
a particular event. We perform the above in different ways, such as video
deconstruction, keyword and entity extraction, and sentiment analysis.</p>
        <p>Video deconstruction and analysis: In order to be able to annotate videos
for their events, we need to be able to separate the scenes of each video with
automated scene recognition. We plan to obtain bias cues with both machine
learning and human computation. Ideally, we use machine learning to identify
what needs to be annotated by humans, in order to find out, e.g., who is reporting,
who is talking, who is present at the scene, etc.</p>
        <p>
          Entity and sentiment analysis: To make use of all data modalities in our
news videos, we investigate the combination of existing APIs for text-,
voice- and face-based sentiment analysis [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] attached to entities. To be able to
attach the entities to particular sentiments [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], we use human computation to
identify or validate the output of machine learning
sentiment analysis methods.
        </p>
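        <p>As a sketch of the aggregation step, assuming per-modality sentiment scores in [-1, 1] have already been produced by the analysis components (the entity names and scores below are hypothetical), the sentiment attached to each entity can be averaged across observations:</p>
        <preformat>
```python
from collections import defaultdict

def entity_sentiment(observations):
    """Average the sentiment scores (-1..1) attached to each entity across
    text, voice and face modalities; observations are (entity, score) pairs."""
    sums = defaultdict(lambda: [0.0, 0])
    for entity, score in observations:
        sums[entity][0] += score
        sums[entity][1] += 1
    return {e: s / n for e, (s, n) in sums.items()}
```
        </preformat>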
      </sec>
      <sec id="sec-3-4">
        <title>Evaluation</title>
        <p>Finally, we evaluate our approach with domain experts from the humanities and
political sciences. Given an event, they are presented with an interface showing
different graphs from our hybrid human-machine approach. The experts should
be able to use a representation of the event, together with different word clouds for the
same event from different channels, to perform the bias investigation.</p>
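        <p>The per-channel word clouds can be backed by simple content-word frequency counts; the stop-word list below is a hypothetical minimal one, for illustration:</p>
        <preformat>
```python
from collections import Counter

STOP = {"the", "a", "of", "in", "to", "and"}   # hypothetical minimal stop list

def word_cloud_counts(transcript, k=5):
    """Top-k content-word frequencies for one channel's coverage of an event,
    the raw material for the per-channel word clouds shown to the experts."""
    words = [w for w in transcript.lower().split() if w not in STOP]
    return Counter(words).most_common(k)
```
        </preformat>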
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Discussion and Directions</title>
      <p>We presented how bias is manifested and how it can be captured with an approach
using state-of-the-art machine learning and human computation. We mainly
focused on identifying different bias cues, such as framing and gender and race
misrepresentations, in order to assist media scientists in news video analysis.
We plan to apply this approach in a pilot experiment, compare the
different types of bias and their possible correlations, and also perform a sentiment
analysis.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgements</title>
      <p>This research is supported by the Capture Bias project (https://capturebias.eu/), part of the VWData
Research Programme funded by the Startimpuls programme of the Dutch
National Research Agenda, route "Value Creation through Responsible Access to
and use of Big Data" (NWO 400.17.605/4174).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1. Borang,
          <string-name>
            <given-names>F.</given-names>
            ,
            <surname>Eising</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            , Kluver, H.,
            <surname>Mahoney</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            ,
            <surname>Naurin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            ,
            <surname>Rasch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            ,
            <surname>Rozbicka</surname>
          </string-name>
          ,
          <string-name>
            <surname>P.</surname>
          </string-name>
          :
          <article-title>Identifying frames: A comparison of research methods</article-title>
          .
          <source>Interest Groups &amp; Advocacy</source>
          <volume>3</volume>
          (
          <issue>2</issue>
          ),
          <volume>188</volume>
          –
          <fpage>201</fpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Calais</given-names>
            <surname>Guerra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.H.</given-names>
            ,
            <surname>Veloso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Meira</surname>
          </string-name>
          , Jr.,
          <string-name>
            <given-names>W.</given-names>
            ,
            <surname>Almeida</surname>
          </string-name>
          ,
          <string-name>
            <surname>V.</surname>
          </string-name>
          :
          <article-title>From bias to opinion: A transfer-learning approach to real-time sentiment analysis</article-title>
          .
          <source>In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>
          . pp.
          <volume>150</volume>
          –
          <fpage>158</fpage>
          . KDD '11,
          <string-name>
            <surname>ACM</surname>
          </string-name>
          , New York, NY, USA (
          <year>2011</year>
          ). https://doi.org/10.1145/2020408.2020438
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Dimitrova</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Frear</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mazepus</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toshkov</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boroda</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chulitskaya</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grytsenko</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Munteanu</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parvan</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ramasheuskaya</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>The elements of Russia's soft power: Channels, tools, and actors promoting Russian influence in the Eastern Partnership countries (</article-title>
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Dixon</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sorensen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thain</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vasserman</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Measuring and mitigating unintended bias in text classification (</article-title>
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Dixon</surname>
            ,
            <given-names>T.L.</given-names>
          </string-name>
          :
          <article-title>Good guys are still always in white? Positive change and continued misrepresentation of race and crime on local television news</article-title>
          .
          <source>Communication Research</source>
          <volume>44</volume>
          (
          <issue>6</issue>
          ),
          <volume>775</volume>
          –
          <fpage>792</fpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Gelman</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Azari</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>19 things we learned from the 2016 election</article-title>
          .
          <source>Statistics and Public Policy</source>
          <volume>4</volume>
          (
          <issue>1</issue>
          ),
          <volume>1</volume>
          –
          <fpage>10</fpage>
          (
          <year>2017</year>
          ). https://doi.org/10.1080/2330443X.2017.1356775
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Hackett</surname>
            ,
            <given-names>R.A.</given-names>
          </string-name>
          :
          <article-title>Decline of a paradigm? Bias and objectivity in news media studies</article-title>
          .
          <source>Critical Studies in Mass Communication</source>
          <volume>1</volume>
          (
          <issue>3</issue>
          ),
          <volume>229</volume>
          –
          <fpage>259</fpage>
          (
          <year>1984</year>
          ). https://doi.org/10.1080/15295038409360036
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Kinnick</surname>
            ,
            <given-names>K.N.</given-names>
          </string-name>
          :
          <article-title>Gender bias in newspaper profiles of 1996 Olympic athletes: A content analysis of five major dailies</article-title>
          .
          <source>Women's Studies in Communication</source>
          <volume>21</volume>
          (
          <issue>2</issue>
          ),
          <volume>212</volume>
          –
          <fpage>237</fpage>
          (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Liao</surname>
            ,
            <given-names>Q.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fu</surname>
          </string-name>
          , W.T.,
          <string-name>
            <surname>Strohmaier</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>#snowden: Understanding biases introduced by behavioral differences of opinion groups on social media</article-title>
          .
          <source>In: Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems</source>
          . pp.
          <volume>3352</volume>
          –
          <fpage>3363</fpage>
          . CHI '16,
          <string-name>
            <surname>ACM</surname>
          </string-name>
          , New York, NY, USA (
          <year>2016</year>
          ). https://doi.org/10.1145/2858036.2858422
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Lu</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Caverlee</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Niu</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          :
          <article-title>Biaswatch: A lightweight system for discovering and tracking topic-sensitive opinion bias in social media</article-title>
          .
          <source>In: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management</source>
          . pp.
          <volume>213</volume>
          –
          <fpage>222</fpage>
          . CIKM '15,
          <string-name>
            <surname>ACM</surname>
          </string-name>
          , New York, NY, USA (
          <year>2015</year>
          ). https://doi.org/10.1145/2806416.2806573
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11. Druckman, J.N., Parkin, M.:
          <article-title>The impact of media bias: How editorial slant affects voters</article-title>
          .
          <source>Journal of Politics</source>
          <volume>67</volume>
          (
          <issue>4</issue>
          ),
          <fpage>1030</fpage>
          –
          <lpage>1049</lpage>
          (
          <year>2005</year>
          ). https://doi.org/10.1111/j.1468-2508.2005.00349.x
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Philo</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Briant</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Donald</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Bad news for refugees</article-title>
          . Pluto Press (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Poria</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peng</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hussain</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Howard</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cambria</surname>
          </string-name>
          , E.:
          <article-title>Ensemble application of convolutional neural networks and multiple kernel learning for multimodal sentiment analysis</article-title>
          .
          <source>Neurocomputing</source>
          <volume>261</volume>
          ,
          <issue>217</issue>
          –
          <fpage>230</fpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>