<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Human in the Loop Approach to Capture Bias and Support Media Scientists in News Video Analysis</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Panagiotis Mavridis</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Markus de Jong</string-name>
<email>m.a.dejong@vu.nl</email>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lora Aroyo</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessandro Bozzon</string-name>
<email>a.bozzon@tudelft.nl</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jesse de Vos</string-name>
<email>jdvos@beeldengeluid.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Johan Oomen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Antoaneta Dimitrova</string-name>
          <email>a.l.dimitrova@fgga.leidenuniv.nl</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alec Badenoch</string-name>
          <email>A.W.Badenoch@uu.nl</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
<institution>Beeld en Geluid</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Leiden University</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>TU Delft</institution>
          ,
          <addr-line>Web Information Systems</addr-line>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Utrecht University</institution>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>Vrije Universiteit Amsterdam, User-Centric Data Science Group</institution>
        </aff>
      </contrib-group>
      <abstract>
<p>Bias is inevitable and inherent in any form of communication. News often appears biased to citizens with different political orientations, and is understood differently by news media scholars and the broader public. In this paper we advocate the need for accurate methods for bias identification in video news items, to enable rich analytics capabilities that assist humanities media scholars and social and political scientists. We propose to analyze biases that are typical in video news (including framing, gender and racial biases) by means of a human-in-the-loop approach that combines text and image analysis with human computation techniques.</p>
      </abstract>
      <kwd-group>
        <kwd>Bias detection</kwd>
        <kwd>bias in news video</kwd>
<kwd>machine learning</kwd>
        <kwd>crowdsourcing</kwd>
        <kwd>human computation</kwd>
        <kwd>human in the loop</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
      <p>
        News media scholars analyze online media coverage of different international events from
a variety of online news channel sources such as CNN, France24, RT or Al
Jazeera. However, news reporters in each channel present news stories from
different perspectives. As such, news often appears biased to citizens with different
political orientations and is understood differently by news media scholars and
the broader public [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Since bias is inherent in every communication, it can mislead
the audience, whether scientists or the broader public. For instance, it can
affect democratic institutions by influencing voters' choices [
        <xref ref-type="bibr" rid="ref11 ref6">11, 6</xref>
        ]. More accurate
detection of bias could enable consumers of video news items to become aware of
possible misrepresentations, and could provide more useful media analysis
for scientists.
      </p>
      <p>Since news media are abundant and manual detection of bias is costly, both
in monetary and temporal terms, we propose to assist news media scholars with
automatic techniques. This problem can be studied
from two different perspectives: (1) the study of the different manifestations of
bias; and (2) the role of content ambiguity in the detection of bias. In this work,
we propose an approach for the first.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        Bias is often manifested through the misrepresentation of entities, which is performed
by framing [
        <xref ref-type="bibr" rid="ref1 ref12">1, 12</xref>
        ]. Framing is also used when news agencies adjust their reporting
approach for their intended public and target specific groups [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Framing
acts upon concepts or entities of the story; when such entities are individuals,
bias can manifest in terms of (1) gender bias [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and (2) racial bias [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] when a
particular gender or race is misrepresented.
      </p>
      <p>
        Framing can be captured through either an extensive manual thematic
analysis [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] or by word-based quantitative text analysis performed manually or with
computer-assisted methods [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. In the case of video, crowdsourced labels have
been used to gain insight into how exactly themes and sentiment differ between
news sources [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. As mentioned, research can discover racial bias expressed by
discrepancies between the on-screen representation of ethnic groups and various
official statistics [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Results from this 2017 crowdsourced investigation in
Los Angeles showed, for example, that whites were significantly overrepresented
in the victim, perpetrator and police officer categories. Similar quantitative
comparisons can be carried out to investigate gender bias [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
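        <p>As a minimal sketch of this kind of quantitative comparison, the over- or under-representation of a group in an on-screen role can be expressed as the ratio between its on-screen share and its population share; the group names and numbers below are hypothetical and for illustration only, not data from the cited study.</p>
        <preformat>
```python
from collections import Counter

def representation_ratio(role_counts, population_share):
    """Compare the on-screen share of each group in a role (e.g. 'officer')
    against its population share; ratios above 1 indicate over-representation."""
    total = sum(role_counts.values())
    return {group: (count / total) / population_share[group]
            for group, count in role_counts.items()}

# Hypothetical annotation counts for one role in one channel's videos.
counts = Counter({"group_a": 60, "group_b": 40})
shares = {"group_a": 0.4, "group_b": 0.6}   # hypothetical population shares
ratios = representation_ratio(counts, shares)
```
        </preformat>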
      <p>
        However, automated methods for the detection of bias also exist. For
instance, [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] identifies, for a particular controversial topic (Edward Snowden), two
different groups of Twitter users who talk about the topic, and studies how
information about it is shaped and propagated by comparing the rates of
original tweets and retweets over the course of a month. On a
similar subject but with a different method, [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] identifies seed words and trains
a semi-automatic method to detect partisans on a controversial topic. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]
identifies unintended bias that comes from an imbalanced dataset when demographics
of participants are not always available.
      </p>
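      <p>A minimal sketch of the behavioral comparison underlying such studies, computing per-group rates of original tweets versus retweets, assuming a simple list of (group, is_retweet) observations rather than the actual Twitter data used in the cited work:</p>
      <preformat>
```python
def originality_rate(tweet_log):
    """For each opinion group, the fraction of its tweets that are original
    (not retweets); a behavioral difference between opinion groups."""
    stats = {}
    for group, is_retweet in tweet_log:
        orig, total = stats.get(group, (0, 0))
        stats[group] = (orig + (not is_retweet), total + 1)
    return {g: orig / total for g, (orig, total) in stats.items()}
```
      </preformat>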
    </sec>
    <sec id="sec-3">
      <title>Proposed Approach</title>
      <p>To address bias in news videos, we propose a comparative correlation and
sentiment analysis of the different manifestations of bias mentioned in Section 2,
through the use case of news analysis for media scientists. We propose to
automate a procedure that extracts the different properties and elements that can lead to
automatic bias detection, and that involves humans in the loop in an iterative process,
since automatic methods alone are not enough to identify the bias cues related
to entities and sentiments. Then, social science and political science scholars evaluate the output of
this process. More specifically, we specify the initial datasets and explain the
preprocessing of the data in order to extract the different bias cues for framing,
gender and racial bias with the use of machine learning and human computation
methods. In the end we evaluate with the help of our experts.</p>
      <sec id="sec-3-1">
        <title>Datasets</title>
        <p>Videos and textual data: The datasets consist of online news videos reporting
on a news event. We gather videos and their metadata, such as subtitles, video
comments and video tags. As sources, we have selected the English-language online
video news channels mentioned in Section 1 that post their videos on YouTube,
as these present international news from different perspectives. We
also take advantage of the keyword-annotated videos provided in
the YouTube-8M dataset (https://research.google.com/youtube8m/).</p>
        <p>To determine news events we use Wikipedia (https://www.wikipedia.org) and online news articles.
Wikipedia provides crowd-sourced articles from different contributors. This data
takes some time to build and improves over time, and can be used to compare the
entities and facts presented between different news sources. Online news
articles can provide comparison data for videos when Wikipedia articles are
missing.</p>
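        <p>The entity-level comparison between a reference article and one channel's coverage can be sketched as simple set differences; the helper below is illustrative and assumes entity lists have already been extracted from both sources:</p>
        <preformat>
```python
def coverage_gap(reference_entities, channel_entities):
    """Entities present in the reference description of an event (e.g. a
    Wikipedia article) but absent from one channel's coverage, and vice versa."""
    ref, chan = set(reference_entities), set(channel_entities)
    return {"missing_from_channel": ref - chan,
            "extra_in_channel": chan - ref}
```
        </preformat>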
      </sec>
      <sec id="sec-3-2">
        <title>Data preprocessing</title>
        <p>Captions and text extraction: Since we want to compare the video event
coverage with online news articles that contain mainly text, we need to retrieve the
text spoken and shown in the video. Thus, we generate subtitles for the videos
(if none are available) using a speech-to-text engine. We also detect and extract
informative text displayed on screen as part of the narration (e.g. speaker or
location descriptors, section titles) using optical character recognition (OCR).</p>
        <p>News event detection and data gathering: From the Wikipedia pages, we
extract events using NLP techniques. From these events, and supported by WordNet
(https://wordnet.princeton.edu/), we can create seed words to assist a crowd in annotating an event. Once the
events are identified, we can collect video data from the different video channels
of our initial dataset.</p>
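        <p>As an illustration, under the simplifying assumption that seed words are matched by exact token overlap, a crude relevance filter could score a video's metadata against an event's seed words before items are sent to crowd annotators:</p>
        <preformat>
```python
def event_match_score(seed_words, text):
    """Fraction of an event's seed words that occur in a video's metadata;
    a crude relevance filter, assuming exact token overlap."""
    tokens = set(text.lower().split())
    seeds = {w.lower() for w in seed_words}
    return len(seeds & tokens) / len(seeds)
```
        </preformat>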
      </sec>
      <sec id="sec-3-3">
        <title>Bias Cues extraction</title>
        <p>We identify the different bias cues by a comparative analysis of the different
textual and video data that we have from different sources concerning the same
event. This method permits identifying missing or misrepresented entities, in
terms of their number or the sentiment attached to them, and thus provides a detection of framing
and misrepresentation of gender or race within the presented video: for instance,
how many times some entities appear compared to other entities in
a particular event. We perform the above in different ways, such as video
deconstruction, keyword and entity extraction, and sentiment analysis.</p>
        <p>Video deconstruction and analysis: In order to be able to annotate videos
for their events, we need to be able to separate the scenes of each video with
automated scene recognition. We plan to obtain bias cues with both machine
learning and human computation. Ideally, we use machine learning to identify
what needs to be annotated by humans, in order to find out, e.g., who is reporting,
who is talking, who is present at the scene, etc.</p>
        <p>
          Entity and sentiment analysis: To make use of all data modalities in our
news videos, we investigate the combination of existing APIs for text-,
voice- and face-based sentiment analysis [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] attached to entities. To be able to
attach the entities to particular sentiments [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], we use human computation to
identify or validate the output of machine learning
sentiment analysis methods.
        </p>
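        <p>As a sketch of the aggregation step, assuming per-modality sentiment scores in [-1, 1] have already been produced by the analysis components (the entity names and scores below are hypothetical), the sentiment attached to each entity can be averaged across observations:</p>
        <preformat>
```python
from collections import defaultdict

def entity_sentiment(observations):
    """Average the sentiment scores (-1..1) attached to each entity across
    text, voice and face modalities; observations are (entity, score) pairs."""
    sums = defaultdict(lambda: [0.0, 0])
    for entity, score in observations:
        sums[entity][0] += score
        sums[entity][1] += 1
    return {e: s / n for e, (s, n) in sums.items()}
```
        </preformat>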
      </sec>
      <sec id="sec-3-4">
        <title>Evaluation</title>
        <p>Finally, we evaluate our approach with domain experts from the humanities and
political sciences. Given an event, they are presented with an interface showing
different graphs from our hybrid human-machine approach. The experts should
be able to use a representation of the event, together with different word clouds for the
same event from different channels, to perform the bias investigation.</p>
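        <p>The per-channel word clouds can be backed by simple content-word frequency counts; the stop-word list below is a hypothetical minimal one, for illustration:</p>
        <preformat>
```python
from collections import Counter

STOP = {"the", "a", "of", "in", "to", "and"}   # hypothetical minimal stop list

def word_cloud_counts(transcript, k=5):
    """Top-k content-word frequencies for one channel's coverage of an event,
    the raw material for the per-channel word clouds shown to the experts."""
    words = [w for w in transcript.lower().split() if w not in STOP]
    return Counter(words).most_common(k)
```
        </preformat>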
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Discussion and Directions</title>
      <p>We presented how bias is manifested and how it can be captured with an approach
using state-of-the-art machine learning and human computation. We mainly
focused on identifying different bias cues, such as framing and gender and race
misrepresentations, in order to assist media scientists in news video analysis.
We plan to apply this approach in a pilot experiment, compare the
different types of bias and their possible correlations, and also perform a sentiment
analysis.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgements</title>
      <p>This research is supported by the Capture Bias project (https://capturebias.eu/), part of the VWData
Research Programme funded by the Startimpuls programme of the Dutch
National Research Agenda, route "Value Creation through Responsible Access to
and use of Big Data" (NWO 400.17.605/4174).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1. Borang,
          <string-name>
            <given-names>F.</given-names>
            ,
            <surname>Eising</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            , Kluver, H.,
            <surname>Mahoney</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            ,
            <surname>Naurin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            ,
            <surname>Rasch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            ,
            <surname>Rozbicka</surname>
          </string-name>
          ,
          <string-name>
            <surname>P.</surname>
          </string-name>
          :
          <article-title>Identifying frames: A comparison of research methods</article-title>
          .
          <source>Interest Groups &amp; Advocacy</source>
          <volume>3</volume>
          (
          <issue>2</issue>
          ),
          <volume>188</volume>
          –
          <fpage>201</fpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Calais</given-names>
            <surname>Guerra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.H.</given-names>
            ,
            <surname>Veloso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Meira</surname>
          </string-name>
          , Jr.,
          <string-name>
            <given-names>W.</given-names>
            ,
            <surname>Almeida</surname>
          </string-name>
          ,
          <string-name>
            <surname>V.</surname>
          </string-name>
          :
          <article-title>From bias to opinion: A transfer-learning approach to real-time sentiment analysis</article-title>
          .
          <source>In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>
          . pp.
          <volume>150</volume>
          –
          <fpage>158</fpage>
          . KDD '11,
          <string-name>
            <surname>ACM</surname>
          </string-name>
          , New York, NY, USA (
          <year>2011</year>
          ). https://doi.org/10.1145/2020408.2020438
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Dimitrova</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Frear</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mazepus</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toshkov</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boroda</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chulitskaya</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grytsenko</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Munteanu</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parvan</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ramasheuskaya</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>The elements of Russia's soft power: Channels, tools, and actors promoting Russian influence in the Eastern Partnership countries (</article-title>
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Dixon</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sorensen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thain</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vasserman</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Measuring and mitigating unintended bias in text classification (</article-title>
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Dixon</surname>
            ,
            <given-names>T.L.</given-names>
          </string-name>
          :
          <article-title>Good guys are still always in white? Positive change and continued misrepresentation of race and crime on local television news</article-title>
          .
          <source>Communication Research</source>
          <volume>44</volume>
          (
          <issue>6</issue>
          ),
          <volume>775</volume>
          –
          <fpage>792</fpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Gelman</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Azari</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>19 things we learned from the 2016 election</article-title>
          .
          <source>Statistics and Public Policy</source>
          <volume>4</volume>
          (
          <issue>1</issue>
          ),
          <volume>1</volume>
          –
          <fpage>10</fpage>
          (
          <year>2017</year>
          ). https://doi.org/10.1080/2330443X.2017.1356775
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Hackett</surname>
            ,
            <given-names>R.A.</given-names>
          </string-name>
          :
          <article-title>Decline of a paradigm? Bias and objectivity in news media studies</article-title>
          .
          <source>Critical Studies in Mass Communication</source>
          <volume>1</volume>
          (
          <issue>3</issue>
          ),
          <volume>229</volume>
          –
          <fpage>259</fpage>
          (
          <year>1984</year>
          ). https://doi.org/10.1080/15295038409360036
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Kinnick</surname>
            ,
            <given-names>K.N.</given-names>
          </string-name>
          :
          <article-title>Gender bias in newspaper profiles of 1996 Olympic athletes: A content analysis of five major dailies</article-title>
          .
          <source>Women's Studies in Communication</source>
          <volume>21</volume>
          (
          <issue>2</issue>
          ),
          <volume>212</volume>
          –
          <fpage>237</fpage>
          (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Liao</surname>
            ,
            <given-names>Q.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fu</surname>
          </string-name>
          , W.T.,
          <string-name>
            <surname>Strohmaier</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>#snowden: Understanding biases introduced by behavioral differences of opinion groups on social media</article-title>
          .
          <source>In: Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems</source>
          . pp.
          <volume>3352</volume>
          –
          <fpage>3363</fpage>
          . CHI '16,
          <string-name>
            <surname>ACM</surname>
          </string-name>
          , New York, NY, USA (
          <year>2016</year>
          ). https://doi.org/10.1145/2858036.2858422
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Lu</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Caverlee</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Niu</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          :
          <article-title>Biaswatch: A lightweight system for discovering and tracking topic-sensitive opinion bias in social media</article-title>
          .
          <source>In: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management</source>
          . pp.
          <volume>213</volume>
          –
          <fpage>222</fpage>
          . CIKM '15,
          <string-name>
            <surname>ACM</surname>
          </string-name>
          , New York, NY, USA (
          <year>2015</year>
          ). https://doi.org/10.1145/2806416.2806573
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11. Druckman, J.N., Parkin, M.:
          <article-title>The impact of media bias: How editorial slant affects voters</article-title>
          .
          <source>Journal of Politics</source>
          <volume>67</volume>
          (
          <issue>4</issue>
          ),
          <fpage>1030</fpage>
          –
          <lpage>1049</lpage>
          (
          <year>2005</year>
          ). https://doi.org/10.1111/j.1468-2508.2005.00349.x
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Philo</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Briant</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Donald</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Bad news for refugees</article-title>
          . Pluto Press (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Poria</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peng</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hussain</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Howard</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cambria</surname>
          </string-name>
          , E.:
          <article-title>Ensemble application of convolutional neural networks and multiple kernel learning for multimodal sentiment analysis</article-title>
          .
          <source>Neurocomputing</source>
          <volume>261</volume>
          ,
          <issue>217</issue>
          –
          <fpage>230</fpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>