     Understanding Characteristics of Biased Sentences in
                      News Articles

              Sora Lim                                  Adam Jatowt                      Masatoshi Yoshikawa
          Kyoto University                             Kyoto University                    Kyoto University
            Kyoto, Japan                                Kyoto, Japan                         Kyoto, Japan
    lim.sora.88u@st.kyoto-u.ac.jp                  adam@dl.kuis.kyoto-u.ac.jp          yoshikawa@i.kyoto-u.ac.jp



Copyright © CIKM 2018 for the individual papers by the papers' authors. Copyright © CIKM 2018 for the volume as a collection by its editors. This volume and its papers are published under the Creative Commons License Attribution 4.0 International (CC BY 4.0).


                          Abstract

Providing balanced and good-quality news articles to readers is an important challenge in news recommendation. Often, readers tend to select and read articles which confirm their social environment and their political beliefs. This issue is also known as the filter bubble. As a remedy, initial approaches towards automatically detecting bias in news articles have been developed. Obtaining a suitable ground truth for such a task is, however, difficult. In this paper, we describe a ground truth dataset created with the help of crowd-sourcing for fostering research on bias detection and removal from news content. We then analyze the characteristics of the user annotations, in particular concerning bias-inducing words. Our results indicate that determining bias-inducing words is subjective to a certain degree and that a high agreement of all readers on all bias-inducing words is hard to obtain. We also study the discriminative characteristics of biased content and find that linguistic features, such as negative words, tend to be indicative of bias.

1    Introduction

In news reporting it is important for both authors and readers to maintain high fairness and accuracy, and to keep a balance between different viewpoints. However, bias in news articles has become a major issue [GM05, Ben16] even though many news outlets claim to have a dedicated policy to assure the objectiveness of their articles. Different news sources may have their own views towards society, politics and other topics. Furthermore, they need to attract readers to make their businesses profitable. This frequently leads to a potentially harmful reporting style resulting in biased news.

To overcome news bias, as a remedy, users often try to choose news articles from news sources (outlets) which are known to be relatively unbiased. Ideally, this should be performed by corresponding recommender systems. However, bias-free article recommendations are still not feasible given the state of the art. Furthermore, the recommendations might not be trusted by users, as readers often need concrete evidence of bias in the form of bias-inducing words and similar aspects.

In this paper, we focus on understanding news bias and on developing a high-quality gold standard for fostering bias-detection studies on the sentence and word levels. We assume here that the word choices made by articles' authors might reflect some bias in terms of their viewpoint. For example, the phrases "illegal immigrants" and "undocumented immigrants" chosen by news reporters to refer to immigrants in relation to Donald Trump's decision to rescind Deferred Action for Childhood Arrivals may be considered a case where the choice of words can result in a bias. Here, the use of the word "illegal" degrades the immigrants by inducing a more negative value than in the case of using the adjective "undocumented". By such nuanced word choices, news authors may imply their stance on the news event and deliver a biased view to the readers.

It is, however, challenging to identify words that cause an article to have a biased point of view [BEQ+15]. The bias inherent in news articles tends to be subtle and intricate. In this research, we construct a comparable news dataset which consists of news articles reporting the same news event. The objective is to help design methods to detect bias triggers1 and to shed new light on the way in which users recognize bias in news articles. To the best of our knowledge, this is the first dataset with annotated bias words in news articles. In the following, we describe the design of the crowd-sourcing task used to obtain the bias labels for the news words, and we subsequently analyze the characteristics of the detected biased content in news.

  1 https://github.com/skymoonlight/newsdata-bias
2    Related Works

Several prior works have focused on media bias in general and news bias in particular. Generally, according to D'Alessio and Allen [DA00], media bias can be divided into three different types: (1) gatekeeping, (2) coverage and (3) statement bias. Gatekeeping bias is the selection of stories out of the set of potential stories; coverage bias expresses how much space specific positions receive in the media; statement bias, in contrast, denotes how an author's own opinion is woven into a text. Similarly, Alsem et al. [ABHK08] divide news bias into ideology and spin. Ideology reflects a news outlet's desire to affect readers' opinions in a particular direction. Spin reflects the outlet's attempt to simply create a memorable story. Given these distinctions, we consider the bias type tackled in this paper as statement bias w.r.t. [DA00] and as spin bias according to [ABHK08].

Several studies have made efforts to provide effective means for solving the news bias problem. However, most of them have focused on news diversification according to the content similarity and the political stance of news outlets. Park et al. [PKCS09], for instance, developed a news diversification system, named NewsCube, to mitigate the bias problem by providing diverse information to the users. Hamborg et al. [HMG17] presented a matrix-based news analysis to display various perspectives on the same news topic in a two-dimensional matrix. An et al. [ACG+12] revealed the skewness of news outlets by analyzing how their news content spreads through tweets.

Alonso et al. [ADS17] focused on omissions between news statements which are similar but not identical. Omission constitutes one category of news bias in that it is a means of statement bias [GS06]. Ogawa et al. [OMY11] attempted to describe the relationships between the main participants in news articles in order to detect news bias. To capture the way these relationships are described, they expanded sentiment words in SentiWordNet [BES10].

Other works focused on linguistic analysis for bias detection in text data. Recasens et al. [RDJ13] targeted detecting bias words from the sentence revision history in Wikipedia. They utilized NPOV tags as bias labels, and linguistically categorized resources for the bias features. Baumer et al. [BEQ+15] used Recasens et al.'s linguistic features to identify biased language in political news as well as features from the theoretical literature on framing.

3    Annotating Bias in News Articles

3.1    Dataset

To detect the subtle differences which cause bias, one way is to compare words across the content of different news articles which report the same news event. This should allow for pinpointing differences in the subtle use of words by different authors from diverse media outlets describing the same event. Although many news datasets have been created for news analysis, to the best of our knowledge, none focused on a single event while, at the same time, covering many news articles from various news outlets within a short time range.

We selected the news event titled "Black men arrested in Starbucks", which caused controversial discussions on racism. The event happened on April 12, 2018. We focused on news articles written on April 15, 2018, as the event was widely reported in different news outlets on that day.

For collecting news articles from various news outlets we used Google News2. Google News is a convenient source for our case as it already clusters news articles concerning the same event coming from various sources. We first crawled all news articles available online that described the aforementioned event. Based on manual inspection, we then verified whether all articles are about the same news event. We next extracted the titles and text content from the crawled pages, ignoring pages which contained only pictures or only a single sentence. In the end, our dataset consists of 89 news articles with 1,235 sentences and 2,542 unique words from 83 news outlets. Articles contain on average 14 paragraphs.

  2 https://news.google.com/?hl=en-US&gl=US&ceid=US:en
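As an illustration of the filtering step described above, the sketch below keeps only pages that come with a title and more than one sentence of body text. It assumes the crawled pages have already been reduced to plain title and body strings (the crawling and HTML boilerplate removal are not shown), and the function and field names are illustrative rather than the exact pipeline used to build the dataset.

    import nltk  # requires the "punkt" sentence tokenizer model to be installed

    def keep_article(title: str, body: str, min_sentences: int = 2) -> bool:
        """Mirror the dataset rule: drop pages with only pictures or a single sentence."""
        sentences = nltk.sent_tokenize(body)
        return bool(title.strip()) and len(sentences) >= min_sentences

    # crawled = [{"url": "...", "title": "...", "text": "..."}, ...]  # hypothetical crawl output
    # dataset = [p for p in crawled if keep_article(p["title"], p["text"])]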
3.2    Bias Labeling via Crowd-Sourcing

To overcome the scalability issue in annotation, crowd-sourcing has been widely used [FMK+10, ZLP+15]. We also use crowd-sourcing to collect bias labels, and we choose Figure Eight3 as our platform.
Figure Eight (called CrowdFlower until March 2018) has been used in a variety of annotation tasks and is especially suitable for our purposes due to its focus on producing high-quality annotations. We note that it is difficult to obtain bias-related label information, such as binary judgements on each sentence of a news article, as the bias may depend on the news event and its context. To design the bias labeling task, we divided the news dataset into one reference news article4 and 88 target news articles. Having a reference news article, users could first get familiar with the overall event. Furthermore, the motivation was to have some reference text which, being relatively bias-free, allows for detecting biased content in a target article. Our reference article has been selected after being manually judged as relatively unbiased by several annotators.

We let the workers make judgements on each target news article (using also the reference news article). Each article has been independently annotated by 5 workers. In order to ensure a high-quality labeling, we produced various test questions to filter out low-quality answers. To create reliable answers to our test questions, we conducted a preliminary labeling task on a set of five randomly selected news articles from the same news collection, plus the same reference news article used for comparison. Nine graduate students (male: 6, female: 3) labeled bias-inducing words in these news articles. The words which had been labeled as "bias-inducing" by at least two people were considered as "biased" in general and served as ground truth for our test questions.

The instructions and main questions given to the workers in the crowdsourcing tasks and to the annotators in the preliminary task can be summarized as follows:
 1. Read the target news article and the reference news article.
 2. Check the degree of bias of the target news article by comparing it with the reference news article.
       • not at all biased, slightly biased, fairly biased, strongly biased.
 3. Select and submit words or phrases which cause the bias, compared to the reference news article.
       • Submit words or phrases with the line identifier.
       • Try to submit as short as possible content and don't submit whole paragraphs.
       • If no bias-inducing words are found, submit "none".
 4. Select your level of understanding of the news story.
       • four-scale ratings from "I didn't understand at all." to "I understood well."

In total, 60 workers participated in the task. We only used the answers from the 25 reliable workers who passed at least 50% of the test questions. Overall, for the 88 documents, we collected 2,982 bias words (1,647 unique words) covered by 1,546 non-overlapping annotations.

  3 https://www.figure-eight.com/
  4 https://reut.rs/2ve3rMz

3.3    Analysis of Perceived News Bias

We next analyze what kind of words are tagged as bias triggers by the workers. First, we analyze the phrases annotated as biased in terms of their word length. Each annotation consists of four words on average (examples being "did absolutely nothing wrong", "putting them in handcuffs", "racism and racial profiling", "merely for their race", and "Starbucks manager was white"). Most answers submitted by workers are, however, single words, for example, "accuse", "absurd", "boycott", "discrimination", and "outrage". These examples also show a tendency towards negative sentiment and rather extreme, emotion-related words, which could be extracted almost without considering the context. As the second most frequent phrase pattern, three-word spans have been annotated, such as "absolutely nothing wrong", "accusations of racism", "black men arrested", "who is black", and "other white ppl". These are typical combinations of sentiment words and modifiers or intensifiers. Such sentiment words (with positive or negative polarity) are typically associated with the overall topic or event and can also be considered as outstanding or salient to some degree.

We aggregated the answers of the crowd-workers on the sentence level, assuming that if a sentence includes any word annotated as biased, the sentence itself is biased. Note that the information on sentence-level bias might be enough for the purpose of automatic bias detection. However, we let users annotate the specific bias-inducing phrases, since this lets us gain a fine-grained insight into the actual thoughts of users, allows us to choose appropriate machine learning features for bias-detection algorithms, and lets us show concrete evidence of bias-inducing aspects in the texts to users. Table 1 shows the statistics of the dataset and the labeled results. Agreement level n denotes that only annotations tagged by at least n people are considered. When we only consider the unique, i.e., fused answers from the workers, among the 1,235 sentences in the whole data set, 826 sentences (66.88%) included bias-annotated words. On average, 73.48% of the sentences in an article would then be considered potentially biased. Yet, assuming an agreement of 2 workers, the average share of biased sentences is 34.9%, while for n = 3 the corresponding number is 14.01%. These statistics reveal that different people consider different words as representing biased content.

            Table 1: Statistics of Labeled Sentences

  Total number of news articles                    88
  Total number of sentences                        1,235
  Average tagged sentences per news article        73.48%
  No. of sentences including tagged words          826 (66.88%)
  No. of tagged sentences at agreement level 2     431 (34.90%)
  No. of tagged sentences at agreement level 3     173 (14.01%)
  No. of tagged sentences at agreement level 4     42 (3.40%)
  No. of tagged sentences at agreement level 5     7 (0.57%)
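The sentence-level statistics in Table 1 follow mechanically from the word-level annotations. A minimal sketch of this aggregation is given below; it assumes each annotated word or phrase has already been mapped to the index of the sentence containing it, and the data layout (worker id, sentence id pairs) is a simplification of the actual annotation records.

    from collections import defaultdict

    def sentences_at_agreement(annotations, min_workers=1):
        """Return sentence ids tagged as biased by at least `min_workers` distinct workers.

        `annotations` is an iterable of (worker_id, sentence_id) pairs, one per
        bias-annotated word or phrase.
        """
        workers_per_sentence = defaultdict(set)
        for worker_id, sentence_id in annotations:
            workers_per_sentence[sentence_id].add(worker_id)
        return {sid for sid, workers in workers_per_sentence.items()
                if len(workers) >= min_workers}

    # Agreement level 1 corresponds to "any worker tagged a word in the sentence",
    # agreement level 2 to "at least two workers did", and so on:
    # biased_level2 = sentences_at_agreement(annotations, min_workers=2)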
Inter-rater agreement. We next investigated the inter-rater agreement among the five workers' answers for each target news article. We calculated Krippendorff's alpha and pairwise Jaccard similarity coefficients. Krippendorff's alpha is used for quantifying the extent of agreement among multiple raters, and the Jaccard similarity is mainly used for comparing the similarity between two sets. Here, we regard each sentence in a target news article as an item to be measured. The mean scores calculated over all the target articles are 0.513 for Krippendorff's alpha and 0.222 for Jaccard, as also shown in Figure 1. The agreement scores are relatively low, which means the answers from the five workers are diverse and only in slight agreement. In practice, it is hard to get substantial agreement on news articles in general [NR10]. This may have several reasons in our case: Firstly, the degree of perception concerning bias differs from person to person. Secondly, the answer coverage by people is different and imperfect. For example, some people might feel it is enough to submit around five different answers on a target news article, while others might try to find as many pieces of evidence of biased content as possible. It is then hard to decide whether the differences stem from insincerity of individuals or are a matter of their perception.

[Figure 1: Inter-rater reliability on the crowdsourcing result: (a) Krippendorff's alpha, (b) pairwise Jaccard.]
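For completeness, both agreement measures can be computed per article from the binary sentence labels as sketched below. The pairwise Jaccard part compares the biased-sentence sets of all worker pairs; the alpha part is a small, self-contained implementation of Krippendorff's alpha for nominal data written from the standard coincidence-matrix formulation, not the exact code used for the reported numbers.

    from collections import Counter
    from itertools import combinations

    def mean_pairwise_jaccard(worker_sentence_sets):
        """Average Jaccard similarity over all pairs of workers' biased-sentence sets."""
        scores = []
        for a, b in combinations(worker_sentence_sets, 2):
            union = a | b
            scores.append(len(a & b) / len(union) if union else 1.0)
        return sum(scores) / len(scores)

    def krippendorff_alpha_nominal(ratings_per_item):
        """Krippendorff's alpha for nominal labels.

        `ratings_per_item` holds one list of labels per sentence, e.g.
        [[1, 1, 0, 0, 1], [0, 0, 0, 0, 0], ...] for five workers.
        """
        observed = 0.0       # summed within-item disagreement (off-diagonal coincidences)
        totals = Counter()   # label counts over all pairable ratings
        for labels in ratings_per_item:
            m = len(labels)
            if m < 2:
                continue
            counts = Counter(labels)
            observed += (m * m - sum(c * c for c in counts.values())) / (m - 1)
            totals.update(labels)
        n = sum(totals.values())
        expected = n * n - sum(c * c for c in totals.values())
        return 1.0 - (n - 1) * observed / expected if expected else 1.0

    # Per-article scores are averaged over all 88 target articles to obtain the reported means.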
Analysis of POS tags. We investigated the part-of-speech tags included in the sentences. The Stanford POS Tagger [TKMS03] was employed in this process. To that end, we considered different agreement levels, i.e., the minimum number of users who tagged words as biased in the same sentence. We conducted a t-test between the bias-tagged sentences and the non-tagged sentences. Table 2 shows the statistically significant POS tags under the p-value < 0.001.

    Table 2: POS Feature Effects by t-test in Each Agreement Level5

  Agreement Level                              1         3          5
  Cardinal number (CC)                         5.19      4.0554
  Determiner (DT)                              4.87                 -4.4403
  Existential there (EX)                       3.81                 -6.9333
  Preposition/subordinating conjunction (IN)   7.63      3.4378
  Adjective (JJ)                               9.2987    3.4507
  Adjective, superlative (JJS)                                      -7.6947
  Noun (NN)                                    7.5422
  Noun, plural (NNS)                           5.3969
  Predeterminer (PDT)                          3.7788               -8.7549
  Adverb                                       5.3142
  Adverb, superlative (RBR)                              -3.4822    -3.4797
  Particle                                     5.6674               -11.969
  Verb, past tense (VBD)                       6.5408
  Verb, gerund/present participle (VBG)        7.4645    3.3702
  Verb, past participle (VBN)                  8.2355    4.0162     -2.6979
  Verb, 3rd ps. sing. present (VBZ)            6.1593    3.713
  Wh-pronoun (WP)                              5.4197    2.4701
  Wh-adverb (WRB)                                                   -15.243

  5 Only significant results are shown (p < 0.001).
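A rough sketch of this comparison is shown below, using NLTK's default Penn Treebank tagger as a stand-in for the Stanford POS Tagger and SciPy's two-sample t-test; the same procedure carries over to the lexicon-based features of Table 3 by swapping the per-sentence counts. The tag names and test settings are assumptions, not the paper's exact configuration.

    import nltk                 # requires the "punkt" and default POS tagger models
    from scipy import stats

    def tag_count(sentence: str, tag: str) -> int:
        """Count occurrences of one Penn Treebank POS tag in a sentence."""
        tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
        return sum(1 for _, t in tagged if t == tag)

    def pos_ttest(biased_sentences, other_sentences, tag="JJ"):
        """Two-sample t-test on the per-sentence frequency of a POS tag."""
        biased = [tag_count(s, tag) for s in biased_sentences]
        other = [tag_count(s, tag) for s in other_sentences]
        return stats.ttest_ind(biased, other)

    # t, p = pos_ttest(tagged_level1, untagged)   # keep only tags with p < 0.001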
Analysis of further linguistic features. We also investigated words using the linguistic categories proposed by [RDJ13], including sentiment, subject/object, verb types, named entities and so on. In Table 3, we observe that the most significant word category is negative subject words at agreement level 1. Weak subject words and negative words are also shown to be significant. We believe this result arises because our news event is controversial and related to an arrest; therefore, many negative words affect the bias perception of users. Interestingly, factive verbs do not show any significant difference.

    Table 3: Linguistic Feature Effects by t-test in Each Agreement Level5

  Agreement Level             1         3          5
  Factive verb                                     -10.154
  Assertive verb                        -3.2339    -4.3784
  Implicative verb                      -3.7975
  Entailment                            -2.7975
  Weak subject word           5.5862    4.917
  Negative word               7.5961    5.6002
  Bias lexicon                          -2.9986
  Named entity                          3.375
  Negative subject words      9.7921    8.2414

For preliminary experiments, we next use the POS tags and the mentioned linguistic features for approaching the task of automatically detecting bias. We employ a standard SVM model, using a randomly selected 80% of the sentences for training the model and the remaining 20% of the sentences for testing. The classification accuracy is 70%. As our data set is primarily designed for linguistic analysis, larger numbers of training/test examples are needed for obtaining more reliable evaluation results.
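A minimal sketch of such a classifier with scikit-learn is given below; it assumes the sentences have already been turned into numeric feature vectors (POS-tag and lexicon counts) with binary bias labels, and the kernel choice and lack of tuning are assumptions rather than the paper's exact setup.

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    def train_bias_svm(X: np.ndarray, y: np.ndarray, seed: int = 0):
        """Train a standard SVM on a random 80/20 split and return held-out accuracy."""
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.2, random_state=seed)
        clf = SVC()    # default RBF kernel; the exact model settings are an assumption
        clf.fit(X_train, y_train)
        return clf, clf.score(X_test, y_test)

    # X: (n_sentences, n_features) feature counts, y: 0/1 sentence bias labels
    # clf, accuracy = train_bias_svm(X, y)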
Further extensions. We analyzed the bias in news sentences as perceived by people using crowdsourcing. In this research, we used a news event that occurred within a short time period. Thus, users do not need to spend much time to understand the context of the news event. However, in the case of a long-lasting news event, the news topic tends to be complicated or to consist of many sub-events, and there might be many aspects to be aware of. For example, politics-related news events typically have a long time span: when they cover elections, reports on the actions of candidates appear in the weeks beforehand. For detecting and/or minimizing news bias under such more complex situations, an alternative strategy for obtaining a reasonable ground truth concerning news bias might be to focus on credibility aspects and to target the recommendation of citations to clearly and formally stated facts and/or events, such as those in existing knowledge bases.

4    Conclusions and Future Works

Detecting news bias is a challenging task for computer science as well as for linguistics and media research due to the subtle nature and the heterogeneous, diverse kinds of biases. In this paper, we set up a crowdsourcing task to annotate news articles with respect to bias-inducing words. We then analyzed features of the annotated words based on different user agreement levels. Based on the results, we draw the following conclusions:
  1. Generally, it is hard to reach an agreement among users concerning biased words or sentences.
  2. According to the results, it is reasonable to focus on linguistic features, such as negative words, negative subjective words, etc., for detecting bias on the word level. This also means that, for shallow bias detection, capturing the context, such as having semantically-structured representations of statements or sentences, might not be needed.
  3. Our experiments on the characteristics of bias-inducing words indicate that presenting readers with bias-inducing words (e.g., by highlighting them in the text) is still worthwhile to pursue in the future.
  4. A deeper analysis of bias in the news is needed. Current efforts, such as the SemEval 2019 Task 4 ("Hyperpartisan News Detection")6, can be seen as first steps in this direction. More generally, we argue that we need novel ways to measure the actual bias of news (and other texts). This could be achieved by measuring the effect of article reading, not only by asking readers before and after the reading about their opinion on the topic/event, but also by correlating the read news with actions, such as the votes of readers in upcoming elections.

  6 https://pan.webis.de/semeval19/semeval19-web/

Acknowledgments This research was supported in part by MEXT grants (#17H01828; #18K19841; #18H03243).

References

[ABHK08] Karel Jan Alsem, Steven Brakman, Lex
         Hoogduin, and Gerard Kuper. The impact
         of newspapers on consumer confidence:
         does spin bias exist? Applied Economics,
         40(5):531–539, 2008.

[ACG+12] Jisun An, Meeyoung Cha, Krishna P Gum-
         madi, Jon Crowcroft, and Daniele Quercia.
         Visualizing media bias through Twitter. In
         Proc. of ICWSM SocMedNews Workshop,
         2012.

[ADS17]  Héctor Martínez Alonso, Amaury Dela-
         maire, and Benoît Sagot. Annotating
         omission in statement pairs. In Proc. of
         LAW@EACL 2017, pages 41–45, 2017.

[Ben16]  W Lance Bennett. News: The politics of
         illusion. University of Chicago Press, 2016.

[BEQ+15] Eric Baumer, Elisha Elovic, Ying Qin,
         Francesca Polletta, and Geri Gay. Testing
         and comparing computational approaches
         for identifying the language of framing in
         political news. In Proc. of NAACL HLT
         2015, pages 1472–1482, 2015.

[BES10]  Stefano Baccianella, Andrea Esuli, and
         Fabrizio Sebastiani. SentiWordNet 3.0: An
         Enhanced Lexical Resource for Sentiment
         Analysis and Opinion Mining. In Proc. of
         LREC 2010, 2010.

[DA00]   Dave D'Alessio and Mike Allen. Media bias
         in presidential elections: A meta-analysis.
         Journal of Communication, 50(4):133–156,
         2000.

[FMK+10] Tim Finin, William Murnane, Anand
         Karandikar, Nicholas Keller, Justin Mar-
         tineau, and Mark Dredze. Annotating
         Named Entities in Twitter Data with
         Crowdsourcing. In Proc. of CSLDAMT'10,
         pages 80–88, 2010.
[GM05]     Tim Groseclose and Jeffrey Milyo. A mea-
           sure of media bias. The Quarterly Journal
           of Economics, 120(4):1191–1237, 2005.

[GS06]     Matthew Gentzkow and Jesse M Shapiro.
           Media bias and reputation. Journal of Po-
           litical Economy, 114(2):280–316, 2006.
[HMG17]    Felix Hamborg, Norman Meuschke, and
           Bela Gipp. Matrix-Based News Aggrega-
           tion: Exploring Different News Perspec-
           tives. In Proc. of JCDL 2017, pages 69–78,
           2017.
[NR10]     Stefanie Nowak and Stefan M. Rüger. How
           reliable are annotations via crowdsourcing:
           a study about inter-annotator agreement
           for multi-label image annotation. In Proc.
           of MIR 2010, pages 557–566, 2010.
[OMY11]    Tatsuya Ogawa, Qiang Ma, and Masatoshi
           Yoshikawa. News bias analysis based on
           stakeholder mining. IEICE Transactions,
           94-D(3):578–586, 2011.
[PKCS09] Souneil Park, Seungwoo Kang, Sangyoung
         Chung, and Junehwa Song. NewsCube:
         delivering multiple aspects of news to mit-
         igate media bias. In Proc. of SIGCHI
         on Human Factors in Computing Systems,
         pages 443–452, 2009.
[RDJ13]    Marta Recasens, Cristian Danescu-
           Niculescu-Mizil,   and Dan Jurafsky.
           Linguistic models for analyzing and de-
           tecting biased language. In Proc. of ACL
           2013, volume 1, pages 1650–1659, 2013.
[TKMS03] Kristina Toutanova, Dan Klein, Christo-
         pher D. Manning, and Yoram Singer.
         Feature-Rich Part-of-Speech Tagging with
         a Cyclic Dependency Network. In Proc. of
         HLT-NAACL 2003, pages 173–180, 2003.
[ZLP+ 15] Arkaitz Zubiaga, Maria Liakata, Rob
          Procter, Kalina Bontcheva, and Peter
          Tolmie. Crowdsourcing the annotation of
          rumourous conversations in social media.
          In Proc. of WWW 2015, pages 347–353,
          2015.