=Paper= {{Paper |id=Vol-2079/paper2 |storemode=property |title=Visualizing Polarity-based Stances of News Websites |pdfUrl=https://ceur-ws.org/Vol-2079/paper2.pdf |volume=Vol-2079 |authors=Masaharu Yoshioka,Myungha Jang,James Allan,Noriko Kando |dblpUrl=https://dblp.org/rec/conf/ecir/YoshiokaJAK18 }} ==Visualizing Polarity-based Stances of News Websites== https://ceur-ws.org/Vol-2079/paper2.pdf
     Visualizing Polarity-based Stances of News Websites

                      Masaharu Yoshioka                Myungha Jang James Allan
                      Hokkaido University                    UMass Amherst
                  Sapporo-shi, Hokkaido, Japan              Amherst, MA, USA
                   yoshioka@ist.hokudai.ac.jp          {mhjang, allan}@cs.umass.edu
                                            Noriko Kando
                                National Institute of Informatics (NII)
                                     Chiyoda-ku, Tokyo, Japan
                                           kando@nii.ac.jp



                       Abstract

We develop a novel framework that helps identify potential bias in news
websites, to support users who are exposed to news articles with a wide
variety of political leanings. We propose a polarity-based stance (PS), a
vector that represents how often a website publishes articles that are
positive or negative with regard to a topic. We derive PS using the GDELT
database and visualize the news websites' stances. We demonstrate the
utility of our framework via a case study of the 2016 US Presidential
Election.

    Copyright © 2018 for the individual papers by the papers' authors.
Copying permitted for private and academic purposes. This volume is
published and copyrighted by its editors.
    In: D. Albakour, D. Corney, J. Gonzalo, M. Martinez, B. Poblete,
A. Vlachos (eds.): Proceedings of the NewsIR'18 Workshop at ECIR, Grenoble,
France, 26-March-2018, published at http://ceur-ws.org

1    Introduction

There are two types of users when it comes to their pattern of news
navigation. The first type already has particular news websites that they
trust and actively use by accessing them directly for news. Such websites
tend to demonstrate the same political stances or leanings as their users.
As a result, the articles that they read are likely ones that already share
their ideologies. The other type, those who are less politically engaged,
use a news aggregation website that shows a compiled list of news articles
from various sources. A key difference between the two approaches is that
the latter type of user, because content is the primary factor in selecting
articles, is exposed to news from more diverse sources, which demonstrate a
wider array of political stances. Users must therefore use their own
judgment to selectively digest what they read, especially for controversial
topics.

   Many users judge the trustworthiness of news websites based on their
political bias. Hence, we propose a novel framework that represents the bias
of news websites toward a particular topic as a vector. Using this
framework, we then visualize the stances of news websites toward a given
topic. To this end, we define a polarity-based stance, a vector that
represents a website's bias toward a particular topic using the polarity of
its articles' stances. This allows us to visualize the stance of news
websites, alerting users to the potential bias of the articles published by
the websites. We demonstrate the usefulness of our framework via a case
study of the 2016 US Presidential Election using the GDELT database^1.

    ^1 https://www.gdeltproject.org

2    Polarity-based Stances

We formally define a polarity-based stance, $\overrightarrow{PS}_w$, as a
two-dimensional vector that denotes the stance of a website $w$. We first
assume that each article of the website has one of three stances: positive,
negative, or neutral. We let $\overrightarrow{PS}_w = [p, n]$, where $p$ is
the ratio of positively-stanced articles and $n$ is the ratio of
negatively-stanced articles for a particular topic. Note that this
definition assumes the stance of each article has been identified
beforehand. We discuss below how to use the GDELT database to derive this
vector.
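As a minimal sketch of this definition, the following Python snippet computes [p, n] for one website from stance labels that are assumed to be available already. The function name `polarity_stance`, the label encoding (1 for positive, 0 for neutral, -1 for negative), and the example counts are illustrative assumptions rather than part of the framework itself.

```python
from collections import Counter


def polarity_stance(stances):
    """Return [p, n]: the shares of positive and negative articles.

    `stances` holds one label per article on the topic:
    1 (positive), 0 (neutral), or -1 (negative).
    """
    if not stances:
        raise ValueError("need at least one article")
    counts = Counter(stances)
    total = len(stances)
    return [counts[1] / total, counts[-1] / total]


# Hypothetical website: 2 positive, 5 negative, and 3 neutral articles.
labels = [1] * 2 + [-1] * 5 + [0] * 3
print(polarity_stance(labels))  # [0.2, 0.5]
```

Neutral articles count toward the total but toward neither component, so p + n can be smaller than 1.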
2.1   Dataset

The GDELT database is one of the largest news article repositories collected
by the Google Jigsaw project. It is a useful resource for multifaceted
analysis of news articles because it holds a large amount of data and
contains metadata, including the source website, that are automatically
extracted from the crawled articles by various NLP algorithms [YK16].

   We use tone, one type of automatically generated metadata, to derive
$\overrightarrow{PS}_w$. Tone refers to the average attitude of the article,
which is computed as the difference between the percentages of positive and
negative terms in the document [Pro15]. Calculating a polarity score by term
matching is simplistic, and a more sophisticated methodology would be
preferable [RR15]. However, because of the large number of articles to
analyze, it is almost impossible for GDELT users to crawl the full text of
all the articles and calculate scores for them. For the case study below, we
use articles from the GDELT database published during a three-month period
of the 2016 US Presidential Election that includes voting day (see Table 1).

Table 1: Description of the article dataset from the GDELT database used.

 Period               Sep 1, 2016 - Nov 30, 2016
 # of Articles        22.4M (0.2M per day)
 # of News Websites   44,624

2.2    Deriving Polarity-based Stances

We compute $\overrightarrow{PS}_w$ using the tone score provided by the
GDELT database. Let $d$ be a news article published by a news website $w$
and $t$ be the tone of $d$. We classify the document stance $s_d$ into one
of three classes: positive (1), neutral (0), and negative (-1). The stance
is derived from $t$ given a threshold $\sigma$ using the equation

\[
s_d =
\begin{cases}
 1  & t > \sigma \\
 0  & -\sigma < t < \sigma \\
 -1 & t < -\sigma
\end{cases}
\tag{1}
\]

   We then define a polarity-based stance ($\overrightarrow{PS}_w$) for a
website ($w$) using the equation

\[
\overrightarrow{PS}_w(\tau) =
\left(
 \frac{\sum_{d \in w_\tau} \mathbf{1}[s_d = 1]}{|w_\tau|},\;
 \frac{\sum_{d \in w_\tau} \mathbf{1}[s_d = -1]}{|w_\tau|}
\right)
\tag{2}
\]

where $w_\tau$ is the set of articles on $\tau$ published by $w$ and
$|w_\tau|$ is the number of such articles. By plotting these stances on a
graph, users can compare the stances of different news websites.

   In addition, bias can be identified by comparing the stances on similar
topics, or by comparing the stance on a particular topic with the stance on
a general topic.

3    Case Study

We demonstrate the utility of our approach via a case study of the 2016 US
Presidential Election around two topics: Donald Trump and Hillary Clinton.
To visualize the polarity-based stances for these topics, we estimate the
set of news articles on each topic using a simple Boolean query. When an
article references both Trump and Clinton, it is ambiguous which topic the
tone refers to. We therefore compute the polarity-based stances from the set
of articles that reference exactly one of the two topics (see Table 2).

Table 2: Numbers of articles for the Boolean queries "Donald Trump" (DT) and
"Hillary Clinton" (HC). Numbers in parentheses give the total number of
articles that contain DT and HC, respectively.

 Query       # of articles
 DT - HC     677,307 (1,516,225)
 HC - DT     388,162 (1,227,080)
 DT and HC   838,918

   Table 3 shows the distribution of tone (from -100 to 100), as article
counts, for the articles retrieved by the DT - HC and HC - DT queries. For
both queries, articles with negative tone outnumber those with positive
tone, but the difference is generally not large^2. We therefore set
$\sigma = 1$ in Equation 1 for this experiment. How $\sigma$ affects the
final results should be examined in future research.

    ^2 Most of the articles have tone values between -3 and 3
       (DT - HC: 69%, HC - DT: 73%).

Table 3: Distribution of tone (as number of articles) in the retrieved
articles.

 Tone          DT - HC    HC - DT
 [-100, -3]    188,709     89,283
 (-3, -2]       95,665     51,781
 (-2, -1]      109,575     67,006
 (-1, 0]       123,231     74,878
 (0, 1)         65,554     42,080
 [1, 2)         46,999     31,618
 [2, 3)         23,528     15,510
 [3, 100]       24,046     16,006

   Figures 1 and 2 show scatter plots of the polarity-based stances of
various news websites for the Trump and Clinton topics. In these plots, we
include news websites that published more than 30 articles on the particular
topic. Each circle indicates a news website, with a radius that signifies
the number of articles. The top 20 news websites that published the most
articles exclusively on Trump and Clinton are indicated by colored circles.
Note that a news website with a small number of articles is shown as a
point.

   To visualize the bias of the websites (toward Trump or Clinton), we plot
the difference between the positive and negative article ratios for Trump
and Clinton in Figure 3. We let Diff($\tau$) be the difference between the
two components of $\overrightarrow{PS}_w(\tau)$, positive minus negative,
and plot Diff(Trump) against Diff(Clinton) for comparison. The websites
whose bias towards the two topics is the same are plotted on the line
Diff(Trump) = Diff(Clinton). The points at the top left of the plot are the
websites that are positively-stanced towards Clinton, and the ones at the
bottom right are positively-stanced towards Trump. The plot helps us
identify the news websites whose polarity-based stances are completely
different between the two topics. For example, thebostonpilot.com has
(0.15, 0.27) for "Trump" and (0.16, 0.65) for "Clinton", and
sci-tech-today.com has (0.02, 0.90) for "Trump" and (0.41, 0.25) for
"Clinton". It is important to take such bias into account when the
difference is this large.
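The steps of this case study can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the record layout (`site`, `text`, `tone`), the substring matching that stands in for the Boolean query, and all function names are hypothetical. Tones exactly at plus or minus σ are treated as neutral because Equation 1 leaves them unspecified, the negative case is read as t < -σ, the ratios of Equation 2 are normalized by the number of on-topic articles, and Diff(τ) is taken as positive minus negative, the reading suggested by the -1 to 1 axes of Figure 3.

```python
from collections import defaultdict


def stance(tone, sigma=1.0):
    """Equation 1: map a tone score to 1, 0, or -1.

    Assumption: tones exactly at +/-sigma are treated as neutral.
    """
    if tone > sigma:
        return 1
    if tone < -sigma:
        return -1
    return 0


def exclusive_topic(text):
    """Hypothetical Boolean query: the topic if exactly one name appears."""
    has_dt = "Donald Trump" in text
    has_hc = "Hillary Clinton" in text
    if has_dt and not has_hc:
        return "Trump"
    if has_hc and not has_dt:
        return "Clinton"
    return None  # mentions both or neither: excluded


def polarity_stances(articles, sigma=1.0, min_articles=1):
    """Equation 2 per (site, topic): (positive ratio, negative ratio)."""
    labels = defaultdict(list)
    for art in articles:
        topic = exclusive_topic(art["text"])
        if topic is not None:
            labels[(art["site"], topic)].append(stance(art["tone"], sigma))
    return {
        key: (s.count(1) / len(s), s.count(-1) / len(s))
        for key, s in labels.items()
        if len(s) >= min_articles
    }


def diff(ps):
    """Positive ratio minus negative ratio, a value in [-1, 1]."""
    p, n = ps
    return p - n


# Made-up records; real tone values would come from the GDELT metadata.
articles = [
    {"site": "a.example", "text": "Donald Trump rally", "tone": 2.5},
    {"site": "a.example", "text": "Donald Trump speech", "tone": -4.0},
    {"site": "a.example", "text": "Hillary Clinton visit", "tone": -1.5},
    {"site": "a.example", "text": "Hillary Clinton plan", "tone": 0.3},
    {"site": "a.example",
     "text": "Donald Trump and Hillary Clinton debate", "tone": -2.0},
]
ps = polarity_stances(articles)
for topic in ("Trump", "Clinton"):
    key = ("a.example", topic)
    print(topic, ps[key], diff(ps[key]))
```

Passing `min_articles=31` would mirror the filter of more than 30 articles per topic used for the plots; the default of 1 only keeps the toy example small.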
Figure 1: The polarity-based stances of the Trump topic visualized in a
scatter plot (x-axis: positive article ratio (Trump); y-axis: negative
article ratio (Trump); both from 0 to 1).

Figure 2: The polarity-based stances of the Clinton topic visualized in a
scatter plot (x-axis: positive article ratio (Hillary); y-axis: negative
article ratio (Hillary); both from 0 to 1).

Figure 3: Diff(Trump) and Diff(Clinton) to compare their polarity-based
stances (x-axis: positive - negative (Trump); y-axis: positive - negative
(Hillary); both from -1 to 1).

Legend of Figures 1-3 (websites shown as colored circles): iheart.com,
yahoo.com, freerepublic.com, ap.org, reuters.com, newsviewsnreviews.com,
wn.com, washingtonpost.com, dailymail.co.uk, alltechnews.org,
huffingtonpost.com, avauncer.com, bloomberg.com, washingtonexaminer.com,
einnews.com, sfgate.com, foxnews.com, contacto-latino.com, chron.com,
princegeorgecitizen.com.

4    Conclusion

In this paper, we propose a framework that visualizes the stances of news
websites along the dimensions of polarity, to identify potential bias in the
articles they publish. We define a vector named Polarity-based Stance and
demonstrate its utility via a case study of the 2016 U.S. Presidential
Election, which also shows that the GDELT database is a useful resource for
this type of analysis. As future work, we plan to apply our framework to a
variety of topics for evaluation. We observe that some topics generally have
a higher ratio of positive or negative articles than others. We plan to
study how to take this factor into account to visualize stances in a useful
way.

Acknowledgment

This work was partially supported by JSPS KAKENHI Grant Number 16H01756.

References

[Pro15] GDELT Project. The GDELT global knowledge graph (GKG) data format
        codebook v2.1, 2015.

[RR15]  Kumar Ravi and Vadlamani Ravi. A survey on opinion mining and
        sentiment analysis: Tasks, approaches and applications.
        Knowledge-Based Systems, 89:14-46, 2015.

[YK16]  Masaharu Yoshioka and Noriko Kando. Comparative analysis of GDELT
        data using the news site contrast system. In The First International
        Workshop on Recent Trends in News Information Retrieval (NewsIR),
        2016.