=Paper=
{{Paper
|id=Vol-2068/esida3
|storemode=property
|title=Searching for Diverse Perspectives in News Articles: Using an LSTM Network to Classify Sentiment
|pdfUrl=https://ceur-ws.org/Vol-2068/esida3.pdf
|volume=Vol-2068
|authors=Christopher Harris
|dblpUrl=https://dblp.org/rec/conf/iui/Harris18
}}
==Searching for Diverse Perspectives in News Articles: Using an LSTM Network to Classify Sentiment==
Christopher Harris
Department of Computer Science, University of Northern Colorado, Greeley, CO 80639

ABSTRACT
When searching for emerging news on named entities, many users wish to find articles containing a variety of perspectives. Advances in sentiment analysis, particularly by tools that use Recurrent Neural Networks (RNNs), have made impressive gains in accuracy on NLP tasks such as sentiment analysis. Here we describe and implement a special type of RNN called a Long Short Term Memory (LSTM) network to detect and classify sentiment in a collection of news articles. Using an interactive query interface created expressly for this purpose, we conduct an empirical study in which we ask users to classify sentiment on named entities in articles, and then we compare these sentiment classifications with those obtained from our LSTM network. We compare this sentiment across the articles that mention the named entity in a collection of news articles. Last, we discuss how this analysis can identify outliers and help detect fake news articles.

Author Keywords
Sentiment analysis; RNN; LSTM; named entities; artificial neural networks; news analysis; fake news.

ACM Classification Keywords
I.5.1 [Pattern Recognition]: Models → Neural nets; I.2.7 [Artificial Intelligence] → Natural Language Processing; H.3.3 [Information systems] → Information retrieval diversity.

INTRODUCTION
Named entities, which we define as information units such as person, organization, and location names, are extremely popular components of user queries. For example, Yin and Shah found that nearly 30% of searches on the Bing search engine were simply a named entity and 71% of searches contained a named entity as part of the query string [13]. Thus, the proper identification and handling of named entities is essential to provide an excellent search experience.

A growing number of voices claim bias in reporting from media sources, particularly (but not limited to) reporting on named entities in politics and entertainment. News articles covering the same named entity can be reported from a variety of perspectives, some sympathetic to the subject while others are far less so – a phenomenon widely noted during two 2016 events: the U.K. Brexit vote and the U.S. elections. However, there are ways to evaluate and categorize this variation in reporting. Sentiment analysis, which has been widely applied to classifying movie and product reviews, could also be applied to the sentiment used in reporting news articles, particularly those that focus on a specific named entity. Although early approaches in sentiment analysis suffered from poor accuracy, recent advances – particularly applying deep learning techniques such as Recurrent Neural Networks (RNNs) – have increased its accuracy and can even distinguish the sentiment between different named entities when an article contains references to more than one entity.

It is important for search systems to work with named entities in both informal text (e.g., blog posts) and formal text (e.g., news articles). To this end, it is also important to distinguish these different types of sources to the user. When information on a named entity appears from a verified news source, it carries a different weight (in terms of authenticity) from a blog posting from a non-expert; the user should be made aware of this provenance in the search results and be able to filter the search results based on the verifiability of the news.
With the rise of social media as a user's primary news source [9], misleading news articles called fake news have clouded many users' ability to determine if a news article has merit or if it is a deliberate attempt to misinform and spread a hoax. Recently, more attention from the NLP community has been placed on identifying fake news, which we define as propaganda disguised as real news that is created to mislead readers and damage a person's, an agency's, or an entity's reputation.

A study conducted following the 2016 election found that 64% of adults indicated that fake news articles caused a great deal of confusion and 23% said they had shared fabricated articles themselves – sometimes by mistake and sometimes intentionally [3]. We believe that sentiment analysis, when done properly, can be used to separate news from genuine sources from fake news. We explore this concept briefly in this paper.

© 2018. Copyright for the individual papers remains with the authors. Copying permitted for private and academic purposes. ESIDA '18, March 11, Tokyo, Japan.

BACKGROUND AND MOTIVATION
Performing queries and obtaining news articles are tasks that rank only behind sending email as the most common internet activities, with 91% and 76% of users reportedly engaging in these activities, respectively [10]. Overall, the internet has grown in importance as a source of information and news on named entities. As of August 2017, 43% of Americans report often obtaining their news online, quickly approaching the 50% who often obtain news by television. This 7% gap has narrowed considerably from the 19% gap between the two sources found only 18 months earlier [5].

The Role of Social Media
Social media platforms such as Facebook and Twitter have transformed how news is created and disseminated. News content on any named entity can be spread among users without substantial third-party filtering, fact-checking, or editorial judgment. It is now possible for a non-expert user with no prior reputation on a news topic to reach as many readers as verified sources such as the Washington Post, CNN, or the BBC [1].

With social media, unsurprisingly, users tend to communicate with others having a similar political ideology, affecting their ability to gain a balanced perspective. Of the Facebook articles involving national news, politics, or world affairs, only 24% of liberals and 35% of conservatives have exposure to other perspectives through shares on social media [2]. Therefore, most social media users who wish to gain a different perspective on a named entity require a convenient yet customizable interface to search these articles and view information on these different perspectives.

Sentiment Analysis
News articles shared on social media are often used to incite affective behavior in readers [7] and are ideal for sentiment classification. Sentiment analysis is an area of Natural Language Processing (NLP) that examines and classifies the affective states and subjective information about a topic or entity. The research question we wish to examine is how well machine-classified sentiment is correlated with the sentiment as determined by users (which we set as our ground truth). We do this by looking at the subjectivity/objectivity, the polarity, and the magnitude of sentiment in the text of the article at the sentence level, while keeping track of contextual issues such as anaphora resolution. By creating a two-dimensional vector to represent the sentiment for each named entity in each sentence (see Figure 1), we can create an overall vector to match this to the overall sentiment of the article. In Figure 1, the blue lines represent the boundaries between the classifications of sentiment, from very negative to very positive. Note that some of the boundary lines between sentiment ratings (the blue lines) are not strictly vertical; if a word is more objective, the threshold for it to be at the extremes (either very positive or very negative) is lower than when the term is denoted as subjective. We discuss how we classify these terms in the next section.
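The two-dimensional term representation described above can be sketched in Python. The lexicon values, class thresholds, and the threshold-shifting rule below are invented for illustration; they are not the values or the exact mapping used in the paper.

```python
import math

# Toy lexicon: each term maps to a (polarity, subjectivity) vector,
# as in Figure 1. All values here are invented for illustration.
LEXICON = {
    "excellent": (0.9, 0.8),   # strongly positive, subjective
    "helping":   (0.3, 0.2),   # mildly positive, fairly objective
    "boring":    (-0.6, 0.7),  # negative, subjective
}

def classify(term):
    polarity, subjectivity = LEXICON[term]
    magnitude = math.hypot(polarity, subjectivity)  # length of the vector
    # Objective terms reach the extreme classes at a lower polarity
    # threshold than subjective terms (threshold values are assumptions).
    extreme = 0.75 if subjectivity > 0.5 else 0.55
    if polarity <= -extreme:
        label = "very negative"
    elif polarity < -0.2:
        label = "negative"
    elif polarity < 0.2:
        label = "neutral"
    elif polarity < extreme:
        label = "positive"
    else:
        label = "very positive"
    return label, magnitude

print(classify("excellent"))
print(classify("boring"))
```

Here the slanted class boundaries of Figure 1 are approximated by shifting the extreme-class threshold with subjectivity; the paper itself learns sentence-level sentiment with an LSTM rather than with a fixed lexicon.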
Figure 1: An example illustrating the vector representation of terms in the phrase "She was excellent at helping others but found the task boring", with polarity along the x-axis and subjectivity along the y-axis. Magnitude is represented as the length of the vector. Vertical blue lines represent the boundaries between sentiment classes, with a tighter range for terms labeled subjective as compared with those labeled objective.

Although websites like Allsides¹ use a bias rating system to illustrate the spectrum of reporting on a liberal-conservative scale, to our knowledge, no search interface has been created to classify news articles based on the sentiment used in the text.

¹ https://www.allsides.com/unbiased-balanced-news

Long Short Term Memory (LSTM) Models
We use the LSTM model introduced by Hochreiter and Schmidhuber [8], subsequently modified to include forget gates as implemented by Gers, Schmidhuber, and Cummins in [4] and by Graves in [6]. LSTMs have traditionally been applied to machine translation efforts, but here we apply them to classifying sentiment.

With RNNs, a weight matrix is associated with the connections between the neurons of the recurrent hidden layer. The purpose of this weight matrix is to model the synapse between two neurons. During the gradient back-propagation phase of a traditional neural network, the gradient signal can be multiplied many times by this weight matrix, which means it can have a disproportionately strong influence on the learning process. When the weights in this matrix are small (i.e., the leading eigenvalue of the weight matrix is < 1.0), a situation called vanishing gradients can occur: the gradient signal gets so small that learning either becomes very slow or stops completely. This has a negative impact on learning the long-term dependencies in the data. However, when the weights in this matrix are large (i.e., the leading eigenvalue of the weight matrix is > 1.0), the gradient signal can become so large that learning will diverge, which is often referred to as exploding gradients.

Minimizing the vanishing and exploding gradients is the primary motivation behind the LSTM model. This model introduces a new structure called a memory cell (see Figure 2). A memory cell is comprised of four main elements: (a) an input gate, (b) a neuron with a self-recurrent connection, (c) a forget gate, and (d) an output gate. The self-recurrent connection maintains a weight very close to 1.0. Its purpose is to ensure that from one timestep to the next, barring any outside interference, the state of a memory cell remains constant. The gates serve to modulate the interactions between the memory cell itself and its environment. The input gate can allow an incoming signal to alter the state of the memory cell or block it. On the other hand, the output gate can allow the state of the memory cell to affect other neurons. Last, the forget gate modulates the memory cell's self-recurrent connection, allowing the cell to remember or forget its previous state.
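The vanishing and exploding gradient behavior described above can be demonstrated with a small NumPy experiment; the matrix size, random seed, and number of steps are arbitrary choices, not values from the paper.

```python
import numpy as np

# Repeatedly multiply a gradient by the recurrent weight matrix, as happens
# in backpropagation through time. The magnitude of the leading eigenvalue
# decides whether the gradient vanishes (< 1.0) or explodes (> 1.0).
rng = np.random.default_rng(0)
base = rng.standard_normal((8, 8))

def gradient_norm_after(scale, steps=50):
    # Rescale `base` so its leading eigenvalue has magnitude `scale`.
    w = base * (scale / np.max(np.abs(np.linalg.eigvals(base))))
    grad = np.ones(8)
    for _ in range(steps):
        grad = w @ grad
    return float(np.linalg.norm(grad))

print(gradient_norm_after(0.9))  # shrinks toward zero: vanishing gradient
print(gradient_norm_after(1.1))  # grows rapidly: exploding gradient
```

Because both runs rescale the same matrix, the two gradients differ by exactly a factor of (1.1/0.9)^50, which makes the contrast between the two regimes stark even over a modest number of steps.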
Figure 2: Illustration of an LSTM memory cell.

The following equations illustrate how a layer of memory cells is updated at timestep t. We define x_t and h_t as the input and output, respectively, of the memory cell layer at time t; W_i, W_f, W_c, and W_o are the input weight matrices; U_i, U_f, U_c, and U_o are the hidden-state-to-hidden-state weight matrices; and b_i, b_f, b_c, and b_o are the bias vectors.

First, we determine the values of the input gate, i_t, and the candidate values of the memory cell states at time t, C̃_t:

(1) i_t = σ(W_i x_t + U_i h_(t-1) + b_i)
(2) C̃_t = tanh(W_c x_t + U_c h_(t-1) + b_c)

Next, we compute the value of f_t, the activation of the memory cells' forget gates, at time t:

(3) f_t = σ(W_f x_t + U_f h_(t-1) + b_f)

Given the value of the input gate activation i_t, the forget gate activation f_t, and the candidate state value C̃_t, we can compute C_t, the memory cells' new state at time t:

(4) C_t = i_t * C̃_t + f_t * C_(t-1)

where * denotes a point-wise (Hadamard) multiplication operator. Once we obtain the new state of the memory cells, we can compute the value of their output gates, o_t, and their outputs, h_t:

(5) o_t = σ(W_o x_t + U_o h_(t-1) + b_o)
(6) h_t = o_t * tanh(C_t)

Our model is a variation of the standard LSTM model; here the activation of a cell's output gate is independent of the memory cell's state C_t. This variation allows us to compute equations (1), (2), (3), and (5) in parallel, improving computational efficiency. This is possible because none of these four equations relies on a result produced by any of the other three. We achieve this by concatenating the four input weight matrices W_* into a single weight matrix W, performing the same concatenation on the four recurrent weight matrices U_* to produce the matrix U, and concatenating the four bias vectors b_* to produce the vector b. Then, the pre-nonlinearity activations can be computed with:

(7) z = W x_t + U h_(t-1) + b

The result is then sliced to obtain the pre-nonlinearity activations for i_t, f_t, C̃_t, and o_t, and the non-linearities are then applied independently to each slice.

Our model is composed of a single LSTM layer followed by an average pooling and a logistic regression layer, as illustrated in Figure 3. From an input sequence x_0, x_1, x_2, ..., x_n, the memory cells in the LSTM layer produce a representation sequence h_0, h_1, h_2, ..., h_n. This representation sequence is then averaged over all n timesteps, resulting in a representation h. Last, this representation is fed to a logistic regression layer whose target is the class label associated with the input sequence: one of five ordinal levels of sentiment, ranging from very positive to very negative. To map the vectorized terms (as seen in Figure 1) to an ordinal value for sentiment, we take the cosine of the term vector.

Figure 3: Our model, composed of a single LSTM layer followed by mean pooling over time and logistic regression.

INTERFACE COMPONENTS
Figure 4 illustrates the flow of a user query involving a named entity on our interactive query interface. In this section, we describe the major steps and related interfaces.

Figure 4: Flow diagram showing the major components of the search system.

Data Collection
We use a collection of 433,175 news articles scraped from 211 formal and informal news sources. Of the 211 news sources, 109 are verified sources. We determine verified sources as those that Media Bias/Fact Check indicates have a factual reporting score of "high". The articles in our collection are on a variety of topics, but all are written in English, have publication dates from 2012-2017, and are available on the internet (although some are available only through paywalls). Figure 5 illustrates the distribution of news articles, news sources, and verified sources for each year in our collection.

The processing of the data in the collection was designed to be done quickly. Using a single server, we were able to index, detect, and classify sentiment for the entire collection of 433,175 articles in approximately 4 minutes, allowing us to handle emergent stream data (e.g., Twitter) with only a minor delay.
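The memory-cell update of equations (1)-(7) above can be sketched in NumPy. The layer sizes, random initialization, and input sequence below are illustrative assumptions, not values from the paper.

```python
import numpy as np

# One LSTM timestep: the four pre-nonlinearity activations are computed in
# a single matrix product (equation 7) and then sliced into the input gate,
# forget gate, candidate state, and output gate.
rng = np.random.default_rng(1)
d_in, d_h = 5, 4                                  # input and hidden sizes

W = rng.standard_normal((4 * d_h, d_in)) * 0.1    # concatenated W_i, W_f, W_c, W_o
U = rng.standard_normal((4 * d_h, d_h)) * 0.1     # concatenated U_i, U_f, U_c, U_o
b = np.zeros(4 * d_h)                             # concatenated b_i, b_f, b_c, b_o

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev):
    z = W @ x_t + U @ h_prev + b                  # equation (7), one product
    i, f, c_hat, o = np.split(z, 4)               # slice the pre-activations
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # equations (1), (3), (5)
    c_hat = np.tanh(c_hat)                        # equation (2)
    c_t = i * c_hat + f * c_prev                  # equation (4), * is Hadamard
    h_t = o * np.tanh(c_t)                        # equation (6)
    return h_t, c_t

h, c = np.zeros(d_h), np.zeros(d_h)
for x in rng.standard_normal((3, d_in)):          # a three-step input sequence
    h, c = lstm_step(x, h, c)
print(h)                                          # final representation
```

Averaging the h_t vectors over all timesteps and feeding the result to a five-class logistic regression layer would complete the architecture of Figure 3.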
Figure 5: Number of articles (top) and number of unique sources (bottom) in our collection, by publication date of the article.

Training of the LSTM Network
The dataset used for training is the recently proposed Stanford Sentiment Treebank [11], which includes fine-grained sentiment labels for 215,154 phrases in the parse trees of 11,855 sentences. In our experiment, we focus on sentiment prediction of complete sentences with respect to the named entities contained within each sentence.

For our LSTM, we use the softsign activation function over tanh; it is faster and there is a smaller probability of saturation (i.e., having a gradient that approaches 0). We evaluated our training set over 20 epochs (a number which was empirically determined). We use a learning rate of 10^-5, an L2 regularization weight of 0.009, and a dropout value of 1.0.

Interactive Query Interface
Figure 6 shows the interactive query interface used in our study. The query interface is designed to give users as much information as possible to refine their search based on the sentiment of the search results. The interface is divided into two columns. The left column contains an area to enter and refine queries, a checkbox for the user to have only results from verified sources returned, and several checkboxes to determine the types of sentiments to include, from very negative to very positive. At the bottom of the left-hand column, the most popular search terms that were not used in the user query but appear in the results are shown, with color coding to indicate the sentiment of each term. In the right-hand column, we display the article counts by sentiment and the top-ranked search results. Users are also given the ability to sort the search results based on relevance, date, sentiment, or verified source. Next to each search result, users can see the sentiment our approach has indicated for that article, as well as an indication of whether the article is from a verified source.

We implemented searches on our collection using Indri, a scalable open-source search engine [12]. Indri works well with smaller queries, which are typically used in searches on named entities.

Figure 6: The Interactive Query Interface for searching our collection, showing an example query. The sentiment we derive from each article is represented as the sentiment of the article.

Detecting Ambiguous Named Entities
To ensure we are tracking the correct named entity, when appropriate, we need to disambiguate potentially confounding entities. We use an API from Wikipedia to check for a disambiguation page on the user-provided named entity. If one is found, we obtain the different categories, if any, that are provided by Wikipedia. Figure 7 shows an example of a search on "Michael Jackson" and the categories containing entities named "Michael Jackson". This allows users to narrow their search to the correct entity, reducing the possibility of confounding results from mistakenly grouping disparate entities together.

Figure 7: The disambiguation page for Michael Jackson. Categories are pulled from Wikipedia through their API, allowing the user to find the correct Michael Jackson. Note the shortcut in the upper right-hand side linking to the most popular named entity.

Detecting Verified Sources
As described earlier, we allow users to search on only verified news sources or on all sources. This allows users to examine both informal and formal sources. We describe how we verify sources in the Data Collection section. Figure 8 shows search results without the "verified sources only" checkbox checked, allowing unverified sources.

Figure 8: The Interactive Query Interface for searching our collection, showing search results containing unverified sources.

Applying Sentiment Analysis
We use the LSTM method to detect and classify sentiment for each major named entity in each article, as well as for the main keywords associated with that article. We provide five classes of sentiment, from very negative to very positive. We display this information to the user as the sentiment of the article.
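The softsign activation preferred in the training discussion above can be compared with tanh directly; the sample point below is an arbitrary choice for illustration.

```python
import numpy as np

# Softsign saturates more slowly than tanh: for large |x| its gradient
# decays quadratically, 1/(1+|x|)^2, rather than exponentially, so the
# gradient stays further from zero.
def softsign(x):
    return x / (1.0 + np.abs(x))

def softsign_grad(x):
    return 1.0 / (1.0 + np.abs(x)) ** 2

def tanh_grad(x):
    return 1.0 - np.tanh(x) ** 2

x = 4.0
print(softsign(x), np.tanh(x))         # both head toward 1.0
print(softsign_grad(x), tanh_grad(x))  # softsign's gradient is much larger
```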
EXPERIMENT DESIGN
Sentiment analysis is primarily associated with a named entity, so if multiple entities are described in the article text, each with a different sentiment, this can convolute the true sentiment around each entity if not properly handled. Also, the sentiment of an article is a relative concept – if all articles are negative about a named entity, even a slightly positive article can look very positive in comparison. Our research question is to evaluate if machine-generated sentiment analysis is a strong predictor of article sentiment from a user's perspective. We accomplish this by evaluating feedback on the sentiment ratings the users provide.

Evaluating Sentiment
As with determining relevance in information retrieval, humans are widely known to be better than machines at determining the correctness of article sentiment. We hired 293 crowdworkers from Amazon Mechanical Turk. These crowdworkers performed 600 separate tasks (HITs) to evaluate 1500 articles (approximately 0.35% of our collection) by searching on 150 named entities. Each article was evaluated by at least 3 different crowdworkers (crowdworkers could not evaluate an article more than once). The distribution of ratings made by crowdworkers is given in Figure 9. Most raters evaluated 5 articles, and the mean number of articles rated was 15.

Figure 9: The number of articles rated (x-axis) by the number of raters evaluating that number of articles (y-axis).

Instructions to Users
Each user (crowdworker) is asked to determine if the article retrieved by their query is relevant to the search criteria. This is used to help refine the search criteria parameters provided in Indri. More importantly, the user is asked to evaluate the sentiment assigned to the article on a five-point scale (see Figure 10). Users were also asked to take a survey on the usability of the interface and the perceived accuracy of the LSTM-classified sentiment.

Figure 10: The interface used to evaluate the article's relevance and the classification of the article's sentiment.

Intra-Rater Reliability
To evaluate intra-rater reliability, we kept track of each crowdworker's ratings and the articles they rated without identifying them personally. When the articles were presented to the crowdworker to rate, they were not made aware of the overall rating previously made by our sentiment analysis model. We also kept track so that a single user could not evaluate any article more than once.

We understand that raters' personal bias can influence their perspective on an article's sentiment. Although we did not attempt to recalibrate each crowdworker's ratings based on the pattern of their ratings, we did check if any crowdworker consistently selected the sentiment to be very positive or very negative, implying they were rushing through the task instead of evaluating each article thoroughly. Of the 600 tasks, only 3 needed to be repeated due to this behavior.

Fake News
We also wish to determine if outliers in sentiment on a named entity are good predictors of fake news. For example, if a large percentage of articles for an entity are slightly positive or very positive, those articles with sentiments rated very negative (particularly from unverified sources) are candidates to be fake news articles. To examine the details further, we look at the most negative quotations or facts provided in these articles using a separate process, and look at the overlap between these sources and other articles in our collection. We briefly report and analyze these findings.
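The rushing-rater check described in the intra-rater reliability discussion above can be sketched as follows; the worker IDs and ratings are invented sample data, not the study's ratings.

```python
# Flag crowdworkers whose ratings sit only at the extremes of the
# five-point scale, suggesting they rushed through the task instead of
# evaluating each article thoroughly.
ratings_by_worker = {
    "w1": [1, 5, 5, 1, 5],   # extremes only: candidate for repeating the HIT
    "w2": [2, 3, 4, 3, 2],
    "w3": [5, 4, 2, 1, 3],
}

def flag_rushers(ratings):
    return sorted(w for w, r in ratings.items()
                  if r and all(v in (1, 5) for v in r))

print(flag_rushers(ratings_by_worker))  # ['w1']
```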
RESULTS AND ANALYSIS
Correlation of Ratings Obtained by the LSTM Sentiment Analysis Model
Our primary research question was to examine how well the sentiment analysis provided by our LSTM model correlates with the sentiment ratings made by users. Since each of the articles was evaluated at least 3 times, we took the average rating of the users (rounded to the nearest integer) to be the correct article sentiment.

We computed a Pearson correlation coefficient, r, between the sentiment classes determined by our LSTM network and the sentiment classes provided by the users. There was a positive correlation between the two variables [r = 0.823, n = 1500, p < 0.001]. Therefore, based on the sample of 1500 news articles evaluated, we believe the sentiment analysis provided by the LSTM model is a reasonably good predictor of an article's sentiment. Table 1 shows the correlation between the two sets of ratings.

                                 Sentiment Analysis Model Rating
                                   1     2     3     4     5
  Average of User      1         122    70     6     3     0
  Ratings (min of      2          67   242    77     5     0
  3 ratings per        3           4    84   192    79     5
  article)             4           0    10    73   250    74
                       5           0     2    11    45    79

Table 1: Correlation of ratings between the average supplied by the users and those obtained by the sentiment analysis model.

To evaluate fake news articles, we examine named entities where the sentiment is skewed heavily in one direction (either very positive or very negative) and look at those articles which are extreme outliers, i.e., with a difference in ratings of 3 or more on our 5-point scale. Of the 150 named entities examined in our study, we found 14 that had one or more articles meeting this condition. These 14 named entity searches yielded 29 articles, of which 28 were unverified news articles.

We ran a separate analysis of any quotations and facts raised in each of these 28 articles. We then tried to find these facts mentioned in the other articles. Of the 31 quotations in these articles about the named entity in question, we found 20 instances where the quotation did not exist in any other article in our collection and 11 instances where the quotation was mentioned, but convoluted in a way that contorted its context. Of the 89 facts raised in these articles, 77 were not mentioned in any other article, and 12 were mentioned but taken out of context with respect to the other articles in our collection. While we cannot confidently conclude that these articles represent fake news, we believe this approach can help identify articles that have a distinctly different sentiment from other articles and that bring up quotations and facts not mentioned in other articles. We plan to explore this relationship in a future study.

Last, we asked the crowdworkers to provide optional feedback on the interface, both in terms of usability and in terms of accuracy of sentiment classification, on a five-point Likert scale. Of the 293 crowdworkers, we received feedback from 177 (60.4%). Survey takers scored the interface as 3.28 for usability (5 = best), with many providing comments that more work needs to be done to reduce its complexity. The survey takers scored the LSTM model's sentiment classification accuracy as 4.54, with many providing feedback indicating they concurred with the LSTM model's overall accuracy.
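The evaluation above can be sketched as follows: average at least three user ratings per article, round to the nearest integer as the ground truth, and correlate the result with the model's classes. The five articles below are invented sample data, not the study's ratings.

```python
import math

user_ratings = [[4, 5, 4], [2, 1, 2], [3, 3, 4], [5, 5, 4], [1, 2, 1]]  # >= 3 raters each
model_classes = [4, 2, 4, 5, 1]      # classes assigned by the sentiment model

# Ground truth: mean user rating per article, rounded to the nearest integer.
truth = [round(sum(r) / len(r)) for r in user_ratings]

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

print(truth)                         # [4, 2, 3, 5, 1]
print(pearson_r(truth, model_classes))
```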
CONCLUSION AND FUTURE WORK
In this paper, we describe an interactive query interface that makes use of sentiment analysis. This allows users performing a named entity search to receive information on the sentiment of each article and therefore find a wide diversity of opinions on a named entity search quickly and easily.

We describe the LSTM model we used and how it can be used to classify the sentiment of news article text into five classes, ranging from very negative to very positive. The advantage of this model is that even when multiple entities are mentioned in an article, it can match the sentiment to the named entity in question. We have shown that this technique can process news articles quickly, allowing emergent news to be covered quickly.

We conducted a user study with 293 unique participants to answer our research question. They were instructed to classify the sentiment of 1500 articles, and we examined how their classifications correlate with the sentiment obtained from our model. Each article was evaluated by at least 3 users. With a Pearson correlation coefficient of r = 0.823, we found that the classification of article sentiment by users and the classification from the LSTM sentiment analysis tool are strongly correlated.

Combining the sentiment classification techniques with some additional analysis allows us to identify potentially fake news articles. We identified news articles whose ratings were outliers from the majority of the other relevant articles returned by the same named entity search. We found that 28 of the 29 articles identified using this approach were suspicious news articles that would need further investigation. We leave this for a future study.

There are some limitations to our work. First, our study only looks at queries on named entities, which are easier to retrieve and analyze semantically than general concepts. Second, the study worked with a collection of 433,175 articles, with 84.1% of these being pulled from verified sources. With exposure to more unverified sources our correlation may be lower, which we leave for future work. Another limitation has to do with sentence complexity. Our model evaluated sentiment at the sentence level, and we found proximity to the named entity played a role; if more than one named entity was mentioned in a sentence, such as "In the 1938 movie Carefree, Fred Astaire performed well but Ralph Bellamy was forgettable.", we would expect our model to provide a positive sentiment for "Fred Astaire", a neutral sentiment for "Carefree", and a negative sentiment for "Ralph Bellamy"; instead it provided a positive sentiment for "Fred Astaire" and "Carefree" and a neutral sentiment for "Ralph Bellamy". Evaluating at the phrase level instead of the sentence level would improve the accuracy of our results.

In other future work, we plan to examine the role of images in articles and how these can be analyzed for sentiment as well. We also plan to examine the choice of photos used to represent named entities in news articles. Last, we plan to examine searches that do not contain named entities and evaluate if our methods are as accurate as they are with named entities.

REFERENCES
1. Allcott, H., & Gentzkow, M. (2017). Social media and fake news in the 2016 election (No. w23089). National Bureau of Economic Research.
2. Bakshy, E., Messing, S., & Adamic, L. A. (2015). Exposure to ideologically diverse news and opinion on Facebook. Science, 348(6239), 1130-1132.
3. Barthel, M., Mitchell, A., & Holcomb, J. (2016). Many Americans Believe Fake News Is Sowing Confusion. Pew Research Center. Available at: http://www.journalism.org/2016/12/15/many-americans-believe-fake-news-is-sowing-confusion/
4. Gers, F. A., Schmidhuber, J., & Cummins, F. (1999). Learning to forget: Continual prediction with LSTM. Neural Computation, 12, 2451-2471.
5. Gottfried, J., & Shearer, E. (2017). Americans' online news use is closing in on TV news use. Pew Research Center. Available at: http://www.pewresearch.org/fact-tank/2017/09/07/americans-online-news-use-vs-tv-news-use/
6. Graves, A. (2012). Supervised sequence labelling with recurrent neural networks (Vol. 385). Heidelberg: Springer.
7. Hasell, A., & Weeks, B. E. (2016). Partisan provocation: The role of partisan news use and emotional responses in political information sharing in social media. Human Communication Research, 42(4), 641-661.
8. Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735-1780.
9. Kwak, H., Lee, C., Park, H., & Moon, S. (2010). What is Twitter, a social network or a news media? In Proceedings of the 19th International Conference on World Wide Web (pp. 591-600). ACM.
10. Purcell, K., Brenner, J., & Rainie, L. (2012). Search engine use 2012. Pew Research Center. Available at: http://www.pewinternet.org/files/old-media/Files/Reports/2012/PIP_Search_Engine_Use_2012.pdf
11. Socher, R., Perelygin, A., Wu, J., Chuang, J., Manning, C. D., Ng, A., & Potts, C. (2013). Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (pp. 1631-1642).
12. Strohman, T., Metzler, D., Turtle, H., & Croft, W. B. (2005). Indri: A language model-based search engine for complex queries. In Proceedings of the International Conference on Intelligent Analysis (Vol. 2, No. 6, pp. 2-6).
13. Yin, X., & Shah, S. (2010). Building taxonomy of web search intents for name entity queries. In Proceedings of the 19th International Conference on World Wide Web (pp. 1001-1010). ACM.