=Paper=
{{Paper
|id=Vol-2068/esida3
|storemode=property
|title=Searching for Diverse Perspectives in News Articles: Using an LSTM Network to Classify Sentiment
|pdfUrl=https://ceur-ws.org/Vol-2068/esida3.pdf
|volume=Vol-2068
|authors=Christopher Harris
|dblpUrl=https://dblp.org/rec/conf/iui/Harris18
}}
==Searching for Diverse Perspectives in News Articles: Using an LSTM Network to Classify Sentiment==
Christopher Harris
Department of Computer Science, University of Northern Colorado, Greeley, CO 80639

ABSTRACT
When searching for emerging news on named entities, many users wish to find articles containing a variety of perspectives. Advances in sentiment analysis, particularly by tools that use Recurrent Neural Networks (RNNs), have made impressive gains in accuracy on NLP tasks such as sentiment analysis. Here we describe and implement a special type of RNN called a Long Short Term Memory (LSTM) network to detect and classify sentiment in a collection of news articles. Using an interactive query interface created expressly for this purpose, we conduct an empirical study in which we ask users to classify sentiment on named entities in articles, and then we compare these sentiment classifications with those obtained from our LSTM network. We compare this sentiment across the articles that mention the named entity in a collection of news articles. Last, we discuss how this analysis can identify outliers and help detect fake news articles.

Author Keywords
Sentiment analysis; RNN; LSTM; named entities; artificial neural networks; news analysis; fake news.

ACM Classification Keywords
I.5.1 [Pattern Recognition]: Models → Neural nets; I.2.7 [Artificial Intelligence] → Natural Language Processing; H.3.3 [Information systems] → Information retrieval diversity.

INTRODUCTION
Named entities, which we define as information units such as person, organization, and location names, are extremely popular components of user queries. For example, Yin and Shah found that nearly 30% of searches on the Bing search engine were simply a named entity and 71% of searches contained a named entity as part of the query string [13]. Thus, the proper identification and handling of named entities is essential to provide an excellent search experience.

A growing number of voices claim bias in reporting from media sources, particularly (but not limited to) reporting on named entities in politics and entertainment. News articles covering the same named entity can be reported from a variety of perspectives, some sympathetic to the subject while others are far less so – a phenomenon widely noted during two 2016 events: the U.K. Brexit vote and the U.S. elections. However, there are ways to evaluate and categorize this variation in reporting. Sentiment analysis, which has been widely applied to classifying movie and product reviews, could also be applied to the sentiment used in reporting news articles, particularly those that focus on a specific named entity. Although early approaches in sentiment analysis suffered from poor accuracy, recent advances – particularly applying deep learning techniques such as Recurrent Neural Networks (RNNs) – have increased its accuracy and can even distinguish the sentiment between different named entities when an article contains references to more than one entity.

It is important for search systems to work with named entities in both informal text (e.g., blog posts) and formal text (e.g., news articles). To this end, it is also important to distinguish these different types of sources to the user. When information on a named entity appears from a verified news source, it carries a different weight (in terms of authenticity) from a blog posting from a non-expert; the user should be made aware of this provenance in the search results and be able to filter the search results based on the verifiability of the news.
With the rise of social media as a user's primary news source [9], misleading news articles called fake news have clouded many users' ability to determine if a news article has merit or if it is a deliberate attempt to misinform and spread a hoax. Recently, more attention from the NLP community has been placed on identifying fake news, which we define as propaganda disguised as real news that is created to mislead readers and damage a person's, an agency's, or an entity's reputation.

A study conducted following the 2016 election found that 64% of adults indicated that fake news articles caused a great deal of confusion and 23% said they had shared fabricated articles themselves – sometimes by mistake and sometimes intentionally [3]. We believe that sentiment analysis, when done properly, can be used to separate news from genuine sources from fake news. We explore this concept briefly in this paper.

© 2018. Copyright for the individual papers remains with the authors. Copying permitted for private and academic purposes. ESIDA '18, March 11, Tokyo, Japan.

BACKGROUND AND MOTIVATION
Performing queries and obtaining news articles are tasks that rank only behind sending email as the most common internet activities, with 91% and 76% of users reportedly engaging in these activities, respectively [10]. Overall, the internet has grown in importance as a source of information and news on named entities. As of August 2017, 43% of Americans report often obtaining their news online, quickly approaching the 50% who often obtain news by television. This 7% gap has narrowed considerably from the 19% gap between the two sources found only 18 months earlier [5].

The Role of Social Media
Social media platforms such as Facebook and Twitter have transformed how news is created and disseminated. News content on any named entity can be spread among users without substantial third-party filtering, fact-checking, or editorial judgment. It is now possible for a non-expert user with no prior reputation on a news topic to reach as many readers as verified sources such as the Washington Post, CNN, or the BBC [1].

With social media, unsurprisingly, users tend to communicate with others having a similar political ideology, affecting their ability to gain a balanced perspective. Of the Facebook articles involving national news, politics, or world affairs, only 24% of liberals and 35% of conservatives have exposure to other perspectives through shares on social media [2]. Therefore, most social media users who wish to gain a different perspective on a named entity require a convenient yet customizable interface to search these articles and view information on these different perspectives.

Sentiment Analysis
News articles shared on social media are often used to incite affective behavior in readers [7] and are ideal for sentiment classification. Sentiment analysis is an area of Natural Language Processing (NLP) that examines and classifies the affective states and subjective information about a topic or entity. The research question we wish to examine is how well machine-classified sentiment is correlated with the sentiment as determined by users (which we set as our ground truth). We do this by looking at the subjectivity/objectivity, the polarity, and the magnitude of sentiment in the text of the article at the sentence level, while keeping track of contextual issues such as anaphora resolution. By creating a two-dimensional vector to represent the sentiment for each named entity in each sentence (see Figure 1), we can create an overall vector to match this to the overall sentiment of the article. In Figure 1, the blue lines represent the boundaries between the classifications of sentiment, from very negative to very positive. Note that some of the boundary lines between sentiment ratings (the blue lines) are not strictly vertical; if a word is more objective, the threshold for it to be at the extremes (either very positive or very negative) is lower than when the term is denoted as subjective. We discuss how we classify these terms in the next section.
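The two-dimensional term representation described above can be sketched in Python. The lexicon values, class thresholds, and the threshold-shifting rule below are invented for illustration; they are not the values or the exact mapping used in the paper.

```python
import math

# Toy lexicon: each term maps to a (polarity, subjectivity) vector,
# as in Figure 1. All values here are invented for illustration.
LEXICON = {
    "excellent": (0.9, 0.8),   # strongly positive, subjective
    "helping":   (0.3, 0.2),   # mildly positive, fairly objective
    "boring":    (-0.6, 0.7),  # negative, subjective
}

def classify(term):
    polarity, subjectivity = LEXICON[term]
    magnitude = math.hypot(polarity, subjectivity)  # length of the vector
    # Objective terms reach the extreme classes at a lower polarity
    # threshold than subjective terms (threshold values are assumptions).
    extreme = 0.75 if subjectivity > 0.5 else 0.55
    if polarity <= -extreme:
        label = "very negative"
    elif polarity < -0.2:
        label = "negative"
    elif polarity < 0.2:
        label = "neutral"
    elif polarity < extreme:
        label = "positive"
    else:
        label = "very positive"
    return label, magnitude

print(classify("excellent"))
print(classify("boring"))
```

Here the slanted class boundaries of Figure 1 are approximated by shifting the extreme-class threshold with subjectivity; the paper itself learns sentence-level sentiment with an LSTM rather than with a fixed lexicon.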
Figure 1: An example illustrating the vector representation of terms in the phrase "She was excellent at helping others but found the task boring", with polarity along the x-axis and subjectivity along the y-axis. Magnitude is represented as the length of the vector. Vertical blue lines represent the boundaries between sentiment classes, with a tighter range for terms labeled subjective as compared with those labeled objective.

Although websites like Allsides¹ use a bias rating system to illustrate the spectrum of reporting on a liberal-conservative scale, to our knowledge, no search interface has been created to classify news articles based on the sentiment used in the text.

¹ https://www.allsides.com/unbiased-balanced-news

Long Short Term Memory (LSTM) Models
We use the LSTM model introduced by Hochreiter and Schmidhuber [8], subsequently modified to include forget gates as implemented by Gers, Schmidhuber, and Cummins in [4] and by Graves in [6]. LSTMs have traditionally been applied to machine translation efforts, but here we apply them to classifying sentiment.

With RNNs, a weight matrix is associated with the connections between the neurons of the recurrent hidden layer. The purpose of this weight matrix is to model the synapse between two neurons. During the gradient back-propagation phase of a traditional neural network, the gradient signal can be multiplied many times by this weight matrix, which means it can have a disproportionately strong influence on the learning process. When the weights in this matrix are small (i.e., the leading eigenvalue of the weight matrix is < 1.0), a situation called vanishing gradients can occur: the gradient signal gets so small that learning either becomes very slow or stops completely. This has a negative impact on learning the long-term dependencies in the data. However, when the weights in this matrix are large (i.e., the leading eigenvalue of the weight matrix is > 1.0), the gradient signal can become so large that learning will diverge, which is often referred to as exploding gradients.

Minimizing the vanishing and exploding gradients is the primary motivation behind the LSTM model. This model introduces a new structure called a memory cell (see Figure 2). A memory cell is comprised of four main elements: (a) an input gate, (b) a neuron with a self-recurrent connection, (c) a forget gate, and (d) an output gate. The self-recurrent connection maintains a weight very close to 1.0. Its purpose is to ensure that from one timestep to the next, barring any outside interference, the state of a memory cell remains constant. The gates serve to modulate the interactions between the memory cell itself and its environment. The input gate can allow an incoming signal to alter the state of the memory cell or block it. On the other hand, the output gate can allow the state of the memory cell to affect other neurons. Last, the forget gate modulates the memory cell's self-recurrent connection, allowing the cell to remember or forget its previous state.
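The vanishing and exploding gradient behavior described above can be demonstrated with a small NumPy experiment; the matrix size, random seed, and number of steps are arbitrary choices, not values from the paper.

```python
import numpy as np

# Repeatedly multiply a gradient by the recurrent weight matrix, as happens
# in backpropagation through time. The magnitude of the leading eigenvalue
# decides whether the gradient vanishes (< 1.0) or explodes (> 1.0).
rng = np.random.default_rng(0)
base = rng.standard_normal((8, 8))

def gradient_norm_after(scale, steps=50):
    # Rescale `base` so its leading eigenvalue has magnitude `scale`.
    w = base * (scale / np.max(np.abs(np.linalg.eigvals(base))))
    grad = np.ones(8)
    for _ in range(steps):
        grad = w @ grad
    return float(np.linalg.norm(grad))

print(gradient_norm_after(0.9))  # shrinks toward zero: vanishing gradient
print(gradient_norm_after(1.1))  # grows rapidly: exploding gradient
```

Because both runs rescale the same matrix, the two gradients differ by exactly a factor of (1.1/0.9)^50, which makes the contrast between the two regimes stark even over a modest number of steps.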
Figure 2: Illustration of an LSTM memory cell.

The following equations illustrate how a layer of memory cells is updated at timestep t. We define x_t and h_t as the input and output, respectively, of the memory cell layer at time t; W_i, W_f, W_c, and W_o are the input weight matrices; U_i, U_f, U_c, and U_o are the hidden-state-to-hidden-state weight matrices; and b_i, b_f, b_c, and b_o are the bias vectors.

First, we determine the values of the input gate, i_t, and the candidate values of the memory cell states at time t, C̃_t:

(1) i_t = σ(W_i x_t + U_i h_(t-1) + b_i)
(2) C̃_t = tanh(W_c x_t + U_c h_(t-1) + b_c)

Next, we compute the value of f_t, the activation of the memory cells' forget gates, at time t:

(3) f_t = σ(W_f x_t + U_f h_(t-1) + b_f)

Given the value of the input gate activation i_t, the forget gate activation f_t, and the candidate state value C̃_t, we can compute C_t, the memory cells' new state at time t:

(4) C_t = i_t * C̃_t + f_t * C_(t-1)

where * denotes a point-wise (Hadamard) multiplication operator. Once we obtain the new state of the memory cells, we can compute the value of their output gates, o_t, and their outputs, h_t:

(5) o_t = σ(W_o x_t + U_o h_(t-1) + b_o)
(6) h_t = o_t * tanh(C_t)

Our model is a variation of the standard LSTM model; here the activation of a cell's output gate is independent of the memory cell's state C_t. This variation allows us to compute equations (1), (2), (3), and (5) in parallel, improving computational efficiency. This is possible because none of these four equations relies on a result produced by any of the other three. We achieve this by concatenating the four input weight matrices W_* into a single weight matrix W, performing the same concatenation on the four recurrent weight matrices U_* to produce the matrix U, and concatenating the four bias vectors b_* to produce the vector b. Then, the pre-nonlinearity activations can be computed with:

(7) z = W x_t + U h_(t-1) + b

The result is then sliced to obtain the pre-nonlinearity activations for i_t, f_t, C̃_t, and o_t, and the non-linearities are then applied independently to each slice.

Our model is composed of a single LSTM layer followed by an average pooling and a logistic regression layer, as illustrated in Figure 3. From an input sequence x_0, x_1, x_2, ..., x_n, the memory cells in the LSTM layer produce a representation sequence h_0, h_1, h_2, ..., h_n. This representation sequence is then averaged over all n timesteps, resulting in a representation h. Last, this representation is fed to a logistic regression layer whose target is the class label associated with the input sequence: one of five ordinal levels of sentiment, ranging from very positive to very negative. To map the vectorized terms (as seen in Figure 1) to an ordinal value for sentiment, we take the cosine of the term vector.

Figure 3: Our model, composed of a single LSTM layer followed by mean pooling over time and logistic regression.

INTERFACE COMPONENTS
Figure 4 illustrates the flow of a user query involving a named entity on our interactive query interface. In this section, we describe the major steps and related interfaces.

Figure 4: Flow diagram showing the major components of the search system.

Data Collection
We use a collection of 433,175 news articles scraped from 211 formal and informal news sources. Of the 211 news sources, 109 are verified sources. We determine verified sources as those that Media Bias/Fact Check indicates have a factual reporting score of "high". The articles in our collection are on a variety of topics, but all are written in English, have publication dates from 2012-2017, and are available on the internet (although some are available only through paywalls). Figure 5 illustrates the distribution of news articles, news sources, and verified sources for each year in our collection.

The processing of the data in the collection was designed to be done quickly. Using a single server, we were able to index, detect, and classify sentiment for the entire collection of 433,175 articles in approximately 4 minutes, allowing us to handle emergent stream data (e.g., Twitter) with only a minor delay.
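The memory-cell update of equations (1)-(7) above can be sketched in NumPy. The layer sizes, random initialization, and input sequence below are illustrative assumptions, not values from the paper.

```python
import numpy as np

# One LSTM timestep: the four pre-nonlinearity activations are computed in
# a single matrix product (equation 7) and then sliced into the input gate,
# forget gate, candidate state, and output gate.
rng = np.random.default_rng(1)
d_in, d_h = 5, 4                                  # input and hidden sizes

W = rng.standard_normal((4 * d_h, d_in)) * 0.1    # concatenated W_i, W_f, W_c, W_o
U = rng.standard_normal((4 * d_h, d_h)) * 0.1     # concatenated U_i, U_f, U_c, U_o
b = np.zeros(4 * d_h)                             # concatenated b_i, b_f, b_c, b_o

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev):
    z = W @ x_t + U @ h_prev + b                  # equation (7), one product
    i, f, c_hat, o = np.split(z, 4)               # slice the pre-activations
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # equations (1), (3), (5)
    c_hat = np.tanh(c_hat)                        # equation (2)
    c_t = i * c_hat + f * c_prev                  # equation (4), * is Hadamard
    h_t = o * np.tanh(c_t)                        # equation (6)
    return h_t, c_t

h, c = np.zeros(d_h), np.zeros(d_h)
for x in rng.standard_normal((3, d_in)):          # a three-step input sequence
    h, c = lstm_step(x, h, c)
print(h)                                          # final representation
```

Averaging the h_t vectors over all timesteps and feeding the result to a five-class logistic regression layer would complete the architecture of Figure 3.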
Figure 5: Number of articles (top) and number of unique sources (bottom) in our collection, by publication date of the article.

Training of the LSTM Network
The dataset used for training is the recently proposed Stanford Sentiment Treebank [11], which includes fine-grained sentiment labels for 215,154 phrases in the parse trees of 11,855 sentences. In our experiment, we focus on sentiment prediction of complete sentences with respect to the named entities contained within each sentence.

For our LSTM, we use the softsign activation function over tanh; it is faster and there is a smaller probability of saturation (i.e., having a gradient that approaches 0). We evaluated our training set over 20 epochs (a number which was empirically determined). We use a learning rate of 10^-5, an L2 regularization weight of 0.009, and a dropout value of 1.0.

Interactive Query Interface
Figure 6 shows the interactive query interface used in our study. The query interface is designed to give users as much information as possible to refine their search based on the sentiment of the search results. The interface is divided into two columns. The left column contains an area to enter and refine queries, a checkbox for the user to have only results from verified sources returned, and several checkboxes to determine the types of sentiments to include, from very negative to very positive. At the bottom of the left-hand column, the most popular search terms that were not used in the user query but appear in the results are shown, with color coding to indicate the sentiment of each term. In the right-hand column, we display the article counts by sentiment and the top-ranked search results. Users are also given the ability to sort the search results based on relevance, date, sentiment, or verified source. Next to each search result, users can see the sentiment our approach has indicated for that article, as well as an indication of whether the article is from a verified source.

We implemented searches on our collection using Indri, a scalable open-source search engine [12]. Indri works well with smaller queries, which are typically used in searches on named entities.

Figure 6: The Interactive Query Interface for searching our collection, showing an example query. The sentiment we derive from each article is represented as the sentiment of the article.

Detecting Ambiguous Named Entities
To ensure we are tracking the correct named entity, when appropriate, we need to disambiguate potentially confounding entities. We use an API from Wikipedia to check for a disambiguation page on the user-provided named entity. If one is found, we obtain the different categories, if any, that are provided by Wikipedia. Figure 7 shows an example of a search on "Michael Jackson" and the categories containing entities named "Michael Jackson". This allows users to narrow their search to the correct entity, reducing the possibility of confounding results from mistakenly grouping disparate entities together.

Figure 7: The disambiguation page for Michael Jackson. Categories are pulled from Wikipedia through their API, allowing the user to find the correct Michael Jackson. Note the shortcut in the upper right-hand side linking to the most popular named entity.

Detecting Verified Sources
As described earlier, we allow users to search on only verified news sources or on all sources. This allows users to examine both informal and formal sources. We describe how we verify sources in the Data Collection section. Figure 8 shows search results without the "verified sources only" checkbox checked, allowing unverified sources.

Figure 8: The Interactive Query Interface for searching our collection, showing search results containing unverified sources.

Applying Sentiment Analysis
We use the LSTM method to detect and classify sentiment for each major named entity in each article, as well as for the main keywords associated with that article. We provide five classes of sentiment, from very negative to very positive. We display this information to the user as the sentiment of the article.
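The softsign activation preferred in the training discussion above can be compared with tanh directly; the sample point below is an arbitrary choice for illustration.

```python
import numpy as np

# Softsign saturates more slowly than tanh: for large |x| its gradient
# decays quadratically, 1/(1+|x|)^2, rather than exponentially, so the
# gradient stays further from zero.
def softsign(x):
    return x / (1.0 + np.abs(x))

def softsign_grad(x):
    return 1.0 / (1.0 + np.abs(x)) ** 2

def tanh_grad(x):
    return 1.0 - np.tanh(x) ** 2

x = 4.0
print(softsign(x), np.tanh(x))         # both head toward 1.0
print(softsign_grad(x), tanh_grad(x))  # softsign's gradient is much larger
```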
EXPERIMENT DESIGN
Sentiment analysis is primarily associated with a named entity, so if multiple entities are described in the article text, each with a different sentiment, this can convolute the true sentiment around each entity if not properly handled. Also, the sentiment of an article is a relative concept – if all articles are negative about a named entity, even a slightly positive article can look very positive in comparison. Our research question is to evaluate if machine-generated sentiment analysis is a strong predictor of article sentiment from a user's perspective. We accomplish this by evaluating feedback on the sentiment ratings the users provide.

Evaluating Sentiment
As with determining relevance in information retrieval, humans are widely known to be better than machines at determining the correctness of article sentiment. We hired 293 crowdworkers from Amazon Mechanical Turk. These crowdworkers performed 600 separate tasks (HITs) to evaluate 1500 articles (approximately 0.35% of our collection) by searching on 150 named entities. Each article was evaluated by at least 3 different crowdworkers (crowdworkers could not evaluate an article more than once). The distribution of ratings made by crowdworkers is given in Figure 9. Most raters evaluated 5 articles, and the mean number of articles rated was 15.

Figure 9: The number of articles rated (x-axis) by the number of raters evaluating that number of articles (y-axis).

Instructions to Users
Each user (crowdworker) is asked to determine if the article retrieved by their query is relevant to the search criteria. This is used to help refine the search criteria parameters provided in Indri. More importantly, the user is asked to evaluate the sentiment assigned to the article on a five-point scale (see Figure 10). Users were also asked to take a survey on the usability of the interface and the perceived accuracy of the LSTM-classified sentiment.

Figure 10: The interface used to evaluate the article's relevance and the classification of the article's sentiment.

Intra-Rater Reliability
To evaluate intra-rater reliability, we kept track of each crowdworker's ratings and the articles they rated without identifying them personally. When the articles were presented to the crowdworker to rate, they were not made aware of the overall rating previously made by our sentiment analysis model. We also kept track so that a single user could not evaluate any article more than once.

We understand that raters' personal bias can influence their perspective on an article's sentiment. Although we did not attempt to recalibrate each crowdworker's ratings based on the pattern of their ratings, we did check if any crowdworker consistently selected the sentiment to be very positive or very negative, implying they were rushing through the task instead of evaluating each article thoroughly. Of the 600 tasks, only 3 needed to be repeated due to this behavior.

Fake News
We also wish to determine if outliers in sentiment on a named entity are good predictors of fake news. For example, if a large percentage of articles for an entity are slightly positive or very positive, those articles with sentiments rated very negative (particularly from unverified sources) are candidates to be fake news articles. To examine the details further, we look at the most negative quotations or facts provided in these articles using a separate process, and look at the overlap between these sources and other articles in our collection. We briefly report and analyze these findings.
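The rushing-rater check described in the intra-rater reliability discussion above can be sketched as follows; the worker IDs and ratings are invented sample data, not the study's ratings.

```python
# Flag crowdworkers whose ratings sit only at the extremes of the
# five-point scale, suggesting they rushed through the task instead of
# evaluating each article thoroughly.
ratings_by_worker = {
    "w1": [1, 5, 5, 1, 5],   # extremes only: candidate for repeating the HIT
    "w2": [2, 3, 4, 3, 2],
    "w3": [5, 4, 2, 1, 3],
}

def flag_rushers(ratings):
    return sorted(w for w, r in ratings.items()
                  if r and all(v in (1, 5) for v in r))

print(flag_rushers(ratings_by_worker))  # ['w1']
```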
RESULTS AND ANALYSIS
Correlation of Ratings Obtained by the LSTM Sentiment Analysis Model
Our primary research question was to examine how well the sentiment analysis provided by our LSTM model correlates with the sentiment ratings made by users. Since each of the articles was evaluated at least 3 times, we took the average rating of the users (rounded to the nearest integer) to be the correct article sentiment.

We computed a Pearson correlation coefficient, r, between the sentiment classes determined by our LSTM network and the sentiment classes provided by the users. There was a positive correlation between the two variables [r = 0.823, n = 1500, p < 0.001]. Therefore, based on the sample of 1500 news articles evaluated, we believe the sentiment analysis provided by the LSTM model is a reasonably good predictor of an article's sentiment. Table 1 shows the correlation between the two sets of ratings.

                                 Sentiment Analysis Model Rating
                                   1     2     3     4     5
  Average of User      1         122    70     6     3     0
  Ratings (min of      2          67   242    77     5     0
  3 ratings per        3           4    84   192    79     5
  article)             4           0    10    73   250    74
                       5           0     2    11    45    79

Table 1: Correlation of ratings between the average supplied by the users and those obtained by the sentiment analysis model.

To evaluate fake news articles, we examine named entities where the sentiment is skewed heavily in one direction (either very positive or very negative) and look at those articles which are extreme outliers, i.e., with a difference in ratings of 3 or more on our 5-point scale. Of the 150 named entities examined in our study, we found 14 that had one or more articles meeting this condition. These 14 named entity searches yielded 29 articles, of which 28 were unverified news articles.

We ran a separate analysis of any quotations and facts raised in each of these 28 articles. We then tried to find these facts mentioned in the other articles. Of the 31 quotations in these articles about the named entity in question, we found 20 instances where the quotation did not exist in any other article in our collection and 11 instances where the quotation was mentioned, but convoluted in a way that contorted its context. Of the 89 facts raised in these articles, 77 were not mentioned in any other article, and 12 were mentioned but taken out of context with respect to the other articles in our collection. While we cannot confidently conclude that these articles represent fake news, we believe this approach can help identify articles that have a distinctly different sentiment from other articles and that bring up quotations and facts not mentioned in other articles. We plan to explore this relationship in a future study.

Last, we asked the crowdworkers to provide optional feedback on the interface, both in terms of usability and in terms of accuracy of sentiment classification, on a five-point Likert scale. Of the 293 crowdworkers, we received feedback from 177 (60.4%). Survey takers scored the interface as 3.28 for usability (5 = best), with many providing comments that more work needs to be done to reduce its complexity. The survey takers scored the LSTM model's sentiment classification accuracy as 4.54, with many providing feedback indicating they concurred with the LSTM model's overall accuracy.
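The evaluation above can be sketched as follows: average at least three user ratings per article, round to the nearest integer as the ground truth, and correlate the result with the model's classes. The five articles below are invented sample data, not the study's ratings.

```python
import math

user_ratings = [[4, 5, 4], [2, 1, 2], [3, 3, 4], [5, 5, 4], [1, 2, 1]]  # >= 3 raters each
model_classes = [4, 2, 4, 5, 1]      # classes assigned by the sentiment model

# Ground truth: mean user rating per article, rounded to the nearest integer.
truth = [round(sum(r) / len(r)) for r in user_ratings]

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

print(truth)                         # [4, 2, 3, 5, 1]
print(pearson_r(truth, model_classes))
```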
CONCLUSION AND FUTURE WORK
In this paper, we describe an interactive query interface that makes use of sentiment analysis. This allows users performing a named entity search to receive information on the sentiment of each article and therefore find a wide diversity of opinions on a named entity search quickly and easily.

We describe the LSTM model we used and how it can be used to classify the sentiment of news article text into five classes, ranging from very negative to very positive. The advantage of this model is that even when multiple entities are mentioned in an article, it can match the sentiment to the named entity in question. We have shown that this technique can process news articles quickly, allowing emergent news to be covered quickly.

We conducted a user study with 293 unique participants to answer our research question. They were instructed to classify the sentiment of 1500 articles, and we examined how their classifications correlate with the sentiment obtained from our model. Each article was evaluated by at least 3 users. With a Pearson correlation coefficient of r = 0.823, we found that the classification of article sentiment by users and the classification from the LSTM sentiment analysis tool are strongly correlated.

Combining the sentiment classification techniques with some additional analysis allows us to identify potentially fake news articles. We identified news articles whose ratings were outliers from the majority of the other relevant articles returned by the same named entity search. We found that 28 of the 29 articles identified using this approach were suspicious news articles that would need further investigation. We leave this for a future study.

There are some limitations to our work. First, our study only looks at queries on named entities, which are easier to retrieve and analyze semantically than general concepts. Second, the study worked with a collection of 433,175 articles, with 84.1% of these being pulled from verified sources. With exposure to more unverified sources our correlation may be lower, which we leave for future work. Another limitation has to do with sentence complexity. Our model evaluated sentiment at the sentence level, and we found proximity to the named entity played a role; if more than one named entity was mentioned in a sentence, such as "In the 1938 movie Carefree, Fred Astaire performed well but Ralph Bellamy was forgettable.", we would expect our model to provide a positive sentiment for "Fred Astaire", a neutral sentiment for "Carefree", and a negative sentiment for "Ralph Bellamy"; instead it provided a positive sentiment for "Fred Astaire" and "Carefree" and a neutral sentiment for "Ralph Bellamy". Evaluating at the phrase level instead of the sentence level would improve the accuracy of our results.

In other future work, we plan to examine the role of images in articles and how these can be analyzed for sentiment as well. We also plan to examine the choice of photos used to represent named entities in news articles. Last, we plan to examine searches that do not contain named entities and evaluate if our methods are as accurate as they are with named entities.

REFERENCES
1. Allcott, H., & Gentzkow, M. (2017). Social media and fake news in the 2016 election (No. w23089). National Bureau of Economic Research.
2. Bakshy, E., Messing, S., & Adamic, L. A. (2015). Exposure to ideologically diverse news and opinion on Facebook. Science, 348(6239), 1130-1132.
3. Barthel, M., Mitchell, A., & Holcomb, J. (2016). Many Americans Believe Fake News Is Sowing Confusion. Pew Research Center. Available at: http://www.journalism.org/2016/12/15/many-americans-believe-fake-news-is-sowing-confusion/
4. Gers, F. A., Schmidhuber, J., & Cummins, F. (1999). Learning to forget: Continual prediction with LSTM. Neural Computation, 12, 2451-2471.
5. Gottfried, J., & Shearer, E. (2017). Americans' online news use is closing in on TV news use. Pew Research Center. Available at: http://www.pewresearch.org/fact-tank/2017/09/07/americans-online-news-use-vs-tv-news-use/
6. Graves, A. (2012). Supervised sequence labelling with recurrent neural networks (Vol. 385). Heidelberg: Springer.
7. Hasell, A., & Weeks, B. E. (2016). Partisan provocation: The role of partisan news use and emotional responses in political information sharing in social media. Human Communication Research, 42(4), 641-661.
8. Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735-1780.
9. Kwak, H., Lee, C., Park, H., & Moon, S. (2010). What is Twitter, a social network or a news media? In Proceedings of the 19th International Conference on World Wide Web (pp. 591-600). ACM.
10. Purcell, K., Brenner, J., & Rainie, L. (2012). Search engine use 2012. Pew Research Center. Available at: http://www.pewinternet.org/files/old-media/Files/Reports/2012/PIP_Search_Engine_Use_2012.pdf
11. Socher, R., Perelygin, A., Wu, J., Chuang, J., Manning, C. D., Ng, A., & Potts, C. (2013). Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (pp. 1631-1642).
12. Strohman, T., Metzler, D., Turtle, H., & Croft, W. B. (2005). Indri: A language model-based search engine for complex queries. In Proceedings of the International Conference on Intelligent Analysis (Vol. 2, No. 6, pp. 2-6).
13. Yin, X., & Shah, S. (2010). Building taxonomy of web search intents for name entity queries. In Proceedings of the 19th International Conference on World Wide Web (pp. 1001-1010). ACM.