=Paper=
{{Paper
|id=Vol-443/paper-7
|storemode=property
|title=Visual Sentiment Analysis of RSS News Feeds Featuring the US Presidential Election in 2008
|pdfUrl=https://ceur-ws.org/Vol-443/paper7.pdf
|volume=Vol-443
}}
==Visual Sentiment Analysis of RSS News Feeds Featuring the US Presidential Election in 2008==
Franz Wanner, Christian Rohrdantz, Florian Mansmann, Daniela Oelke, Daniel A. Keim
University of Konstanz, Germany
firstname.lastname@uni-konstanz.de
ABSTRACT
The technology behind RSS feeds offers great possibilities to retrieve more news items than ever. In contrast to these technical developments, the human capacity to read all these news items has not increased likewise. To bridge this gap, this paper presents a visual analytics tool for conducting semi-automatic sentiment analysis of large news feeds. While the tool automatically retrieves and analyzes RSS feeds with respect to positive and negative opinion words, the more demanding news analysis of finding trends, spotting peculiarities and putting events into context is left to the human expert. For a solid analysis, the news similarity filter enables highlighting of similar or redundant news items. A case study about news related to the US presidential election in 2008 shows how the visual interface of the tool empowers the analyst to draw meaningful conclusions without the effort of reading all news postings.

Author Keywords
sentiment analysis, opinion mining, information visualization, visual analytics

ACM Classification Keywords
H.5.2 Information Interfaces and Presentation: Miscellaneous

INTRODUCTION
The web is the largest information source in the world. One major aspect of the web is to bring news from all over the world instantaneously to your screen via RSS feeds. Apart from passive usage of the web as a medium, web 2.0 technology helps more and more people to actively contribute to this valuable information source by creating content in an easy way. There are many possibilities to take an active part in the web: blogs, reviews and other ways to state comments.

Analyzing news stories and user generated content is of huge importance for many people and organizations. Economic analysts, for example, would like to find consumer and public opinions on their products and services. Likewise, potential consumers seek experiences of existing users before making a purchase decision, or afterwards to cope with the product's shortcomings or praise its functionality. Furthermore, politicians want to find out their public reputation, the manner in which the news write about them, and the reaction of the public to these articles.

Since public opinion polls are an expensive undertaking, our goal is to offer a semi-automatic approach by mining the web for particular key words, conducting sentiment analysis on the text to assess how positive or negative a particular news posting is, and then presenting the information in a visual exploration tool. While our approach is not suitable to completely replace a thoroughly conducted opinion poll due to its lack of accuracy, it also has some unique advantages, namely low costs and the possibility to continuously monitor a particular subject in real-time. Knowing at an early stage that consumers have a problem with a sub-component of a product gives the company more time to react appropriately and to avoid damage to valuable trade marks.

In this paper, we demonstrate a novel way of using text analysis methods in combination with a visual representation. On the one hand, this system automatically evaluates the emotional content of a news posting. On the other hand, the visual interface empowers the human expert to draw meaningful conclusions, to selectively read a few news postings with strong emotional content, to discover trends, and to gain an overview of the development of a chosen topic in the media.

To exemplify our tool we take a closer look at the news coverage on the web of the 2008 US presidential election. Out of 50 chosen political RSS newstickers, we retrieved all RSS articles containing at least one of the following key words: “Obama”, “McCain”, “Biden” and “Palin” as well as “Democrat” and “Republican”. Thereupon, the articles are automatically evaluated with respect to the contained positive and negative opinion words, resulting in a normalized sentiment score for each article.

For presentation purposes, these articles are then visualized on a daily timeline using symbols to encode the contained key words. The vertical position of each symbol is defined by the article's sentiment score, which makes strongly emotional news more visible. Furthermore, we demonstrate an interactive feature to show relations between the news items to track the development of a specific topic.
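As a rough illustration of the automatic part of this pipeline, the following Python sketch filters items by the six key words, computes a length-normalized sentiment score, and measures the word overlap between two items. The tiny word lists, the stopword set, all function names, and the exact normalizations (positive minus negative count divided by the token count; shared non-stopwords divided by the distinct non-stopword count of the larger item) are assumptions made for illustration. They are one plausible reading of the description in the Data Processing section, not the tool's actual implementation.

```python
import re

# The six signal strings named in the paper.
SIGNAL_WORDS = ("Obama", "McCain", "Biden", "Palin", "Democrat", "Republican")

# Hypothetical miniature lexicons; the paper uses a published word list [4].
POSITIVE = {"win", "wins", "praise", "lawful", "proper", "success"}
NEGATIVE = {"abused", "plot", "assassinate", "scandal", "attack", "blow"}

# Hypothetical stopword set, just large enough for the examples.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "her", "as", "on", "for", "is", "by"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def mentions_signal_word(text):
    """True if the text contains at least one signal string (case-insensitive substring match, an assumption)."""
    lowered = text.lower()
    return any(word.lower() in lowered for word in SIGNAL_WORDS)


def sentiment_score(text):
    """(positive count - negative count) / number of tokens; 0.0 for empty text."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    positive = sum(token in POSITIVE for token in tokens)
    negative = sum(token in NEGATIVE for token in tokens)
    return (positive - negative) / len(tokens)


def similarity(item_a, item_b):
    """Shared distinct non-stopwords, normalized by the larger item's distinct non-stopword count."""
    words_a = set(tokenize(item_a)) - STOPWORDS
    words_b = set(tokenize(item_b)) - STOPWORDS
    larger = max(len(words_a), len(words_b))
    if larger == 0:
        return 0.0
    return len(words_a & words_b) / larger
```

A negative return value of `sentiment_score` corresponds to an item that would be drawn below its baseline, and a threshold on `similarity` would decide which items are treated as related.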
Workshop on Visual Interfaces to the Social and the Semantic Web (VISSW2009), IUI2009, Feb 8 2009, Sanibel Island, Florida, USA. Copyright is held by the author/owner(s).

The rest of this paper is structured as follows: In section Related Work, text and sentiment analysis methods and visual interfaces for them are discussed. The next section, Visual Sentiment Analysis, then presents our processing, visualization, and interaction approaches for analyzing the news coverage of the 2008 US presidential election. Afterwards, section Results shows how some interesting topics about the candidates and their parties manifest in our visualization. By summarizing our contributions we draw our conclusions in the last section.

RELATED WORK

Text Analysis
The visualization and visual analysis of textual data is increasingly attracting interest in different application domains. Many of the early approaches in that area dealt with the visualization of retrieval results (see e.g., VIBE [22] or InfoCrystal [27]). Furthermore, a variety of techniques concentrate on the visualization of large document collections, most of which are based on dimensionality-reduction methods (see e.g. WebSOM [23], Galaxies and ThemeScape of IN-SPIRE™ [30], or [9]). In contrast to this, text feature visualization techniques visualize single documents in detail and show the distribution of specific text features across the text. Prominent examples among these are TileBars [16], Seesoft [3], FeatureLens [6], and Literature Fingerprinting [19]. [1] and the Compus system of Fekete and Dufournaud [7] are also worth mentioning: as opposed to the other techniques, they offer the possibility to visualize several text features at once.

Relatively few approaches tackle the problem of visualizing temporal variations across a set of documents as we do in this paper. One example of such an approach is the well-known ThemeRiver visualization [15] that reveals the development of topics over time in a river-like graphic. According to the metaphor, each topic is represented as one colored “current” in the “river” that flows in the direction of the timeline from left to right. To allow for several different themes to be displayed at once, the currents are stacked on top of each other. The thickness of a current at a specific point in time represents the strength of the topic in the associated documents. TimeMines [28] and Narratives [8] are examples of visualizations that are based on standard line charts. TimeMines automatically determines keywords and judges those keywords with respect to their temporal significance in the context of the corpus. Furthermore, keywords that show a similar development over time are grouped to form a topic. Narratives presents the development of a specific topic over time and searches for correlated terms.

A similar concept is reported in [12]. The system BlogPulse (that can be found at www.blogpulse.com) monitors blogs and displays timelines that show how many blogs talk about a specific topic at a specific point in time. In addition, hot topics are detected automatically. All of the mentioned time-oriented approaches have a common limitation: They merely display the development of the significance of keywords or topics over time. Our approach goes beyond that by additionally revealing the sentiment of the documents.

Two further approaches related to our work are [2] and [11]. Both of them analyze blogs and / or newspaper articles with respect to their political orientation. However, neither of the approaches explores the development over time as we do. Instead, they focus on analyzing the link structure between the different blogs and the citation patterns of newspaper articles, respectively. In addition, [11] takes into account how emotionally charged a post is.

Sentiment Analysis
Within the abundant literature that exists in the context of sentiment analysis and opinion mining, some major tasks can be identified:

• Classification of the statements of a document (or a sentence) as subjective or objective. (e.g. [29, 14])

• Classification of a document (or a sentence) as expressing a negative or positive sentiment (or opinion). (e.g. [25, 5])

• Feature-based opinion mining made up of two successive steps: First, the features (or attributes) that have been commented on are identified. Secondly, the respective opinion that has been expressed on them is detected. (e.g. [17, 18, 26, 21, 20])

Note that our approach does not contribute to the area of automatic sentiment analysis but makes use of some of its standard techniques. However, we contribute to the development of visualizations for sentiment analysis. Related work in this respect includes [10, 24, 13]. The visualization with the highest resemblance to our work can be found in [24]. The authors suggest using bars to visualize how many positive and negative statements exist within the document corpus for each analyzed attribute of a product. Our work is similar in that we also use the vertical deflection of bars to encode the opinion that is expressed. In contrast to [24], however, in our case one bar represents one document instead of the summary of all sentences talking about a specific attribute of a product. Moreover, in our visualization the development over time is central, something that is completely omitted in all of the above mentioned approaches for sentiment analysis / opinion mining. In [10] customer reviews are visualized, too, but a Treemap representation is used to display the result of the analysis. Finally, [13] presents an adaptation of the Rose Plot visualization to illustrate the affective content of a document. In addition to positive and negative sentiments, the documents are also analyzed with respect to the categories virtue, vice, pleasure, pain, power cooperative, and power conflict.

VISUAL SENTIMENT ANALYSIS

Data Processing
The data we used was gathered from 50 different RSS news feeds that mainly dealt with the 2008 US presidential elections. The RSS feeds were retrieved every 30 minutes during a time interval of one month (10/09/2008 - 11/10/2008). For every news item in each feed we saved date, title and description, as well as the id of the feed. Next, noise was eliminated
out of the title and description. With noise we refer to strings that do not carry any content, such as URLs or strings consisting of special characters. The concatenation of title and description was then considered to be the content of the news item. Finally, we filtered out those documents that contained none of the following signal words: “Obama”, “McCain”, “Biden”, “Palin”, “Democrat” and “Republican”. More than 23,000 news items contained at least one of the six strings.

Pairwise similarities between news items were calculated by applying a similarity measure, which counts the number of non-stopwords that two items have in common (normalized by the length of the larger item). Although this is a relatively simple measure, it works quite well for the short descriptive texts in the RSS news feeds.

Another aspect of interest is the sentiment context of a news item, which we capture by enriching each item with a sentiment score. For this purpose we make use of a freely available list of words that evoke positive or negative associations [4]. We count the number of positive and negative words and evaluate the whole news item as rather positive if it contains in total more positive than negative words. Likewise, the item is evaluated as rather negative if it contains more negative than positive words. The absolute relation of positive against negative words, normalized by the item's length, provides our sentiment score. One important point to mention here is that the appearance of a candidate in, e.g., a negative context does not necessarily mean that the item contains negative publicity for the candidate, but simply that he appears in a negatively connoted context. This becomes clear when we consider the example of news telling that racists planned to assassinate Obama (see section “Results”). This was bad news for Obama, not about Obama, yet it carries a visibly negative connotation.

Data Visualization
The visualization aims on the one hand to give a meaningful representation of the data and on the other hand to be an appropriate starting point for the interactive exploration and discovery of interesting patterns. Figure 4 shows a screenshot of the visualization. Each line represents one day and each colored object depicts one news item. The news item's emotional score is encoded by a vertical displacement of the news item. Colors encode whether the text mentions the Democratic party, the Republican party or both. Additionally, the shape of the news objects visualizes whether the first candidate, the second candidate or only the name of the party itself was mentioned. The following passages describe each of those aspects in detail.

Placement
Every news item is represented by an object in a 2D plane. The position of the object within the plane depends on the date the news was published. The day it was published determines the line it is placed in (as each line represents one day) and the time of day determines its horizontal position within the line. The exact vertical position depends on the sentiment score of the object. According to this value an object is slightly shifted up (positive) or down (negative). Horizontal lines mark the position that a news item would have if it were neither positive nor negative.

Coloring
Everything that is solely related to the conservatives (Republican party) is colored in red and everything purely related to the liberals (Democratic party) in blue. Gray news objects relate both to the liberals and the conservatives, which basically means that both camps are mentioned within the news content.

Shape
The use of different shapes for the objects allows us to make a distinction between news items in which the first candidate of a party was mentioned, the second candidate but not the first candidate, or none of them but only the name of the party. Figure 1 shows the visual appearance of the different shapes. Please note that we always keep the horizontal interruptions that mark news items about the second candidate at the same vertical position of each line (regardless of the vertical shift of the object that encodes the emotional score). This leads to a clear visual pattern of continuous white horizontal lines if several neighboring objects refer to the second candidates only.

Figure 1. Symbols used to represent news items according to the appearance of certain keywords.

Opacity
We paint our news objects with a relatively low opacity. That means they are partly transparent, which comes with two advantages: First, the problem of overlapping news objects is reduced. In most cases every object is visible and can be differentiated clearly from its overlapping neighbors. Secondly, if multiple news items are put on top of each other, the overall opacity at this position increases, resulting in an object that is more opaque and can therefore be distinguished from objects that represent just one news item. Several feeds often bring the same news at nearly the same moment in time when the news is very important. That means that the more opaque news objects often represent news that is more important and certainly more widely spread. Figure 2 visually illustrates the above mentioned design decisions.
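The placement and opacity rules can be made concrete with a small Python sketch. All pixel dimensions, the clipping range for the score, the function names, and the per-item alpha value are hypothetical and only serve the illustration. The formula for stacked layers is the standard result of compositing identical semi-transparent layers on top of each other; the paper itself does not state it.

```python
from datetime import date, datetime


def placement(published, score, first_day,
              row_height=40.0, row_width=960.0,
              max_shift=15.0, score_cap=0.2):
    """Map one news item to (x, y) screen coordinates.

    The publication day selects the row, the time of day the horizontal
    position, and the sentiment score a vertical shift around the row's
    baseline. Screen y grows downward, so positive scores reduce y.
    All numeric defaults are hypothetical.
    """
    row = (published.date() - first_day).days
    seconds = published.hour * 3600 + published.minute * 60 + published.second
    x = row_width * seconds / 86400.0
    clipped = max(-score_cap, min(score_cap, score))
    baseline = row * row_height + row_height / 2.0
    y = baseline - max_shift * clipped / score_cap
    return x, y


def stacked_opacity(alpha, count):
    """Resulting opacity when `count` objects of opacity `alpha` overlap exactly."""
    return 1.0 - (1.0 - alpha) ** count


if __name__ == "__main__":
    start = date(2008, 10, 9)
    item_time = datetime(2008, 10, 10, 19, 41, 49)
    print(placement(item_time, -0.1, start))
    for n in (1, 2, 5):
        print(n, round(stacked_opacity(0.3, n), 3))
```

Under these assumptions, five identical postings drawn with an alpha of 0.3 are already far more opaque than a single posting, which is what makes widely copied news stand out.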
Figure 2. Semantics of the visualization. Annotations in the figure: two horizontal lines represent one day; the horizontal extent shown covers about one hour of the day; the sentiment shift runs from + (up) to - (down); a higher α-value indicates the same news item from different feeds; example items are labeled “Biden in neutral context”, “Democrats in negative context”, “McCain in positive context”, and “highlighted news item”.

Interactive Visual Analytics
The visualization is designed for interactive data exploration. There are several possibilities to interact with the tool:

• Zooming: Continuous zooming allows the analyst to examine certain parts at a greater level of detail.

• Details on demand: When the mouse is moved over a news object, a tooltip appears containing date, time, feed id, and content of the item.

• Similarity search: With a mouse click on a news object, the search for similar news items is started. The news item itself and every other news object that is related to it is highlighted (please refer to section “Data Processing” for our definition of similarity). Figure 3 shows an example.

• Filtering: The user can select the different candidates / parties they are interested in. Another possibility to reduce the number of items that are displayed is to select one specific RSS feed. The two filtering mechanisms can be used to analyze in detail the behaviour of one specific news provider or the development of news for a subset of candidates and/or parties.

Figure 3. After selecting one news item, similar items are highlighted in yellow, enabling the user to track specific topics (low threshold) or redundant postings (high threshold).

RESULTS
First of all, we present an overview of all 50 monitored RSS feeds over a time period of 31 days in Fig. 4. A predefined filter displays all news postings containing at least one of the terms Obama, McCain, Biden, Palin, Democrat, and Republican. To exemplify our Visual Analytics technique, we picked five interesting discussions in the monitored RSS feeds.

Figure 4. 31 days of the 2008 US presidential election showing a scandal of power abuse by Palin (A), the TV debate McCain vs. Obama (B), assassination plans against Obama (C), the election day (D), and a debate about Palin's election wardrobe (E).

Palin abused power in Alaska
On Saturday, 10th October, many negative news postings occurred about Sarah Palin. Almost all articles deal with the question of whether Sarah Palin had abused her power in Alaska or not. As demonstrated in Fig. 5, there is a high density of red shapes with two white bars symbolizing news postings about Palin. Their positions below the baseline denote that mainly negative emotion words were used in these postings. Only one exceptionally positive red news item sticks out in the visualization. A closer look at this posting reveals that it is a response from the McCain-Palin presidential campaign: “Sarah Palin acted ‘within proper and lawful authority’ in removing the state's public safety commissioner”.

Figure 5. Media coverage dealing with the topic of Sarah Palin's abuse of power as a governor of Alaska. News items shown in the figure:

• Fri Oct 10 19:24:20 CST 2008 (Feed 49): Alaska panel finds Palin abused power in firing: ANCHORAGE, Alaska (AP) -- A legislative committee investigating Alaska Gov. Sarah Palin has found she unlawfully abused her authority in firing the state's public safety commissioner. The investigative report concludes that a family grudge wasn't the sole reason for firing Public Safety Commissioner Walter Monegan but says it likely was a contributing factor....

• Fri Oct 10 19:41:49 CST 2008 (Feed 19): Palin abused power Alaska 'Troopergate' probe finds: AFP - Republican vice-presidential nominee Sarah Palin abused her position as Alaska Governor by pressuring officials to dismiss a state trooper, an investigator's report said.

• Fri Oct 10 21:06:44 CST 2008 (Feed 18): Probe accuses Palin of abuse of power (AFP): AFP - Investigators found vice presidential nominee Sarah Palin abused her powers as Alaska governor, dealing another blow to Republican John McCain's struggling White House bid.

• Fri Oct 10 21:50:40 CST 2008 (Feed 32): Alaska ethics probe says Palin abused her power: CHILLICOTHE, Ohio (Reuters) - An Alaska ethics inquiry found on Friday that U.S. Republican vice presidential candidate Sarah Palin abused her power as the state's governor, casting a cloud over John McCain's controversial choice of running mate for the November 4 election.

• Fri Oct 10 22:15:22 CST 2008 (Feed 39): Palin says report says she acted lawfully (Reuters): Reuters - Alaska Gov. Sarah Palin acted "within proper and lawful authority" in removing the state's public safety commissioner, the McCain-Palin Republican presidential ticket said on Friday in response to a state report.

Bad news for the Democrats
Approximately one week before the US presidential election we detected a large number of news items that included “Obama” (see Fig. 6). The sentiment scores of these postings were mainly negative and the postings dealt with a plot to assassinate Barack Obama and 102 black people. Note that the news is bad for him but not about him, meaning that a negative event is related to him in the news postings although the negative opinion words do not refer to him as a person.

The emotion words used were so strong that even in the overview it is possible to recognize the emergence of the negative news of that event on 28th of October (see Fig. 4). Note that although each RSS posting only consists of a few sentences, the few contained positive or negative opinion words are sufficient to provide clear results. Further headlines of that day discuss the corruption scandal of a Democratic senator and result in negative headlines for the Democrats.

Figure 6. Democrats appear in “negative context”. Bad news for Obama, but not about him. News items shown in the figure:

• Mon Oct 27 14:24:25 CST 2008 (Feed 37): ATF disrupts skinhead plot to assassinate Obama (AP): AP - The ATF says it has broken up a plot to assassinate Democratic presidential candidate Barack Obama and shoot or decapitate 102 black people in a Tennessee murder spree.

• Mon Oct 27 15:45:26 CST 2008 (Feed 38): Assassination plot targeting Obama disrupted (AP): AP - Law enforcement agents have broken up a plot by two neo-Nazi skinheads to assassinate Democratic presidential candidate Barack Obama and shoot or decapitate 88 black people, the Bureau of Alcohol, Tobacco Firearms and Explosives said Monday.

• Mon Oct 27 16:45:39 CST 2008 (Feed 31): Skinheads held over Obama death plot: WASHINGTON (Reuters) - Two white supremacist skinheads were arrested in Tennessee over plans to go on a killing spree and eventually shoot Democratic presidential candidate Barack Obama, court documents showed on Monday.

TV debate Obama vs. McCain
In the middle of October the final TV debate between the Democratic candidate Barack Obama and the Republican candidate John McCain was held. As shown in Fig. 7, news postings about the event cover both candidates (gray) and generally have low sentiment scores due to the criticism of both candidates against each other. The debate revealed little novelty with respect to each candidate's political plans after the election. Therefore, there were no strong positive statements about the event in the monitored feeds.

Figure 7. TV debate. News items shown in the figure:

• Wed Oct 15 22:20:20 CST 2008 (Feed 32): McCain and Obama battle in contentious debate: HEMPSTEAD, New York (Reuters) - Republican John McCain and Democrat Barack Obama battled fiercely on Wednesday in their liveliest and most contentious debate, with McCain attacking Obama's tax plan, campaign tone and relationship with a 1960s radical.

• Wed Oct 15 22:36:54 CST 2008 (Feed 34): Obama, McCain Get Feisty in Final Presidential Debate: Candidates mix it up on campaign attacks economics, taxes, "Joe the plumber."

Obama wins the election
As can be seen in annotation D in Fig. 4, the election day is dominated by gray bars. This is due to the fact that these news postings reported on election results in particular states, featuring scores of both candidates. In the evening of the election day many news postings were received about the winner Barack Obama. The density of news about the Democrats increased rapidly after the result was known and dominated the news for several days.

Palin's wardrobe
Although the blue shapes increased immensely after the election, some red negatively rated items stick out (see Fig. 8). These outliers deal with some critical notes about the expensive wardrobe, which was bought for Sarah Palin's campaign, and her inappropriate use of language describing her critics.

Figure 8. Palin under attack after the elections. News items shown in the figure:

• Fri Nov 07 15:40:35 CST 2008 (Feed 23): GOP tries to sort out Palin's donor-funded duds: WASHINGTON (AP) -- Republican Party lawyers are still trying to determine exactly what clothing was purchased for Alaska Gov. Sarah Palin, what was returned and what has become of the rest.....

• Fri Nov 07 16:01:19 CST 2008 (Feed 37): Palin denounces her critics as cowardly (AP): AP - Alaska Gov. Sarah Palin is striking back at critics of the high-priced wardrobe she wore as the Republican vice presidential candidate....

• Fri Nov 07 16:38:59 CST 2008 (Feed 39): Palin denounces her critics as cowardly (AP): AP - Alaska Gov. Sarah Palin called her critics cowards and jerks Friday for deriding her anonymously and insisted she never asked for the expensive wardrobe purchased for her use on the presidential campaign.

• Fri Nov 07 17:56:01 CST 2008 (Feed 31): Palin fires back at leaks questioning her smarts: WASHINGTON (Reuters) - Alaska Gov. Sarah Palin fired back on Friday against post-election claims by aides to Republican presidential candidate John McCain that she thought Africa was a country, not a continent, calling the anonymous sources "jerks."

Further trends
The Democratic vice presidential candidate Joe Biden, who is represented by blue bars with two interruptions, was not referenced often. As can be seen in Fig. 4, he appears very rarely compared to the Republican vice presidential candidate Sarah Palin.

A further discovery was that some feeds show daily patterns. For example, one RSS feed only sent messages in the morning at about 7AM, others broadcast their news during working hours, and some feeds even switched the coverage of political events within daily patterns, which is probably due to two editors each preferring news about one party and taking turns in writing news postings.

Often, the same news story is broadcast in many different feeds (e.g., the above mentioned news about Palin's wardrobe). This is mainly due to the fact that some feeds immediately broadcast the news copied from a particular news agency, whereas other feeds broadcast this information later. Another feed resent the same news posting several times, as shown in Fig. 9.

CONCLUSIONS
The main contribution of this paper is the combination of a sentiment analysis method with a visualization technique revealing the emotional content of RSS news feeds over time. Through textual filters, we focused our analysis on the 2008 US presidential election featuring positive and negative news items about the presidential candidates Obama and McCain, the vice presidential candidates Biden and Palin and the two major parties. The timeline visualization builds upon three basic elements: first, the attribute color denotes the political party featured in the news article, second, different shapes are used to distinguish between the discussed persons, and third, the emotional score of each RSS news article resulted
news items are copied from other news tickers, related RSS
postings are often based on the text of the same announce-
ment of a newswire and therefore often contain almost iden-
tical vocabulary. For the analysis of other content, such as
product reviews or the full articles linked in the RSS tick-
ers, more complex document similarity measures could be
employed. Furthermore, we believe that more sophisticated
sentiment analysis methods can be integrated into the pre-
sented analysis tool.
Acknowledgement
This work has been funded by the research center ”Compu-
Figure 9. Technical failure or search engine optimization resulting in
resending the same news postings over and over again.
tational Analysis of Linguistic Development” at the Univer-
sity of Konstanz and by the German Research Society (DFG)
under the grant GK-1042, Explorative Analysis and Visual-
in the vertical position of the representative symbol on the ization of Large Information Spaces, Konstanz.
time line. We thank the anonymous reviewers of the VISSW 2009 for
their valuable comments.
Within the result section, we showed how some emotional
discussions manifested in our news visualization: 1) Palin REFERENCES
abused power in Alaska, which resulted in many negative 1. A. Abbasi and H. Chen. Categorization and analysis of
news items and her own version sticking out as a highly pos- text in computer mediated communication archives
itive article. 2) The story about assassination plans against using visualization. In JCDL ’07: Proceedings of the
Obama dominated the news for several hours with highly 2007 conference on Digital libraries, pages 11–18,
negative sentiment scores. 3) The final TV debate consisted New York, NY, USA, 2007. ACM.
of mainly gray elements since reports featured both candi-
dates. In general, the accusations of both candidates against 2. L. A. Adamic and N. Glance. The political blogosphere
each other resulted in more negative than positive sentiment and the 2004 U.S. election: divided they blog. In
scores. 4) Obama wins the elections, which is documented LinkKDD ’05: Proceedings of the 3rd international
by the vast dominance of blue news elements on the eve of workshop on Link discovery, pages 36–43. ACM, 2005.
the election day and the following days. 5) Even after the 3. T. Ball and S. G. Eick. Software Visualization in the
election a discussion about the expensive wardrobe of Palin Large. IEEE Computer, 29(4):33–43, 1996.
fills negative headlines.
4. V. Buvac. Internet General Inquirer, 2008.
The tool’s interaction concept shows the corresponding RSS http://www.webuse.umd.edu:9090/ as retrieved on Nov.
news articles when the mouse is moved over a symbol on 14, 2008.
the timeline. To find redundant or similar news items in the
5. K. Dave, S. Lawrence, and D. M. Pennock. Mining the
process of analyzing particular events, we furthermore im-
peanut gallery: opinion extraction and semantic
plemented a simple document similarity filter, which after
classification of product reviews. In WWW ’03:
selecting a particular news item highlights all related news
Proceedings of the 12th international conference on
postings surpassing a certain threshold of similarity.
World Wide Web, pages 519–528. ACM, 2003.
We believe that the presented analysis tool can be used not only to monitor emotional public discussions, but also to evaluate product reviews, to assess public opinion on a particular subject, or to get hints about the reputation of an enterprise. By offering sentiment analysis of a multitude of large RSS feeds in real time, the technique lets users take early action, such as reacting before a topic dominates news coverage. This strategic dimension of our application is very valuable for public relations specialists and could be implemented in early warning systems. Furthermore, we expect the tool to be useful for monitoring the evolution of the popularity of certain products, persons, or views, ultimately answering the question of why a positive public image turned into a negative one.

Future Work
For computing the similarity between news items we used a simple word matching method. Due to the fact that many

6. A. Don, E. Zheleva, M. Gregory, S. Tarkan, L. Auvil, T. Clement, B. Shneiderman, and C. Plaisant. Discovering interesting usage patterns in text collections: integrating text mining with visualization. In CIKM ’07: Proceedings of the sixteenth ACM conference on Conference on information and knowledge management, pages 213–222. ACM, 2007.

7. J.-D. Fekete and N. Dufournaud. Compus: visualization and analysis of structured documents for understanding social life in the 16th century. In DL ’00: Proceedings of the fifth ACM conference on Digital libraries, pages 47–55, New York, NY, USA, 2000. ACM.

8. D. Fisher, A. Hoff, G. Robertson, and M. Hurst. Narratives: A Visualization to Track Narrative Events as they Develop. In IEEE Symposium on Visual Analytics and Technology (VAST 2007), pages 115–122, 2008.
9. B. Fortuna, D. Mladenic, and M. Grobelnik. Visualization of Text Document Corpus. Informatica Journal, 29(4):497–502, 2005.

10. M. Gamon, A. Aue, S. Corston-Oliver, and E. Ringger. Pulse: Mining Customer Opinions from Free Text. In Advances in Intelligent Data Analysis VI, pages 121–132. Springer, 2005.

11. M. Gamon, S. Basu, D. Belenko, D. Fisher, M. Hurst, and A. C. König. BLEWS: Using Blogs to Provide Context for News Articles. In ICWSM, 2008.

12. N. Glance, M. Hurst, and T. Tomokiyo. BlogPulse: Automated Trend Discovery for Weblogs. In WWW 2004 Workshop on the Weblogging Ecosystem. ACM, May 2004.

13. M. L. Gregory, N. Chinchor, P. Whitney, R. Carter, E. Hetzler, and A. Turner. User-directed Sentiment Analysis: Visualizing the Affective Content of Documents. In Workshop on Sentiment and Subjectivity in Text, pages 23–30, 2006.

14. V. Hatzivassiloglou and J. Wiebe. Effects of adjective orientation and gradability on sentence subjectivity, 2000.

15. S. Havre, E. Hetzler, P. Whitney, and L. Nowell. ThemeRiver: Visualizing Thematic Changes in Large Document Collections. IEEE Transactions on Visualization and Computer Graphics, 8(1):9–20, 2002.

16. M. A. Hearst. TileBars: Visualization of Term Distribution Information in Full Text Information Access. In Proceedings of the Conference on Human Factors in Computing Systems, CHI’95, 1995.

17. M. Hu and B. Liu. Mining and summarizing customer reviews. In KDD ’04: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 168–177. ACM, 2004.

18. M. Hu and B. Liu. Mining Opinion Features in Customer Reviews. In AAAI, pages 755–760, 2004.

19. D. A. Keim and D. Oelke. Literature Fingerprinting: A New Method for Visual Literary Analysis. In IEEE Symposium on Visual Analytics and Technology (VAST 2007), pages 115–122, 2007.

20. S.-M. Kim and E. Hovy. Extracting Opinions, Opinion Holders, and Topics Expressed in Online News Media Text. In Proceedings of the ACL Workshop on Sentiment and Subjectivity in Text, pages 1–8, 2006.

21. N. Kobayashi, K. Inui, Y. Matsumoto, K. Tateishi, and T. Fukushima. Collecting Evaluative Expressions for Opinion Extraction. In IJCNLP, pages 596–605, 2004.

22. R. R. Korfhage. To see, or not to see – is That the query? In SIGIR ’91: Proceedings of the 14th annual international ACM SIGIR conference on Research and development in information retrieval, pages 134–141. ACM Press, 1991.

23. K. Lagus, T. Honkela, S. Kaski, and T. Kohonen. Self-organizing maps of document collections: A new approach to interactive exploration. In E. Simoudis, J. Han, and U. Fayyad, editors, Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, pages 238–243. AAAI Press, 1996.

24. B. Liu, M. Hu, and J. Cheng. Opinion observer: analyzing and comparing opinions on the Web. In WWW ’05: Proceedings of the 14th international conference on World Wide Web, pages 342–351. ACM, 2005.

25. B. Pang, L. Lee, and S. Vaithyanathan. Thumbs up?: sentiment classification using machine learning techniques. In EMNLP ’02: Proceedings of the ACL-02 conference on Empirical methods in natural language processing, pages 79–86. Association for Computational Linguistics, 2002.

26. A.-M. Popescu and O. Etzioni. Extracting product features and opinions from reviews. In HLT ’05: Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing, pages 339–346. Association for Computational Linguistics, 2005.

27. A. Spoerri. InfoCrystal: a visual tool for information retrieval & management. In CIKM ’93: Proceedings of the second international conference on Information and knowledge management, pages 11–20. ACM, 1993.

28. R. Swan and D. Jensen. TimeMines: Constructing Timelines with Statistical Models of Word Usage, 2000.

29. B. Wang, B. Spencer, C. X. Ling, and H. Zhang. Semi-supervised Self-training for Sentence Subjectivity Classification, pages 344–355. Lecture Notes in Computer Science. Springer Berlin / Heidelberg, 2008.

30. J. A. Wise, J. J. Thomas, K. Pennock, D. Lantrip, M. Pottier, A. Schur, and V. Crow. Visualizing the non-visual: spatial analysis and interaction with information from text documents. In INFOVIS ’95: Proceedings of the 1995 IEEE Symposium on Information Visualization, pages 51–58, 1995.