=Paper=
{{Paper
|id=None
|storemode=property
|title=Enhanced Visualization for Web-Based Summaries
|pdfUrl=https://ceur-ws.org/Vol-880/VLDS-p40-Wenerstrom.pdf
|volume=Vol-880
|dblpUrl=https://dblp.org/rec/conf/vlds/WenerstromK11
}}
==Enhanced Visualization for Web-Based Summaries==
Brent Wenerstrom
Computer Eng. and Computer Science Dept.
Duthie Center for Engineering
Louisville, Kentucky 40292
brent.wenerstrom@louisville.edu

Mehmed Kantardzic
Computer Eng. and Computer Science Dept.
Duthie Center for Engineering
Louisville, Kentucky 40292
mmkant01@louisville.edu

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. This paper was presented at Very Large Data Search (VLDS) 2011. Copyright 2011.

ABSTRACT
For each search result presented by a search engine, a user has a choice to click through for more information or to skip the result. We aim to improve the accuracy of this click process by introducing a color-coding scheme built upon our improved summary text selection approach called ReClose. Color-coding adds an additional level of context to the text without requiring additional screen space. Our results showed an improvement in click precision from 66% when using Google summaries to 80% when using color-coded ReClose summaries. Improvements in user click precision will lead to better user experiences, more efficient finding of search results and higher confidence levels in search engine usage.

1. INTRODUCTION
Search engine usage has become a part of everyday life for internet users. Every time a search is conducted on Google or Bing a list of search results is presented to the user. One of the major challenges that users face as they search for that needle of information in the Internet haystack is deciding which of the search results presented are relevant to their search needs and which are not. When conducting searches for facts and information the choices are not always obvious.

Each search result is composed of a title, a short text summary and an abbreviated URL. The title usually is revealing about the overall message of a web page. However, it is written by the web content creator and may be a slogan of a company or an advertising pitch, which can be misleading. The URL can be very helpful when one is familiar with the host contained in the URL, but many URLs encountered are not familiar to us.

The text summary is extracted from three possible locations [9, 4, 13]. 1) Spans of text may be taken directly from the content of a web page. 2) It may come from the HTML meta description. The meta description is embedded in the HTML of a web page. It is not displayed to users visiting a web site, but is usually a general description of a web page or web site hand written by the content creator. 3) Lastly the text could come from the Open Directory Project (http://www.dmoz.org). The Open Directory Project is a community built directory of websites with a number of short, human-written website summaries.

When search results are presented to users, the user has the task of deciding which results are relevant to their search and which are not. Within information science it has been found that as many as 80 factors contribute to the decision of a judge deciding which documents are relevant to a particular search [10]. Users typically make this decision in a matter of seconds. When a user decides to click on a search result there are two possible outcomes that depend on a user’s expectations for that web page: 1) the user’s expectations were not met leading to disappointment or 2) the user’s expectations were met or exceeded resulting in satisfaction. Users may incorrectly skip relevant content missing out on potentially important information, but it is the feeling of disappointment (possibility 1) that will most negatively affect a search experience. We aim to improve the user’s accuracy in click decisions for the purpose of decreasing occurrences of disappointment.

Figure 1: A top 10 search result for the query closeness centrality on Google (5/11/2011).

As an example of the kinds of disappointment that may be realized consider the search result for the query closeness centrality pictured in Figure 1. Closeness centrality is a graph theory measure used for ordering nodes. The search result shown in Figure 1 has a title of “Social Network Analysis”. This page is dedicated to the analysis of social networks. Closeness centrality, as is shown in the summary, is clearly mentioned. One also finds an example description of closeness centrality in a social network. One may expect that this page contains a lengthy description of closeness centrality followed by this example. However, clicking through to the result page leads to Figure 2. The web page does discuss social network analysis as would be expected by the title, but there is only a single paragraph on closeness centrality. This single paragraph only describes a brief example barely longer than the text summary given by the search result. This web page did not meet the previously detailed expectations and would lead to disappointment on the part
of the searcher.

Figure 2: Web page at http://www.orgnet.com/sna.html (5/11/2011).

The user in the previous search example would be aided by the two main features of color-coded ReClose summaries. First, keywords are highlighted with color depth to provide global context rather than just the local context of one or two sentences surrounding a keyword. This “global context” refers to the extent of discussion on a web page containing the query topic. In the previous example, the user would have been aware before clicking that there were very few occurrences of the terms “closeness” and “centrality” by visual clues of color enhanced query keyword highlighting.

Secondly, major departures from the main topics of a web search are flagged. If the main subject of a web page is different from the intent of the search user, then a topic term is shown in red. This warns the user that the keywords may be peripheral to the main subject of the web page. Both color depth and topic word flagging are shown in this paper to effectively improve user click precision and decrease user disappointment. This in turn will improve the efficiency of the user and lead to better user experiences with the search engine.

2. RELATED WORKS
The highlighting of keywords has been used in a number of settings where users scan documents or lists of documents. Highlighting attracts the user’s attention to these keywords using bolding, reverse video or coloring the background of the text. In each case it has been shown to be useful to the scanning and examination of documents and document lists [8].

A number of useful approaches exist for highlighting keywords. Baudisch et al. [1] compress highlighted documents using Fishnet to a single screen for visual search. Byrd [2] proposed the use of different colors for each keyword within a single document, which also was used to designate location of keywords on the scrollbar by color.

Veerasamy and Belkin [11] proposed a table of bar charts to show term importance visually. Each row designated a single document, each column represented a word. The words selected included both query terms and terms used for relevance feedback. Graham [5] presented Reader’s Helper that highlighted keywords both within a single document and document lists. Each of the keywords was given a score with a matching bar showing the strength of that score visually.

Kaugars [7] used thumbnails and zoomed views to show keywords in context for a number of documents. Initially all search results are displayed as web page thumbnails, with keyword locations highlighted. A user may zoom to a level where keywords are shown in context and other paragraphs are compressed. Users may zoom in again to view the full, scrollable contents of a document.

Hemmje et al. [6] presented Lyberworld, which displayed documents in a three dimensional sphere with keywords shown at the edge of the sphere. Documents were presented closest to the keywords contained in those documents.

Keyword highlighting has improved information retrieval result scanning for more than 25 years [8]. Highlighting has proved useful in several interfaces developed since that time [1, 2]. However, no other research to the best of our knowledge has proposed the use of color depth or warning colors within summary text to provide additional context.

3. COLOR-CODED RECLOSE SUMMARIES
The goal of color-coded ReClose summaries is to increase the accuracy (precision) with which users click on search results to find relevant documents. Increasing accuracy will in turn lead to fewer disappointments and a better user experience. Color-coded ReClose summaries aim to improve upon current search result summaries using three main parts. First, we build upon our previous work on a text summary generation approach called ReClose [12]. Second, we highlight query keywords using variable shades of blue to show the depth of usage of those query keywords on a web page. Third, we display in red terms central to the web page’s topic which potentially differ from the topic of the keywords searched for.

3.1 ReClose
The ReClose approach [12] combines two sentence rankings into a single summary with two parts. It combines the benefits of query-biased and query-independent summaries. Query-biased summaries show keywords in context focusing the summaries on content most relevant to search. Query-independent summaries provide an overview of a single document.

Query-independent summarization is achieved using closeness centrality [3] of graph theory to rank sentences as representative of the whole document. Closeness centrality ranks the centrality of nodes in a graph with the highest rank going to the node with the smallest average distance to all other nodes. Documents are converted to graphs by turning each sentence into a node, then comparing each sentence to each other sentence using word overlap.

The second part of the ReClose approach involves learning from the summary generation techniques of the top ranking search engines, namely Google, Yahoo and Bing. To improve upon the query-biased summaries of current search engines, we learn from the summaries generated by all three top search engines. We generated training data by observing which sentences were chosen by each of these search engines. We trained a linear regression model to score sentences to match the sentence selection of Google, Yahoo and Bing. After training, a new document is split into sentences and each sentence is ranked by the linear regression model. The top ranking sentences are chosen to represent the query-biased portion.

In this way we now have a two part summary taking advantage of both query-biased and query-independent approaches to summary generation. Each portion of the summary is labeled so that users of the summaries are aware of the different intentions with each of the two text spans.
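
The query-independent ranking of Section 3.1 can be illustrated with a small sketch. It is not the ReClose implementation: the text does not specify how word overlap becomes graph distance, so the sketch assumes an unweighted graph in which two sentences are linked when they share at least `min_overlap` words, measures distance in hops, and scales the score by the fraction of reachable sentences so that small disconnected groups are not over-ranked. The names `tokenize`, `build_graph`, `closeness` and `rank_sentences` are hypothetical.

```python
import re
from collections import deque


def tokenize(sentence):
    """Lowercased word set; a stand-in for whatever preprocessing ReClose uses."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def build_graph(sentences, min_overlap=1):
    """One node per sentence; link two sentences sharing >= min_overlap words (assumed rule)."""
    words = [tokenize(s) for s in sentences]
    graph = {i: set() for i in range(len(sentences))}
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            if len(words[i] & words[j]) >= min_overlap:
                graph[i].add(j)
                graph[j].add(i)
    return graph


def closeness(graph, start):
    """Inverse average hop distance to reachable nodes, scaled by the reachable fraction."""
    dist = {start: 0}
    queue = deque([start])
    while queue:  # breadth-first search gives shortest hop counts
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    reachable = len(dist) - 1
    total = sum(dist.values())
    if reachable == 0 or len(graph) <= 1:
        return 0.0
    return (reachable / (len(graph) - 1)) * (reachable / total)


def rank_sentences(sentences, min_overlap=1):
    """Sentence indices ordered from most to least central."""
    graph = build_graph(sentences, min_overlap)
    scores = {i: closeness(graph, i) for i in graph}
    return sorted(graph, key=lambda i: (-scores[i], i))
```

For instance, `rank_sentences(["alpha beta gamma", "alpha one", "beta two", "gamma three"])` places the first sentence on top, because it is one hop from every other sentence while the others are two hops from each other.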
3.2 Color-Coded Keywords
We color-code keywords to provide additional context about the usage of keywords. The query-biased summaries of, say, Google or Bing will generally provide one or two text spans that show one or two usages of the keywords searched. In this way the context on a scale of, say, plus or minus ten words from the keywords is shown. Our color-coding of the keywords adds depth to each keyword just as colors can provide terrain depth on a topographical map. Many topographical maps will provide a key that shows the elevation range of the map and provide different colors for each subdivision of elevation. This “color-coding” provides users of these maps a more intuitive view than simply a set of contour lines to understand depth. Our depth refers to the frequency of query keywords on a web page. This gives a user a greater appreciation for how long discussions involving the keywords may be compared to other search results.

Figure 3: Process of color-coding query keywords.

Figure 4: Process of flagging terms.
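
A frequency-to-color mapping of this kind can be sketched in a few lines. The three anchor colors and their frequencies are the ones listed in Table 1; the sketch assumes that shades between anchors are obtained by linear interpolation of each RGB channel and that frequencies above 63 are clamped to the darkest shade. The names `ANCHORS`, `keyword_color` and `to_hex` are hypothetical.

```python
# Anchor points of the scale as (frequency, (R, G, B)), copied from Table 1.
ANCHORS = [
    (0, (0, 191, 255)),   # deep sky blue
    (30, (16, 52, 166)),  # Egyptian blue
    (63, (0, 26, 87)),    # Duke blue
]


def keyword_color(frequency):
    """Return an (R, G, B) shade for a keyword frequency, clamped to 0..63."""
    f = max(ANCHORS[0][0], min(frequency, ANCHORS[-1][0]))
    for (f_lo, c_lo), (f_hi, c_hi) in zip(ANCHORS, ANCHORS[1:]):
        if f_lo <= f <= f_hi:
            t = (f - f_lo) / (f_hi - f_lo)  # position between the two anchors
            return tuple(round(lo + t * (hi - lo)) for lo, hi in zip(c_lo, c_hi))
    raise ValueError("frequency outside the anchor range")


def to_hex(rgb):
    """Format an (R, G, B) tuple as a CSS-style hex color."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)
```

A keyword seen 63 or more times gets `to_hex(keyword_color(63))`, the darkest shade, while a keyword absent from the page body gets the lightest one.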
The key used in our surveys is shown in the “Select Color” step of Figure 3. We count the frequency of each keyword on a web page after the removal of stop words and use of Porter stemming. Then for each possible frequency between zero and 63 a different shade of blue is used. (A keyword may be contained in a summary and not on a web page if it is contained in the meta description but not the web page’s content.) A diagram of color-coding query keywords is shown in Figure 3. Now summaries of web pages that talk at great length about, say, “canines” will be distinguishable from a web page that has very little text which mentions “canines”.

Table 1: Colors used to create the color scale.
Color Name | Frequency | R | G | B
Duke blue | 63 | 0 | 26 | 87
Egyptian blue | 30 | 16 | 52 | 166
deep sky blue | 0 | 0 | 191 | 255

The exact colors used are in Table 1. We chose to use a light blue (deep sky blue) for the smallest frequency value of zero. Then to make the range between 0 and 30 more pronounced we chose an intermediate, but fairly dark blue (Egyptian blue) at a frequency of 30. A dark blue (Duke blue) was used for a frequency of 63+ which was still distinguishable from regular text in black. To calculate the RGB values for frequencies in between these specific values, one divides the difference in color values between two neighboring rows of Table 1 by the number of frequencies that separate them, which gives the change in color per unit of frequency.

It is unlikely that most users will be able to know exactly what color represents which frequency, but it will be obvious which summaries contain more frequent keywords. For example in the summary in Figure 3 the keyword “database” is more frequent in the document than the keyword “building”. It will also be obvious which end of the scale each keyword belongs to, whether the tail end 0-20 or the top end of 60+, which is where the real value lies.

3.3 Flagged Words
The goal of the flagging module is to visually differentiate web pages in which the search keywords are the main topic from those web pages where the search keywords are peripheral to the main topic of the page.

We assume that the most frequent term(s) in a document are central to the main topic of a document. We are not concerned with presenting to the user the exact topic of a document, but instead are intent upon finding the departures of document topics from the searched topic. Generally only a single term is considered for flagging to limit the information overload of the user. A single term should allow a user to discern the potential topic of a document in addition to the summary text.

We have designed an algorithm to determine if we should flag any terms within a document summary. Often due to the nature of search the most frequent term in a document is one of the keywords. These terms should not be flagged. Additionally, many terms belong to the same topic as the query keywords and should not be flagged. Our algorithm does not flag terms highly related to the queried topic. The steps in our algorithm are diagrammed in Figure 4 and are outlined below:
1. Determine the most frequent term in a document.
2. Obtain a count of the top ranking documents also including this top term.
3. Threshold the percentage of documents containing the top term.

The algorithm begins by first determining the most frequent term in a document (step 1). This involves counting term usage within a document after the removal of stop words.
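
The three steps listed in this section can be sketched as follows. This is an illustration rather than the authors' code: the stop-word list is a tiny placeholder, no stemming is applied, and a term is treated as flaggable when the share of other results containing it falls below the threshold (0.60 is the value reported in this section). The names `STOP_WORDS`, `terms`, `top_term` and `term_to_flag` are hypothetical.

```python
import re
from collections import Counter

# Placeholder stop-word list; a real system would use a much fuller one.
STOP_WORDS = {"a", "an", "the", "to", "is", "of", "and", "in", "for", "on", "with", "it"}


def terms(text):
    """Lowercased words of a document with stop words removed."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS]


def top_term(text):
    """Step 1: the most frequent non-stop-word term, or None for an empty document."""
    counts = Counter(terms(text))
    return counts.most_common(1)[0][0] if counts else None


def term_to_flag(document, other_documents, threshold=0.60):
    """Steps 2 and 3: return the top term if too few other results contain it, else None."""
    term = top_term(document)
    if term is None or not other_documents:
        return None
    containing = sum(1 for other in other_documents if term in set(terms(other)))
    share = containing / len(other_documents)
    return term if share < threshold else None
```

A term shared by most of the other results, such as a query keyword, yields `None`, so the sketch needs no separate check against the query keywords.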
Once we have determined the most frequent term in a document, we then consider all other top ranking documents returned for the search (step 2). In our case we used the top 28 documents (not including the current document), since this is the maximum number of documents returned through Google’s Web Search API (http://code.google.com/apis/websearch/).

The percentage of top ranking documents for the current search containing the most frequent term is then thresholded (step 3). We used a threshold of 60%. Terms that occur in more than half of the top documents for a search generally are highly related to the search terms. As an example consider the terms by percentage for the query algorithms. Terms at or above the 60% threshold include: “algorithms” at 100%, “computer” at 80% and “number” at 60% which are all related to algorithms. Examples of terms below the threshold are “privacy”, “course”, “heap” and “2007” with only “heap” being a term associated with algorithms. Terms found in 60% of documents are both rare and highly related.

Terms that do not meet the threshold will be displayed in the summary colored red. For example see the summary in Figure 4 where the term “JDBC” is flagged. JDBC refers to one method in Java for connecting to databases. It is distantly related to the query building a database, but clearly shows that this particular document is less focused on the building of the database, and more focused on Java related issues.

After we have determined that a term should be flagged for a particular summary, we must ensure that the flagged term is included in the summary. To accomplish this we filter the query-independent sentence ranking to only include sentences including the flagged terms. This ensures that the flagged term will appear in at least one sentence included in the summary.

4. EXPERIMENTAL RESULTS
We hypothesize that with color-coded ReClose generated summaries users will have more accurate expectations of the web pages summarized. To test this we created a survey that allows us to compare the accuracy of user expectations based on summaries. We mainly compare color-coded ReClose summaries against Google summaries. We additionally compare ReClose summaries with and without color-coding to ensure that the color-coding made a difference, and that text selection alone was not the main cause for improvement.

4.1 Survey Participants and Survey Design
For our survey we recruited 21 volunteers among undergraduate and graduate students in the Computer Engineering and Computer Science department at the University of Louisville. Surveys were conducted exclusively online.

The summary analysis was broken down into two parts and repeated for each of the three summary techniques under comparison. First a user would be shown 5 summaries for a randomly selected query. For each summary a user would mark if they would click on that summary. Then they would mark the amount of relevant content expected. The choices available were “None”, “Sentences”, “Paragraphs”, “Pages” or “Book”. Rather than just obtaining which results a user would click on, we obtain a finer grained understanding of the process through how much relevant content a user expected.

Second, users were provided links to each destination page and viewed these pages one at a time. A user marked down the actual amount of relevant content using the same options presented for expectations. In this way rather than finding out if a user believes a page is relevant or not to their search, we can also monitor lesser disappointments, such as a user expecting to find pages and pages of relevant content but in actuality only finding a couple of sentences. In this case the document is still relevant, but the user is likely not satisfied with the results.

Survey participants were shown 5 summaries per summary type for a total of 15 summaries.

4.2 Summary Data
Survey participants were randomly assigned three queries out of a pool of 15 queries. These queries were chapter titles and project titles from an introductory course in computer science so that all query topics were familiar to the survey participants. Some example queries were logic gates and creating a web page.

For each of the 15 queries, 28 search results were obtained from Google. We downloaded each linked web page in the search results resulting in 400 successfully downloaded and parsed web pages out of 420 possible. We only used 5 search results per query. To decide which search results to use, we randomly selected web pages from two pools. The first pool was likely to have search results with flagged summaries because when the frequencies of terms in a document were ranked the query keywords had a low rank. The second pool contained the top 5 search results as ranked by Google.

After determining the pool of search results most likely to be flagged and the top Google search results, we randomly select 2-4 results from the pool of results likely to be flagged. Then the remaining results are taken starting with the top ranked Google result from the second pool.

Figure 5: Distribution of expected relevant content divided by clicked and skipped documents.

4.3 Results and Discussion
First we verify the relationship between user click behavior and the relevance markings. Figure 5 shows the distribution of expected relevance for search results clicked and skipped. This figure shows that no user would click on a result if they expected no relevant content. If a user expected only a sentence or two of relevant data, users were unlikely to click: they skipped the result 72% of the time (64/89). A natural division emerges from the expectation results. Users expecting “Sentences” or “None”
would skip the result 82% (116/141) of the time, leading us to call this section “irrelevant”. The other half of the relevance spectrum we labeled “relevant”. Users clicked through 84% (146/174) of the time when expecting “Paragraphs” or more of relevant information. Performing a χ² test on the count data revealed by this dividing line resulted in a χ² value of 134.8 and a p-value < 0.001, clearly showing a significant difference between these two groups. Click through and expectation have a lot in common, but expectations provide more insight into the mental process of the search users.

The expectations of survey participants were fairly inaccurate. Only 34% (108/315) of expectations matched exactly the actual relevant content of web pages. Another 34% (108/315) of expectations resulted in actual content being opposite of expectations in terms of the relevant/irrelevant split mentioned earlier. For example there were 16 occurrences where a survey participant marked a relevant expectation of “Paragraphs” or higher only to find no relevant content.

In our survey color-coded ReClose summaries achieved a much lower percentage of disappointment at 23% than Google summaries achieved at 34% as shown in Table 2. Disappointment was recorded when the actual relevant content was lower than the participant’s expectations. When we conduct a χ² test on the count data comparing Google and color-coded ReClose we obtain a χ² value of 2.8 and a p-value of 0.09. This p-value does not fall below the usual threshold value of 0.05. However, there still remains an obvious difference between the results of Google summaries and color-coded ReClose summaries that we expect would become more pronounced with additional survey participants.

Table 2: Disappointment counts and percentages for three summary techniques.
Summary Source | Disappointment | Satisfied or Surprised | Total Summaries
Google | 36 (34%) | 69 (66%) | 105
Color-Coded | 24 (23%) | 81 (77%) | 105

We now look at the precision with which users chose to click on a result. Considering that a majority of users did not click when expectations were a couple sentences or less, we label all web page views with a few sentences or less of relevant content as “irrelevant.” Web page views for which the survey participant marked more than a few sentences’ worth of relevant content are labeled as “relevant.” Dividing clicks into relevant and irrelevant allows us to calculate click precision. We define click precision as the percentage of summary views with clicks that led to relevant web pages. Click recall is the percentage of relevant documents that were clicked. The results of these calculations for each summary technique can be seen in Table 3.

Table 3: Click precision and recall comparison.
Approach | Click Precision | Click Recall
Google | 66% (38/58) | 60% (38/63)
ReClose | 75% (39/52) | 64% (39/61)
Color-Coded | 80% (49/61) | 70% (49/70)

Table 3 shows that users clicked more often (61 times) and had a higher click precision (80%) when using color-coded ReClose summaries than either Google (66%) or ReClose summaries highlighting with bold (75%). When users used Google summaries they clicked through to relevant web pages only about 2/3 of the time that they clicked. With more precise clicks, users using color-coded ReClose summaries also clicked on more of the relevant content having a click recall score of 70%. Individuals using Google and bolded ReClose summaries skipped more relevant content having recall scores of 60% and 64% respectively.

In practice a higher click precision will be more noticeable to users. Users are aware of clicks to irrelevant content, experiencing disappointment. However, there is no form of feedback for click recall. Users are not aware that they have skipped over a relevant document. One of the main objectives of color-coded ReClose summaries was to improve the click precision for users. From the numbers in Table 3 it is clear that color-coded ReClose summaries improve the precision of users, both over Google summaries and ReClose summaries without color-coding. This leads to fewer disappointments in practice.

4.4 Color-Coded Results and Discussion
We now consider the effectiveness of the two color-coding features: color-coded keywords and flagged words. In this section comparisons are only made between bolded and color-coded ReClose summaries. Both outperformed Google summaries, but here the focus is just on the added color-coding features. We first consider the color-coded keywords. The scale we used allowed for usage count differentiation from 0-63. Summaries were not evenly distributed across this range. Nearly half (49% or 37/75) of the summaries used had at most a keyword with 0-9 usages on the web page summarized. We would expect that users would have low expectations for summaries that at most contained keywords on the low end of the scale. Looking at the results, there was no perceived change in behavior between summaries containing low count query keywords (0-9) and medium count query keywords (10-59). Only in the case of high count query keywords (60+) was there a noticeable change in behavior.

There were 13 summaries (17%) with at least one query keyword with a usage count of 60+. For these 13 summaries, participants found the actual relevant content to be high. For example no matter the summary type, more than 50% of views led to actual relevant content at the “Pages” level. This was rarely expected when using bolded ReClose summaries, see Table 4. Bolded ReClose summaries led to “Pages” level expectations in 23% of page views. The color-coded ReClose summaries more often led to higher expectations in line with the actual content. In 44% of views, color-coded users identified an expectation in the “Pages” range. Color-coded ReClose summaries also led to the highest actual relevant content at 67%. In the case of high usage count keywords, color-coded ReClose summaries led to justifiably higher expectations.

Table 4: Expected and actual relevant content for web pages with a query keyword count of 60+.
Summary Type | Measure | ≤Para | Pages | Book
ReClose | Expected | 10 (77%) | 3 (23%) | 0 (0%)
ReClose | Actual | 4 (31%) | 8 (62%) | 1 (8%)
Color-Coded | Expected | 10 (56%) | 8 (44%) | 0 (0%)
Color-Coded | Actual | 5 (28%) | 12 (67%) | 1 (6%)

Next we compare the effect that flagging had on expectations which can be seen in Table 5 in the column marked “Expected Relevant”. In this table documents were broken into two groups, documents that had terms flagged by color-coded ReClose (rows marked “Flaggable”) and documents that did not (rows marked “Not Flaggable”). When color-coded ReClose summaries had flagged terms, the expectations were much lower (29% expected to be relevant) than those same summaries without color-coding (40% expected to be relevant). A similar pattern was found for color-coded summaries without flagged terms having higher expectations. This shows that the flagging of terms directly affected the expectations of the user.

Table 5: Expected and actual relevant content for documents that would (Flaggable) and would not (Not Flaggable) have summaries with flagged terms.
Summary Type | Group | Expected Relevant | Actual Relevant | Click Prec.
ReClose | Flaggable | 21/53 (40%) | 24/53 (45%) | 70%
ReClose | Not Flaggable | 30/52 (58%) | 37/52 (71%) | 78%
Color-Coded | Flaggable | 15/52 (29%) | 25/52 (48%) | 57%
Color-Coded | Not Flaggable | 44/53 (83%) | 45/53 (85%) | 87%

There is a much lower percentage of documents found to be relevant that had flagged terms. Even in the case where flagged terms were not shown to users (bolded ReClose summaries), 45% of documents that could have been flagged were found to be relevant compared to 71% of documents that would not have had flagged terms. What is interesting is how flagging affects the click precision of users. Those that saw the flagged terms had a click precision of 57% on flagged summaries compared to 70% for those that did not see the flagging for these same summaries. However, users expected more and were more precise when color-coding was available and no flagged terms appeared in a summary, achieving a click precision of 87% compared to 78% without color-coding. Overall with far fewer clicks among flagged summaries, the overall click precision was higher for the color-coded version of ReClose (see Table 3).

5. CONCLUSION
In this paper we outline color-coded ReClose summaries. Web-based summaries were visually enhanced using two techniques. The first technique was to provide global context for the query keywords, by using varying color to highlight these keywords. The second technique highlighted in red terms that topically differed from the topics of a query. This provided a warning mechanism to help users avoid clicking through to results less likely to be relevant. We hypothesized that color-coded ReClose summaries would increase the accuracy of user click decisions, thus reducing disappointments and improving user experiences.

Survey results showed that color-coded ReClose summaries (80%) led to an improvement in user click precision over Google summaries (66%). This in turn led to color-coded ReClose summaries resulting in fewer disappointments (24) compared to Google summaries (36). Improved precision and decreased disappointment will result in a better user experience.

A closer look at the survey results showed that both highlighting techniques of color-coding keywords and flagging divergent topic terms were effective. Color-coding summaries is an effective way to enhance the summary information to users without increasing the screen space. We plan on making further improvements to the selection algorithm for flagged terms. We also plan to enhance summaries with the use of multimedia.

6. REFERENCES
[1] P. Baudisch, B. Lee, and L. Hanna. Fishnet, a fisheye web browser with search term popouts: a comparative evaluation with overview and linear view. In Proceedings of the working conference on Advanced visual interfaces, AVI ’04, pages 133–140, New York, NY, USA, 2004. ACM.
[2] D. Byrd. A scrollbar-based visualization for document navigation. In Proceedings of the fourth ACM conference on Digital libraries, DL ’99, pages 122–129, New York, NY, USA, 1999. ACM.
[3] L. C. Freeman. Centrality in social networks conceptual clarification. Social Networks, 1(3):215–239, 1978-1979.
[4] Google. Changing your site’s title and description in search results. http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=35264.
[5] J. Graham. The reader’s helper: a personalized document reading environment. In Proceedings of the SIGCHI conference on Human factors in computing systems: the CHI is the limit, CHI ’99, pages 481–488, New York, NY, USA, 1999. ACM.
[6] M. Hemmje, C. Kunkel, and A. Willett. Lyberworld - a visualization user interface supporting fulltext retrieval. In Proceedings of the 17th Annual International Conference on Research and Development in Information Retrieval, ACM SIGIR, pages 249–259. Berlin: Springer, 1994.
[7] K. Kaugars. Integrated multi scale text retrieval visualization. In CHI 98 conference summary on Human factors in computing systems, CHI ’98, pages 307–308, New York, NY, USA, 1998. ACM.
[8] G. Marchionini. Information seeking in electronic environments. Cambridge University Press, 1995.
[9] Microsoft. Anatomy of a bing caption. http://www.bing.com/community/site_blogs/b/webmaster/archive/2010/10/25/anatomy-of-a-bing-caption.aspx.
[10] L. Schamber. Relevance and information behavior. Annual review of information science and technology (ARIST), 29:3–48, 1994.
[11] A. Veerasamy and N. J. Belkin. Evaluation of a tool for visualization of information retrieval results. In Proceedings of the 19th Annual International Conference on Research and Development in Information Retrieval, SIGIR, pages 85–92, 1996.
[12] B. Wenerstrom and M. Kantardzic. ReClose: Web page summarization combining summary techniques. 2011. Accepted for publication in the International Journal of Web Information Systems on 4/27/2011.
[13] Yahoo! How to change a page title or description in yahoo! search results. http://help.yahoo.com/l/us/yahoo/search/indexing/indexing-11.html.
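
The two χ² values reported in Section 4.3 can be checked from the published counts with a short sketch. The paper does not say which variant of the test was used; the sketch assumes a 2x2 test with one degree of freedom and Yates' continuity correction, because that variant gives values that round to the reported 134.8 and 2.8, whereas the uncorrected statistic would be roughly 137 and 3.4. The function name `chi_square_2x2` is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d, yates=True):
    """Chi-square statistic and p-value (1 degree of freedom) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)  # Yates' continuity correction (assumed here)
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    statistic = n * diff ** 2 / denominator
    # With one degree of freedom the upper tail is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Expectation vs. click behavior: rows are irrelevant/relevant expectations,
# columns are skipped/clicked (116/141 skipped, 146/174 clicked).
clicks_stat, clicks_p = chi_square_2x2(116, 25, 28, 146)

# Disappointment counts from Table 2: Google 36 of 105, Color-Coded 24 of 105.
disappointment_stat, disappointment_p = chi_square_2x2(36, 69, 24, 81)
```

Under these assumptions `clicks_stat` rounds to 134.8 and `disappointment_stat` to 2.8, with `disappointment_p` close to 0.09.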