=Paper= {{Paper |id=None |storemode=property |title=Enhanced Visualization for Web-Based Summaries |pdfUrl=https://ceur-ws.org/Vol-880/VLDS-p40-Wenerstrom.pdf |volume=Vol-880 |dblpUrl=https://dblp.org/rec/conf/vlds/WenerstromK11 }} ==Enhanced Visualization for Web-Based Summaries== https://ceur-ws.org/Vol-880/VLDS-p40-Wenerstrom.pdf
             Enhanced Visualization for Web-Based Summaries

Brent Wenerstrom
Computer Eng. and Computer Science Dept.
Duthie Center for Engineering
Louisville, Kentucky 40292
brent.wenerstrom@louisville.edu

Mehmed Kantardzic
Computer Eng. and Computer Science Dept.
Duthie Center for Engineering
Louisville, Kentucky 40292
mmkant01@louisville.edu


ABSTRACT
For each search result presented by a search engine, a user has a choice to click through for more information or to skip the result. We aim to improve the accuracy of this click process by introducing a color-coding scheme built upon our improved summary text selection approach called ReClose. Color-coding adds an additional level of context to the text without requiring additional screen space. Our results showed an improvement in click precision from 66% when using Google summaries to 80% when using color-coded ReClose summaries. Improvements in user click precision will lead to better user experiences, more efficient finding of search results and higher confidence levels in search engine usage.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. This paper was presented at Very Large Data Search (VLDS) 2011. Copyright 2011.

1. INTRODUCTION
Search engine usage has become a part of everyday life for internet users. Every time a search is conducted on Google or Bing, a list of search results is presented to the user. One of the major challenges that users face as they search for that needle of information in the Internet haystack is deciding which of the presented search results are relevant to their search needs and which are not. When conducting searches for facts and information, the choices are not always obvious.

Each search result is composed of a title, a short text summary and an abbreviated URL. The title usually reveals the overall message of a web page. However, it is written by the web content creator and may be a company slogan or an advertising pitch, which can be misleading. The URL can be very helpful when one is familiar with the host it contains, but many of the URLs we encounter are unfamiliar.

The text summary is extracted from three possible locations [9, 4, 13]. 1) Spans of text may be taken directly from the content of a web page. 2) It may come from the HTML meta description. The meta description is embedded in the HTML of a web page. It is not displayed to users visiting a web site, but is usually a general description of a web page or web site handwritten by the content creator. 3) Lastly, the text could come from the Open Directory Project (http://www.dmoz.org). The Open Directory Project is a community-built directory of websites with a number of short, human-written website summaries.

When search results are presented, the user has the task of deciding which results are relevant to their search and which are not. Within information science it has been found that as many as 80 factors contribute to a judge’s decision about which documents are relevant to a particular search [10]. Users typically make this decision in a matter of seconds. When a user decides to click on a search result, there are two possible outcomes that depend on the user’s expectations for that web page: 1) the user’s expectations were not met, leading to disappointment, or 2) the user’s expectations were met or exceeded, resulting in satisfaction.

Users may incorrectly skip relevant content, missing out on potentially important information, but it is the feeling of disappointment (possibility 1) that will most negatively affect a search experience. We aim to improve the user’s accuracy in click decisions in order to decrease occurrences of disappointment.

Figure 1: A top 10 search result for the query closeness centrality on Google (5/11/2011).

As an example of the kind of disappointment that may occur, consider the search result for the query closeness centrality pictured in Figure 1. Closeness centrality is a graph theory measure used for ordering nodes. The search result shown in Figure 1 has the title “Social Network Analysis”. This page is dedicated to the analysis of social networks. As the summary shows, closeness centrality is clearly mentioned. One also finds an example description of closeness centrality in a social network. One may expect that this page contains a lengthy description of closeness centrality followed by this example. However, clicking through to the result leads to the page shown in Figure 2. The web page does discuss social network analysis, as the title suggests, but there is only a single paragraph on closeness centrality. This single paragraph only describes a brief example, barely longer than the text summary given by the search result. This web page did not meet the expectations detailed above and would lead to disappointment on the part of the searcher.

Figure 2: Web page at http://www.orgnet.com/sna.html (5/11/2011).

The user in the previous search example would be aided by the two main features of color-coded ReClose summaries. First, keywords are highlighted with color depth to provide global context rather than just the local context of one or two sentences surrounding a keyword. This “global context” refers to the extent of discussion of the query topic on a web page. In the previous example, the visual cues of color-enhanced keyword highlighting would have told the user before clicking that the terms “closeness” and “centrality” occur only a few times on the page.

Second, major departures from the main topic of a web search are flagged. If the main subject of a web page is different from the intent of the search user, then a topic term is shown in red. This warns the user that the keywords may be peripheral to the main subject of the web page. Both color depth and topic word flagging are shown in this paper to effectively improve user click precision and decrease user disappointment. This in turn will improve the efficiency of the user and lead to better user experiences with the search engine.

2. RELATED WORKS
The highlighting of keywords has been used in a number of settings where users scan documents or lists of documents. Highlighting attracts the user’s attention to these keywords using bolding, reverse video or coloring the background of the text. In each case it has been shown to be useful for scanning and examining documents and document lists [8].

A number of useful approaches exist for highlighting keywords. Baudisch et al. [1] used Fishnet to compress highlighted documents to a single screen for visual search. Byrd [2] proposed the use of a different color for each keyword within a single document; the same colors were also used to mark the location of keywords on the scrollbar.

Veerasamy and Belkin [11] proposed a table of bar charts to show term importance visually. Each row designated a single document and each column represented a word. The words selected included both query terms and terms used for relevance feedback. Graham [5] presented Reader’s Helper, which highlighted keywords both within a single document and in document lists. Each keyword was given a score, with a matching bar showing the strength of that score visually.

Kaugars [7] used thumbnails and zoomed views to show keywords in context for a number of documents. Initially all search results are displayed as web page thumbnails, with keyword locations highlighted. A user may zoom to a level where keywords are shown in context and other paragraphs are compressed. Users may zoom in again to view the full, scrollable contents of a document.

Hemmje et al. [6] presented Lyberworld, which displayed documents in a three-dimensional sphere with keywords shown at the edge of the sphere. Each document was placed closest to the keywords it contained.

Keyword highlighting has improved the scanning of information retrieval results for more than 25 years [8]. Highlighting has proved useful in several interfaces developed since that time [1, 2]. However, to the best of our knowledge, no other research has proposed the use of color depth or warning colors within summary text to provide additional context.

3. COLOR-CODED RECLOSE SUMMARIES
The goal of color-coded ReClose summaries is to increase the accuracy (precision) with which users click on search results to find relevant documents. Increasing accuracy will in turn lead to fewer disappointments and a better user experience. Color-coded ReClose summaries aim to improve upon current search result summaries using three main parts. First, we build upon our previous work on a text summary generation approach called ReClose [12]. Second, we highlight query keywords using variable shades of blue to show the depth of usage of those keywords on a web page. Third, we display in red the terms central to the web page’s topic that potentially differ from the topic of the searched keywords.

3.1 ReClose
The ReClose approach [12] combines two sentence rankings into a single summary with two parts, combining the benefits of query-biased and query-independent summaries. Query-biased summaries show keywords in context, focusing the summaries on the content most relevant to the search. Query-independent summaries provide an overview of a single document.

Query-independent summarization is achieved using closeness centrality [3], a graph theory measure, to rank sentences by how representative they are of the whole document. Closeness centrality ranks the nodes of a graph, with the highest rank going to the node with the smallest average distance to all other nodes. Documents are converted to graphs by turning each sentence into a node, then comparing each sentence to every other sentence using word overlap.

The second part of the ReClose approach involves learning from the summary generation techniques of the top-ranking search engines, namely Google, Yahoo and Bing. To improve upon the query-biased summaries of current search engines, we learn from the summaries generated by all three. We generated training data by observing which sentences were chosen by each of these search engines, and trained a linear regression model to score sentences so as to match the sentence selection of Google, Yahoo and Bing. After training, a new document is split into sentences and each sentence is ranked by the linear regression model. The top-ranking sentences are chosen to represent the query-biased portion.

In this way we now have a two-part summary taking advantage of both query-biased and query-independent approaches to summary generation. Each portion of the summary is labeled so that users are aware of the different intent behind each of the two text spans.
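The query-independent half of ReClose ranks sentences by closeness centrality on a word-overlap graph. The Python sketch below illustrates that idea only; it is not the authors’ implementation. The stop-word list, the rule that two sentences are linked when they share at least one word, the use of unweighted hop distances, and the penalty of n hops for unreachable sentences are all illustrative assumptions. The query-biased half (the linear regression model trained on Google, Yahoo and Bing selections) is not sketched because its features are not described here.

```python
import re
from collections import deque

# Hypothetical stop-word list for illustration; the paper does not specify one.
STOP_WORDS = frozenset({"a", "an", "and", "are", "in", "is", "it", "of", "on", "the", "to", "with"})


def words(sentence):
    """Lowercased set of word tokens in a sentence, excluding stop words."""
    return {w for w in re.findall(r"[a-z0-9]+", sentence.lower()) if w not in STOP_WORDS}


def overlap_graph(sentences):
    """Adjacency sets: sentences i and j are linked if they share at least one word (assumed rule)."""
    bags = [words(s) for s in sentences]
    adj = [set() for _ in sentences]
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            if bags[i] & bags[j]:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def average_distance(adj, source):
    """Mean hop distance (via BFS) from `source` to every other sentence node.

    Unreachable nodes are charged len(adj) hops, an assumed convention that keeps
    sentences in small, disconnected components from looking central.
    """
    n = len(adj)
    if n < 2:
        return 0.0
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return sum(dist.get(v, n) for v in range(n) if v != source) / (n - 1)


def rank_by_closeness(sentences):
    """Sentence indices from most to least central (smallest mean distance first)."""
    adj = overlap_graph(sentences)
    avg = [average_distance(adj, i) for i in range(len(sentences))]
    return sorted(range(len(sentences)), key=lambda i: avg[i])


if __name__ == "__main__":
    doc = ["graph nodes distance", "graph centrality", "nodes ranking",
           "distance measure", "bananas yellow"]
    print(rank_by_closeness(doc))  # [0, 1, 2, 3, 4]
```

In the usage example the first sentence shares a word with three others, so it has the smallest average distance (2.0) and is ranked first, while the isolated last sentence is ranked last. The top few indices would form the query-independent portion of the summary.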


3.2 Color-Coded Keywords
We color-code keywords to provide additional context about their usage. The query-biased summaries of, say, Google or Bing generally provide one or two text spans showing one or two usages of the searched keywords, so the context shown is on the scale of roughly ten words on either side of a keyword. Our color-coding adds depth to each keyword, just as colors can convey terrain depth on a topographical map. Many topographical maps provide a key that shows the elevation range of the map and assign a different color to each subdivision of elevation. This “color-coding” gives map readers a more intuitive view of depth than a set of contour lines alone. Here, depth refers to the frequency of query keywords on a web page. This gives a user a better appreciation of how long the discussions involving the keywords may be compared to other search results.

[Figure 3 diagram: the query keywords (“building a database”) and the web page; keyword frequencies counted on the page drive a “Select Color” step that shades each keyword in the summary.]
Figure 3: Process of color-coding query keywords.

[Figure 4 diagram: term frequencies on the web page yield the top term; the percentage of other search results containing that term (10% in the example) is compared against a threshold; the flagged term, “JDBC”, is shown in the summary.]
Figure 4: Process of flagging terms.
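The frequency-to-shade mapping of this section can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the survey software: it uses the three anchor colors listed in Table 1, reads the interpolation rule given with Table 1 as linear interpolation between neighboring anchors, and clamps frequencies above 63 to Duke blue. Stemming is left pluggable through a hypothetical stem parameter (identity by default); a Porter stemmer such as NLTK’s PorterStemmer().stem could be passed in to match the counting the paper describes.

```python
import re
from collections import Counter

# Anchor shades from Table 1: (frequency, (R, G, B)).
ANCHORS = [
    (0, (0, 191, 255)),   # deep sky blue
    (30, (16, 52, 166)),  # Egyptian blue
    (63, (0, 26, 87)),    # Duke blue, also used for frequencies above 63
]


def keyword_frequencies(page_text, keywords, stop_words=frozenset(), stem=lambda w: w):
    """Count each query keyword on the page after stop-word removal and stemming.

    `stem` defaults to the identity function; pass a Porter stemmer to mirror the paper.
    """
    tokens = [stem(w) for w in re.findall(r"[a-z0-9]+", page_text.lower())
              if w not in stop_words]
    counts = Counter(tokens)
    return {k: counts[stem(k.lower())] for k in keywords}


def keyword_color(frequency):
    """Map a keyword frequency to an RGB triple by linear interpolation between anchors."""
    f = max(0, min(frequency, ANCHORS[-1][0]))  # clamp to the 0..63 scale
    for (f0, c0), (f1, c1) in zip(ANCHORS, ANCHORS[1:]):
        if f <= f1:
            t = (f - f0) / (f1 - f0)
            return tuple(round(a + t * (b - a)) for a, b in zip(c0, c1))
    return ANCHORS[-1][1]  # not reached after clamping; kept as a safe default


def to_hex(rgb):
    """Format an (R, G, B) triple as a hex color string, e.g. for an HTML summary."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)


if __name__ == "__main__":
    freqs = keyword_frequencies("Database design: a database of buildings",
                                ["database", "building"])
    for word, freq in freqs.items():
        print(word, freq, to_hex(keyword_color(freq)))
```

Placing the middle anchor at 30 rather than at the midpoint of the scale spends more of the color range on low frequencies, which is what makes the 0 to 30 range visibly distinct.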
The key used in our surveys is shown in the “Select Color” step of Figure 3. We count the frequency of each keyword on a web page after removing stop words and applying Porter stemming. A different shade of blue is then used for each possible frequency between zero and 63. (A keyword may appear in a summary but not on the web page if it is contained in the meta description but not in the page’s content.) The color-coding of query keywords is diagrammed in Figure 3. Summaries of web pages that talk at great length about, say, “canines” are now distinguishable from summaries of web pages with very little text mentioning “canines”.

Table 1: Colors used to create the color scale.

  Color Name       Frequency    R     G     B
  Duke blue           63        0    26    87
  Egyptian blue       30       16    52   166
  deep sky blue        0        0   191   255

The exact colors used are listed in Table 1. We chose a light blue (deep sky blue) for the smallest frequency value, zero. To make the range between 0 and 30 more pronounced, we chose an intermediate but fairly dark blue (Egyptian blue) at a frequency of 30. A dark blue (Duke blue), still distinguishable from regular black text, was used for frequencies of 63 and above. RGB values for frequencies between these anchor values are interpolated: the difference in color values between two neighboring anchors is divided by the number of frequencies between them.

It is unlikely that most users will know exactly which color represents which frequency, but it will be obvious which summaries contain more frequent keywords. For example, in the summary in Figure 3 the keyword “database” is more frequent in the document than the keyword “building”. It will also be obvious which end of the scale each keyword belongs to, whether the tail end (0 to 20) or the top end (60+), which is where the real value lies.

3.3 Flagged Words
The goal of the flagging module is to visually differentiate web pages in which the search keywords are the main topic from web pages in which the search keywords are peripheral to the main topic of the page.

We assume that the most frequent term(s) in a document are central to its main topic. We are not concerned with presenting the exact topic of a document to the user; instead we aim to detect departures of the document’s topic from the searched topic. Generally only a single term is considered for flagging, to limit information overload. A single term, together with the summary text, should allow a user to discern the potential topic of a document.

We have designed an algorithm to determine whether any terms within a document summary should be flagged. Due to the nature of search, the most frequent term in a document is often one of the keywords; such terms should not be flagged. Additionally, many terms belong to the same topic as the query keywords and should not be flagged either. Our algorithm therefore does not flag terms that are highly related to the queried topic. The steps in our algorithm are diagrammed in Figure 4 and outlined below:
1. Determine the most frequent term in a document.
2. Count the top-ranking documents that also include this top term.
3. Threshold the percentage of documents containing the top term.

The algorithm begins by determining the most frequent term in a document (step 1). This involves counting term usage within the document after the removal of stop words.
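The three steps above, together with the 60% threshold and the sentence filtering described later in this subsection, can be sketched in Python as follows. This is a minimal illustration, not the authors’ code: the stop-word list is hypothetical, there is no stemming, and other_results stands for the text of the other top-ranked results for the same query (the paper uses 28 results from Google). Because “number” at 60% is treated as related to algorithms, the sketch flags a term only when its share of results is strictly below the threshold.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the paper does not publish the one it used.
STOP_WORDS = frozenset({"a", "an", "and", "for", "in", "is", "of", "on", "the", "to"})


def terms(text):
    """Lowercased word tokens with stop words removed (no stemming in this sketch)."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS]


def top_term(document):
    """Step 1: the most frequent term; ties go to the term seen first."""
    counts = Counter(terms(document))
    return counts.most_common(1)[0][0] if counts else None


def flag_term(document, other_results, threshold=0.60):
    """Steps 2-3: flag the top term if fewer than `threshold` of the other results contain it."""
    term = top_term(document)
    if term is None or not other_results:
        return None
    containing = sum(1 for text in other_results if term in terms(text))
    share = containing / len(other_results)
    return term if share < threshold else None


def sentences_with_term(ranked_sentences, term):
    """Keep only query-independent sentences that mention the flagged term, in rank order."""
    return [s for s in ranked_sentences if term in terms(s)]


if __name__ == "__main__":
    page = "JDBC connects Java to a database. JDBC drivers. The JDBC URL."
    others = ["building a database", "database design", "java tips jdbc",
              "sql basics", "relational model"]
    flagged = flag_term(page, others)
    print(flagged)  # jdbc
    print(sentences_with_term(["Use JDBC here.", "Tables hold rows."], flagged))
```

In the usage example “jdbc” is the page’s most frequent term but appears in only one of five other results (20%), so it is flagged, and only the sentence mentioning it survives the filter. A query keyword that tops the page’s counts would normally appear in most other results and therefore pass the threshold unflagged.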
Once we have determined the most frequent term in a document, we then consider all other top-ranking documents returned for the search (step 2). In our case we used the top 28 documents (not including the current document), since this is the maximum number of documents returned through Google’s Web Search API (http://code.google.com/apis/websearch/).

The percentage of top-ranking documents for the current search that contain the most frequent term is then thresholded (step 3). We used a threshold of 60%. Terms that occur in more than half of the top documents for a search are generally highly related to the search terms. As an example, consider the terms by percentage for the query algorithms. Terms at or above the 60% threshold include “algorithms” at 100%, “computer” at 80% and “number” at 60%, all of which are related to algorithms. Examples of terms below the threshold are “privacy”, “course”, “heap” and “2007”, with only “heap” being a term associated with algorithms. Few terms are found in 60% or more of the documents, and those that are tend to be highly related to the query.

Terms that do not meet the threshold are displayed in red in the summary. For example, see the summary in Figure 4, where the term “JDBC” is flagged. JDBC refers to one method in Java for connecting to databases. It is distantly related to the query building a database, but it clearly shows that this particular document is less focused on building the database and more focused on Java-related issues.

After we have determined that a term should be flagged for a particular summary, we must ensure that the flagged term is included in the summary. To accomplish this, we filter the query-independent sentence ranking to only include sentences containing the flagged term. This ensures that the flagged term will appear in at least one sentence of the summary.

4. EXPERIMENTAL RESULTS
We hypothesize that color-coding ReClose-generated summaries will give users more accurate expectations of the web pages summarized. To test this, we created a survey that allows us to compare the accuracy of user expectations based on summaries. We mainly compare color-coded ReClose summaries against Google summaries. We additionally compare ReClose summaries with and without color-coding to ensure that the color-coding made a difference and that text selection alone was not the main cause of the improvement.

4.1 Survey Participants and Survey Design
For our survey we recruited 21 volunteers from among undergraduate and graduate students in the Computer Engineering and Computer Science department at the University of Louisville. Surveys were conducted exclusively online.

The summary analysis was broken down into two parts and repeated for each of the three summary techniques under comparison. First, a user was shown 5 summaries for a randomly selected query. For each summary, the user marked whether they would click on it, and then marked the amount of relevant content they expected. The choices available were “None”, “Sentences”, “Paragraphs”, “Pages” or “Book”. Rather than only recording which results a user would click on, we obtain a finer-grained understanding of the process by recording how much relevant content a user expected.

Second, users were provided links to each destination page and viewed these pages one at a time. A user marked down the actual amount of relevant content using the same options presented for expectations. In this way, rather than only finding out whether a user believes a page is relevant to their search, we can also monitor lesser disappointments, such as a user expecting to find pages and pages of relevant content but actually finding only a couple of sentences. In this case the document is still relevant, but the user is likely not satisfied with the results.

Survey participants were shown 5 summaries per summary type, for a total of 15 summaries.

4.2 Summary Data
Survey participants were randomly assigned three queries out of a pool of 15 queries. These queries were chapter titles and project titles from an introductory computer science course, so that all query topics were familiar to the survey participants. Some example queries were logic gates and creating a web page.

For each of the 15 queries, 28 search results were obtained from Google. We downloaded each linked web page in the search results, resulting in 400 successfully downloaded and parsed web pages out of 420 possible. We only used 5 search results per query. To decide which search results to use, we randomly selected web pages from two pools. The first pool contained search results likely to have flagged summaries, because when the terms in each of these documents were ranked by frequency, the query keywords had a low rank. The second pool contained the top 5 search results as ranked by Google.

After determining the pool of search results most likely to be flagged and the top Google search results, we randomly select 2-4 results from the pool of results likely to be flagged. The remaining results are then taken from the second pool, starting with the top-ranked Google result.

[Figure 5 bar chart: Count (0 to 80) against Relevance Expectations (None, Sent., Para., Pages, Book), with separate bars for Skipped and Clicked results; the chart is divided into Irrelevant and Relevant regions.]
Figure 5: Distribution of expected relevant content divided by clicked and skipped documents.

4.3 Results and Discussion
First we verify the relationship between user click behavior and the relevance markings. Figure 5 shows the distribution of expected relevance for search results clicked and skipped. This figure shows that no user would click on a result if they expected no relevant content. If a user expected only a sentence or two of relevant data, the user was unlikely to click: 72% (64/89) of such results were skipped. A natural division emerges from the expectation results. Users expecting “Sentences” or “None”
would skip the result 82% (116/141) of the time, leading us to label this part of the scale “irrelevant”. We labeled the other half of the spectrum “relevant”: users clicked through 84% (146/174) of the time when expecting “Paragraphs” or more of relevant information. A χ² test on the count data split along this dividing line gives a χ² value of 134.8 and a p-value < 0.001, showing a significant difference between these two groups. Click-through and expectation have a lot in common, but expectations provide more insight into the mental process of search users.
   The expectations of survey participants were fairly inaccurate. Only 34% (108/315) of expectations exactly matched the actual relevant content of the web pages. Another 34% (108/315) of expectations fell on the opposite side of the relevant/irrelevant split from the actual content. For example, there were 16 occurrences where a survey participant marked a relevant expectation of “Paragraphs” or higher, only to find no relevant content.
   In our survey, color-coded ReClose summaries led to a lower percentage of disappointments (23%) than Google summaries (34%), as shown in Table 2. A disappointment was recorded when the actual relevant content was lower than the user's expectation. A χ² test on the count data comparing Google and color-coded ReClose gives a χ² value of 2.8 and a p-value of 0.09. This p-value does not fall below the usual threshold of 0.05. However, a clear difference remains between the results of Google summaries and color-coded ReClose summaries, which we expect would become more pronounced with additional survey participants.

Table 2: Disappointment counts and percentages for three summary techniques.

  Summary Source   Disappointment   Satisfied or Surprised   Total Summaries
  Google           36 (34%)         69 (66%)                 105
  Color-Coded      24 (23%)         81 (77%)                 105

   We now look at the precision with which users chose to click on a result. Since a majority of users did not click when they expected a couple of sentences or less, we label all web page views with a few sentences or less of relevant content as “irrelevant”. Web pages that survey participants marked as having more than a few sentences' worth of relevant content are labeled “relevant”. Dividing clicks into relevant and irrelevant allows us to calculate click precision. We define click precision as the percentage of clicked summary views that led to relevant web pages, and click recall as the percentage of relevant documents that were clicked. The results of these calculations for each summary technique are given in Table 3.

Table 3: Click precision and recall comparison.

  Approach       Click Precision   Click Recall
  Google         66% (38/58)       60% (38/63)
  ReClose        75% (39/52)       64% (39/61)
  Color-Coded    80% (49/61)       70% (49/70)

   Table 3 shows that users clicked more often (61 times) and had a higher click precision (80%) when using color-coded ReClose summaries than with either Google summaries (66%) or ReClose summaries with bold highlighting (75%). With Google summaries, only about two thirds of users' clicks led to relevant web pages. Besides clicking more precisely, users of color-coded ReClose summaries also clicked on more of the relevant content, with a click recall of 70%. Users of Google and bolded ReClose summaries skipped more relevant content, with recall scores of 60% and 64% respectively.
   In practice, higher click precision will be more noticeable to users than higher recall. Users are aware of clicks that lead to irrelevant content, because they experience disappointment. However, there is no comparable feedback for click recall: users are not aware that they have skipped over a relevant document. One of the main objectives of color-coded ReClose summaries was to improve click precision. Table 3 shows that color-coded ReClose summaries improved users' click precision over both Google summaries and ReClose summaries without color-coding. In practice, this leads to fewer disappointments.

4.4    Color-Coded Results and Discussion
   We now consider the effectiveness of the two color-coding features: color-coded keywords and flagged words. In this section, comparisons are made only between bolded and color-coded ReClose summaries. Both outperformed Google summaries in click precision and recall (Table 3); here the focus is only on the added color-coding features.
   We first consider the color-coded keywords. The scale we used distinguished keyword usage counts from 0 to 63. Summaries were not evenly distributed across this range: nearly half (49%, or 37/75) of the summaries contained only keywords with 0-9 usages on the summarized web page. We expected users to have low expectations for summaries whose keywords all fell on the low end of the scale. However, we observed no change in behavior between summaries whose query keywords had low counts (0-9) and those with medium counts (10-59). Only for high-count query keywords (60+) was there a noticeable change in behavior.
   There were 13 summaries (17%) with at least one query keyword with a usage count of 60+. For these 13 summaries, participants found the actual relevant content to be high. For example, regardless of summary type, more than 50% of views led to actual relevant content at the “Pages” level. This was rarely expected with bolded ReClose summaries (see Table 4): only 23% of views led to an expectation at the “Pages” level. Color-coded ReClose summaries more often led to higher expectations in line with the actual content: in 44% of views, users of color-coded summaries expected content at the “Pages” level. Color-coded ReClose summaries also led to the highest share of actual relevant content at the “Pages” level, at 67%. For high usage count keywords, color-coded ReClose summaries therefore led to justifiably higher expectations.

Table 4: Expected and actual relevant content for web pages with a query keyword count of 60+.

                           ≤Paragraphs   Pages      Book
  ReClose       Expected   10 (77%)      3 (23%)    0 (0%)
                Actual      4 (31%)      8 (62%)    1 (8%)
  Color-Coded   Expected   10 (56%)      8 (44%)    0 (0%)
                Actual      5 (28%)     12 (67%)    1 (6%)

   Next we consider the effect that flagging had on expectations, shown in the “Expected Relevant” column of Table 5. In this table, documents are divided into two groups: documents that had terms flagged by color-coded ReClose (rows marked “Flaggable”) and documents that did not (rows marked “Not Flaggable”). When color-coded ReClose summaries had flagged terms, expectations were much lower (29% expected to be relevant) than for the same summaries without color-coding (40% expected to be relevant). A similar pattern was found in the other direction: color-coded summaries without flagged terms led to higher expectations (83% expected to be relevant) than the same summaries without color-coding (58%). This shows that the flagging of terms directly affected the expectations of the user.

Table 5: Expected and actual relevant content for documents that would (Flaggable) and would not (Not Flaggable) have summaries with flagged terms.

                                Expected Relevant   Actual Relevant   Click Precision
  ReClose       Flaggable       21/53 (40%)         24/53 (45%)       70%
                Not Flaggable   30/52 (58%)         37/52 (71%)       78%
  Color-Coded   Flaggable       15/52 (29%)         25/52 (48%)       57%
                Not Flaggable   44/53 (83%)         45/53 (85%)       87%

   Documents with flagged terms were much less often found to be relevant. Even when flagged terms were not shown to users (bolded ReClose summaries), 45% of documents that could have been flagged were found to be relevant, compared to 71% of documents that would not have had flagged terms. Interestingly, flagging lowered the click precision of users on flagged summaries: those who saw the flagged terms had a click precision of 57% on flagged summaries, compared to 70% for those who did not see the flagging on these same summaries. However, when color-coding was available and no flagged terms appeared in a summary, users expected more and were more precise, achieving a click precision of 87% compared to 78% without color-coding. Because there were far fewer clicks on flagged summaries, overall click precision was higher for the color-coded version of ReClose (see Table 3).

5.   CONCLUSION
   In this paper we outlined color-coded ReClose summaries. Web-based summaries were visually enhanced using two techniques. The first technique provided global context for the query keywords by highlighting them in varying colors. The second technique highlighted in red the terms whose topics differed from the topics of the query. This provided a warning mechanism to help users avoid clicking through to results less likely to be relevant. We hypothesized that color-coded ReClose summaries would increase the accuracy of user click decisions, thus reducing disappointments and improving user experiences.
   Survey results showed that color-coded ReClose summaries improved user click precision (80%) over Google summaries (66%). This in turn led to fewer disappointments with color-coded ReClose summaries (24) than with Google summaries (36). Improved precision and decreased disappointment should result in a better user experience.
   A closer look at the survey results showed that both highlighting techniques, color-coding keywords and flagging divergent topic terms, were effective. Color-coding is an effective way to convey additional summary information to users without increasing the screen space used. We plan on making further improvements to the selection algorithm for flagged terms. We also plan to enhance summaries with the use of multimedia.

6.   REFERENCES
[1] P. Baudisch, B. Lee, and L. Hanna. Fishnet, a fisheye web browser with search term popouts: a comparative evaluation with overview and linear view. In Proceedings of the Working Conference on Advanced Visual Interfaces, AVI ’04, pages 133–140, New York, NY, USA, 2004. ACM.
[2] D. Byrd. A scrollbar-based visualization for document navigation. In Proceedings of the Fourth ACM Conference on Digital Libraries, DL ’99, pages 122–129, New York, NY, USA, 1999. ACM.
[3] L. C. Freeman. Centrality in social networks conceptual clarification. Social Networks, 1(3):215–239, 1978-1979.
[4] Google. Changing your site’s title and description in search results. http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=35264.
[5] J. Graham. The reader’s helper: a personalized document reading environment. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems: the CHI is the Limit, CHI ’99, pages 481–488, New York, NY, USA, 1999. ACM.
[6] M. Hemmje, C. Kunkel, and A. Willett. LyberWorld - a visualization user interface supporting fulltext retrieval. In Proceedings of the 17th Annual International Conference on Research and Development in Information Retrieval, ACM SIGIR, pages 249–259, Berlin, 1994. Springer.
[7] K. Kaugars. Integrated multi scale text retrieval visualization. In CHI 98 Conference Summary on Human Factors in Computing Systems, CHI ’98, pages 307–308, New York, NY, USA, 1998. ACM.
[8] G. Marchionini. Information Seeking in Electronic Environments. Cambridge University Press, 1995.
[9] Microsoft. Anatomy of a Bing caption. http://www.bing.com/community/site_blogs/b/webmaster/archive/2010/10/25/anatomy-of-a-bing-caption.aspx.
[10] L. Schamber. Relevance and information behavior. Annual Review of Information Science and Technology (ARIST), 29:3–48, 1994.
[11] A. Veerasamy and N. J. Belkin. Evaluation of a tool for visualization of information retrieval results. In Proceedings of the 19th Annual International Conference on Research and Development in Information Retrieval, SIGIR, pages 85–92, 1996.
[12] B. Wenerstrom and M. Kantardzic. ReClose: Web page summarization combining summary techniques. 2011. Accepted for publication in the International Journal of Web Information Systems on 4/27/2011.
[13] Yahoo! How to change a page title or description in Yahoo! search results. http://help.yahoo.com/l/us/yahoo/search/indexing/indexing-11.html.