<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Are Web User Comments Useful for Search?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Wai Gen Yee</string-name>
          <email>waigen@ir.iit.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrew Yates</string-name>
          <email>ayates@iit.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Shizhu Liu</string-name>
          <email>sliu28@iit.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ophir Frieder</string-name>
          <email>ophir@ir.iit.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, Illinois Institute of Technology, Chicago</institution>
          ,
          <addr-line>IL 60616</addr-line>
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We consider the potential impact of comments on search accuracy in social Web sites. We characterize YouTube comments, showing that they have the potential to distinguish videos. Furthermore, we show how they could be incorporated into the index, yielding up to a 15% increase in search accuracy.</p>
      </abstract>
      <kwd-group>
        <kwd>search</kwd>
        <kwd>comments</kwd>
        <kwd>YouTube</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>Copyright © 2009 for the individual papers by the papers’ authors.
Copying permitted for private and academic purposes. Re-publication of
material from this volume requires permission by the copyright owners.
This volume is published by its editors.</p>
      <p>LSDS-IR Workshop. July 2009. Boston, USA.</p>
      <p>If content is poorly described by the title/description/keywords,
however, comment information may supplement or replace
traditional forms of search. The title/description/keywords of a
Westminster Kennel Show video, for example, may fail to
mention “dog” (not to mention particular dog breeds), and thus
not turn up in the results for “dog show.” Searching through
comment information will almost certainly solve this problem.
In this paper, we explore the “nature” of user comments and how
they may aid in search. Specifically, we analyze the term
distributions of user comments and attempt to apply this
information to improve search accuracy.</p>
      <p>The hazard associated with the use of comments to improve
search accuracy is that they may contain noisy terms that hurt
performance as well as significantly increase the size of the index.
Our experimental results, however, suggest that while some
queries are negatively affected by comments, overall, they can
improve query accuracy by nearly 15%. Furthermore, we apply
techniques that can reduce the cost of using comments by up to
70%.</p>
    </sec>
    <sec id="sec-2">
      <title>2. ANALYSIS OF THE YOUTUBE DATA</title>
      <p>
        We crawled YouTube during February, 2009 and collected the
text associated with the 500 most popular and 3,500 random
videos. Popular videos were identified using the YouTube API.
Random videos were retrieved by randomly selecting results of
queries consisting of terms selected randomly from the SCOWL
English word list [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. For each video, we retrieved several
information “fields,” including:
• Title – A title assigned to the video by the user who posted it.
• Description – A video description by the user who posted it.
• Keywords – Video “tags” by the user who posted it.
• Comments – Comments by viewers of the video.
      </p>
      <p>In total, for the 4,000 videos, we retrieved over 1 million
comments made by over 600,000 users. We refer to the random
3,500 videos and the popular 500 videos together as the “small”
data set.</p>
      <p>
        We also similarly created a “large” data set, also consisting of a
random and a popular part, crawled in May, 2009. This data set
consists of 10,000 randomly crawled videos and 1,500 popular
videos. The four data sets are thus:
• rand3500: This data set contains data on 3,500 videos,
randomly crawled from YouTube in February, 2009. This
data was found on YouTube by issuing random queries from
the SCOWL word list [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
• pop500: This data set contains data on the 500 most popular
videos according to YouTube as of February, 2009.
• rand10K: This data set contains data on 10,000 videos
randomly crawled from YouTube (in the same way that
rand3500 was collected) in May, 2009.
• pop1500: This data set contains data on the 1,500 most
popular videos according to YouTube as of May, 2009.
In our experiments, we pre-processed the data using the Porter
stemming algorithm [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. We also tried a more conservative
stemming algorithm [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] in anticipation of problems with
overstemming from the unique language usage found in video
comments. However, the different stemmer had little effect on the
final results. We also removed stop words using the Lucene stop
word list.
      </p>
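      <p>As a rough illustration (not the authors’ code), this preprocessing can be sketched as follows; the tiny stop list and naive suffix stripper are simplified stand-ins for the Lucene stop word list and the Porter stemmer:</p>

```python
import re

# Small stand-ins for the Lucene stop word list and the Porter stemmer.
STOP_WORDS = {"a", "an", "and", "the", "is", "of", "to", "in", "it", "this"}

def naive_stem(term):
    # Crude suffix stripping; a real pipeline would use Porter stemming.
    for suffix in ("ing", "ed", "es", "s"):
        if term.endswith(suffix) and len(term) > len(suffix) + 2:
            return term[: -len(suffix)]
    return term

def preprocess(text):
    """Lowercase, tokenize, drop stop words, then stem each term."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]

print(preprocess("The dogs are running in the dog show"))
```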
    </sec>
    <sec id="sec-3">
      <title>2.1 Basic Statistics</title>
      <p>As shown in Table 1a, popular videos have more than 3 times as
many viewers as random videos and more than 6 times as many
comments. Comment length for both types of videos is about 12 to
15 terms. On average, there are 2,280 terms describing a video
from the rand3500 data set and 12,132 terms describing a video in
the pop500 data set. In the large data set,
there is an even greater disparity between the random and popular
videos, with more viewers and more comments.</p>
      <p>The length statistics of the title, description and keyword fields,
shown in Table 2, indicate that on average only 34 to 58 terms are
used to describe a (random) video (assuming that comments are
not used to describe videos). Including the comment field in the
search yields a potentially richer source of information because
the average number of comment terms is at least 1,485.</p>
    </sec>
    <sec id="sec-4">
      <title>3. MEASURING INFORMATION CONTENT</title>
      <p>
        As demonstrated in opinion-mining applications, many comments
often describe something’s “quality,” rather than its “content”
(e.g., how good a product is rather than what the product is) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
If we assume that quality-based comments come largely from a
restricted vocabulary (i.e., adjectives, such as “good” or “bad”),
then comments will have only a limited ability to distinguish one
video from another apart from the subjective impression it left on
the viewer. Specifically, comments from different videos in this
case will have similar term distributions and therefore have poor
discriminating power from the perspective of a search system.
Furthermore, because queries generally contain content-based
terms, they do not “match” the quality-based terms in the
comments. In other words, comments contain little information
useful to search.
      </p>
      <p>
        To measure the discriminating power of each field, we compute
each field’s language model and then compute the average
KL-divergence [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] of the individual field values to its corresponding
language model. This metric is one way of identifying the
potential of the field to distinguish one video from others in a
search system [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
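      <p>A minimal sketch of this measurement, assuming a maximum-likelihood field model and additive smoothing of the background model (the paper does not specify its estimation details; mu is an assumed smoothing constant):</p>

```python
import math
from collections import Counter

def kl_divergence(field_terms, background_terms, mu=0.01):
    """KL(field || background) in bits, summed over the field's terms.

    mu is an assumed additive-smoothing constant that keeps the
    background probability q nonzero for terms unseen in the background.
    """
    f = Counter(field_terms)
    b = Counter(background_terms)
    nf, nb = sum(f.values()), sum(b.values())
    vocab = len(set(f) | set(b))
    kl = 0.0
    for term, count in f.items():
        p = count / nf                          # field language model
        q = (b[term] + mu) / (nb + mu * vocab)  # smoothed background model
        kl += p * math.log2(p / q)
    return kl
```

A field whose term distribution closely matches the background scores near zero; a field dominated by terms rare in the background scores high, i.e., it discriminates.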
      <p>The results shown in Table 3 confirm that the comment field is
generally the least discriminating based on KL-divergence. For
the most part, the title and the keyword fields are the most
discriminating.</p>
    </sec>
    <sec id="sec-6">
      <title>4. DISTILLING INFORMATION CONTENT FROM COMMENTS</title>
      <p>A consideration of the relative length of the average comment
field explains its low KL-divergence. Intuitively, as a document
(i.e., the comment field) gets longer, its divergence from the
“background” language model decreases. (In separate
experiments – not shown – we verified this phenomenon on the
comment field and on the WT10G Web corpus.) In other words,
the comment field becomes the language model if its size relative
to the other fields is great enough.</p>
      <p>We contend, however, that as a document gets longer, it will
contain more discriminating information – as well as more
non-discriminating information. To verify this, we identify the terms
“most associated” with the comment field and see if these terms
are unique to the field. We do this by pruning all but the “top
terms” of each video’s comment field and comparing these terms to
the background language model. We identify top terms with a
variation of TF-IDF score (where TF measures the number of
times a term appears in the video’s comment field and IDF
measures the number of videos’ comment fields in which the term
appears, as analogous to the typical definition of TF-IDF). We
consider the top 68 unique terms to make the number comparable
to that which is typically available in the title, description and
keyword fields, combined. (Recall our discussion on the results
shown in Table 2.)
As shown in Figure 1, the KL-divergence of the top 68 comment
terms increases quickly with the number of comment terms. The
KL-divergence stabilizes at approximately 7.6 when the number of
comment terms reaches 250 (when most of the terms are unique to
the comment). This KL-divergence exceeds that of all the other
fields (Table 3), indicating its potential in discriminating videos.
This result shows that longer comment fields contain more
discriminating information. However, it is also likely that the rate
of discriminating terms in comment fields decreases with
comment length. Therefore, while we claim that longer comment
fields contain more discriminating information, the rate at which
we yield this information should decrease as the comment field
gets longer. In any case, the long comment fields are more
discriminating than the other fields.</p>
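      <p>The top-term selection can be sketched as follows (a simplified stand-in for the authors’ TF-IDF variant; the field contents are illustrative):</p>

```python
import math
from collections import Counter

def top_comment_terms(comment_fields, video_id, k=68):
    """Keep the k terms of one video's comment field with the highest
    TF-IDF-style scores (TF within the field, IDF across all fields)."""
    n_videos = len(comment_fields)
    df = Counter()  # number of comment fields containing each term
    for terms in comment_fields.values():
        df.update(set(terms))
    tf = Counter(comment_fields[video_id])
    score = lambda t: tf[t] * math.log(n_videos / df[t])
    return sorted(tf, key=score, reverse=True)[:k]

fields = {
    "v1": ["dog", "dog", "show", "funny"],
    "v2": ["funny", "cat"],
    "v3": ["funny", "music"],
}
print(top_comment_terms(fields, "v1", k=2))
```

A term such as “funny” that appears in every comment field receives an IDF of zero and is pruned away, while video-specific terms survive.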
      <p>Note that we only consider comment fields with at least 100
terms. With fewer terms, the comments often lacked 68 unique
terms, making their KL-divergences as a function of length
unstable, obscuring the results. Also, experiments with different
numbers of top terms yielded similar, predictable results.</p>
    </sec>
    <sec id="sec-8">
      <title>4.1 Potential Impact of Comments on Query Accuracy</title>
      <p>
        To estimate the potential of using comment terms to improve
search accuracy, we use a technique described in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] that
effectively identifies the terms that are most likely to occur in a
query that retrieves a given document. For each video, we extract
the top N of these terms and calculate their overlap with the
various video information fields. Note that the fields are not
necessarily disjoint, so the overlap percentages may sum to more than 100%.
The results in Table 4 show that most of these terms come from
the comments. Of course, the comment field contains many more
terms than the other fields, so the overlap will be greater. (For
example, the title field’s overlap is limited because titles generally
contain fewer than 30 terms.) But the point is that it is exactly the
size of the comment field that is the source of its potential.
Although it contains many meaningless terms, it also contains the
lion’s share of the top terms. This suggests that indexing comment
terms can improve search accuracy.
      </p>
    </sec>
    <sec id="sec-10">
      <title>4.2 Waiting for Comments</title>
      <p>One of the problems with using comments in search is that they
take time for users to generate. In the results discussed in Section
4, we need about 250 comment terms before the KL-divergence
stabilizes. If we assume each comment is 13 terms long, then we
would need about 20 comments to yield 250 terms.</p>
      <p>Based on our data, popular and random videos receive
approximately 20 and 1.4 comments per day, respectively.
Therefore, popular videos collect enough comments in one day
and random videos require about 2 weeks to yield enough terms to
be useful for search. In Figure 2, we show the number of
comments for the data set as a function of time. Popular videos
are commented at a higher rate as expected, but both types of
videos have a consistent increase in the number of comments.</p>
    </sec>
    <sec id="sec-11">
      <title>5. EXPERIMENTAL RESULTS</title>
    </sec>
    <sec id="sec-12">
      <title>5.1 Data Set and Metrics</title>
      <p>We use the data sets described in Section 2 for our experiments.
We simulate user queries by removing the keyword field from the
video data set and using it to generate known-item queries. From
the keyword set, we generate queries in two ways: (1) top-IDF
queries, consisting of the K keywords with the highest IDF, and
(2) random queries, consisting of K randomly selected keywords.</p>
      <p>
        In the alternatives above, we use K values 2, 3, and 4 as these are
the most common query lengths in Web and P2P applications
[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ][
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>
        We generate queries in this way because keywords are meant to
help users index content. Top-IDF queries are meant to simulate
users who generate very specific queries and their generation is
similar to the query generation techniques described in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ][
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
Random queries are appropriate if we assume that all keywords
are appropriate for queries.
      </p>
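      <p>The two alternatives can be sketched as follows (keyword lists and IDF values are illustrative; the paper draws the query length K from {2, 3, 4}):</p>

```python
import random

def top_idf_query(keywords, idf, k=3):
    """Top-IDF query: the k keywords with the highest IDF (most specific)."""
    return sorted(keywords, key=lambda t: idf.get(t, 0.0), reverse=True)[:k]

def random_query(keywords, k=3, seed=None):
    """Random query: k keywords sampled uniformly without replacement."""
    rng = random.Random(seed)
    return rng.sample(list(keywords), min(k, len(keywords)))

# Illustrative keyword field and IDF table for one video.
idf = {"westminster": 9.1, "kennel": 8.4, "dog": 4.2, "show": 2.5, "video": 1.0}
kws = ["westminster", "kennel", "dog", "show", "video"]
print(top_idf_query(kws, idf, k=3))
```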
      <p>
        Note that the choice of using the keyword field to create queries is
somewhat arbitrary. Recent work shows that the terms used as
keywords do not necessarily match those used in user queries
[
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. For example, people tagging music would use the terms
associated with genre, such as “pop,” whereas people generally do
not search for music via genre – title and artist are more likely in
cases of known-item search. In future work, we will consider
queries generated by other techniques described in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ][
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
Because we use keywords to generate queries, we also strip the
keywords from the data set that we index. If we did not do this,
then the baseline query accuracy would be so high – given our
experimental setup – that we would not be able to reasonably run
experiments that would show any meaningful positive change.
One might worry that stripping keywords from the data set will
result in an artificially low baseline for performance because
keywords are expected to match queries very precisely. However,
referring again to the results from [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], keywords do not
necessarily match query behavior.
      </p>
      <p>In any case, our goal is to show whether the addition of comments
can improve query performance over not using them. We could
have therefore generated queries from any field provided that we
remove that field from the data that is indexed. A positive result,
therefore, would suggest that indexing comments in addition to all
of the other fields is beneficial to query accuracy.</p>
      <p>Because we are assuming known-item search, we use mean
reciprocal rank (MRR) as our main performance metric, defined as the
average reciprocal rank of the desired result over all queries:</p>
      <p>MRR = (1/N<sub>Q</sub>) ∑<sub>i=1</sub><sup>N<sub>Q</sub></sup> (1/r<sub>i</sub>)</p>
      <p>In the expression above, N<sub>Q</sub> is the number of queries issued and r<sub>i</sub>
is the rank of the known item in the result set of query i. MRR is a
metric that ranges from 0 to 1, where MRR = 1 indicates ideal
ranking accuracy.</p>
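      <p>The metric can be computed directly from the rank of the known item in each query’s result list; treating a missing item as contributing zero is an assumed convention, not specified in the paper:</p>

```python
def mean_reciprocal_rank(ranks):
    """MRR = (1/N_Q) * sum(1/r_i) over all queries.

    A rank of None means the known item was not retrieved and
    contributes 0 (an assumed convention).
    """
    if not ranks:
        return 0.0
    return sum(0.0 if r is None else 1.0 / r for r in ranks) / len(ranks)

print(mean_reciprocal_rank([1, 2, None, 4]))
```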
      <p>
        The data are indexed in the Terrier search engine [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Each of the
videos is searched for via its respective queries.
      </p>
    </sec>
    <sec id="sec-13">
      <title>5.2 Basic Results</title>
      <p>In our first experiment, we test the impact of indexing comments.
We issue the queries twice. First, we issue the queries on an
index that contains the title and description (a setup that we refer
to as DT) of each video, but not the comments. Second, we issue
the queries on an index that contains the title, description, and
comments (a setup that we refer to as DTC) of each video.
In Table 6, we show query performance for different query
lengths when the index does not and does contain comments. The
results show that there is a consistent difference in MRR in the
range of about 1% to 2% when using the comments on the
rand3500 data set.</p>
      <p>We search for the source of the performance improvement by
dividing the results of each query into buckets based on the
impact that the comment field has on the query results and then
search for correlations between the change in MRR and “features”
of the video data to which the query corresponds. The goal is to
find some correlation between MRR improvement and a video
feature. We considered several features of the video data,
including the lengths of the various fields in terms of the number
of unique terms and the similarities between the fields.
A subset of our correlation analysis is shown in Table 5. Each
bucket corresponds to a 0.25 point difference in MRR. We see
that approximately 450 videos have their MRRs improved and
about the same number have their MRRs worsened. Most videos
(2,576 or about 75%) are not affected by the addition of
comments.</p>
      <p>We see that the length of the title and description fields have little
impact on MRR. There is no clear correlation between them and
change in MRR.</p>
      <p>On the other hand, both the length of the comment field and the
similarity between the comment and keyword fields are correlated
with MRR change. Note that the similarity between the comment
and keyword field is measured by how much the comment field
covers the keyword field:</p>
      <p>|C ∩ K| / |K|</p>
      <p>The coefficient of correlation between the similarity of the
comment and keyword fields and the change in MRR is 0.7589.
The coverage of the keyword field is also related to the length of
the comments. If we remove the comment length of the first row
of Table 5, then the coefficient of correlation between the change
in MRR and the length of the comment field is 0.7214. (With the
first row, the coefficient of correlation is 0.2964.) Finally, the
coefficient of correlation between the length of the comment field
and the similarity between the comment and keyword fields is
0.9351 without the first row of data and 0.7552 with the first row
of data.</p>
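      <p>A sketch of the two quantities used in this analysis — keyword coverage |C ∩ K| / |K| and the (Pearson) coefficient of correlation — with illustrative inputs:</p>

```python
def coverage(comment_terms, keyword_terms):
    """|C ∩ K| / |K|: fraction of keywords covered by the comment field."""
    c, k = set(comment_terms), set(keyword_terms)
    return len(c & k) / len(k) if k else 0.0

def pearson(xs, ys):
    """Plain Pearson coefficient of correlation (undefined for a
    constant series, which would give a zero denominator)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

print(coverage(["dog", "show", "funny"], ["dog", "show", "breed", "kennel"]))
```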
      <p>There is also a negative correlation between the similarity of the
title and description fields with the keyword field (|DT∩K| / |K|)
and MRR (-0.5177) and between |DT∩K| / |K| and |C∩K| / |K|
(0.8077). These results show that in the cases where titles and
descriptions do not contain enough information to match the
queries, then the long comment field is able to compensate. (We
observe, for example, some videos with non-English descriptions
and English keywords.)
The conclusion that we draw from these results is that comments
help:</p>
      <p>• MRR improves when the comments contain keywords
(equivalently, query terms, since we generate queries from
the keywords).</p>
      <p>• Longer comment fields are more likely to contain keywords.</p>
      <p>So, despite all of the irrelevant terms contained in the comments –
particularly long comments – the existence of the relevant terms
helps.</p>
      <p>In our next experiments, we run the same test on the pop500 data
set. The results of this experiment show how comments affect the
search for videos that users actually want to find (i.e., popular
videos).</p>
      <p>An analysis of the MRR change table for rand10K (Table 11)
reveals that there is again a correlation between the length of the
comments and the change in MRR (0.7515), and the similarity
between the comment and keyword fields and the change in MRR
(0.7064). In this case, however, most of the videos (72%) are
unaffected by comments, while 13% have their MRRs worsened and
15% have their MRRs improved.</p>
      <p>We again see a correlation between the similarity between the
comment and keyword fields and the change in MRR. The
coefficient of correlation between these two variables is even
greater than that of the rand3500 data set: 0.8295 versus 0.7589.
The correlation between the length of the comment field and the
change in MRR is 0.7404 with the pop500 data set versus 0.7214
with the rand3500 data set.</p>
      <p>We summarize the performance results on the rand3500 and
pop500 data sets in Table 9. We see that comments are clearly
more effective on popular data. The change in MRR is greater and
the number of videos whose MRR improves is greater. This is
likely because of the similarity between the comment and
keyword fields.</p>
      <sec id="sec-13-1">
        <title>5.2.1 Results on Larger Data Sets</title>
        <p>To simplify our explication, in this section, we only report results
using the top-IDF-type queries. Also, as done above, if no query
length is specified, we use queries of length 3.</p>
        <p>Again, as shown in Table 12, the improvement in MRR with
popular data is greater than that with random data. With the
pop1500 data set, the percentage MRR improvement ranges from
3% to 7% compared with 3% for the rand10K data set. In this
case, 47% of the videos’ MRRs are unaffected by the comments,
23% are worsened, and 30% are improved.</p>
        <p>The coefficient of correlation between MRR change and comment
length is 0.6769 and the coefficient of correlation between MRR
change and similarity of comment and keyword fields is 0.9192.
Again, long comment fields are able to substitute for keywords in
search.</p>
        <p>
          The fact that MRR is better for popular data has been shown in
other work (e.g., [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]). This is clearly due to the fact that popular
data have more comments. This result is significant as it shows
that increasing the number of comments not only increases the
ability of videos to naively match queries, but also increases the
ability of queries to distinguish the relevant videos.
        </p>
      </sec>
    </sec>
    <sec id="sec-14">
      <title>5.3 Improving our Results</title>
      <p>Our next goal is to improve on our MRR results by exploiting
our observations: if we detect a correlated feature, we
use the correlation in our indexing strategy.</p>
      <p>Our main observation is that as the length of the comments field
increases, so does its effectiveness in search. Therefore, we
should only index comments if they are above a certain length.
We also acknowledge that there is a correlation between the
change in MRR and the similarity between the comment and
keyword fields. However, as there is also a correlation between
comment length and similarity, we roughly cover both features by
considering just the comment length.</p>
      <p>Our first strategy is to index comments only if they are above a
given length threshold. We refer to this strategy as
“length-selective indexing.” We show the experimental results in Table
14, where the threshold is in terms of number of terms (words).
Length-selective indexing in fact hurts performance: MRR
consistently decreases with increasing thresholds. The problem
with this strategy is that it creates a situation where certain videos
are too eager to match queries. In other words, videos that have
their comments indexed are ranked higher than other videos
compared with the base case regardless of whether they are
relevant to the query or not. Because videos are only relevant to a
single query (by definition of MRR), MRR must decrease with
this type of indexing.
This was not a problem in the case where all videos’ comments
were indexed because the “eager matching” problem is offset by
the fact that all videos (with comments) have additional terms
associated with them. We expect that the additional terms
contained in the comments are more likely to help match relevant
queries.</p>
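      <p>The length-selective strategy amounts to a simple per-video filter at indexing time; the field names here are illustrative, not the authors’ schema:</p>

```python
def build_index_text(video, threshold=0):
    """Concatenate title and description, and append the comment terms
    only when the comment field exceeds the length threshold (in terms)."""
    parts = [video["title"], video["description"]]
    comment_terms = video.get("comments", [])
    if len(comment_terms) > threshold:
        parts.append(" ".join(comment_terms))
    return " ".join(parts)

v = {"title": "dog show", "description": "kennel club", "comments": ["cute"] * 5}
print(build_index_text(v, threshold=10))
```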
      <sec id="sec-14-1">
        <title>5.3.1 Comment Pruning</title>
        <p>The problem with length-selective indexing is that it non-uniformly
adds “noise” to the descriptions of videos, making some of them match
irrelevant queries. If noise were applied uniformly to all videos,
this problem would be attenuated, although noise would still cause
incorrect matching of queries to results.
This observation inspires a solution whereby we index each video
with its comments, but then prune away noise from the comments,
leaving only the most relevant terms in the description of each
video. This solution is expected to do two things:
1. Reduce the irrelevant matches of a query, and
2. Decrease the size of the index.</p>
        <sec id="sec-14-1-2">
          <title>-</title>
          <p>
            The technique we use to prune the comment field is that which
was proposed to shrink indices in [
            <xref ref-type="bibr" rid="ref4">4</xref>
            ], known as document-centric
pruning. With document-centric pruning, each term in each
document is ranked based on its contribution to the
KL-divergence [
            <xref ref-type="bibr" rid="ref13">13</xref>
            ] of the document to the background language
model. The lower-ranked terms are removed from the documents
before they are indexed. This technique was shown to be able to
shrink the index by up to 90% with little loss in precision.
In these experiments, we prune a percentage of the comments of
each video. We assume that there is a “fixed rate” at which terms
that are useful to search accuracy appear in the comments. If this
rate is r, then a comment field of length len(C) will have rlen(C)
useful terms. If we pick a pruning rate of r, then all of the terms
left in the comment field will be useful.
          </p>
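          <p>A sketch of document-centric pruning under the same smoothed language-model assumptions as before (mu is an assumed smoothing constant; the background model is a term-count mapping):</p>

```python
import math
from collections import Counter

def prune_comments(comment_terms, background, prune_rate=0.5, mu=0.01):
    """Keep the (1 - prune_rate) fraction of unique terms contributing
    most to KL(field || background); return the surviving comment terms."""
    f = Counter(comment_terms)
    nf, nb = sum(f.values()), sum(background.values())
    vocab = len(set(f) | set(background))

    def contribution(term):
        p = f[term] / nf
        q = (background[term] + mu) / (nb + mu * vocab)  # smoothed background
        return p * math.log2(p / q)

    ranked = sorted(f, key=contribution, reverse=True)
    keep = set(ranked[: max(1, round(len(ranked) * (1 - prune_rate)))])
    return [t for t in comment_terms if t in keep]
```

Terms common in the background (e.g., generic quality terms) contribute little or negatively to the divergence and are pruned first.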
          <p>In Figure 3, we see the effect that comment pruning has on MRR.
The data on the left of the figure correspond to complete
comment fields, whereas the data on the right correspond to no
comments. We see that pruning initially increases the MRR for
all data sets. MRR then drops dramatically as the comment field
size decreases to zero.</p>
          <p>The effect of pruning is more pronounced for the popular data sets
than for the random data sets. With the random data set, the
maximum MRR percentage increase is about 0.7% (60% pruning
on the rand10K data set), while with the popular data set, the
maximum MRR percentage increase is 2.4% (50% pruning with
the pop500 data set).</p>
          <p>The reason for this is that the random data sets’ comment fields
contain few comments in the first place. They are therefore
less likely to contain terms that eagerly match irrelevant
results. Second, the MRR improvement from using
comments with random videos is already low, suggesting
that such comments have only marginal impact. We thus do not expect
much of an increase in performance with pruning.
Based on these results, a pruning rate of 50% is a reasonable
choice. We are able to eliminate half of the index overhead
introduced by the comments without risking a loss in MRR
performance. MRR starts to decrease first with the rand3500 data
set, at 70% pruning.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-15">
      <title>6. RELATED WORK</title>
      <p>
        In [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], the authors consider the impact that indexing blog
comments has on query recall. Their conclusion is that recall is
boosted by the comments and that they are therefore useful. This result is
expected, but little consideration was given to the precision of the
results.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], it was shown that comments do have discriminating
power. The authors clustered blog data and, by using high
weights for comments, were able to improve the purity and
decrease the entropy of their clusters significantly.
      </p>
      <p>
        Much of the work on “social” Web sites – where users are free to
modify the metadata associated with shared data – focuses on “tag”
analysis, where a tag is a keyword that a user can associate with
data to, say, make it easier to index. Findings from tag
analysis are that tags indicate data popularity and are useful in
describing content [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ][
        <xref ref-type="bibr" rid="ref16">16</xref>
        ][
        <xref ref-type="bibr" rid="ref18">18</xref>
        ][
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. This is somewhat
orthogonal to our goal of determining if casual user comments
can help improve search accuracy.
      </p>
    </sec>
    <sec id="sec-16">
      <title>7. CONCLUSION</title>
      <p>Our results show that comments indeed improve the quality of
search compared with just using titles and descriptions to describe
videos. They are particularly useful with popular videos, where
the MRR is lower than with random videos (Table 15).
This result is not a given, however, as some queries actually do
worse with comments. The reason for these cases of decreased
accuracy is that the videos with fewer comments become “buried”
by those with more comments in search results.</p>
      <p>The problem of skew in result sets toward videos with larger
comment fields can be addressed by well-known index pruning
techniques – which also shrink the size of the index. Index
pruning techniques work by removing terms deemed less
distinguishing or relevant to the particular “document.” Applying
index pruning to the comments further improves accuracy by up
to about 2% (with a decrease in index size of up to 70%).
Overall, accuracy improved by up to about 15% as shown in
Table 15.</p>
      <p>Our ongoing work includes further analyses and characterizations
of comment terms and their impact on search accuracy. For
example, our observation that comments work best when they
contain query terms (Section 5.2) and when the title and
description fields do not may suggest that we should only index
comments when they are “different” than the title and description.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Jindal</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>B</given-names>
          </string-name>
          , “
          <article-title>Identifying Comparative Sentences in Text Documents,”</article-title>
          <source>In Proc. ACM SIGIR</source>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guo</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Zhao</surname>
            <given-names>Y.</given-names>
          </string-name>
          , “
          <article-title>Tag-based Social Interest Discovery,”</article-title>
          <source>In Proc. WWW</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Heymann</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , “
          <article-title>Can Social Bookmarks Improve Web Search?”</article-title>
          <source>In Proc. ACM Conf. Web Search and Data Mining (WSDM)</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Buttcher</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Clarke</surname>
            ,
            <given-names>C. L. A.</given-names>
          </string-name>
          ,
          <article-title>“A Document Centric Approach to Static Index Pruning in Text Retrieval Systems,”</article-title>
          <source>In Proc. ACM CIKM</source>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Ponte</surname>
            ,
            <given-names>J. M.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Croft</surname>
            ,
            <given-names>W. B.</given-names>
          </string-name>
          ,
          <article-title>“A language modeling approach to information retrieval,”</article-title>
          <source>In Proc. ACM SIGIR</source>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <source>Terrier Search Engine Web Page</source>
          . http://ir.dcs.gla.ac.uk/terrier/
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Beitzel</surname>
            ,
            <given-names>S. M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jensen</surname>
            ,
            <given-names>E. C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chowdhury</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grossman</surname>
            ,
            <given-names>D. A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Frieder</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          “
          <article-title>Hourly analysis of a very large topically categorized web query log</article-title>
          .”
          <source>In Proc. ACM SIGIR</source>
          ,
          <year>2004</year>
          , pp.
          <fpage>321</fpage>
          -
          <lpage>328</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Yee</surname>
            ,
            <given-names>W. G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nguyen</surname>
            ,
            <given-names>L. T.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Frieder</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <article-title>“A View of the Data on P2P File-sharing Systems,”</article-title>
          <source>Jrnl. Amer. Soc. of Inf. Sci. and Tech. (JASIST)</source>
          , to appear.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Azzopardi</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de Rijke</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Balog</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          “
          <article-title>Building Simulated Queries for Known-Item Topics: An Analysis Using Six European Languages</article-title>
          .”
          <source>In Proc. ACM SIGIR</source>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <source>Porter Stemming Web Site</source>
          . http://tartarus.org/~martin/PorterStemmer/
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Jenkins</surname>
            ,
            <given-names>M.-C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smith</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , “
          <article-title>Conservative Stemming for Search and Indexing</article-title>
          .”
          <source>In Proc. ACM SIGIR</source>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Atkinson</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <source>SCOWL Word List</source>
          . http://wordlist.sourceforge.net/
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Kullback</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , “
          <source>Information Theory and Statistics</source>
          .” John Wiley and Sons, NY,
          <year>1959</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Mishne</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Glance</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          “
          <article-title>Leave a Reply: An Analysis of Weblog Comments</article-title>
          .”
          <source>In Third Workshop on the Weblogging Ecosystem</source>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Bao</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xue</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fei</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Su</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          “
          <article-title>Optimizing Web Search Using Social Annotations</article-title>
          .”
          <source>In Proc. WWW</source>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bian</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zheng</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zha</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Giles</surname>
            ,
            <given-names>C. L.</given-names>
          </string-name>
          “
          <article-title>Exploring Social Annotations for Information Retrieval</article-title>
          .”
          <source>In Proc. WWW</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          “
          <article-title>Enhancing Clustering Blog Documents by Utilizing Author/Reader Comments</article-title>
          .”
          <source>In Proc. ACM Southeast Regional Conf.</source>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <surname>Dmitriev</surname>
            ,
            <given-names>P. A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eiron</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fontoura</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shekita</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          “
          <article-title>Using Annotations in Enterprise Search</article-title>
          .”
          <source>In Proc. WWW</source>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <surname>Heymann</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koutrika</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garcia-Molina</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          , “
          <article-title>Can Social Bookmarks Improve Web Search?”</article-title>
          <source>In Proc. Int'l. Conf. on Web Search and Web Data Mining</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <surname>Bischoff</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Firan</surname>
            ,
            <given-names>C. S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nejdl</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paiu</surname>
          </string-name>
          , R. “
          <article-title>Can All Tags be Used for Search?”</article-title>
          <source>In Proc. ACM CIKM</source>
          ,
          <volume>200</volume>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>