<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Comparing Topic Representations for Social Book Search</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Marijn Koolen</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hugo Huurdeman</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jaap Kamps</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Archives and Information Studies, Faculty of Humanities, University of Amsterdam</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>ISLA, Faculty of Science, University of Amsterdam</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Institute for Logic, Language and Computation, University of Amsterdam</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper we describe our participation in the INEX 2013 Social Book Search Track. We compare the impact of different query representations for book search topics derived from the LibraryThing discussion forums, including the title and full narrative provided by the topic creator, the name of the discussion group in which the topic was posted, and a mediated search query provided by a trained annotator. Our findings are that 1) the mediated queries are short and do not improve performance over the titles, but combining titles and mediated queries does, 2) the discussion group name adds relevant new terms to the representation and further improves performance, but adding the narrative is not effective, and 3) for the majority of topics retrieval effectiveness is the same across all topic representations. Our findings suggest that writing a good search query for the complex information needs in social book search is far from trivial, even for trained annotators.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>For the INEX 2013 Social Book Search Track we focused our attention on query
representations. The search topics in this track are based on discussion threads
from the LibraryThing (LT) discussion forums and contain the title of the
topic thread, the narrative in the first message of the thread and a mediated
query created by a trained annotator. The latter is provided by the track
organisers to compensate for non-representative thread titles for some of the
forum topics.</p>
      <p>The topic statements of the SBS Track contain rich representations of the
book search information needs. The LT member who starts the topic thread
describes her information need both in the thread title and in detail in the first
message of the thread. In addition, she chooses a discussion group in which to
start the thread, which broadly categorises her information need, with the aim
of attracting responses from LT members who are knowledgeable about relevant
books and can recommend the best ones.</p>
      <p>These different representations may each reflect different aspects of the
information need. In our participation we investigate how these representations
affect retrieval. Specifically, we want to know:
<list list-type="bullet">
<list-item><p>How different are the thread title and the mediated query, and how does that
affect retrieval performance?</p></list-item>
<list-item><p>What is the importance of the narrative, which explains the
information need in detail, for representing the information need?</p></list-item>
<list-item><p>What is the role of the discussion group name in representing the information
need?</p></list-item>
</list></p>
      <p>In addition, we experiment with a document prior based on the book ratings
of LT members. We crawled a large set of user profiles from LT that includes
which books each member added to her catalogue and the rating she assigned
to each. The average rating of a book may reflect its overall quality, in which case
it could be used to push low-quality and non-rated (and therefore unpopular)
books down the ranking.</p>
      <p>The paper is structured as follows. We first discuss the different topic
representations that are available in this year's topic set in Section 2. Then, we
describe our experimental setup in Section 3 and discuss results in Section 4.
Next, in Section 5 we present a per-topic analysis. In Section 6, we discuss our
findings and draw conclusions.</p>
    </sec>
    <sec id="sec-2">
      <title>Topic Representations</title>
      <p>The topics for the SBS task are based on topic threads on the LT discussion
forums. Each thread starts with a message from the topic creator and is posted
in one of the thousands of discussion groups. The 2013 topic set only contains
topic threads that are started with a book search information need. The thread
has a title and the first message can be seen as a narrative of the information
need. For instance, topic 25244 has title <italic>Why Republic vs. Democracy</italic> and is
posted in the <italic>Political Conservatives</italic> discussion group. The narrative explains
that the user wants to know more about forms of government and the logic
behind choosing one or the other.</p>
      <p>What is a good topic representation to use as a search query? The title is often
a concise summary of the information need, but is not always comprehensive,
especially for very detailed needs. Sometimes titles are conversational and reveal
nothing about the topic of the information need, such as topic 45940 with title
<italic>Request for recommendations...</italic>. The narrative explains that the user is looking for
books about the miracles of Jesus that are not based on the Bible. The title
is a bad representation of the information need, while the narrative contains
much more than just the information need. Because these titles and narratives
are not intended as search engine queries, this year the task organisers provided
a mediated search query with each topic, created by a trained annotator. This
query is meant to be both concise and comprehensive.</p>
      <p>We want to investigate the value of this mediated query with respect to the
thread title and the narrative. Does it provide a better representation than the
thread title? Does it cover all the fine-grained aspects expressed in the narrative?
And what is the role of the name of the discussion group that the user
selected? This group broadly categorises the information need with, we assume,
the aim of finding LT members who are knowledgeable on books about the subject.
But it may also be useful as an additional representation of the topic.</p>
      <p>The topic set contains 386 topics and each topic has five fields: title (T), query
(Q), group (G), member and narrative (N). We ignore the member field, which
contains the name of the topic creator and is probably not useful for representing
the information need. To understand some of the differences between these fields
as possible topic representations, we analyse them in terms of the number of query
terms they contain.</p>
      <p>In Table 1 we see statistics on the number of query terms in (combinations
of) fields, based on the text in those fields after parsing, stopword removal and
Krovetz stemming. This processing corresponds to the way documents are
processed before indexing. Columns 3–7 show the total number of content terms
and columns 8–12 show the number of distinct terms. The title field (T) has a
mean (median) of 3.90 (4) content terms. The number of distinct terms is very
similar, showing that content terms are rarely repeated in the title. There is one
topic, number 28304, which has zero content terms, for which the thread title is
<italic>Who am I? Why am I here?</italic>. This is a topic posted in the <italic>Amateur Historians</italic>
group asking about books on exploration. Apart from containing only
highly frequent words, the title also does not reflect the information need at all. Here
the mediated query, <italic>exploration books</italic>, improves the query representation. The
query field (Q) is in general somewhat shorter (the median is 3), but there is
always at least one content term. This raises the question whether the mediated
query is more comprehensive than the title, reflecting aspects from the narrative
not covered in the title. Again, terms in the field are rarely repeated. The
combination of the T and Q fields almost doubles the number of
content terms. The number of distinct query terms is lower but still higher than
the number of distinct terms in either the title or query field. This means that
many but not all of the content terms in the title and query overlap. It is plausible
that the most relevant terms from the title are repeated in the query, which
results in higher term frequencies for the most important terms. This might be
beneficial for retrieval.</p>
      <p>Next, we add the group and narrative fields to the combined title and query
field. The group adds only one or two terms on average, while the narrative
adds dozens of content terms, with some repeated terms. However, the narrative
usually contains some conversational language, with many content terms not
directly related to the information need. It is not clear to what extent the possibly
larger number of relevant content terms can increase performance and to what
extent its conversational distractor terms hurt performance.</p>
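      <p>The counts of total and distinct content terms described above can be sketched as follows. This is only an illustration under stated assumptions: the stopword list is a tiny hypothetical one, Krovetz stemming is omitted, and the function names <monospace>content_terms</monospace> and <monospace>term_stats</monospace> are ours, so the numbers will not reproduce Table 1.</p>
      <preformat>
```python
import re
from statistics import mean, median

# Hypothetical, deliberately tiny stopword list; the list used in the paper is not given.
STOPWORDS = {"a", "am", "an", "and", "are", "for", "here", "i", "in", "is",
             "of", "on", "the", "to", "vs", "who", "why"}


def content_terms(text):
    """Lowercase the text, split it into alphanumeric tokens and drop stopwords.

    Krovetz stemming is left out here, so counts can differ from the paper's.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [token for token in tokens if token not in STOPWORDS]


def term_stats(topics, fields):
    """Return (mean, median) of the total and distinct term counts per topic."""
    totals, distincts = [], []
    for topic in topics:
        terms = [t for field in fields for t in content_terms(topic[field])]
        totals.append(len(terms))
        distincts.append(len(set(terms)))
    return {
        "total": (mean(totals), median(totals)),
        "distinct": (mean(distincts), median(distincts)),
    }


# Topic 28304 as quoted in the text; T is the thread title, Q the mediated query.
topics = [{"T": "Who am I? Why am I here?", "Q": "exploration books"}]
print(term_stats(topics, ["T"]))       # the title contributes no content terms
print(term_stats(topics, ["T", "Q"]))  # the mediated query contributes two
```
      </preformat>
      <p>Because the terms of the selected fields are simply concatenated, a term that occurs in both the title and the query is counted twice in the total but once in the distinct count, which is the overlap effect discussed above.</p>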
      <p>In Section 4 we discuss how these different fields affect retrieval effectiveness.</p>
    </sec>
    <sec id="sec-3">
      <title>Experimental Setup</title>
      <p>
        We used Indri [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] for indexing, removed stopwords and stemmed terms
using the Krovetz stemmer. Based on the results from the 2011 Social Search
for Best Books task [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] we include all the social metadata. From the
Amazon/LibraryThing (A/LT) collection we use the book title, author name, subject
headings, LT tags and Amazon user reviews for indexing. In addition, we use
the Library of Congress Subject Headings (LCSH) from the catalogue records of
the British Library and the Library of Congress. These subject headings are less
noisy than the headings from Amazon, and there are more headings per book.
      </p>
      <p>The topics are taken from the LibraryThing discussion groups and contain a
title field which contains the title of a topic thread, a group field which contains
the discussion group name and a narrative field which contains the first message
from the topic thread. New this year is a mediated query field, which is provided
by the organisers as an additional representation of the information need and is
meant to be a more precise expression of it than the thread title.</p>
      <p>In our experiments we used different combinations of topic fields as queries.
For the language model our baseline has default settings for Indri (Dirichlet
smoothing with μ = 2500). We created six base runs:</p>
      <def-list>
        <def-item><term>T</term><def><p>a standard LM run using only the Title field of the topic.</p></def></def-item>
        <def-item><term>Q</term><def><p>a standard LM run using only the Query field of the topic.</p></def></def-item>
        <def-item><term>TQ</term><def><p>a standard LM run using the Title and Query fields of the topic.</p></def></def-item>
        <def-item><term>TQG</term><def><p>a standard LM run using the Title, Query and Group fields of the topic.</p></def></def-item>
        <def-item><term>TQN</term><def><p>a standard LM run using the Title, Query and Narrative fields of the topic.</p></def></def-item>
        <def-item><term>TQGN</term><def><p>a standard LM run using the Title, Query, Group and Narrative fields of the topic.</p></def></def-item>
      </def-list>
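      <p>One way to turn the field combinations of the six base runs into queries is sketched below. The paper does not describe how the fields were joined, so the concatenation into a single Indri <monospace>#combine</monospace> query, the dictionary keys and the helper name <monospace>build_query</monospace> are assumptions made for illustration. The example topic uses the title, mediated query and group of topic 28304 as quoted in Section 2.</p>
      <preformat>
```python
import re

# Topic fields used by each base run (T, Q, TQ, TQG, TQN, TQGN).
RUN_FIELDS = {
    "T": ["title"],
    "Q": ["query"],
    "TQ": ["title", "query"],
    "TQG": ["title", "query", "group"],
    "TQN": ["title", "query", "narrative"],
    "TQGN": ["title", "query", "group", "narrative"],
}


def build_query(topic, run):
    """Concatenate the fields of a run into one Indri #combine query.

    Repeated terms are kept on purpose, so a term that occurs in both
    the title and the mediated query contributes twice to the score.
    """
    terms = []
    for field in RUN_FIELDS[run]:
        # Keep alphanumeric tokens only; punctuation is special in Indri queries.
        terms.extend(re.findall(r"[a-z0-9]+", topic[field].lower()))
    return "#combine( " + " ".join(terms) + " )"


topic = {
    "title": "Who am I? Why am I here?",
    "query": "exploration books",
    "group": "Amateur Historians",
}
print(build_query(topic, "Q"))
print(build_query(topic, "TQG"))
```
      </preformat>
      <p>In this sketch stopword removal and stemming are left to the retrieval engine, on the assumption that queries are processed in the same way as the indexed documents.</p>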
      <p>Last year we crawled a large set of user profiles from LT members and used
member catalogues and book ratings to rerank retrieval results based on
nearest-neighbourhood recommendation. This year, we use the Bayesian average book
ratings as document priors. That is, books that received ratings from LT
members are boosted up the ranking with respect to books that received no ratings,
and books with high ratings are boosted more than books with low ratings.</p>
      <p>To normalise the ratings, we compute the Bayesian average of all the book
ratings in the top 1000 results per topic. The Bayesian Average (BA) takes into
account how many users have rated a work. As more users rate the same work,
the average becomes more reliable and less sensitive to outliers. We make the BA
dependent on the query, such that the BA of a book is based on books related
to the query. The BA of a book b is computed as:</p>
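      <disp-formula id="eq-1">
        <label>(1)</label>
        <tex-math>\mathrm{BA}(b) = \frac{\hat{n}\,\hat{m} + \sum_{r \in R(b)} r}{n + \hat{n}}</tex-math>
      </disp-formula>
      <p>where R(b) is the set of ratings for b, n is the number of ratings in R(b),
<inline-formula><tex-math>\hat{m}</tex-math></inline-formula> is the average unweighted rating over all
books in the top 1000 results and <inline-formula><tex-math>\hat{n}</tex-math></inline-formula> is the average number of ratings over all the
books in the top 1000.</p>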
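      <p>The Bayesian average defined above can be sketched in a few lines. The formula follows the definition in the text; how the two query-dependent averages are computed is our reading of it and should be treated as an assumption: here the average rating is taken over the rated books only, the average number of ratings over all books in the result list, and unrated books receive BA = 0. The function names are hypothetical.</p>
      <preformat>
```python
def bayesian_average(ratings, mean_rating, mean_count):
    """BA(b) = (n_hat * m_hat + sum of ratings of b) / (n + n_hat)."""
    n = len(ratings)
    return (mean_count * mean_rating + sum(ratings)) / (n + mean_count)


def query_dependent_ba(ratings_by_book):
    """Compute BA for every book in one topic's result list.

    ratings_by_book maps a book id to the list of ratings it received.
    Assumption: m_hat averages the per-book mean rating over rated books,
    n_hat averages the number of ratings over all books in the list.
    """
    rated = [rs for rs in ratings_by_book.values() if rs]
    if not rated:
        return {book: 0.0 for book in ratings_by_book}
    m_hat = sum(sum(rs) / len(rs) for rs in rated) / len(rated)
    n_hat = sum(len(rs) for rs in ratings_by_book.values()) / len(ratings_by_book)
    return {
        book: bayesian_average(rs, m_hat, n_hat) if rs else 0.0
        for book, rs in ratings_by_book.items()
    }


# Hypothetical result list: one book with a single top rating, one book with
# four consistent ratings, and one unrated book.
example = {"a": [5.0], "b": [4.0, 4.0, 4.0, 4.0], "c": []}
print(query_dependent_ba(example))
```
      </preformat>
      <p>In the example the single rating of 5 for book a is pulled towards the list-wide average, whereas the four ratings of book b move its average much less, which illustrates why the Bayesian average is less sensitive to outliers.</p>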
      <p>A rating BA(b) for book b can range from 0.5 up to 5, with increments of
0.5. For books with no rating we use BA = 0, which gives a base score of 1; for books with
ratings we use 1 + BA. Each rating can be turned into a prior probability by
dividing BA by the maximum rating BA<sub>max</sub> = 5. For books with no rating this
would result in a prior probability of zero. To avoid multiplying by zero, we use
the Add-One smoothing method and compute the prior as:</p>
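      <disp-formula id="eq-2">
        <label>(2)</label>
        <tex-math>P_{\mathrm{BA}}(d) = \frac{1 + \mathrm{BA}(d)}{1 + \mathrm{BA}_{\max}}</tex-math>
      </disp-formula>
      <p>The final document score is then:</p>
      <disp-formula id="eq-3">
        <label>(3)</label>
        <tex-math>S_{\mathrm{BA}}(d) = P(d \mid q) \cdot P_{\mathrm{BA}}(d)</tex-math>
      </disp-formula>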
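      <p>A minimal sketch of the smoothed prior and the final score follows. It assumes that the retrieval score is available as a log probability, so that the product of the retrieval probability and the prior becomes a sum of logarithms; the function names and the example scores are hypothetical.</p>
      <preformat>
```python
import math

BA_MAX = 5.0  # maximum rating on LibraryThing


def rating_prior(ba):
    """Add-one smoothed prior: (1 + BA) / (1 + BA_max)."""
    return (1.0 + ba) / (1.0 + BA_MAX)


def rerank(results, ba_by_book):
    """Rerank (book, log P(d|q)) pairs by log P(d|q) + log P_BA(d).

    Books missing from ba_by_book are treated as unrated (BA = 0).
    """
    rescored = [
        (book, log_p + math.log(rating_prior(ba_by_book.get(book, 0.0))))
        for book, log_p in results
    ]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


# Hypothetical scores: book x is retrieved slightly higher but has no ratings.
results = [("x", -5.0), ("y", -5.2)]
print(rerank(results, {"y": 4.5}))  # y moves above x
```
      </preformat>
      <p>An unrated book keeps a non-zero prior of 1/6, so it is pushed down the ranking rather than removed from it.</p>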
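      <p>We submitted six runs:</p>
      <def-list>
        <def-item><term>inex13SBS.ti_qu</term><def><p>the TQ run.</p></def></def-item>
        <def-item><term>inex13SBS.ti_qu_gr_na</term><def><p>the TQGN run.</p></def></def-item>
        <def-item><term>inex13SBS.ti.bayes_avg.LT_rating</term><def><p>the T run with the Bayes LT rating prior.</p></def></def-item>
        <def-item><term>inex13SBS.qu.bayes_avg.LT_rating</term><def><p>the Q run with the Bayes LT rating prior.</p></def></def-item>
        <def-item><term>inex13SBS.ti_qu.bayes_avg.LT_rating</term><def><p>the TQ run with the Bayes LT rating prior.</p></def></def-item>
        <def-item><term>inex13SBS.ti_qu_gr_na.bayes_avg.LT_rating</term><def><p>the TQGN run with the Bayes LT rating prior.</p></def></def-item>
      </def-list>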
      <p>In the next section we discuss the evaluation results of the official submissions
and, separately, of all our own runs.</p>
    </sec>
    <sec id="sec-4">
      <title>Results</title>
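      <p>We first show the evaluation results over the whole topic set. Then we present a
per-topic analysis of the differences in performance between the different topic
representations.</p>
      <table-wrap id="tab-2">
        <label>Table 2</label>
        <table>
          <thead>
            <tr><th>Rank</th><th>Run</th></tr>
          </thead>
          <tbody>
            <tr><td>1</td><td>run3.all-plus-query.all-doc-fields</td></tr>
            <tr><td>2</td><td>inex13SBS.ti_qu_gr_na.bayes_avg.LT_rating</td></tr>
            <tr><td>2</td><td>inex13SBS.ti_qu.bayes_avg.LT_rating</td></tr>
            <tr><td>4</td><td>run1.all-topic-fields.all-doc-fields</td></tr>
            <tr><td>5</td><td>inex13SBS.ti_qu_gr_na</td></tr>
            <tr><td>6</td><td>inex13SBS.ti_qu</td></tr>
            <tr><td>7</td><td>run_ss_bsqstw_stop_words_free_member_free_2013</td></tr>
            <tr><td>8</td><td>run_ss_bsqstw_stop_words_free_2013</td></tr>
            <tr><td>8</td><td>inex13SBS.qu.bayes_avg.LT_rating</td></tr>
            <tr><td>10</td><td>inex13SBS.ti.bayes_avg.LT_rating</td></tr>
          </tbody>
        </table>
      </table-wrap>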
      <p>This year, eight groups participated in the track, submitting a total of 32
runs. Our official submissions are all among the top 10 systems, as shown in
Table 2. The top four systems are close together in terms of performance, as are
the systems on ranks five up to nine. Our systems perform on par with the best
other systems.</p>
      <p>We show the evaluation results of our own runs in Table 3. Significant
differences are tested using the bootstrap method (one-tailed with 100,000 samples).
Significance levels are 0.05, 0.01 and 0.001. In the top half of the table
we see the base runs without Bayesian average rating priors. Significance tests are
with respect to the title-only (T) run. Somewhat surprisingly, the title-only (T)
and query-only (Q) representations lead to similar performance. The mediated
query does not improve the representation of the information need. However,
the combination of title and mediated query (TQ) gives significantly better
performance than either in isolation. This reflects the fact that the query is not
simply a copy of the thread title, but either adds complementary relevant terms
or gives more weight to the most relevant terms by repeating them, or both.</p>
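      <p>The text does not state which variant of the bootstrap test was used. As an illustration only, the sketch below implements one common choice for comparing two runs on the same topics: a paired, one-tailed bootstrap that shifts the per-topic differences to a mean of zero and counts how often a resampled mean reaches the observed mean. The function name, the fixed seed and the example scores are assumptions.</p>
      <preformat>
```python
import random


def bootstrap_p_value(baseline, system, samples=100_000, seed=42):
    """One-tailed paired bootstrap test that `system` beats `baseline`.

    Both arguments are lists of per-topic scores in the same topic order.
    """
    diffs = [s - b for b, s in zip(baseline, system)]
    n = len(diffs)
    observed = sum(diffs) / n
    # Shift the differences so that the null hypothesis (mean 0) holds.
    shifted = [d - observed for d in diffs]
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        resampled_mean = sum(rng.choices(shifted, k=n)) / n
        if resampled_mean >= observed:
            hits += 1
    return hits / samples


# Hypothetical per-topic scores for two runs on eight topics.
run_t = [0.10, 0.20, 0.30, 0.20, 0.10, 0.40, 0.30, 0.20]
run_tq = [0.30, 0.45, 0.45, 0.40, 0.40, 0.60, 0.55, 0.40]
print(bootstrap_p_value(run_t, run_tq, samples=2000))
```
      </preformat>
      <p>The returned fraction is an estimate of the p-value, which would then be compared against the significance levels 0.05, 0.01 and 0.001.</p>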
      <p>Adding the group name to the title and query (TQG) further improves
performance, reflecting the users' ability to pick relevant discussion groups for their
needs. However, adding the more detailed narrative hurts performance for early
precision (nDCG@10, P@10 and Mean Reciprocal Rank (MRR)) while
improving Mean Average Precision (MAP). It seems the narrative is not focused enough
to precisely pinpoint the suggested books but its larger set of query terms does
lead to better recall.</p>
      <p>In the bottom half of Table 3 we show the six runs with Bayesian average rating
priors. Again, significant differences are with respect to the title-only run T<sub>BA</sub>. The
rating priors lead to improvements on all reported measures for all six baseline
runs. Among the runs with rating priors we see the same patterns as among the
baseline runs. The T and Q representations lead to similar performance but their
combination leads to better performance. The group name improves the topic
representation but the narrative hurts early precision while improving MAP. We
also tested the improvements of the rating prior runs over their baseline forms
and found that all improvements are significant at p &lt; 0.001, except for the
TQGN run, where the improvements are significant at p &lt; 0.05. This shows the
reliability of the rating priors.</p>
      <p>In sum, the title and query representations are equally effective but
complementary to each other. The group name can further improve performance
while the narrative seems to add too many partly relevant and irrelevant terms.
The LT ratings, if normalised by taking the Bayesian average, form a reliable
document prior probability of relevance.</p>
    </sec>
    <sec id="sec-5">
      <title>Per-Topic Analysis</title>
      <p>
        We show the per-topic differences between two runs for nDCG@10 in Table 4.
The Q run has lower scores for 74 topics compared to the T run (column 2),
higher scores for 69 topics and the same scores as the T run for 237 topics.
These two runs are balanced, which explains why they lead to similar average
scores, but the large number of topics for which the two runs get the same score
suggests that in most cases the mediated query is very similar to the thread
title. It also suggests that creating an effective representation of the information
need is far from trivial, even for trained annotators. Some of their mediated
queries improve upon thread titles that do not or only partly reflect the often
complex information needs in social book search [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. But even more mediated
queries express the search topic less well than the title created by the topic
creator. Next we compare the TQ with the T and Q runs (rows 3 and 4). These
are less balanced, with TQ outperforming T on 74 topics and Q on 76 topics
while T outperforms TQ for only 50 topics and Q outperforms TQ for 49 topics.
This explains why the combination of the two representations scores higher on
average than either on its own. Because T and Q are often very similar, their
combination also often results in the same score.
      </p>
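      <p>The per-topic comparison described above amounts to counting, for a pair of runs, the topics on which the second run scores lower than, higher than or the same as the first. A minimal sketch, with a hypothetical function name, a small tolerance for floating-point equality and invented scores:</p>
      <preformat>
```python
def compare_runs(scores_a, scores_b, tolerance=1e-9):
    """Count topics where run B scores lower, higher or the same as run A.

    Both arguments map a topic id to a per-topic score such as nDCG@10.
    """
    counts = {"lower": 0, "higher": 0, "same": 0}
    for topic, score_a in scores_a.items():
        diff = scores_b[topic] - score_a
        if diff > tolerance:
            counts["higher"] += 1
        elif -diff > tolerance:
            counts["lower"] += 1
        else:
            counts["same"] += 1
    return counts


# Hypothetical nDCG@10 scores for four topics.
run_t = {"t1": 0.20, "t2": 0.00, "t3": 0.35, "t4": 0.10}
run_q = {"t1": 0.10, "t2": 0.00, "t3": 0.40, "t4": 0.10}
print(compare_runs(run_t, run_q))
```
      </preformat>
      <p>Two runs can have similar averages either because most topics are ties or because wins and losses cancel out, which is why the counts are reported alongside the mean scores.</p>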
      <p>Finally, we compare the per-topic scores of the TQ representations with the
richer representations TQG, TQN and TQGN. The TQG run improves
performance on more topics than it decreases performance on, which corresponds
with an improvement in the average score. The representations that include the
narrative, TQN and TQGN, both worsen performance with respect to the TQ
representations on more topics than they improve, corresponding to a
drop in nDCG@10. What is surprising is that including the much
longer narrative in the representation does not affect the per-topic score for the
majority of topics. There are several possible explanations for this. It could be
that additional terms often provide the same relevance signal as the TQ terms,
or introduce random noise. Another explanation is that the TQ terms are
frequently repeated in the narrative and therefore have a dominant impact on
the retrieval score.</p>
      <p>To summarise, the different query representations often carry the same signal,
which may be because the same content terms dominate in the representations.
It seems hard to improve upon the title created by the topic starter,
but combining the concise representations of topic starter and annotator more
often results in an improved representation than in a worse one.</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>In this paper we discussed our participation in the INEX 2013 Social Book
Search Track, in which we focus on the impact of different query representations
of the information needs on retrieval effectiveness. The LT members who start
a topic thread to ask for book suggestions on the discussion forums provide
multiple types of perspectives on their information needs. The thread title is a
short summary, the first message in the thread is a detailed description and the
choice of the particular discussion group reveals the relevant general category
of books for which they hope to find knowledgeable members. In addition, the
task organisers provided mediated queries that aim to be both concise and
comprehensive expressions of the information need, and that are suitable as search
engine queries.</p>
      <p>The mediated query is in general slightly shorter than the thread title, and
typically contains a few overlapping terms and one or a few different content
terms. By combining the representations, the overlapping terms in the title and
query (which we assume are the most relevant terms) receive extra weight.</p>
      <p>The group name is short but also tends to add a few new terms to the
representation with respect to the title and query. The narrative is much longer
and adds many terms, relevant or not, to the representation.</p>
      <p>In terms of the impact of representations on retrieval effectiveness, the title
and mediated query are equally effective. Their combination, however, leads to
significant improvements over using the title alone, which is either due to the
higher frequency of the most important terms or to the complementary content
terms. However, for most topics, the title, query and their combination lead to
the same retrieval performance. Adding the group name improves performance,
indicating that the user selected a relevant discussion group for her information
need. Adding the narrative degrades early precision slightly, which may be because of
the addition of irrelevant or partly relevant terms that broaden the scope of the query.
These findings suggest that creating comprehensive and effective topic
representations that identify all the important relevance aspects in social book search
information needs is not easy, even for trained annotators. Such topics often
contain complex, multi-faceted aspects, which may be the reason why users turn
to the forum in the first place, as current book search systems provide limited
options to express complex needs.</p>
      <p>We also experimented with reranking results by combining the retrieval score
with a prior probability based on the Bayesian average of a book's LibraryThing
ratings. These average ratings provide a reliable probability of relevance and
lead to significant improvements in performance.</p>
      <p>
        In future work we will look in more detail at the overlap and complementarity
of the title and mediated query and the role of term frequencies in topic
representations of the complex information needs in social book search. We will also
study the role of the detailed narrative and experiment with extracting the most
salient additional terms to improve the topic representations. One way would
be to use parsimonious language models [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] to remove common conversational
terms.
      </p>
      <p><bold>Acknowledgments</bold> This research was supported by the Netherlands
Organization for Scientific Research (NWO projects # 612.066.513, 639.072.601, and
640.005.001) and by the European Community's Seventh Framework Program
(FP7 2007/2013, Grant Agreement 270404).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>F.</given-names>
            <surname>Adriaans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Koolen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Kamps</surname>
          </string-name>
          .
          <article-title>The importance of document ranking and user-generated content for faceted search and book suggestions</article-title>
          . In S. Geva,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kamps</surname>
          </string-name>
          , and R. Schenkel, editors,
          <source>Focused Retrieval of Content and Structure: 10th International Workshop of the Initiative for the Evaluation of XML Retrieval (INEX 2011)</source>
          , volume
          <volume>7424</volume>
          of LNCS. Springer,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D.</given-names>
            <surname>Hiemstra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Robertson</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Zaragoza</surname>
          </string-name>
          .
          <article-title>Parsimonious language models for information retrieval</article-title>
          .
          In
          <source>Proceedings SIGIR</source>
          <year>2004</year>
          , pages
          <fpage>178</fpage>
          –
          <lpage>185</lpage>
          . ACM Press, New York NY,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Koolen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kamps</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Kazai</surname>
          </string-name>
          .
          <article-title>Social Book Search: The Impact of Professional and User-Generated Content on Book Suggestions</article-title>
          . In
          <source>Proceedings of the International Conference on Information and Knowledge Management (CIKM 2012)</source>
          . ACM,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>T.</given-names>
            <surname>Strohman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Metzler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Turtle</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W. B.</given-names>
            <surname>Croft</surname>
          </string-name>
          .
          <article-title>Indri: a language-model based search engine for complex queries</article-title>
          .
          <source>In Proceedings of the International Conference on Intelligent Analysis</source>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>