<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>SOCIAL BOOK SEARCH TRACK: ISM@CLEF'16 SUGGESTION TASK</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ritesh Kumar</string-name>
          <email>ritesh4rmrvs@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Guggilla Bhanodai</string-name>
          <email>bhanodaig@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rajendra Pamula</string-name>
          <email>rajendrapamula@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science and Engineering, Indian School of Mines Dhanbad</institution>
          ,
          <addr-line>826004</addr-line>
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <abstract>
<p>This paper describes the work that we did at the Indian School of Mines for the Social Book Search Track at CLEF 2016. As required by CLEF 2016, we submitted six runs to its Suggestion Task. In our runs we investigated the individual effects of the title, group and request fields of the topics, as well as the combined effect of the title, request and group fields. For all the runs we used the language modeling technique with Dirichlet smoothing. The run using the combined title, request and group fields was our best. Overall, our performance is reasonable but needs improvement; our scores are encouraging enough to work towards better results in the future.</p>
      </abstract>
      <kwd-group>
        <kwd>Book Search</kwd>
        <kwd>Social Book Search</kwd>
        <kwd>Language modeling</kwd>
        <kwd>Information Retrieval</kwd>
        <kwd>re-ranking</kwd>
        <kwd>Normalization</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        With the growing number of online portals and book catalogues, the way we acquire, share and use books is evolving rapidly. To enable users to search for relevant books, the Social Book Search Track at CLEF [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] provides an experimental platform for investigating techniques for searching and navigating professional metadata, provided by publishers and booksellers, together with user-generated content from social media [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. At the CLEF 2016 Social Book Search Lab, three different tracks were offered: the Suggestion Track, the Interactive Track and the Mining Track. We participated in the Suggestion Track, where the task is to recommend books based on a user's request and her personal catalogue data (the list of books, with ratings and tags, maintained for the user on the social cataloguing site). We were also provided with a large set of anonymised user profiles from LibraryThing forum members, consisting of 93,976 profiles with over 33 million cataloguing transactions. Each user request is provided in the form of a topic containing different fields such as title, request, group, examples and catalogue information.
      </p>
      <p>Our goal is to investigate the contribution of different topic fields, as well as the combined effect of some fields, for book recommendation. We considered only the title, request and group fields from each topic. We did not consider the topic creator's catalogue information, nor did we consult the user profiles.</p>
      <p>
        We submitted six runs (ISMD16allfields, ISMD16titlefield, ISMD16requestfield,
ISMD16titlewithoutreranking, similaritytitlefieldreranked, ISMD16groupfield) to
the Suggestion Task. For all the runs, language modeling with Dirichlet
smoothing was used in Lemur's Indri search system [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>The rest of the paper is organized as follows. Section 2 describes the
dataset. In Section 3 we describe our methodology: the field categories and indexing, and which
document and topic fields we used for retrieval. Section 4 describes the
approaches we used, Section 5 reports and analyses the results, and finally we
conclude in Section 6 with directions for future work.</p>
    </sec>
    <sec id="sec-2">
      <title>Data</title>
      <p>
        The test collection provided by the CLEF 2016 SBS organizers for the Suggestion Task
consists of a document collection and a topic set. The document collection contains
2.8 million book descriptions with metadata from Amazon and LibraryThing. From
Amazon there is formal metadata such as book title, author, publisher, publication
year, library classification codes, Amazon categories and similar-product information,
as well as user-generated content in the form of user ratings and reviews. From LibraryThing,
there are user tags and user-provided metadata on awards, book characters,
locations and blurbs. There are additional records from the British Library and
the Library of Congress. The entire collection is 7.1 GB in size [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>The topic set contains 120 topics, each describing a user's request for
book suggestions. Each topic has a set of fields such as title, request, group, example
and the user's personal catalogue at the time of topic creation. The catalogue
contains a list of book entries with information such as the LibraryThing id of the book,
its entry date, rating and tags.</p>
      <p>The organizers also supplied about 94,000 anonymised user profiles from
LibraryThing.</p>
    </sec>
    <sec id="sec-3">
      <title>Methodology</title>
      <p>3.1 Field categories and Indexing</p>
      <p>We were provided with the Amazon/LibraryThing data collection (corpus), which
consists of 2.8 million book descriptions with metadata. The corpus contains many fields;
we selected some of them for indexing, as follows (a small extraction sketch is given after this list):</p>
      <p>Metadata In our metadata index, we used these metadata fields: &lt;title&gt;,
&lt;creator&gt;, &lt;firstwords&gt;, &lt;lastwords&gt;.</p>
      <p>Content In our content index, we used the &lt;content&gt; field of the
provided corpus, containing &lt;blurbs&gt;, &lt;epigraph&gt; and &lt;quotation&gt;.</p>
      <p>Tags In our tags index, we used the &lt;tags&gt; field for indexing.</p>
      <p>Reviews In our reviews index, we used the &lt;reviews&gt; field from the corpus.</p>
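      <p>To make the field grouping concrete, the following Python sketch (our own illustration, not part of the indexing toolkit) gathers the text of each field group from one book record. The element names follow the lists above, but the exact layout of the Amazon/LibraryThing XML may differ.</p>
      <preformat>
import xml.etree.ElementTree as ET

# Field groups used for our four indexes, as listed above.
FIELD_GROUPS = {
    "metadata": ["title", "creator", "firstwords", "lastwords"],
    "content":  ["content", "blurbs", "epigraph", "quotation"],
    "tags":     ["tags"],
    "reviews":  ["reviews"],
}

def extract_index_fields(book_xml: str) -> dict:
    """Return {index_name: concatenated text} for one book record."""
    root = ET.fromstring(book_xml)
    out = {}
    for index_name, field_names in FIELD_GROUPS.items():
        pieces = []
        for name in field_names:
            # itertext() flattens nested elements (e.g. the individual reviews)
            pieces.extend("".join(el.itertext()).strip() for el in root.iter(name))
        out[index_name] = " ".join(p for p in pieces if p)
    return out
      </preformat>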
      <p>3.2 Topics</p>
      <p>This year's Suggestion Task provided 120 topics. With the help of these we built
four sets of queries (illustrated in the sketch after this list), which are:</p>
      <p>Topic-Title: Only the &lt;title&gt; field of each topic.</p>
      <p>Topic-Request: It contains only the &lt;request&gt; field.</p>
      <p>Topic-Group: Only the &lt;group&gt; field.</p>
      <p>Topic-All-Fields: It contains the &lt;title&gt;, &lt;request&gt; and &lt;group&gt; fields.</p>
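      <p>The sketch below shows how the four query sets can be assembled from the topic file. It assumes each topic is a topic element with an id attribute and title, request and group children; these element and attribute names are assumptions based on the field names above, and the function is our own illustration.</p>
      <preformat>
import xml.etree.ElementTree as ET

def build_query_sets(topics_xml_path: str) -> dict:
    """Return {query_set: {topic_id: free-text query}} for the four query sets."""
    queries = {"title": {}, "request": {}, "group": {}, "all-fields": {}}
    for topic in ET.parse(topics_xml_path).getroot().iter("topic"):
        tid = topic.get("id")
        title = (topic.findtext("title") or "").strip()
        request = (topic.findtext("request") or "").strip()
        group = (topic.findtext("group") or "").strip()
        queries["title"][tid] = title
        queries["request"][tid] = request
        queries["group"][tid] = group
        # combined query for the Topic-All-Fields set
        queries["all-fields"][tid] = " ".join(t for t in (title, request, group) if t)
    return queries
      </preformat>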
    </sec>
    <sec id="sec-4">
      <title>Approach</title>
      <p>In our approach we analyzed two methods: first, content-based retrieval, and second,
a re-ranking approach applied after rank normalization of the scores of the
retrieved documents. For both retrieval approaches we used language
modeling with Dirichlet smoothing. Stopwords were removed from the provided document
collection using the SMART stop word list, and the collection was then stemmed with the
Krovetz stemmer. We did not remove stopwords from the provided topics. For indexing and
retrieval we used the Lemur 5.9 search system. We also removed punctuation marks
from all the textual content of these fields and used only free-text queries in all
the runs. We did not consider any other information, such as catalogue information
and user profiles, during retrieval. For each topic, we submitted up to 1000 book
suggestions in the form of ISBNs.</p>
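      <p>For reference, the query-likelihood scoring with Dirichlet smoothing that Indri applies can be sketched as below. Indri computes this internally, so the toy function (with term-frequency dictionaries and an illustrative smoothing parameter mu) is only meant to make the scoring formula explicit, not to reproduce our runs.</p>
      <preformat>
import math

def dirichlet_score(query_terms, doc_tf, doc_len, coll_tf, coll_len, mu=2500):
    """log P(query | document) under a Dirichlet-smoothed language model."""
    score = 0.0
    for term in query_terms:
        p_coll = coll_tf.get(term, 0) / coll_len            # collection model
        p_doc = (doc_tf.get(term, 0) + mu * p_coll) / (doc_len + mu)
        if p_doc > 0:
            score += math.log(p_doc)
    return score
      </preformat>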
      <p>4.1 Content Based Retrieval</p>
      <p>
        During retrieval, we tried to see the effect of each component of a topic
one by one, as well as the combined contribution of all topic fields except the &lt;example&gt;
field. This is simply ad hoc retrieval; the results are given in Table 1.
      </p>
      <p>
        Our re-ranking method is inspired by the Social Feature Re-ranking Method proposed
by Toine Bogers in 2012 [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. In order to improve the initial ranking, we
perform re-ranking with two different strategies derived from the structure of the XML:
Item-Rerank (I) and RatingReview-Rerank (R). For re-ranking we used the
following stages:
      </p>
      <p>Similarity Calculation: The similarity of two documents i and j based on feature I
is calculated by equation (1):</p>
      <p>sim_{ij}(I) = 1 if i is j's similar product or j is i's similar product, and sim_{ij}(I) = 0 otherwise.   (1)</p>
      <p>score'(i) = \lambda \cdot score(i) + (1 - \lambda) \sum_{j=1,\, j \neq i}^{N} sim_{ij} \cdot score(j)   (2)</p>
      <p>Re-Ranking: We re-rank the top-1000 list of the initial ranking for the
above-mentioned features using Equation (2).</p>
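      <p>A minimal sketch of the Item-Rerank step defined by equations (1) and (2) is given below. It assumes a dictionary scores of normalized retrieval scores per ISBN and a dictionary similar mapping each ISBN to the set of ISBNs listed as its similar products; the names and the dictionary representation are our own.</p>
      <preformat>
def item_rerank(scores: dict, similar: dict, lam: float = 0.96) -> dict:
    """Re-score books with equations (1) and (2); scores are already in [0, 1]."""
    def sim(i, j):
        # equation (1): 1 if either book lists the other as a similar product
        return 1.0 if j in similar.get(i, set()) or i in similar.get(j, set()) else 0.0

    reranked = {}
    for i, s_i in scores.items():
        neighbour_sum = sum(sim(i, j) * s_j for j, s_j in scores.items() if j != i)
        reranked[i] = lam * s_i + (1.0 - lam) * neighbour_sum   # equation (2)
    return reranked
      </preformat>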
      <p>
        For feature R, we use Equation (3) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]:
      </p>
      <p>score'(i) = \lambda \cdot score(i) + (1 - \lambda) \cdot \log(|reviews(i)|) \cdot \frac{\sum_{r \in R_i} r}{|reviews(i)|} \cdot score(i)   (3)</p>
      <p>
        Before re-ranking we apply rank normalization to the retrieved results to map
the scores into the range [0, 1] [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. The balance between the original retrieval score,
score(i), and the contributions of the other books in the result list is controlled
by the parameter λ, which takes values in the range [0, 1]; in our experiments
we used the fixed value λ = 0.96. Due to lack of time, we could not try any
other value.
      </p>
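      <p>Putting the normalization and the RatingReview-Rerank together, the sketch below first maps scores into [0, 1] and then applies equation (3) with λ = 0.96. The choice of min-max normalization and the ratings dictionary (ISBN to list of review ratings) are illustrative assumptions.</p>
      <preformat>
import math

def normalize(scores: dict) -> dict:
    """Map raw retrieval scores into [0, 1] (min-max normalization, our assumption)."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {i: (s - lo) / span for i, s in scores.items()}

def review_rerank(scores: dict, ratings: dict, lam: float = 0.96) -> dict:
    """RatingReview-Rerank, equation (3): boost books with many, highly rated reviews."""
    reranked = {}
    for i, s_i in scores.items():
        revs = ratings.get(i, [])
        boost = math.log(len(revs)) * (sum(revs) / len(revs)) * s_i if revs else 0.0
        reranked[i] = lam * s_i + (1.0 - lam) * boost
    return reranked
      </preformat>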
    </sec>
    <sec id="sec-5">
      <title>Results</title>
      <p>
        The scores obtained by our six runs are given in Table 2. The official evaluation
measure used by CLEF'16 is nDCG@10 [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The runs are listed in order of their official rank. Our best run is
ISMD16allfields, in which we use the title, request and group fields together. For the sake of
comparison, we also show the best score in the task, achieved by the run
run1.keyQuery active combineRerank(*).
      </p>
      <table-wrap id="tab2">
        <label>Table 2</label>
        <caption>
          <p>Official rank and nDCG@10 score of our six runs, together with the best run in the task.</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>Run</th>
              <th>Rank</th>
              <th>nDCG@10</th>
            </tr>
          </thead>
          <tbody>
            <tr><td>ISMD16allfields</td><td>24</td><td>0.1722</td></tr>
            <tr><td>ISMD16titlefield</td><td>28</td><td>0.1197</td></tr>
            <tr><td>ISMD16requestfield</td><td>29</td><td>0.1454</td></tr>
            <tr><td>ISMD16titlewithoutreranking</td><td>33</td><td>0.1114</td></tr>
            <tr><td>similaritytitlefieldreranked</td><td>35</td><td>0.0966</td></tr>
            <tr><td>ISMD16groupfield</td><td>43</td><td>0.0527</td></tr>
            <tr><td>best*</td><td>1</td><td>0.5247</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>Although our performance is not up to the mark, there are a few take-home lessons.
In the runs ISMD16allfields, ISMD16titlefield, ISMD16requestfield and
ISMD16groupfield, we re-ranked the retrieved scores based on reviews (R),
taking λ = 0.96.</p>
      <p>In our top-scoring run, ISMD16allfields, we took the combination of the title, request
and group fields of the topic (all fields except the example field). In ISMD16titlefield
we took only the title field, in ISMD16requestfield only the request field of the topic,
and in ISMD16groupfield only the group field. For the run
ISMD16titlewithoutreranking we simply used content-based retrieval.
For the run similaritytitlefieldreranked we used the similarity feature as well as
re-ranking, taking λ = 0.96.</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>
        This year we participated in the Suggestion Task of Social Book Search. We tried
to see the individual effects as well as the combined effect of different topic fields on
book recommendation. We considered only a handful of fields, such as request,
title and group, from the topics. While there is no denying that our
overall performance is average, the initial results suggest what should
be done next. We need to consult other fields, such as the book catalogue of the topic
creators and the ratings of the books in the catalogue, during retrieval. We also need to
take into account the profiles of other users. It is also worth investigating learning
to rank for the different fields, and tuning the parameter λ over the range [0, 1];
this time we took the fixed value λ = 0.96. We will also use other fields from user
catalogues and user profiles. We shall be exploring some of these tasks in the
coming days.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Marijn</given-names>
            <surname>Koolen</surname>
          </string-name>
          , Gabriella Kazai, Jaap Kamps, Michael Preminger,
          <article-title>Antoine Doucet and Monica Landoni, Overview of the INEX 2012 Social Book Search Track</article-title>
          . INEX'12 Workshop Pre-proceedings, Shlomo Geva, Jaap Kamps, Ralf Schenkel (editors),
          <source>September 17-20</source>
          ,
          <year>2012</year>
          , Rome , Italy.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2. INEX,
          <article-title>Initiative for the Evaluation of XML Retrieval</article-title>
          . https://inex.mmci.uni-saarland.de/data/documentcollection.jsp
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3. INDRI:
          <article-title>Language modeling meets inference networks</article-title>
          , Available at http://www.lemurproject.org/indri/
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Jarvelin</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kekalainen</surname>
          </string-name>
          , J.:
          <article-title>Cumulated Gain-based Evaluation of IR Techniques</article-title>
          .
          <source>ACM Transactions on Information Systems</source>
          <volume>20</volume>
          (
          <issue>4</issue>
          ) (
          <year>2002</year>
          )
          <fpage>422</fpage>
          -
          <lpage>446</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5. CLEF,
          <article-title>Conference and labs of the Evaluation Forum</article-title>
          . http://clef2016.clef-initiative.eu/index.php
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>T.</given-names>
            <surname>Bogers</surname>
          </string-name>
          and
          <string-name>
            <given-names>B.</given-names>
            <surname>Larsen</surname>
          </string-name>
          . RSLIS at INEX 2012:
          <article-title>Social book search track</article-title>
          .
          <source>In INEX'12 Workshop</source>
          Pre-proceedings, pages
          <fpage>97</fpage>
          -
          <lpage>108</lpage>
          . Springer,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>L.</given-names>
            <surname>Bonnefoy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Deveaud</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Bellot</surname>
          </string-name>
          .
          <article-title>Do social information help book search</article-title>
          ? In INEX'12 Workshop Pre-proceedings, pages
          <fpage>109</fpage>
          -
          <lpage>113</lpage>
          . Springer,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Renda</surname>
            ,
            <given-names>M.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Straccia</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          : Web Metasearch:
          <article-title>Rank vs. Score-based Rank Aggregation Methods</article-title>
          .
          <source>In: SAC 03: Proceedings of the 2003 ACM Symposium on Applied Computing</source>
          , New York, NY, USA, ACM (
          <year>2003</year>
          )
          <fpage>841</fpage>
          -
          <lpage>846</lpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>