<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Book Recommendation based on Social Information</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Chahinez Benkoussas</string-name>
          <email>chahinez.benkoussas@lsis.org</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Patrice Bellot</string-name>
          <email>patrice.bellot@lsis.org</email>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Aix-Marseille University</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper, we present our contribution to the INEX 2013 Social Book Search (SBS) Track. This track aims to explore social information (user reviews, ratings, etc.) for the LibraryThing and Amazon collections of real books. In our submissions to the SBS Track, we rerank books by combining the Sequential Dependence Model (SDM) with a social component that takes into account both ratings and helpful votes.</p>
      </abstract>
      <kwd-group>
        <kwd>XML retrieval</kwd>
        <kwd>controlled metadata</kwd>
        <kwd>book recommendation</kwd>
        <kwd>re-ranking</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Previous editions of the INEX Book Track focused on the retrieval of real out-of-copyright books [1]. These books were written almost a century ago, and the collection consisted of the OCR content of over 50,000 books. The topics and the books of the collection had a different vocabulary and writing style. Information retrieval systems had difficulty finding relevant information, and assessors had difficulty judging the documents.</p>
      <p>In the Social Book Search Track, the document collection consists of the Amazon (http://www.amazon.com/) pages of real books. IR systems must search through the editorial data, user reviews and ratings of each book instead of searching through the whole content of the book. The topics were extracted from the LibraryThing (http://www.librarything.com/) forums and represent real requests from real users.</p>
      <p>We chose a language modeling approach to retrieval. For our recommendation runs, we used the reviews and the ratings given to books by Amazon users. We computed a "social score" for each book, considering the number of reviews and the ratings. This score was then interpolated with the scores obtained by a Markov Random Field (MRF) baseline. We also used the "helpfulvotes" and "totalvotes" values of each rating given by users to modify the ranking obtained by combining the social and MRF scores.</p>
      <p>The rest of the paper is organized as follows. Section 2 describes our retrieval framework, and Section 3 describes our runs.</p>
    </sec>
    <sec id="sec-2">
      <title>Retrieval model</title>
      <p>2.1 Sequential Dependence Model</p>
      <p>We used a language modeling approach to retrieval [2]. We use Metzler and Croft's Markov Random Field (MRF) model [3] to integrate multi-word phrases in the query. Specifically, we use the Sequential Dependence Model (SDM), which is a special case of MRF. In this model, three features are considered: single-term features (standard unigram language model features, f<sub>T</sub>), exact phrase features (words appearing in sequence, f<sub>O</sub>) and unordered window features (words required to be close together, but not necessarily in an exact sequence order, f<sub>U</sub>).</p>
      <p>Documents are then ranked according to the following scoring function:</p>
      <disp-formula>
        <tex-math>\mathrm{SDM}(Q, D) = \lambda_T \sum_{q \in Q} f_T(q, D) + \lambda_O \sum_{i=1}^{|Q|-1} f_O(q_i, q_{i+1}, D) + \lambda_U \sum_{i=1}^{|Q|-1} f_U(q_i, q_{i+1}, D)</tex-math>
      </disp-formula>
      <p>where the feature weights are set according to the authors' recommendation (λ<sub>T</sub> = 0.85, λ<sub>O</sub> = 0.1, λ<sub>U</sub> = 0.05). f<sub>T</sub>, f<sub>O</sub> and f<sub>U</sub> are the log maximum likelihood estimates of the query terms and term pairs in document D, computed over the target collection with Dirichlet smoothing.</p>
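      <p>A minimal Python sketch of the SDM scoring function may help make the three features concrete. It assumes tokenized documents held in memory, a Dirichlet prior μ = 2500 (Indri's default) and a simplified unordered-window count with a width of 8 terms following Metzler and Croft; neither value is stated in this paper. The floor applied to unseen collection frequencies is an ad hoc assumption that keeps the logarithm defined, and all function names are hypothetical. It illustrates the formula and is not the Indri implementation used for the submitted runs.</p>
      <preformat preformat-type="code">
```python
import math
from collections import Counter

# Feature weights recommended by Metzler and Croft, as given in the text.
LAMBDA_T, LAMBDA_O, LAMBDA_U = 0.85, 0.10, 0.05
MU = 2500    # Dirichlet prior: assumed (Indri's default), not stated in the paper
WINDOW = 8   # unordered window width: assumed, following Metzler and Croft


def count_ordered(tokens, a, b):
    """Occurrences of the exact phrase 'a b' (used by f_O)."""
    return sum(1 for x, y in zip(tokens, tokens[1:]) if x == a and y == b)


def count_unordered(tokens, a, b, window=WINDOW):
    """Position pairs holding a and b, in either order, within one window (used by f_U)."""
    pos_a = [i for i, t in enumerate(tokens) if t == a]
    pos_b = [j for j, t in enumerate(tokens) if t == b]
    return sum(1 for i in pos_a for j in pos_b if abs(i - j) in range(1, window))


def dirichlet_log(tf, doc_len, cf, coll_len, mu=MU):
    """Log of the Dirichlet-smoothed probability of a feature in a document."""
    p_coll = max(cf, 0.5) / coll_len  # ad hoc floor for unseen features (assumption)
    return math.log((tf + mu * p_coll) / (doc_len + mu))


def sdm_score(query, doc, collection):
    """SDM(Q, D) for a tokenized query and document over a list of tokenized documents."""
    coll_len = sum(len(d) for d in collection)
    coll_tf = Counter(t for d in collection for t in d)
    pairs = list(zip(query, query[1:]))  # (q_i, q_{i+1}) for i = 1 .. |Q|-1

    f_t = sum(dirichlet_log(doc.count(q), len(doc), coll_tf[q], coll_len) for q in query)
    f_o = sum(dirichlet_log(count_ordered(doc, a, b), len(doc),
                            sum(count_ordered(d, a, b) for d in collection), coll_len)
              for a, b in pairs)
    f_u = sum(dirichlet_log(count_unordered(doc, a, b), len(doc),
                            sum(count_unordered(d, a, b) for d in collection), coll_len)
              for a, b in pairs)
    return LAMBDA_T * f_t + LAMBDA_O * f_o + LAMBDA_U * f_u


# Toy collection (hypothetical data).
docs = [["history", "of", "rome"], ["cooking", "for", "two"], ["rome", "history", "books"]]
query = ["history", "rome"]
ranked = sorted(docs, key=lambda d: sdm_score(query, d, docs), reverse=True)
```
      </preformat>
      <p>With this toy collection, the two documents containing both query terms receive the same score, and the document containing neither term is ranked last.</p>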
      <p>2.2 Modeling book likeliness</p>
      <p>We model book likeliness on the following idea: the more reviews a book has, the more interesting it is to read (it may not be a good or popular book, but it is a book that has a high impact).</p>
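      <disp-formula>
        <tex-math>\mathrm{Likeliness}(D) = \frac{\sum_{r \in R_D} r}{|\mathrm{Reviews}_D|}</tex-math>
      </disp-formula>
      <p>where R<sub>D</sub> is the set of all ratings given by users to the book D, and |Reviews<sub>D</sub>| is the number of reviews of D.</p>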
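      <p>The likeliness score can be sketched in a few lines of Python. Since each review in the collection carries one rating, |Reviews<sub>D</sub>| equals the number of ratings in R<sub>D</sub>, and the score reduces to the book's mean rating. The function name and the zero score for books without reviews are assumptions made for illustration.</p>
      <preformat preformat-type="code">
```python
def likeliness(ratings: list[float], num_reviews: int) -> float:
    """Likeliness(D) = (sum of the ratings in R_D) / |Reviews_D|.

    ratings: every rating r in R_D given by users to book D
    num_reviews: |Reviews_D|, the number of reviews of D
    A book without reviews scores 0.0 (assumption: the paper does not cover this case).
    """
    if num_reviews == 0:
        return 0.0
    return sum(ratings) / num_reviews


# One rating per review, so this is simply the mean rating of the book.
score = likeliness([5, 4, 3], num_reviews=3)
```
      </preformat>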
      <p>We further rerank books according to a linear interpolation of the previously computed SDM score with the likeliness score, using a coefficient λ to control the influence of each model. The scoring function of a book D given a query Q is thus defined as follows:</p>
      <disp-formula>
        <tex-math>\mathrm{SDM\_Likeliness}(Q, D) = \lambda \cdot \mathrm{SDM}(Q, D) + (1 - \lambda) \cdot \mathrm{Likeliness}(D)</tex-math>
      </disp-formula>
      <p>where λ is a constant set according to previous results
        <xref ref-type="bibr" rid="ref1">(done on 2011 and 2012 datasets)</xref>, with a default value of 0.89.</p>
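      <p>The interpolation and the resulting reranking can be sketched as follows, with λ = 0.89. The sketch applies the formula as written and does not normalize the two scores, although SDM log-scores are negative while ratings are positive; whether the submitted runs normalized them is not stated. The candidate tuples and function names are hypothetical.</p>
      <preformat preformat-type="code">
```python
LAMBDA_LIKELINESS = 0.89  # default value of lambda reported for SDM + likeliness


def interpolate(sdm: float, social: float, lam: float = LAMBDA_LIKELINESS) -> float:
    """lam * SDM(Q, D) + (1 - lam) * social(D)."""
    return lam * sdm + (1 - lam) * social


def rerank(candidates, lam: float = LAMBDA_LIKELINESS):
    """Sort (book_id, sdm_score, social_score) triples by interpolated score, best first."""
    scored = [(book, interpolate(sdm, social, lam)) for book, sdm, social in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical candidates: SDM log-scores from the baseline run and Likeliness(D) values.
candidates = [("book-a", -10.0, 3.0), ("book-b", -10.2, 5.0)]
ranking = [book for book, _ in rerank(candidates)]
```
      </preformat>
      <p>In this example, the weight 1 − λ = 0.11 on the social score is enough for the better-rated book-b to overtake book-a, whose SDM score is only slightly higher.</p>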
      <p>2.3 Modeling the usefulness of book ratings</p>
      <p>In the collection, each review given by a user comes with a rating, and a rating may or may not be useful depending on how other users voted on it. We therefore weight the value of each rating by its helpful votes, according to the following formula:</p>
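      <disp-formula>
        <tex-math>\mathrm{Usefulness}(D) = \sum_{r \in R_D,\, t \in T_D,\, h \in H_D} \frac{r}{|\mathrm{Reviews}_D|} \cdot \frac{h}{t}</tex-math>
      </disp-formula>
      <p>where R<sub>D</sub>, T<sub>D</sub> and H<sub>D</sub> are, respectively, the sets of all ratings, total votes and helpful votes given by users for the book D, and |Reviews<sub>D</sub>| is the number of reviews.</p>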
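      <p>A possible Python sketch of the usefulness score follows. It assumes that each triple (r, t, h) in the formula belongs to the same review, so the sum runs over reviews; the treatment of reviews with no votes (t = 0) and of books with no reviews is an assumption, since the formula leaves both undefined. The Review container is hypothetical and does not reflect the collection's actual record format.</p>
      <preformat preformat-type="code">
```python
from typing import NamedTuple


class Review(NamedTuple):
    """One user review of a book (hypothetical container, not the collection's schema)."""
    rating: float       # r
    helpful_votes: int  # h: votes marking the review as helpful
    total_votes: int    # t: all votes cast on the review


def usefulness(reviews: list[Review]) -> float:
    """Usefulness(D) = sum over reviews of r * h / t, divided by |Reviews_D|."""
    if not reviews:
        return 0.0  # assumption: a book without reviews gets no social score
    total = 0.0
    for review in reviews:
        # Assumption: a review nobody voted on (t == 0) contributes nothing.
        if review.total_votes > 0:
            total += review.rating * review.helpful_votes / review.total_votes
    return total / len(reviews)


reviews = [Review(5, 8, 10), Review(2, 1, 4), Review(4, 0, 0)]
score = usefulness(reviews)  # (5 * 0.8 + 2 * 0.25 + 0) / 3 = 1.5
```
      </preformat>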
      <p>We further rerank books according to a linear interpolation of the previously computed SDM score with the usefulness score, using a coefficient λ to control the influence of each model. The scoring function of a book D given a query Q is thus defined as follows:</p>
      <disp-formula>
        <tex-math>\mathrm{SDM\_Usefulness}(Q, D) = \lambda \cdot \mathrm{SDM}(Q, D) + (1 - \lambda) \cdot \mathrm{Usefulness}(D)</tex-math>
      </disp-formula>
      <p>where λ is a constant set according to previous results
        <xref ref-type="bibr" rid="ref1">(done on 2011 and 2012 datasets)</xref>, with a default value of 0.93.</p>
    </sec>
    <sec id="sec-runs">
      <title>Runs</title>
      <p>We submitted three runs to the Social Book Search Task. We used Indri for indexing and searching. We did not remove any stopwords and used the standard Krovetz stemmer. Only the query part of each topic was used for the three runs.</p>
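      <p>This paper does not list the queries submitted to Indri. One common way to express SDM in the Indri query language is a #weight operator over three #combine clauses: one for single terms, one for exact phrases (#1) and one for unordered windows (#uwN). The sketch below builds such a query string with the weights of Section 2.1, assuming a window of 8 terms (Metzler and Croft's setting) and query terms that need no escaping; the function name is hypothetical.</p>
      <preformat preformat-type="code">
```python
def indri_sdm_query(terms, w_t=0.85, w_o=0.10, w_u=0.05, window=8):
    """Build an Indri #weight query expressing SDM for a list of query terms."""
    unigrams = " ".join(terms)
    pairs = list(zip(terms, terms[1:]))  # adjacent pairs (q_i, q_{i+1})
    if not pairs:
        # Single-term query: only the unigram component applies.
        return f"#combine( {unigrams} )"
    ordered = " ".join(f"#1( {a} {b} )" for a, b in pairs)
    unordered = " ".join(f"#uw{window}( {a} {b} )" for a, b in pairs)
    return (f"#weight( {w_t} #combine( {unigrams} ) "
            f"{w_o} #combine( {ordered} ) "
            f"{w_u} #combine( {unordered} ) )")


# Hypothetical topic query, already lower-cased and tokenized.
indri_query = indri_sdm_query(["victorian", "ghost", "stories"])
```
      </preformat>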
      <p>SDM run: This run implements the Sequential Dependence Model (SDM) described in Section 2.1.</p>
      <p>SDM Rating run: This run combines the Sequential Dependence Model with social information, namely the ratings given by users, as described in Section 2.2.</p>
      <p>SDM HV run: For the last run, we combine the Sequential Dependence Model with social information, namely the ratings, helpful votes and total votes given by users. We weight the value of each rating by the rate of helpful votes, as presented in Section 2.3.</p>
    </sec>
    <sec id="sec-3">
      <title>Conclusion</title>
      <p>This paper presents our contributions to the INEX 2013 Social Book Search Track. We proposed a simple method for reranking books based on their likeliness and an effective way to take into account users' helpful votes. Each of these social scores is combined with the SDM score through a linear interpolation function.</p>
      <p>We are disappointed with this year's official results (nDCG@10 = 0.0571 for the baseline "SDM run") compared to those obtained last year with the approach that served as our baseline this year (nDCG@10 = 0.1295), and we are looking for a software problem that would explain the difference. On the other hand, the proposed extensions (nDCG@10 = 0.596 for "SDM Rating run" and nDCG@10 = 0.0576 for "SDM HV run") improved on the results of the baseline.</p>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgements</title>
      <p>This work was supported by the French program "Investissements d'Avenir - Développement de l'Économie Numérique" under the project InterTextes #O14751-408983.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Gabriella</given-names>
            <surname>Kazai</surname>
          </string-name>
          , Marijn Koolen, Antoine Doucet, and
          <string-name>
            <given-names>Monica</given-names>
            <surname>Landoni</surname>
          </string-name>
          .
          <article-title>Overview of the INEX 2010 Book Track: At the Mercy of Crowdsourcing</article-title>
          . In Shlomo Geva, Jaap Kamps, Ralf Schenkel, and Andrew Trotman, editors,
          <source>Comparative Evaluation of Focused Retrieval</source>
          , pages
          <fpage>98</fpage>
          –
          <lpage>117</lpage>
          . Springer Berlin / Heidelberg,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>D.</given-names>
            <surname>Metzler</surname>
          </string-name>
          and
          <string-name>
            <given-names>W. B.</given-names>
            <surname>Croft</surname>
          </string-name>
          .
          <article-title>Combining the language model and inference network approaches to retrieval</article-title>
          . Inf. Process. Manage.,
          <volume>40</volume>
          :
          <fpage>735</fpage>
          –
          <lpage>750</lpage>
          , September
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Donald</given-names>
            <surname>Metzler</surname>
          </string-name>
          and
          <string-name>
            <given-names>W. Bruce</given-names>
            <surname>Croft</surname>
          </string-name>
          .
          <article-title>A Markov random field model for term dependencies</article-title>
          .
          In
          <source>Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval</source>
          ,
          <source>SIGIR '05</source>
          , pages
          <fpage>472</fpage>
          –
          <lpage>479</lpage>
          , New York, NY, USA,
          <year>2005</year>
          . ACM.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>