<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Multimodal Social Book Search</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Melanie Imhof</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ismail Badache</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mohand Boughanem</string-name>
          <email>boughanemg@irit.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>IRIT - Paul Sabatier University</institution>
          ,
          <addr-line>Toulouse</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Université de Neuchâtel</institution>
          ,
          <country country="CH">Switzerland</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Zurich University of Applied Sciences</institution>
          ,
          <addr-line>Winterthur</addr-line>
          ,
          <country country="CH">Switzerland</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Today's information retrieval applications have become increasingly complex. The Social Book Search (SBS) lab at CLEF 2015 allows evaluating retrieval methods on a complex search task with several textual and non-textual meta-data fields. The challenge is to incorporate the different information types (modalities) into a single ranked list. We build a strong textual baseline and combine it with a document prior based on social signals. Further, we include non-textual modalities in relation to the user preferences using random forest learning to rank. Our experiments show that both the social document prior and the learning to rank approach improve the search results.</p>
      </abstract>
      <kwd-group>
        <kwd>Relevance feedback</kwd>
        <kwd>random forest</kwd>
        <kwd>non-textual modalities</kwd>
        <kwd>social signals</kwd>
        <kwd>document prior</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>The suggestion track of the INEX Social Book Search (SBS) lab at CLEF 2015
challenges researchers to find methods to retrieve books as requested by real
users of LibraryThing. The complex collection consists of more than 50
meta-data fields of real books from Amazon. Thus, the retrieval methods cannot
rely on the content of the books but only on meta-data such as product
descriptions, user-generated reviews and ratings. The lab's evaluation metric nDCG@10
reflects the user behavior that in such an application only the first few
"recommendations" are considered. Hence, to maximize the number of relevant books
in the first few results, both the textual description of the user's query and the
user's profile, including his personal catalog, matter. A task this complex, with
this many information types, requires methods that handle and fuse them into
a single ranked list. Analogously to multimedia retrieval, we call these different
information types "modalities". Our goal in this task was therefore to fuse
a strong textual baseline approach with several non-textual and social
modalities that respect the user preferences. To this end, we established and refined a
textual baseline using traditional information retrieval weighting schemes, blind
relevance feedback, user-profile based filtering and example book based relevance
feedback. We enhanced this with document priors based on social signals such
as the ratings and tags. Finally, we applied random forest learning, which further
improves the results by including the non-textual modalities price and number
of pages with respect to the user preferences.</p>
    </sec>
    <sec id="sec-2">
      <title>Collection and Data</title>
      <p>The SBS collection consists of 2.8 million book records from Amazon, extended
with social meta-data from LibraryThing. Each book record is an XML file with
fields like isbn, title, review, summary, rating and tag. The full list of fields is
shown in Table 1.</p>
      <p>There are 208 topics in the SBS 2015 lab. Each topic is a query that was
posted on LibraryThing for a list of books and consists of five fields: title,
mediated query, narrative, example and group. Hereby, the narrative is the textual
description of the query from which a hand-crafted mediated query is derived.
Further, the example field contains a list of books that the user has mentioned as
positive or negative examples. Additionally, the personal LibraryThing catalog
of each topic creator is available, which includes a list of the books the user has
archived on LibraryThing along with his personal ratings.</p>
      <p>The relevance assessments are based on the actual suggestions to the original
query on the LibraryThing forum. The relevance values are weighted using a
decision tree that includes reliability information such as whether the user who
suggested a book has read it. The SBS 2015 topics are a subset of the topics used
in 2014. However, the relevance assessments have been extended with additional
book suggestions that have not been included in 2014.</p>
    </sec>
    <sec id="sec-3">
      <title>Retrieval Models</title>
      <sec id="sec-3-1">
        <title>Textual Models</title>
        <p>As a basis for our methods we employ a textual baseline using a traditional
information retrieval system. To this end, we merge all textual fields of the document
into a single textual index field. Further, we construct queries from the three
topic fields title, mediated query and narrative, which are analogously merged into
a single textual representation.</p>
        <p>
          We extend the textual baseline with a query expansion (blind relevance
feedback) based on Rocchio's method [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. Therefore, the n most characteristic terms
of the m top-ranked documents are added to the query. Hereby, the most
characteristic terms of a document are chosen by the term weight determined by the
weighting scheme.
        </p>
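        <p>The term selection step described above can be sketched as follows. This is a minimal illustration and not the implementation used for the runs: the function name <monospace>expand_query</monospace>, the per-document term-weight dictionaries and the choice to sum weights over the feedback documents are assumptions, and the Rocchio re-weighting of the original query terms is omitted.</p>
        <preformat><![CDATA[
```python
from collections import defaultdict


def expand_query(query_terms, ranked_docs, m=10, n=40):
    """Return the query terms plus the n highest-weighted terms of the top m documents.

    ranked_docs is a list of dicts mapping term -> weight (for example the
    weights assigned by the weighting scheme), ordered from best to worst match.
    Summing the weights over the feedback documents is an assumption of this sketch.
    """
    totals = defaultdict(float)
    for doc in ranked_docs[:m]:
        for term, weight in doc.items():
            totals[term] += weight
    # Sort by descending weight; break ties alphabetically for determinism.
    candidates = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    expansion = [t for t, _ in candidates if t not in query_terms][:n]
    return list(query_terms) + expansion
```
]]></preformat>
        <p>Only the first m entries of the ranked list contribute, so a highly weighted term in a lower-ranked document never enters the expanded query.</p>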
        <p>As described in Section 2 the topics contain example books mentioned by
the topic creators. We use the contents of the example books that are associated
with a positive or neutral sentiment to expand the queries similar to the blind
relevance feedback.</p>
        <p>
          Additionally, we filter the books already read by the topic creator from the
final ranked list, since this is a hard criterion in the relevance assessments [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ].
Hereby, we determine the read books from the catalog of the topic creator as
well as from the example books that are marked as read.
        </p>
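        <p>As an illustration, the read-book filtering amounts to a set difference that preserves the rank order. The sketch below assumes that books are identified by ISBN strings; the function and parameter names are hypothetical.</p>
        <preformat><![CDATA[
```python
def filter_read_books(ranked_isbns, catalog_isbns, read_example_isbns):
    """Drop every book the topic creator has already read, keeping the rank order."""
    read = set(catalog_isbns) | set(read_example_isbns)
    return [isbn for isbn in ranked_isbns if isbn not in read]
```
]]></preformat>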
      </sec>
      <sec id="sec-3-2">
        <title>Social Signals-Based Model</title>
        <p>Our approach consists of exploiting social data as a priori knowledge to take
into account in the retrieval model. We combine textual relevance of a given
document to a query and its social importance modeled as a prior probability.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.2.1 Preliminaries</title>
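        <p>The social information that we exploit within the framework of our model can be
represented by a 3-tuple <inline-formula><tex-math>\langle U, D, A \rangle</tex-math></inline-formula>, where U, D and A are finite sets of instances of
Users, Documents and Actions.</p>
        <p>Documents. We consider a collection <inline-formula><tex-math>C = \{D_1, D_2, \ldots, D_n\}</tex-math></inline-formula> of n documents,
where each document D represents a book. We assume that a book can be
represented by both a set of textual keywords <inline-formula><tex-math>D_w = \{w_1, w_2, \ldots, w_y\}</tex-math></inline-formula> and a set of
social actions A performed on the book, <inline-formula><tex-math>D_a = \{a_1, a_2, \ldots, a_z\}</tex-math></inline-formula>.</p>
        <p>Actions. We consider a set <inline-formula><tex-math>A = \{a_1, a_2, \ldots, a_m\}</tex-math></inline-formula> of m types of actions (signals)
that users can perform on the documents. These actions represent the relation
between users <inline-formula><tex-math>U = \{u_1, u_2, \ldots, u_h\}</tex-math></inline-formula> and documents C.</p>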
      </sec>
      <sec id="sec-3-4">
        <title>3.2.2 Social Document Prior</title>
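        <p>We exploit textual models to estimate the relevance of a document to a query.
Our approach combines the social document prior P(D) and the relevance status
value <inline-formula><tex-math>\mathrm{RSV}_{textual}(Q, D)</tex-math></inline-formula> between a query Q and document D as</p>
        <disp-formula id="eq-1"><label>(1)</label><tex-math>\mathrm{RSV}(D, Q) \stackrel{rank}{=} P(D) \cdot \mathrm{RSV}_{textual}(Q, D)</tex-math></disp-formula>
        <disp-formula id="eq-2"><label>(2)</label><tex-math>\stackrel{rank}{=} P(D) \cdot \prod_{w_i \in Q} \mathrm{RSV}_{textual}(w_i, D),</tex-math></disp-formula>
        <p>where <inline-formula><tex-math>w_i</tex-math></inline-formula> represents the terms in the query Q and <inline-formula><tex-math>\mathrm{RSV}_{textual}(w_i, D)</tex-math></inline-formula> can be
estimated with different models such as BM25 and language models. The
document prior P(D) is a query-independent probability of seeing the document.
It is useful for representing and incorporating other sources of evidence into the
retrieval process. Our main contribution is a method to estimate P(D) by
exploiting social signals.</p>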
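        <p>A minimal sketch of the re-ranking implied by this combination is given below. It assumes that the textual scores and the priors are already available as dictionaries keyed by a document identifier; the function name and data layout are hypothetical.</p>
        <preformat><![CDATA[
```python
def rerank_with_prior(textual_scores, prior):
    """Re-rank documents by P(D) * RSV_textual(Q, D).

    textual_scores: dict doc_id -> non-negative textual score for the query.
    prior: dict doc_id -> query-independent prior P(D).
    """
    combined = {doc: prior[doc] * score for doc, score in textual_scores.items()}
    # Highest combined score first; ties are broken by document id.
    return sorted(combined, key=lambda doc: (-combined[doc], doc))
```
]]></preformat>
        <p>Because the prior is query-independent, it can be computed once per document and reused for every topic.</p>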
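        <p>
          According to our previous approach [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], the priors are estimated by simply
counting the number of actions performed on the documents. We assume that
the signals are independent. Thus the general formula for calculating P(D) is
        </p>
        <disp-formula id="eq-3"><label>(3)</label><tex-math>P(D) = \prod_{a_i \in A} P(a_i),</tex-math></disp-formula>
        <p>where <inline-formula><tex-math>P(a_i)</tex-math></inline-formula> is estimated using maximum-likelihood. It is calculated as</p>
        <disp-formula id="eq-4"><label>(4)</label><tex-math>P(a_i) = \frac{\log(1 + |D_{a_i}|)}{\log(1 + |D_a|)},</tex-math></disp-formula>
        <p>where <inline-formula><tex-math>|D_{a_i}|</tex-math></inline-formula> is the number of actions of type <inline-formula><tex-math>a_i</tex-math></inline-formula> on document D and <inline-formula><tex-math>|D_a|</tex-math></inline-formula> is the
total number of actions on document D. Further, we use Dirichlet smoothing of
<inline-formula><tex-math>P(a_i)</tex-math></inline-formula> by the collection C to avoid zero probabilities. This leads to</p>
        <disp-formula id="eq-5"><label>(5)</label><tex-math>P(D) = \prod_{a_i \in A} \frac{\log(1 + |D_{a_i}|) + \mu \cdot P(a_i|C)}{\log(1 + |D_a|) + \mu},</tex-math></disp-formula>
        <p>where <inline-formula><tex-math>\mu</tex-math></inline-formula> is the smoothing parameter and <inline-formula><tex-math>P(a_i|C)</tex-math></inline-formula>, analogously to <inline-formula><tex-math>P(a_i)</tex-math></inline-formula>, is estimated using maximum-likelihood:</p>
        <disp-formula id="eq-6"><label>(6)</label><tex-math>P(a_i|C) = \frac{\log(1 + \sum_{D \in C} |D_{a_i}|)}{\log(1 + \sum_{D \in C} |D_a|)}</tex-math></disp-formula>
        <p>In addition to considering social features separately as described above, we
propose to incorporate the ratings as a measurement of the popularity and the
reputation of a book. For this purpose, we use the Bayesian average (BA) of the
ratings as a document prior, which takes into account how many users have rated
a book. As more users rate the same book, the average becomes more reliable
and less sensitive to outliers. Books that have many ratings are boosted with
respect to books that have few ratings, and books with high ratings are boosted
more than books with low ratings. Hereby, the BA of a book is computed as</p>
        <disp-formula id="eq-7"><label>(7)</label><tex-math>\mathrm{BA}(D) = \frac{\mathrm{avg}(D_r) \cdot |D_r| + \sum_{D' \in C} \mathrm{avg}(D'_r) \cdot |D'_r|}{|D_r| + \sum_{D' \in C} |D'_r|},</tex-math></disp-formula>
        <p>where avg is the average function and <inline-formula><tex-math>D_r</tex-math></inline-formula> is the set of ratings of document D.
We note that considering logarithmic priors helps to compress the score range
and thereby reduces the impact of the priors on the global score. The resulting prior is</p>
        <disp-formula id="eq-8"><label>(8)</label><tex-math>P_{BA}(D) = \frac{\log(1 + \mathrm{BA}(D))}{\log(1 + \sum_{D' \in C} \mathrm{BA}(D'))}</tex-math></disp-formula>
        <p>For books with no ratings this would result in a prior probability of zero. In
order to avoid a multiplication by zero and thus ignoring the textual score, we
use the Add-One smoothing method:</p>
        <disp-formula id="eq-9"><label>(9)</label><tex-math>P_{BA}(D) = \frac{1 + \log(1 + \mathrm{BA}(D))}{1 + \log(1 + \sum_{D' \in C} \mathrm{BA}(D'))}.</tex-math></disp-formula>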
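        <p>The two priors used in the experiments (formulas 5 and 9) can be sketched as below. The sketch assumes the usual Dirichlet form for formula 5, in which the smoothing parameter (called <monospace>mu</monospace> here) weights the collection estimate in the numerator and is added to the denominator. The function names and the dictionary-based inputs are hypothetical, the collection is assumed to contain at least one action, and the Bayesian average values are assumed to be precomputed.</p>
        <preformat><![CDATA[
```python
import math


def action_prior(doc_counts, coll_counts, mu=200.0):
    """Dirichlet-smoothed prior over action types (product form of formula 5).

    doc_counts: dict action type -> number of such actions on the document.
    coll_counts: dict action type -> number of such actions in the collection
    (assumed to contain at least one action overall).
    """
    doc_total = math.log(1 + sum(doc_counts.values()))
    coll_total = math.log(1 + sum(coll_counts.values()))
    prior = 1.0
    for action, coll_count in coll_counts.items():
        p_coll = math.log(1 + coll_count) / coll_total
        p_doc = math.log(1 + doc_counts.get(action, 0))
        prior *= (p_doc + mu * p_coll) / (doc_total + mu)
    return prior


def rating_prior(ba_doc, ba_sum):
    """Add-one smoothed logarithmic prior from Bayesian averages (formula 9).

    ba_doc: Bayesian average of the document; ba_sum: sum over the collection.
    """
    return (1 + math.log(1 + ba_doc)) / (1 + math.log(1 + ba_sum))
```
]]></preformat>
        <p>A document without any recorded action falls back to the product of the collection estimates, and a book whose Bayesian average is zero still receives a non-zero rating prior, so the textual score is never multiplied by zero.</p>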
      </sec>
      <sec id="sec-3-5">
        <title>Learning to Rank (Random Forests)</title>
        <p>
          Besides the textual modalities, the SBS collection contains several non-textual
modalities. We use random forests [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] to learn how to combine not only the
different textual runs but also the non-textual modalities into a single ranked
list. In particular, we use the price and number of pages of a book with respect to
the user's preference as well as the book's ratings. Hereby, the user's preference
is estimated by the average of the attributes in the topic creator's catalog; e.g.
a user that only has short books in his catalog prefers short books. We assume
that a user prefers to retrieve books that have similar attributes as the books he
has read in the past. To achieve this, we add the difference between the average
of the book prices in the topic creator's catalog and the price of the book to
the random forest algorithm as an additional feature. Similarly, we add such
a feature for the number of pages. For the ratings we assume that a general
preference towards higher rated books exists for all users. Thus, we add the
absolute average rating of a book as an additional feature to the random forests.
To allow the algorithm to incorporate the significance of the average rating, we
also add the number of ratings as a separate feature. The ratings are the ratings
of the reviews of the book as well as the ratings in the catalogs of all topic
creators. In order to combine these ratings, we divide the ratings in the catalogs
by two, so that all ratings are in the same range.
        </p>
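        <p>The non-textual features described above could be assembled as in the following sketch. The feature names, the dictionary layout and the handling of books without ratings are assumptions made for illustration; the catalog is assumed to be non-empty.</p>
        <preformat><![CDATA[
```python
from statistics import mean


def preference_features(book, catalog_books, review_ratings, catalog_ratings):
    """Build the non-textual features for one (topic, book) pair.

    book and each entry of catalog_books are dicts with "price" and "pages".
    review_ratings come from the reviews of the book; catalog_ratings come from
    the topic creators' catalogs and are halved to share the same range.
    """
    avg_price = mean(b["price"] for b in catalog_books)
    avg_pages = mean(b["pages"] for b in catalog_books)
    ratings = list(review_ratings) + [r / 2 for r in catalog_ratings]
    return {
        "price_diff": avg_price - book["price"],
        "pages_diff": avg_pages - book["pages"],
        # Books without any rating get 0.0 here; this default is an assumption.
        "avg_rating": mean(ratings) if ratings else 0.0,
        "num_ratings": len(ratings),
    }
```
]]></preformat>
        <p>Keeping the number of ratings as its own feature lets the learner discount an average that rests on very few ratings.</p>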
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Experimental Evaluation</title>
      <p>We evaluated our approaches in a series of experiments on the SBS 2015
task. Our goal in these experiments is to evaluate whether social signals (tags
and ratings) and other non-textual modalities can improve the search results.</p>
      <sec id="sec-4-1">
        <title>Experimental Setup</title>
        <p>For the textual baseline we used Lucene (<ext-link ext-link-type="uri" xlink:href="https://lucene.apache.org/core/">https://lucene.apache.org/core/</ext-link>) for indexing and searching. We used
the EnglishAnalyzer, which removes a small set of stopwords and stems terms
using the Porter stemming algorithm. The weighting scheme used for most of the
official runs is BM25 with b = 0.75 and k1 = 1.2. We also ran some
experiments using a language model with Dirichlet smoothing with μ = 2500; however,
we found that BM25 achieved a better mean average precision (MAP) and
nDCG@10 for the textual baseline. In order to validate the effectiveness of our
approaches we used the topics and relevance assessments from SBS 2014.</p>
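        <p>For reference, a textbook BM25 weight with the parameters b = 0.75 and k1 = 1.2 can be written as in the sketch below. It is not Lucene's implementation, which differs in details such as its compressed encoding of document lengths; the function names and arguments are hypothetical.</p>
        <preformat><![CDATA[
```python
import math


def bm25_term_weight(tf, df, doc_len, avg_doc_len, num_docs, k1=1.2, b=0.75):
    """BM25 weight of one query term in one document."""
    # Non-negative idf variant: log(1 + (N - df + 0.5) / (df + 0.5)).
    idf = math.log(1 + (num_docs - df + 0.5) / (df + 0.5))
    # Length normalization: longer-than-average documents are penalized.
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1) / (tf + norm)


def bm25_score(query_terms, doc_tf, doc_len, avg_doc_len, df, num_docs):
    """Sum the BM25 weights of all query terms that occur in the document."""
    return sum(
        bm25_term_weight(doc_tf[t], df[t], doc_len, avg_doc_len, num_docs)
        for t in query_terms
        if t in doc_tf
    )
```
]]></preformat>
        <p>The parameter b controls how strongly the document length is normalized, while k1 controls how quickly the contribution of repeated term occurrences saturates.</p>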
        <p>For the blind relevance feedback, we experimented with the number of
top-ranked documents used for the relevance feedback as well as with the number of
terms extracted. However, we found that none of the combinations improve the
textual baseline.</p>
        <p>Since the topics from SBS 2015 are a subset of the topics from 2014, we
were able to automatically add the example books from the 2015 topics to the
corresponding topics in 2014. We found that expanding the queries with 35 terms
extracted from the example books maximizes the nDCG@10 on the topics from
2014. Since we only have the example books for about 30% of the 2014 topics,
the overall performance gain was small; however, the
performance for the topics with example books increased significantly.</p>
        <p>Lucene does not provide a filter implementation that allows rejecting a list of
documents, which is required to filter the read books. Thus we implemented our
own filter, based on a concept similar to Lucene's FieldCacheTermsFilter, which
conversely rejects all the documents that are not in the given list of documents.</p>
        <p>As described in Section 3.2, we integrated social signals into the traditional
textual model by re-ranking the results. The social signals are modeled as an a
priori probability P(D). We ran different experiments using all available social
signals on the SBS collection (ratings, totalvotes, helpfulvotes, tags, etc.), but
we found that the signals tags and ratings, estimated based on formulas 5
and 9, achieved a better MAP and nDCG@10 compared to the other signals. We
conducted our experiments in two ways: for Run3 and Run4 we multiplied P(D)
by the textual language model score; for Run5 and Run6, we linearly combined the social
signals score (P(tags) multiplied by P_BA(D)) with Run1 and
with the random forests trained with 100 trees, respectively. We set the smoothing parameter μ
of formula 5 to 200, although more experiments will be necessary to find the best
value. Experiments showed that the best combination parameter for the
social score is 0.25 for Run5 and 0.2 for Run6.</p>
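        <p>One plausible reading of the linear combination is sketched below. The exact form is not specified here, so both the convex weighting, (1 - lam) times the base score plus lam times the social score, and the min-max normalization of both score lists are assumptions, as are the function names.</p>
        <preformat><![CDATA[
```python
def min_max(scores):
    """Scale a dict of scores to [0, 1]; constant score lists map to 0."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {d: (s - lo) / span if span else 0.0 for d, s in scores.items()}


def combine_linear(textual, social, lam=0.25):
    """Rank by (1 - lam) * textual + lam * social on min-max normalized scores.

    textual: dict doc_id -> score of the base run (e.g. Run1).
    social: dict doc_id -> social signals score; missing documents count as 0.
    """
    t, s = min_max(textual), min_max(social)
    fused = {d: (1 - lam) * t[d] + lam * s.get(d, 0.0) for d in t}
    return sorted(fused, key=lambda d: (-fused[d], d))
```
]]></preformat>
        <p>With a small combination parameter such as 0.25, the social score mainly reorders documents whose base scores are close to each other.</p>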
        <p>We used RankLib (<ext-link ext-link-type="uri" xlink:href="http://sourceforge.net/p/lemur/wiki/RankLib/">http://sourceforge.net/p/lemur/wiki/RankLib/</ext-link>) to train the random forests. For all the experiments,
we left the default parameters unchanged except for the number of trees and
the train metric, which was set to nDCG@10. Unsurprisingly, increasing the
number of trees results in a longer computation time, but also in higher nDCG@10
values when training and testing on the SBS 2014 topics. However, with a higher
number of trees the risk of overfitting the data increases. The input for the
random forests was built from the top 500 documents of six different textual
runs together with the three non-textual modalities as described in Section 3.3.
The textual runs were the textual baseline, the textual baseline with the read
book filter, the textual baseline plus example based relevance feedback with and
without filtering the read books, and two runs using blind relevance feedback
(a total of 80 terms from 10 documents and a total of 40 terms from 5 documents).
Even though the blind relevance feedback runs on their own did not improve
the textual baseline, we decided to add two runs using different parameters to
the random forest in order to increase the variance of the input ranked lists.
As training data we used the SBS 2014 topics and relevance assessments with
the example books added from the 2015 topics. This is not an ideal situation,
since the training data and the test data overlap. However, since we do
not have example books for all the 2014 topics, we were not able to exclude the
topics which are also in 2015 without losing the benefit of our example based
relevance feedback.</p>
        <p>For our participation in the INEX SBS 2015 track, we built six runs by applying
different configurations:</p>
        <list list-type="bullet">
          <list-item><p>Run1: Textual baseline using BM25 with example based relevance feedback
using 35 terms and read book filtering.</p></list-item>
          <list-item><p>Run2: Random forests trained with 10 trees based on six textual runs and
three non-textual modalities (price, number of pages and ratings).</p></list-item>
          <list-item><p>Run3: Run1 using language model combined with Bayesian average
re-ranking based on ratings.</p></list-item>
          <list-item><p>Run4: Run1 using language model combined with re-ranking based on the
tags.</p></list-item>
          <list-item><p>Run5: Run1 combined with re-ranking based on the tags and Bayesian
average of ratings.</p></list-item>
          <list-item><p>Run6: Random forests trained with 100 trees based on six textual runs and
three non-textual modalities (price, number of pages and ratings) combined
with re-ranking based on the tags and Bayesian average of ratings.</p></list-item>
        </list>
        <p>In the next section we discuss the evaluation results of our official submission.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Results and Discussion</title>
        <p>We can see that the runs (Run2 and Run6) using random forest training far
exceed the effectiveness of the runs using no training. During our experiments
we saw that including the three non-textual modalities in the learning helps
to increase the nDCG@10, which means that these modalities contain relevant
information regarding the book suggestions.</p>
        <p>
          Our textual baseline, although not submitted, achieves an nDCG@10 of
0.0768. Thus, the filtering together with the example based relevance feedback
(Run1) significantly improves the nDCG@10 by 6.7% with a significance level of
58.4% calculated using the paired randomization significance test [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
        </p>
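        <p>A paired randomization test on per-topic scores can be sketched as follows. This is a generic illustration of the technique and not the evaluation script used for the numbers above; the function name, the number of trials and the fixed seed are assumptions.</p>
        <preformat><![CDATA[
```python
import random


def paired_randomization_test(scores_a, scores_b, trials=10000, seed=0):
    """Two-sided paired randomization test on per-topic scores.

    Returns the estimated p-value for the observed difference in means.
    """
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs))
    extreme = 0
    for _ in range(trials):
        # Under the null hypothesis the run labels are exchangeable per topic,
        # which is equivalent to flipping the sign of each difference.
        permuted = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(permuted) >= observed:
            extreme += 1
    return extreme / trials
```
]]></preformat>
        <p>The inputs are the per-topic values of one metric, for example nDCG@10, for the two runs being compared, listed in the same topic order.</p>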
        <p>According to our experiments, Run3 and Run4 improve Run1 with language
model (nDCG@10 of 0.0834) significantly (significance level of 18.4% and
15.3%, respectively). Using both the ratings and the tags (Run5) improves the
effectiveness more than just using one of them. We note that Run3 provides
slightly better results in terms of MRR and MAP compared to Run4. One
reason for this is that the signal (rating) for Run3, which quantifies the reputation,
may be seen as expressing the engagement of a user who provides his explicit
endorsement. For example, documents having more positive signals (ratings,
likes, etc.) are more trustworthy than the ones that do not possess these social
signals. If multiple users have found that the document is useful, then it is more
likely that other users will find this document useful too. The social signals that
quantify the popularity (number of reviews, tags, etc.) do not represent approval
votes, as for example the reviews can be positive or negative, but they represent
trend factors and a measure of information propagation. Therefore, popular
information always arouses the interest of the user.</p>
        <p>The R@1000 is approximately the same for all runs, since they are mostly
based on a re-ranking of Run1, for which we only retrieved the top 1000
documents. Since the learning based runs only used slight variations of Run1, they
do not retrieve additional relevant documents beyond the top 1000 documents
of Run1. For a recall-centric application, using a higher variety of runs as well
as more documents per run would be beneficial.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusions</title>
      <p>In this paper, we described our participation in the suggestion track of the INEX
SBS 2015 lab. We showed how to build a textual baseline and how to improve
this using blind relevance feedback as well as example book based relevance
feedback. Further, we proposed a method to include the social signals as a priori
social knowledge that further enhanced the effectiveness of our system. The
learning based approach using random forests allowed us to incorporate the
user preferences with respect to the book price and the number of pages as well
as to combine the best aspects of the different variations of our textual methods.</p>
      <p>So far, we did not use the anonymized user profiles from LibraryThing, which
would allow us to add additional ratings to the social model. We would also
like to test our learning approach with completely separated training and test
datasets. Hence, we need to extract the example books for all the topics of SBS
2014. As a long term goal, however, we think it is important to find methods that
do not rely on learning. Investigating the output of the random forests might
help to develop these, since it allows a better understanding of the modalities,
including their importance and their dependencies.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Badache</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boughanem</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Social priors to estimate relevance of a resource</article-title>
          .
          <source>In: IIiX Conference</source>
          . pp.
          <fpage>106</fpage>
          –
          <lpage>114</lpage>
          . IIiX'14, ACM, NY, USA (
          <year>2014</year>
          ), http://doi.acm.org/10.1145/2637002.2637016
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bogers</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koolen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kamps</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kazai</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Preminger</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Overview of the INEX 2014 Social Book Search track</article-title>
          .
          <source>In: Conference and Labs of the Evaluation Forum</source>
          . pp.
          <fpage>462</fpage>
          –
          <lpage>479</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Breiman</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Random forests</article-title>
          .
          <source>Machine Learning</source>
          <volume>45</volume>
          (
          <issue>1</issue>
          ),
          <fpage>5</fpage>
          –
          <lpage>32</lpage>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Rocchio</surname>
            ,
            <given-names>J.J.</given-names>
          </string-name>
          :
          <article-title>Relevance feedback in information retrieval</article-title>
          .
          <source>In: The SMART Retrieval System: Experiments in Automatic Document Processing</source>
          . pp.
          <fpage>313</fpage>
          –
          <lpage>323</lpage>
          . Prentice-Hall, Englewood Cliffs, NJ (
          <year>1971</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Smucker</surname>
            ,
            <given-names>M.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Allan</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Carterette</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>A comparison of statistical significance tests for information retrieval evaluation</article-title>
          .
          <source>In: CIKM '07: Proceedings of the sixteenth ACM conference on Conference on information and knowledge management</source>
          . pp.
          <fpage>623</fpage>
          –
          <lpage>632</lpage>
          .
          <publisher-name>ACM</publisher-name>
          , New York, NY, USA (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>