                   LaHC at CLEF 2015 SBS Lab

                     Nawal Ould-Amer1 and Mathias Géry2

           1 Univ. Grenoble Alpes, LIG, F-38000 Grenoble, France
                      CNRS, LIG, F-38000 Grenoble, France
            2 Université de Lyon, F-42023, Saint-Étienne, France
     CNRS, UMR 5516, Laboratoire Hubert Curien, F-42000, Saint-Étienne, France
       Université de Saint-Étienne, Jean-Monnet, F-42000, Saint-Étienne, France
           Nawal.Ould-Amer@imag.fr, Mathias.Gery@univ-st-etienne.fr



       Abstract. This paper describes the work of the LaHC lab of Saint-
       Étienne for the Social Book Search lab at CLEF 2015. Our goals were
       i) to study a field-based retrieval model (BM25F), exploiting various
       topic and document fields, in order to build a strong baseline for fur-
       ther experiments, ii) to compare it with the Log-logistic (LGD) retrieval
       model, and iii) to exploit the documents related to each topic (i.e. the
       documents given as negative or positive examples for a topic).
       The official results show that LGD outperforms BM25F, and that our ap-
       proaches exploiting the documents related to the topic requesters rely
       on an interpretation of this additional information that differs from
       that of the Social Book Search organizers.

       Keywords: Field-based Information Retrieval, Re-ranking, Relevance
       Feedback


1    Introduction

This paper describes the work of the LaHC lab of Saint-Étienne for the Sugges-
tion track of the Social Book Search lab at CLEF 2015. The goal of this track is
to investigate techniques to support users in searching and navigating the full
texts of digitised books and complementary social media, as well as to provide a
forum for the exchange of research ideas and contributions [3]. Participants in
this track have to suggest books based on rich search requests combining several
topical and contextual relevance signals, as well as user profiles and real-world
relevance judgments. The dataset is based on 1.5 million book descriptions and
metadata, some of them user-generated, crawled from Amazon and LibraryThing.
    Our participation in the Social Book Search lab at CLEF 2014 [2] showed
us that the SBS dataset contains many different kinds of data, and that before
experimenting with our Social Information Retrieval models, we need a better
understanding of how to represent and how to exploit non-social (but nevertheless
complex) data using classic models.
    In particular, our work for the Social Book Search lab at CLEF 2015 focuses
on:
 – studying a field-based retrieval model (BM25F [5]), exploiting various topic
   and document fields, in order to build a strong baseline for further experiments;
 – comparing BM25F with other Information Retrieval models, especially the
   Log-logistic (LGD [1]) retrieval model;
 – exploiting some non-social data (but nevertheless related to the users), espe-
   cially the documents given as negative or positive examples for a topic.

    Our experiments were conducted using the Terrier Information Retrieval Sys-
tem3 [4], which implements various IR models, including LGD as well as field-based
models such as BM25F.

   The paper is organized as follows: Section 2 briefly presents the Information
Retrieval models used. Then, Section 3 details our approaches aiming at exploit-
ing the positive or negative documents related to each topic. Finally, Section 4
presents the official results obtained, before concluding in Section 5.


2     BM25F vs LGD (runs UJM 1 and UJM 2)

The Social Book Search 2015 dataset contains various data describing or related
to the documents, the topics and the users. Among all this information, we used
the Terrier Information Retrieval System implementation of the field-based
model BM25F [5], in order to exploit the following fields from documents and
topics:

 – the fields title, summary, content and tags from the documents;
 – the fields title, mediated query and narrative from the topics.

   BM25F was used with the parameter values presented in Table 1, taken
from our participation in the Social Book Search lab 2014 [2], generating our
run named UJM 1.


                           Table 1. BM25F parameters

                                       Fields
                              title  summary  content  tags
                Parameter c    1.0     0.10     0.45   0.00
                Weight w         1        2        1      6
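
    The way these parameters enter the model can be sketched as follows. This
is one common formulation of the "simple BM25F" of [5], in our notation, not a
transcription of the Terrier source: each field frequency is normalized by the
field length using the per-field parameter c_f, then the fields are combined
linearly using the weights w_f, before the usual BM25 saturation is applied:

    \bar{x}_{d,f,t} = \frac{x_{d,f,t}}{1 + c_f \left( \frac{l_{d,f}}{avl_f} - 1 \right)},
    \qquad
    \bar{x}_{d,t} = \sum_f w_f \, \bar{x}_{d,f,t}

    score(d, q) = \sum_{t \in q} idf(t) \cdot \frac{\bar{x}_{d,t}}{K_1 + \bar{x}_{d,t}}

where x_{d,f,t} is the frequency of term t in field f of document d, l_{d,f} the
length of that field, avl_f its average length over the collection, and K_1 the
usual BM25 saturation parameter.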



   LGD is the Terrier implementation of the Log-logistic model [1]. A grid opti-
mization of the parameter c on the SBS 2014 data led us to fix it at 0.2, generating
our run named UJM 2.
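
    As a reference, the log-logistic retrieval function of [1] can be sketched as
follows (our notation; c is the length normalization parameter mentioned above):

    score(d, q) = \sum_{t \in q \cap d} x_{t,q} \cdot \log \frac{\lambda_t + \bar{x}_{t,d}}{\lambda_t},
    \qquad
    \bar{x}_{t,d} = x_{t,d} \cdot \log \left( 1 + c \cdot \frac{avl}{l_d} \right),
    \qquad
    \lambda_t = \frac{N_t}{N}

where x_{t,d} is the frequency of t in document d, l_d the document length, avl
the average document length, N_t the number of documents containing t, and N
the number of documents in the collection.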
3
    Terrier: http://www.terrier.org
3   Documents given as “examples” (runs UJM 3, UJM 4,
    UJM 5, UJM 6)
Our last goal was to exploit some non-social data (but nevertheless related to the
users): the list of documents given as negative or positive examples for each topic.
Our idea was that a user might be satisfied (respectively dissatisfied) to find
among the answers documents that he has already read and appreciated (respectively
disliked), and thus marked as positive (respectively negative) examples for the
topic.
    We implemented this hypothesis in two ways:
Re-ranking (RR): Perform a re-ranking where the documents a user likes are
   boosted, and the documents he dislikes are removed from the result. After
   normalizing the scores between 0 and 1, we add 1 to the normalized score of
   each document that the user likes, and set the score to 0 for each document
   that he dislikes. This process is thus a post-processing of an existing run
   (a sketch is given after this list). It is worth noting that, if several
   retrieved documents are liked by the user, their relative initial ranking is
   preserved. It has been applied to our BM25F run UJM 1 (generating our run
   UJM 4) and to our Log-logistic LGD run UJM 2 (generating our run UJM 6);

Relevance Feedback (RF): Apply relevance feedback, positive for the docu-
   ments that the topic's user likes, and negative for the documents he dis-
   likes. We applied such relevance feedback to our BM25F run UJM 1
   (generating our run UJM 5), and to our Log-logistic LGD run UJM 2
   (generating our run UJM 3). The relevance feedback uses all the positive
   documents and selects the top 10 terms according to the default term
   selection of Terrier [4].
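
    As an illustration, the following Python sketch implements the Re-ranking
(RR) post-processing described above. The run and example-document structures
are our own assumptions for the sketch; it is a standalone post-processing step,
not tied to any Terrier API:

    def rerank(run, liked_ids, disliked_ids):
        """Post-process a ranked run: boost liked documents, drop disliked ones.

        run: list of (doc_id, score) pairs, sorted by decreasing score.
        liked_ids / disliked_ids: the positive / negative example documents
        given for the topic.
        """
        scores = [score for _, score in run]
        lo, hi = min(scores), max(scores)
        span = (hi - lo) or 1.0  # guard against a constant-score run

        reranked = []
        for doc_id, score in run:
            if doc_id in disliked_ids:
                continue  # score set to 0, i.e. removed from the result
            norm = (score - lo) / span  # min-max normalization into [0, 1]
            if doc_id in liked_ids:
                # +1 pushes liked documents above all others while
                # preserving their relative initial ranking
                norm += 1.0
            reranked.append((doc_id, norm))

        reranked.sort(key=lambda pair: pair[1], reverse=True)
        return reranked

For instance, on the run [("d1", 12.0), ("d2", 7.5), ("d3", 3.1)] with d2 liked
and d3 disliked, the sketch returns d2 first (normalized score 0.49 + 1), then
d1 (score 1.0), with d3 removed.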


4   Results
Table 2 presents the official results obtained by our 6 runs. Log-logistic
(LGD) outperforms BM25F, on the official nDCG@10 measure as well as on
the 3 other measures, despite the fact that BM25F is designed to take into
account and to weight the different fields describing the documents.
    Our approaches “Re-ranking (RR)” and “Relevance Feedback (RF)”, both
exploiting the list of documents given as negative or positive examples for each
topic, lower the quality of the results. These approaches are based on an inter-
pretation of this list of documents that differs from that of the Social Book
Search organizers: these “example” documents (the negative ones as well as the
positive ones) are not considered as relevant by the organizers. Thus, re-ranking
the positive examples upwards (or using them as relevant documents in a rele-
vance feedback process) can only lower the results. On the other hand, removing
the negative examples from our runs (or using them as irrelevant documents in
a relevance feedback process) may sometimes improve the results. All in all, the
quality of our results is lowered.
                   Table 2. Official results for the 6 UJM runs

     Rank   Run                  nDCG@10   MRR     MAP    R@1000   Profiles
      17    UJM 2 (LGD)           0.088    0.174   0.065   0.483      no
      20    UJM 6 (LGD + RR)      0.084    0.160   0.060   0.483      no
      24    UJM 1 (BM25F)         0.081    0.167   0.056   0.471      no
      29    UJM 3 (LGD + RF)      0.079    0.155   0.059   0.485      no
      30    UJM 4 (BM25F + RR)    0.079    0.158   0.055   0.471      no
      33    UJM 5 (BM25F + RF)    0.074    0.150   0.054   0.471      no



5     Conclusion
This paper describes the work of the LaHC lab of Saint-Étienne for the Social
Book Search lab at CLEF 2015. Our rather basic experiments show that the
Log-logistic model (LGD) outperforms BM25F. Four of our six runs were based
on a misinterpretation of the documents given as negative or positive examples
for a topic.
    Our 2015 participation allows us to build a strong basis for experimenting
in the future with more advanced Personalized Information Retrieval approaches.


Acknowledgment
This work is supported by Région Rhône-Alpes through the ReSPIr project.


References
1. Clinchant, S., Gaussier, E.: A Log-Logistic Model for Information Retrieval. In:
   Conference on Information and Knowledge Management (CIKM’09). Hong Kong,
   China (2009)
2. Hafsi, M., Géry, M., Beigbeder, M.: LaHC at INEX 2014: Social Book Search Track.
   In: Working Notes for CLEF 2014 Conference. pp. 514–520 (2014)
3. Koolen, M., Bogers, T., Kamps, J., Kazai, G., Preminger, M.: Overview of the INEX
   2014 Social Book Search Track. In: Working Notes for CLEF 2014 Conference. pp.
   462–479 (2014)
4. Ounis, I., Amati, G., Plachouras, V., He, B., Macdonald, C., Lioma, C.: Terrier: A
   High Performance and Scalable Information Retrieval Platform. In: SIGIR Work-
   shop on Open Source Information Retrieval (OSIR’06) (2006)
5. Robertson, S., Zaragoza, H., Taylor, M.: Simple BM25 extension to multiple
   weighted fields. In: Conference on Information and Knowledge Management. pp.
   42–49. CIKM’04, New York, NY, USA (2004)