<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Cross-Document Search Engine For Book Recommendation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Patrice Bellot</string-name>
          <email>patrice.bellot@lsis.org</email>
          <email>patrice.bellot@openedition.org</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Chahinez Benkoussas</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Aix-Marseille Université, CNRS, LSIS UMR 7296</institution>
          ,
          <addr-line>13397 Marseille</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Aix-Marseille Université, CNRS, CLEO OpenEdition UMS 3287</institution>
          ,
          <addr-line>13451 Marseille</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>A new combination of multiple Information Retrieval approaches is proposed for book recommendation based on complex user queries. We used different theoretical retrieval models, namely the probabilistic model InL2 (a Divergence From Randomness model) and language models, and tested their interpolated combination. We considered the application of a graph-based algorithm in a new retrieval approach to a network of related documents built from social links. We call the network constructed from documents and the social information attached to each of them a Directed Graph of Documents (DGD). Specifically, this work tackles the problem of book recommendation in the context of CLEF Labs, more precisely the Social Book Search track. We established a specific search strategy after separating the query set into two genres, "Analogue" and "Non-Analogue", based on an analysis of users' needs. A series of reranking experiments demonstrates that combining retrieval models and exploiting linked documents for retrieval yield significant improvements in terms of standard ranked retrieval metrics. These results extend the applicability of link analysis algorithms to different environments.</p>
      </abstract>
      <kwd-group>
        <kwd>Document retrieval</kwd>
        <kwd>InL2</kwd>
        <kwd>language model</kwd>
        <kwd>book recommendation</kwd>
        <kwd>PageRank</kwd>
        <kwd>graph modeling</kwd>
        <kwd>Social Book Search</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>CBRecSys 2015, September 20, 2015, Vienna, Austria. Copyright remains with the authors and/or original copyright holders.</p>
      <p>
        Over the last decade, there has been much work in both industry and academia
on developing new approaches to improve the performance of
retrieval and recommendation systems.
The aim is to help users deal with information
overload and to provide recommendations for books, restaurants or
movies. Some vendors, such as Amazon, have incorporated recommendation
capabilities into their commerce services.
Existing document retrieval approaches need to be improved
to satisfy users' information needs. Most systems use
classic information retrieval models, such as language models or
probabilistic models. Language models have been applied
with a high degree of success in information retrieval
applications [29–31]. This approach was first introduced by Ponte and Croft
in [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ]. They proposed a method to score documents, called
query likelihood, in two steps: estimate a language model
for each document, then rank documents according to
the likelihood scores resulting from the estimated language
model. The Markov Random Field model, proposed by Metzler
and Croft in [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], considers query term proximity in
documents by estimating term dependencies in the context of
the language modeling approach. Alternatively, the Divergence From
Randomness model, proposed by Amati and Van
Rijsbergen [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], measures the global informativeness of a term in
the document collection. It is based on the idea that "the more
the term occurrences diverge from random throughout the
collection, the more informative the term is" [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ]. One limitation
of such models is that the distance between query terms in
documents is not considered.
      </p>
      <p>Users' queries differ in the type of need they express. In book
recommendation, we identified two genres of queries, "Analogue"
and "Non-Analogue", which we describe in the following
sections. In this paper, the first proposed approach combines
probabilistic and language models to improve retrieval
performance, and we show that the two models perform much better
together in the context of book recommendation.</p>
      <p>
        In recent years, an important innovation in information
retrieval has been the exploitation of relationships between
documents, e.g. Google's PageRank [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ]. It has been
successful in Web environments, where the relationships are
provided by hyperlinks between documents. We present a new
approach for linking documents to construct a graph
structure that is used in the retrieval process. In this approach,
we exploit the PageRank algorithm to rank documents
with respect to users' queries. In the absence of
manually created hyperlinks, we use social information to create a
Directed Graph of Documents (DGD) and argue that it can
be treated in the same manner as hyperlink graphs. Our
experiments show that incorporating graph analysis
algorithms in document retrieval improves performance in
terms of the standard ranked retrieval metrics.
      </p>
      <p>Our work focuses on search in the book recommendation
domain, in the context of the CLEF Labs Social Book Search track.
We tested our approaches on a collection that contains
Amazon/LibraryThing book descriptions and on a set of queries, called
topics, extracted from the LibraryThing discussion forums.</p>
    </sec>
    <sec id="sec-2">
      <title>2. RELATED WORK</title>
      <p>
        This work is first related to the area of document retrieval
models, more specifically language models and probabilistic
models. Unigram language models are most often used
for ad hoc Information Retrieval, but several researchers
have explored the use of language modeling for capturing higher
order dependencies between terms. Bouchard and Nie in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]
showed significant improvements in retrieval effectiveness
with a new statistical language model for the query, based on
completing the query with terms from the user's domain of
interest, reordering the retrieval results, or expanding the query
using lexical relations extracted from the user's domain of
interest.
      </p>
      <p>
        Divergence From Randomness (DFR) is one of several
probabilistic models that we have used in our work. Abolhassani
and Fuhr investigated several possibilities for
applying Amati's DFR model [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] to content-only search in XML
documents [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        There has been an increasing use of techniques based on
graphs constructed from implicit relationships between documents.
Kurland and Lee performed structural reranking
based on centrality measures in a graph of documents
generated using relationships between documents
based on language models [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. In [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], Lin demonstrates the
possibility of exploiting document networks defined by automatically
generated content-similarity links for document retrieval in
the absence of explicit hyperlinks. He integrates PageRank
scores with standard retrieval scores and shows a significant
improvement in ranked retrieval performance. His work
focused on search in the biomedical domain, in the
context of the PubMed search engine. Perhaps the main contrast
with our work is that links were not induced by generation
probabilities or linguistic items.
      </p>
    </sec>
    <sec id="sec-4">
      <title>3. INEX SOCIAL BOOK SEARCH TRACK AND TEST COLLECTION</title>
      <p>The Social Book Search (SBS) task<sup>1</sup> aims to evaluate the value
of professional and user-generated metadata for book search on the
Web. The main goal is to exploit search techniques to deal
with complex information needs and complex information
sources that include user profiles, personal catalogs, and
book descriptions.</p>
      <p>
        The SBS task provides a collection of 2.8 million book
descriptions crawled by the University of Duisburg-Essen from
Amazon<sup>2</sup> [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and enriched with content from LibraryThing<sup>3</sup>,
an online service that helps people catalog their books
easily. Books are stored in XML files and identified by an
ISBN. They contain information such as title information,
Dewey Decimal Classification (DDC) code (for 61% of the
books), category, Amazon product description, etc.
Amazon records also contain social information generated by
users, such as tags, reviews and ratings (see Figure 1). For each
book, Amazon suggests a set of "Similar Products", which
is the result of a similarity computation based on content
information and user behavior (purchases, likes, reviews,
etc.) [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <p>Figure 1: Example of a book from the Amazon/LibraryThing collection in XML format.</p>
      <p>The SBS task provides a set of queries, called topics, in which users
describe what they are looking for (books of a particular
genre, books by particular authors, books similar to those
that have already been read, etc.). These requests for
recommendations are natural expressions of information needs
for a large collection of online book records. The topics are
crawled from the LibraryThing discussion forums.</p>
      <p>The topic set consists of 680 topics in 2014. Each topic has
a narrative description of the information need and other
fields, as illustrated in Figure 2.</p>
      <p>1 http://social-book-search.humanities.uva.nl/
2 http://www.amazon.com/
3 http://www.librarything.com/</p>
    </sec>
    <sec id="sec-5">
      <title>4. RETRIEVAL MODELS</title>
      <p>This section describes the retrieval models we used for book
recommendation and their combination.</p>
    </sec>
    <sec id="sec-6">
      <title>4.1 InL2 of Divergence From Randomness</title>
      <p>
        We used InL2, the Inverse Document Frequency model with
Laplace after-effect and normalization 2. This model has
been used successfully in different works [
        <xref ref-type="bibr" rid="ref10 ref26 ref3 ref6">3,6,10,26</xref>
        ]. InL2 is
a DFR-based (Divergence From Randomness) model based
on the Geometric distribution and the Laplace law of succession.
      </p>
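      <p>The text above does not spell out the InL2 weighting formula. As an illustration only, the sketch below uses the commonly cited form of InL2 (inverse document frequency, Laplace after-effect, normalization 2). The function names, the argument names and the default value of the normalization parameter c are assumptions made for this example, not settings reported in this work.</p>
      <preformat>
```python
import math


def inl2_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, c=1.0, qtf=1.0):
    """Score one query term in one document with the commonly cited InL2 form.

    tf: term frequency in the document; doc_freq: number of documents
    containing the term; c: normalization 2 parameter (assumed default).
    """
    if tf == 0:
        return 0.0
    # Normalization 2: rescale the raw frequency by relative document length.
    tfn = tf * math.log2(1.0 + c * avg_doc_len / doc_len)
    # Inverse document frequency component of the model.
    idf = math.log2((n_docs + 1.0) / (doc_freq + 0.5))
    # Laplace after-effect: multiply by 1 / (tfn + 1).
    return qtf * (tfn / (tfn + 1.0)) * idf


def inl2_score(query_terms, doc_tf, doc_len, avg_doc_len, n_docs, doc_freqs, c=1.0):
    """Sum the per-term scores over all query terms for one document."""
    return sum(
        inl2_term_score(doc_tf.get(t, 0), doc_len, avg_doc_len,
                        n_docs, doc_freqs.get(t, 0), c)
        for t in query_terms
    )
```
      </preformat>
      <p>In this form, a term that occurs in few documents contributes more than a frequent one, and the Laplace factor dampens the gain from repeated occurrences within a single document.</p>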
    </sec>
    <sec id="sec-8">
      <title>4.2 Sequential Dependence Model of Markov Random Field</title>
      <p>
        Language models are widely used in document retrieval
for book recommendation [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ]. Metzler and Croft
proposed the Markov Random Field (MRF) model [
        <xref ref-type="bibr" rid="ref18 ref20">18, 20</xref>
        ], which
integrates multi-word phrases in the query. Specifically, we
used the Sequential Dependence Model (SDM), which is a
special case of MRF. In this model, the co-occurrence of query
terms is taken into consideration. SDM builds upon this idea
by considering combinations of query terms with proximity
constraints: single term features (standard
unigram language model features, f<sub>T</sub>), exact phrase features
(words appearing in sequence, f<sub>O</sub>) and unordered window
features (words required to be close together, but not
necessarily in an exact sequence order, f<sub>U</sub>).
      </p>
      <p>Finally, documents are ranked according to the following
scoring function:</p>
      <p>
        <disp-formula>
          <tex-math><![CDATA[
\mathrm{SDM}(Q, D) = \lambda_T \sum_{q \in Q} f_T(q, D)
  + \lambda_O \sum_{i=1}^{|Q|-1} f_O(q_i, q_{i+1}, D)
  + \lambda_U \sum_{i=1}^{|Q|-1} f_U(q_i, q_{i+1}, D)
]]></tex-math>
        </disp-formula>
      </p>
      <p>
        where the feature weights are set based on the authors'
recommendation (λ<sub>T</sub> = 0.85, λ<sub>O</sub> = 0.1, λ<sub>U</sub> = 0.05) in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. f<sub>T</sub>, f<sub>O</sub> and f<sub>U</sub> are the log maximum likelihood estimates of
query terms in document D, computed over the target
collection using Dirichlet smoothing. We applied this model
using Indri<sup>4</sup> and its query language<sup>5</sup>.
      </p>
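      <p>As a hedged illustration of how the three feature types can be expressed for Indri, the sketch below builds an SDM query string from a list of query terms using the weights given above. The helper name build_sdm_query and the unordered window size of 8 are assumptions made for this example; the exact queries used in the experiments are not shown in this paper.</p>
      <preformat>
```python
def build_sdm_query(terms, w_t=0.85, w_o=0.1, w_u=0.05, window=8):
    """Build an Indri query string for the Sequential Dependence Model.

    terms: list of (already tokenized) query terms.
    window: size of the unordered window (assumed value, not from the paper).
    """
    terms = list(terms)
    if not terms:
        raise ValueError("the query must contain at least one term")
    if len(terms) == 1:
        # With a single term there are no adjacent pairs to model.
        return f"#combine({terms[0]})"
    pairs = list(zip(terms, terms[1:]))
    unigrams = " ".join(terms)                                    # f_T features
    ordered = " ".join(f"#1({a} {b})" for a, b in pairs)          # f_O features
    unordered = " ".join(f"#uw{window}({a} {b})" for a, b in pairs)  # f_U features
    return (f"#weight( {w_t} #combine({unigrams}) "
            f"{w_o} #combine({ordered}) "
            f"{w_u} #combine({unordered}) )")


print(build_sdm_query(["science", "fiction", "books"]))
```
      </preformat>
      <p>Each adjacent pair of query terms yields one exact phrase operator and one unordered window operator, which mirrors the two sums over i in the scoring function.</p>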
    </sec>
    <sec id="sec-9">
      <title>4.3 Combining Search Systems</title>
      <p>
        Combining the output of several search systems, in contrast to
using just a single one, improves retrieval effectiveness, as
shown in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], where Belkin combined the results of
probabilistic and vector space models. Following this approach,
we combined the probabilistic model InL2 with the
language model SDM. This combination takes into account
both the informativeness of query terms and their
dependencies in the document collection. Each retrieval model
uses a different weighting scheme, so the scores must
be normalized. We used the maximum and minimum scores
according to Lee's formula [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
      </p>
      <p>
        <disp-formula>
          <tex-math><![CDATA[
\mathit{normalizedScore} = \frac{\mathit{oldScore} - \mathit{minScore}}{\mathit{maxScore} - \mathit{minScore}}
]]></tex-math>
        </disp-formula>
      </p>
      <p>
        It has been shown in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] that the InL2 and SDM models have
different levels of retrieval effectiveness; it is therefore necessary
to weight individual model scores depending on their overall
performance. We used an interpolation parameter (α) that
we varied to improve retrieval effectiveness.
      </p>
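      <p>The min-max normalization can be sketched as follows. This is a minimal illustration rather than the code used in the experiments: the function name is hypothetical, and returning 0.0 for every document when all scores are equal is an assumption made here to avoid a division by zero.</p>
      <preformat>
```python
def min_max_normalize(scores):
    """Rescale raw retrieval scores to [0, 1].

    scores: dict mapping a document ID to the raw score of one model
    for one topic.
    """
    if not scores:
        return {}
    lowest = min(scores.values())
    highest = max(scores.values())
    if highest == lowest:
        # Degenerate case (assumption): all documents get the same value.
        return {doc: 0.0 for doc in scores}
    span = highest - lowest
    return {doc: (score - lowest) / span for doc, score in scores.items()}


print(min_max_normalize({"isbn1": 2.0, "isbn2": 4.0, "isbn3": 3.0}))
```
      </preformat>
      <p>Normalizing each model's run separately, topic by topic, puts the InL2 and SDM scores on a common scale before they are weighted against each other.</p>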
    </sec>
    <sec id="sec-10">
      <title>5. GRAPH MODELING</title>
      <p>
        In [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], the authors exploited networks defined by
automatically generated content-similarity links for document retrieval.
We analyzed the documents to find a new way to link
them. In our case, we exploited a special type of
similarity based on several factors. This similarity is provided by
Amazon and corresponds to the "Similar Products" generally given
for each book. The degree of similarity depends on social
information, such as the number of clicks or purchases, and on
content-based information, such as book attributes (book description,
book title, etc.). The exact formula used by Amazon to
combine social and content-based information to compute
similarity is proprietary. The idea behind this linking method
is that documents linked by this type of similarity are
more likely to belong to the same context than
documents that are not connected.
      </p>
      <p>To model the data as a DGD, we extracted the
"Similar Products" links between documents in order to
construct the graph structure. We used it to enrich the results
returned by the retrieval models, in the same spirit as
pseudo-relevance feedback. Each node in the DGD represents a
document (the Amazon description of a book) and has a set of
properties:</p>
      <list list-type="bullet">
        <list-item><p>ID: the book's ISBN</p></list-item>
        <list-item><p>content: the book description, which includes many other
properties (title, product description, author(s), users'
tags, content of reviews, etc.)</p></list-item>
        <list-item><p>MeanRating: the average of the ratings attributed to the
book</p></list-item>
        <list-item><p>PR: the book's PageRank</p></list-item>
      </list>
      <p>4 http://www.lemurproject.org/indri/
5 http://www.lemurproject.org/lemur/IndriQueryLanguage.php</p>
      <p>Edges in the DGD are directed and correspond to Amazon
similarity: given nodes A, B ∈ S, if A points to B, then
B is suggested as a Similar Product for A. Figure 3
shows an example of a DGD, a network of documents. The
DGD network contains 1 645 355 nodes (89.86% of the nodes are
within the collection and the rest are outside) and 6 582 258
edges.</p>
      <p>In this section, the collection of documents is denoted by
C. In C, each document d has a unique ID. The set of
queries, called topics, is denoted by T, and the set D<sub>init</sub> ⊂ C refers
to the documents returned by the initial retrieval model.
StartingNode identifies a document from D<sub>init</sub> used as
input to the graph processing algorithms in the DGD. The
set of documents present in the graph is denoted by S. D<sub>ti</sub>
indicates the documents retrieved for topic t<sub>i</sub> ∈ T.</p>
    </sec>
    <sec id="sec-11">
      <title>5.1 Our Approach</title>
      <p>The DGD network contains useful information about
documents that can be exploited for document retrieval. Our
approach relies first on the results of a traditional retrieval
engine, then on the DGD network to find new documents.
The underlying assumption is that the suggestions given by
Amazon can be relevant to the user queries.</p>
      <p>Algorithm 1 takes as inputs D<sub>init</sub>, the list of
documents returned for each topic by the retrieval techniques described
in Section 4, the DGD network, and a parameter giving the
number of top documents selected from D<sub>init</sub> as StartingNodes,
denoted by D<sub>StartingNodes</sub>. We fixed this parameter to 100 (10% of the
returned list for each topic). The algorithm returns a list
of recommendations for each topic, denoted by D<sub>final</sub>. It
processes the topics one by one and extracts the list of all neighbors
of each StartingNode. It then computes the mutual shortest paths
between all selected StartingNodes in the DGD.
The two lists (neighbors and nodes on the computed shortest
paths) are concatenated and all duplicate nodes are
removed. The resulting set of documents is denoted by
D<sub>graph</sub>. A second concatenation is performed between the initial
list of documents and D<sub>graph</sub> (again removing duplicates) to form the
final list of retrieved documents, D<sub>final</sub>, which is reranked using
different reranking schemes.</p>
      <preformat>Algorithm 1 Retrieving based on DGD feedback
1: D_init ← Retrieving Documents for each t_i ∈ T
2: for each D_ti ∈ D_init do
3:     D_StartingNodes ← first documents ∈ D_ti
4:     for each StartingNode in D_StartingNodes do
5:
6:</preformat>
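      <p>The expansion step described in this section can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation used in the experiments: the graph is represented as a plain dictionary mapping each node to the list of nodes it points to, "neighbors" is interpreted as the direct successors of a node, and a small breadth-first search stands in for the shortest path routine of a graph library (with NetworkX, networkx.shortest_path plays that role). The names shortest_path, expand_with_dgd and n_start are hypothetical.</p>
      <preformat>
```python
from collections import deque
from itertools import permutations


def shortest_path(graph, source, target):
    """Breadth-first shortest path in a directed graph; None if unreachable."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk back through the parents to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for successor in graph.get(node, ()):
            if successor not in parents:
                parents[successor] = node
                queue.append(successor)
    return None


def expand_with_dgd(d_init, graph, n_start=100):
    """Extend the initial ranked list of one topic with documents from the graph."""
    start_nodes = list(d_init[:n_start])
    d_graph = []
    # Neighbors (Similar Products) of every starting node.
    for node in start_nodes:
        d_graph.extend(graph.get(node, ()))
    # Nodes on the shortest paths between every ordered pair of starting nodes.
    for a, b in permutations(start_nodes, 2):
        path = shortest_path(graph, a, b)
        if path:
            d_graph.extend(path)
    # Concatenate with the initial list; keep the first occurrence of each ID.
    return list(dict.fromkeys(list(d_init) + d_graph))


dgd = {"a": ["b"], "b": ["c"], "c": ["d"], "x": []}
print(expand_with_dgd(["a", "c", "x"], dgd, n_start=2))
```
      </preformat>
      <p>The returned list still has to be reranked. With 100 starting nodes, the pairwise shortest path step runs 9900 searches per topic, which is the costly part of this sketch.</p>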
    </sec>
    <sec id="sec-12">
      <title>6. EXPERIMENTS AND RESULTS</title>
      <p>In this section, we describe the experimental setup
used in our experiments. Furthermore, we present the different
reranking schemes used in the previously defined approaches.
We discuss the results achieved using the InL2
retrieval model, its combination with the SDM model, and
the retrieval system proposed in our approach, which uses the DGD
network.</p>
      <p>For our experiments, we used different tools that implement
retrieval models and handle the graph processing. First,
we used the Terrier (TERabyte RetrIEveR)<sup>6</sup> Information
Retrieval framework developed at the University of Glasgow
[21–23]. Terrier is a modular platform for the rapid
development of large-scale IR applications. It provides indexing
and retrieval functionalities. It is based on the DFR framework,
and we used it to deploy the InL2 model described in section
4.1. Further information about Terrier can be found at
http://ir.dcs.gla.ac.uk/terrier.</p>
      <p>A preprocessing step was performed to convert the INEX SBS
corpus into the TREC Collection Format<sup>7</sup>. We considered
the content of all tags in each XML file to be important for
indexing; therefore, each XML file was transformed into
one document identified by its ISBN. Thus, we need only
two tags instead of all the tags in the XML: the ISBN and the whole
content (named text).</p>
      <p>Secondly, Indri<sup>8</sup>, the Lemur Toolkit for Language Modeling and
Information Retrieval, was used to run the language
model (SDM) described in section 4.2. Indri is a framework
that provides state-of-the-art text search methods and a rich
structured query language for large collections (up to 50
million documents). It is part of the Lemur project and is
developed by researchers from UMass and Carnegie Mellon
University. We used the Porter stemmer and performed Bayesian
smoothing with Dirichlet priors (Dirichlet prior μ = 1500).</p>
      <p>In section 5.1, we described our approach based on the
DGD, which includes graph processing. We used the NetworkX<sup>9</sup>
Python package to perform shortest path computation,
neighborhood extraction and PageRank calculation.</p>
      <p>6 http://terrier.org/
7 http://lab.hypotheses.org/1129
8 http://www.lemurproject.org/indri/
9 https://networkx.github.io/</p>
      <p>
        To evaluate the results of retrieval systems, several
measures have been used for the SBS task: normalized Discounted Cumulative
Gain (nDCG), the most popular measure in IR [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], Mean
Average Precision (MAP), which is the mean of the
average precisions over a set of queries, and two other measures:
Reciprocal Rank (Recip Rank) and Precision at rank 10 (P@10).
      </p>
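      <p>For readers unfamiliar with these measures, the sketch below shows one common way to compute them for a single topic. It is an illustration only: the function names, the cutoff of 10 for nDCG and the logarithmic discount are assumptions made for this example, and the official evaluation tools may differ in detail. MAP is obtained by averaging average_precision over all topics.</p>
      <preformat>
```python
import math


def precision_at_k(ranked, relevant, k=10):
    """Fraction of the top k documents that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def average_precision(ranked, relevant):
    """Mean of the precision values at the rank of each relevant document."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def reciprocal_rank(ranked, relevant):
    """Inverse of the rank of the first relevant document (0.0 if none)."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked, gains, k=10):
    """nDCG with graded gains given as a dict mapping document ID to gain."""
    dcg = sum(gains.get(doc, 0) / math.log2(rank + 1)
              for rank, doc in enumerate(ranked[:k], start=1))
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(gain / math.log2(rank + 1)
               for rank, gain in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0
```
      </preformat>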
    </sec>
    <sec id="sec-13">
      <title>6.2 Reranking Schemes</title>
      <p>Two approaches were proposed. The first one (see section
4.3) merges the results of two different information retrieval
models, the language model (SDM) and the DFR
model (InL2). For topic t<sub>i</sub>, each model returns 1000 documents,
and each retrieved document has an associated score. The
linear combination method uses the following formula to
calculate the final score of each document d retrieved by the SDM and
InL2 models:</p>
      <p>
        <disp-formula>
          <tex-math><![CDATA[
S_{final}(d, t_i) = \alpha \cdot S_{InL2}(d, t_i) + (1 - \alpha) \cdot S_{SDM}(d, t_i)
]]></tex-math>
        </disp-formula>
      </p>
      <p>where S<sub>InL2</sub>(d, t<sub>i</sub>) and S<sub>SDM</sub>(d, t<sub>i</sub>) are normalized scores and
α is the interpolation parameter, set to 0.8 after several
tests on the 2014 topics.</p>
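      <p>The linear combination can be sketched as follows for one topic. This is a minimal sketch rather than the code used in the experiments: the function name is hypothetical, alpha is simply the name chosen here for the interpolation parameter, and giving a score of 0.0 to a document that is missing from one of the two runs is an assumption, since the text does not say how such documents were handled.</p>
      <preformat>
```python
def interpolate(inl2_scores, sdm_scores, alpha=0.8):
    """Fuse two normalized runs for one topic and return a ranked list.

    inl2_scores, sdm_scores: dicts mapping a document ID to its normalized
    score. Returns (document ID, final score) pairs, best first.
    """
    documents = set(inl2_scores) | set(sdm_scores)
    fused = {
        # A document missing from one run contributes 0.0 for that model
        # (assumption made for this sketch).
        doc: alpha * inl2_scores.get(doc, 0.0)
        + (1 - alpha) * sdm_scores.get(doc, 0.0)
        for doc in documents
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


run = interpolate({"a": 1.0, "b": 0.0}, {"a": 0.0, "b": 1.0, "c": 0.5})
print([doc for doc, _ in run])
```
      </preformat>
      <p>With alpha at 0.8, a document ranked first by InL2 alone still outranks one ranked first by SDM alone, which reflects the larger weight given to the InL2 scores.</p>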
      <p>The second approach (described in section 5.1) uses the DGD
constructed from the "Similar Products" information. The
document set returned by the retrieval model is merged with the
documents in the neighbor set and the shortest path results. We
tested several reranking methods that combine the retrieval
model scores with other scores based on social information.
For each document in the resulting list, we calculated the
following scores:</p>
      <p>
        PageRank, computed using the NetworkX tool. It is
a well-known algorithm that exploits link structure
to score the importance of nodes in a graph.
It has typically been used for hyperlink graphs such as the
Web [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]. The values of PageRank are given by the
following formula.
      </p>
      <p>
        <disp-formula>
          <tex-math><![CDATA[
PR(A) = (1 - d) + d \left( \frac{PR(T_1)}{C(T_1)} + \dots + \frac{PR(T_n)}{C(T_n)} \right)
]]></tex-math>
        </disp-formula>
      </p>
      <p>where document A has documents T<sub>1</sub>...T<sub>n</sub> which point
to it (i.e., Similar Products). The parameter d is a
damping factor set between 0 and 1 (0.85 in our case).
C(A) is defined as the number of links going out of
page A.</p>
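      <p>To make the formula concrete, the sketch below applies it iteratively to a small directed graph stored as a dictionary of adjacency lists. It follows the formula exactly as written above and is not the NetworkX implementation: networkx.pagerank computes a normalized variant whose scores sum to 1 and treats nodes without outgoing links specially, so its values differ from those produced here. The function name, the starting value of 1.0 and the fixed number of iterations are assumptions made for this example.</p>
      <preformat>
```python
def pagerank(graph, d=0.85, iterations=100):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) over in-links T of A.

    graph: dict mapping each node to the list of nodes it points to.
    """
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    scores = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        updated = {node: 1.0 - d for node in nodes}
        for source, targets in graph.items():
            if targets:
                # Each out-link carries an equal share of the source's score.
                share = d * scores[source] / len(targets)
                for target in targets:
                    updated[target] += share
        scores = updated
    return scores


print(pagerank({"a": ["c"], "b": ["c"], "c": ["a"]}))
```
      </preformat>
      <p>In this toy graph, node b receives no links and keeps the minimum score 1 - d, while c, which is pointed to by both a and b, obtains the highest score.</p>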
      <p>Likeliness, computed from information generated by
users (reviews and ratings). It is based on the idea that
the more reviews and good ratings a book has,
the more interesting it is (it may not be a good or
popular book, but it is a book that has a high impact).</p>
      <p>
        <disp-formula>
          <tex-math><![CDATA[
\mathit{Likeliness}(D) = \log(\#\mathit{reviews}(D)) \times \frac{\sum_{r \in R_D} r}{\#\mathit{reviews}(D)}
]]></tex-math>
        </disp-formula>
      </p>
      <p>where #reviews(D) is the number of reviews attributed
to D and R<sub>D</sub> is the set of reviews of D.</p>
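      <p>A small sketch of the Likeliness score follows. It assumes that each review contributes its numeric rating to the sum, that the logarithm is the natural logarithm (the base is not stated in the text), and that a book without reviews gets a score of 0.0; the function name is hypothetical.</p>
      <preformat>
```python
import math


def likeliness(ratings):
    """Likeliness of a book from the list of ratings given in its reviews."""
    n_reviews = len(ratings)
    if n_reviews == 0:
        # No reviews: the score is defined as 0.0 here (assumption).
        return 0.0
    mean_rating = sum(ratings) / n_reviews
    # log(#reviews) rewards heavily reviewed books; the mean rewards good ratings.
    return math.log(n_reviews) * mean_rating


print(likeliness([5, 4, 4, 5]))
```
      </preformat>
      <p>Note that a book with a single review always scores 0 under this formula, because log(1) = 0, whatever its rating.</p>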
      <p>The computed scores were normalized using the formula
normalizedScore = oldScore / maxScore. After that, to
combine the results of the retrieval systems with each of the
normalized scores, an intuitive solution is to weight the retrieval
model scores with the previously described scores
(normalized PageRank and Likeliness). However, this would favor
documents with high PageRank and Likeliness scores even
though their content is much less related to the topics.</p>
    </sec>
    <sec id="sec-14">
      <title>6.3 Results</title>
      <p>We used the topics provided by the INEX SBS task in 2014
(680 topics), split into two topic sets. The systems retrieve 1000 documents per topic.
We assessed the narrative field of each topic and automatically
classified the topic set into two genres. In "Analogue"
topics (261), users cite books they have already read
(generally titles and authors) in order to obtain similar books. In the
second genre, "Non-Analogue" (356 topics), users describe their
needs by specifying a theme, field of interest, event, etc.,
without citing other books. Note that 63 topics were
ignored because of their ambiguity.</p>
      <p>In order to evaluate the IR methodologies described in
sections 4.3 and 5, we performed retrieval for each topic genre
individually. The experimental results, which describe the
performance of the different retrieval systems on the
Amazon/LibraryThing document collection, are shown in Table 1.</p>
      <p>As illustrated in Table 1, the system that combines the
probabilistic model InL2 and the language model SDM (InL2 SDM)
achieves a significant improvement for each topic set
compared to the InL2 model (baseline), but the improvement is highest
for the Non-Analogue topic set, where the content of the queries is
more explicit than in the other topic set. This improvement is
mainly due to the increase in the number of relevant
documents that are retrieved by both systems.</p>
      <p>The results of run InL2 DGD PR on the Analogue topic
set confirm that exploiting structured documents and
performing reranking with PageRank significantly improves
performance; in contrast, it lowers the baseline
performance on the Non-Analogue topic set. This can
be explained by the fact that Analogue topics contain
examples of books (Figure 6), which requires the use of the graph
to extract similar connected books.</p>
      <p>Using Likeliness scores (in InL2 DGD MnRtg) to rerank
retrieved documents significantly decreases the baseline
efficiency for the two topic sets. This means that ratings given
by users do not improve reranking
performance.</p>
      <p>Figure 7 compares the number of topics whose results improved, deteriorated
or stayed the same between the baseline (InL2) and the
proposed retrieval systems in terms of the MAP measure. The
proposed systems based on the DGD graph provide the highest
number of improved topics compared with the combination
of IR systems. More precisely, using PageRank to rerank
documents produces better results in terms of improved
topics. These results show the positive impact of link structure
on document retrieval systems for book recommendation.</p>
      <p>The depicted results confirm that we are starting from a
competitive baseline, suggesting that the improvements contributed
by combining the outputs of retrieval systems and social link
analysis are indeed meaningful.</p>
    </sec>
    <sec id="sec-17">
      <title>7. HUMANITIES AND SOCIAL SCIENCES COLLECTION: GRAPH MODELING AND RECOMMENDATION</title>
      <p>We tested the proposed recommendation approach based
on linked documents on the Revues.org<sup>10</sup> collection. Revues.org
is one of the four platforms of the OpenEdition<sup>11</sup> portal
dedicated to electronic resources in the humanities and social
sciences (books, journals, research blogs, and academic
announcements). Revues.org was founded in 1999 and today
hosts over 400 online journals, i.e. 149,000 articles,
proceedings and editorials.</p>
      <p>
        We built a network of documents from the ASp<sup>12</sup> journal. It
publishes research articles, publication listings and reviews
related to the field of English for Specific Purposes (ESP) for
both teaching and research. The network contains 500
documents and 833 relationships, which represent bibliographic
citations. Each relationship is constructed using BILBO
[
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], a reference parsing software. BILBO is built
from annotated corpora of Digital Humanities articles
from the OpenEdition Revues.org platform. It automatically
annotates bibliographic references in the bibliography section
of each document and obtains the corresponding DOI
(Digital Object Identifier) via the CrossRef<sup>13</sup> API if such an identifier
exists.
      </p>
      <p>Each node in the citation network has a set of properties:
ID, which is its URL; type, which can be article, editorial,
book review, etc.; and the number of readers' clicks, which we call
popularity. The recommender system applied to this
network takes as input a user query, generally a small set of short
keywords, and performs a retrieval step using the Solr<sup>14</sup> search
engine. The system extends the returned results with
documents from the citation network by using graph algorithms
(neighborhood search and shortest path algorithm) as
described in section 5.1. After that, we rerank documents
according to the popularity property of each document.
We tested the system manually on a small set of user queries
and found that, for most queries, the results were satisfying.</p>
      <p>10 http://www.revues.org/
11 http://www.openedition.org
12 http://www.openedition.org/6457
13 http://www.crossref.org/
14 http://lucene.apache.org/solr/</p>
    </sec>
    <sec id="sec-18">
      <title>8. CONCLUSION AND FUTURE WORK</title>
      <p>In this paper, we proposed and evaluated approaches to
document retrieval in the context of book recommendation. We
used the test collection of the CLEF Labs Social Book Search
track and the topics proposed in 2014, divided into two classes,
"Analogue" and "Non-Analogue".</p>
      <p>We presented a first approach that combines the outputs
of a probabilistic model (InL2) and a language model (SDM)
using linear interpolation after normalizing the scores of each
retrieval system. We have shown a significant improvement
over the baseline results using this combination.</p>
      <p>A novel approach was proposed, based on a Directed Graph
of Documents (DGD) constructed from social relationships.
It exploits link structure to enrich the document
list returned by a traditional retrieval model (InL2). We performed
reranking using the PageRank and Likeliness of each
retrieved document.</p>
      <p>In the future, we would like to construct an evaluation
corpus from the Revues.org collection and develop an evaluation
process similar to that of the INEX SBS task. Another
interesting extension of our work would be to use
learning-to-rank techniques to automatically adjust the settings of
the reranking parameters.</p>
    </sec>
    <sec id="sec-19">
      <title>9. ACKNOWLEDGMENT</title>
      <p>This work was supported by the French program
Investissements d'Avenir FSN and the French Region PACA under
the projects InterTextes and Agoraweb.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Abolhassani</surname>
          </string-name>
          and
          <string-name>
            <given-names>N.</given-names>
            <surname>Fuhr</surname>
          </string-name>
          .
          <article-title>Applying the divergence from randomness approach for content-only search in XML documents</article-title>
          . pages
          <fpage>409</fpage>
          –
          <lpage>419</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>G.</given-names>
            <surname>Amati</surname>
          </string-name>
          and
          <string-name>
            <given-names>C. J.</given-names>
            <surname>van Rijsbergen</surname>
          </string-name>
          .
          <article-title>Probabilistic models of information retrieval based on measuring the divergence from randomness</article-title>
          .
          <source>ACM Trans. Inf. Syst.</source>
          ,
          <volume>20</volume>
          (
          <issue>4</issue>
          ):
          <fpage>357</fpage>
          –
          <lpage>389</lpage>
          , Oct.
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>G.</given-names>
            <surname>Amati</surname>
          </string-name>
          and
          <string-name>
            <given-names>C. J.</given-names>
            <surname>van Rijsbergen</surname>
          </string-name>
          .
          <article-title>Probabilistic models of information retrieval based on measuring the divergence from randomness</article-title>
          .
          <source>ACM Trans. Inf. Syst.</source>
          ,
          <volume>20</volume>
          (
          <issue>4</issue>
          ):
          <fpage>357</fpage>
          –
          <lpage>389</lpage>
          , October
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>T.</given-names>
            <surname>Beckers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Fuhr</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Pharo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Nordlie</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K. N.</given-names>
            <surname>Fachry</surname>
          </string-name>
          .
          <article-title>Overview and results of the INEX 2009 interactive track</article-title>
          .
          In
          <source>Research and Advanced Technology for Digital Libraries, 14th European Conference, ECDL 2010, Glasgow, UK, September 6-10, 2010. Proceedings</source>
          , pages
          <fpage>409</fpage>
          –
          <lpage>412</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>N. J.</given-names>
            <surname>Belkin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. B.</given-names>
            <surname>Kantor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. A.</given-names>
            <surname>Fox</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Shaw</surname>
          </string-name>
          .
          <article-title>Combining the evidence of multiple query representations for information retrieval</article-title>
          .
          <source>Inf. Process. Manage.</source>
          ,
          <volume>31</volume>
          (
          <issue>3</issue>
          ):
          <fpage>431</fpage>
          –
          <lpage>448</lpage>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>C.</given-names>
            <surname>Benkoussas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Hamdan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Albitar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ollagnier</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Bellot</surname>
          </string-name>
          .
          <article-title>Collaborative filtering for book recommandation</article-title>
          . In
          <source>Working Notes for CLEF 2014 Conference, Sheffield, UK, September 15-18, 2014</source>
          , pages
          <fpage>501</fpage>
          –
          <lpage>507</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>L.</given-names>
            <surname>Bonnefoy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Deveaud</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Bellot</surname>
          </string-name>
          .
          <article-title>Do social information help book search?</article-title>
          In
          <string-name>
            <given-names>P.</given-names>
            <surname>Forner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Karlgren</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Womser-Hacker</surname>
          </string-name>
          , editors,
          <source>CLEF (Online Working Notes/Labs/Workshop)</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>H.</given-names>
            <surname>Bouchard</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.-Y.</given-names>
            <surname>Nie</surname>
          </string-name>
          .
          <article-title>Modèles de langue appliqués à la recherche d'information contextuelle</article-title>
          . In
          <source>CORIA</source>
          , pages
          <fpage>213</fpage>
          –
          <lpage>224</lpage>
          . Université de Lyon,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>W. B.</given-names>
            <surname>Croft</surname>
          </string-name>
          .
          <article-title>Organizing and searching large files of document descriptions</article-title>
          .
          <source>PhD thesis</source>
          , Cambridge University,
          <year>1978</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>R.</given-names>
            <surname>Guillén</surname>
          </string-name>
          .
          <article-title>GIR with language modeling and DFR using Terrier</article-title>
          . In C. Peters,
          <string-name>
            <given-names>T.</given-names>
            <surname>Deselaers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gonzalo</surname>
          </string-name>
          , G. Jones,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kurimo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mandl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Peñas</surname>
          </string-name>
          , and V. Petras, editors,
          <source>Evaluating Systems for Multilingual and Multimodal Information Access</source>
          , volume
          <volume>5706</volume>
          of Lecture Notes in Computer Science, pages
          <fpage>822</fpage>
          –
          <lpage>829</lpage>
          . Springer Berlin Heidelberg,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>K.</given-names>
            <surname>Järvelin</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Kekäläinen</surname>
          </string-name>
          .
          <article-title>IR evaluation methods for retrieving highly relevant documents</article-title>
          . In E. Yannakoudakis,
          <string-name>
            <given-names>N.</given-names>
            <surname>Belkin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Ingwersen</surname>
          </string-name>
          , and M.-K. Leong, editors,
          <source>Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2000)</source>
          , pages
          <fpage>41</fpage>
          –
          <lpage>48</lpage>
          , New York, NY, USA,
          <year>2000</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Y.-M.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bellot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Faath</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Dacos</surname>
          </string-name>
          .
          <article-title>Automatic annotation of bibliographical references in digital humanities books, articles and blogs</article-title>
          . In G. Kazai,
          <string-name>
            <given-names>C.</given-names>
            <surname>Eickhoff</surname>
          </string-name>
          , and P. Brusilovsky, editors,
          <source>BooksOnline</source>
          , pages
          <fpage>41</fpage>
          –
          <lpage>48</lpage>
          . ACM,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>M.</given-names>
            <surname>Koolen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Bogers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kamps</surname>
          </string-name>
          , G. Kazai, and
          <string-name>
            <given-names>M.</given-names>
            <surname>Preminger</surname>
          </string-name>
          .
          <article-title>Overview of the INEX 2014 social book search track</article-title>
          .
          In
          <source>Working Notes for CLEF 2014 Conference, Sheffield, UK, September 15-18, 2014</source>
          , pages
          <fpage>462</fpage>
          –
          <lpage>479</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>O.</given-names>
            <surname>Kurland</surname>
          </string-name>
          and
          <string-name>
            <given-names>L.</given-names>
            <surname>Lee</surname>
          </string-name>
          .
          <article-title>PageRank without hyperlinks: Structural re-ranking using links induced by language models</article-title>
          .
          In
          <source>Proceedings of SIGIR</source>
          , pages
          <fpage>306</fpage>
          –
          <lpage>313</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>J. H.</given-names>
            <surname>Lee</surname>
          </string-name>
          .
          <article-title>Combining multiple evidence from different properties of weighting schemes</article-title>
          . In
          <source>Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '95</source>
          , pages
          <fpage>180</fpage>
          –
          <lpage>188</lpage>
          , New York, NY, USA,
          <year>1995</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>J.</given-names>
            <surname>Lin</surname>
          </string-name>
          .
          <article-title>PageRank without hyperlinks: Reranking with PubMed related article networks for biomedical text retrieval</article-title>
          .
          <source>BMC Bioinformatics</source>
          ,
          <volume>9</volume>
          (
          <issue>1</issue>
          ),
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>J.</given-names>
            <surname>Lin</surname>
          </string-name>
          .
          <article-title>PageRank without hyperlinks: Reranking with PubMed related article networks for biomedical text retrieval</article-title>
          .
          <source>BMC Bioinformatics</source>
          ,
          <volume>9</volume>
          (
          <issue>1</issue>
          ),
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>D.</given-names>
            <surname>Metzler</surname>
          </string-name>
          and
          <string-name>
            <given-names>W. B.</given-names>
            <surname>Croft</surname>
          </string-name>
          .
          <article-title>Combining the language model and inference network approaches to retrieval</article-title>
          .
          <source>Inf. Process. Manage.</source>
          ,
          <volume>40</volume>
          (
          <issue>5</issue>
          ):
          <fpage>735</fpage>
          –
          <lpage>750</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>D.</given-names>
            <surname>Metzler</surname>
          </string-name>
          and
          <string-name>
            <given-names>W. B.</given-names>
            <surname>Croft</surname>
          </string-name>
          .
          <article-title>A Markov random field model for term dependencies</article-title>
          . In
          <source>Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '05</source>
          , pages
          <fpage>472</fpage>
          –
          <lpage>479</lpage>
          , New York, NY, USA,
          <year>2005</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>D.</given-names>
            <surname>Metzler</surname>
          </string-name>
          and
          <string-name>
            <given-names>W. B.</given-names>
            <surname>Croft</surname>
          </string-name>
          .
          <article-title>A Markov random field model for term dependencies</article-title>
          . In
          <string-name>
            <given-names>R. A.</given-names>
            <surname>Baeza-Yates</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ziviani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Marchionini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Moffat</surname>
          </string-name>
          , and J. Tait, editors,
          <source>SIGIR</source>
          , pages
          <fpage>472</fpage>
          –
          <lpage>479</lpage>
          . ACM,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>I.</given-names>
            <surname>Ounis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Amati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Plachouras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Macdonald</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Lioma</surname>
          </string-name>
          .
          <article-title>Terrier: A High Performance and Scalable Information Retrieval Platform</article-title>
          .
          In
          <source>Proceedings of ACM SIGIR'06 Workshop on Open Source Information Retrieval (OSIR 2006)</source>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>I.</given-names>
            <surname>Ounis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Amati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Plachouras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Macdonald</surname>
          </string-name>
          , and
          <string-name>
            <surname>Johnson</surname>
          </string-name>
          .
          <article-title>Terrier Information Retrieval Platform</article-title>
          . In
          <source>Proceedings of the 27th European Conference on IR Research (ECIR 2005)</source>
          , volume
          <volume>3408</volume>
          of Lecture Notes in Computer Science, pages
          <fpage>517</fpage>
          –
          <lpage>519</lpage>
          . Springer,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>I.</given-names>
            <surname>Ounis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lioma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Macdonald</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V.</given-names>
            <surname>Plachouras</surname>
          </string-name>
          .
          <article-title>Research directions in Terrier: a search engine for advanced retrieval on the web</article-title>
          .
          <source>Novatica/UPGRADE Special Issue on Web Information Access</source>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>L.</given-names>
            <surname>Page</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Brin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Motwani</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Winograd</surname>
          </string-name>
          .
          <article-title>The PageRank citation ranking: Bringing order to the web</article-title>
          . In
          <source>Proceedings of the 7th International World Wide Web Conference</source>
          , pages
          <fpage>161</fpage>
          –
          <lpage>172</lpage>
          , Brisbane, Australia,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>L.</given-names>
            <surname>Page</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Brin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Motwani</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Winograd</surname>
          </string-name>
          .
          <article-title>The PageRank citation ranking: Bringing order to the web</article-title>
          .
          <source>Technical Report 1999-66</source>
          , Stanford InfoLab, November
          <year>1999</year>
          . Previous number = SIDL-WP-1999-0120.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>V.</given-names>
            <surname>Plachouras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>I.</given-names>
            <surname>Ounis</surname>
          </string-name>
          .
          <article-title>University of Glasgow at TREC 2004: Experiments in web, robust, and terabyte tracks with Terrier</article-title>
          . In
          <string-name>
            <given-names>E. M.</given-names>
            <surname>Voorhees</surname>
          </string-name>
          and
          <string-name>
            <given-names>L. P.</given-names>
            <surname>Buckland</surname>
          </string-name>
          , editors,
          <source>TREC</source>
          , volume Special Publication 500-261. National Institute of Standards and Technology (NIST)
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Ponte</surname>
          </string-name>
          and
          <string-name>
            <given-names>W. B.</given-names>
            <surname>Croft</surname>
          </string-name>
          .
          <article-title>A language modeling approach to information retrieval</article-title>
          .
          In
          <source>Proc. SIGIR</source>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>S. E.</given-names>
            <surname>Robertson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. J.</given-names>
            <surname>van Rijsbergen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M. F.</given-names>
            <surname>Porter</surname>
          </string-name>
          .
          <article-title>Probabilistic models of indexing and searching</article-title>
          . In
          <source>SIGIR</source>
          , pages
          <fpage>35</fpage>
          –
          <lpage>56</lpage>
          ,
          <year>1980</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>F.</given-names>
            <surname>Song</surname>
          </string-name>
          and
          <string-name>
            <given-names>W.</given-names>
            <surname>Croft</surname>
          </string-name>
          .
          <article-title>A general language model for information retrieval</article-title>
          .
          In
          <source>Proceedings of the SIGIR Conference on Information Retrieval</source>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>T.</given-names>
            <surname>Tao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Mei</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhai</surname>
          </string-name>
          .
          <article-title>Language model information retrieval with document expansion</article-title>
          . In
          <string-name>
            <given-names>R. C.</given-names>
            <surname>Moore</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Bilmes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chu-Carroll</surname>
          </string-name>
          , and M. Sanderson, editors,
          <source>HLT-NAACL</source>
          . The Association for Computational Linguistics
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhai</surname>
          </string-name>
          .
          <article-title>Statistical Language Models for Information Retrieval</article-title>
          .
          <source>Synthesis Lectures on Human Language Technologies</source>
          . Morgan and Claypool Publishers,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>