<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
<article-title>Are There New BM25 “Expectations”?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Emanuele Di Buccio</string-name>
          <email>dibuccio@dei.unipd.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giorgio Maria Di Nunzio</string-name>
          <email>dinunzio@dei.unipd.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dept. of Information Engineering</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Padua</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper, we present some ideas about possible directions for a new interpretation of the Okapi BM25 ranking formula. In particular, we focus on a full Bayesian approach for deriving a smoothed formula that takes into account a-priori knowledge on the probability of terms. Most of the efforts in improving BM25 have gone into capturing the language model (frequencies, length, etc.), but they missed the fact that the constant 0.5 used as a correction factor is itself a parameter that can be modelled in a better way. This approach has been tested on a visual data mining tool, and the initial results are encouraging.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The relevance weighting model, also known as RSJ after the names of its creators
(Robertson and Sparck Jones), has been one of the most influential models in the
history of Information Retrieval [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. It is a probabilistic model of retrieval that
tries to answer the following question:
      </p>
      <p>
        What is the probability that this document is relevant to this query?
`Query' is a particular instance of an information need, and `document' a
particular content description. The purpose of this question is to rank the documents
in order of their probability of relevance, according to the Probability Ranking
Principle [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]:
      </p>
      <p>If retrieved documents are ordered by decreasing probability of relevance
on the data available, then the system's effectiveness is the best to be
gotten for the data.</p>
      <p>
        The probability of relevance is achieved by assigning weights to terms, the RSJ
weight hereafter named $w_i$, according to the following formula:
$$w_i = \log \frac{p_i \, (1 - q_i)}{q_i \, (1 - p_i)} \; , \quad (1)$$
where $p_i$ is the probability that the document contains the term $t_i$ given that
the document is relevant, and $q_i$ is the probability that the document contains
the term $t_i$ given that the document is not relevant. If the estimates of these
probabilities are computed by means of a maximum likelihood estimation, we
obtain the following results:
$$p_i = \frac{r_i}{R} \; , \quad (2)$$
$$q_i = \frac{n_i - r_i}{N - R} \; , \quad (3)$$
where $r_i$ is the number of relevant documents that contain term $t_i$, $n_i$ the number
of documents that contain term $t_i$, and $R$ and $N$ the number of relevant documents
and the total number of documents, respectively. However, this estimation leads
to arithmetical anomalies; for example, if a term is not present in the set of
relevant documents, its probability $p_i$ is equal to zero and the logarithm of zero
returns minus infinity. In order to avoid this situation, a kind of smoothing
is applied to the probabilities. By substituting Equations 2 and 3 into Equation 1
and adding a constant to smooth the probabilities, we obtain:
$$w_i = \log \frac{(r_i + 0.5)\,(N - R - n_i + r_i + 0.5)}{(n_i - r_i + 0.5)\,(R - r_i + 0.5)} \; , \quad (4)$$
which is the actual RSJ score for a term. The choice of the constant 0.5 may
resemble some Bayesian justification related to the binary independence model.<sup>1</sup>
This idea is wrong, as Robertson and Sparck Jones explained in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], and the real
justification can be traced back to the work of Cox [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
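      <p>As a concrete illustration (a sketch of our own, not code from the Okapi system), the smoothed RSJ weight of Equation 4 can be computed directly from the four collection statistics:</p>
      <preformat>
# Smoothed RSJ weight (Equation 4): r relevant documents containing the term,
# n documents containing the term, R relevant documents, N documents in total.
rsj_weight &lt;- function(r, n, R, N, k = 0.5) {
  log((r + k) * (N - R - n + r + k) / ((n - r + k) * (R - r + k)))
}

# Example with the statistics used later in the paper (Section 3.1):
rsj_weight(r = 3, n = 20, R = 10, N = 1000)
      </preformat>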
      <p>
        The Okapi BM25 weighting schema takes a step further and introduces the
property of eliteness [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]:
      </p>
      <p>Assume that each term represents a concept, and that a given document
is about that concept or not. A term is `elite' in the document or not.</p>
      <p>BM25 estimates the full eliteness weight for a term from the RSJ score, then
approximates the term frequency behaviour with a single global parameter
controlling the rate of approach. Finally, it makes a correction for document length.
For a full explanation of how to interpret eliteness and integrate it into the BM25
formula, read [
        <xref ref-type="bibr" rid="ref6">6</xref>
        –
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. The resulting formula is summarised in the following way:
$$w'_i = f(tf_i) \, w_i \; , \quad (5)$$
where $w_i$ is the RSJ weight, and $f(tf_i)$ is a function of the frequency of the term
$t_i$ parametrized by global parameters.</p>
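      <p>For concreteness, a widely used instantiation of $f(tf_i)$ is the saturating function of BM25 with the usual $k_1$ and $b$ parameters (a sketch following the standard formulation, not a definition given in this paper):</p>
      <preformat>
# BM25 term-frequency saturation with document-length correction:
# tf = term frequency, dl = document length, avdl = average document length.
bm25_tf &lt;- function(tf, dl, avdl, k1 = 1.2, b = 0.75) {
  tf * (k1 + 1) / (k1 * ((1 - b) + b * dl / avdl) + tf)
}

# w'_i = f(tf_i) * w_i, combining it with the RSJ weight sketched above:
bm25_tf(tf = 3, dl = 120, avdl = 100) * rsj_weight(r = 3, n = 20, R = 10, N = 1000)
      </preformat>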
      <p>In this paper, we concentrate on the RSJ weight, and in particular on a full
Bayesian approach for smoothing the probabilities and on a visual data analysis
to assess the effectiveness of these new smoothed probabilities. In Section 2, we
present the Bayesian framework; in Section 3, we describe the visualisation
approach; in Section 4, we describe the initial experiments on this approach.
Some final remarks are given in Section 5.
<sup>1</sup> In this model, documents are represented as binary vectors: a term may be either
present or not present in a document and has a `natural' a priori probability of 0.5.</p>
    </sec>
    <sec id="sec-2">
      <title>Bayesian Framework</title>
      <p>
        In Bayesian inference, a problem is described by a mathematical model $M$ with
parameters $\theta$ and, when we have observed some data $D$, we use Bayes' rule to
determine our beliefs across different parameter values [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]:
      </p>
      <p>$$P(\theta \mid D, M) = \frac{P(D \mid \theta, M) \, P(\theta \mid M)}{P(D \mid M)} \; ; \quad (6)$$</p>
      <p>
        the posterior distribution of our belief on $\theta$ is equal to a likelihood function
$P(D \mid \theta, M)$, the mathematical model of our problem, multiplied by a prior
distribution $P(\theta \mid M)$, our belief in the values of the parameters of the model, and
normalized by the probability of the data $P(D \mid M)$. We control the prior by
choosing its distributional form along with its parameters, usually called
hyper-parameters. Since the product between $P(D \mid \theta, M)$ and $P(\theta \mid M)$ can be hard
to calculate, one solution is to find a “conjugate” prior of the likelihood
function [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
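      <p>To make the conjugacy argument concrete: for a Bernoulli likelihood with a beta prior, the posterior is again a beta distribution, so the update reduces to adding counts to the hyper-parameters. A minimal sketch of our own (not taken from [10]):</p>
      <preformat>
# Beta-Bernoulli conjugacy: observing k successes in n Bernoulli trials turns
# a beta(alpha, beta) prior into a beta(alpha + k, beta + n - k) posterior.
alpha &lt;- 0.5; beta &lt;- 0.5   # hyper-parameters of the prior
k &lt;- 3; n &lt;- 10             # observed data

posterior_mean &lt;- (alpha + k) / (alpha + beta + n)
posterior_mean              # 3.5 / 11: the form of the estimate in Equation 10
      </preformat>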
      <p>In the case of a likelihood function which belongs to the exponential family,
there always exists a conjugate prior. Naïve Bayes (NB) models have a
likelihood of this type and, since the RSJ weight is related to the Binary
Independence Model, which is a multi-variate Bernoulli NB model, we can easily derive
a formula to estimate the parameter $\theta$. The multi-variate Bernoulli NB model
represents a document $d$ as a vector of $V$ (the number of words in the vocabulary)
Bernoulli random variables $d = (t_1, \ldots, t_i, \ldots, t_V)$ such that:
$$t_i \sim \mathrm{Bern}(\theta_{t_i}) \; . \quad (7)$$
We can write the probability of a document by using the NB assumption as:
$$P(d \mid \theta) = \prod_{i=1}^{V} \theta_i^{x_i} \, (1 - \theta_i)^{1 - x_i} \; , \quad (8)$$
where $x_i$ is a binary value that is equal to 1 when the term $t_i$ is present
in the document and to 0 otherwise. With a Maximum Likelihood estimation, we
would end up with the results shown in Equations 2 and 3; instead, we want to
integrate the conjugate prior, which in the case of a Bernoulli random variable
is the beta function:
$$\mathrm{beta}(\theta_i) \propto \theta_i^{\alpha - 1} \, (1 - \theta_i)^{\beta - 1} \; , \quad (9)$$
where $\theta_i$ refers to the $i$-th random variable $t_i$. Therefore, the new estimate of the
probability of a term $t_i$ that takes into account the prior knowledge is given
by the posterior mean of Eq. 6 (see [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] for the details of this result). For the
relevant documents we obtain:
$$\hat{\theta}_{t_i \mid rel} = \frac{r_i + \alpha}{R + \alpha + \beta} = \hat{p}_i \; , \quad (10)$$
where $\hat{p}_i$ is the new estimate of the probability $p_i$. Accordingly, the probability
of a term in the non-relevant documents is:
$$\hat{\theta}_{t_i \mid \overline{rel}} = \frac{n_i - r_i + \alpha}{N - R + \alpha + \beta} = \hat{q}_i \; . \quad (11)$$
With these formulas, we can recover different smoothing approaches; for example,
with $\alpha = 0$ and $\beta = 0$ we obtain the Maximum Likelihood estimation, and with
$\alpha = 1$, $\beta = 1$ the Laplace smoothing. We can even recover the RSJ score by
assigning $\alpha = 0.5$ and $\beta = 0.5$.</p>
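      <p>The two estimators of Equations 10 and 11 are immediate to implement; a small sketch, where the choice of alpha and beta recovers the smoothing variants just listed:</p>
      <preformat>
# Posterior-mean estimates of p_i (Equation 10) and q_i (Equation 11).
p_hat &lt;- function(r, R, alpha, beta) (r + alpha) / (R + alpha + beta)
q_hat &lt;- function(r, n, R, N, alpha, beta) (n - r + alpha) / (N - R + alpha + beta)

p_hat(r = 3, R = 10, alpha = 0,   beta = 0)    # Maximum Likelihood (Equation 2)
p_hat(r = 3, R = 10, alpha = 1,   beta = 1)    # Laplace smoothing
p_hat(r = 3, R = 10, alpha = 0.5, beta = 0.5)  # the 0.5 constant of Equation 4
      </preformat>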
    </sec>
    <sec id="sec-3">
      <title>Probabilistic Visual Data Mining</title>
      <p>Now that we have new estimates for the probabilities $p_i$ and $q_i$, we need a way
to assess how the parameters $\alpha$ and $\beta$ influence the effectiveness of the retrieval
system. In [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ,
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], we presented a visual data mining tool for analyzing the
behavior of various smoothing methods, to suggest possible directions for finding
the most suitable smoothing parameters, and to shed light on new methods
of automatic hyper-parameter estimation. Here, we use the same approach for
analyzing a simplified version of BM25 (that is, Equation 5 ignoring the term
frequency function).</p>
      <p>In order to explain the visual approach, we present the problem of retrieval
in terms of a classification problem: classify the documents as relevant or
non-relevant. Given a document $d$ and a query $q$, we consider $d$ relevant if:
$$P(rel \mid d, q) &gt; P(\overline{rel} \mid d, q) \; , \quad (12)$$
that is, when the probability of being relevant is higher than the probability
of not being relevant. By using Bayes' rule, we can invert the problem and
decide that $d$ is relevant when:</p>
      <p>
$$P(d \mid rel, q) \, P(rel \mid q) &gt; P(d \mid \overline{rel}, q) \, P(\overline{rel} \mid q) \; . \quad (13)$$
Note that we are exactly in the same situation as Equation (2.2) of [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], where:
$$P(rel \mid d, q) \propto P(d \mid rel, q) \, P(rel \mid q) \; . \quad (14)$$
In fact, if we divide both members of Equation 13 by $P(d \mid \overline{rel}, q) \, P(\overline{rel} \mid q)$ (we
assume that this quantity is strictly greater than zero), we obtain:
$$\frac{P(d \mid rel, q) \, P(rel \mid q)}{P(d \mid \overline{rel}, q) \, P(\overline{rel} \mid q)} &gt; 1 \; , \quad (15)$$
where the ranking of the documents is given by the value of the ratio on the left
(as in BM25); moreover, we can classify a document as `relevant' if this ratio
is greater than one.</p>
      <p>The main idea of the two-dimensional visualization of probabilistic models
is to keep the two probabilities separated and use the two numbers as two
coordinates, X and Y, on the Cartesian plane:
$$\underbrace{P(d \mid rel, q) \, P(rel \mid q)}_{X} &gt; \underbrace{P(d \mid \overline{rel}, q) \, P(\overline{rel} \mid q)}_{Y} \; . \quad (16)$$
If we take the logs, a monotonic transformation that maintains the order, and if
we model the document as a multivariate binomial (as in the Binary
Independence Model [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]), we obtain for the coordinate X:</p>
      <p>
$$X = \underbrace{\sum_{i \in V} x_i \log \frac{\hat{p}_i}{1 - \hat{p}_i} + \sum_{i \in V} \log(1 - \hat{p}_i)}_{P(d \mid rel, q)} + \underbrace{\log P(rel \mid q)}_{P(rel \mid q)} \; .$$
Since we are using the Bayesian estimate $\hat{p}_i$, we can modulate it by adjusting
the hyper-parameters $\alpha$ and $\beta$ of Equation 10. If we want to consider only the terms
that appear in the query, the first sum is computed over the terms $i \in q$, which
corresponds to Equation (2.6) of [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
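      <p>Under this formulation, the coordinates of a document are simple sums over the vocabulary; a minimal sketch, where the binary vector x, the vectors p and q of estimates, and the prior probability prior_rel are placeholders of our own (the Y coordinate is obtained symmetrically with $\hat{q}_i$):</p>
      <preformat>
# Coordinates of one document on the plane (logs of the two sides of Eq. 16).
# x: binary term-occurrence vector; p, q: vectors of p_hat and q_hat values.
doc_coordinates &lt;- function(x, p, q, prior_rel = 0.5) {
  X &lt;- sum(x * log(p / (1 - p))) + sum(log(1 - p)) + log(prior_rel)
  Y &lt;- sum(x * log(q / (1 - q))) + sum(log(1 - q)) + log(1 - prior_rel)
  c(X = X, Y = Y)
}
      </preformat>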
      <p>We intentionally kept explicit the two addends that are independent
of the document, respectively $\sum_{i \in V} \log(1 - \hat{p}_i)$ and $\log(P(rel \mid q))$. These two
addends do not influence the ordering among documents (they are constant factors
independent of the document) but they can (and actually do) affect the
classification performance. If we rewrite the complete inequality and substitute
these addends with constants, we obtain:<sup>2</sup>
$$\sum_{i \in q} x_i \log \frac{\hat{p}_i}{1 - \hat{p}_i} + c_1 &gt; \sum_{i \in q} x_i \log \frac{\hat{q}_i}{1 - \hat{q}_i} + c_2 \quad (17)$$
$$\sum_{i \in q} x_i \log \frac{\hat{p}_i}{1 - \hat{p}_i} - \sum_{i \in q} x_i \log \frac{\hat{q}_i}{1 - \hat{q}_i} &gt; c_2 - c_1 \quad (18)$$
$$\sum_{i \in q} x_i \left( \log \frac{\hat{p}_i}{1 - \hat{p}_i} - \log \frac{\hat{q}_i}{1 - \hat{q}_i} \right) &gt; c_2 - c_1 \quad (19)$$
$$\underbrace{\sum_{i \in q} x_i \log \frac{\hat{p}_i \, (1 - \hat{q}_i)}{(1 - \hat{p}_i) \, \hat{q}_i}}_{RSJ} &gt; c_2 - c_1 \; , \quad (20)$$
which is exactly the same formulation of the RSJ weight with new estimates for $p_i$
and $q_i$, plus some indication about whether we classify a document as relevant
or not.
<sup>2</sup> Note that we need to investigate how this reformulation is related to Cooper's linked
dependence assumption [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].</p>
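      <p>Rewritten this way, ranking and classification rely on the same quantity: the RSJ-style sum on the left of Equation 20, compared against the threshold $c_2 - c_1$. A small sketch:</p>
      <preformat>
# RSJ score of Equation 20 restricted to query terms, and the resulting
# classification rule: relevant if the score exceeds c2 - c1.
rsj_score &lt;- function(x, p, q) sum(x * log((p * (1 - q)) / ((1 - p) * q)))

classify &lt;- function(x, p, q, c1 = 0, c2 = 0) rsj_score(x, p, q) &gt; c2 - c1
      </preformat>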
        <sec id="sec-2-2-1">
          <title>A simple example</title>
          <p>Let us consider a collection of 1,000 documents, and suppose that we have a query
with two terms, $q = \{t_1, t_2\}$, and the following estimates:
– 10 relevant documents ($R = 10$) for this query;
– 20 documents that contain term $t_1$ ($n_1 = 20$), three of which are known
to be relevant ($r_1 = 3$);
– 17 documents that contain term $t_2$ ($n_2 = 17$), two of which are known to
be relevant ($r_2 = 2$).</p>
          <p>For the log odds, we have:
$$\omega_1 = \log \frac{\hat{p}_1}{1 - \hat{p}_1} = \log \frac{3 + \alpha}{7 + \beta} \; , \qquad
\bar{\omega}_1 = \log \frac{\hat{q}_1}{1 - \hat{q}_1} = \log \frac{17 + \alpha}{973 + \beta} \; ,$$
$$\omega_2 = \log \frac{\hat{p}_2}{1 - \hat{p}_2} = \log \frac{2 + \alpha}{8 + \beta} \; , \qquad
\bar{\omega}_2 = \log \frac{\hat{q}_2}{1 - \hat{q}_2} = \log \frac{15 + \alpha}{975 + \beta} \; ,$$
where $\omega_i$ and $\bar{\omega}_i$ denote the log odds in the relevant and non-relevant class,
respectively. Suppose that we want to rank two documents $d_1$ and $d_2$, where $d_1$ contains both
terms $t_1$ and $t_2$, while $d_2$ contains only term $t_1$. Let us draw the points in the
two-dimensional space, assuming the two constants $c_1$ and $c_2$ equal to zero:
$$X_{d_1} = x_{1,d_1} \, \omega_1 + x_{2,d_1} \, \omega_2 = 1 \, \omega_1 + 1 \, \omega_2 \simeq 2.86 \; ,$$
$$Y_{d_1} = x_{1,d_1} \, \bar{\omega}_1 + x_{2,d_1} \, \bar{\omega}_2 = 1 \, \bar{\omega}_1 + 1 \, \bar{\omega}_2 \simeq 11.77 \; ,$$
$$X_{d_2} = x_{1,d_2} \, \omega_1 + x_{2,d_2} \, \omega_2 = 1 \, \omega_1 + 0 \, \omega_2 \simeq 1.10 \; ,$$
$$Y_{d_2} = x_{1,d_2} \, \bar{\omega}_1 + x_{2,d_2} \, \bar{\omega}_2 = 1 \, \bar{\omega}_1 + 0 \, \bar{\omega}_2 \simeq 5.80 \; ,$$
where $x_{i,d_j} = 1$ if term $t_i$ occurs in document $d_j$, and $x_{i,d_j} = 0$ otherwise.</p>
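          <p>The example can be reproduced with the estimators sketched above (a sketch; the exact values plotted in Figure 1 also depend on the formulation of the document-independent addends, so the output is only indicative):</p>
          <preformat>
# Worked example of Section 3.1 with alpha = beta = 0.5, R = 10, N = 1000.
alpha &lt;- 0.5; beta &lt;- 0.5; R &lt;- 10; N &lt;- 1000

p &lt;- c(p_hat(3, R, alpha, beta),         p_hat(2, R, alpha, beta))
q &lt;- c(q_hat(3, 20, R, N, alpha, beta),  q_hat(2, 17, R, N, alpha, beta))

omega     &lt;- log(p / (1 - p))   # log odds in the relevant class
omega_bar &lt;- log(q / (1 - q))   # log odds in the non-relevant class

# d1 contains both terms, d2 contains only t1:
X &lt;- c(d1 = sum(c(1, 1) * omega),     d2 = sum(c(1, 0) * omega))
Y &lt;- c(d1 = sum(c(1, 1) * omega_bar), d2 = sum(c(1, 0) * omega_bar))
          </preformat>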
          <p>In Figure 1, the two points $(X_{d_1}, Y_{d_1})$ and $(X_{d_2}, Y_{d_2})$ are shown. The line
is a graphical help to indicate which point is ranked first: the closer the point,
the higher the document in the ranking. The justification of this statement is not
presented in this paper for space reasons; refer to [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] for further details. What
is important here is the possibility to assess the influence of the parameters $\alpha$ and
$\beta$ on the RSJ score. The objective is to study whether these two parameters can
drastically change the ranking of the documents or not; in graphical terms, whether
we can “rotate” the points such that the point closest to the line becomes the furthest.</p>
          <p>Moreover, there are some considerations we want to address:
– when the number of terms in the query is small, it is very difficult to note
any change in the ranking list. Remember that with $n$ query terms, we can
only have $2^n$ points (or RSJ scores). In the event of a query consisting of a
single term, all the documents that contain that query term collapse into one
point;
– the Okapi BM25 weight `scatters' the documents that are collapsed into one
point in the space by multiplying the RSJ score with a scaling factor $f(tf_i)$
proportional to the frequency of the term in the document. Therefore, we
expect this Bayesian approach to be more effective on BM25 rather than
on the simple RSJ score.</p>
        </sec>
        <sec id="sec-2-2-2">
          <title>Visualization Tool</title>
          <p>The visualisation tool was designed and developed in R [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. It consists of three
panels:</p>
          <p>– View Panel: this displays the two-dimensional plot of the dataset according
to the choices of the user;
– Interaction Panel: this allows for the interaction between the user and the
parameters of the probabilistic models;
– Performance Panel: this displays the performance measures of the model.</p>
          <p>Figure 2 shows the main window with the three panels. In the centre-right,
there is the main view panel, the actual two-dimensional view of the documents
as points, blue and red for relevant and non-relevant, respectively. The green
line represents the ranking line: the closer the point, the higher the rank in the
retrieval list. At the top and on the left, there is the interaction panel where
the user can choose different options: the type of model (Bernoulli in our
case), the type of smoothing (conjugate prior), and the values of the parameters
$\alpha$ and $\beta$. The bottom of the window is dedicated to the performance in terms of
classification (not used in this experiment).</p>
        </sec>
      </sec>
    <sec id="sec-4">
      <title>Experiments</title>
      <p>Preliminary experiments were carried out on some topics of the TREC 2001
Ad-hoc Web Track test collection.<sup>3</sup> The content of each document was processed
during indexing, except for the text contained inside the &lt;script&gt;&lt;/script&gt;
and the &lt;style&gt;&lt;/style&gt; tags. When parsing, the title of the document was
extracted and considered as the beginning of the document content. Stop words
were removed during indexing.<sup>4</sup> For each topic we considered the set of
documents in the pool, therefore those for which explicit assessments are available.
<sup>3</sup> http://trec.nist.gov/data/t10.web.html</p>
      <p>We considered two different experimental settings: (i) a query-term based
representation and (ii) a collection vocabulary-based representation of the documents.
In the former case, each document was represented by means of the descriptors
extracted from the titles of the TREC topics, used as queries: therefore $V$
consisted of the query terms; in the latter case, $V$ consisted of the entire collection
vocabulary. Neither setting considered stop words as part of $V$.
<sup>4</sup> The stop word list is the one available at the URL
http://ir.dcs.gla.ac.uk/resources/linguistic_utils/stop_words</p>
      <p>In this paper, we report the experiments on topic 528. We selected this query
because it contains five terms, which makes it easier to show the effect of the
hyper-parameters. In Figure 2, the cloud of points generated by the two-dimensional
approach is shown. Parameters $\alpha$ and $\beta$ are set to the standard RSJ score
constant 0.5. The line corresponds to the decision line of a classifier, and it also
corresponds to the `ranking' line: imagine this line spanning the plane from right
to left; each time the line touches a document, the document is added to the list
of retrieved documents.</p>
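      <p>The effect of the hyper-parameters can also be explored outside the interactive tool by recomputing the coordinates over a grid of values; a sketch reusing the functions defined earlier, which shows the drift of the points described next:</p>
      <preformat>
# Recompute the X coordinate of the two example documents for increasing
# alpha, keeping beta fixed at 0.5, to observe the rotation of the points.
for (alpha in c(0.5, 2, 10)) {
  p &lt;- c(p_hat(3, 10, alpha, 0.5), p_hat(2, 10, alpha, 0.5))
  omega &lt;- log(p / (1 - p))
  cat("alpha =", alpha, " X_d1 =", sum(omega), " X_d2 =", omega[1], "\n")
}
      </preformat>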
      <p>In Figure 3, the hyper-parameter $\alpha$ was increased while $\beta$ was left equal to
0.5. When we increase $\alpha$, the probability $\hat{p}_i$ tends to one, and the effect, in terms
of the two-dimensional plot, is that the points rotate anti-clockwise. In Figure 4,
the opposite effect is obtained by increasing $\beta$ and leaving $\alpha$ equal to 0.5. In
both situations, the list of ranked documents was significantly different from the
original list produced by using the classical RSJ score.</p>
    </sec>
    <sec id="sec-5">
      <title>Final Remarks</title>
      <p>This paper presents a new direction for the study of the Okapi BM25 model. In
particular, we have focused on a full Bayesian approach for deriving a smoothed
formula that takes into account our a-priori knowledge on the probability of
terms. In fact, we think that many of the efforts in improving BM25 went
mostly into capturing the language model (frequencies, length, etc.) but
missed the fact that the 0.5 correction factor could be one of the parameters
that can be modelled in a better way.</p>
      <p>By starting from a slightly different approach, the classification of documents
into relevant and non-relevant classes, we derived the exact same formula of the
RSJ weight but with more degrees of interaction. The two-dimensional
visualization approach helped in understanding why some of the constant factors can
be taken into account for the case of classification and, more importantly, how
the hyper-parameters can be tuned to obtain a better ranking.</p>
      <p>After this preliminary experiment, we can draw some considerations: for the
first time, it was possible to visualize the clusters of points that are generated by
the RSJ scores; it was clear that very short queries tend to create a very small
number of points, making it hard to perform a good retrieval; hyper-parameters
do make a difference in both classification and retrieval.</p>
      <p>There are still many open research questions we want to investigate in the
future:
– so far, we have assumed that all the beta priors associated to each term
use exactly the same values for the hyper-parameters $\alpha$ and $\beta$. A more selective
approach may be more effective;
– the coordinates of the points in the two-dimensional plot take into account
the two constants of Equation 17. In particular, the addend $\sum_{i \in V} \log(1 - \hat{p}_i)$
may be the cause of the `rotation' of the points, hence of the radical change of
the ranking list;
– the current approach assumes that the values of $R$ and $r_i$ are known for
each term in the query: indeed, these values are adopted to estimate the
coordinates of each document. A further research question is the effect of
estimation based on feedback data on the capability of the probabilistic
visual data mining approach adopted in this paper.</p>
      <p>Acknowledgments. This work has been partially supported by the
QONTEXT project under grant agreement N. 247590 (FP7/2007-2013).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Robertson</surname>
            ,
            <given-names>S.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sparck Jones</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Relevance weighting of search terms</article-title>
          . In Willett, P., ed.:
          <article-title>Document retrieval systems</article-title>
          . Taylor Graham Publishing, London, UK (
          <year>1988</year>
          )
          <fpage>143</fpage>
          –
          <lpage>160</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Robertson</surname>
            ,
            <given-names>S.E.</given-names>
          </string-name>
          :
          <article-title>The Probability Ranking Principle in IR</article-title>
          .
          <source>Journal of Documentation</source>
          <volume>33</volume>
          (
          <year>1977</year>
          )
          <fpage>294</fpage>
          –
          <lpage>304</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Jones</surname>
            ,
            <given-names>K.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Walker</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Robertson</surname>
            ,
            <given-names>S.E.</given-names>
          </string-name>
          :
          <article-title>A probabilistic model of information retrieval: development and comparative experiments</article-title>
          .
          <source>Inf. Process. Manage</source>
          .
          <volume>36</volume>
          (
          <year>2000</year>
          )
          <fpage>779</fpage>
          –
          <lpage>808</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Cox</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Snell</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>The Analysis of Binary Data</article-title>
          .
          <source>Monographs on Statistics and Applied Probability Series. Chapman &amp; Hall</source>
          (
          <year>1989</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Robertson</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Understanding inverse document frequency: On theoretical arguments for IDF</article-title>
          .
          <source>Journal of Documentation</source>
          . Volume
          <volume>60</volume>
          . (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Robertson</surname>
            ,
            <given-names>S.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Walker</surname>
            ,
            <given-names>S.:</given-names>
          </string-name>
          <article-title>Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval</article-title>
          . In Croft, W.B.,
          <string-name>
            <surname>van Rijsbergen</surname>
          </string-name>
          , C.J., eds.: SIGIR, ACM/Springer (
          <year>1994</year>
          )
          <fpage>232</fpage>
          –
          <lpage>241</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Robertson</surname>
            ,
            <given-names>S.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Walker</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jones</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hancock-Beaulieu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gatford</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Okapi at TREC-3</article-title>
          . In:
          <article-title>Proceedings of the Third Text REtrieval Conference (TREC), Gaithesburg</article-title>
          , USA (
          <year>1994</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Robertson</surname>
            ,
            <given-names>S.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Walker</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>On relevance weights with little relevance information</article-title>
          .
          <source>SIGIR Forum</source>
          <volume>31</volume>
          (
          <year>1997</year>
          )
          <fpage>16</fpage>
          –
          <lpage>24</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Robertson</surname>
            ,
            <given-names>S.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zaragoza</surname>
          </string-name>
          , H.:
          <article-title>The probabilistic relevance framework: BM25 and beyond</article-title>
          .
          <source>Foundations and Trends in Information Retrieval</source>
          <volume>3</volume>
          (
          <year>2009</year>
          )
          <fpage>333</fpage>
          –
          <lpage>389</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Kruschke</surname>
            ,
            <given-names>J.K.</given-names>
          </string-name>
          :
          <article-title>Doing Bayesian Data Analysis: A Tutorial with R and BUGS</article-title>
          . 1st edn. Academic Press/Elsevier (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name><surname>Di Nunzio</surname>, <given-names>G.</given-names></string-name>
          ,
          <string-name><surname>Sordoni</surname>, <given-names>A.</given-names></string-name>
          :
          <article-title>How well do we know Bernoulli?</article-title>
          In: IIR. Volume 835 of CEUR Workshop Proceedings, CEUR-WS.org (
          <year>2012</year>
          )
          <fpage>38</fpage>
          –
          <lpage>44</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name><surname>Di Nunzio</surname>, <given-names>G.</given-names></string-name>
          ,
          <string-name><surname>Sordoni</surname>, <given-names>A.</given-names></string-name>
          :
          <article-title>A visual tool for Bayesian data analysis: The impact of smoothing on naive Bayes text classifiers</article-title>
          . In: Proceedings of the 35th International ACM SIGIR 2012. Volume 1002, Portland, Oregon, USA (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name><surname>Cooper</surname>, <given-names>W.S.</given-names></string-name>
          :
          <article-title>Some inconsistencies and misnomers in probabilistic information retrieval</article-title>
          . In: Proceedings of the 14th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR '91, New York, NY, USA, ACM (
          <year>1991</year>
          )
          <fpage>57</fpage>
          –
          <lpage>61</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name><surname>Di Nunzio</surname>, <given-names>G.</given-names></string-name>
          :
          <article-title>Using scatterplots to understand and improve probabilistic models for text categorization and retrieval</article-title>
          .
          <source>Int. J. Approx. Reasoning</source>
          <volume>50</volume>
          (
          <year>2009</year>
          )
          <fpage>945</fpage>
          –
          <lpage>956</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name><surname>Di Nunzio</surname>, <given-names>G.</given-names></string-name>
          ,
          <string-name><surname>Sordoni</surname>, <given-names>A.</given-names></string-name>
          :
          <article-title>A Visual Data Mining Approach to Parameters Optimization</article-title>
          . In Zhao, Y., Cen, Y., eds.: Data Mining Applications in R. Elsevier (
          <year>2013</year>
          , In Press)
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>