                     The Bivariate 2-Poisson model for IR

Giambattista Amati¹ and Giorgio Gambosi²

¹ Fondazione Ugo Bordoni, Rome, Italy gba@fub.it
² Enterprise Engineering Department, University of Rome Tor Vergata, Rome, Italy
  giorgio.gambosi@uniroma2.it




1     Introduction

Harter’s 2-Poisson model of Information Retrieval is a univariate model of the raw term
frequencies that does not condition the probabilities on document length [2]. We therefore
introduce a bivariate stochastic model that extends Harter’s 2-Poisson model by conditioning
the term frequencies of a document on the document length. We assume Harter’s hypothesis:
the higher the probability f (X = x | L = l) of the term frequency X = x in a document of
length l, the more relevant that document is. The new generalization of the 2-Poisson model
has 5 parameters, which are learned term by term through the EM algorithm over term
frequency data.
We explore the following frameworks:

 – We assume that the observation ⟨x, l⟩ is generated by a mixture of k Bivariate Poisson
   (k-BP) distributions (with k ≥ 2), with or without conditions on the form of the marginal
   of the document length, which can reduce the complexity of the model. For the sake of
   simplicity we here restrict to k = 2. In the case of the 2-BP we also assume the hypothesis
   that the marginal distribution of l is a Poisson. The elite set is generated by the BP of the
   mixture with the higher mean of term frequencies, λ1.
 – The parameter λ3 of the covariate Z3, shared by length and term frequency, could be
   learned from the covariance [3, page 103]. We instead treat Z3 as a latent random variable,
   which is learned by extending the EM algorithm in a standard way.
 – Our plan is to compare the effectiveness of the bivariate 2-Poisson model with standard
   IR models, and in particular with some additional baselines that are obtained within our
   framework as follows:
     • applying the Double Poisson model, which is the 2-BP with independent marginal
       distributions;
     • reducing to the univariate case (the standard 2-Poisson model) by normalizing the
       term frequency x to a smoothed value tfn, for example with Dirichlet smoothing
       (a small numeric sketch follows this list):

\[ tfn = \frac{x + \mu\,\hat{p}}{l + \mu}\cdot\mu' \]

       where µ and µ′ are parameters and p̂ is the term prior.
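For concreteness, the following Python snippet is a minimal sketch of this normalization; the parameter values in the example are purely illustrative and not tuned on any collection.

    def tfn_dirichlet(x, l, mu, mu_prime, p_hat):
        # Dirichlet-smoothed term frequency: tfn = (x + mu * p_hat) / (l + mu) * mu'
        return (x + mu * p_hat) / (l + mu) * mu_prime

    # Example: a term occurring 3 times in a 250-token document,
    # with illustrative values for mu, mu' and the term prior p_hat.
    print(tfn_dirichlet(x=3, l=250, mu=2000, mu_prime=100, p_hat=0.0004))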
2   The Bivariate 2-Poisson distribution

In order to define the bivariate 2-Poisson model we first need to recall the definition of a
bivariate Poisson model, which can be introduced in several ways: for example, as the limit of
a bivariate binomial, as a convolution of three univariate Poisson distributions, or as a com-
pounding of a Poisson with a bivariate binomial. We find the trivariate reduction method (the
convolution) the most convenient for extending Harter’s 2-Poisson model to the bivariate case.
Let us consider the random variables Z1, Z2, Z3 distributed according to Poisson distributions
P(λi), that is,
\[ p(Z_i = x \mid \lambda_i) = e^{-\lambda_i}\,\frac{\lambda_i^x}{x!} \]
and the random variables X = Z1 + Z3 and Y = Z2 + Z3 distributed according to a bivariate
Poisson distribution BP(Λ), where Λ = (λ1, λ2, λ3):
\[ p(X = x, Y = y \mid \Lambda) = e^{-(\lambda_1+\lambda_2+\lambda_3)}\,\frac{\lambda_1^{x}\,\lambda_2^{y}}{x!\,y!}\,\sum_{i=0}^{\min(x,y)} \binom{x}{i}\binom{y}{i}\, i!\left(\frac{\lambda_3}{\lambda_1\lambda_2}\right)^{i} \]
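As an illustration, this pmf can be evaluated directly from the sum above; the Python sketch below is a naive implementation (it computes factorials exactly, so it is only meant for small counts), while [4] describes more efficient recursive schemes.

    import math

    def bp_pmf(x, y, lam1, lam2, lam3):
        # Bivariate Poisson pmf by trivariate reduction: X = Z1 + Z3, Y = Z2 + Z3.
        s = sum(math.comb(x, i) * math.comb(y, i) * math.factorial(i)
                * (lam3 / (lam1 * lam2)) ** i
                for i in range(min(x, y) + 1))
        return (math.exp(-(lam1 + lam2 + lam3))
                * lam1 ** x * lam2 ** y
                / (math.factorial(x) * math.factorial(y)) * s)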

The corresponding marginal distributions turn out to be Poisson:
\[ p(X = x \mid \Lambda) = \sum_{y=0}^{\infty} p(X = x, Y = y \mid \Lambda) = P(\lambda_1 + \lambda_3) \]
\[ p(Y = y \mid \Lambda) = \sum_{x=0}^{\infty} p(X = x, Y = y \mid \Lambda) = P(\lambda_2 + \lambda_3) \]

with covariance Cov(X, Y ) = λ3 .
Let us now consider the mixture 2BP(Λ1, Λ2, α), where Λ1 = (λ11, λ12, λ13) and Λ2 =
(λ21, λ22, λ23), of two bivariate Poisson distributions:
\[ p(x, y \mid \Lambda_1, \Lambda_2, \alpha) = \alpha\, BP(\Lambda_1) + (1-\alpha)\, BP(\Lambda_2) \]
The corresponding marginal distributions are 2-Poisson:
\[ p(x \mid \Lambda_1, \Lambda_2, \alpha) = \alpha\, P(\lambda_{11}+\lambda_{13}) + (1-\alpha)\, P(\lambda_{21}+\lambda_{23}) = 2P(\lambda_{11}+\lambda_{13},\, \lambda_{21}+\lambda_{23},\, \alpha) \]
\[ p(y \mid \Lambda_1, \Lambda_2, \alpha) = \alpha\, P(\lambda_{12}+\lambda_{13}) + (1-\alpha)\, P(\lambda_{22}+\lambda_{23}) = 2P(\lambda_{12}+\lambda_{13},\, \lambda_{22}+\lambda_{23},\, \alpha) \]
In our case, we consider the random variables x, the number of occurrences of the term in the
document, and L− = l − x, the part of the document length not covered by the term occurrences,
and set X = x and Y = L− = l − x (hence, Y can be 0): as a consequence, we have x = X =
Z1 + Z3, L− = Y = Z2 + Z3, and l = X + Y = Z1 + Z2 + 2Z3.
Moreover, we want x to be distributed as a 2-Poisson and L− to be distributed as a Poisson.
By assuming λ12 = λ22 = λ2 and λ13 = λ23 = λ3, we obtain
\[ p(x \mid \Lambda_1, \Lambda_2, \alpha) = \alpha\, P(\lambda_{11}+\lambda_3) + (1-\alpha)\, P(\lambda_{21}+\lambda_3) = 2P(\lambda_{11}+\lambda_3,\, \lambda_{21}+\lambda_3,\, \alpha) \]
\[ p(L^- \mid \Lambda_1, \Lambda_2, \alpha) = \alpha\, P(\lambda_2+\lambda_3) + (1-\alpha)\, P(\lambda_2+\lambda_3) = P(\lambda_2+\lambda_3) \]
This implies that, besides the mixing weight α, the model involves five latent variables,
$Z_1^1$, $Z_1^2$, $Z_2$, $Z_3$, and W: each Z is Poisson distributed with parameter λ11, λ21, λ2, λ3,
respectively, and W is a binary random variable, Bernoulli distributed with parameter α. The
resulting bivariate distribution is
\[ p(x, L^- \mid \Lambda_1, \Lambda_2, \alpha) = \alpha\, p_1(x, L^- \mid \lambda_{11}, \lambda_2, \lambda_3) + (1-\alpha)\, p_2(x, L^- \mid \lambda_{21}, \lambda_2, \lambda_3) = \alpha\, BP(\lambda_{11}, \lambda_2, \lambda_3) + (1-\alpha)\, BP(\lambda_{21}, \lambda_2, \lambda_3) \]
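A sketch of the resulting mixture, reusing the bp_pmf function from the previous snippet; the parameter names mirror λ11, λ21, λ2, λ3, and α above.

    def bivariate_2poisson_pmf(x, l_minus, lam11, lam21, lam2, lam3, alpha):
        # p(x, L^- | Lambda_1, Lambda_2, alpha): two bivariate Poisson components
        # sharing lam2 and lam3, mixed with weight alpha.
        return (alpha * bp_pmf(x, l_minus, lam11, lam2, lam3)
                + (1.0 - alpha) * bp_pmf(x, l_minus, lam21, lam2, lam3))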


3     EM algorithm for the Bivariate Poisson
Given a set of observations X = {x1, . . . , xn}, with xi = (xi, li), we wish to apply maximum
likelihood to estimate the set of parameters Θ of a bivariate Poisson distribution p(x|Θ)
fitting the data, that is, to derive the value of Θ that maximizes the log-likelihood:
\[ \Theta^{*} = \arg\max_{\Theta}\, \log L(\Theta \mid \mathcal{X}) = \arg\max_{\Theta}\, \log \prod_{i=1}^{n} p(\mathbf{x}_i \mid \Theta) \]
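In code, this objective can be sketched generically as follows; pmf stands for whatever density is being fitted (for instance, the bivariate 2-Poisson mixture above with its parameters fixed by a closure), and each observation is a pair (x_i, l_i). Since EM never decreases this quantity, monitoring it after each iteration gives a standard convergence check.

    import math

    def log_likelihood(observations, pmf):
        # log L(Theta | X) = sum_i log p(x_i | Theta)
        return sum(math.log(pmf(x_i, l_i)) for (x_i, l_i) in observations)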

In our case (see also [1]), we are interested in a mixture of two bivariate Poissons with latent
variables $Z_1^1$, $Z_1^2$, $Z_2$, $Z_3$, since with respect to the general case we now have
$Z_2^1 = Z_2^2 = Z_2$ and $Z_3^1 = Z_3^2 = Z_3$. Then, for each observed pair of values
xi = (xi, li), wi = 1 if xi is generated by the first component, and wi = 2 if it is generated by
the second one. Accordingly, $z_i = (z_{i1}^1, z_{i1}^2, z_{i2}, z_{i3})$ is such that
\[ x_i = \begin{cases} z_{i1}^{1} + z_{i3} & \text{if } w_i = 1 \\ z_{i1}^{2} + z_{i3} & \text{if } w_i = 2 \end{cases} \qquad \text{and} \qquad l_i = z_{i2} + z_{i3} \]
The EM algorithm requires, in our case, considering the complete dataset
\[ (\mathcal{X}, \mathcal{Z}) = \{(\mathbf{x}_1, z_1, w_1), \ldots, (\mathbf{x}_n, z_n, w_n)\} \]
and the set of parameters is Θ = Λ1 ∪ Λ2 ∪ {α}, with $\Lambda_k = (\lambda_{k1}, \lambda_2, \lambda_3)$. Let also Λ =
Λ1 ∪ Λ2.

3.1    Maximization
Let us consider the k-th M-step for Θ. We can show the following estimates:
\[ \alpha^{(k)} = \frac{1}{n}\sum_{i=1}^{n} p_i^{(k-1)} \qquad \text{where} \qquad p_i^{(k-1)} = \frac{\alpha^{(k-1)}\, p(\mathbf{x}_i \mid \Lambda_1^{(k-1)})}{\alpha^{(k-1)}\, p(\mathbf{x}_i \mid \Lambda_1^{(k-1)}) + (1-\alpha^{(k-1)})\, p(\mathbf{x}_i \mid \Lambda_2^{(k-1)})} \]

where $p(\cdot \mid \Lambda_j)$ is the bivariate Poisson with parameters $\Lambda_j$, and
\[ \lambda_{11}^{(k)} = \frac{\sum_{i=1}^{n} b_{1i}^{1\,(k-1)}\, p_i^{(k-1)}}{\sum_{i=1}^{n} p_i^{(k-1)}} \qquad\quad \lambda_{21}^{(k)} = \frac{\sum_{i=1}^{n} b_{1i}^{2\,(k-1)}\,\bigl(1 - p_i^{(k-1)}\bigr)}{\sum_{i=1}^{n} \bigl(1 - p_i^{(k-1)}\bigr)} \]
\[ \lambda_{2}^{(k)} = \frac{1}{n}\sum_{i=1}^{n} b_{2i}^{(k-1)} \qquad\quad \lambda_{3}^{(k)} = \frac{1}{n}\sum_{i=1}^{n} b_{3i}^{(k-1)} \]
where $b_{hi}^{j\,(k)} = E[Z_h^j \mid W = j, \mathbf{x}_i, \Lambda^{(k)}]$ and $b_{hi}^{(k)} = E[Z_h \mid \mathbf{x}_i, \Lambda^{(k)}]$, with h = 1, 2, 3.
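A minimal sketch of these updates in Python, assuming the responsibilities p_i and the conditional expectations b (computed in the E-step of the next subsection) are given as plain lists; the names b11, b21, b2, b3 stand for $b^1_{1i}$, $b^2_{1i}$, $b_{2i}$, $b_{3i}$.

    def m_step(p, b11, b21, b2, b3):
        # One M-step: p[i] is the responsibility of the first component for
        # observation i; the b lists hold the E-step conditional expectations.
        n = len(p)
        alpha = sum(p) / n
        lam11 = sum(b11[i] * p[i] for i in range(n)) / sum(p)
        lam21 = (sum(b21[i] * (1 - p[i]) for i in range(n))
                 / sum(1 - q for q in p))
        lam2 = sum(b2) / n
        lam3 = sum(b3) / n
        return alpha, lam11, lam21, lam2, lam3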


3.2    Expectation
We can show that the expectations $b_{1i}^{j\,(k)}$ and $b_{hi}^{(k)}$ are:
\[ b_{3i}^{(k)} = \sum_{r=0}^{\min(x_i, l_i)} r \cdot p(Z_3 = r \mid \mathbf{x}_i, \Lambda^{(k)}) \qquad \text{where } \Lambda^{(k)} = \Lambda_1^{(k)} \cup \Lambda_2^{(k)} \]
\[ \phantom{b_{3i}^{(k)}} = \sum_{r=0}^{\min(x_i, l_i)} r\;\frac{(1-\alpha)\, p(Z_3 = r, \mathbf{x}_i \mid W = 2, \Lambda^{(k)}) + \alpha\, p(Z_3 = r, \mathbf{x}_i \mid W = 1, \Lambda^{(k)})}{p(\mathbf{x}_i \mid \Lambda^{(k)})} \]

\[ b_{1i}^{1\,(k)} = E[X \mid W = 1, \mathbf{x}_i, \Lambda^{(k)}] - E[Z_3 \mid W = 1, \mathbf{x}_i, \Lambda^{(k)}] = x_i - b_{3i}^{1\,(k)} \]
\[ b_{1i}^{2\,(k)} = E[X \mid W = 2, \mathbf{x}_i, \Lambda^{(k)}] - E[Z_3 \mid W = 2, \mathbf{x}_i, \Lambda^{(k)}] = x_i - b_{3i}^{2\,(k)} \]
\[ b_{2i}^{(k)} = E[Y \mid \mathbf{x}_i, \Lambda^{(k)}] - E[Z_3 \mid \mathbf{x}_i, \Lambda^{(k)}] = l_i - b_{3i}^{(k)} \]
where
\[ p(Z_3 = r, \mathbf{x}_i \mid W = j, \Lambda^{(k)}) = P_0(r \mid \lambda_3^{(k)}) \cdot P_0(x_i - r \mid \lambda_{j1}^{(k)}) \cdot P_0(l_i - r \mid \lambda_2^{(k)}) \]
$P_0$ denotes the univariate Poisson, and $p(\mathbf{x}_i \mid \Lambda^{(k)})$ is the mixture of the two bivariate
Poissons. An efficient implementation of the bivariate Poisson through recursion can be found in [4].
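A minimal Python sketch of the quantity $b_{3i}$ for a single observation, written with plain univariate Poisson evaluations; the responsibilities $p_i$ and the remaining expectations follow from it exactly as in the equations above.

    import math

    def poisson0(k, lam):
        # Univariate Poisson P0(k | lam); zero probability for negative counts.
        return 0.0 if k < 0 else math.exp(-lam) * lam ** k / math.factorial(k)

    def e_step_b3(x_i, l_i, lam11, lam21, lam2, lam3, alpha):
        # Joint p(Z3 = r, x_i | W = j, Lambda) as a product of three Poissons.
        def joint(r, lam_j1):
            return (poisson0(r, lam3) * poisson0(x_i - r, lam_j1)
                    * poisson0(l_i - r, lam2))
        rs = range(min(x_i, l_i) + 1)
        # p(x_i | Lambda): mixture probability, obtained by summing out Z3.
        mixture = (alpha * sum(joint(r, lam11) for r in rs)
                   + (1 - alpha) * sum(joint(r, lam21) for r in rs))
        # b_3i = E[Z3 | x_i, Lambda].
        return sum(r * (alpha * joint(r, lam11) + (1 - alpha) * joint(r, lam21))
                   for r in rs) / mixture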


4     Conclusions

We have implemented the EM algorithm for the univariate 2-Poisson model and are currently
extending the implementation to the bivariate case. The implementation, together with the
experimental results, will soon be available at http://tinyurl.com/cfcm8ma.


References
1. Brijs, T., Karlis, D., Swinnen, G., Vanhoof, K., Wets, G., and Manchanda, P. A multivariate
   Poisson mixture model for marketing applications. Statistica Neerlandica 58, 3 (2004), 322–348.
2. Harter, S. P. A probabilistic approach to automatic keyword indexing. Part I: On the distribution
   of specialty words in a technical literature. Journal of the ASIS 26 (1975), 197–216.
3. Kocherlakota, S., and Kocherlakota, K. Bivariate discrete distributions. Marcel Dekker Inc.,
   New York, 1992.
4. Tsiamyrtzis, P., and Karlis, D. Strategies for efficient computation of multivariate Poisson
   probabilities. Communications in Statistics - Simulation and Computation 33, 2 (2004), 271–292.