=Paper=
{{Paper
|id=None
|storemode=property
|title=The Bivariate 2-Poisson Model for IR
|pdfUrl=https://ceur-ws.org/Vol-964/paper2.pdf
|volume=Vol-964
|dblpUrl=https://dblp.org/rec/conf/iir/AmatiG13
}}
==The Bivariate 2-Poisson Model for IR==
Giambattista Amati (Fondazione Ugo Bordoni, Rome, Italy, gba@fub.it) and Giorgio Gambosi (Enterprise Engineering Department, University of Tor Vergata, Rome, Italy, giorgio.gambosi@uniroma2.it)

===1 Introduction===
Harter's 2-Poisson model of Information Retrieval is a univariate model of the raw term frequencies that does not condition the probabilities on document length [2]. We therefore introduce a bivariate stochastic model that extends Harter's 2-Poisson model by conditioning the term frequencies of a document on the document length. We assume Harter's hypothesis: the higher the probability <math>f(X=x \mid L=l)</math> of the term frequency <math>X=x</math> in a document of length <math>l</math>, the more relevant that document is. The new generalization of the 2-Poisson model has 5 parameters, which are learned term by term through the EM algorithm over term frequency data. We explore the following frameworks:
* We assume that an observation <math>\langle x, l\rangle</math> is generated by a mixture of <math>k</math> Bivariate Poisson (<math>k</math>-BP) distributions (with <math>k \ge 2</math>), with or without conditions on the form of the marginal distribution of the document length, which can reduce the complexity of the model. For the sake of simplicity we here restrict ourselves to <math>k=2</math>. In the case of the 2-BP we also assume that the marginal distribution of <math>l</math> is a Poisson. The elite set is generated by the BP component of the mixture with the higher mean term frequency <math>\lambda_1</math>.
* The covariate variable <math>Z_3</math> of length and term frequency, with parameter <math>\lambda_3</math>, could be learned from the covariance [3, page 103]. Instead, we here consider <math>Z_3</math> a latent random variable, which is learned by extending the EM algorithm in a standard way.
* Our plan is to compare the effectiveness of the bivariate 2-Poisson model with standard models of IR, and in particular with some additional baselines that are obtained in our framework as follows:
** applying the Double Poisson Model, which is the 2-BP whose marginal distributions are independent;
** reducing to the univariate case (the standard 2-Poisson model) by normalizing the term frequency <math>x</math> to a smoothed value <math>tfn</math>. For example, we can use Dirichlet smoothing: <math>tfn = \frac{x + \mu \cdot \hat p}{l + \mu}\cdot \mu'</math>, where <math>\mu</math> and <math>\mu'</math> are parameters and <math>\hat p</math> is the term prior.

===2 The Bivariate 2-Poisson distribution===
In order to define the bivariate 2-Poisson model we first recall the definition of the bivariate Poisson model, which can be introduced in several ways, for example as the limit of a bivariate binomial, as a convolution of three univariate Poisson distributions, or as a compounding of a Poisson with a bivariate binomial. We find the trivariate reduction method of the convolution the most convenient way to extend Harter's 2-Poisson model to the bivariate case.

Let us consider the random variables <math>Z_1, Z_2, Z_3</math> distributed according to Poisson distributions <math>\mathcal P(\lambda_i)</math>, that is:
<math>p(Z_i = x \mid \lambda_i) = e^{-\lambda_i}\frac{\lambda_i^x}{x!}</math>
and the random variables <math>X = Z_1 + Z_3</math> and <math>Y = Z_2 + Z_3</math>, distributed according to a bivariate Poisson distribution <math>BP(\Lambda)</math>, where <math>\Lambda = (\lambda_1, \lambda_2, \lambda_3)</math>:
<math>p(X=x, Y=y \mid \Lambda) = e^{-(\lambda_1+\lambda_2+\lambda_3)}\,\frac{\lambda_1^x}{x!}\,\frac{\lambda_2^y}{y!}\sum_{i=0}^{\min(x,y)}\binom{x}{i}\binom{y}{i}\, i!\left(\frac{\lambda_3}{\lambda_1\lambda_2}\right)^i</math>
The corresponding marginal distributions turn out to be Poisson:
<math>p(X=x \mid \Lambda) = \sum_{y=0}^{\infty} p(X=x, Y=y \mid \Lambda) = \mathcal P(\lambda_1+\lambda_3)</math>
<math>p(Y=y \mid \Lambda) = \sum_{x=0}^{\infty} p(X=x, Y=y \mid \Lambda) = \mathcal P(\lambda_2+\lambda_3)</math>
with covariance <math>\mathrm{Cov}(X,Y) = \lambda_3</math>.
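As a concrete illustration of the trivariate-reduction construction above, the following minimal Python sketch (ours, not part of the original paper; all function and variable names are hypothetical) evaluates the bivariate Poisson pmf and numerically compares the marginal of <math>X</math> against the Poisson pmf with rate <math>\lambda_1+\lambda_3</math>.

<syntaxhighlight lang="python">
import math

def poisson_pmf(x, lam):
    """Univariate Poisson pmf P0(x | lam)."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

def bivariate_poisson_pmf(x, y, lam1, lam2, lam3):
    """Bivariate Poisson pmf via trivariate reduction:
    X = Z1 + Z3, Y = Z2 + Z3, with Zi ~ Poisson(lam_i)."""
    const = (math.exp(-(lam1 + lam2 + lam3))
             * lam1 ** x / math.factorial(x)
             * lam2 ** y / math.factorial(y))
    s = sum(math.comb(x, i) * math.comb(y, i) * math.factorial(i)
            * (lam3 / (lam1 * lam2)) ** i
            for i in range(min(x, y) + 1))
    return const * s

if __name__ == "__main__":
    lam1, lam2, lam3 = 0.8, 5.0, 0.3   # arbitrary example rates
    # Marginal of X at x = 2, obtained by summing over y (truncated at 200),
    # should match the Poisson(lam1 + lam3) pmf at 2.
    marginal_x = sum(bivariate_poisson_pmf(2, y, lam1, lam2, lam3) for y in range(200))
    print(marginal_x, poisson_pmf(2, lam1 + lam3))
</syntaxhighlight>

The same two helpers are assumed to be available in the sketches further below.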
Let us now consider the mixture <math>2BP(\Lambda_1, \Lambda_2, \alpha)</math>, where <math>\Lambda_1 = (\lambda_{11}, \lambda_{12}, \lambda_{13})</math> and <math>\Lambda_2 = (\lambda_{21}, \lambda_{22}, \lambda_{23})</math>, of two bivariate Poisson distributions:
<math>p(x, y \mid \Lambda_1, \Lambda_2, \alpha) = \alpha\cdot BP(\Lambda_1) + (1-\alpha)\cdot BP(\Lambda_2)</math>
The corresponding marginal distributions are 2-Poisson:
<math>p(x \mid \Lambda_1, \Lambda_2, \alpha) = \alpha\cdot\mathcal P(\lambda_{11}+\lambda_{13}) + (1-\alpha)\cdot\mathcal P(\lambda_{21}+\lambda_{23}) = 2\mathcal P(\lambda_{11}+\lambda_{13},\, \lambda_{21}+\lambda_{23},\, \alpha)</math>
<math>p(y \mid \Lambda_1, \Lambda_2, \alpha) = \alpha\cdot\mathcal P(\lambda_{12}+\lambda_{13}) + (1-\alpha)\cdot\mathcal P(\lambda_{22}+\lambda_{23}) = 2\mathcal P(\lambda_{12}+\lambda_{13},\, \lambda_{22}+\lambda_{23},\, \alpha)</math>

In our case, we consider the random variables <math>x</math>, the number of occurrences of the term in the document, and <math>L^- = l - x</math>, the document length excluding the term occurrences, and set <math>X = x</math> and <math>Y = L^-</math> (hence, <math>Y</math> may be 0). As a consequence, we have <math>x = X = Z_1 + Z_3</math>, <math>L^- = Y = Z_2 + Z_3</math>, and <math>l = X + Y = Z_1 + Z_2 + 2Z_3</math>. Moreover, we want <math>x</math> to be distributed as a 2-Poisson and <math>L^-</math> to be distributed as a Poisson. By assuming <math>\lambda_{12} = \lambda_{22} = \lambda_2</math> and <math>\lambda_{13} = \lambda_{23} = \lambda_3</math>, we obtain
<math>p(x \mid \Lambda_1, \Lambda_2, \alpha) = \alpha\cdot\mathcal P(\lambda_{11}+\lambda_3) + (1-\alpha)\cdot\mathcal P(\lambda_{21}+\lambda_3) = 2\mathcal P(\lambda_{11}+\lambda_3,\, \lambda_{21}+\lambda_3,\, \alpha)</math>
<math>p(L^- \mid \Lambda_1, \Lambda_2, \alpha) = \alpha\cdot\mathcal P(\lambda_2+\lambda_3) + (1-\alpha)\cdot\mathcal P(\lambda_2+\lambda_3) = \mathcal P(\lambda_2+\lambda_3)</math>
This implies that, apart from <math>\alpha</math>, we assume five latent variables in the model, <math>Z_1^1, Z_1^2, Z_2, Z_3, W</math>: each <math>Z</math> is Poisson distributed with parameter <math>\lambda_{11}, \lambda_{21}, \lambda_2, \lambda_3</math>, respectively, and <math>W</math> is a binary random variable, Bernoulli distributed with parameter <math>\alpha</math>. The resulting bivariate distribution is
<math>p(x, L^- \mid \Lambda_1, \Lambda_2, \alpha) = \alpha\cdot p_1(x, L^- \mid \lambda_{11}, \lambda_2, \lambda_3) + (1-\alpha)\cdot p_2(x, L^- \mid \lambda_{21}, \lambda_2, \lambda_3) = \alpha\cdot BP(\lambda_{11}, \lambda_2, \lambda_3) + (1-\alpha)\cdot BP(\lambda_{21}, \lambda_2, \lambda_3)</math>

===3 EM algorithm for the Bivariate Poisson===
Given a set of observations <math>\mathcal X = \{\mathbf x_1, \ldots, \mathbf x_n\}</math>, with <math>\mathbf x_i = (x_i, l_i)</math>, we wish to apply maximum likelihood to estimate the set of parameters <math>\Theta</math> of a bivariate Poisson mixture <math>p(\mathbf x \mid \Theta)</math> fitting such data. We derive the value of <math>\Theta</math> by maximizing the log-likelihood, that is by computing
<math>\Theta^* = \arg\max_\Theta \log L(\Theta \mid \mathcal X) = \arg\max_\Theta \log \prod_{i=1}^{n} p(\mathbf x_i \mid \Theta)</math>
In our case (see also [1]), we are interested in a mixture of 2 Bivariate Poisson distributions with latent variables <math>Z_1^1, Z_1^2, Z_2, Z_3</math>, since with respect to the general case we now have <math>Z_2^1 = Z_2^2 = Z_2</math> and <math>Z_3^1 = Z_3^2 = Z_3</math>. Then, for each observed pair of values <math>\mathbf x_i = (x_i, l_i)</math>, <math>w_i = 1</math> if <math>\mathbf x_i</math> is generated by the first component, and <math>w_i = 2</math> if it is generated by the second one. Accordingly, <math>\mathbf z_i = (z_{i1}^1, z_{i1}^2, z_{i2}, z_{i3})</math> is such that
<math>x_i = \begin{cases} z_{i1}^1 + z_{i3} & \text{if } w_i = 1 \\ z_{i1}^2 + z_{i3} & \text{if } w_i = 2\end{cases}</math>
and <math>l_i = z_{i2} + z_{i3}</math>.

The EM algorithm requires, in our case, to consider the complete dataset
<math>(\mathcal X, \mathcal Z) = \{(\mathbf x_1, \mathbf z_1, w_1), \ldots, (\mathbf x_n, \mathbf z_n, w_n)\}</math>
and the set of parameters is <math>\Theta = \Lambda_1 \cup \Lambda_2 \cup \{\alpha\}</math>, with <math>\Lambda_k = (\lambda_{k1}, \lambda_2, \lambda_3)</math>. Let also <math>\Lambda = \Lambda_1 \cup \Lambda_2</math>.

====3.1 Maximization====
Let us consider the <math>k</math>-th M-step for <math>\Theta</math>. We can show the following estimates:
<math>\alpha^{(k)} = \frac{1}{n}\sum_{i=1}^{n} p_i^{(k-1)} \qquad\text{where}\qquad p_i^{(k-1)} = \frac{\alpha^{(k-1)}\, p(\mathbf x_i \mid \Lambda_1^{(k-1)})}{\alpha^{(k-1)}\, p(\mathbf x_i \mid \Lambda_1^{(k-1)}) + (1-\alpha^{(k-1)})\, p(\mathbf x_i \mid \Lambda_2^{(k-1)})}</math>
where <math>p(\cdot \mid \Lambda_j)</math> is the bivariate Poisson with parameters <math>\Lambda_j</math>, and
<math>\lambda_{11}^{(k)} = \frac{\sum_{i=1}^{n} b_{1i}^{1\,(k-1)}\, p_i^{(k-1)}}{\sum_{i=1}^{n} p_i^{(k-1)}} \qquad \lambda_{21}^{(k)} = \frac{\sum_{i=1}^{n} b_{1i}^{2\,(k-1)}\, (1 - p_i^{(k-1)})}{\sum_{i=1}^{n} (1 - p_i^{(k-1)})}</math>
<math>\lambda_2^{(k)} = \frac{1}{n}\sum_{i=1}^{n} b_{2i}^{(k-1)} \qquad \lambda_3^{(k)} = \frac{1}{n}\sum_{i=1}^{n} b_{3i}^{(k-1)}</math>
where <math>b_{hi}^{j\,(k)} = E[Z_h^j \mid W = j, \mathbf x_i, \Lambda^{(k)}]</math> and <math>b_{hi}^{(k)} = E[Z_h \mid \mathbf x_i, \Lambda^{(k)}]</math>, with <math>h = 1, 2, 3</math>.
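To make the M-step concrete, here is a small sketch (again ours, with hypothetical names, not the paper's implementation) that applies the update formulas of Section 3.1, assuming the per-observation posterior weights <math>p_i</math> and the conditional expectations <math>b</math> have already been computed in the E-step.

<syntaxhighlight lang="python">
def m_step(p, b11, b12, b2, b3):
    """One M-step for the mixture of two bivariate Poissons with shared
    lambda_2 and lambda_3 (Section 3.1). Inputs are per-observation lists
    produced by the E-step:
      p[i]   posterior probability that observation i comes from component 1
      b11[i] E[Z_1^1 | W=1, x_i],   b12[i] E[Z_1^2 | W=2, x_i]
      b2[i]  E[Z_2 | x_i],          b3[i]  E[Z_3 | x_i]
    Returns the updated (alpha, lambda_11, lambda_21, lambda_2, lambda_3)."""
    n = len(p)
    alpha = sum(p) / n
    lambda11 = sum(b * w for b, w in zip(b11, p)) / sum(p)
    lambda21 = sum(b * (1.0 - w) for b, w in zip(b12, p)) / sum(1.0 - w for w in p)
    lambda2 = sum(b2) / n
    lambda3 = sum(b3) / n
    return alpha, lambda11, lambda21, lambda2, lambda3
</syntaxhighlight>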
====3.2 Expectation====
We can show that the expectations <math>b_{1i}^{j\,(k)}</math> and <math>b_{hi}^{(k)}</math> are:
<math>b_{3i}^{(k)} = \sum_{r=0}^{\min(x_i, l_i)} r \cdot p(Z_3 = r \mid \mathbf x_i, \Lambda^{(k)}) = \sum_{r=0}^{\min(x_i, l_i)} r\,\frac{(1-\alpha)\, p(Z_3 = r, \mathbf x_i \mid W = 2, \Lambda^{(k)}) + \alpha\, p(Z_3 = r, \mathbf x_i \mid W = 1, \Lambda^{(k)})}{p(\mathbf x_i \mid \Lambda^{(k)})}</math>
where <math>\Lambda^{(k)} = \Lambda_1^{(k)} \cup \Lambda_2^{(k)}</math>, and
<math>b_{1i}^{1\,(k)} = E[X \mid W = 1, \mathbf x_i, \Lambda^{(k)}] - E[Z_3 \mid W = 1, \mathbf x_i, \Lambda^{(k)}] = x_i - b_{3i}^{1\,(k)}</math>
<math>b_{1i}^{2\,(k)} = E[X \mid W = 2, \mathbf x_i, \Lambda^{(k)}] - E[Z_3 \mid W = 2, \mathbf x_i, \Lambda^{(k)}] = x_i - b_{3i}^{2\,(k)}</math>
<math>b_{2i}^{(k)} = E[Y \mid \mathbf x_i, \Lambda^{(k)}] - E[Z_3 \mid \mathbf x_i, \Lambda^{(k)}] = l_i - b_{3i}^{(k)}</math>
where
<math>p(Z_3 = r, \mathbf x_i \mid W = j, \Lambda^{(k)}) = P_0(r \mid \lambda_3^{(k)})\cdot P_0(x_i - r \mid \lambda_{j1}^{(k)})\cdot P_0(l_i - r \mid \lambda_2^{(k)})</math>
<math>P_0</math> is the univariate Poisson, and <math>p(\mathbf x_i \mid \Lambda^{(k)})</math> is the mixture of the bivariate Poissons. An efficient implementation of the bivariate Poisson through recursion can be found in [4]. (A small illustrative sketch of this computation is given below, after the references.)

===4 Conclusions===
We have implemented the EM algorithm for the univariate 2-Poisson and we are currently extending the implementation to the bivariate case. The implementation, together with the results of the experimentation, will soon be available at the web site http://tinyurl.com/cfcm8ma.

===References===
# Brijs, T., Karlis, D., Swinnen, G., Vanhoof, K., Wets, G., and Manchanda, P. A multivariate Poisson mixture model for marketing applications. Statistica Neerlandica 58, 3 (2004), 322–348.
# Harter, S. P. A probabilistic approach to automatic keyword indexing. Part I: On the distribution of specialty words in a technical literature. Journal of the ASIS 26 (1975), 197–216.
# Kocherlakota, S., and Kocherlakota, K. Bivariate discrete distributions. Marcel Dekker Inc., New York, 1992.
# Tsiamyrtzis, P., and Karlis, D. Strategies for efficient computation of multivariate Poisson probabilities. Communications in Statistics - Simulation and Computation 33, 2 (2004), 271–292.
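The following sketch (ours, not from the paper; it reuses the hypothetical poisson_pmf and bivariate_poisson_pmf helpers from the first example and computes the sums directly rather than with the recursions of [4]) illustrates the E-step expectation <math>b_{3i}</math> of Section 3.2 for a single observation <math>(x_i, l_i)</math>.

<syntaxhighlight lang="python">
def e_step_b3(x_i, l_i, alpha, lam11, lam21, lam2, lam3):
    """E[Z_3 | x_i, Lambda] for one observation, following Section 3.2.
    Component j has parameters (lam_j1, lam2, lam3); alpha is the mixture weight."""
    # Mixture density p(x_i | Lambda) of the observation (x_i, l_i).
    mix = (alpha * bivariate_poisson_pmf(x_i, l_i, lam11, lam2, lam3)
           + (1.0 - alpha) * bivariate_poisson_pmf(x_i, l_i, lam21, lam2, lam3))
    total = 0.0
    for r in range(min(x_i, l_i) + 1):
        # p(Z_3 = r, x_i | W = j) = P0(r|lam3) * P0(x_i - r|lam_j1) * P0(l_i - r|lam2)
        joint1 = poisson_pmf(r, lam3) * poisson_pmf(x_i - r, lam11) * poisson_pmf(l_i - r, lam2)
        joint2 = poisson_pmf(r, lam3) * poisson_pmf(x_i - r, lam21) * poisson_pmf(l_i - r, lam2)
        total += r * (alpha * joint1 + (1.0 - alpha) * joint2)
    return total / mix
</syntaxhighlight>

Combined with the m_step sketch above and analogous component-conditional expectations for <math>b_{1i}^j</math> and <math>b_{2i}</math>, this would give one full EM iteration.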