Recommendation from intransitive pairwise comparisons

Marta Crispino (DEC, Bocconi University, Milano, Italy), marta.crispino@phd.unibocconi.it
Elja Arjas (University of Helsinki and OCBE, University of Oslo), earjas@mappi.helsinki.fi
Valeria Vitelli (OCBE, University of Oslo, Oslo, Norway), valeria.vitelli@medisin.uio.no
Arnoldo Frigessi (OCBE, University of Oslo, Oslo, Norway), arnoldo.frigessi@medisin.uio.no

ABSTRACT
In this paper we propose a full Bayesian probabilistic method to learn preferences from non-transitive pairwise comparison data. Such lack of transitivity easily arises when the number of pairwise comparisons is large and they are given sequentially, without allowing for consistency checks. We develop a Bayesian Mallows model able to handle such data through a latent layer of uncertainty which captures the generation of preference misreporting. We then construct an MCMC algorithm, and test the procedure on simulated data.

1. INTRODUCTION
We consider pairwise preference data of the form "x is preferred to y", denoted x ≺ y, where x and y are some items of interest. The challenge with this kind of data is that pairwise preferences are not always transitive. For instance, the data coming from a single user may contain preferences of the form {x ≺ y, y ≺ z, z ≺ x}. Such non-transitivity of preferences arises for many reasons, including the user's inattentiveness, actual ambiguity in the user's preference, multiple users sharing the same account, and varying framing of the data collection. These situations are so common that most pairwise comparison data are in fact non-transitive, thus creating the need for methods able to learn users' preferences from data that lack logical transitivity.

In this paper we assume that non-transitive data arise because users make mistakes, i.e. switch the order between two compared items, either simply by mistake or because the items compared have a rather similar rank for the user. The developed Bayesian methodology provides the posterior distribution of the consensus ranking of a homogeneous group of users, as well as of the estimated individual rankings for each user. The consensus ranking can be seen as a compromise formed from the individual pairwise preferences. The estimated individual rankings can be used for personalized recommendation. Inference is based on an adapted version of the Markov Chain Monte Carlo algorithm in [3].

Previous work on pairwise preferences includes the Bradley-Terry model [1] (where, however, no personalized preferences are estimated) and, within the framework of the Mallows model, [2] and [3], where transitivity is assumed.

2. METHODS
Let N users independently express their preferences between pairs of items in O = {O1, ..., On}. We assume that each user j receives a possibly different subset Cj = {Cj,1, ..., Cj,Mj} of pairs, of size Mj ≤ n(n − 1)/2. Let Bj = {Bj,1, ..., Bj,Mj} be the set of preferences given by user j, where Bj,m is the order she gives to the pair Cj,m. For example, if Cj,m = (O1, O2), then Bj,m = {O1 ≺ O2} means that O1 is preferred to O2. The considered data are incomplete, since not all items are observed by each user. We assume no ties in the data: users are forced to express their preference for all pairs in the list Cj assigned to them, and indifference is not permitted.

The main assumption is that each user j has a personal latent ranking R̃j = (R̃j1, ..., R̃jn) ∈ Pn (the space of n-dimensional permutations), distributed according to the Mallows model

    π(R̃j | α, ρ) = exp{−(α/n) d(R̃j, ρ)} 1Pn(R̃j) / Zn(α),   ∀ j.

We assume conditional independence between R̃1, ..., R̃N given ρ and α. Here ρ ∈ Pn is the shared consensus ranking, α > 0 is the scale parameter (describing the concentration around the shared consensus), d(·, ·) : Pn × Pn → [0, ∞) is any right-invariant distance function (e.g. Kendall, Spearman or footrule), and Zn(α) is the normalizing constant. See [3] for more details.
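To make the model above concrete, the following Python sketch evaluates the log of the Mallows density with the footrule distance for a small number of items. It is only an illustration under our own assumptions: the function names, the example parameter values, and the brute-force computation of Zn(α) over all n! permutations (feasible only for very small n) are not part of the inference machinery of [3].

    import itertools
    import numpy as np

    def footrule(r, rho):
        # Footrule (l1) distance between two rankings of the same items.
        return int(np.sum(np.abs(np.asarray(r) - np.asarray(rho))))

    def mallows_log_density(r, rho, alpha):
        # log pi(r | alpha, rho) = -(alpha/n) d(r, rho) - log Z_n(alpha).
        # Z_n(alpha) is computed by brute force over all n! permutations;
        # for a right-invariant distance it does not depend on rho.
        n = len(rho)
        log_kernel = -(alpha / n) * footrule(r, rho)
        z = sum(np.exp(-(alpha / n) * footrule(perm, rho))
                for perm in itertools.permutations(range(1, n + 1)))
        return log_kernel - np.log(z)

    rho = [1, 2, 3, 4, 5]  # consensus ranking, rank 1 = most preferred item
    print(mallows_log_density([1, 2, 3, 5, 4], rho, alpha=2.0))  # close to rho
    print(mallows_log_density([5, 4, 3, 2, 1], rho, alpha=2.0))  # far from rho

Rankings close to the consensus receive a higher density, and the gap grows with α, which is the concentration behavior described above.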
We model the situation where each user j, when announcing her pairwise preferences, mentally checks where the items under comparison are in her latent ranking R̃j. If the user is consistent with R̃j, the pairwise orderings in Bj are induced by R̃j according to the rule (Ok ≺ Oi) ⇐⇒ R̃jk < R̃ji, and Bj contains only transitive preferences. However, if the user is not fully consistent with her own latent ranking, the pairwise orderings in Bj can be non-transitive. In order to deal with this situation, we propose a probabilistic model based on the assumption that non-transitivities are due to mistakes in deriving the pair order from the latent ranking R̃j. The likelihood assumed for a set of preferences Bj is

    π(Bj | α, ρ) = Σ_{Rj ∈ Pn} π(Bj | R̃j = Rj) π(R̃j = Rj | α, ρ),

where π(Bj | R̃j = Rj) is the probability of ordering the pairs in Cj as in Bj (possibly generating non-transitivities) when the latent ranking for user j is Rj. It is therefore the probability of making mistakes instead of just following Rj. The posterior density is then

    π(α, ρ | B1:N) ∝ π(α) π(ρ) Π_{j=1}^{N} [ Σ_{Rj ∈ Pn} π(Bj | Rj) π(Rj | α, ρ) ],

where we assume a gamma prior π(α) for α and a uniform prior π(ρ) on Pn for ρ.

We suggest two models for the probability of making a mistake: the Bernoulli model (BM) to handle random mistakes, and the logistic model (LM) for mistakes that are due to the difficulty of ordering similar items. In both models we assume that any two pair comparisons made by a user are conditionally independent given her latent ranking: (Bj,m1 ⊥⊥ Bj,m2) | R̃j, for all m1, m2 = 1, ..., Mj. In BM we assume the following Bernoulli type model for the probability of making a mistake:

    P(Bj,m = −B̃j,m(R̃j) | θ, R̃j) = θ,   θ ∈ [0, 0.5),

where −B̃j,m(R̃j) is the reversed preference order, i.e. a mistake. We assign to θ a Beta prior truncated to the interval [0, 0.5). In LM we assume the following logistic type model for the probability of making a mistake:

    logit( P(Bj,m = −B̃j,m(R̃j) | R̃j, β0, β1) ) = β0 + β1 dR̃j,m,

where dR̃j,m is the footrule (l1) distance between the individual ranks of the items compared: if Bj,m = (O1 ≺ O2), then dR̃j,m = |R̃j1 − R̃j2|. We assign to β1 an exponential prior on the negative support, π(β1) = λ1 exp{λ1 β1} 1(β1 < 0), λ1 > 0, and to β0, conditioned on β1, an exponential prior on the shifted support (−∞, −β1], π(β0 | β1) = λ0 exp{λ0 (β0 + β1)} 1(β0 < −β1), λ0 > 0. These choices are motivated by the fact that we want to model a negative dependence between the distance of the items and the probability of making a mistake. We also want to force the probability of making a mistake, when the items have ranks differing by 1, to be less than 0.5.
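The following Python sketch illustrates how BM noise turns a transitive latent ranking into possibly non-transitive observed preferences, in the spirit of the data-generating mechanism used in our simulations. It is an illustrative stand-in, not our simulation code: the function and variable names and the values theta = 0.2 and n = 5 are our own choices for the example.

    import itertools
    import random

    def simulate_bm_preferences(latent_rank, pairs, theta, rng=random.Random(1)):
        # latent_rank: dict item -> rank in the user's latent ranking (1 = best).
        # Each pair is first ordered according to the latent ranking, then
        # reversed independently with probability theta (a BM mistake).
        prefs = []
        for a, b in pairs:
            top, bottom = (a, b) if latent_rank[a] < latent_rank[b] else (b, a)
            if rng.random() < theta:
                top, bottom = bottom, top   # the user misreports this pair
            prefs.append((top, bottom))     # (x, y) means "x is preferred to y"
        return prefs

    latent = {"O%d" % i: i for i in range(1, 6)}      # O1 ranked best
    pairs = list(itertools.combinations(latent, 2))    # all 10 pairs
    print(simulate_bm_preferences(latent, pairs, theta=0.2))

With θ > 0, some reported pairs are reversed relative to the latent ranking, which is exactly how non-transitive triples such as {x ≺ y, y ≺ z, z ≺ x} can appear in Bj.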
3. EXPERIMENTAL RESULTS
We performed a number of experiments on simulated data, generated according to the model in Section 2 with BM noise, a fixed ρTRUE and n = 10 items. For various values of α we sampled latent rankings R̃j,TRUE from the Mallows density centered at ρTRUE, from which we then generated the individual non-transitive pair comparisons. In order to assess the performance of our method, in Figure 1 we plot the posterior CDF of the footrule distance between the estimated consensus ranking and the true generating one, df(ρ, ρTRUE) = Σ_{i=1}^{n} |ρi − ρi,TRUE|, for varying parameters N, α, θ and λM (the parameter of the Poisson distribution from which the number of comparisons assigned to each user is sampled).

[Figure 1: Posterior CDFs of df(ρ, ρTRUE) for varying parameters.]

As expected, the performance of the method improves as the number of users N increases (Figure 1, top right), becomes worse as the probability of making mistakes θ increases (top left), improves as the dispersion of the individual latent rankings R̃j,TRUE around ρTRUE decreases, i.e. when α increases (bottom left), and becomes worse when the number of pairwise comparisons diminishes, i.e. when λM decreases (bottom right). The new method performs generally well also when the number of pair comparisons per user is around one third of all possible pair comparisons (λM = 15).

We then studied the precision of the estimated individual preferences by comparing R̃j with R̃j,TRUE in terms of top-3 detection. For each user j, we found the triplet of items D3j = {Oi1, Oi2, Oi3} with the maximal estimated posterior probability of being ranked jointly among the top 3 items. Let H5j be the set of 5 items with the 5 highest ranks in R̃j,TRUE. We computed the percentage of users for which D3j ⊂ H5j. This success percentage was 60%-85% when the data were generated with α = 2 and Mj = 15 for all j, which is a hard problem. For easier experiments, with larger α (ranging from 3 to 4) and larger M (up to 30), the success percentages increased to 90%-100%.

4. CONCLUSIONS
We extended the Bayesian method for learning preferences in [3] to non-transitive pairwise comparison data. We produce estimates of the consensus ranking of all items under consideration, as well as a personal estimated ranking of all items for each user. This can be directly used for personalized recommendations. We also obtain posterior distributions for these preferences, allowing quantification of the uncertainty of the recommendations of interest.

5. REFERENCES
[1] F. Caron and A. Doucet, "Efficient Bayesian Inference for Generalized Bradley-Terry Models", Journal of Computational and Graphical Statistics, 21(1), 2012.
[2] T. Lu and C. Boutilier, "Effective sampling and learning for Mallows models with pairwise preference data", Journal of Machine Learning Research, 2014.
[3] V. Vitelli, Ø. Sørensen, A. Frigessi and E. Arjas, "Probabilistic preference learning with the Mallows rank model", ArXiv e-prints, 2016.