<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Recommendation from intransitive pairwise comparisons</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Marta Crispino</string-name>
          <email>marta.crispino@phd.unibocconi.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Valeria Vitelli</string-name>
          <email>valeria.vitelli@medisin.uio.no</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Elja Arjas</string-name>
          <email>earjas@mappi.helsinki.fi</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Arnoldo Frigessi</string-name>
          <email>arnoldo.frigessi@medisin.uio.no</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>DEC, Bocconi University</institution>
          ,
          <addr-line>Milano</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>OCBE, University of Oslo</institution>
          ,
          <addr-line>Oslo</addr-line>
          ,
          <country country="NO">Norway</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
<institution>University of Helsinki and OCBE, University of Oslo</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2016</year>
      </pub-date>
      <issue>2</issue>
      <abstract>
<p>In this paper we propose a fully Bayesian probabilistic method to learn preferences from non-transitive pairwise comparison data. Such lack of transitivity easily arises when the number of pairwise comparisons is large and they are given sequentially, without allowing for consistency checks. We develop a Bayesian Mallows model able to handle such data through a latent layer of uncertainty which captures the generation of preference misreporting. We then construct an MCMC algorithm, and test the procedure on simulated data.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
<p>We consider pairwise preference data of the form "x is
preferred to y", denoted x ≻ y, where x and y are some items
of interest. The challenge with this kind of data is that
pairwise preferences are not always transitive. For instance,
the data coming from a single user may contain preferences
of the form {x ≻ y, y ≻ z, z ≻ x}. Such non-transitivity
of preferences arises for many reasons, including the user's
inattentiveness, actual ambiguity in the user's preferences,
multiple users sharing the same account, and varying framing
of the data collection. These situations are so common that
most pairwise comparison data are in fact non-transitive,
thus creating the need for methods able to learn users'
preferences from data that lack logical transitivity.</p>
      <p>
In this paper we assume that non-transitive data arise
because users make mistakes, i.e. they switch the order of the
two compared items, either inadvertently or because
the compared items have rather similar ranks for the user.
The developed Bayesian methodology provides the
posterior distribution of the consensus ranking of a homogeneous
group of users, as well as of the estimated individual
rankings for each user. The consensus ranking can be seen as
a compromise formed from the individual pairwise
preferences. The estimated individual rankings can be used
for personalized recommendation. Inference is based on an
adapted version of the Markov Chain Monte Carlo algorithm
in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
Previous work on pairwise preferences includes the
Bradley-Terry model [
        <xref ref-type="bibr" rid="ref1">1</xref>
] (which, however, estimates no personalized
preferences) and, within the framework of the Mallows model, [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]
and [
        <xref ref-type="bibr" rid="ref3">3</xref>
], where transitivity is assumed.
      </p>
    </sec>
    <sec id="sec-2">
<title>2. METHODS</title>
<p>Let N users independently express their preferences
between pairs of items in O = {O1, ..., On}. We assume that
each user j receives a possibly different subset Cj = {Cj,1, ...,
Cj,Mj} of pairs, of size Mj ≤ n(n-1)/2. Let Bj =
{Bj,1, ..., Bj,Mj} be the set of preferences given by user j,
where Bj,m is the order she gives to the pair Cj,m. For
example, if Cj,m = (O1, O2), then Bj,m = (O1 ≻ O2) means that
O1 is preferred to O2. The considered data are incomplete,
since not all items are observed by each user. We assume no
ties in the data: users are forced to express their preference
for all pairs in the list Cj assigned to them, and indifference
is not permitted.</p>
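<p>To make the data structure concrete, the following Python sketch (our own illustration with hypothetical names, not part of the paper) encodes a user's preference set as directed pairs and checks transitivity by cycle detection, flagging triples such as {O1 ≻ O2, O2 ≻ O3, O3 ≻ O1}:</p>

```python
# Sketch (not from the paper): represent a user's pairwise preferences
# and check whether they are transitive, i.e. free of preference cycles.
# Items are labelled 1..n; a pair (a, b) means "a is preferred to b".

def is_transitive(preferences, n):
    """Return True if the directed preference graph has no cycle."""
    adj = {i: [] for i in range(1, n + 1)}
    for a, b in preferences:
        adj[a].append(b)
    state = {i: 0 for i in range(1, n + 1)}  # 0=unseen, 1=on stack, 2=done

    def dfs(u):
        state[u] = 1
        for v in adj[u]:
            if state[v] == 1:          # back edge: a preference cycle
                return False
            if state[v] == 0 and not dfs(v):
                return False
        state[u] = 2
        return True

    return all(dfs(i) for i in range(1, n + 1) if state[i] == 0)

# A consistent user: O1 over O2, O2 over O3, O1 over O3.
print(is_transitive([(1, 2), (2, 3), (1, 3)], 3))  # True
# The intransitive triple from the introduction.
print(is_transitive([(1, 2), (2, 3), (3, 1)], 3))  # False
```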
<p>The main assumption is that each user j has a personal
latent ranking, R̃j = (R̃j1, ..., R̃jn) ∈ Pn (the space of
n-dimensional permutations), distributed according to the
Mallows model</p>
<p>P(R̃j | ρ, α) = exp{-(α/n) d(R̃j, ρ)} 1_Pn(R̃j) / Zn(α).</p>
<p>
We assume conditional independence between R̃1, ..., R̃N
given ρ and α. Here ρ ∈ Pn is the shared consensus ranking,
α &gt; 0 is the scale parameter (describing the concentration
around the shared consensus), d(·, ·) : Pn × Pn → [0, ∞) is
any right-invariant distance function (e.g. Kendall,
Spearman or footrule), and Zn(α) is the normalizing constant.
See [
<xref ref-type="bibr" rid="ref3">3</xref>
] for more details.
</p>
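<p>For intuition, the Mallows density can be evaluated exactly for small n by enumerating Pn to compute Zn(α); by right-invariance Zn depends on α only, not on ρ. A minimal Python sketch (illustrative only, not the paper's implementation, which relies on the MCMC of [3]), using the footrule distance:</p>

```python
# Sketch: evaluate the Mallows density exp{-(alpha/n) d(r, rho)} / Zn(alpha)
# by brute-force enumeration of all permutations (feasible only for small n).
# Uses the footrule (l1) distance, which is right-invariant.
from itertools import permutations
from math import exp

def footrule(r, s):
    return sum(abs(a - b) for a, b in zip(r, s))

def mallows_pmf(r, rho, alpha):
    n = len(rho)
    # Right-invariance makes Zn depend on alpha only, not on rho.
    z = sum(exp(-(alpha / n) * footrule(p, rho))
            for p in permutations(range(1, n + 1)))
    return exp(-(alpha / n) * footrule(r, rho)) / z

rho = (1, 2, 3, 4)
# The consensus itself is the modal ranking.
print(mallows_pmf(rho, rho, alpha=3.0))
```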
<p>We model the situation where each user j, when
announcing her pairwise preferences, mentally checks where the items
under comparison are in her latent ranking R̃j. Then, if the
user is consistent with R̃j, the pairwise orderings in Bj are
induced by R̃j according to the rule: (Ok ≻ Oi) ⟺
R̃jk &lt; R̃ji. In this case Bj contains only transitive
preferences. However, if the user is not fully consistent with her
own latent ranking, the pairwise orderings in Bj can be
non-transitive. In order to deal with this situation, we propose
a probabilistic model based on the assumption that
non-transitivities are due to mistakes in deriving the pair order
from the latent ranking R̃j. The likelihood assumed for a set
of preferences Bj is</p>
<p>P(Bj | ρ, α) = Σ_{Rj ∈ Pn} P(Bj | R̃j = Rj) P(R̃j = Rj | ρ, α),</p>
<p>where P(Bj | R̃j = Rj) is the probability of ordering the pairs
in Cj as in Bj (possibly generating non-transitivities), when
the latent ranking for user j is Rj. It is therefore the
probability of making mistakes instead of just following Rj.
The posterior density is then</p>
<p>π(ρ, α | B1, ..., BN) ∝ π(α) π(ρ) Π_{j=1}^{N} [ Σ_{Rj ∈ Pn} P(Bj | Rj) P(Rj | ρ, α) ],</p>
<p>where we assume a gamma prior, π(α), for α and a uniform
prior on Pn, π(ρ), for ρ. We suggest two models for the
probability of making a mistake: the Bernoulli model (BM)
to handle random mistakes, and the logistic model (LM) for
mistakes that are due to difficulty in ordering similar items.
In both models we assume that any two pair comparisons
made by a user are conditionally independent given her
latent ranking: (Bj,m1 ⊥ Bj,m2) | R̃j for all m1, m2 = 1, ..., Mj. In
BM we assume the following Bernoulli-type model for the
probability of making a mistake:</p>
<p>P(Bj,m = B̃j,m(R̃j) | θ, R̃j) = θ,  θ ∈ [0, 0.5),</p>
<p>where B̃j,m(R̃j) is the reversed preference order, i.e. a
mistake. We assign to θ a Beta distribution truncated to
the interval [0, 0.5) as prior. In LM we assume the following
logistic-type model for the probability of making a mistake:</p>
<p>logit P(Bj,m = B̃j,m(R̃j) | R̃j, β0, β1) = β0 + β1 d(R̃j,m),</p>
<p>where d(R̃j,m) is the footrule or l1 distance between the
individual ranks of the items compared: if Bj,m = (O1 ≻ O2),
d(R̃j,m) = |R̃j1 - R̃j2|. We assign to β1 an exponential prior
on the negative support, π(β1) = λ1 exp{λ1β1} 1(β1 &lt; 0) with λ1 &gt; 0,
and to β0, conditioned on β1, an exponential prior on the
shifted support (-∞, -β1), π(β0 | β1) = λ0 exp{λ0(β0 + β1)} 1(β0 &lt; -β1)
with λ0 &gt; 0. These choices are motivated by the fact that we want
to model a negative dependence between the distance of the items and
the probability of making a mistake. Also, we want to force
the probability of making a mistake when the items have
ranks differing by 1 to be less than 0.5.</p>
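<p>Under BM, conditional independence makes the likelihood of a user's whole preference set factorize into Bernoulli terms. A Python sketch of this conditional probability (our own illustration with hypothetical names; under LM the constant θ would be replaced by the pair-specific logistic probability above):</p>

```python
# Sketch: P(Bj | Rj, theta) under the Bernoulli mistake model (BM).
# rank[i] is the latent rank of item i (1 = best). Each announced pair
# (a, b), meaning "a preferred to b", is a mistake iff rank[a] > rank[b];
# mistakes occur independently with probability theta in [0, 0.5).

def bm_likelihood(preferences, rank, theta):
    mistakes = sum(1 for a, b in preferences if rank[a] > rank[b])
    correct = len(preferences) - mistakes
    return (theta ** mistakes) * ((1 - theta) ** correct)

rank = {1: 1, 2: 2, 3: 3}           # latent ranking of O1, O2, O3
b_j = [(1, 2), (2, 3), (3, 1)]      # intransitive: (3, 1) contradicts rank
print(bm_likelihood(b_j, rank, 0.1))  # one mistake: 0.1 * 0.9 * 0.9
```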
    </sec>
    <sec id="sec-3">
<title>3. EXPERIMENTAL RESULTS</title>
<p>We performed a number of experiments on simulated data,
generated according to the model in Section 2 with BM
noise, a fixed ρTRUE and n = 10 items. For various
values of θ we sampled latent rankings R̃j,TRUE from the
Mallows density centered at ρTRUE, from which we generated
the individual non-transitive pair comparisons. In order
to assess the performance of our methods, in Figure 1 we
plotted the posterior CDF of the footrule distance between the
estimated consensus ranking and the true generating one,
df(ρ, ρTRUE) = Σ_{i=1}^{n} |ρi - ρi,TRUE|, for varying parameters
N, θ, and M (the parameter of a Poisson from which the
number of comparisons assigned to each user is sampled).
As expected, the performance of the method improves as
the number of users N increases (Figure 1, top right),
becomes worse as the probability of making mistakes increases
(top left), improves as the dispersion of the individual latent
rankings R̃j,TRUE around ρTRUE decreases, i.e. when α
increases (bottom left), and becomes worse when the number
of pairwise comparisons diminishes, i.e. when M decreases
(bottom right). The new method performs generally well
even if the number of pair comparisons per user is around
one third of all possible pair comparisons (M = 15). We
then studied the precision of the estimated individual
preferences by comparing R̃j with R̃j,TRUE in terms of top-3
detection. For each user j, we found the triplet of items
D3j = {Oi1, Oi2, Oi3} with the maximal estimated posterior
probability of being ranked jointly among the top 3 items.
Let H5j be the set of 5 items with the 5 highest ranks in
R̃j,TRUE. We computed the percentage of users for which
D3j ⊆ H5j. This success percentage was 60%-85% when the
data were generated with α = 2 and Mj = 15 for all j, which
is a hard problem. For easier experiments, with larger α
(ranging from 3 to 4) and larger M (up to 30), the
success percentages increased to 90%-100%.</p>
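<p>The two evaluation criteria used above are straightforward to compute. A Python sketch (our own illustration; names hypothetical) of the footrule error df and of the top-3 success check D3j ⊆ H5j:</p>

```python
# Sketch of the two evaluation criteria from this section.

def footrule_error(rho, rho_true):
    """df(rho, rho_true) = sum_i |rho_i - rho_true_i| for two rank vectors."""
    return sum(abs(a - b) for a, b in zip(rho, rho_true))

def top3_success(d3, rank_true):
    """True if the 3 items in d3 all lie among the 5 best-ranked items
    in rank_true (a dict item -> true latent rank, 1 = best)."""
    h5 = set(sorted(rank_true, key=rank_true.get)[:5])
    return set(d3).issubset(h5)

rho_true = [1, 2, 3, 4, 5]
print(footrule_error([2, 1, 3, 4, 5], rho_true))   # 2: items 1 and 2 swapped
rank_true = {i: i for i in range(1, 11)}           # 10 items, item i at rank i
print(top3_success([1, 3, 5], rank_true))          # True
print(top3_success([1, 2, 6], rank_true))          # False
```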
    </sec>
    <sec id="sec-4">
<title>4. CONCLUSIONS</title>
      <p>
        We extended the Bayesian method for learning preferences
in [
        <xref ref-type="bibr" rid="ref3">3</xref>
] to non-transitive pairwise comparison data. We
produce estimates of the consensus ranking of all items under
consideration, as well as a personal estimated ranking of
all items for each user. This can be directly used for
personalized recommendations. We also obtain posterior
distributions for these preferences, allowing quantification of the
uncertainty of recommendations of interest.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>F.</given-names>
            <surname>Caron</surname>
          </string-name>
          ,
<string-name>
<given-names>A.</given-names>
<surname>Doucet</surname>
</string-name>
,
<article-title>Efficient Bayesian Inference for generalized Bradley-Terry Models</article-title>
          ,
          <source>Journal of Computational and Graphical Statistics</source>
          ,
          <volume>21</volume>
          (
          <issue>1</issue>
          ),
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>T.</given-names>
            <surname>Lu</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Boutilier</surname>
          </string-name>
,
<article-title>Effective sampling and learning for Mallows models with pairwise preference data</article-title>
          ,
          <source>Journal of Machine Learning Research</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>V.</given-names>
            <surname>Vitelli</surname>
          </string-name>
,
<string-name>
<given-names>Ø.</given-names>
<surname>Sørensen</surname>
</string-name>
,
          <string-name>
            <given-names>A.</given-names>
            <surname>Frigessi</surname>
          </string-name>
and
<string-name>
<given-names>E.</given-names>
<surname>Arjas</surname>
</string-name>
,
<article-title>Probabilistic preference learning with the Mallows rank model</article-title>
          ,
          <source>ArXiv</source>
          e-prints,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>