Recommendation from intransitive pairwise comparisons

Marta Crispino (DEC, Bocconi University, Milano, Italy), marta.crispino@phd.unibocconi.it
Elja Arjas (University of Helsinki and OCBE, University of Oslo), earjas@mappi.helsinki.fi
Valeria Vitelli (OCBE, University of Oslo, Oslo, Norway), valeria.vitelli@medisin.uio.no
Arnoldo Frigessi (OCBE, University of Oslo, Oslo, Norway), arnoldo.frigessi@medisin.uio.no

ABSTRACT
In this paper we propose a full Bayesian probabilistic method to learn preferences from non-transitive pairwise comparison data. Such lack of transitivity easily arises when the number of pairwise comparisons is large and they are given sequentially, without allowing for consistency checks. We develop a Bayesian Mallows model able to handle such data through a latent layer of uncertainty which captures the generation of preference misreporting. We then construct an MCMC algorithm, and test the procedure on simulated data.

1. INTRODUCTION
We consider pairwise preference data of the form "x is preferred to y", denoted x ≺ y, where x and y are some items of interest. The challenge with this kind of data is that pairwise preferences are not always transitive. For instance, the data coming from a single user may contain preferences of the form {x ≺ y, y ≺ z, z ≺ x}. Such non-transitivity of preferences arises for many reasons, including the user's inattentiveness, actual ambiguity in the user's preference, multiple users sharing the same account, and varying framing of the data collection. These situations are so common that most pairwise comparison data are in fact non-transitive, thus creating the need for methods able to learn users' preferences from data that lack logical transitivity.

In this paper we assume that non-transitive data arise because users make mistakes, i.e. switch the order between two compared items, either simply by mistake or because the items compared have a rather similar rank for the user. The developed Bayesian methodology provides the posterior distribution of the consensus ranking of a homogeneous group of users, as well as of the estimated individual rankings for each user. The consensus ranking can be seen as a compromise formed from the individual pairwise preferences. The estimated individual rankings can be used for personalized recommendation. Inference is based on an adapted version of the Markov Chain Monte Carlo algorithm in [3].

Previous work on pairwise preferences includes the Bradley-Terry model [1] (where, however, no personalized preferences are estimated) and, within the framework of the Mallows model, [2] and [3], where transitivity is assumed.

2. METHODS
Let N users independently express their preferences between pairs of items in O = {O1, ..., On}. We assume that each user j receives a possibly different subset Cj = {Cj,1, ..., Cj,Mj} of pairs, of size Mj ≤ n(n − 1)/2. Let Bj = {Bj,1, ..., Bj,Mj} be the set of preferences given by user j, where Bj,m is the order she gives to the pair Cj,m. For example, if Cj,m = (O1, O2), then Bj,m = {O1 ≺ O2} means that O1 is preferred to O2. The considered data are incomplete, since not all items are observed by each user. We assume no ties in the data: users are forced to express their preference for all pairs in the list Cj assigned to them, and indifference is not permitted.

The main assumption is that each user j has a personal latent ranking R̃j = (R̃j1, ..., R̃jn) ∈ Pn (the space of n-dimensional permutations), distributed according to the Mallows model

    π(R̃j | α, ρ) = exp{−(α/n) d(R̃j, ρ)} 1Pn(R̃j) / Zn(α),   ∀ j.

We assume conditional independence between R̃1, ..., R̃N given ρ and α. Here ρ ∈ Pn is the shared consensus ranking, α > 0 is the scale parameter (describing the concentration around the shared consensus), d(·, ·) : Pn × Pn → [0, ∞) is any right-invariant distance function (e.g. Kendall, Spearman or footrule), and Zn(α) is the normalizing constant. See [3] for more details.
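To make the model above concrete, the following Python sketch evaluates the log of the Mallows density with the footrule distance for a small number of items. It is only an illustration under our own assumptions: the function names, the example parameter values, and the brute-force computation of Zn(α) over all n! permutations (feasible only for very small n) are not part of the inference machinery of [3].

    import itertools
    import numpy as np

    def footrule(r, rho):
        # Footrule (l1) distance between two rankings of the same items.
        return int(np.sum(np.abs(np.asarray(r) - np.asarray(rho))))

    def mallows_log_density(r, rho, alpha):
        # log pi(r | alpha, rho) = -(alpha/n) d(r, rho) - log Z_n(alpha).
        # Z_n(alpha) is computed by brute force over all n! permutations;
        # for a right-invariant distance it does not depend on rho.
        n = len(rho)
        log_kernel = -(alpha / n) * footrule(r, rho)
        z = sum(np.exp(-(alpha / n) * footrule(perm, rho))
                for perm in itertools.permutations(range(1, n + 1)))
        return log_kernel - np.log(z)

    rho = [1, 2, 3, 4, 5]  # consensus ranking, rank 1 = most preferred item
    print(mallows_log_density([1, 2, 3, 5, 4], rho, alpha=2.0))  # close to rho
    print(mallows_log_density([5, 4, 3, 2, 1], rho, alpha=2.0))  # far from rho

Rankings close to the consensus receive a higher density, and the gap grows with α, which is the concentration behavior described above.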
We model the situation where each user j, when announcing her pairwise preferences, mentally checks where the items under comparison are in her latent ranking R̃j. If the user is consistent with R̃j, the pairwise orderings in Bj are induced by R̃j according to the rule (Ok ≺ Oi) ⇐⇒ R̃jk < R̃ji, and Bj contains only transitive preferences. However, if the user is not fully consistent with her own latent ranking, the pairwise orderings in Bj can be non-transitive. In order to deal with this situation, we propose a probabilistic model based on the assumption that non-transitivities are due to mistakes in deriving the pair order from the latent ranking R̃j. The likelihood assumed for a set of preferences Bj is

    π(Bj | α, ρ) = Σ_{Rj ∈ Pn} π(Bj | R̃j = Rj) π(R̃j = Rj | α, ρ),

where π(Bj | R̃j = Rj) is the probability of ordering the pairs in Cj as in Bj (possibly generating non-transitivities) when the latent ranking for user j is Rj. It is therefore the probability of making mistakes instead of just following Rj. The posterior density is then

    π(α, ρ | B1:N) ∝ π(α) π(ρ) Π_{j=1}^{N} [ Σ_{Rj ∈ Pn} π(Bj | Rj) π(Rj | α, ρ) ],

where we assume a gamma prior π(α) for α and a uniform prior π(ρ) on Pn for ρ.

We suggest two models for the probability of making a mistake: the Bernoulli model (BM) to handle random mistakes, and the logistic model (LM) for mistakes that are due to the difficulty of ordering similar items. In both models we assume that any two pair comparisons made by a user are conditionally independent given her latent ranking: (Bj,m1 ⊥⊥ Bj,m2) | R̃j, for all m1, m2 = 1, ..., Mj. In BM we assume the following Bernoulli type model for the probability of making a mistake:

    P(Bj,m = −B̃j,m(R̃j) | θ, R̃j) = θ,   θ ∈ [0, 0.5),

where −B̃j,m(R̃j) is the reversed preference order, i.e. a mistake. We assign to θ a Beta prior truncated to the interval [0, 0.5). In LM we assume the following logistic type model for the probability of making a mistake:

    logit( P(Bj,m = −B̃j,m(R̃j) | R̃j, β0, β1) ) = β0 + β1 dR̃j,m,

where dR̃j,m is the footrule (l1) distance between the individual ranks of the items compared: if Bj,m = (O1 ≺ O2), then dR̃j,m = |R̃j1 − R̃j2|. We assign to β1 an exponential prior on the negative support, π(β1) = λ1 exp{λ1 β1} 1(β1 < 0), λ1 > 0, and to β0, conditioned on β1, an exponential prior on the shifted support (−∞, −β1], π(β0 | β1) = λ0 exp{λ0 (β0 + β1)} 1(β0 < −β1), λ0 > 0. These choices are motivated by the fact that we want to model a negative dependence between the distance of the items and the probability of making a mistake. We also want to force the probability of making a mistake, when the items have ranks differing by 1, to be less than 0.5.
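The following Python sketch illustrates how BM noise turns a transitive latent ranking into possibly non-transitive observed preferences, in the spirit of the data-generating mechanism used in our simulations. It is an illustrative stand-in, not our simulation code: the function and variable names and the values theta = 0.2 and n = 5 are our own choices for the example.

    import itertools
    import random

    def simulate_bm_preferences(latent_rank, pairs, theta, rng=random.Random(1)):
        # latent_rank: dict item -> rank in the user's latent ranking (1 = best).
        # Each pair is first ordered according to the latent ranking, then
        # reversed independently with probability theta (a BM mistake).
        prefs = []
        for a, b in pairs:
            top, bottom = (a, b) if latent_rank[a] < latent_rank[b] else (b, a)
            if rng.random() < theta:
                top, bottom = bottom, top   # the user misreports this pair
            prefs.append((top, bottom))     # (x, y) means "x is preferred to y"
        return prefs

    latent = {"O%d" % i: i for i in range(1, 6)}      # O1 ranked best
    pairs = list(itertools.combinations(latent, 2))    # all 10 pairs
    print(simulate_bm_preferences(latent, pairs, theta=0.2))

With θ > 0, some reported pairs are reversed relative to the latent ranking, which is exactly how non-transitive triples such as {x ≺ y, y ≺ z, z ≺ x} can appear in Bj.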
3. EXPERIMENTAL RESULTS
We performed a number of experiments on simulated data, generated according to the model in Section 2 with BM noise, a fixed ρTRUE and n = 10 items. For various values of α we sampled latent rankings R̃j,TRUE from the Mallows density centered at ρTRUE, from which we then generated the individual non-transitive pair comparisons. In order to assess the performance of our method, in Figure 1 we plot the posterior CDF of the footrule distance between the estimated consensus ranking and the true generating one, df(ρ, ρTRUE) = Σ_{i=1}^{n} |ρi − ρi,TRUE|, for varying parameters N, α, θ and λM (the parameter of the Poisson distribution from which the number of comparisons assigned to each user is sampled).

[Figure 1: Posterior CDFs of df(ρ, ρTRUE) for varying parameters.]

As expected, the performance of the method improves as the number of users N increases (Figure 1, top right), becomes worse as the probability of making mistakes θ increases (top left), improves as the dispersion of the individual latent rankings R̃j,TRUE around ρTRUE decreases, i.e. when α increases (bottom left), and becomes worse when the number of pairwise comparisons diminishes, i.e. when λM decreases (bottom right). The new method performs generally well also when the number of pair comparisons per user is around one third of all possible pair comparisons (λM = 15).

We then studied the precision of the estimated individual preferences by comparing R̃j with R̃j,TRUE in terms of top-3 detection. For each user j, we found the triplet of items D3j = {Oi1, Oi2, Oi3} with the maximal estimated posterior probability of being ranked jointly among the top 3 items. Let H5j be the set of 5 items with the 5 highest ranks in R̃j,TRUE. We computed the percentage of users for which D3j ⊂ H5j. This success percentage was 60%-85% when the data were generated with α = 2 and Mj = 15 for all j, which is a hard problem. For easier experiments, with larger α (ranging from 3 to 4) and larger M (up to 30), the success percentages increased to 90%-100%.

4. CONCLUSIONS
We extended the Bayesian method for learning preferences in [3] to non-transitive pairwise comparison data. We produce estimates of the consensus ranking of all items under consideration, as well as a personal estimated ranking of all items for each user. This can be directly used for personalized recommendations. We also obtain posterior distributions for these preferences, allowing quantification of the uncertainty of the recommendations of interest.

5. REFERENCES
[1] F. Caron and A. Doucet, "Efficient Bayesian Inference for Generalized Bradley-Terry Models", Journal of Computational and Graphical Statistics, 21(1), 2012.
[2] T. Lu and C. Boutilier, "Effective sampling and learning for Mallows models with pairwise preference data", Journal of Machine Learning Research, 2014.
[3] V. Vitelli, Ø. Sørensen, A. Frigessi and E. Arjas, "Probabilistic preference learning with the Mallows rank model", ArXiv e-prints, 2016.