<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Latent-Feature Plackett-Luce Model for Dyad Ranking Completion</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
<string-name>Dirk Schäfer</string-name>
          <email>dirk.schaefer@jivas.de</email>
        </contrib>
      </contrib-group>
      <pub-date>
        <year>2017</year>
      </pub-date>
      <abstract>
<p>Dyad ranking is a specific type of preference learning problem, namely the problem of learning a model that is capable of ranking a set of feature vector pairs, called dyads. In this paper, a situation is considered where the feature vectors consist of identifiers only. A new method based on learning latent features is introduced for this purpose. The method is evaluated on synthetic data and is applied to the problem of ranking from implicit feedback.</p>
      </abstract>
      <kwd-group>
        <kwd>Preference Learning</kwd>
        <kwd>Dyad Ranking</kwd>
        <kwd>Plackett-Luce Model</kwd>
        <kwd>Matrix Factorization</kwd>
        <kwd>Implicit Feedback</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
<p>Preference learning is an emerging subfield of machine learning, which deals with
the induction of preference models from observed or revealed preference
information [3]. Such models are typically used for prediction purposes, for example to
predict context-dependent preferences of individuals on various choice
alternatives. Depending on the representation of preferences, individuals, alternatives,
and contexts, a large variety of preference models are conceivable, and many
such models have already been studied in the literature.</p>
<p>A specific problem within the realm of preference learning is the problem of
label ranking, which consists of learning a model that maps instances to
rankings (total orders) over a finite set of predefined alternatives [11]. An instance,
which defines the context of the preference relation, is typically characterized
in terms of a set of attributes or features; for example, an instance could be a
person described by properties such as sex, age, income, etc. As opposed to this,
the alternatives to be ranked, e.g., the political parties of a country, are only
identified by their name (label), while not being characterized in terms of any
properties or features.</p>
<p>In [9], dyad ranking is introduced as a practically motivated generalization of
the label ranking problem. In dyad ranking, not only the instances but also the
alternatives are represented in terms of attributes: a dyad is a pair consisting
of an instance and an alternative. Moreover, for learning in the setting of dyad
ranking, an extension of the Plackett-Luce model, a statistical model for rank
data, was proposed. The approach was based on modeling utility scores of dyads
via a bilinear product of the feature vectors of instance and alternative, respectively.</p>
<p>In this paper, a problem is addressed which is in a sense diametrical to the
above-mentioned settings, because instances and alternatives are not described
by any attributes. The reason could be that observable attributes do not relate
well to the preference information, that the attributes contain many missing
values, or that they do not exist at all. Data preprocessing and feature
engineering may fail in these cases. Often, however, instances and alternatives are
identified by a unique ID, especially when they are organized in a relational
database. Otherwise, training examples from a finite set can always be
enumerated and a unique ID can be assigned. For this scenario a new variation
of the bilinear Plackett-Luce model is introduced, called the Latent-Feature
Plackett-Luce model (LFPL). In contrast to the original formulation of the
bilinear Plackett-Luce model, it can deal with rankings over dyads which are
specified by identifiers only.</p>
<p>The rest of the paper is organized as follows. First, a formal description of
the dyad ranking completion problem is given in Section 2. The LFPL method
to solve this problem based on latent features is described in Section 3. The
relation between dyad ranking completion and the application of learning to
rank on implicit feedback data is described in Section 4. An overview of related
methods is provided in Section 5. Experimental results are presented in Section
6, prior to concluding in Section 7.</p>
    </sec>
    <sec id="sec-2">
      <title>Problem Setting</title>
      <sec id="sec-2-1">
        <title>Dyad Ranking</title>
<p>The problem of dyad ranking is to learn a model that accepts as input any set
of (new) dyads and produces as output a ranking of them. A dyad is a pair of
feature vectors z = (x, y) ∈ Z = X × Y, where the feature vectors are from two
(but not necessarily different) domains X and Y.</p>
<p>Figure 1 shows the basic learning workflow: a learning algorithm creates a
model from a finite set of training examples. The trained model can afterwards
be used for producing rankings on (new) dyads. The training set D = {ρ_n}_{n=1}^N
consists of N examples ρ_n (1 ≤ n ≤ N), where each example is a dyad ranking</p>
        <p>ρ_n : z^(1) ≻ z^(2) ≻ … ≻ z^(M_n) ,  M_n ≥ 2 ,   (1)</p>
        <p>of length M_n, where M_n can vary from example to example.</p>
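<p>When dyads carry identifiers only, a training example ρ_n can be stored as an ordered list of (row-ID, column-ID) pairs; the following is a minimal sketch (variable names are illustrative, not from the paper):

```python
# A dyad ranking over ID-only dyads: most-preferred dyad first.
# Each example is an ordered list of (row_id, col_id) pairs z^(1), z^(2), ...
rho_1 = [(3, 2), (3, 144), (5, 9)]   # ranking of length M_1 = 3
rho_2 = [(1, 7), (4, 7)]             # ranking of length M_2 = 2 (lengths may vary)

D = [rho_1, rho_2]                   # training set of N = 2 dyad rankings

# Entities occurring as first / second dyad members:
R = sorted({i for rho in D for (i, _) in rho})
C = sorted({j for rho in D for (_, j) in rho})
```

The sets R and C collected here are exactly the row and column domains used in Section 2.2.</p>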
      </sec>
      <sec id="sec-2-2">
        <title>Dyad Ranking Completion</title>
<p>The domains from which the training dyads and their members respectively stem
are denoted by R ⊆ X and C ⊆ Y (indicating (r)ows and (c)olumns).
Depending on the occurrence of dyad members within the training set, one can
characterize the dyads which need to be ranked. Table 1 provides an overview of
the types of new dyads which can be encountered at the prediction phase. Dyad
ranking completion deals with dyads of type 1, i.e., dyads whose members
have been encountered at the training phase individually but not yet jointly.
They define the gaps in the R × C matrix, and filling these by ranking
all of them or subsets of them characterizes the task.</p>
        <p>Ranking with Identifiers. Dyads whose members are not described by
attributes can be utilized either by using existing database IDs or by assigning
natural numbers to them. When these dyads are encountered at the training
phase, then any new dyads in the prediction phase must necessarily be of type 1.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>A Plackett-Luce Model with Latent Features</title>
      <sec id="sec-3-1">
        <title>Plackett-Luce Model</title>
<p>The Plackett-Luce (PL) model is a parameterized probability distribution on
the set of all rankings over a set of alternatives y_1, …, y_K. It is specified by a
parameter vector v = (v_1, v_2, …, v_K) ∈ ℝ_+^K, in which v_i accounts for the (latent)
utility or "skill" of the option y_i. The probability assigned by the PL model to
a ranking π is given by</p>
        <p>P(π | v) = ∏_{i=1}^{K} v_{π(i)} / ( v_{π(i)} + v_{π(i+1)} + … + v_{π(K)} ) = ∏_{i=1}^{K−1} v_{π(i)} / Σ_{j=i}^{K} v_{π(j)} .   (2)</p>
        <p>Obviously, the Plackett-Luce model is only determined up to a positive
multiplicative constant, i.e., P(π | v) ≡ P(π | s·v) for all s &gt; 0.</p>
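<p>The probability (2) and its scale invariance can be illustrated with a small numpy sketch (function and variable names are ours, not from the paper):

```python
import numpy as np

def pl_probability(pi, v):
    # Probability of ranking pi (a permutation of 0..K-1, best first)
    # under a Plackett-Luce model with skill vector v, as in eq. (2).
    v = np.asarray(v, dtype=float)
    skills = v[pi]                         # skills in ranked order
    # tails[i] = sum of skills at ranks i..K-1 (the denominator at stage i)
    tails = np.cumsum(skills[::-1])[::-1]
    return float(np.prod(skills / tails))

v = np.array([2.0, 1.0, 1.0])
p = pl_probability([0, 1, 2], v)           # (2/4) * (1/2) * (1/1) = 0.25

# Invariance under positive rescaling, here with s = 3:
p_scaled = pl_probability([0, 1, 2], 3 * v)
```

Rescaling the skill vector by any s &gt; 0 leaves all stage-wise ratios, and hence the probability, unchanged.</p>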
<p>An appealing property of the PL model is that its marginal probabilities
(probabilities of rankings on subsets of alternatives) are easy to compute and
can be expressed in closed form. More specifically, the marginal of a PL model
on M &lt; K alternatives y_{i(1)}, …, y_{i(M)} is again a PL model with parameters
v_{i(1)}, …, v_{i(M)}.</p>
<p>Bilinear Plackett-Luce Model. The bilinear model (BilinPL) [9] is based on
a functional formulation of the skill parameters as follows,</p>
        <p>v(z) = v(x, y) = exp( x^⊤ W y ) ,   (3)</p>
        <p>which is obtained by defining the joint-feature map Φ(x, y) as the Kronecker (tensor) product</p>
        <p>Φ(x, y) = x ⊗ y = ( x_1 y_1, x_1 y_2, …, x_r y_c ) = vec( x y^⊤ ) ,   (4)</p>
        <p>which is a vector of length p = r · c consisting of all pairwise products of the
components of x and y, also known as cross-products. Thus, the inner product
⟨w, Φ(x, y)⟩ can be rewritten as a bilinear form x^⊤ W y with an r × c matrix
W = (w_{i,j}); the entry w_{i,j} can be considered as the weight of the interaction
term x_i y_j.</p>
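<p>The identity ⟨w, Φ(x, y)⟩ = x^⊤ W y behind (3) and (4) can be verified numerically; a small sketch (toy dimensions, row-major vec ordering matching the component order given above):

```python
import numpy as np

rng = np.random.default_rng(0)
r, c = 4, 3
x = rng.normal(size=r)
y = rng.normal(size=c)
W = rng.normal(size=(r, c))

# Joint-feature map: all cross-products x_i * y_j, a vector of length p = r*c.
phi = np.kron(x, y)

# Inner product with w = vec(W) (row-major) equals the bilinear form.
bilinear = x @ W @ y
inner = W.flatten() @ phi

# BilinPL skill of the dyad (x, y):
v = np.exp(bilinear)
```

Note that np.kron(x, y) lists entries in the order x_1 y_1, x_1 y_2, …, which is why the row-major flattening of W is the matching weight vector.</p>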
      </sec>
      <sec id="sec-3-2">
        <title>Latent-Feature based PL Model</title>
<p>To circumvent the Bilinear PL model's dependence on explicit features,
a novel extension of the Plackett-Luce model is proposed. It
is based on latent features and is called LFPL. For this model, the PL log-skill
parameter for a dyad z = (i, j) is defined as log v(z) = U_i V_j^⊤.</p>
<p>The following notation is used. The indices i and j refer to the entities
associated with the first and the second dyad member, whereas U and V are
matrices with U ∈ ℝ^{|R|×K} and V ∈ ℝ^{|C|×K}, where K denotes the dimensionality
of the latent feature vectors. |R| and |C| denote the number of entities in R and
C, respectively. Both R and C refer to special feature spaces which are one-dimensional
and are made up of natural numbers. The notation U_k refers to the
k-th row vector of the matrix U, whereas U_{k,i} refers to the i-th latent factor
of the k-th vector.</p>
<p>The conditional probability of observing a dyad ranking π_n with the parameters
θ = {U, V} can then be stated as:</p>
        <p>P(π_n | ρ_n, θ) = ∏_{k=1}^{M_n − 1} exp( U_{z_n(k,1)} V_{z_n(k,2)}^⊤ ) / Σ_{l=k}^{M_n} exp( U_{z_n(l,1)} V_{z_n(l,2)}^⊤ ) .   (5)</p>
        <p>The function z_n(k, i) in the likelihood (5) refers to the i-th member of a dyad
z observed in sample ρ_n, where i ∈ {1, 2}. The index k refers to the dyad z_n^{(j)}
which is put at the k-th rank of π_n, s.t. π_n(k) = j.</p>
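<p>As an illustration of the likelihood (5), the following numpy sketch evaluates the log-probability of one ID-only dyad ranking (helper names and toy dimensions are ours, not from the paper):

```python
import numpy as np

def lfpl_log_likelihood(ranking, U, V):
    # Log-probability of one dyad ranking under LFPL, eq. (5).
    # ranking: list of (i, j) ID pairs, most preferred first.
    scores = np.array([U[i] @ V[j] for (i, j) in ranking])  # log-skills
    M = len(scores)
    ll = 0.0
    for k in range(M - 1):
        tail = scores[k:]
        # log of exp(s_k) divided by the sum of exp over ranks k..M-1,
        # computed stably via the max-shift trick.
        ll += scores[k] - (tail.max() + np.log(np.exp(tail - tail.max()).sum()))
    return ll

rng = np.random.default_rng(1)
U = rng.normal(size=(6, 5))    # |R| = 6 entities, K = 5 latent features
V = rng.normal(size=(150, 5))  # |C| = 150 entities
ll = lfpl_log_likelihood([(3, 2), (3, 144), (5, 9)], U, V)
```

Since (5) is a product of softmax terms, the returned log-likelihood is always non-positive.</p>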
<p>The model parameters θ = {U, V} can be found by minimizing the negative
log-likelihood (NLL) of the data,</p>
        <p>ℓ = Σ_{n=1}^{N} ℓ_n .   (6)</p>
        <p>In (6) the NLL of a single example (dyad ranking) is given by</p>
        <p>ℓ_n = − log P(π_n | ρ_n, θ)   (7)
= Σ_{k=1}^{M_n − 1} log ( Σ_{l=k}^{M_n} exp( U_{z_n(l,1)} V_{z_n(l,2)}^⊤ ) ) − Σ_{k=1}^{M_n − 1} U_{z_n(k,1)} V_{z_n(k,2)}^⊤ .   (8)</p>
        <p>The overall objective which is to be optimized is hence of the form</p>
        <p>ℓ(θ; D) + Ω(θ) ,   (9)</p>
        <p>where Ω(θ) is a regularization term for controlling the model capacity, or likewise
a countermeasure for preventing overfitting. Note that the objective (9) is not
convex and its minimization requires special considerations.</p>
<p>Learning Algorithm. As learning algorithm, the first-order optimization
technique of alternating gradient descent is used. With this approach, the model weights
are updated while keeping one matrix, U or V respectively, fixed. The first
derivatives are given by</p>
        <p>∂ℓ_n/∂U_{s,i} = Σ_{k=1}^{M_n − 1} [ Σ_{l=k}^{M_n} 1{s = z_n(l,1)} V_{z_n(l,2),i} exp( U_{z_n(l,1)} V_{z_n(l,2)}^⊤ ) / Σ_{l=k}^{M_n} exp( U_{z_n(l,1)} V_{z_n(l,2)}^⊤ ) ] − Σ_{k=1}^{M_n − 1} 1{s = z_n(k,1)} V_{z_n(k,2),i} ,   (10)</p>
        <p>with s ∈ R_n and 1 ≤ i ≤ K. Here R_n refers to the set of entities, represented by
numerical IDs, that occur as first dyad members in the sample ρ_n. For example,
for a dyad ranking ρ_n : (x_3, y_2) ≻ (x_3, y_{144}) ≻ (x_5, y_9), R_n would be {3, 5} and
C_n = {2, 9, 144}. The derivatives for V can be stated similarly by treating the
components of U as constants:</p>
        <p>∂ℓ_n/∂V_{t,i} = Σ_{k=1}^{M_n − 1} [ Σ_{l=k}^{M_n} 1{t = z_n(l,2)} U_{z_n(l,1),i} exp( U_{z_n(l,1)} V_{z_n(l,2)}^⊤ ) / Σ_{l=k}^{M_n} exp( U_{z_n(l,1)} V_{z_n(l,2)}^⊤ ) ] − Σ_{k=1}^{M_n − 1} 1{t = z_n(k,2)} U_{z_n(k,1),i} ,   (11)</p>
        <p>with t ∈ C_n and 1 ≤ i ≤ K.</p>
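<p>The gradient (10) can be checked numerically; the following sketch (our own helper names, a toy problem) compares the analytic derivative of the NLL (8) with a central finite-difference approximation:

```python
import numpy as np

def nll(ranking, U, V):
    # Negative log-likelihood, eq. (8), of one dyad ranking.
    s = np.array([U[i] @ V[j] for (i, j) in ranking])
    val = 0.0
    for k in range(len(s) - 1):
        val += np.log(np.exp(s[k:]).sum()) - s[k]
    return val

def grad_U(ranking, U, V):
    # Analytic gradient of the NLL w.r.t. U, eq. (10); V held fixed.
    G = np.zeros_like(U)
    s = np.array([U[i] @ V[j] for (i, j) in ranking])
    for k in range(len(s) - 1):
        w = np.exp(s[k:]) / np.exp(s[k:]).sum()   # softmax over ranks k..M-1
        for offset, (i, j) in enumerate(ranking[k:]):
            G[i] += w[offset] * V[j]              # expectation term of (10)
        ik, jk = ranking[k]
        G[ik] -= V[jk]                            # indicator term of (10)
    return G

rng = np.random.default_rng(2)
U = rng.normal(size=(6, 4))
V = rng.normal(size=(10, 4))
ranking = [(3, 2), (3, 7), (5, 9)]

# Finite-difference check of one coordinate of (10):
eps = 1e-6
Up, Um = U.copy(), U.copy()
Up[3, 0] += eps
Um[3, 0] -= eps
numeric = (nll(ranking, Up, V) - nll(ranking, Um, V)) / (2 * eps)
analytic = grad_U(ranking, U, V)[3, 0]
```

The check for (11) is symmetric, with the roles of U and V exchanged.</p>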
<p>Algorithm 1 uses these expressions together with the AdaGrad strategy for
adaptive learning rate updates (note that any other learning rate update strategy
for gradient descent could be used too). AdaGrad provides the advantage of freeing the
user from determining the initial learning rate values and update schedules, while
being simple to implement [2]. The actual implementation utilizes a quadratic
regularization term</p>
        <p>Ω(θ) = λ ( ‖U‖_F² + ‖V‖_F² ) ,</p>
        <p>where ‖·‖_F refers to the Frobenius matrix norm. This approach is theoretically
justified by considering it as a problem of learning low-rank matrices with a
convex relaxation; a relaxation that is based on nuclear (trace) norm
regularization [1].</p>
<p>Algorithm 1 LFPL Alternating Gradient Descent with AdaGrad
Require: Initial matrices U ∈ ℝ^{|R|×K}, V ∈ ℝ^{|C|×K}, learning rate η, number of
iterations T, dyad rankings D_Tr = {(ρ, π)_n}_{n=1}^N.
1: function train_lfpl(D_Tr, U, V, T)
2:   for i ← 1 to T do
3:     ∇U ← Formula (10) on all samples {(ρ, π)_n}_{n=1}^N
4:     H_U ← H_U + (∇U)²
5:     U ← U − η (H_U)^{−1/2} ∇U
6:     ∇V ← Formula (11) on all samples {(ρ, π)_n}_{n=1}^N
7:     H_V ← H_V + (∇V)²
8:     V ← V − η (H_V)^{−1/2} ∇V
9:   end for
10:  return (U, V)
11: end function</p>
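<p>Algorithm 1 can be sketched compactly in numpy; this is an illustrative reimplementation under our own naming (a small AdaGrad epsilon is added for numerical stability, and the gradients of (10) and (11) are computed by a helper):

```python
import numpy as np

def gradients(batch, U, V):
    # NLL gradients, eqs. (10) and (11), accumulated over a batch of rankings.
    gU, gV = np.zeros_like(U), np.zeros_like(V)
    for ranking in batch:
        s = np.array([U[i] @ V[j] for (i, j) in ranking])
        for k in range(len(s) - 1):
            w = np.exp(s[k:]) / np.exp(s[k:]).sum()
            for o, (i, j) in enumerate(ranking[k:]):
                gU[i] += w[o] * V[j]
                gV[j] += w[o] * U[i]
            ik, jk = ranking[k]
            gU[ik] -= V[jk]
            gV[jk] -= U[ik]
    return gU, gV

def train_lfpl(D, U, V, eta=0.3, T=500):
    # Algorithm 1: alternating gradient descent with AdaGrad updates.
    HU, HV = np.full(U.shape, 1e-8), np.full(V.shape, 1e-8)
    for _ in range(T):
        gU, _ = gradients(D, U, V)        # update U with V fixed
        HU += gU ** 2
        U = U - eta * gU / np.sqrt(HU)
        _, gV = gradients(D, U, V)        # update V with U fixed
        HV += gV ** 2
        V = V - eta * gV / np.sqrt(HV)
    return U, V

def total_nll(D, U, V):
    t = 0.0
    for ranking in D:
        s = np.array([U[i] @ V[j] for (i, j) in ranking])
        for k in range(len(s) - 1):
            t += np.log(np.exp(s[k:]).sum()) - s[k]
    return t

rng = np.random.default_rng(3)
U0 = 0.1 * rng.normal(size=(4, 3))
V0 = 0.1 * rng.normal(size=(5, 3))
D = [[(0, 1), (0, 2), (0, 3)], [(1, 2), (1, 4)], [(2, 0), (2, 3)]]

before = total_nll(D, U0, V0)
U1, V1 = train_lfpl(D, U0, V0)
after = total_nll(D, U1, V1)
```

On this toy problem the training NLL drops markedly, since the example rankings contain no conflicting preferences.</p>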
      </sec>
      <sec id="sec-3-3">
        <title>Predictions with LFPL</title>
<p>With LFPL it is possible to solve the dyad ranking completion problem from
Section 2.2 by making predictions involving dyads of type 1 (cf. Table 1). They
are all of the form z^(k) = (i_k, j_k), where i_k, j_k ∈ ℕ and both indices i_k and j_k
appeared independently in other dyads during training, but not yet jointly.
The most probable ranking can be predicted by calculating v^(k) = exp( U_{i_k} V_{j_k}^⊤ )
and then arranging the dyads in descending order of the scores.</p>
        <p>There are two ways to relate the LFPL model to the Bilinear PL model. The first
relation can be seen by setting W = U V^⊤. By describing each entity i by means
of a label embedding e_i (e.g., via 1-of-K encoding), it is possible to cast (5) into the
Bilinear PL model formulation (3). The second relation can be seen by treating
the latent feature vectors in the rows of U and V as object features and the
bilinear weight matrix W as the identity matrix I_K. Both relations justify
classifying LFPL as a bilinear model, too. But in contrast to BilinPL (Section 3.1),
the LFPL model does not require the engineering of explicit features; it learns
latent feature vectors under the same bilinearity assumption instead. Without
considering W to be of lower rank, it is not possible for the BilinPL approach to
generalize to unseen dyad rankings of type 1 where dyad members are described
by identifiers only.</p>
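<p>The prediction step can be sketched as follows (helper names and toy dimensions are ours):

```python
import numpy as np

def predict_ranking(dyads, U, V):
    # Rank type-1 dyads (i_k, j_k) by v^(k) = exp(U_i V_j^T), best first.
    scores = np.array([np.exp(U[i] @ V[j]) for (i, j) in dyads])
    order = np.argsort(-scores)            # descending by score
    return [dyads[k] for k in order]

rng = np.random.default_rng(4)
U = rng.normal(size=(3, 2))
V = rng.normal(size=(4, 2))
new_dyads = [(0, 1), (1, 2), (2, 3)]       # unseen combinations of known IDs
ranking = predict_ranking(new_dyads, U, V)
```

Since exp is monotone, sorting by U_i V_j^⊤ directly would yield the same ranking.</p>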
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Learning from Implicit Feedback</title>
<p>Implicit feedback is probably the most basic kind of preference statement. It
refers to the selection of a single item from a set of choice alternatives at a time.
A reasonable interpretation of that action is that the selected item is probably
preferred over the other items present at the time of selection. This kind of data
can be represented in a rectangular schema of rows and columns, e.g., where
they represent users and items. Cells in that schema indicate whether an interaction
between a particular row and column happened. In what follows we denote
row indices as i_k ∈ R and column indices as j_l ∈ C. Formally, implicit feedback
data is defined as a subset of indices F ⊆ R × C. An index pair (i, j) in F can
be interpreted as an action that happened between entities i and j. The goal is
to provide a ranking of dyads (i, j) for which no interactions have been observed
so far, i.e., (i, j) ∉ F.</p>
<p>Without loss of generality, we turn now to the special case of LFPL
for contextual implicit feedback. Let the set of items on which interactions have
been recorded for a context i be denoted as C_i^+ := { j ∈ C | (i, j) ∈ F }. We
treat this scenario as a completion problem which we want to solve with dyad
ranking. Figure 2 illustrates the conversion of original implicit feedback data to
contextual dyad rankings. For each row i_k (or user), a separate dyadic preference
graph is generated. All graphs combined form a training data set D_F, which is</p>
      <p>D_F = { (i, j_a) ≻ (i, j_b) | j_a ∈ C_i^+ ∧ j_b ∈ C \ C_i^+ } .</p>
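<p>The construction of D_F can be sketched directly; the toy feedback set and names below are illustrative:

```python
# Convert implicit feedback F (a set of (i, j) index pairs) into pairwise
# dyad rankings D_F: items with observed feedback in context i are
# preferred over items without feedback in that context.
F = {(0, 0), (0, 2), (1, 1), (2, 0), (2, 3)}
rows = {0, 1, 2}
cols = {0, 1, 2, 3}

D_F = []
for i in sorted(rows):
    C_plus = {j for (r, j) in F if r == i}   # items with feedback in context i
    for ja in sorted(C_plus):
        for jb in sorted(cols - C_plus):
            D_F.append([(i, ja), (i, jb)])   # a dyad ranking of length 2
```

Each context thus contributes |C_i^+| · |C \ C_i^+| pairwise dyad rankings.</p>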
<p>To overcome the problem of overfitting on high-frequency items, the learning
procedure needs a slight modification. The new procedure uses sampling from
D_F similar to BPRMF [7], but reuses the existing Algorithm 1 for this purpose.
Instead of acting on single training examples, it generates subsets of examples
(batches) of size L. This new approach is called LFPL/IF (for implicit feedback)
and is stated as Algorithm 2. As stopping criterion, either the convergence of the
performance on a hold-out set or a predefined maximal number of iterations can
be used.</p>
<p>Gradients of LFPL/IF. The calculations of the gradient matrices ∇U in Eq.
(10) and ∇V in Eq. (11) of Algorithm 1 simplify drastically in the case of implicit
feedback data. Let j_a, j_b denote the indices of items, where item
j_a is preferred over j_b. Let furthermore i be the index of a user and 1 ≤ k ≤ K
be a latent feature dimension. The NLL of a training example is then
given by</p>
      <p>ℓ_n = log( exp( U_i V_{j_a}^⊤ ) + exp( U_i V_{j_b}^⊤ ) ) − U_i V_{j_a}^⊤ ,</p>
      <p>and the derivatives can be calculated using the following expressions:</p>
      <p>∂ℓ_n/∂U_{i,k} = ( V_{j_a,k} exp( U_i V_{j_a}^⊤ ) + V_{j_b,k} exp( U_i V_{j_b}^⊤ ) ) / ( exp( U_i V_{j_a}^⊤ ) + exp( U_i V_{j_b}^⊤ ) ) − V_{j_a,k} ,</p>
      <p>∂ℓ_n/∂V_{j_b,k} = U_{i,k} exp( U_i V_{j_b}^⊤ ) / ( exp( U_i V_{j_a}^⊤ ) + exp( U_i V_{j_b}^⊤ ) ) .</p>
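<p>These simplified pairwise expressions can be implemented and checked in a few lines (helper names and the toy setup are ours):

```python
import numpy as np

def pairwise_grads(i, ja, jb, U, V):
    # Gradients of the pairwise NLL for one implicit-feedback example
    # (i, ja) preferred over (i, jb); the length-2 special case of (10)/(11).
    sa, sb = U[i] @ V[ja], U[i] @ V[jb]
    ea, eb = np.exp(sa), np.exp(sb)
    z = ea + eb
    g_Ui = (V[ja] * ea + V[jb] * eb) / z - V[ja]   # d l_n / d U_i
    g_Vja = U[i] * ea / z - U[i]                   # d l_n / d V_ja
    g_Vjb = U[i] * eb / z                          # d l_n / d V_jb
    return g_Ui, g_Vja, g_Vjb

rng = np.random.default_rng(5)
U = rng.normal(size=(3, 4))
V = rng.normal(size=(5, 4))
g_Ui, g_Vja, g_Vjb = pairwise_grads(0, 1, 2, U, V)
```

Only three rows of the parameter matrices receive non-zero gradient per example, which is what makes sampled batch updates cheap.</p>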
    </sec>
    <sec id="sec-5">
      <title>Related Work</title>
<p>Plackett-Luce Networks (PLNet) is a method for dyad ranking that is able to
learn joint-feature representations [10]. Being based on a neural network, it is
able to express non-linear relationships between dyads and rankings. It is
capable of solving the dyad ranking completion problem too, but it is not
designed for the special case where dyad inputs are expressed by identifiers only.</p>
<p>Probabilistic Matrix Factorization (PMF) [8] is a classical preference
completion method for preferences in the form of ratings. A substantial property of
PMF is the learning and prediction of quantitative instead of qualitative
preferences by filling the gaps of an incomplete rating matrix. Although LFPL and
PMF differ fundamentally in this point, they also share some aspects. Firstly,
both approaches are based on the same model class, i.e., matrices U and V are
multiplied together to give a real-valued matrix of dimensionality N × M. In
both cases, higher matrix values indicate stronger preferences. Secondly, both
approaches put Gaussian priors over U and V to control model capacity.</p>
<p>The methods BPRMF and WRMF are standard methods for learning on
implicit feedback data. The LFPL method is closely related to BPRMF (the
matrix factorization variant of Bayesian Personalized Ranking), although the former
acts on dyad rankings of the form (i, j_a) ≻ (i, j_b) and the latter on triplets j_a ≻_i
j_b [7]. Both approaches share the same model class and also the assumption of
Gaussian priors. LFPL generalizes BPRMF by supporting rankings of arbitrary
lengths instead of pairwise preferences, i.e., rankings of length 2, and by providing
contextual and non-contextual variants. WRMF is an approach that is also based
on matrix factorization [6]. It shares with LFPL and BPRMF the same model
class, but in contrast to them it is a pointwise approach (one item) instead of a
pairwise approach (two items at a time).</p>
    </sec>
    <sec id="sec-6">
      <title>Experiments</title>
      <sec id="sec-6-1">
<title>Ranking Completion (Synthetic Data)</title>
        <p>The advantage of LFPL over BilinPL and PLNet is shown for the problem of ranking completion with identifiers.
Given are a set of row entities R and a set of column entities C, which are specified
by their IDs only. Furthermore, provided with a training set of incomplete
contextualized rankings over C, where contexts are provided by entities from R, the
task is to create a model which can be used to predict arbitrary rankings for a
set of new dyads in R × C.</p>
<p>One thousand rankings of length 10 were sampled from the LFPL model
distribution and used in a 10-fold CV experiment, where 900 of them were used for
training and 100 for testing in each run. The LFPL model used for the generation of
rankings was determined by the matrices U ∈ ℝ^{|R|×K} and V ∈ ℝ^{|C|×K}, which
were generated by sampling from the normal distribution, with K = 5 the
number of latent factors. To apply the dyads with identifiers to BilinPL and
PLNet, the natural numbers used as identifiers were converted into vector format
using 1-of-K encoding. PLNet was configured to contain one hidden layer with 10
neurons. The results given in Table 2 confirm the hypothesis that BilinPL
and PLNet do not generalize well beyond the observed preferences. Especially
PLNet is prone to overfitting. LFPL, in contrast to BilinPL, is able to share model
parameters under the assumption that preferences are representable with a lower-rank
matrix W.</p>
        <p>Implicit Feedback (Real Data). In this experiment the capabilities of LFPL/IF
on implicit feedback data were tested. The MovieLens (1M) data set was utilized
due to its availability and high degree of popularity [5]. It comprises 6040 users
who provided a total of about 1 million explicit ratings on 3900 movies.
Similar to the experimental setup of [7], the rating data set was converted into implicit
feedback data. Each rating 1 ≤ r ≤ 5 is interpreted as feedback, and the task
corresponds to estimating how likely a user would rate the residual entries.</p>
<p>For the evaluation we repeated the following leave-one-out procedure: one
random rating entry per user is removed from the data. These entries are not
considered for training but for benchmarking the trained model afterwards.
Training and test sets are thus disjoint subsets of user-associated item indices, i.e.,
C_i^{Tr}, C_i^{Te} ⊆ C_i^+, 1 ≤ i ≤ |U|, and C_i^{Tr} ∩ C_i^{Te} = ∅. For the evaluation of the models
the average AUC is used:</p>
<p>AUC = (1/|U|) Σ_{u ∈ U} (1/|E(u)|) Σ_{(j_a, j_b) ∈ E(u)} 1[ v̂(u, j_a) &gt; v̂(u, j_b) ] ,</p>
        <p>in which the evaluation pairs E(u) are given by</p>
        <p>E(u) = { (j_a, j_b) | j_a ∈ C_u^{Te} ∧ j_b ∉ (C_u^{Tr} ∪ C_u^{Te}) } .</p>
        <p>As baseline methods, random guessing and the user-independent most-popular
approach are used. Most-popular determines v̂(u, i) as the number of users
that interacted with item i. The implementations of the BPRMF and WRMF
methods were taken from the software package MyMediaLite 3.11 [4], and their
default parameters were adopted. For LFPL the parameters were: the dimension of the
latent feature space was set to 10, regularization values were fixed to 0.01, the number
of iterations was 5000, and the batch size was set to 2000.</p>
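<p>The average AUC above can be computed as follows; a small sketch with hypothetical scores and evaluation pairs (names ours):

```python
def average_auc(score, E):
    # Average AUC: per user u, the fraction of evaluation pairs (ja, jb)
    # for which the held-out item ja is scored above the unobserved item jb;
    # then averaged over users.
    per_user = []
    for u, pairs in E.items():
        hits = sum(1.0 for (ja, jb) in pairs if score(u, ja) > score(u, jb))
        per_user.append(hits / len(pairs))
    return sum(per_user) / len(per_user)

# Toy evaluation (illustrative): item 0 always gets the highest score.
score = lambda u, j: 1.0 if j == 0 else 0.0
E = {0: [(0, 1), (2, 1)], 1: [(0, 3)]}     # evaluation pairs per user
auc = average_auc(score, E)
```

In this toy setup user 0 resolves one of two pairs correctly and user 1 resolves its single pair, giving an average AUC of 0.75.</p>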
<p>The results provided in Figure 3 reflect our expectation that LFPL/IF
performs similarly to BPRMF and WRMF. All three methods outperform the
baseline methods to a large extent. In direct comparison, LFPL/IF is slightly superior
to WRMF and inferior to BPRMF in terms of AUC performance.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Conclusion</title>
<p>We presented a new dyad ranking approach based on the Plackett-Luce model
with latent feature vectors, applicable to the dyad ranking completion problem.
The advantage of LFPL is its applicability in scenarios where no information
on dyad members is available apart from their unique identifiers. The problem of
learning and prediction given implicit feedback data could be addressed as an
application of dyad ranking completion. The learning algorithm is based on
alternating gradient descent to deal with the non-convexity of the objective
function. An obvious disadvantage is that predictions with latent features can
be carried out in a straightforward way only on dyads within R × C. LFPL in
its current form is also a bilinear model and may be limited in modeling more
complicated non-linear relationships between dyads and preferences.</p>
      <p>6. Yifan Hu, Yehuda Koren, and Chris Volinsky. Collaborative filtering for implicit
feedback datasets. In Proceedings of the 2008 Eighth IEEE International
Conference on Data Mining, ICDM '08, pages 263-272. IEEE Computer Society, 2008.
7. Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme.
BPR: Bayesian personalized ranking from implicit feedback. In Proceedings of
the 25th Conference on Uncertainty in Artificial Intelligence, pages 452-461. AUAI
Press, 2009.
8. Ruslan Salakhutdinov and Andriy Mnih. Probabilistic matrix factorization. In
Advances in Neural Information Processing Systems, volume 20, 2008.
9. Dirk Schäfer and Eyke Hüllermeier. Dyad ranking using a bilinear Plackett-Luce
model. In Proceedings ECML/PKDD 2015, European Conference on Machine
Learning and Knowledge Discovery in Databases, Porto, Portugal, 2015. Springer.
10. Dirk Schäfer and Eyke Hüllermeier. Plackett-Luce networks for dyad ranking. In
Workshop LWDA, "Lernen, Wissen, Daten, Analysen", Potsdam, Germany, 2016.
11. S. Vembu and T. Gärtner. Label ranking algorithms: A survey. In J. Fürnkranz
and E. Hüllermeier, editors, Preference Learning. Springer-Verlag, 2011.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Carlo</given-names>
            <surname>Ciliberto</surname>
          </string-name>
          , Dimitris Stamos, and
          <string-name>
            <given-names>Massimiliano</given-names>
            <surname>Pontil</surname>
          </string-name>
          .
          <article-title>Reexamining low rank matrix factorization for trace norm regularization</article-title>
          .
          <source>CoRR, abs/1706.08934</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2. John Duchi, Elad Hazan, and
          <string-name>
            <given-names>Yoram</given-names>
            <surname>Singer</surname>
          </string-name>
          .
          <article-title>Adaptive subgradient methods for online learning and stochastic optimization</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          ,
          <volume>12</volume>
          :
<fpage>2121</fpage>
          -
          <lpage>2159</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
<ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Johannes</given-names>
            <surname>Fürnkranz</surname>
          </string-name>
          and Eyke Hüllermeier.
          <article-title>Preference learning: An introduction</article-title>
          . In Johannes Fürnkranz and Eyke Hüllermeier, editors,
          <source>Preference Learning</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>17</lpage>
          . Springer-Verlag,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Zeno</given-names>
            <surname>Gantner</surname>
          </string-name>
, Steffen Rendle, Christoph Freudenthaler, and
          <string-name>
            <given-names>Lars</given-names>
            <surname>Schmidt-Thieme</surname>
          </string-name>
          .
          <article-title>MyMediaLite: A free recommender system library</article-title>
          .
          <source>In Proceedings of the 5th ACM Conference on Recommender Systems (RecSys</source>
          <year>2011</year>
          ),
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
<string-name>
            <given-names>F. Maxwell</given-names>
            <surname>Harper</surname>
          </string-name>
          and Joseph A. Konstan.
          <article-title>The MovieLens datasets: History and context</article-title>
          .
          <source>ACM Trans. Interact. Intell. Syst.</source>
          ,
          <volume>5</volume>
          (
          <issue>4</issue>
):
          <fpage>19:1</fpage>
          -
          <lpage>19:19</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>