Choice-based recommender systems

                 Paula Saavedra                            Pablo Barreiro                         Roi Durán
                      CITIUS                                   CITIUS                              CITIUS
             University of Santiago de                University of Santiago de           University of Santiago de
                   Compostela                               Compostela                          Compostela
             Santiago de Compostela,                  Santiago de Compostela,             Santiago de Compostela,
                       Spain                                    Spain                               Spain
           paula.saavedra@usc.es                     pablobv70@gmail.com                  roiduram@gmail.com
               Rosa Crujeiras                            María Loureiro                   Eduardo Sánchez Vila
              School of Mathematics                     School of Business                         CITIUS
             University of Santiago de                University of Santiago de           University of Santiago de
                   Compostela                               Compostela                          Compostela
             Santiago de Compostela,                  Santiago de Compostela,             Santiago de Compostela,
                       Spain                                    Spain                               Spain
            rosa.crujeiras@usc.es                    maria.loureiro@usc.es             eduardo.sanchez.vila@usc.es

ABSTRACT                                                              item’s attributes [2]. These preferences can be used to pre-
Choice-based models are proposed to overcome some of the              dict the utility of any given item by comparing them with the
limitations found in traditional rating-based strategies. The         values of item’s attributes. Collaborative recommenders, on
new approach is grounded on decision-making paradigms,                the other hand, take advantage of previous ratings provided
such as choice and utility theories. Specifically, random             by the available decision-makers to predict the utility of any
utility models were applied in a recommendation problem.              given user-item pair [6]. This approach has been widely
Prediction accuracy was compared with state-of-art rating-            adopted as it removes the burden of knowing and managing
based algorithms in a gastronomy dataset. The results show            item attributes as well as their corresponding values.
the superior performance of choice-based models, which may               Many algorithms and models have been proposed under
suggest that real choices could bring more predictive power           the collaborative paradigm. Among them, two families have
than ratings.                                                         gained major attraction: neighborhood algorithms and la-
                                                                      tent factor models. The neighborhood approach was the
                                                                      first to implement to collaborative concept and became the
CCS Concepts                                                          reference model in this research area [9, 4]. The method con-
•Information systems → Collaborative filtering; So-                   sists on representing vectors of ratings on either the decision-
cial recommendation;                                                  maker or item space. The distance between any pair of these
                                                                      vectors determine the similarity between either the decision-
Keywords                                                              makers or the items that these vectors represent. Individuals
                                                                      with similar rating’s vectors are considered to possess similar
Choice models; Random Utility Models; Logit probabilities;            tastes or preferences, while items are considered to have sim-
Tourism                                                               ilar attributes. The latent factor strategy, in turn, attempts
                                                                      to explain ratings by means of characterizing both users and
1.   INTRODUCTION                                                     items with a limited set of factors. Factors are considered
   Recommender systems are personalization tools aimed at             unknown variables that can be inferred from the ratings de-
suggesting relevant items on the basis of available informa-          clared by the users. The inference or learning problem can
tion on items as well as decision-makers [5]. Broadly speak-          be solved with factorization techniques. The classical fac-
ing, recommenders can be classified in two different cate-            torization method is called Singular Value Decomposition
gories. Content-based recommenders generate a profile for             (SVD) and was applied successfully to identify and reduce
each decision-maker by considering items experienced in the           the number of relevant factors [10]. However, the method
past. The profile typically represents the preferences of the         requires complete knowledge of the rating matrix and fill-in
decision-maker, i.e the taste of the decision-maker on each           methods to populate sparse rating matrix come at a cost of
                                                                      inaccurate factor learning. Recently, new factorization tech-
                                                                      niques have been successfully developed that are capable of
                                                                      learning the factors from sparse rating matrices [7]. Each
                                                                      rating is explained by means of two vectors whose dimen-
                                                                      sions correspond with the set of latent factors. The first
                                                                      vector represents the item in terms of its degree of posse-
                                                                      sion of each factor, while the second vector represent the
Copyright held by the author(s).
RecTour 2016 - Workshop on Recommenders in Tourism held in conjunc-
                                                                      decision-maker on the basis of her preference on each factor.
tion with the 10th ACM Conference on Recommender Systems (RecSys),    These item and decision-maker vectors constitute a pair of
September 15, 2016, Boston, MA, USA.
matrices whose values have to be inferred. The learning
problem is solved by means of minimizing the regularized
                                                                              CR(A, ) = {a0 ∈ A k a0  a, ∀a ∈ A}              (2)
error on the set of known ratings.
   Despite the success of current recommender systems, the         where CR stands for ”choice rule” and the  operator de-
experience with state-or-art approaches reveal some impor-         notes the relationship ”preferred to, or at least as preferred
tant limitations. First, the degree of performance of a recom-     as”. Basically, it means that the chosen alternative will be
mender algorithm depends on the specific issues of the prob-       the one from which the decision-maker shows a higher pref-
lem at hand. Therefore, heuristic models and trial-and-error       erence. The preference operator needs to be quantified to
methodologies are often used to look for the best solution         allow a numerical comparison between the alternatives.
for any given situation. The problem may be approached                The utility theory comes to the rescue to solve this prob-
in a more theoretical and consistent way if recommenders           lem. One of the axioms of this theory states that it is pos-
were considered as agents predicting the decisions taken by        sible to define a utility function such that:
decision-makers. Under this scope, the first limitation could
                                                                                      a  b ⇐⇒ U (a) ≥ U (b).                   (3)
be stated as follows: (L1) Current state-of-art approaches
are mostly based on heuristic models rather than decision-         And then, the choice rule in equation 2 can be represented
making theories. Second, some popular paradigms assume             in terms of the utility function and a numerical operator:
a direct relationship between preferences and ratings: (1)
                                                                          CR(A, ≥) = {a0 ∈ A k U (a0 ) ≥ U (a), ∀a ∈ A}.        (4)
the neighborhood approach considers that decision-makers
with similar ratings on a set of items will have similar prefer-     It is now clear that the new choice rule is mathemati-
ences, and (2) factorization techniques assume that ratings        cally equivalent to the recommendation problem described
can be the result of a product between item’s latent factors       in equation 1:
and decision-maker preferences about that factors. In these
paradigms unobserved preferences are usually inferred from           a0 = arg max U (c, a) ⇐⇒
                                                                             a∈A
observed ratings. The issue here comes from the fact that
ratings could be mostly explained by variables different to                 CR(A, ≥) = {a0 ∈ A k U (a0 ) ≥ U (a), ∀a ∈ A}.      (5)
preferences. The quality of the item, the user-item context,
                                                                     As the recommendation problem can be understood as a
and in general any factor involved during the process of ex-
                                                                   choice prediction problem, then the powerful models and
periencing the item, they all could provide more explanatory
                                                                   techniques developed in this field can be naturally applied
power about ratings than preferences do. Therefore, the sec-
                                                                   to generate recommendations.
ond limitation could be described as follows: (L2) Prefer-
ences are usually derived from ratings without any support-        2.2    Choice models with random utility
ing evidence about the relationship between these variables.          The choice rule models how decision-makers take their
   This work proposes choice-based recommender systems to          decisions. However, the problem of predicting such deci-
overcome these limitations. The concept is grounded on             sions is a different task. In real problems the researcher
choice and utility theory, where real choices replace ratings      does not have access to all the factors and variables that
as the key data to learn the decision-maker’s preferences as       decision-makers include to estimate utilities. For a concrete
well as to make recommendations. The proposed models are           individual cn , the researcher only knows some attributes
then evaluated in the tourism domain with a gastronomy             of the alternatives, labeled xj for all aj alternatives with
dataset that includes both choices and ratings. In what            j ∈ {1, · · · , J}, and some attributes of the decision-maker,
follows, the choice-based models are presented, the meth-          labeled zn . A function that relates these observed factors to
ods are described, and the models evaluated and compared           the decision-maker’s utility can be specified. This function
against state-of-art rating-based algorithms. The discussion       is denoted by Vnj = V (xj , zn ) and it is often called repre-
will comment on the results and highlight the major contri-        sentative utility. It usually depends on parameters that are
butions of the paper.                                              unknown and, therefore, they must be estimated.
                                                                      Since there are aspects of utility that the researcher does
2.    CHOICE MODELS                                                not or cannot observe, Vnj 6= Unj . Therefore, the utility can
                                                                   be decomposed as:
2.1    Recommendation as a choice problem                                                 Unj = Vnj + nj                       (6)
   The recommendation problem can be described as an op-
timization problem which consists on (1) estimating the util-      where nj captures the unknown factors that modify the
ity of each item a ∈ A, the available item set, for any given      utility and are not included in Vnj . This decomposition is
decision-maker c, and (2) choosing the item a0 that maxi-          fully general, since nj is defined as simply the difference
mizes U (c, a), the decision-maker utility on any item a [1]:      between true utility Unj and the part of utility that the
                                                                   researcher captures in Vnj . Given its definition, the charac-
                     a0 = arg max U (c, a)                  (1)    teristics of nj , such as its distribution, depend critically on
                            a∈A
                                                                   the researcher’s specification of Vnj . The researcher does not
   It is worth noting that this problem is conceptually the        know nj for all j and therefore these terms are considered
same as the one faced by the Rational Choice Theory, which         random variables that allow the researcher to make proba-
aims at explaining economic behaviour under choice situa-          bilistic statements about the decision-maker’s choice. The
tions [11]. The theory states that a decision-maker will max-      models derived under this assumptions are called random
imize her utility after satisfying some budget constraints.        utility models (RUM) [8].
More formally, the decision-maker will choose alternative a0          Now, the choice rule of equation 4, which is deterministic
from a choice set A according to the following rule:               under the decision-maker perpective, becomes probabilistic
under the perspective of the researcher. Then the rule for a            choosing the alternative that she was observed actually to
decision-maker cn choosing alternative ai is:                           choose is
                                                                                                    YY y
           CR(A, ≥) = {ai ∈ A k Pi ≥ Pj , ∀aj ∈ A}               (7)                        L(β) =        Pnini
                                                                                                         n       i
and the probability Pi is estimated as follows:
                                                                        where β denotes the vector of all model parameters. There-
  P(Uni > Unj for all j 6= i) =                                         fore, the log-likelihood function is
               P(nj − ni < Vni − Vnj for all j 6= i).          (8)                               XX
                                                                                          LL(β) =          yni log Pni
If the joint density of n = (n1 , ..., nJ ) is denoted by f , this                                n       i
cumulative probability can be rewritten as:                             and the estimator is the value of β that maximizes this func-
                                                                        tion. Importantly, it was proved that the log-likelihood func-
         Z
  Pni = I(nj − ni < Vni − Vnj for all j 6= i)f (n )dn (9)           tion with these choice probabilities is globally concave in
          
                                                                        parameters β, which helps in the numerical maximization
where I is the indicator function, equaling 1 when the term             procedures, see [8] for more details.
in parentheses is true and 0 otherwise. This is a multidimen-              A well-known issue of standard logit model deals with
sional integral over the density of the unobserved portion of           capturing the heterogeneity of population [12]. The impor-
utility, f (n ). Different choice models are obtained from dif-        tance that decision-makers place on each attribute of the
ferent specifications of this density, that is, from different as-      possible choices varies, in general, over decision-makers. Al-
sumptions about the distribution of the unobserved portion              though logit model is able to represent the taste variation
of utility. In addition, the choice of the density determines           related to observed characteristics of the decision-maker, it
whether the integral takes a closed form or not [12].                   can not represent differences in tastes that can not be linked
                                                                        to observed characteristics. Therefore, if taste variation is at
2.3    Standard and mixed logit models                                  least partly random, a logit model with random parameters
  The simplest and most widely used choice model is the                 should be considered instead. Under this considerations, β
standard logit model [8]. It is derived under the assumption            is now a vector of random coefficients and these coefficients
that the each unobserved portion of utility nj is distributed          vary over decision-makers in the population with density g.
independently, identically extreme value. In this case, f               In most applications that have actually been called mixed
denotes the density for Gumbel distribution:                            logit, g is specified to be continuous. For example, it can
                                            −nj                        be specified to be normal, lognormal, uniform, triangular
                     f (nj ) = e−nj e−e          .            (10)
                                                                        or, even, gamma. Therefore, this density is a function of
Following [8], the logit choice probability that decision-maker         parameters θ that represent, in the gaussian case, the mean
cn chooses alternative i is                                             and covariance of the random coefficient in the population.
                                                                        Then, the choice probabilities can be written as:
                                eVni
                         Pni = P Vnj .                          (11)                                          !
                                je                                                                  eVni (β)
                                                                                             Z
                                                                                      Pni =       P V (β) g(β|θ)dβ.                (12)
                                                                                                     je
                                                                                                         nj
This model presents a clear interpretation. According to
equation 11, if Vni rises, reflecting a matching between the            Since the previous integral has not a closed form, it must
observed attributes of the alternative and the preferences              be evaluated numerically through simulation. Once the re-
of the decision-maker, with Vnj for all j 6= i held constant,           searcher specifies a distribution g for the coefficients, the pa-
Pni approaches one. And Pni approaches zero when Vni                    rameters θ maximizing the simulated log-likelihood must be
decreases, since the exponential in the numerator approaches            estimated. Then, R draws of the coefficients are taken from
zero as Vni approaches −∞.                                              g and the logit probabilities are computed for every draw.
   The representative utility is usually specified to be linear         The unconditional probability in equation 12, that is the ex-
in the set alternative’s attributes: Vnj = βnj · xj , where xj          pected value of the conditional probabilities, is estimated as
is a vector containing, as before, the observed variables of            the average of R probabilities determined previously.
the alternative aj , and βnj denotes the model coefficients
vector which describes the preferences of decision-maker cn
on the attributes of the alternatives aj . The preferences βnj          3.    METHODS
(model coefficients) are estimated by fitting equation 11 to               The performance of choice-based models is compared with
a dataset of choices. Moreover, since the logit probabilities           a choice of relevant rating-based algorithms from a gastro-
take a closed form, maximum likelihood procedures are ap-               nomic dataset containing the choices of snacks made by a
plied for estimation. Concretely, the probability of person             set of decision-makers and their corresponding tapa ratings.
cn choosing the alternative that he was actually observed to            The dataset is described in Sections 3.1 and 3.2. Technical
choose can be expressed as                                              details on the two recommendation alternatives considered
                           Y y                                          in this work are briefly presented in Sections 3.3 and 3.4. Fi-
                               Pnini ,                                  nally, the error criteria used to compare them are introduced
                               i
                                                                        in Section 3.5.
where yni = 1 if the individual choses i and zero otherwise.
Since yni = 0 for non-chosen alternatives and Pni raised to             3.1    Experiment
the power of zero is 1, this term is simply the probability               In the context of the RECTUR project, an experiment
of the chosen alternative. Assuming that decision-maker’s               was carried out with real users in the context of Santi-
choices are independent, the probability of each individual             ago(é)Tapas, a gastronomic context that takes place every
year in Santiago de Compostela. In 2011 the fourth edi-           binary variables associated to each alternative (or snack)
tion was held with a total of 56 participating restaurants        were generated for fitting these two models. Next, the con-
proposing and elaborating up to three tapas that were sold        struction of the variables is briefly described through an ex-
at a price of 2 euro. The experiment was designed to gather       ample. The choice set associated to the old area contains, as
relevant data while preserving the spirit of the contest. Par-    possible choices, the set of tapas distributed in restaurants
ticipants were local users as well as Spanish and interna-        of this zone. For each one of these snacks, the dichoto-
tional tourists. A TapasPassport with the official informa-       mous variables cheese, egg, fish, meat, vegetable, shellfish
tion about the contest was made available to all partici-         and traditional are generated. According to Figure 2, the
pants. It contained: (i) the contest guidelines and other         main ingredient of t100 is meat. However, this tapa is not
related information to the participants, (ii) restaurants lo-     traditional. Therefore, only the variable meat will be equal
cation, (iii) the tapas offered on each restaurant, (iv) an       to 1. The rest of variables associated to t100 will take the
official seal to demonstrate that a participant has visited       value zero.
the minimum number of restaurants required to obtain con-            Within the discrete choice framework, the set of alterna-
testś gifts. Restaurant staff had to sign the TapasPassport      tives known as the choice set must verify three properties.
to certify that its owners have visited the place.                It has to be finite, exhaustive (the decision-maker always
   After consuming a tapa, participants were asked to evalu-      chooses one of the alternatives) and mutually exclusive (the
ate their experience. Users had to provide two ratings rang-      choice of one alternative necessarily implies not choosing
ing from 0 to 5: (i) a rating of the tapa, and (ii) a rating of   any of the other ones). Due to the last property, three dif-
the overall experience (service, place atmosphere, etc.). In      ferent choice subsets were established in this work. They
addition, they were informed about our research experiment        correspond to the three possible restaurant locations (old,
and asked to extend their feedback providing information          new and outlying areas of the city). Therefore, standard
about the temporal and social context in which the experi-        and mixed logit models are estimated separately from these
ence took place.                                                  three choice subsets that contain only the tapas associated
                                                                  to each zone. This assumption could be less general. For in-
3.2    RECTUR Dataset                                             stance, considering the set of tapas of a concrete restaurant
   The data gathered in the experiment was collected in the       would provide a new choice set and, as consequence, a new
RECTUR dataset. It is assumed that the choice of a tapa           choice problem.
depends on the user preferences about the levels of tapa at-         Estimations results for these six models are shown in Sec-
tributes, which will in turn depend on the user attributes        tion 4.2. For the same area of the city, standard and mixed
and context elements. The consumption of a tapa deter-            logit models present similar estimations for the coefficients.
mines a choice from a choice set and will elicit a satisfaction   As consequence, only prediction accuracy of the standard
response quantified as a user rating.                             logit model was compared with rating-based algorithms.
   For each tapa, we gathered the following attributes:
                                                                  3.4    Baselines: Rating-based models
   • Choice sets. Different choice sets could be defined for
                                                                     The proposed choice-based models were compared with
     each choice. We acquired information about the fol-
                                                                  two popular rating-based models: User-based collaborative
     lowing sets:
                                                                  filtering (UBCF) and matrix factorization (MF). User-based
        – Set of tapas in the same area of the city (outlying,    collaborative filtering assumes that individuals with similar
          new or old zone).                                       preferences will rate items in a similar way. Then, miss-
                                                                  ing ratings for a concrete user cn could be predicted find-
        – Set of tapas in the same restaurant.
                                                                  ing a neighborhood N (n) of similar users and aggregating
   • Tapa attributes. The gathered attributes are:                their ratings to calculate the corresponding prediction. The
                                                                  concept of similarity between users is used for defining this
        – Type: Cheese, egg, fish, meat, vegetable, shellfish     neighborhood given all users within a similarity threshold.
          and other. The main ingredient defined the type         In this work, the cosine similarity measure is taken into ac-
          of the tapa.                                            count and |N (n)| was fixed equal to 25. For an item i and
        – Character: Traditional or daring. Traditional tapas     an individual cn , the ratings predicted, r̂ni , can be written
          are those that follow popular well-known recipes,       as
          while daring tapas are creative and provide inno-                                     1    X
          vative recipes.                                                             r̂ni =              rji
                                                                                             |N (n)|
                                                                                                     j∈N (n)
        – Restaurant. The restaurant that offers the tapa
          was also categorized in terms of its location (out-     where | | denotes the cardinal of N (n).
          lying, new or old area), atmosphere and style.             Matrix factorization, on the other hand, characterizes both
                                                                  items and users by vectors of factors inferred from item rat-
   • Rating. The rating provided by each consumer.                ing patterns. For a given item i and a user cn , the vector
                                                                  qi measure the extent to which the item possesses those fac-
3.3    Choice-based models                                        tors and the vector pn , the extent of interest the user has
  The standard logit model as well as the mixed logit model       in items that are high on the corresponding factors. The
assuming Gaussian distribution on the coefficients, both de-      dot product qiT pn captures the user’s interest in the item’s
scribed in Section 2.3, were chosen as basic representatives of   characteristics. This approximates user cn ’s rating of item
the family of random utility choice-based models to be com-       i, rni , leading to the estimate
pared with rating-based algorithms. From attributes type
and character of each tapa described in Section 3.2, eight                                 r̂ni = qiT pn .
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               Daring


                                                                                                                                                                                       Other, 3.9
                              250


                                                                                                                                                                                                                                                                                                                                                                                                                                                                         Shellfish, 3.9
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               Traditional
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               Maximum and minimum rating means


                                                                                                                                                                                                               Meat, 4.1
                              200


                                                                                                                                                                                                                                   Sweet, 4.3
                                                                                                                                    Shellfish, 3.7


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             Meat, 4


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               Fish, 4.2
                                                                                                                                                                    Shellfish, 4.1


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Fish, 4.1
   Number of tapas consumed


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Vegetable, 4.2
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Meat, 4.4
                              150


                                                                                                                                                                                                                                                                                                                                                                                    Other, 4.3


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Fish, 3.9
                                                                                                                                                                                                                                                                                                                                                               Other, 4.5
                                                                                    Sweet, 4.4


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Meat, 4.5

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               Meat, 4.3
                                                          Sweet, 4.3


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               Meat, 3.4
                                                                                                                                                                                                                                                                                                                                                                                                              Meat, 4.2
                                                                                                                                                                                                                                                                                                                                  Cheese, 3.8


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Shellfish, 3.7
                                                                                                                                                                                                                                                                                                                                                                                                                                                             Fish, 3.8


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Shellfish, 3.4
                                                                                                                                                                                                                                                                                    Shellfish, 3.5


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Fish, 4.1
                              100


                                            Meat, 3.9


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   Meat, 3.3


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 Cheese, 4.2
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Other, 4
                                                                                                                                                     Meat, 3.8


                                                                                                                                                                                                                                                                                                                                                                                                                                           Fish, 4.1
                                                                                                                                                                                                                                                                                                                                                  Sweet, 4.5
                                                                                                                                                                                                                                                                                                            Fish, 4
                                                                                                               Fish, 3.8
                              50


                                                                                                                                                                                                                                                  Fish, 4.2

                                                                                                                                                                                                                                                                 Egg, 3.8
                              0


                                            t107 t111 t112 t113                                                                  t12                 t15            t19                t22                     t23             t24                t47            t48                t49                     t50                   t51           t52            t53                  t54                       t59                          t6            t60             t61                 t62                    t63                     t66                 t67            t68          t69                         t7                     t70               t8               t88           t89            t90                          t91               t92               t93

                                                                                                                                                                                                                                                                                                                                                                                                          Tapas


Figure 1: Bar plot for number of different tapas consumed, main ingredient and mean of users’ ratings in the
new zone of the city.
                              500


                                                                   Shellfish, 4.5


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               Maximum and minimum rating means
                              400


                                                                                                                                                        Meat, 4.2


                                                                                                                                                                                                                                                                                                                                                                                                                          Shellfish, 3.9
   Number of tapas consumed


                                                                                                                                                                                                                                                                                                                                                                                                                                                       Shellfish, 4.6
                              300


                                                                                                                                                                                                                                                Vegetable, 4.2


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             Other, 3.3


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           Shellfish, 4.2
                                                                                                                                                                                                                                                                                                                                                                   Shellfish, 3.9


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     Meat, 3


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Meat, 4.4
                                                                                                                                                                                                                                                                                                 Egg, 4.3
                              200


                                                                                                                                                                                                    Egg, 4.5
                                              Meat, 3.5


                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Shellfish, 4.4


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   Shellfish, 3.9
                                                                                                                                                                                                                           Shellfish, 4.6


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Sweet, 4.7
                                                                                                                                                                           Fish, 4.5


                                                                                                                                                                                                                                                                                                                                                                                                 Sweet, 4.5
                                                                                                                                                                                                                                                                                                                      Fish, 4.3


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       Shellfish, 3.2
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Meat, 4.1


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           Other, 4
                                                                                                                                                                                                                                                                        Fish, 3.8
                                                                                                                           Sweet, 3.3


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   Meat, 3.6
                              100


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   Fish, 4.1
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               Other, 3.4
                                                                                                  Other, 3.2


                                                                                                                                                                                                                                                                                                                                                Sweet, 4.2
                              0


                                            t100               t101                              t108                      t109                      t14                   t16                      t17                    t18                  t25                   t27                            t28              t29                       t37             t40                         t41                           t5                           t55               t56                     t77                            t78                          t79               t82                 t83                             t84                     t85                    t87             t94                      t97                          t98                     t99

                                                                                                                                                                                                                                                                                                                                                                                         Daring tapas


Figure 2: Bar plot for number of different daring tapas consumed, main ingredient and mean of users’ ratings
in the old zone of the city.


Therefore, the challenge is computing the mapping of each                                                                                                                                                                                                                                                                                                                                                                        k k is the Euclidean norm and λ denotes a constant control-
item and user to vectors qi and pn . Here, singular value                                                                                                                                                                                                                                                                                                                                                                        ling the extent of regularization. In this work, λ = 1.5.
decomposition will be applied factoring the user-item rating
matrix that could be sparse. In order to learn the factor                                                                                                                                                                                                                                                                                                                                                                        3.5                                             Evaluation
vectors (pn and qi ), the regularized squared error on the set                                                                                                                                                                                                                                                                                                                                                                     Classical ranking error metrics could not be applied mainly
of known ratings is minimized:                                                                                                                                                                                                                                                                                                                                                                                                   because of the lack of information about all the relevant
              X                                                                                                                                                                                                                                                                                                                                                                                                                  tapas for the decision-maker on any choice situation. There-
        min         (rni − qiT pn )2 + λ(kqi k2 + kpn k2 )
                                    q∗,p∗                                                                                                                                                                                                                                                                                                                                                                                        fore, two error metrics are proposed in order to compare
                                            (u,i)∈K
                                                                                                                                                                                                                                                                                                                                                                                                                                 the behaviour of choice-based and rating-based algorithms.
where K is the set of the (cn , i) pairs for which rni is known,                                                                                                                                                                                                                                                                                                                                                                 The metrics are described considering that only the tapa
                              400
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Maximum and minimum rating means


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 Fish, 4.3
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Meat, 3.9
                                                                                                                                                                                                                                                                                                                                                                                                                                      Other, 3.9


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             Meat, 3.6
                              300


                                                                                                                                                                                                                                                                                                                                                                Shellfish, 3.8
                                                                                                                                                                                                                             Egg, 3.7
   Number of tapas consumed


                                                                                                                                                                                                                                                                                                                                                                                            Meat, 4
                                                                                                                                                          Shellfish, 4.1


                                                                                                                                                                                                                 Meat, 3.8
                                                                                                  Shellfish, 4.5
                              200


                                                                                                                    Meat, 3.9


                                                                                                                                                                                                                                                         Shellfish, 4.2


                                                                                                                                                                                                                                                                                                                                                                                                                                                                           Meat, 3.5
                                                                                                                                                                                 Fish, 3.7


                                                                                                                                                                                                                                                                                                                                                                                                                                                   Meat, 3.7
                                                                                                                                                                                                                                            Other, 3.7


                                                                                                                                                                                                                                                                                                                                                                                                                 Meat, 3.7
                                    Fish, 4.1


                                                                                                                                                                                                                                                                                                                                                                                                      Egg, 3.7
                                                                                                                                Cheese, 3.5


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       Shellfish, 3.6
                                                                                                                                                                                                                                                                                                                                                   Other, 3.6
                                                                                                                                                                                                                                                                                                        Shellfish, 3.4


                                                                                                                                                                                                                                                                                                                         Shellfish, 3.4


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           Meat, 3.8
                                                                                                                                              Fish, 2.9


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             Vegetable, 3.1
                                                                                                                                                                                             Meat, 3.1
                              100


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Shellfish, 3.2
                                                                                                                                                                                                                                                                          Shellfish, 4.1
                                                            Vegetable, 3.3


                                                                              Fish, 3.7
                              0


                                    t10                     t102             t103                 t11              t110         t20           t21         t26                    t30         t31                  t32        t33          t34            t35              t36                           t38              t39                       t4           t42                     t43           t64        t65                  t71          t72                     t73         t74                                   t75                t76          t80             t81              t86                          t9

                                                                                                                                                                                                                                                                                                 Traditional tapas


Figure 3: Bar plot for number of different traditional tapas consumed, main ingredient and mean of users’
ratings in the old zone of the city.
                              140


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Daring
                              120


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Traditional
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Maximum and minimum rating means
                                                                                                                                                                                                         Vegetable, 3.7


                                                                                                                                                                                                                                                                                                                                                                                 Shellfish, 4.6
                                                                                                                                                                                                                                                                                                                                           Fish, 4.8
                                                                                          Shellfish, 4.2
                              100


                                                                                                                                 Fish, 4.4
   Number of tapas consumed


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Missing ingredient, 3.3
                              80


                                                                                                                                                                    Other, 4.2


                                                                                                                                                                                                                                                                                                                                                                                                                         Sweet, 4.5
                              60


                                                Meat, 4.4


                                                                                                                                                                                                                                                                                                                                                                                                                                                               Fish, 3.5


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 Fish, 3.9


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               Meat, 3.6
                                                                                                                                                                                                                                        Meat, 4.3


                                                                                                                                                                                                                                                                                           Sweet, 4.3
                              40
                              20
                              0


                                                   t1                                     t104                                    t105                               t106                                t13                                  t2                                           t3                                             t44                                         t45                             t46                                         t57                                            t58                                           t95                                             t96

                                                                                                                                                                                                                                                                                                                   Tapas


Figure 4: Bar plot for number of different tapas consumed, main ingredient and mean of users’ ratings in the
outlying zone of the city.


with the highest associated rating or probability is recom-
mended/predicted (top 1). Error I is equal to one if the item
predicted does not coincide with the true alternative chosen                                                                                                                                                                                                                                                                         The second measure of error, error II, is equal to the po-
                                                                                                                                                                                                                                                                                                                                  sition of the real choice in the ordered list of recommen-
by the individual and zero otherwise. Therefore, given an
individual cn , the true choice i and the recommended item                                                                                                                                                                                                                                                                        dation minus one. Therefore, if the item recommended is
j is:                                                                                                                                                                                                                                                                                                                             equal to the chosen one then the error is equal to zero. Let
                                                                                                                                                                                                                                                                                                                                 (i1 , ..., ik , ..., iJ ) be the list of ordered items to be recom-
                                 1     : if i 6= j                                                                                                                                                                                                                                                                                mended, the error for the user cn with true choice i can be
            error I (cn , i) =
                                 0 : otherwise.
written as:                                                         lowest ones correspond to t21 and t94. The highest ones, to
                                                                    t11 and t99.
               error II (cn , i) = k − 1 if ik 6= i.
                                                                       The number of snacks consumed in the outlying area of the
   For instance, if one decision-maker cn chose the snack t1        city is 743. Again, the number of different snacks associated
among the snacks (t1, t104, t105, · · · ) and the prediction        to this zone is smaller. Concretely, it is equal to 14; 3 of them
(ordered according to the highest ratings or probabilities)         present a traditional character and 11, a daring character.
is equal to (t105, t104, t1, · · · ), then error I (cn , t1) = 1.   Furthermore, the number of users in this area is 436. Figure
However, error II (cn , t1) = 2.                                    4 shows the total number of daring and traditional tapas
   Error I and error II can be generalized easily if a list of a    that users consumed for the 14 choices. According to the
concrete number of ordered items (in terms of probabilities         results, t44, t45, t104 and t105 were the most chosen snacks.
or ratings) is recommended instead of recommending only             However, t2 and t3 were rarely selected. The snacks t58 and
one alternative. These two errors are equal to zero if, for an      t44 correspond to the tapas with lowest and highest means
individual cn , the true choice i belongs to the recommended        of ratings, respectively. The main ingredient of t58 is a
list of items. Otherwise, error I will take the value one and       missing value. In addition, cheese and egg are not the main
error II, the position of the true choice i in the ordered list     component for any snack.
of non-recommended alternatives. In this work, a list of five
alternatives will be considered (top 5).                            4.2    Choice models fitting
                                                                       The standard and mixed logit models have been fitted
4.    RESULTS                                                       from the three choice sets described in Section 4.1. Due to
                                                                    the price is the same for every snack, the determinants of
4.1    Data description                                             these choices, xj , are eight dichotomous alternative specific
                                                                    variables. Seven of them indicate the main component of
   RECTUR dataset presented in Section 3.2 deals with 5517
                                                                    each tapa: Cheese, egg, fish, meat, shellfish, sweet and veg-
individuals, that make one or a sequential choices of one tapa
                                                                    etable. The eighth variable takes value equal to one when
among a set of 113 tapas distributed in Santiago de Com-
                                                                    the snack has a traditional character. In addition, for mixed
postela. Acording to comments in Section 3.3, three subsets
                                                                    logit model, Gaussian distribution was assumed on the co-
of the original dataset will be considered distinguishing three
                                                                    efficients and R = 100 was fixed.
different choice contexts or, equivalently, three zones of the
city.                                                                                    New zone     Old zone     Outlying zone
   Next, the three scenarios will be briefly described.                  Cheese            -0.07        -0.25
   The total number of tapas consumed in new area of the                  Egg              -2.48        0.31
city is 3888. However, the number of different tapas asso-                Fish             -0.46        -0.02            0.14
ciated to this zone is only 37; 18 of them present a tradi-               Meat              0.06         0.28           -0.44
tional character and 19, a daring character. Furthermore,               Shellfish          -0.03         0.21            0.38
the number of users in this area is 2030. Then, although                 Sweet              0.07        -0.46           -0.38
most of these individuals had only one snack, some of them             Vegetable           -0.18        -0.17            0.26
took several ones. Figure 1 shows the total number of tapas            Traditional         -0.62        -0.15           0.24
that users consumed for the 37 possible choices. According           Log-Likelihood:      -13772       -36757          -1913.8
to the results, t22 and t61 were the most common choices.
However, t47 and t48 were rarely selected. According to the
information in Figure 1, only one tapa is made of eggs and          Table 1: Estimation by maximum likelihood of the
vegetables; two tapas has cheese as main ingredient; four           standard logit model coefficients for different areas
tapas are made of a sweet component or other; shellfish is          of the city. Significant coefficients are in black.
the ingredient of six snacks; meat and fish are the most com-
mon components with ten and nine tapas, respectively. Tapa
ratings that users gave to one consumed tapa are available                               New zone     Old zone     Outlying zone
too. The values of these ratings are 0, 1, 2, 3, 4 and 5. High           Cheese            -0.07        -0.24
values for ratings are associated to a high customer satisfac-            Egg              -2.48        0.31
tion. Means of tapa ratings for the 37 tapas in the new area              Fish             -0.46        -0.01            0.13
are shown in Figure 1. The lowest means of tapa ratings are               Meat             -0.07         0.27           -0.67
associated to t67, t70, t69 and t49. However, all of these              Shellfish          -0.03         0.21            0.37
means are greater than 3. So, the level of satisfaction tends            Sweet            -0.003        -0.46           -0.38
to be high.                                                            Vegetable           -0.18        -0.17            0.26
   As for the old zone of the city, a total of 8948 tapas were         Traditional         -0.93        -0.09           -0.01
consumed. As before, the number of different snacks associ-          Log-Likelihood:      -13631       -36680          -1897.9
ated to this zone is only 62; 32 of them present a traditional
character and 30, a daring character. Furthermore, the num-         Table 2: Estimation of the means for mixed logit
ber of users in this area is 3953. As before, although most         model coefficients assuming normal distribution for
of these individuals had only one snack, some of them took          different areas of the city. Significant coefficients are
several ones. Figures 2 and 3 show the total number of dar-         in black.
ing and traditional tapas that users consumed for the 62
possible choices, respectively. According to the results, t101         The coefficients obtained are shown for each area of the
was the most common choice. However, t37, t103 and t102             city in Tables 1 and 2, respectively, and most of them are
were rarely selected. As regards means of snack ratings, the        significant in the three areas of the city. For the mixed logit
            Choice model        UBCF          MF                             Choice model          UBCF                MF
                                Top 1                                                              Top 1
 Error      CV1     CV2     CV1     CV2   CV1     CV2               Error    CV1      CV2      CV1      CV2       CV1      CV2
   I        0.895   0.876     1       1     1       1                 I     0.955    0.954       1       1          1        1
  II        5.057   4.741   8.885 9.006   8.868   8.824              II     14.552   14.060   25.438 25.475      25.511   25.499
                                Top 5                                                              Top 5
      I     0.408   0.409     1       1     1       1                  I    0.795    0.789       1       1          1        1
     II     1.841   1.859   6.262 6.295   6.128   6.115               II    10.606   10.481   22.606 22.640      22.658   22.506

Table 3: Cross validation predictions errors for stan-             Table 4: Cross validation predictions errors for stan-
dard logit choice model, user-based collaborative fil-             dard logit choice model, user-based collaborative fil-
tering and matrix factorization algorithms in the                  tering and matrix factorization algorithms in the
outlying area of the city. Random and leave-one-                   new area of the city. Random and leave-one-out
out cross validation are denoted by CV1 and CV2 ,                  cross validation are denoted by CV1 and CV2 , re-
respectively. In this zone, the number of different                spectively. In this zone, the number of different
tapas to be recommended is 14.                                     tapas to be recommended is 37.

                                                                             Choice model          UBCF                MF
model (Table 2), only the mean estimations of Gaussian dis-                                        Top 1
tributions are shown. As for the utility, positive coefficients,    Error    CV1      CV2      CV1      CV2       CV1      CV2
                                                                      I     0.982    0.987       1       1          1        1
see egg and meat in Table 1 for the old zone, increase its           II     27.005   26.921   43.908 43.862      43.843   43.787
value. However, negative coefficients, see egg and traditional                                     Top 5
in Table 1 for the new area, reduce it.                                I    0.905    0.904       1       1          1        1
                                                                      II    23.141   23.097   41.013 40.964      40.945   40.970
4.3       Choice-based vs rating-based predictions
   The behaviour of choice-based and rating-based models           Table 5: Cross validation predictions errors for stan-
for recommending tapas in the three areas of the city was          dard logit choice model, user-based collaborative fil-
analyzed using random sub-sampling and leave-one-out cross         tering and matrix factorization algorithms in the old
validation from RECTUR dataset.                                    area of the city. Random and leave-one-out cross
   For random sub-sampling validation, 100 iterations were         validation are denoted by CV1 and CV2 , respec-
considered using the 25% of randomly selected individuals          tively. In this zone, the number of different tapas to
as test data for predictions. Therefore, in each iteration and     be recommended is 62.
once the 25% of decision-makers was randomly selected, the
rest of individuals is used as trainning data for rating-based
algorithms or for fitting the choice model. Then, for each         els are: (1) preferences are learnt from choices, (2) the choice
decision-maker in the test data and for each recommendation        set of each choice situation is included as a relevant variable
method, prediction error measures introduced in Section 3.5        to both explain and predict future choices, and (3) unob-
can be determined. The procedure for leave-one-out cross           served factors affecting the decision-making process are cap-
validation is similar. In this case, the number of iterations      tured through random variables. On the basis of these ele-
is equal to the number of users and, in each iteration, the        ments the models presented in this paper differ from both
test data contains an only decision-maker.                         collaborative methods, as they infer preferences from rat-
   Tables 3, 4, and 5 contain the empirical means of errors        ings, and content-based techniques, as they do not handle
decribed previously for the new, old and outlying areas of the     the choice set of the items experienced in the past. Recent
city, respectively. According to results shown in Section 4.2,     content-based approaches share the same idea about the util-
standard and mixed logit models provide similar estimations        ity of user choices to derive preferences but are limited to
for model coefficients. Therefore, only the first choice-based     pairwise rather than full choice set comparisons [3].
model, the standard logit one, were taken into account to             With regard to the limitations stated in the introduction,
be compared with the rating-based algorithms.                      choice models face issue L1 by building random utility mod-
   The results show that choice-based models offer a better        els from solid decision-making theories, and solve issue L2 by
performance (lower prediction errors) compared with rating-        using choices, rather than ratings, to estimate preferences.
based schemes (UBCF and MF). See, in particular, error II          The drawback of gathering information about the domain
for the top 5 scheme taking into account the different num-        (attributes and values) is compensated in two ways: (1) by
ber of tapas recommended in each area of the city. Further-        using more accurate data, choices rather than ratings, and
more, the accuracy of predictions is reduced as long as the        (2) by removing the burden of interrogating decision-makers
choice set increases from the outlying to the old area, which      about their post-experience satisfaction. In summary, choice
indicates the importance of the choice set and the choice          modelling seems to be a promising paradigm in the field of
situation.                                                         recommender systems.

5.        DISCUSSION                                               Acknowledgments
  The main point of this work is that the recommendation           This research was sponsored by EMALCSA/Coruña Smart
problem can be considered as a choice prediction problem.          City under grant CSC-14-13, the Ministry of Science and In-
This is the main difference of our proposal compared with          novation of Spain under grant TIN2014-56633-C3-1-R, and
current paradigms in recommender systems that focus on             the Ministry of Economy and Competitiveness of Spain un-
rating prediction. The key aspects of our choice-based mod-        der grant MTM2013-41383P.
6.   REFERENCES
 [1] T. A. Adomavicius G. Toward the next generation of
     recommender systems: a survey of the state-of-the-art
     and possible extensions. IEEE Trans. on Knowl. and
     Data Eng., 17(6):734–749, 2005.
 [2] M. Balabanović and Y. Shoham. Fab: content-based,
     collaborative recommendation. Communications of the
     ACM, 40(3):66–72, 1997.
 [3] L. Blédaité and F. Ricci. Pairwise preferences
     elicitation and exploitation for conversational
     collaborative filtering. In Proceedings of the 26th ACM
     Conference on Hypertext & Social Media, pages
     231–236. ACM, 2015.
 [4] J. S. Breese, D. Heckerman, and C. Kadie. Empirical
     analysis of predictive algorithms for collaborative
     filtering. In Proceedings of the Fourteenth conference
     on Uncertainty in artificial intelligence, pages 43–52.
     Morgan Kaufmann Publishers Inc., 1998.
 [5] G. M. Burke R., Felfernig A. Recommender systems:
     An overview. AI Magazine, 32(3):13–18, 2011.
 [6] D. Goldberg, D. Nichols, B. M. Oki, and D. Terry.
     Using collaborative filtering to weave an information
     tapestry. Communications of the ACM, 35(12):61–70,
     1992.
 [7] Y. Koren. Factorization meets the neighborhood: a
     multifaceted collaborative filtering model. In
     Proceedings of the 14th ACM SIGKDD international
     conference on Knowledge discovery and data mining,
     pages 426–434. ACM, 2008.
 [8] D. McFadden et al. Conditional logit analysis of
     qualitative choice behavior. 1973.
 [9] V. H. Resnick P. Recommender systems.
     Communications of the ACM, 40(3):56–58, 1997.
[10] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl.
     Application of dimensionality reduction in
     recommender system-a case study. Technical report,
     DTIC Document, 2000.
[11] A. Sen. Rational behaviour. In Utility and probability,
     pages 198–216. Springer, 1990.
[12] K. E. Train. Discrete choice methods with simulation.
     Cambridge university press, 2009.