Choice-based recommender systems Paula Saavedra Pablo Barreiro Roi Durán CITIUS CITIUS CITIUS University of Santiago de University of Santiago de University of Santiago de Compostela Compostela Compostela Santiago de Compostela, Santiago de Compostela, Santiago de Compostela, Spain Spain Spain paula.saavedra@usc.es pablobv70@gmail.com roiduram@gmail.com Rosa Crujeiras María Loureiro Eduardo Sánchez Vila School of Mathematics School of Business CITIUS University of Santiago de University of Santiago de University of Santiago de Compostela Compostela Compostela Santiago de Compostela, Santiago de Compostela, Santiago de Compostela, Spain Spain Spain rosa.crujeiras@usc.es maria.loureiro@usc.es eduardo.sanchez.vila@usc.es ABSTRACT item’s attributes [2]. These preferences can be used to pre- Choice-based models are proposed to overcome some of the dict the utility of any given item by comparing them with the limitations found in traditional rating-based strategies. The values of item’s attributes. Collaborative recommenders, on new approach is grounded on decision-making paradigms, the other hand, take advantage of previous ratings provided such as choice and utility theories. Specifically, random by the available decision-makers to predict the utility of any utility models were applied in a recommendation problem. given user-item pair [6]. This approach has been widely Prediction accuracy was compared with state-of-art rating- adopted as it removes the burden of knowing and managing based algorithms in a gastronomy dataset. The results show item attributes as well as their corresponding values. the superior performance of choice-based models, which may Many algorithms and models have been proposed under suggest that real choices could bring more predictive power the collaborative paradigm. Among them, two families have than ratings. gained major attraction: neighborhood algorithms and la- tent factor models. The neighborhood approach was the first to implement to collaborative concept and became the CCS Concepts reference model in this research area [9, 4]. The method con- •Information systems → Collaborative filtering; So- sists on representing vectors of ratings on either the decision- cial recommendation; maker or item space. The distance between any pair of these vectors determine the similarity between either the decision- Keywords makers or the items that these vectors represent. Individuals with similar rating’s vectors are considered to possess similar Choice models; Random Utility Models; Logit probabilities; tastes or preferences, while items are considered to have sim- Tourism ilar attributes. The latent factor strategy, in turn, attempts to explain ratings by means of characterizing both users and 1. INTRODUCTION items with a limited set of factors. Factors are considered Recommender systems are personalization tools aimed at unknown variables that can be inferred from the ratings de- suggesting relevant items on the basis of available informa- clared by the users. The inference or learning problem can tion on items as well as decision-makers [5]. Broadly speak- be solved with factorization techniques. The classical fac- ing, recommenders can be classified in two different cate- torization method is called Singular Value Decomposition gories. Content-based recommenders generate a profile for (SVD) and was applied successfully to identify and reduce each decision-maker by considering items experienced in the the number of relevant factors [10]. However, the method past. The profile typically represents the preferences of the requires complete knowledge of the rating matrix and fill-in decision-maker, i.e the taste of the decision-maker on each methods to populate sparse rating matrix come at a cost of inaccurate factor learning. Recently, new factorization tech- niques have been successfully developed that are capable of learning the factors from sparse rating matrices [7]. Each rating is explained by means of two vectors whose dimen- sions correspond with the set of latent factors. The first vector represents the item in terms of its degree of posse- sion of each factor, while the second vector represent the Copyright held by the author(s). RecTour 2016 - Workshop on Recommenders in Tourism held in conjunc- decision-maker on the basis of her preference on each factor. tion with the 10th ACM Conference on Recommender Systems (RecSys), These item and decision-maker vectors constitute a pair of September 15, 2016, Boston, MA, USA. matrices whose values have to be inferred. The learning problem is solved by means of minimizing the regularized CR(A, ) = {a0 ∈ A k a0  a, ∀a ∈ A} (2) error on the set of known ratings. Despite the success of current recommender systems, the where CR stands for ”choice rule” and the  operator de- experience with state-or-art approaches reveal some impor- notes the relationship ”preferred to, or at least as preferred tant limitations. First, the degree of performance of a recom- as”. Basically, it means that the chosen alternative will be mender algorithm depends on the specific issues of the prob- the one from which the decision-maker shows a higher pref- lem at hand. Therefore, heuristic models and trial-and-error erence. The preference operator needs to be quantified to methodologies are often used to look for the best solution allow a numerical comparison between the alternatives. for any given situation. The problem may be approached The utility theory comes to the rescue to solve this prob- in a more theoretical and consistent way if recommenders lem. One of the axioms of this theory states that it is pos- were considered as agents predicting the decisions taken by sible to define a utility function such that: decision-makers. Under this scope, the first limitation could a  b ⇐⇒ U (a) ≥ U (b). (3) be stated as follows: (L1) Current state-of-art approaches are mostly based on heuristic models rather than decision- And then, the choice rule in equation 2 can be represented making theories. Second, some popular paradigms assume in terms of the utility function and a numerical operator: a direct relationship between preferences and ratings: (1) CR(A, ≥) = {a0 ∈ A k U (a0 ) ≥ U (a), ∀a ∈ A}. (4) the neighborhood approach considers that decision-makers with similar ratings on a set of items will have similar prefer- It is now clear that the new choice rule is mathemati- ences, and (2) factorization techniques assume that ratings cally equivalent to the recommendation problem described can be the result of a product between item’s latent factors in equation 1: and decision-maker preferences about that factors. In these paradigms unobserved preferences are usually inferred from a0 = arg max U (c, a) ⇐⇒ a∈A observed ratings. The issue here comes from the fact that ratings could be mostly explained by variables different to CR(A, ≥) = {a0 ∈ A k U (a0 ) ≥ U (a), ∀a ∈ A}. (5) preferences. The quality of the item, the user-item context, As the recommendation problem can be understood as a and in general any factor involved during the process of ex- choice prediction problem, then the powerful models and periencing the item, they all could provide more explanatory techniques developed in this field can be naturally applied power about ratings than preferences do. Therefore, the sec- to generate recommendations. ond limitation could be described as follows: (L2) Prefer- ences are usually derived from ratings without any support- 2.2 Choice models with random utility ing evidence about the relationship between these variables. The choice rule models how decision-makers take their This work proposes choice-based recommender systems to decisions. However, the problem of predicting such deci- overcome these limitations. The concept is grounded on sions is a different task. In real problems the researcher choice and utility theory, where real choices replace ratings does not have access to all the factors and variables that as the key data to learn the decision-maker’s preferences as decision-makers include to estimate utilities. For a concrete well as to make recommendations. The proposed models are individual cn , the researcher only knows some attributes then evaluated in the tourism domain with a gastronomy of the alternatives, labeled xj for all aj alternatives with dataset that includes both choices and ratings. In what j ∈ {1, · · · , J}, and some attributes of the decision-maker, follows, the choice-based models are presented, the meth- labeled zn . A function that relates these observed factors to ods are described, and the models evaluated and compared the decision-maker’s utility can be specified. This function against state-of-art rating-based algorithms. The discussion is denoted by Vnj = V (xj , zn ) and it is often called repre- will comment on the results and highlight the major contri- sentative utility. It usually depends on parameters that are butions of the paper. unknown and, therefore, they must be estimated. Since there are aspects of utility that the researcher does 2. CHOICE MODELS not or cannot observe, Vnj 6= Unj . Therefore, the utility can be decomposed as: 2.1 Recommendation as a choice problem Unj = Vnj + nj (6) The recommendation problem can be described as an op- timization problem which consists on (1) estimating the util- where nj captures the unknown factors that modify the ity of each item a ∈ A, the available item set, for any given utility and are not included in Vnj . This decomposition is decision-maker c, and (2) choosing the item a0 that maxi- fully general, since nj is defined as simply the difference mizes U (c, a), the decision-maker utility on any item a [1]: between true utility Unj and the part of utility that the researcher captures in Vnj . Given its definition, the charac- a0 = arg max U (c, a) (1) teristics of nj , such as its distribution, depend critically on a∈A the researcher’s specification of Vnj . The researcher does not It is worth noting that this problem is conceptually the know nj for all j and therefore these terms are considered same as the one faced by the Rational Choice Theory, which random variables that allow the researcher to make proba- aims at explaining economic behaviour under choice situa- bilistic statements about the decision-maker’s choice. The tions [11]. The theory states that a decision-maker will max- models derived under this assumptions are called random imize her utility after satisfying some budget constraints. utility models (RUM) [8]. More formally, the decision-maker will choose alternative a0 Now, the choice rule of equation 4, which is deterministic from a choice set A according to the following rule: under the decision-maker perpective, becomes probabilistic under the perspective of the researcher. Then the rule for a choosing the alternative that she was observed actually to decision-maker cn choosing alternative ai is: choose is YY y CR(A, ≥) = {ai ∈ A k Pi ≥ Pj , ∀aj ∈ A} (7) L(β) = Pnini n i and the probability Pi is estimated as follows: where β denotes the vector of all model parameters. There- P(Uni > Unj for all j 6= i) = fore, the log-likelihood function is P(nj − ni < Vni − Vnj for all j 6= i). (8) XX LL(β) = yni log Pni If the joint density of n = (n1 , ..., nJ ) is denoted by f , this n i cumulative probability can be rewritten as: and the estimator is the value of β that maximizes this func- tion. Importantly, it was proved that the log-likelihood func- Z Pni = I(nj − ni < Vni − Vnj for all j 6= i)f (n )dn (9) tion with these choice probabilities is globally concave in  parameters β, which helps in the numerical maximization where I is the indicator function, equaling 1 when the term procedures, see [8] for more details. in parentheses is true and 0 otherwise. This is a multidimen- A well-known issue of standard logit model deals with sional integral over the density of the unobserved portion of capturing the heterogeneity of population [12]. The impor- utility, f (n ). Different choice models are obtained from dif- tance that decision-makers place on each attribute of the ferent specifications of this density, that is, from different as- possible choices varies, in general, over decision-makers. Al- sumptions about the distribution of the unobserved portion though logit model is able to represent the taste variation of utility. In addition, the choice of the density determines related to observed characteristics of the decision-maker, it whether the integral takes a closed form or not [12]. can not represent differences in tastes that can not be linked to observed characteristics. Therefore, if taste variation is at 2.3 Standard and mixed logit models least partly random, a logit model with random parameters The simplest and most widely used choice model is the should be considered instead. Under this considerations, β standard logit model [8]. It is derived under the assumption is now a vector of random coefficients and these coefficients that the each unobserved portion of utility nj is distributed vary over decision-makers in the population with density g. independently, identically extreme value. In this case, f In most applications that have actually been called mixed denotes the density for Gumbel distribution: logit, g is specified to be continuous. For example, it can −nj be specified to be normal, lognormal, uniform, triangular f (nj ) = e−nj e−e . (10) or, even, gamma. Therefore, this density is a function of Following [8], the logit choice probability that decision-maker parameters θ that represent, in the gaussian case, the mean cn chooses alternative i is and covariance of the random coefficient in the population. Then, the choice probabilities can be written as: eVni Pni = P Vnj . (11) ! je eVni (β) Z Pni = P V (β) g(β|θ)dβ. (12) je nj This model presents a clear interpretation. According to equation 11, if Vni rises, reflecting a matching between the Since the previous integral has not a closed form, it must observed attributes of the alternative and the preferences be evaluated numerically through simulation. Once the re- of the decision-maker, with Vnj for all j 6= i held constant, searcher specifies a distribution g for the coefficients, the pa- Pni approaches one. And Pni approaches zero when Vni rameters θ maximizing the simulated log-likelihood must be decreases, since the exponential in the numerator approaches estimated. Then, R draws of the coefficients are taken from zero as Vni approaches −∞. g and the logit probabilities are computed for every draw. The representative utility is usually specified to be linear The unconditional probability in equation 12, that is the ex- in the set alternative’s attributes: Vnj = βnj · xj , where xj pected value of the conditional probabilities, is estimated as is a vector containing, as before, the observed variables of the average of R probabilities determined previously. the alternative aj , and βnj denotes the model coefficients vector which describes the preferences of decision-maker cn on the attributes of the alternatives aj . The preferences βnj 3. METHODS (model coefficients) are estimated by fitting equation 11 to The performance of choice-based models is compared with a dataset of choices. Moreover, since the logit probabilities a choice of relevant rating-based algorithms from a gastro- take a closed form, maximum likelihood procedures are ap- nomic dataset containing the choices of snacks made by a plied for estimation. Concretely, the probability of person set of decision-makers and their corresponding tapa ratings. cn choosing the alternative that he was actually observed to The dataset is described in Sections 3.1 and 3.2. Technical choose can be expressed as details on the two recommendation alternatives considered Y y in this work are briefly presented in Sections 3.3 and 3.4. Fi- Pnini , nally, the error criteria used to compare them are introduced i in Section 3.5. where yni = 1 if the individual choses i and zero otherwise. Since yni = 0 for non-chosen alternatives and Pni raised to 3.1 Experiment the power of zero is 1, this term is simply the probability In the context of the RECTUR project, an experiment of the chosen alternative. Assuming that decision-maker’s was carried out with real users in the context of Santi- choices are independent, the probability of each individual ago(é)Tapas, a gastronomic context that takes place every year in Santiago de Compostela. In 2011 the fourth edi- binary variables associated to each alternative (or snack) tion was held with a total of 56 participating restaurants were generated for fitting these two models. Next, the con- proposing and elaborating up to three tapas that were sold struction of the variables is briefly described through an ex- at a price of 2 euro. The experiment was designed to gather ample. The choice set associated to the old area contains, as relevant data while preserving the spirit of the contest. Par- possible choices, the set of tapas distributed in restaurants ticipants were local users as well as Spanish and interna- of this zone. For each one of these snacks, the dichoto- tional tourists. A TapasPassport with the official informa- mous variables cheese, egg, fish, meat, vegetable, shellfish tion about the contest was made available to all partici- and traditional are generated. According to Figure 2, the pants. It contained: (i) the contest guidelines and other main ingredient of t100 is meat. However, this tapa is not related information to the participants, (ii) restaurants lo- traditional. Therefore, only the variable meat will be equal cation, (iii) the tapas offered on each restaurant, (iv) an to 1. The rest of variables associated to t100 will take the official seal to demonstrate that a participant has visited value zero. the minimum number of restaurants required to obtain con- Within the discrete choice framework, the set of alterna- testś gifts. Restaurant staff had to sign the TapasPassport tives known as the choice set must verify three properties. to certify that its owners have visited the place. It has to be finite, exhaustive (the decision-maker always After consuming a tapa, participants were asked to evalu- chooses one of the alternatives) and mutually exclusive (the ate their experience. Users had to provide two ratings rang- choice of one alternative necessarily implies not choosing ing from 0 to 5: (i) a rating of the tapa, and (ii) a rating of any of the other ones). Due to the last property, three dif- the overall experience (service, place atmosphere, etc.). In ferent choice subsets were established in this work. They addition, they were informed about our research experiment correspond to the three possible restaurant locations (old, and asked to extend their feedback providing information new and outlying areas of the city). Therefore, standard about the temporal and social context in which the experi- and mixed logit models are estimated separately from these ence took place. three choice subsets that contain only the tapas associated to each zone. This assumption could be less general. For in- 3.2 RECTUR Dataset stance, considering the set of tapas of a concrete restaurant The data gathered in the experiment was collected in the would provide a new choice set and, as consequence, a new RECTUR dataset. It is assumed that the choice of a tapa choice problem. depends on the user preferences about the levels of tapa at- Estimations results for these six models are shown in Sec- tributes, which will in turn depend on the user attributes tion 4.2. For the same area of the city, standard and mixed and context elements. The consumption of a tapa deter- logit models present similar estimations for the coefficients. mines a choice from a choice set and will elicit a satisfaction As consequence, only prediction accuracy of the standard response quantified as a user rating. logit model was compared with rating-based algorithms. For each tapa, we gathered the following attributes: 3.4 Baselines: Rating-based models • Choice sets. Different choice sets could be defined for The proposed choice-based models were compared with each choice. We acquired information about the fol- two popular rating-based models: User-based collaborative lowing sets: filtering (UBCF) and matrix factorization (MF). User-based – Set of tapas in the same area of the city (outlying, collaborative filtering assumes that individuals with similar new or old zone). preferences will rate items in a similar way. Then, miss- ing ratings for a concrete user cn could be predicted find- – Set of tapas in the same restaurant. ing a neighborhood N (n) of similar users and aggregating • Tapa attributes. The gathered attributes are: their ratings to calculate the corresponding prediction. The concept of similarity between users is used for defining this – Type: Cheese, egg, fish, meat, vegetable, shellfish neighborhood given all users within a similarity threshold. and other. The main ingredient defined the type In this work, the cosine similarity measure is taken into ac- of the tapa. count and |N (n)| was fixed equal to 25. For an item i and – Character: Traditional or daring. Traditional tapas an individual cn , the ratings predicted, r̂ni , can be written are those that follow popular well-known recipes, as while daring tapas are creative and provide inno- 1 X vative recipes. r̂ni = rji |N (n)| j∈N (n) – Restaurant. The restaurant that offers the tapa was also categorized in terms of its location (out- where | | denotes the cardinal of N (n). lying, new or old area), atmosphere and style. Matrix factorization, on the other hand, characterizes both items and users by vectors of factors inferred from item rat- • Rating. The rating provided by each consumer. ing patterns. For a given item i and a user cn , the vector qi measure the extent to which the item possesses those fac- 3.3 Choice-based models tors and the vector pn , the extent of interest the user has The standard logit model as well as the mixed logit model in items that are high on the corresponding factors. The assuming Gaussian distribution on the coefficients, both de- dot product qiT pn captures the user’s interest in the item’s scribed in Section 2.3, were chosen as basic representatives of characteristics. This approximates user cn ’s rating of item the family of random utility choice-based models to be com- i, rni , leading to the estimate pared with rating-based algorithms. From attributes type and character of each tapa described in Section 3.2, eight r̂ni = qiT pn . Daring Other, 3.9 250 Shellfish, 3.9 Traditional Maximum and minimum rating means Meat, 4.1 200 Sweet, 4.3 Shellfish, 3.7 Meat, 4 Fish, 4.2 Shellfish, 4.1 Fish, 4.1 Number of tapas consumed Vegetable, 4.2 Meat, 4.4 150 Other, 4.3 Fish, 3.9 Other, 4.5 Sweet, 4.4 Meat, 4.5 Meat, 4.3 Sweet, 4.3 Meat, 3.4 Meat, 4.2 Cheese, 3.8 Shellfish, 3.7 Fish, 3.8 Shellfish, 3.4 Shellfish, 3.5 Fish, 4.1 100 Meat, 3.9 Meat, 3.3 Cheese, 4.2 Other, 4 Meat, 3.8 Fish, 4.1 Sweet, 4.5 Fish, 4 Fish, 3.8 50 Fish, 4.2 Egg, 3.8 0 t107 t111 t112 t113 t12 t15 t19 t22 t23 t24 t47 t48 t49 t50 t51 t52 t53 t54 t59 t6 t60 t61 t62 t63 t66 t67 t68 t69 t7 t70 t8 t88 t89 t90 t91 t92 t93 Tapas Figure 1: Bar plot for number of different tapas consumed, main ingredient and mean of users’ ratings in the new zone of the city. 500 Shellfish, 4.5 Maximum and minimum rating means 400 Meat, 4.2 Shellfish, 3.9 Number of tapas consumed Shellfish, 4.6 300 Vegetable, 4.2 Other, 3.3 Shellfish, 4.2 Shellfish, 3.9 Meat, 3 Meat, 4.4 Egg, 4.3 200 Egg, 4.5 Meat, 3.5 Shellfish, 4.4 Shellfish, 3.9 Shellfish, 4.6 Sweet, 4.7 Fish, 4.5 Sweet, 4.5 Fish, 4.3 Shellfish, 3.2 Meat, 4.1 Other, 4 Fish, 3.8 Sweet, 3.3 Meat, 3.6 100 Fish, 4.1 Other, 3.4 Other, 3.2 Sweet, 4.2 0 t100 t101 t108 t109 t14 t16 t17 t18 t25 t27 t28 t29 t37 t40 t41 t5 t55 t56 t77 t78 t79 t82 t83 t84 t85 t87 t94 t97 t98 t99 Daring tapas Figure 2: Bar plot for number of different daring tapas consumed, main ingredient and mean of users’ ratings in the old zone of the city. Therefore, the challenge is computing the mapping of each k k is the Euclidean norm and λ denotes a constant control- item and user to vectors qi and pn . Here, singular value ling the extent of regularization. In this work, λ = 1.5. decomposition will be applied factoring the user-item rating matrix that could be sparse. In order to learn the factor 3.5 Evaluation vectors (pn and qi ), the regularized squared error on the set Classical ranking error metrics could not be applied mainly of known ratings is minimized: because of the lack of information about all the relevant X tapas for the decision-maker on any choice situation. There- min (rni − qiT pn )2 + λ(kqi k2 + kpn k2 ) q∗,p∗ fore, two error metrics are proposed in order to compare (u,i)∈K the behaviour of choice-based and rating-based algorithms. where K is the set of the (cn , i) pairs for which rni is known, The metrics are described considering that only the tapa 400 Maximum and minimum rating means Fish, 4.3 Meat, 3.9 Other, 3.9 Meat, 3.6 300 Shellfish, 3.8 Egg, 3.7 Number of tapas consumed Meat, 4 Shellfish, 4.1 Meat, 3.8 Shellfish, 4.5 200 Meat, 3.9 Shellfish, 4.2 Meat, 3.5 Fish, 3.7 Meat, 3.7 Other, 3.7 Meat, 3.7 Fish, 4.1 Egg, 3.7 Cheese, 3.5 Shellfish, 3.6 Other, 3.6 Shellfish, 3.4 Shellfish, 3.4 Meat, 3.8 Fish, 2.9 Vegetable, 3.1 Meat, 3.1 100 Shellfish, 3.2 Shellfish, 4.1 Vegetable, 3.3 Fish, 3.7 0 t10 t102 t103 t11 t110 t20 t21 t26 t30 t31 t32 t33 t34 t35 t36 t38 t39 t4 t42 t43 t64 t65 t71 t72 t73 t74 t75 t76 t80 t81 t86 t9 Traditional tapas Figure 3: Bar plot for number of different traditional tapas consumed, main ingredient and mean of users’ ratings in the old zone of the city. 140 Daring 120 Traditional Maximum and minimum rating means Vegetable, 3.7 Shellfish, 4.6 Fish, 4.8 Shellfish, 4.2 100 Fish, 4.4 Number of tapas consumed Missing ingredient, 3.3 80 Other, 4.2 Sweet, 4.5 60 Meat, 4.4 Fish, 3.5 Fish, 3.9 Meat, 3.6 Meat, 4.3 Sweet, 4.3 40 20 0 t1 t104 t105 t106 t13 t2 t3 t44 t45 t46 t57 t58 t95 t96 Tapas Figure 4: Bar plot for number of different tapas consumed, main ingredient and mean of users’ ratings in the outlying zone of the city. with the highest associated rating or probability is recom- mended/predicted (top 1). Error I is equal to one if the item predicted does not coincide with the true alternative chosen The second measure of error, error II, is equal to the po- sition of the real choice in the ordered list of recommen- by the individual and zero otherwise. Therefore, given an individual cn , the true choice i and the recommended item dation minus one. Therefore, if the item recommended is j is: equal to the chosen one then the error is equal to zero. Let  (i1 , ..., ik , ..., iJ ) be the list of ordered items to be recom- 1 : if i 6= j mended, the error for the user cn with true choice i can be error I (cn , i) = 0 : otherwise. written as: lowest ones correspond to t21 and t94. The highest ones, to t11 and t99. error II (cn , i) = k − 1 if ik 6= i. The number of snacks consumed in the outlying area of the For instance, if one decision-maker cn chose the snack t1 city is 743. Again, the number of different snacks associated among the snacks (t1, t104, t105, · · · ) and the prediction to this zone is smaller. Concretely, it is equal to 14; 3 of them (ordered according to the highest ratings or probabilities) present a traditional character and 11, a daring character. is equal to (t105, t104, t1, · · · ), then error I (cn , t1) = 1. Furthermore, the number of users in this area is 436. Figure However, error II (cn , t1) = 2. 4 shows the total number of daring and traditional tapas Error I and error II can be generalized easily if a list of a that users consumed for the 14 choices. According to the concrete number of ordered items (in terms of probabilities results, t44, t45, t104 and t105 were the most chosen snacks. or ratings) is recommended instead of recommending only However, t2 and t3 were rarely selected. The snacks t58 and one alternative. These two errors are equal to zero if, for an t44 correspond to the tapas with lowest and highest means individual cn , the true choice i belongs to the recommended of ratings, respectively. The main ingredient of t58 is a list of items. Otherwise, error I will take the value one and missing value. In addition, cheese and egg are not the main error II, the position of the true choice i in the ordered list component for any snack. of non-recommended alternatives. In this work, a list of five alternatives will be considered (top 5). 4.2 Choice models fitting The standard and mixed logit models have been fitted 4. RESULTS from the three choice sets described in Section 4.1. Due to the price is the same for every snack, the determinants of 4.1 Data description these choices, xj , are eight dichotomous alternative specific variables. Seven of them indicate the main component of RECTUR dataset presented in Section 3.2 deals with 5517 each tapa: Cheese, egg, fish, meat, shellfish, sweet and veg- individuals, that make one or a sequential choices of one tapa etable. The eighth variable takes value equal to one when among a set of 113 tapas distributed in Santiago de Com- the snack has a traditional character. In addition, for mixed postela. Acording to comments in Section 3.3, three subsets logit model, Gaussian distribution was assumed on the co- of the original dataset will be considered distinguishing three efficients and R = 100 was fixed. different choice contexts or, equivalently, three zones of the city. New zone Old zone Outlying zone Next, the three scenarios will be briefly described. Cheese -0.07 -0.25 The total number of tapas consumed in new area of the Egg -2.48 0.31 city is 3888. However, the number of different tapas asso- Fish -0.46 -0.02 0.14 ciated to this zone is only 37; 18 of them present a tradi- Meat 0.06 0.28 -0.44 tional character and 19, a daring character. Furthermore, Shellfish -0.03 0.21 0.38 the number of users in this area is 2030. Then, although Sweet 0.07 -0.46 -0.38 most of these individuals had only one snack, some of them Vegetable -0.18 -0.17 0.26 took several ones. Figure 1 shows the total number of tapas Traditional -0.62 -0.15 0.24 that users consumed for the 37 possible choices. According Log-Likelihood: -13772 -36757 -1913.8 to the results, t22 and t61 were the most common choices. However, t47 and t48 were rarely selected. According to the information in Figure 1, only one tapa is made of eggs and Table 1: Estimation by maximum likelihood of the vegetables; two tapas has cheese as main ingredient; four standard logit model coefficients for different areas tapas are made of a sweet component or other; shellfish is of the city. Significant coefficients are in black. the ingredient of six snacks; meat and fish are the most com- mon components with ten and nine tapas, respectively. Tapa ratings that users gave to one consumed tapa are available New zone Old zone Outlying zone too. The values of these ratings are 0, 1, 2, 3, 4 and 5. High Cheese -0.07 -0.24 values for ratings are associated to a high customer satisfac- Egg -2.48 0.31 tion. Means of tapa ratings for the 37 tapas in the new area Fish -0.46 -0.01 0.13 are shown in Figure 1. The lowest means of tapa ratings are Meat -0.07 0.27 -0.67 associated to t67, t70, t69 and t49. However, all of these Shellfish -0.03 0.21 0.37 means are greater than 3. So, the level of satisfaction tends Sweet -0.003 -0.46 -0.38 to be high. Vegetable -0.18 -0.17 0.26 As for the old zone of the city, a total of 8948 tapas were Traditional -0.93 -0.09 -0.01 consumed. As before, the number of different snacks associ- Log-Likelihood: -13631 -36680 -1897.9 ated to this zone is only 62; 32 of them present a traditional character and 30, a daring character. Furthermore, the num- Table 2: Estimation of the means for mixed logit ber of users in this area is 3953. As before, although most model coefficients assuming normal distribution for of these individuals had only one snack, some of them took different areas of the city. Significant coefficients are several ones. Figures 2 and 3 show the total number of dar- in black. ing and traditional tapas that users consumed for the 62 possible choices, respectively. According to the results, t101 The coefficients obtained are shown for each area of the was the most common choice. However, t37, t103 and t102 city in Tables 1 and 2, respectively, and most of them are were rarely selected. As regards means of snack ratings, the significant in the three areas of the city. For the mixed logit Choice model UBCF MF Choice model UBCF MF Top 1 Top 1 Error CV1 CV2 CV1 CV2 CV1 CV2 Error CV1 CV2 CV1 CV2 CV1 CV2 I 0.895 0.876 1 1 1 1 I 0.955 0.954 1 1 1 1 II 5.057 4.741 8.885 9.006 8.868 8.824 II 14.552 14.060 25.438 25.475 25.511 25.499 Top 5 Top 5 I 0.408 0.409 1 1 1 1 I 0.795 0.789 1 1 1 1 II 1.841 1.859 6.262 6.295 6.128 6.115 II 10.606 10.481 22.606 22.640 22.658 22.506 Table 3: Cross validation predictions errors for stan- Table 4: Cross validation predictions errors for stan- dard logit choice model, user-based collaborative fil- dard logit choice model, user-based collaborative fil- tering and matrix factorization algorithms in the tering and matrix factorization algorithms in the outlying area of the city. Random and leave-one- new area of the city. Random and leave-one-out out cross validation are denoted by CV1 and CV2 , cross validation are denoted by CV1 and CV2 , re- respectively. In this zone, the number of different spectively. In this zone, the number of different tapas to be recommended is 14. tapas to be recommended is 37. Choice model UBCF MF model (Table 2), only the mean estimations of Gaussian dis- Top 1 tributions are shown. As for the utility, positive coefficients, Error CV1 CV2 CV1 CV2 CV1 CV2 I 0.982 0.987 1 1 1 1 see egg and meat in Table 1 for the old zone, increase its II 27.005 26.921 43.908 43.862 43.843 43.787 value. However, negative coefficients, see egg and traditional Top 5 in Table 1 for the new area, reduce it. I 0.905 0.904 1 1 1 1 II 23.141 23.097 41.013 40.964 40.945 40.970 4.3 Choice-based vs rating-based predictions The behaviour of choice-based and rating-based models Table 5: Cross validation predictions errors for stan- for recommending tapas in the three areas of the city was dard logit choice model, user-based collaborative fil- analyzed using random sub-sampling and leave-one-out cross tering and matrix factorization algorithms in the old validation from RECTUR dataset. area of the city. Random and leave-one-out cross For random sub-sampling validation, 100 iterations were validation are denoted by CV1 and CV2 , respec- considered using the 25% of randomly selected individuals tively. In this zone, the number of different tapas to as test data for predictions. Therefore, in each iteration and be recommended is 62. once the 25% of decision-makers was randomly selected, the rest of individuals is used as trainning data for rating-based algorithms or for fitting the choice model. Then, for each els are: (1) preferences are learnt from choices, (2) the choice decision-maker in the test data and for each recommendation set of each choice situation is included as a relevant variable method, prediction error measures introduced in Section 3.5 to both explain and predict future choices, and (3) unob- can be determined. The procedure for leave-one-out cross served factors affecting the decision-making process are cap- validation is similar. In this case, the number of iterations tured through random variables. On the basis of these ele- is equal to the number of users and, in each iteration, the ments the models presented in this paper differ from both test data contains an only decision-maker. collaborative methods, as they infer preferences from rat- Tables 3, 4, and 5 contain the empirical means of errors ings, and content-based techniques, as they do not handle decribed previously for the new, old and outlying areas of the the choice set of the items experienced in the past. Recent city, respectively. According to results shown in Section 4.2, content-based approaches share the same idea about the util- standard and mixed logit models provide similar estimations ity of user choices to derive preferences but are limited to for model coefficients. Therefore, only the first choice-based pairwise rather than full choice set comparisons [3]. model, the standard logit one, were taken into account to With regard to the limitations stated in the introduction, be compared with the rating-based algorithms. choice models face issue L1 by building random utility mod- The results show that choice-based models offer a better els from solid decision-making theories, and solve issue L2 by performance (lower prediction errors) compared with rating- using choices, rather than ratings, to estimate preferences. based schemes (UBCF and MF). See, in particular, error II The drawback of gathering information about the domain for the top 5 scheme taking into account the different num- (attributes and values) is compensated in two ways: (1) by ber of tapas recommended in each area of the city. Further- using more accurate data, choices rather than ratings, and more, the accuracy of predictions is reduced as long as the (2) by removing the burden of interrogating decision-makers choice set increases from the outlying to the old area, which about their post-experience satisfaction. In summary, choice indicates the importance of the choice set and the choice modelling seems to be a promising paradigm in the field of situation. recommender systems. 5. DISCUSSION Acknowledgments The main point of this work is that the recommendation This research was sponsored by EMALCSA/Coruña Smart problem can be considered as a choice prediction problem. City under grant CSC-14-13, the Ministry of Science and In- This is the main difference of our proposal compared with novation of Spain under grant TIN2014-56633-C3-1-R, and current paradigms in recommender systems that focus on the Ministry of Economy and Competitiveness of Spain un- rating prediction. The key aspects of our choice-based mod- der grant MTM2013-41383P. 6. REFERENCES [1] T. A. Adomavicius G. Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. IEEE Trans. on Knowl. and Data Eng., 17(6):734–749, 2005. [2] M. Balabanović and Y. Shoham. Fab: content-based, collaborative recommendation. Communications of the ACM, 40(3):66–72, 1997. [3] L. Blédaité and F. Ricci. Pairwise preferences elicitation and exploitation for conversational collaborative filtering. In Proceedings of the 26th ACM Conference on Hypertext & Social Media, pages 231–236. ACM, 2015. [4] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the Fourteenth conference on Uncertainty in artificial intelligence, pages 43–52. Morgan Kaufmann Publishers Inc., 1998. [5] G. M. Burke R., Felfernig A. Recommender systems: An overview. AI Magazine, 32(3):13–18, 2011. [6] D. Goldberg, D. Nichols, B. M. Oki, and D. Terry. Using collaborative filtering to weave an information tapestry. Communications of the ACM, 35(12):61–70, 1992. [7] Y. Koren. Factorization meets the neighborhood: a multifaceted collaborative filtering model. In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 426–434. ACM, 2008. [8] D. McFadden et al. Conditional logit analysis of qualitative choice behavior. 1973. [9] V. H. Resnick P. Recommender systems. Communications of the ACM, 40(3):56–58, 1997. [10] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Application of dimensionality reduction in recommender system-a case study. Technical report, DTIC Document, 2000. [11] A. Sen. Rational behaviour. In Utility and probability, pages 198–216. Springer, 1990. [12] K. E. Train. Discrete choice methods with simulation. Cambridge university press, 2009.